Dataset Collector #1, Bykova Ekaterina - 19FPL2 #38

ffmiil · 2021-03-06T11:05:10Z

No description provided.

dmitry-uraev

It is the right time to implement the scrapper. The deadline is approaching.

dmitry-uraev

Good for now. Waiting for a green PR.

dmitry-uraev · 2021-03-11T19:49:48Z

constants.py

@@ -7,3 +7,6 @@
 PROJECT_ROOT = os.path.dirname(os.path.realpath(__file__))
 ASSETS_PATH = os.path.join(PROJECT_ROOT, 'tmp', 'articles')
 CRAWLER_CONFIG_PATH = os.path.join(PROJECT_ROOT, 'crawler_config.json')
+HEADERS = {


dmitry-uraev · 2021-03-11T19:49:59Z

crawler_config.json

-    "max_number_articles_to_get_from_one_seed": 0
+    "base_urls": ["https://express-kamchatka1.ru/sobytiya.html"],
+    "total_articles_to_find_and_parse": 15,
+    "max_number_articles_to_get_from_one_seed": 15
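A minimal sketch of how the values in this hunk could be checked before crawling starts. The function name `validate_config` and the specific rules (non-empty list of URL strings, positive integer limits) are assumptions for illustration, not code from this PR.

```python
import json

# Hypothetical helper: the name and the validation rules are assumptions.
LIMIT_KEYS = (
    "total_articles_to_find_and_parse",
    "max_number_articles_to_get_from_one_seed",
)


def validate_config(raw: str) -> dict:
    """Parse a crawler config string and reject obviously unusable values."""
    config = json.loads(raw)

    urls = config.get("base_urls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("base_urls must be a non-empty list")
    if not all(isinstance(url, str) and url.startswith("http") for url in urls):
        raise ValueError("every base URL must be an http(s) string")

    for key in LIMIT_KEYS:
        value = config.get(key)
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")

    return config


# Values copied from the hunk above
SAMPLE = """{
    "base_urls": ["https://express-kamchatka1.ru/sobytiya.html"],
    "total_articles_to_find_and_parse": 15,
    "max_number_articles_to_get_from_one_seed": 15
}"""

config = validate_config(SAMPLE)
```

Under these assumed rules, the previous value `"max_number_articles_to_get_from_one_seed": 0` would be rejected, since a limit of zero would leave the crawler with nothing to collect.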


dmitry-uraev · 2021-03-11T19:50:22Z

scrapper.py

+from datetime import datetime
+from bs4 import BeautifulSoup
+from article import Article
+from constants import CRAWLER_CONFIG_PATH


You may import them in one line.

dmitry-uraev · 2021-03-11T19:51:06Z

scrapper.py

+            response = requests.get(url, headers=HEADERS)
+            if response:
+                content = response.text
+                links = self._extract_url(BeautifulSoup(content, 'html.parser'))


You may try the "lxml" parser option here.
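A rough sketch of what switching the parser could look like. The helper name `make_soup` and the sample HTML are assumptions for illustration; the fallback to `html.parser` is there because `lxml` is a separate package that may not be installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str, preferred: str = "lxml") -> BeautifulSoup:
    """Build a soup with the preferred parser, falling back to the built-in one."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is not available
        return BeautifulSoup(html, "html.parser")


# Made-up markup standing in for a seed page
SAMPLE_HTML = (
    '<div><a href="/news/1.html">First</a>'
    '<a href="/news/2.html">Second</a><a>No link</a></div>'
)

soup = make_soup(SAMPLE_HTML)
links = [tag["href"] for tag in soup.find_all("a", href=True)]
```

In the scrapper itself the call would then read `self._extract_url(make_soup(content))`, assuming the helper is defined next to the crawler class.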

dmitry-uraev · 2021-03-11T19:52:53Z

Nice commit names, BTW)

dmitry-uraev · 2021-03-11T19:53:09Z

'puk'

dmitry-uraev

I see you have different sites specified in the crawler and in our table: http://express-kamchatka1.ru/ and https://www.e1.ru/news/. Is this correct?

ao

dae378d

ffmiil changed the title from "ao" to "Dataset Collector #1, Bykova Ekaterina - 19FPL2" Mar 6, 2021

dmitry-uraev added the Uraev D.Y. label Mar 6, 2021

dmitry-uraev self-assigned this Mar 6, 2021

dmitry-uraev suggested changes Mar 9, 2021

dmitry-uraev added the Changes required label Mar 9, 2021

ffmiil added 6 commits March 11, 2021 07:57

try

277f0b9

maybe...

4782ec2

please

5eb7668

op,otkat

e8c318a

moya popitka nomer pyat

425c480

meow'

739dcee

ffmiil requested a review from dmitry-uraev March 11, 2021 12:22

ffmiil added 3 commits March 11, 2021 15:32

fuki-mazfuki

6d5aa10

uzhas

d174872

'puk'

a6a5e24

dmitry-uraev suggested changes Mar 11, 2021

dmitry-uraev and others added 10 commits March 12, 2021 19:08

Merge remote-tracking branch 'upstream/main' into HEAD

4da2e4e

zzz

747f0ec

no coment'

a25e84a

f

144a74c

uspeshno?

d3382ba

uspeh

57a712d

test

89ff700

popitaemsya snova

3d3c72d

n

9647f82

g

1a9d86c

dmitry-uraev added the Missed crawler deadline label Mar 16, 2021

ffmiil added 3 commits March 21, 2021 19:38

p

a3ac3f1

pp

ee1e496

goo

e651fd0

ffmiil removed the Changes required label Mar 21, 2021

dmitry-uraev and others added 13 commits March 26, 2021 21:19

Merge remote-tracking branch 'upstream/main' into HEAD

2249878

start pipe

e436895

start pip

a702ced

target score 6

ea8cbae

pymorphy try

0411522

fix lint

d3b073c

pls

70e0459

maybe

0f7d4a9

pofig

77ec680

fix scrapper lint

b225c89

may be

5725b05

pp

c9f1c59

is this win?

3b4bdb8

ffmiil added the Review Required label Apr 2, 2021

ffmiil requested a review from dmitry-uraev April 2, 2021 14:13

dmitry-uraev added the Changes required label and removed the Review Required label Apr 2, 2021

dmitry-uraev reviewed Apr 2, 2021

dmitry-uraev added the 🏆 Pipeline accepted and 🕷️ Crawler accepted labels and removed the Changes required and Missed crawler deadline labels Apr 5, 2021
