Dataset Collector #1, Zelekson Daniil - 19FPL1 #43
base: main
It's time to write some code, then lint and test it.
@@ -1,32 +1,35 @@ | |||
""" |
Oh, finally. I was waiting for this one to be created.
scrapper.py
Outdated
|
||
class IncorrectURLError(Exception): | ||
""" | ||
Custom error | ||
""" | ||
pass | ||
# def __init__(self, ): |
nice constructor initialization
|
||
|
||
class NumberOfArticlesOutOfRangeError(Exception): | ||
""" | ||
Custom error | ||
""" | ||
|
What is this line for?
|
||
class IncorrectNumberOfArticlesError(Exception): | ||
""" | ||
Custom error | ||
""" | ||
|
same
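The comments above point at leftover lines inside the custom exception classes (a commented-out `__init__`, a stray `pass`, an extra blank line). As a minimal sketch, assuming the classes need no custom constructor, a docstring alone is a valid class body, so nothing else has to be written. The docstring wording here is illustrative and not taken from the PR.

```python
class IncorrectURLError(Exception):
    """Raised for a bad URL (wording is an assumption, not from the PR)."""


class NumberOfArticlesOutOfRangeError(Exception):
    """Raised when the requested number of articles is too large (assumed meaning)."""


class IncorrectNumberOfArticlesError(Exception):
    """Raised when the number of articles is not a valid value (assumed meaning)."""


def describe(error_class):
    # Raise and catch the given error to show the classes work without
    # `pass` or an explicit __init__; Exception supplies the constructor.
    try:
        raise error_class("example message")
    except Exception as error:
        return f"{type(error).__name__}: {error}"
```

Calling `describe(IncorrectURLError)` returns the class name followed by the message, which shows that the inherited constructor already accepts a message argument.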
scrapper.py
Outdated
@@ -36,13 +39,15 @@ def find_articles(self): | |||
""" | |||
Finds articles | |||
""" | |||
pass | |||
raise IncorrectURLError |
?
!
scrapper.py
Outdated
|
||
def get_search_urls(self): | ||
""" | ||
Returns seed_urls param | ||
""" | ||
pass | ||
return seed_urls |
??
Oh! It was an accident! |
I do not see a crawler for the chosen link: https://znamia29.ru/
- I see different links in the crawler config and in our table. Can you explain?
- Please move on to the pipeline.
@@ -0,0 +1,86 @@ | |||
argon2-cffi==20.1.0 |
????
from pip freeze
wrapt==1.12.1 | ||
xlrd==1.2.0 | ||
xlwt==1.3.0 | ||
zipp==3.4.1 |
Did you use all of these? Will you share with me on Monday?
mydate_framework.py
Outdated
@@ -0,0 +1,26 @@ | |||
def get_month(m): |
Nice framework, but it is not quite a framework :) It is just one module with one function.
@@ -0,0 +1,3 @@ | |||
beautifulsoup4 |
Better to specify a version here for consistency.
scrapper.py
Outdated
@@ -27,71 +53,164 @@ class UnknownConfigError(Exception): | |||
""" | |||
|
|||
|
|||
lw = LinkWorker('', '') |
???
We need it to get an absolute link from a relative one.
lw = LinkWorker('', '') | |
lw = LinkWorker('', '') |
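For context on the reply above: the internals of `LinkWorker` are not shown in this PR, so the following is only a sketch of one common way to turn a relative link into an absolute one, using `urljoin` from the standard library. The helper name `to_absolute` and the path `/news/123` are hypothetical.

```python
from urllib.parse import urljoin


def to_absolute(base_url, href):
    # urljoin resolves href against base_url; an href that is already
    # absolute is returned unchanged.
    return urljoin(base_url, href)


# Hypothetical relative link as it might appear in an <a href="..."> attribute.
absolute = to_absolute("https://znamia29.ru/", "/news/123")
```

A plain function like this keeps no state, so it would not need a module-level instance created with empty arguments.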
scrapper.py
Outdated
arr = date_str.split(" ") | ||
arr[0] = arr[0][0:len(arr[0]) - 1] | ||
return arr[3] + '-' + get_month(arr[2]) + '-' + arr[1] + ' ' + arr[0] + ':00' | ||
except Exception: |
You could catch a specific exception here instead of the bare `Exception`.
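To illustrate the suggestion above, here is a hedged sketch of the date conversion that catches only the errors the parsing can actually raise. The input layout (`"HH:MM, D month YYYY"`) is inferred from the indexing in the diff, and the `MONTHS` table, the English month names, the zero-padding of the day, and the `None` fallback are all assumptions rather than the PR's actual behavior.

```python
# Placeholder month table; the real get_month in mydate_framework.py is not shown.
MONTHS = {
    "january": "01", "february": "02", "march": "03", "april": "04",
    "may": "05", "june": "06", "july": "07", "august": "08",
    "september": "09", "october": "10", "november": "11", "december": "12",
}


def get_month(name):
    # Raises KeyError for an unknown month name.
    return MONTHS[name.lower()]


def unify_date_format(date_str):
    try:
        # ValueError if the string does not split into exactly four parts.
        time_part, day, month_name, year = date_str.split(" ")
        month = get_month(month_name)
    except (ValueError, KeyError):
        # Only the expected parsing failures are handled; other bugs still surface.
        return None
    # Drop the trailing comma after the time and pad the day to two digits.
    return f"{year}-{month}-{day.zfill(2)} {time_part.rstrip(',')}:00"
```

Catching `(ValueError, KeyError)` rather than `Exception` means an unrelated mistake, such as passing `None` instead of a string, raises immediately instead of being silently swallowed.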
self.article.title = article_soup.find('h1').text | ||
except Exception: | ||
self.article.title = 'NOT FOUND' | ||
self.article.topics.append(self.article.title) | ||
|
||
@staticmethod | ||
def unify_date_format(date_str): |
Good method.
No description provided.