Dataset Collector #1, Zelekson Daniil - 19FPL1 #43
Open
daniilzelekson wants to merge 54 commits into fipl-hse:main from daniilzelekson:main
54 commits
9803720 Starting (daniilzelekson)
bd523b5 small rework (daniilzelekson)
ca93002 Merge remote-tracking branch 'upstream/main' into HEAD (dmitry-uraev)
684b151 small rework (daniilzelekson)
d575815 small rework (daniilzelekson)
c277d90 small rework (daniilzelekson)
692c72d small rework (daniilzelekson)
23699ec small rework (daniilzelekson)
b47f292 small rework (daniilzelekson)
4788560 small rework (daniilzelekson)
598df03 small rework (daniilzelekson)
ed2ace3 small rework (daniilzelekson)
69bb24b small rework (daniilzelekson)
2f8fdc4 small rework (daniilzelekson)
dc51845 small rework (daniilzelekson)
c1d3065 small rework (daniilzelekson)
170aeb6 small rework (daniilzelekson)
1955427 small rework (daniilzelekson)
d4fe999 small rework (daniilzelekson)
451c483 small rework (daniilzelekson)
b240bb5 small rework (daniilzelekson)
4c2bac2 small rework (daniilzelekson)
180e614 small rework (daniilzelekson)
5bb34c0 small rework (daniilzelekson)
d2965c3 small rework (daniilzelekson)
778a743 small rework (daniilzelekson)
261f93b small rework (daniilzelekson)
3de6397 small rework (daniilzelekson)
1fa2747 small rework (daniilzelekson)
62619c1 small rework (daniilzelekson)
c9bb755 small rework (daniilzelekson)
0dc4249 small rework :) (daniilzelekson)
e0efdec small rework (daniilzelekson)
3441fda small rework (daniilzelekson)
8615b19 Merge remote-tracking branch 'upstream/main' into HEAD (dmitry-uraev)
746bb51 Merge branch 'main' of https://github.com/daniilzelekson/2020-2-level… (daniilzelekson)
853879e very little pf work (daniilzelekson)
7b37ca1 very little pf work (daniilzelekson)
68265db very little pf work (daniilzelekson)
d6aa3e5 very little pf work (daniilzelekson)
78665d1 small rework (daniilzelekson)
12c290b small rework (daniilzelekson)
3b71c8b small rework (daniilzelekson)
0814835 small rework (daniilzelekson)
cbd7e0c small rework (daniilzelekson)
027067c small rework (daniilzelekson)
92fb26d small rework (daniilzelekson)
b467bd8 small rework (daniilzelekson)
8363d33 small rework (daniilzelekson)
c49e415 small rework (daniilzelekson)
178631a small rework (daniilzelekson)
43b58f9 small rework (daniilzelekson)
0a9fbac small rework (daniilzelekson)
076ef72 small rework (daniilzelekson)
@@ -1,5 +1,5 @@
 {
-  "base_urls": [],
-  "total_articles_to_find_and_parse": 0,
-  "max_number_articles_to_get_from_one_seed": 0
+  "base_urls": ["https://znamia29.ru/news/17197/"],
+  "total_articles_to_find_and_parse": 5,
+  "max_number_articles_to_get_from_one_seed": 5
 }
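The change above fills in a crawler config that was previously empty. As a rough illustration of why the empty values could not work, here is a minimal sketch of a sanity check for such a config. The function name `validate_config` and the exact rules (non-empty URL list, positive integer limits) are assumptions made for this example, not something taken from the PR or the course code.

```python
from urllib.parse import urlparse


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the config looks usable.

    Hypothetical helper: the rules below are assumptions, not the project's own checks.
    """
    problems = []

    urls = config.get("base_urls")
    if not isinstance(urls, list) or not urls:
        problems.append("base_urls must be a non-empty list")
    else:
        for url in urls:
            parsed = urlparse(url) if isinstance(url, str) else None
            if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"not an absolute http(s) URL: {url!r}")

    for key in ("total_articles_to_find_and_parse",
                "max_number_articles_to_get_from_one_seed"):
        value = config.get(key)
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")

    return problems
```

Under these assumed rules, the old values (empty list, zeros) would produce three problems, while the new values would produce none.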
@@ -0,0 +1,86 @@
argon2-cffi==20.1.0
astroid==2.4.2
async-generator==1.10
attrs==20.3.0
backcall==0.2.0
beautifulsoup4==4.9.3
bleach==3.3.0
certifi==2020.4.5.1
cffi==1.14.5
chardet==3.0.4
colorama==0.4.3
DAWG-Python==0.7.2
decorator==4.4.2
defusedxml==0.7.1
docopt==0.6.2
entrypoints==0.3
et-xmlfile==1.0.1
idna==2.9
importlib-metadata==3.7.2
ipykernel==5.5.0
ipython==7.21.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jdcal==1.4.1
jedi==0.18.0
Jinja2==2.11.3
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
lazy-object-proxy==1.4.3
lml==0.0.9
MarkupSafe==1.1.1
mistune==0.8.4
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
notebook==6.2.0
numpy==1.20.1
openpyxl==3.0.3
packaging==20.9
pandocfilters==1.4.3
parso==0.8.1
pickleshare==0.7.5
prometheus-client==0.9.0
prompt-toolkit==3.0.16
pycparser==2.20
pyexcel==0.5.15
pyexcel-io==0.5.20
pyexcel-xls==0.5.8
pyexcel-xlsx==0.5.8
Pygments==2.8.1
pymorphy2==0.9.1
pymorphy2-dicts-ru==2.4.417127.4579844
pymorphy2-dicts-uk==2.4.1.1.1460299261
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pywin32==300
pywinpty==0.5.7
pyzmq==22.0.3
qtconsole==5.0.2
QtPy==1.9.0
requests==2.23.0
Send2Trash==1.5.0
six==1.15.0
soupsieve==2.2
terminado==0.9.2
testpath==0.4.4
texttable==1.6.2
tornado==6.1
traitlets==5.0.5
typed-ast==1.4.1
typing-extensions==3.7.4.3
urllib3==1.25.8
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.12.1
xlrd==1.2.0
xlwt==1.3.0
zipp==3.4.1
Review comment: You used all these? Will you share with me on Monday?
@@ -0,0 +1,26 @@
def get_month(m):
    if m.find('нвар') != -1:
        return '01'
    if m.find('рал') != -1:
        return '02'
    if m.find('арт') != -1:
        return '03'
    if m.find('прел') != -1:
        return '04'
    if m.find('Ма') != -1:
        return '05'
    if m.find('Июн') != -1:
        return '06'
    if m.find('Июл') != -1:
        return '07'
    if m.find('Авгус') != -1:
        return '08'
    if m.find('Сент') != -1:
        return '09'
    if m.find('Окт') != -1:
        return '10'
    if m.find('Нояб') != -1:
        return '11'
    if m.find('Декаб') != -1:
        return '12'
    return ''
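The function above matches the first four months regardless of the case of their first letter but requires a capital letter for the rest, so a lowercase "мая" would fall through to the empty string. One possible alternative, shown only as a sketch, is a table of lowercase stems checked against the lowercased input. The names `MONTH_STEMS` and `month_number` are hypothetical and not part of the PR.

```python
# Lowercase stems checked in order; "мар" must come before "ма"
# so that "марта" is not mistaken for May.
MONTH_STEMS = (
    ("янв", "01"), ("фев", "02"), ("мар", "03"), ("апр", "04"),
    ("ма", "05"), ("июн", "06"), ("июл", "07"), ("авг", "08"),
    ("сен", "09"), ("окт", "10"), ("ноя", "11"), ("дек", "12"),
)


def month_number(text: str) -> str:
    """Return a two-digit month for a Russian month name in text, or '' if none is found."""
    lowered = text.lower()
    for stem, number in MONTH_STEMS:
        if stem in lowered:
            return number
    return ""
```

Like the original, this is plain substring matching, so it assumes the input is a short date string rather than arbitrary text in which "ма" might appear inside another word.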
@@ -0,0 +1,3 @@
beautifulsoup4
Review comment: Better to specify the version here for consistency.
lxml
requests
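The review comment asks for pinned versions. A minimal sketch of how one might spot unpinned entries in a requirements list follows; the helper name `unpinned_requirements` is hypothetical, and the parsing is deliberately simplified (it only looks for an exact `==` pin and ignores extras, environment markers, and URLs).

```python
def unpinned_requirements(lines):
    """Return requirement entries that carry no exact '==' version pin."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        if "==" not in line:
            names.append(line)
    return names
```

For the three entries in this file it would report all of them. The pip freeze output earlier in this PR lists `beautifulsoup4==4.9.3` and `requests==2.23.0`, which could serve as the pins for those two; no `lxml` version appears there.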
????
from pip freeze