Group 13 consists of: Mohammadreza Mowlai, Giulio Rago, and Zhibek Aben.
The repository consists of:
- `collector.py`: a Python file that contains the code needed to collect our data from the `html` page and Wikipedia.
- `collector_utils.py`: a Python file that stores the functions used in `collector.py`.
- `parser.py`: a Python file that contains the code needed to parse the entire collection of `html` pages and save the results in `tsv` files.
- `parser_utils.py`: a Python file that gathers the functions used in `parser.py`.
- `index.py`: a Python file that, once executed, generates the indexes of the search engines.
- `index_utils.py`: a Python file that contains the functions used for creating the indexes.
- `utils.py`: a Python file that gathers the functions needed in more than one of the previous files (`collector`, `parser`, etc.).
- `main.py`: a Python file that, once executed, builds the search engine.
- `exercise_4.py`: a Python file that contains the implementation of the algorithm that solves problem 4.
- `main.ipynb`: a Jupyter notebook that explains the strategies we adopted in solving the homework.