The objective of this thesis is to find the most diverse set of scientific papers in a given corpus. The diversity of a set of papers is measured by the number of distinct topics the set covers. The set of papers with the highest diversity is called the megadiverse set.
Hence, we address the following problem: given a corpus of scientific papers, find its megadiverse set.
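To make the definition concrete, here is a minimal Python sketch of the diversity measure described above. It is an illustration only, not the code in this repository: the `topics_of` mapping, the paper and topic names, and the size budget `k` are all assumptions introduced for the example, and the greedy selection is a simple heuristic that is not guaranteed to return the optimal set.

```python
from typing import Dict, List, Set


def diversity(selected: List[str], topics_of: Dict[str, Set[str]]) -> int:
    """Number of distinct topics covered by the selected papers."""
    covered: Set[str] = set()
    for paper in selected:
        covered |= topics_of[paper]
    return len(covered)


def greedy_diverse_subset(topics_of: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k papers, each time taking the one that adds the most new topics.

    Assumption: a size budget k exists. Greedy selection is a heuristic and
    may miss the best possible set.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = set(topics_of)
    while remaining and len(selected) < k:
        # Sort for deterministic tie-breaking; max() keeps the first best candidate.
        best = max(sorted(remaining), key=lambda p: len(topics_of[p] - covered))
        if not topics_of[best] - covered:
            break  # no remaining paper adds a new topic
        selected.append(best)
        covered |= topics_of[best]
        remaining.remove(best)
    return selected


# Hypothetical toy corpus: paper id -> set of topic labels.
topics_of = {
    "paper_a": {"nlp", "graphs"},
    "paper_b": {"nlp"},
    "paper_c": {"vision", "robotics", "graphs"},
    "paper_d": {"biology"},
}

chosen = greedy_diverse_subset(topics_of, k=2)
print(chosen, diversity(chosen, topics_of))
```

On this toy corpus the sketch selects `paper_c` first (three new topics) and then `paper_a`, for a diversity of 4.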
- Create conda environment with Python 3.8.19
    conda create -n "myenv" python=3.8.19
- Activate conda environment
    conda activate myenv
- Install requirements
    pip install -r requirements.txt
- Download dataset folder from here and add it to the thesis_exp project directory.
- Run application with these parameters in this order:
    python main.py --eda
    python main.py --metadata
    python main.py --corpus
    python main.py --eval
    python main.py --lda
    python main.py --umap
    python main.py --entropy
    python main.py --biblio
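
The stages above can also be run in sequence from a small helper script. The following is a sketch rather than part of the project: the names `STAGES`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes that it is started from the project directory and that `main.py` exits with a non-zero status when a stage fails.

```python
import subprocess
import sys
from typing import List

# Stage flags in the order listed in this README.
STAGES = [
    "--eda",
    "--metadata",
    "--corpus",
    "--eval",
    "--lda",
    "--umap",
    "--entropy",
    "--biblio",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one command per stage, using the given Python interpreter."""
    return [[python, "main.py", flag] for flag in STAGES]


def run_pipeline() -> None:
    """Run every stage in order and stop at the first failing stage."""
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if the stage exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` inside the activated conda environment runs the eight stages one after another. Because `sys.executable` is used by default, the stages run with the same interpreter that runs the helper.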