- Add a configuration file, or edit the configuration_experiments.py file provided in this repository, to set up the experiment configuration.
- Run the analysis.py script:
  python3 analysis.py configuration_experiments [experiment_name]
  The first argument is the name of the file with the selected configuration; the optional second argument is an additional name for the experiment logs folder.
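The two command-line arguments described above could be handled roughly as follows. This is only a sketch, not the actual contents of analysis.py: the helper names parse_args and load_configuration are hypothetical, and it assumes the configuration file is imported as a Python module by name (which would explain why it is passed without the .py extension).

```python
import argparse
import importlib


def parse_args(argv):
    """Parse the configuration name and the optional experiment name."""
    parser = argparse.ArgumentParser(description="Run an experiment (sketch).")
    # First argument: name of the configuration file, without ".py".
    parser.add_argument("configuration")
    # Second argument (optional): extra name for the experiment logs folder.
    parser.add_argument("experiment_name", nargs="?", default=None)
    return parser.parse_args(argv)


def load_configuration(module_name):
    """Import the configuration as a module (assumption: it is on sys.path)."""
    return importlib.import_module(module_name)


# Example with an explicit argument list instead of sys.argv[1:].
args = parse_args(["configuration_experiments", "KNN"])
print(args.configuration, args.experiment_name)
```

In a real script, parse_args(sys.argv[1:]) would be followed by load_configuration(args.configuration) to read the experiment settings.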
By default, Python buffers stdout, so log output can be delayed, especially when stdout is redirected to a file (e.g. with nohup), where larger buffers are used. To see the logs as soon as they are produced, run python with the -u (unbuffered) option. For example:
python3 -u analysis.py configuration_experiments KNN
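If changing the command line is not convenient, a similar effect can be obtained inside a script by flushing after each message. The sketch below is illustrative only; log_line is a hypothetical helper and is not part of this repository.

```python
import sys


def log_line(message, stream=None):
    """Write one log line and flush it immediately, bypassing buffering."""
    target = sys.stdout if stream is None else stream
    # flush=True forces the line out even when stdout is redirected to a file.
    print(message, file=target, flush=True)


log_line("experiment started")
```

Setting the PYTHONUNBUFFERED environment variable to a non-empty value has the same effect as the -u option.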
The models developed in this study are:
- PPIIBM_first_item, Pair Prediction by Item Identification Baseline Model (first item mode)
- PPIIBM_both_items, Pair Prediction by Item Identification Baseline Model (both items mode)
The classic machine learning models that can be selected are:
- KNN, k-nearest neighbors
- LR, logistic regression classifier
- RF, random forest classifier
- SVC, support vector classifier
The selectable combinations are as follows:
- Add: call AddEmbeddings()
- Multiply: call MultiplyEmbeddings()
- Concatenate: call ConcatenateEmbeddings()
- Concatenate with inverse pair: call ConcatEmbeddings(add_inverted_interactions=False)
Each combination corresponds to a callable function in the code.
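To illustrate what these combinations compute on a pair of embeddings, here is a minimal sketch using plain Python lists. The function names below are hypothetical and are not the repository's classes; the sketch also assumes that "with inverse pair" means producing the (b, a) ordering in addition to (a, b), since concatenation depends on the order of the items.

```python
def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")


def add_pair(a, b):
    """Element-wise sum of two embeddings."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def multiply_pair(a, b):
    """Element-wise product of two embeddings."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def concatenate_pair(a, b):
    """Embedding of a followed by embedding of b."""
    return list(a) + list(b)


def concatenate_with_inverse(a, b):
    """Both orderings of the pair: (a, b) and (b, a)."""
    return [concatenate_pair(a, b), concatenate_pair(b, a)]


first, second = [1.0, 2.0], [3.0, 4.0]
print(add_pair(first, second))
print(multiply_pair(first, second))
print(concatenate_with_inverse(first, second))
```

Addition and multiplication are symmetric in the two items, whereas concatenation is not, which is why an inverted variant is only meaningful for the concatenation case.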
A datasets.zip file is provided in the repository; it contains the datasets required for the experiments.
- This file must be extracted either into the project's root directory or into a custom directory created by the user.
- After extracting, specify the paths to the dataset files in the configuration file, making sure that they match the location of the extracted files.
For example:
datasets = ['dataset_clean_wei_protbert.h5']
# or
datasets = ['data_folder/dataset_clean_wei_protbert.h5']
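The extraction and the path check can also be scripted. The following is a sketch using only the standard library; extract_datasets and missing_datasets are hypothetical helpers, and the layout of the files inside datasets.zip is not assumed.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_path, target_dir="."):
    """Extract the archive into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


def missing_datasets(datasets, base_dir="."):
    """Return the configured dataset paths that do not exist under base_dir."""
    return [name for name in datasets if not (Path(base_dir) / name).is_file()]


# Typical use, run from the project's root directory:
#   extract_datasets("datasets.zip", "data_folder")
#   missing_datasets(["data_folder/dataset_clean_wei_protbert.h5"])
```

An empty list from missing_datasets indicates that every path listed in the configuration points to an extracted file.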
Creating a Python virtual environment and installing the dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Conda must already be installed on your system.
Creating the Conda environment with RAPIDS:
conda create -n rapids-24.02 -c rapidsai -c conda-forge -c nvidia cuml=24.02 python=3.10 cuda-version=11.8
Activate the environment with conda activate rapids-24.02. Once it is active, you can run the Python scripts on the GPU with RAPIDS by setting use_GPU = True in the experiment configuration file.
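One way a script could act on such a flag is to choose between a GPU and a CPU library at run time. The sketch below is an assumption about how this might be done, not a description of this repository's code; select_backend is a hypothetical helper, and the fallback to scikit-learn when cuML is not installed is an illustrative design choice.

```python
import importlib.util


def select_backend(use_gpu):
    """Return the package name that estimators would be imported from."""
    # Use cuML only when requested and actually installed in the environment.
    if use_gpu and importlib.util.find_spec("cuml") is not None:
        return "cuml"
    return "sklearn"


use_GPU = True  # flag name as used in the experiment configuration file
backend = select_backend(use_GPU)
print(f"Estimators would be imported from: {backend}")
```

Because cuML provides estimators with a scikit-learn-like interface (for example cuml.neighbors.KNeighborsClassifier), the rest of a script can often stay the same regardless of the backend chosen.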