This repo is for running inverse scaling examples. There is a colab set up for it, which you can find in the task spreadsheet.
To run on NYU's Greene cluster:
- Follow these Getting Started instructions to get connected to Greene.
- Follow these Singularity instructions up until "Install packages", with the following differences:
  - Instead of `cuda11.2-cudnn8-devel-ubuntu20.04.sif`, use `cuda11.3.0-cudnn8-devel-ubuntu20.04.sif`
  - Instead of `overlay-7.5GB-300K.ext3.gz`, use `overlay-10GB-400K.ext3`
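For reference, fetching and unpacking the larger overlay typically looks like this (the `/scratch/work/public/overlay-fs-ext3` path is the standard Greene location and is an assumption here; adjust if your cluster keeps overlays elsewhere):

```shell
# Assumed standard Greene overlay location; verify the path on your cluster.
cp -rp /scratch/work/public/overlay-fs-ext3/overlay-10GB-400K.ext3.gz .
# Unpack it; this leaves overlay-10GB-400K.ext3 in the current directory.
gunzip overlay-10GB-400K.ext3.gz
```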
- Activate the Singularity image with the overlay.
  - Remember to run `source /ext3/env.sh` (or whatever you called it when setting up the image) to activate the Python environment.
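Concretely, an interactive session with the overlay mounted might look like the sketch below. The file names come from the steps above, but the `--nv` flag, the `rw` overlay mode, and the `/scratch/work/public/singularity` image path are assumptions based on the standard Greene workflow; adjust them to your setup.

```shell
# Launch the container with the overlay mounted read-write (assumed paths).
singularity exec --nv \
    --overlay overlay-10GB-400K.ext3:rw \
    /scratch/work/public/singularity/cuda11.3.0-cudnn8-devel-ubuntu20.04.sif \
    /bin/bash

# Inside the container, activate the Python environment:
source /ext3/env.sh
```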
- Remember to `cd` to `/ext3` and run `git clone https://github.com/naimenz/inverse-scaling-eval-pipeline` to get a copy of the code.
- Run `pip install .` to install the `inverse-scaling-eval-pipeline` package.
- Run `python -m pip install torch==1.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html` to install the correct version of PyTorch.
- Copy the `example.sbatch` script included under the `/ext3/inverse-scaling-eval-pipeline/scripts` directory to somewhere outside the image, e.g. your `home` or `scratch` directory.
- There are two options for pointing to your data:
  - Put your data in `/ext3/inverse-scaling-eval-pipeline/data` and use the option `--data` as in the script.
  - Put your data elsewhere and use the option `--dataset-path` to point to it.
- For `--exp-dir`, give the absolute path of the directory you want the results to be saved in.
- Remember to add the flag `--use-gpu` only for HuggingFace models (GPT-2, GPT-Neo), and to add the flag `--batch-size n` (with n > 1) only for OpenAI API models (GPT-3).
- Submit your `.sbatch` file as a job with `sbatch example.sbatch`.
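Putting the flags together, a minimal sbatch script might look like the sketch below. The `#SBATCH` resource values, the container paths, and the `python -m eval_pipeline.main` entry point are all assumptions — copy the real invocation from the `example.sbatch` shipped under `scripts/`; only the flags (`--dataset-path`, `--exp-dir`, `--use-gpu`) are the ones documented above.

```shell
#!/bin/bash
#SBATCH --job-name=inverse-scaling   # assumption: adjust resources to your job
#SBATCH --gres=gpu:1                 # a GPU is needed for --use-gpu runs
#SBATCH --time=02:00:00
#SBATCH --mem=16GB

# Launch the container and run the evaluation.
# NOTE: "python -m eval_pipeline.main" is a placeholder entry point -- take the
# real command from the example.sbatch shipped with the repo.
singularity exec --nv \
    --overlay overlay-10GB-400K.ext3:ro \
    /scratch/work/public/singularity/cuda11.3.0-cudnn8-devel-ubuntu20.04.sif \
    /bin/bash -c "
source /ext3/env.sh
python -m eval_pipeline.main \
    --dataset-path /scratch/\$USER/my_task.csv \
    --exp-dir /scratch/\$USER/results/my_task \
    --use-gpu
"
```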
- Run the plotting file by activating the Singularity image and running `python /ext3/inverse-scaling-eval-pipeline/eval_pipeline/plot_loss.py <path/to/results/dir>`.
Let me know which parts of these instructions are incorrect/unclear!