[WIP] Eval scripts, harmfulness results, fix to eval_harmfulness, readme #109

Draft: wants to merge 12 commits into main

Adamliu1 (Owner) commented Jun 11, 2024

This PR adds scripts for task evaluation and harmfulness evaluation. It also includes some earlier eval-harmfulness work that I hadn't yet pushed to main.

Reference scripts:

  • For harmfulness evaluation:
      run_experiment_multiple_seed_cluster.sh
      run_experiment_multiple_seed.sh
      eval_multi_seed_cluter_submission.sh

NOTE: Seeding still appears to be broken. After applying the reproducibility method from ADL CW, I got the same results from all models. This is currently low priority.

  • For the task evaluation framework (from Andrzej):
      eval_results.sh
      run_experiment_raw_model.sh
      run_experiment.sh
      schedule_experiment.qsub.sh

WIP:

  • Scripts that evaluate all unlearned models (from experiments), running both the task evaluation framework and the harmfulness evaluation:
      run_evals.sh
      run_task_eval_exp.sh (small modification to Andrzej's script)
      run_harmfulness_eval_exp.sh (missing)

Adamliu1 changed the title from "[WIP] Eval scripts" to "[WIP] Eval scripts, harmfulness results, fix to eval_harmfulness" on Jun 11, 2024
Adamliu1 changed the title from "[WIP] Eval scripts, harmfulness results, fix to eval_harmfulness" to "[WIP] Eval scripts, harmfulness results, fix to eval_harmfulness, readme" on Jun 11, 2024
Willmish mentioned this pull request on Jun 25, 2024
Willmish (Collaborator) commented:

Leaving this open for now to keep track of the experiment results from @Adamliu1.
