Integrate an evaluation harness #12

Open
Vectorrent opened this issue Oct 13, 2024 · 2 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@Vectorrent
Contributor

We will need to test our models against common, industry-standard benchmarks. EleutherAI's lm-evaluation-harness (the harness used to evaluate the Pythia models) is a widely used choice:
https://github.com/EleutherAI/lm-evaluation-harness

The process will involve:

  • Updating test.py to load the model with the Transformers API
  • Replacing the loaded weights with your model's latest checkpoint
  • Then, executing the harness's benchmark tasks from the same script
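
The steps above could be sketched roughly as follows. This is only a minimal illustration, not the project's actual test.py: it assumes the latest checkpoint has been saved in Transformers format, that the `lm-eval` package is installed, and that the custom architecture needs `trust_remote_code`. The function names `build_model_args` and `run_harness` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_model_args(checkpoint_dir: str, trust_remote_code: bool = True) -> str:
    """Build the comma-separated model_args string parsed by the harness's "hf" backend."""
    args = {"pretrained": checkpoint_dir, "trust_remote_code": trust_remote_code}
    return ",".join(f"{key}={value}" for key, value in args.items())


def run_harness(
    checkpoint_dir: str, tasks: List[str], limit: Optional[int] = None
) -> Dict[str, Any]:
    """Load the checkpoint through the Transformers backend and run the given tasks."""
    # Imported lazily so the helper above can be used without lm-eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",  # the harness's Hugging Face Transformers backend
        model_args=build_model_args(checkpoint_dir),
        tasks=tasks,
        limit=limit,  # small values give a quick smoke test, not a real score
    )
    return results["results"]
```

A quick smoke test might look like `run_harness("./checkpoints/latest", ["lambada_openai"], limit=10)`, where the checkpoint path is a placeholder for wherever the project writes its checkpoints.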
@Vectorrent added the enhancement, good first issue, and help wanted labels Oct 13, 2024
@Vectorrent
Contributor Author

I added an eval.py script, which covers most of this work, but it doesn't work correctly yet. Eval suites tend to fail almost immediately with odd tokenization errors. I'm not sure whether that's because of a poorly trained tokenizer, an under-trained model, or the custom architecture, and I'm not sure how to fix it right now.
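
One way to narrow this down might be a quick sanity check of the tokenizer against the model's embedding table before running any suite. The sketch below is an assumption about likely causes (a missing EOS token, or token IDs that fall outside the embedding table), not a confirmed diagnosis; `find_tokenizer_problems` is a hypothetical helper that relies only on `eos_token_id`, `len()` and `encode()`, which Transformers tokenizers provide.

```python
from typing import Any, List


def find_tokenizer_problems(
    tokenizer: Any, embedding_rows: int, sample: str = "Hello, world!"
) -> List[str]:
    """Return human-readable descriptions of common tokenizer/model mismatches."""
    problems = []

    # Evaluation code often needs an EOS token to build or terminate contexts.
    if tokenizer.eos_token_id is None:
        problems.append("no EOS token is defined")

    # A vocabulary larger than the embedding table can produce out-of-range IDs.
    vocab_size = len(tokenizer)
    if vocab_size > embedding_rows:
        problems.append(
            f"tokenizer has {vocab_size} tokens but the model embeds only {embedding_rows}"
        )

    # Encode a sample and confirm every ID can be looked up by the model.
    ids = tokenizer.encode(sample)
    if not ids:
        problems.append("encoding the sample produced no tokens")
    elif max(ids) >= embedding_rows:
        problems.append(f"token id {max(ids)} is outside the embedding table")

    return problems
```

With a real model this could be called as `find_tokenizer_problems(tokenizer, model.get_input_embeddings().num_embeddings)`, assuming the custom architecture implements `get_input_embeddings()`. An empty list would suggest the failure lies elsewhere.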

@Vectorrent changed the title from "Integrate the Pythia test harness" to "Integrate an evaluation harness" Nov 22, 2024
@Vectorrent
Contributor Author

I was not aware of the evaluate library. This looks pretty nice, and since we already use the Hugging Face Transformers API, it would probably be easy to set up. We should try this one.
