Commit a2adb57: Update README.md (alopatenko, Apr 19, 2024). 1 changed file, 10 additions and 10 deletions.
My view on LLM Evaluation: [Deck](LLMEvaluation.pdf), and video Analytics Vidh
- Ray/Anyscale's LLM Performance Leaderboard https://github.com/ray-project/llmperf-leaderboard (explanation: https://www.anyscale.com/blog/comparing-llm-performance-introducing-the-open-source-leaderboard-for-llm)
---
### Evaluation Software
- [MTEB](https://huggingface.co/spaces/mteb/leaderboard)
- [OpenICL Framework](https://arxiv.org/abs/2303.02913)
- [RAGAS](https://docs.ragas.io/en/stable/)
- [EleutherAI LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [OpenAI Evals](https://github.com/openai/evals)
- [MLflow Evaluate](https://mlflow.org/docs/latest/llms/llm-evaluate/index.html)
- [MosaicML Composer](https://github.com/mosaicml/composer)
- [TruLens](https://github.com/truera/trulens/)
- [BigCode Evaluation Harness](https://github.com/bigcode-project/bigcode-evaluation-harness)
- [LLMeBench](https://github.com/qcri/LLMeBench/) (see [LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking](https://arxiv.org/abs/2308.04945))
---
### LLM Evaluation articles in tech media and blog posts from companies <a id="articles"></a>