This project provides a framework for benchmarking various models, including but not limited to Large Language Models (LLMs). Benchmarking has two steps: preparing a dataset for classification measurement, then generating classification metrics from the resulting predictions.
## Table of Contents

- Prerequisites
- Setup
- Step 1: Preparing Dataset for Classification Measurement
- Step 2: Generating Classification Metrics
- Usage
## Prerequisites

Before you begin, ensure you have met the following requirements:

- You have installed Kotlin and Jupyter Notebook.
- You have an OpenAI API key (if benchmarking the LLM AgentRoutingSpec Resolver). Set the `OPENAI_API_KEY` environment variable to your OpenAI API key.
- You have Ollama installed on your local machine (if benchmarking the Vector AgentRoutingSpec Resolver). The default model is `all-minilm`.
## Setup

- Clone the repository:

  ```bash
  git clone https://github.com/eclipse-lmos/lmos-router.git
  cd lmos-router
  ```

- Set the `OPENAI_API_KEY` environment variable (if applicable):

  ```bash
  export OPENAI_API_KEY=your_openai_api_key
  ```
## Step 1: Preparing Dataset for Classification Measurement

The first step is to prepare the dataset by running the `LLMResolverBenchmark.kt` or `VectorResolverBenchmark.kt` script. The script reads an input CSV file, processes each record to generate a prediction using the specified model, and writes the results to an output CSV file.
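Conceptually, this is a read–predict–write loop over the records. The sketch below illustrates that flow in Python (the actual benchmark scripts are Kotlin); the column names `query`, `expected_agent`, and `predicted_agent`, and the `predict` stub, are hypothetical and only for illustration:

```python
import csv
import io

def run_benchmark(input_csv: str, predict) -> str:
    """Read records, add a model prediction to each, return the output CSV text."""
    reader = csv.DictReader(io.StringIO(input_csv))
    rows = []
    for record in reader:
        # 'query' / 'predicted_agent' are hypothetical column names.
        record["predicted_agent"] = predict(record["query"])
        rows.append(record)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Trivial stand-in for the model call:
sample = "query,expected_agent\nreset my password,account-agent\n"
print(run_benchmark(sample, predict=lambda q: "account-agent"))
```

The output CSV keeps every input column and appends the prediction, which is the shape the metrics step consumes.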
## Step 2: Generating Classification Metrics

The second step uses the prediction file generated in Step 1 to compute classification metrics. This is done in a Jupyter Notebook.

- Open the Jupyter Notebook:

  ```bash
  jupyter notebook benchmarks/benchmark.ipynb
  ```

- Follow the instructions in the notebook to load the prediction file and generate the classification metrics.
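The notebook's job is essentially to compare the expected and predicted labels column by column. If you want a quick sanity check outside Jupyter, a minimal pure-Python equivalent (accuracy plus per-label precision and recall; the label lists here are made-up examples) might look like this:

```python
def classification_metrics(expected, predicted):
    """Return overall accuracy and {label: (precision, recall)}."""
    assert len(expected) == len(predicted)
    accuracy = sum(e == p for e, p in zip(expected, predicted)) / len(expected)
    per_label = {}
    for label in set(expected) | set(predicted):
        tp = sum(1 for e, p in zip(expected, predicted) if e == p == label)
        fp = sum(1 for p in predicted if p == label) - tp  # predicted but wrong
        fn = sum(1 for e in expected if e == label) - tp   # missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_label[label] = (precision, recall)
    return accuracy, per_label

acc, per_label = classification_metrics(
    ["a", "a", "b", "b"],  # expected labels
    ["a", "b", "b", "b"],  # predicted labels
)
print(acc)             # 0.75
print(per_label["b"])  # precision 2/3, recall 1.0
```

In practice the notebook will typically use a library such as scikit-learn for the same computation; this sketch only shows what the numbers mean.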
## Usage

- Run the respective resolver script to generate the prediction file:

  - For the LLM AgentRoutingSpec Resolver:

    ```bash
    kotlinc src/main/kotlin/llm/LLMResolverBenchmark.kt -include-runtime -d LLMResolverBenchmark.jar
    java -jar LLMResolverBenchmark.jar
    ```

  - For the Vector AgentRoutingSpec Resolver:

    ```bash
    kotlinc src/main/kotlin/vector/VectorResolverBenchmark.kt -include-runtime -d VectorResolverBenchmark.jar
    java -jar VectorResolverBenchmark.jar
    ```

- Open the Jupyter Notebook to generate the classification metrics:

  ```bash
  jupyter notebook benchmarks/benchmark.ipynb
  ```