Allow to use local judge llm #132

Merged
4 commits merged into open-compass:main on Mar 28, 2024
Conversation

StarCycle
Contributor

Deploy a local language model as the judge / choice extractor

The default setting mentioned above uses OpenAI's GPT as the judge LLM. However, you can also deploy a local judge LLM with LMDeploy.

First, install LMDeploy and the OpenAI client:

pip install lmdeploy openai

Then deploy a local judge LLM with a single line of code. LMDeploy will automatically download the model from Hugging Face. Assuming we use internlm2-chat-1_8b as the judge, port 23333, and the key sk-123456 (the key must start with "sk-" and can be followed by any number you like):

lmdeploy serve api_server internlm/internlm2-chat-1_8b --server-port 23333

You need to get the model name registered by LMDeploy with the following code:

from openai import OpenAI

# Point the client at the local LMDeploy server started above
client = OpenAI(
    api_key='sk-123456',
    base_url='http://0.0.0.0:23333/v1'
)
# LMDeploy registers the served model under its own name; fetch it here
model_name = client.models.list().data[0].id
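
Optionally, you can send one test request through the same client to confirm the judge responds before running the full evaluation. The snippet below is only a sketch that reuses client and model_name from above; the prompt is arbitrary and not part of VLMEvalKit:

# Optional sanity check: ask the local judge a trivial question and print its reply
response = client.chat.completions.create(
    model=model_name,
    messages=[{'role': 'user', 'content': 'Reply with the single word OK.'}],
    temperature=0,
)
print(model_name, '->', response.choices[0].message.content)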

Now set a few environment variables so that VLMEvalKit knows how to use the local judge LLM; from VLMEvalKit's point of view, the local judge behaves just like an online OpenAI model.

export OPENAI_API_KEY=sk-123456
export OPENAI_API_BASE=http://0.0.0.0:23333/v1/chat/completions
export LOCAL_LLM=<model_name obtained above>

Finally, you can run the commands in step 2 to evaluate your VLM with the local judge LLM.
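
For example, a minimal single-process run looks like the line below (the dataset and VLM names are just the ones used in the note that follows; substitute your own):

python run.py --data HallusionBench --model qwen_chat --verbose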

Note that

  • If GPU memory is limited and you want to deploy the judge LLM on a single GPU while evaluating your VLM on the other GPUs, restrict each process with CUDA_VISIBLE_DEVICES, e.g.:
CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server internlm/internlm2-chat-1_8b --server-port 23333
CUDA_VISIBLE_DEVICES=1,2,3 torchrun --nproc-per-node=3 run.py --data HallusionBench  --model qwen_chat --verbose
  • If the local judge LLM does not follow the instructions well enough, the evaluation may fail. Please report such failures (e.g., by opening an issue).
  • The judge LLM can also be deployed in other ways, e.g., as a private LLM (not downloaded from Hugging Face) or as a quantized LLM; please refer to the LMDeploy documentation. Any other deployment framework also works, as long as it exposes an OpenAI-compatible API.

StarCycle and others added 4 commits March 27, 2024 22:50
Allow to use a local judge llm by setting the system variable LOCAL_LLM
@kennymckormick kennymckormick merged commit ee8cb93 into open-compass:main Mar 28, 2024
1 check passed
shan23chen pushed a commit to shan23chen/VLMEvalKit that referenced this pull request Oct 3, 2024
* Use local llm

Allow to use a local judge llm by setting the system variable LOCAL_LLM

* Update Quickstart.md for local judge LLM

* run pre-commit

* Update misc.py

---------

Co-authored-by: Haodong Duan <[email protected]>