From 1cfde5b0ff37bc7735e3e054862c81ce22d34378 Mon Sep 17 00:00:00 2001
From: Taemin Lee
Date: Mon, 27 May 2024 18:21:15 +0900
Subject: [PATCH] Update documentation of OpenAI compatible server configuration (#1141)

Update README.md

Fixed incorrect setup for extra parameters in OpenAI compatible server
configuration (see PR #1032)
---
 README.md | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index ec1bdae4e1e..d4af037d0d6 100644
--- a/README.md
+++ b/README.md
@@ -273,10 +273,12 @@ If `endpoints` are left unspecified, ChatUI will look for the model on the hoste
 
 ##### OpenAI API compatible models
 
-Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), and [ialacol](https://github.com/chenhunghan/ialacol).
+Chat UI can be used with any API server that supports OpenAI API compatibility, for example [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), [LocalAI](https://github.com/go-skynet/LocalAI), [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [ialacol](https://github.com/chenhunghan/ialacol), and [vllm](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
 
 The following example config makes Chat UI works with [text-generation-webui](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai), the `endpoint.baseUrl` is the url of the OpenAI API compatible server, this overrides the baseUrl to be used by OpenAI instance. The `endpoint.completion` determine which endpoint to be used, default is `chat_completions` which uses `v1/chat/completions`, change to `endpoint.completion` to `completions` to use the `v1/completions` endpoint.
 
+Parameters not supported by OpenAI (e.g., `top_k`, `repetition_penalty`) must be set in the `extraBody` of `endpoints`. Be aware that setting them in `parameters` will cause them to be omitted.
+
 ```
 MODELS=`[
   {
@@ -285,15 +287,17 @@
     "parameters": {
       "temperature": 0.9,
       "top_p": 0.95,
-      "repetition_penalty": 1.2,
-      "top_k": 50,
-      "truncate": 1000,
       "max_new_tokens": 1024,
       "stop": []
     },
     "endpoints": [{
       "type" : "openai",
-      "baseURL": "http://localhost:8000/v1"
+      "baseURL": "http://localhost:8000/v1",
+      "extraBody": {
+        "repetition_penalty": 1.2,
+        "top_k": 50,
+        "truncate": 1000
+      }
     }]
   }
 ]`
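
As background for the `extraBody` change above, here is a minimal TypeScript sketch of how a config like this can translate into the JSON body POSTed to the server's `v1/chat/completions` route. The helper name `buildChatCompletionBody` and the rename of `max_new_tokens` to `max_tokens` are illustrative assumptions, not Chat UI's verbatim implementation; the point is that nonstandard fields must ride along in `extraBody` because the OpenAI request schema has no slot for them.

```ts
// Illustrative sketch only (assumed names, not Chat UI's actual code):
// how `parameters` and `extraBody` can be merged into one request body.

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface SamplingParameters {
  temperature?: number;
  top_p?: number;
  max_new_tokens?: number;
  stop?: string[];
}

interface OpenAIEndpointConfig {
  baseURL: string;
  // Server-specific fields such as repetition_penalty, top_k, truncate.
  extraBody?: Record<string, unknown>;
}

function buildChatCompletionBody(
  model: string,
  messages: ChatMessage[],
  parameters: SamplingParameters,
  endpoint: OpenAIEndpointConfig,
): Record<string, unknown> {
  return {
    model,
    messages,
    // Fields the OpenAI schema defines are mapped from `parameters`
    // (note the assumed rename of max_new_tokens to max_tokens).
    temperature: parameters.temperature,
    top_p: parameters.top_p,
    max_tokens: parameters.max_new_tokens,
    stop: parameters.stop,
    // Everything else is spread in verbatim from `extraBody`. A field
    // left in `parameters` that the schema does not define would have
    // no mapping here, which is why it ends up omitted.
    ...endpoint.extraBody,
  };
}

// Usage with the example config from the patch:
const body = buildChatCompletionBody(
  "text-generation-webui",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.9, top_p: 0.95, max_new_tokens: 1024, stop: [] },
  {
    baseURL: "http://localhost:8000/v1",
    extraBody: { repetition_penalty: 1.2, top_k: 50, truncate: 1000 },
  },
);

// The resulting POST carries both the standard and the extra fields.
fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
})
  .then((res) => res.json())
  .then(console.log);
```

A side effect of this design, visible in the config structure itself: since `extraBody` lives on each entry of `endpoints` rather than in the model-level `parameters`, different endpoints of the same model can target servers that accept different nonstandard fields.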