diff --git a/docs/usage.md b/docs/usage.md
index d0da98f6..550e3941 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -313,3 +313,48 @@ for _ in range(3):
     print("Time elapsed:", round(time.time() - start, 3))
     print("Answer:", response["choices"][0]["message"]["content"])
 ```
+
+### Build GPTCache server
+
+GPTCache now supports building a server with caching and conversation capabilities, so you can start a customized GPTCache service within a few lines. Here is a simple example of how to build and interact with the GPTCache server; for more details on the available arguments and parameters, refer to [examples/README.md](../examples/README.md).
+
+**Start server**
+
+Once you have GPTCache installed, you can start the server with the following command:
+```shell
+$ gptcache_server -s 127.0.0.1 -p 8000
+```
+
+**GPTCache service configuration**
+
+You can configure the server via a YAML file; see the example configuration in [examples/README.md](../examples/README.md).
+
+**Build service with Docker**
+
+Alternatively, you can run the service in a Docker container. Build the image with the [Dockerfile](../gptcache_server/dockerfiles/Dockerfile) GPTCache provides:
+```shell
+$ docker build -t gptcache:v1 .
+```
+
+**Interact with the server**
+
+GPTCache supports two ways of interacting with the server:
+
+- With the command line:
+  ```shell
+  $ curl -X PUT -d "receive a hello message" "http://localhost:8000?prompt=hello"
+  $ curl -X GET "http://localhost:8000?prompt=hello"
+  "receive a hello message"
+  ```
+- With the Python client:
+  ```python
+  >>> from gptcache import Client
+
+  >>> client = Client(uri="http://localhost:8000")
+  >>> client.put("Hi", "Hi back")
+  200
+  >>> client.get("Hi")
+  'Hi back'
+  ```
diff --git a/examples/README.md b/examples/README.md
index 590bf6af..09e288b1 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -6,6 +6,7 @@
 - [How to set the `similarity evaluation` interface](#How-to-set-the-similarity-evaluation-interface)
 - [Other cache init params](#Other-cache-init-params)
 - [How to run with session](#How-to-run-with-session)
+- [How to build GPTCache server](#How-to-build-GPTCache-server)
 - [Benchmark](#Benchmark)
 
 ## How to run Visual Question Answering with MiniGPT-4
@@ -538,6 +539,98 @@ response = openai.ChatCompletion.create(
 
 And you can also run `data_manager.list_sessions` to list all the sessions.
 
+## How to build GPTCache server
+
+GPTCache now supports building a server with caching and conversation capabilities. You can start a customized GPTCache service within a few lines.
+
+### Start server
+
+Once you have GPTCache installed, you can start the server with the following command:
+```shell
+$ gptcache_server -s [HOST] -p [PORT] -d [CACHE_DIRECTORY] -f [CACHE_CONFIG_FILE]
+```
+All of the arguments are optional:
+- `-s`/`--host`: the host to start the GPTCache service on, defaults to "0.0.0.0".
+- `-p`/`--port`: the port used to access the service, defaults to 8000.
+- `-d`/`--cache-dir`: the directory of the cache, defaults to the `gptcache_data` folder.
+- `-f`/`--cache-config-file`: the YAML file used to configure the GPTCache service, defaults to None.
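+
+For example, to start the service on localhost with a custom cache directory and configuration file (the paths and file names below are only illustrative, built from the flags documented above):
+```shell
+# Illustrative values: adjust the host, port, and paths to your setup
+$ gptcache_server -s 127.0.0.1 -p 8000 -d /tmp/gptcache_data -f gptcache.yml
+```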
+
+### GPTCache service configuration
+
+You can configure the server via a YAML file. Here is an example config YAML:
+
+```yaml
+model_src:
+    onnx
+model_config:
+    # Set model kwargs here, including `model` and `api_key` if needed
+storage_config:
+    data_dir:
+        gptcache_data
+    manager:
+        sqlite,faiss
+    vector_params:
+        # Set vector storage related params here
+evaluation:
+    distance
+evaluation_kws:
+    # Set evaluation metric kwargs here
+pre_function:
+    get_prompt
+post_function:
+    first
+config:
+    threshold: 0.8
+    # Set other config options here
+```
+- `model_src`: The model source.
+- `model_config`: The model name, model config, and api key.
+- `data_dir`: The cache directory.
+- `manager`: The cache storage and vector storage.
+- `evaluation`: The similarity evaluation method.
+- `pre_function`: The pre-processing function.
+- `post_function`: The post-processing function.
+
+For more on the `model_src`, `evaluation`, and `storage_config` options, check [README.md](../README.md).
+
+### Build service with Docker
+
+Alternatively, you can run the service in a Docker container:
+
+- Build the image with the [Dockerfile](../gptcache_server/dockerfiles/Dockerfile) GPTCache provides:
+  ```shell
+  $ docker build -t gptcache:v1 .
+  ```
+- Run the service in a container on the default port:
+  ```shell
+  $ docker run -p 8000:8000 -it gptcache:v1
+  ```
+- Run the service in a container on a specific port (e.g. 4000):
+  ```shell
+  $ docker run -p 4000:4000 -it gptcache:v1 gptcache_server -s 0.0.0.0 -p 4000
+  ```
+
+### Interact with the server
+
+GPTCache supports two ways of interacting with the server:
+
+- With the command line:
+  ```shell
+  $ curl -X PUT -d "receive a hello message" "http://localhost:8000?prompt=hello"
+  $ curl -X GET "http://localhost:8000?prompt=hello"
+  "receive a hello message"
+  ```
+- With the Python client:
+  ```python
+  >>> from gptcache import Client
+
+  >>> client = Client(uri="http://localhost:8000")
+  >>> client.put("Hi", "Hi back")
+  200
+  >>> client.get("Hi")
+  'Hi back'
+  ```
+
 ## [Benchmark](https://github.com/zilliztech/GPTCache/tree/main/examples/benchmark/benchmark_sqlite_faiss_onnx.py)
 
 The benchmark script about the `Sqlite + Faiss + ONNX`