python: documentation update and typing improvements (nomic-ai#2129)
Key changes:
* revert "python: tweak constructor docstrings"
* docs: update python GPT4All and Embed4All documentation
* breaking: require keyword args to GPT4All.generate

Signed-off-by: Jared Van Bortel <[email protected]>
cebtenzzre authored Mar 19, 2024
1 parent f301514 commit a1bb608
Showing 9 changed files with 300 additions and 251 deletions.
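For readers skimming the key changes, here is a hedged sketch of what the keyword-only requirement on `GPT4All.generate` likely means for callers; it assumes the prompt itself stays positional and generation options such as `max_tokens` become keyword-only.

``` py
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

# Before this commit, generation options could presumably be passed positionally,
# e.g. model.generate("The capital of France is ", 3).
# After it, options such as max_tokens must be given by keyword:
output = model.generate("The capital of France is ", max_tokens=3)
print(output)
```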
8 changes: 4 additions & 4 deletions gpt4all-backend/llamamodel.cpp
@@ -692,9 +692,9 @@ void LLamaModel::embed(
return "unsupported dimensionality " + std::to_string(dimensionality) + " for model " + modelName;
};
if (!spec->matryoshkaCapable)
throw std::logic_error(msg() + " (supported: " + std::to_string(n_embd) + ")");
throw std::out_of_range(msg() + " (supported: " + std::to_string(n_embd) + ")");
if (dimensionality == 0 || dimensionality > n_embd)
throw std::logic_error(msg() + " (recommended: " + spec->recommendedDims + ")");
throw std::out_of_range(msg() + " (recommended: " + spec->recommendedDims + ")");
}

if (!prefix) {
@@ -709,7 +709,7 @@ void LLamaModel::embed(
{
std::stringstream ss;
ss << std::quoted(*prefix) << " is not a valid task type for model " << modelName;
throw std::logic_error(ss.str());
throw std::invalid_argument(ss.str());
}

embedInternal(texts, embeddings, *prefix, dimensionality, doMean, atlas, spec);
@@ -763,7 +763,7 @@ void LLamaModel::embedInternal(
tokenize(text, inp, false);
if (atlas && inp.size() > atlasMaxLength) {
if (doMean) {
throw std::logic_error(
throw std::length_error(
"length of text at index " + std::to_string(i) + " is " + std::to_string(inp.size()) +
" tokens which exceeds limit of " + std::to_string(atlasMaxLength)
);
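The exceptions retyped above guard the embedding dimensionality and task-type checks reached through the Python `Embed4All` API. Below is a minimal sketch of a call that exercises the dimensionality path; the `nomic-embed-text-v1.5.f16.gguf` filename is an assumption, and any Matryoshka-capable embedding model would apply.

``` py
from gpt4all import Embed4All

# Model filename is an assumption; a Matryoshka-capable embedding model is required
# for the dimensionality option to be meaningful.
embedder = Embed4All("nomic-embed-text-v1.5.f16.gguf")

# A supported reduced dimensionality passes the checks above; an unsupported value
# is rejected by the backend via the more specific error types introduced here.
vector = embedder.embed("The quick brown fox jumps over the lazy dog", dimensionality=256)
print(len(vector))  # 256
```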
2 changes: 1 addition & 1 deletion gpt4all-bindings/python/docs/gpt4all_cli.md
@@ -5,7 +5,7 @@ The GPT4All command-line interface (CLI) is a Python script which is built on to
package. The source code, README, and local build instructions can be found
[here][repo-bindings-cli].

[docs-bindings-python]: gpt4all_python.html
[docs-bindings-python]: gpt4all_python.md
[repo-bindings-python]: https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/python
[repo-bindings-cli]: https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/cli
[typer]: https://typer.tiangolo.com/
34 changes: 0 additions & 34 deletions gpt4all-bindings/python/docs/gpt4all_modal.md

This file was deleted.

189 changes: 51 additions & 138 deletions gpt4all-bindings/python/docs/gpt4all_python.md
@@ -8,30 +8,22 @@ The source code and local build instructions can be found [here](https://github.
pip install gpt4all
```

=== "GPT4All Example"
``` py
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
output = model.generate("The capital of France is ", max_tokens=3)
print(output)
```
=== "Output"
```
1. Paris
```
``` py
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
```

This will:

- Instantiate `GPT4All`, which is the primary public API to your large language model (LLM).
- Automatically download the given model to `~/.cache/gpt4all/` if not already present.
- Through `model.generate(...)` the model starts working on a response. There are various ways to
steer that process. Here, `max_tokens` sets an upper limit, i.e. a hard cut-off point to the output.

Read further to see how to chat with this model.

### Chatting with GPT4All
Local LLMs can be optimized for chat conversations by reusing previous computational history.

Use the GPT4All `chat_session` context manager to hold chat conversations with the model.
### Chatting with GPT4All
To start chatting with a local LLM, you will need to start a chat session. Within a chat session, the model will be
prompted with the appropriate template, and history will be preserved between successive calls to `generate()`.

=== "GPT4All Example"
``` py
@@ -72,15 +64,19 @@ Use the GPT4All `chat_session` context manager to hold chat conversations with t
]
```

When using GPT4All models in the `chat_session` context:
When using GPT4All models in the `chat_session()` context:

- Consecutive chat exchanges are taken into account and not discarded until the session ends; as long as the model has capacity.
- Internal K/V caches are preserved from previous conversation history, speeding up inference.
- The model is given a system and prompt template which make it chatty. Depending on `allow_download=True` (default),
it will obtain the latest version of [models2.json] from the repository, which contains specifically tailored templates
for models. Conversely, if it is not allowed to download, it falls back to default templates instead.
- A system prompt is inserted into the beginning of the model's context.
- Each prompt passed to `generate()` is wrapped in the appropriate prompt template. If you pass `allow_download=False`
to GPT4All or are using a model that is not from the official models list, you must pass a prompt template using the
`prompt_template` parameter of `chat_session()`.

NOTE: If you do not use `chat_session()`, calls to `generate()` will not be wrapped in a prompt template. This will
cause the model to *continue* the prompt instead of *answering* it. When in doubt, use a chat session, as many newer
models are designed to be used exclusively with a prompt template.

[models2.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json
[models3.json]: https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models3.json
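A minimal sketch of the session behavior summarized in the bullets above; the full tabbed example is collapsed in this diff, and the model name and prompts here are illustrative.

``` py
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session():
    # Each prompt is wrapped in the model's prompt template, and earlier
    # exchanges remain in the model's context for follow-up questions.
    first = model.generate("What is the capital of France?", max_tokens=64)
    second = model.generate("And roughly how many people live there?", max_tokens=64)
    # The accumulated conversation, including the system prompt, is available
    # in model.current_chat_session while the session is open.
    print(model.current_chat_session)
```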


### Streaming Generations
@@ -91,13 +87,14 @@ To interact with GPT4All responses as the model generates, use the `streaming=Tr
from gpt4all import GPT4All
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
tokens = []
for token in model.generate("The capital of France is", max_tokens=20, streaming=True):
tokens.append(token)
with model.chat_session():
for token in model.generate("What is the capital of France?", streaming=True):
tokens.append(token)
print(tokens)
```
=== "Output"
```
[' Paris', ' is', ' a', ' city', ' that', ' has', ' been', ' a', ' major', ' cultural', ' and', ' economic', ' center', ' for', ' over', ' ', '2', ',', '0', '0']
[' The', ' capital', ' of', ' France', ' is', ' Paris', '.']
```


@@ -131,20 +128,11 @@ generation; be sure to review all their descriptions.
The model folder can be set with the `model_path` parameter when creating a `GPT4All` instance. The example below
is the same as if it weren't provided; that is, `~/.cache/gpt4all/` is the default folder.

=== "GPT4All Model Folder Example"
``` py
from pathlib import Path
from gpt4all import GPT4All
model = GPT4All(model_name='orca-mini-3b-gguf2-q4_0.gguf',
model_path=(Path.home() / '.cache' / 'gpt4all'),
allow_download=False)
response = model.generate('my favorite 3 fruits are:', temp=0)
print(response)
```
=== "Output"
```
My favorite three fruits are apples, bananas and oranges.
```
``` py
from pathlib import Path
from gpt4all import GPT4All
model = GPT4All(model_name='orca-mini-3b-gguf2-q4_0.gguf', model_path=Path.home() / '.cache' / 'gpt4all')
```

If you want to point it at the chat GUI's default folder, it should be:
=== "macOS"
@@ -179,22 +167,20 @@ Alternatively, you could also change the module's default model directory:

``` py
from pathlib import Path
import gpt4all.gpt4all
gpt4all.gpt4all.DEFAULT_MODEL_DIRECTORY = Path.home() / 'my' / 'models-directory'
from gpt4all import GPT4All
from gpt4all import GPT4All, gpt4all
gpt4all.DEFAULT_MODEL_DIRECTORY = Path.home() / 'my' / 'models-directory'
model = GPT4All('orca-mini-3b-gguf2-q4_0.gguf')
...
```


### Managing Templates
Session templates can be customized when starting a `chat_session` context:
When using a `chat_session()`, you may customize the system prompt, and set the prompt template if necessary:

=== "GPT4All Custom Session Templates Example"
``` py
from gpt4all import GPT4All
model = GPT4All('wizardlm-13b-v1.2.Q4_0.gguf')
system_template = 'A chat between a curious user and an artificial intelligence assistant.'
system_template = 'A chat between a curious user and an artificial intelligence assistant.\n'
# many models use triple hash '###' for keywords, Vicunas are simpler:
prompt_template = 'USER: {0}\nASSISTANT: '
with model.chat_session(system_template, prompt_template):
@@ -218,111 +204,38 @@ Session templates can be customized when starting a `chat_session` context:
particles, making the sky appear blue to our eyes.
```

To do the same outside a session, the input has to be formatted manually. For example:

=== "GPT4All Templates Outside a Session Example"
``` py
model = GPT4All('wizardlm-13b-v1.2.Q4_0.gguf')
system_template = 'A chat between a curious user and an artificial intelligence assistant.'
prompt_template = 'USER: {0}\nASSISTANT: '
prompts = ['name 3 colors', 'now name 3 fruits', 'what were the 3 colors in your earlier response?']
first_input = system_template + prompt_template.format(prompts[0])
response = model.generate(first_input, temp=0)
print(response)
for prompt in prompts[1:]:
response = model.generate(prompt_template.format(prompt), temp=0)
print(response)
```
=== "Output"
```
1) Red
2) Blue
3) Green

1. Apple
2. Banana
3. Orange

The colors in my previous response are blue, green and red.
```


### Introspection
A less apparent feature is the capacity to log the final prompt that gets sent to the model. It relies on
[Python's logging facilities][py-logging] implemented in the `pyllmodel` module at the `INFO` level. You can activate it
for example with a `basicConfig`, which displays it on the standard error stream. It's worth mentioning that Python's
logging infrastructure offers [many more customization options][py-logging-cookbook].
### Without Online Connectivity
To prevent GPT4All from accessing online resources, instantiate it with `allow_download=False`. When using this flag,
no system prompt is set by default, and you must specify the prompt template yourself.

[py-logging]: https://docs.python.org/3/howto/logging.html
[py-logging-cookbook]: https://docs.python.org/3/howto/logging-cookbook.html
You can retrieve a model's default system prompt and prompt template with an online instance of GPT4All:

=== "GPT4All Prompt Logging Example"
=== "Prompt Template Retrieval"
``` py
import logging
from gpt4all import GPT4All
logging.basicConfig(level=logging.INFO)
model = GPT4All('nous-hermes-llama2-13b.Q4_0.gguf')
with model.chat_session('You are a geography expert.\nBe terse.',
'### Instruction:\n{0}\n\n### Response:\n'):
response = model.generate('who are you?', temp=0)
print(response)
response = model.generate('what are your favorite 3 mountains?', temp=0)
print(response)
model = GPT4All('orca-mini-3b-gguf2-q4_0.gguf')
print(repr(model.config['systemPrompt']))
print(repr(model.config['promptTemplate']))
```
=== "Output"
```py
'### System:\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.\n\n'
'### User:\n{0}\n### Response:\n'
```
INFO:gpt4all.pyllmodel:LLModel.prompt_model -- prompt:
You are a geography expert.
Be terse.

### Instruction:
who are you?

### Response:

===/LLModel.prompt_model -- prompt/===
I am an AI-powered chatbot designed to assist users with their queries related to geographical information.
INFO:gpt4all.pyllmodel:LLModel.prompt_model -- prompt:
### Instruction:
what are your favorite 3 mountains?

### Response:

===/LLModel.prompt_model -- prompt/===
1) Mount Everest - Located in the Himalayas, it is the highest mountain on Earth and a significant challenge for mountaineers.
2) Kangchenjunga - This mountain is located in the Himalayas and is the third-highest peak in the world after Mount Everest and K2.
3) Lhotse - Located in the Himalayas, it is the fourth highest mountain on Earth and offers a challenging climb for experienced mountaineers.
```


### Without Online Connectivity
To prevent GPT4All from accessing online resources, instantiate it with `allow_download=False`. This will disable both
downloading missing models and [models2.json], which contains information about them. As a result, predefined templates
are used instead of model-specific system and prompt templates:
Then you can pass them explicitly when creating an offline instance:

=== "GPT4All Default Templates Example"
``` py
from gpt4all import GPT4All
model = GPT4All('ggml-mpt-7b-chat.bin', allow_download=False)
# when downloads are disabled, it will use the default templates:
print("default system template:", repr(model.config['systemPrompt']))
print("default prompt template:", repr(model.config['promptTemplate']))
print()
# even when inside a session:
with model.chat_session():
assert model.current_chat_session[0]['role'] == 'system'
print("session system template:", repr(model.current_chat_session[0]['content']))
print("session prompt template:", repr(model._current_prompt_template))
```
=== "Output"
```
default system template: ''
default prompt template: '### Human:\n{0}\n\n### Assistant:\n'
``` py
from gpt4all import GPT4All
model = GPT4All('orca-mini-3b-gguf2-q4_0.gguf', allow_download=False)

session system template: ''
session prompt template: '### Human:\n{0}\n\n### Assistant:\n'
```
system_prompt = '### System:\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.\n\n'
prompt_template = '### User:\n{0}\n\n### Response:\n'

with model.chat_session(system_prompt=system_prompt, prompt_template=prompt_template):
...
```

### Interrupting Generation
The simplest way to stop generation is to set a fixed upper limit with the `max_tokens` parameter.
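The rest of this section is collapsed in the diff; as a hedged sketch of the fixed cut-off just described (model name and prompt are illustrative):

``` py
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session():
    # Generation halts once 50 tokens have been produced, even mid-sentence.
    response = model.generate("Write a very long story about a lighthouse.", max_tokens=50)
    print(response)
```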