---
title: "Constraining LLM outputs" | ||
summary: "" | ||
date: 2024-1-29 | ||
draft: True | ||
tags: ['machine-learning'] | ||
title: "Hacky Prompt Engineering" | ||
summary: "Using Python-Formatted Output to Constrain LLM Responses" | ||
date: 2024-01-02 | ||
draft: False | ||
tags: ['machine-learning', 'llm'] | ||
--- | ||
|
||
LLMs can be notoriously stochastic in their output formats. The smaller the model, the more immune they are to prompts directing them to format their response in a certain way. Doesn't matter if you politely request or COMMAND THEM IN ALL-CAPS - these parrots can tend to fly to their own tune. | ||
|
||
While using llama-cpp-python, I came across Llama Grammars - a novel method of constraining the output of an LLM to a specific format. I am not sure how this exactly works under the hood, but it works! I packed two tasks into one prompt, asked it to output me a JSON, and provided a simple format that the LLM obediently followed. No fluff or filler text, and mostly correct JSON that I could json.loads into my application. | ||
Large language models (LLMs) can be unpredictable in their output formats, making it challenging to direct them to produce specific results. A list of bullet points might be numbered or asterisked, for example. Sometimes - especially with Llama 2 - they also output unnecessary filler text ("Sure! Here is the output you requested...") in a bid to sound conversational. When the output is consumed directly by a human, these inconsistencies are forgivable. When they are to be consumed by another program or within an application, parsing non-uniform outputs can be a challenge. | ||

## Llama Grammars: A Novel Approach to Constraining LLM Outputs
While working with llama.cpp, I learnt about [Llama Grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md), a method that allows us to specify a strict format for the LLM's output. Although I'm not quite clear on how this method works under the hood, it (mostly) works! By providing a schema and prompting the LLM to only answer in JSON, we can obtain mostly correct JSON outputs without any fluff or filler text.
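
For a flavour of what this looks like in practice, here is a minimal sketch, assuming llama-cpp-python's `LlamaGrammar.from_string` helper and `grammar` keyword argument, a hypothetical local GGUF model file, and a `passage` string defined elsewhere:
```
# Minimal sketch: constrain output to a JSON array of strings with a tiny GBNF grammar.
# Assumptions: llama-cpp-python is installed, the model path below is hypothetical,
# and `passage` holds the text to summarize.
from llama_cpp import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string(r'''
root   ::= "[" ws (string (ws "," ws string)*)? ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")  # hypothetical model file
response = llm(
    "Summarize the following passage as a JSON list of bullet points:\n" + passage,
    grammar=grammar,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```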

The catch - constructing a new grammar file can be somewhat tricky (did you see the notation?!), and even then the LLM finds a way to stray from the expected format. And because I don't yet know how it works, I'm hesitant to use it in my application.

## Python-Formatted Output
So instead of writing a new grammar file, I used a simpler approach to constrain the output of the LLM. The semantics of a programming language are a schema in themselves; there is only one way to represent a python list of strings. By telling the LLM to write its output as if it were a valid data structure returned from a function, we can achieve consistently formatted outputs. For example, if we want the LLM to provide bullet points based on some context, we can prompt it to write them as a python list.

```
prompt = {
    "system": "Given a passage of text, concisely summarize the passage in simple language. Format your response as a python list of bullet points",
    "user": f"PASSAGE: {passage}",
    "output": "SUMMARY: ```python\n summary: List[str] = "
}
```
This ensures my application downstream doesn't have to deal with asterisks or numbers. It can directly `eval` the LLM output into a data structure.
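
As a rough sketch of that downstream step (assuming the completion text is just the list itself), `ast.literal_eval` is a safer stand-in for `eval`, since it only accepts plain Python literals:
```
import ast
from typing import List

# Hypothetical raw completion from the LLM, following the prompted format.
raw_output = '["Japan lifted pizza import tariffs.", "Local chains expect cheaper menus."]'

# literal_eval only evaluates literals, so it is safer than eval for untrusted model output.
summary: List[str] = ast.literal_eval(raw_output.strip())
print(summary[0])
```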

Similarly, if we want an ontology based on the text, we can ask the LLM to format its output as a list of dicts. Providing an example helps ensure that the LLM understands the desired format. Here's an example prompt:
```
prompt = {
    "system": "Write an ontology of entities contained in the passage as a list. Format your response as a python list",
    "user": f"PASSAGE: {passage}",
    "output": "ONTOLOGY: ```python\n# ontology = [{'entity': 'Japan', 'class': 'country'}, {'entity': 'pizza', 'class': 'food'}]\n\nontology = "
}
```
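
One plausible way to stitch these pieces together is sketched below; it assumes a Llama 2-style chat template and a llama-cpp-python model object named `llm`, with the `output` field left dangling as a prefix for the model to complete. The `stop` sequence cuts generation off before the model can close the code fence and start chatting again.
```
import ast

def build_prompt(p: dict) -> str:
    # Llama 2 chat-style template; the "output" value is left dangling so the
    # model continues it instead of starting a fresh, chatty reply.
    return (
        f"[INST] <<SYS>>\n{p['system']}\n<</SYS>>\n\n"
        f"{p['user']} [/INST] {p['output']}"
    )

completion = llm(build_prompt(prompt), max_tokens=512, stop=["```"])
ontology = ast.literal_eval(completion["choices"][0]["text"].strip())
```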

The [Prompt Engineering Guide](https://github.com/facebookresearch/llama-recipes/blob/main/examples/Prompt_Engineering_with_Llama_2.ipynb) does something similar by asking the LLM to output only in JSON.

## Keep It Simple
Llama Grammars are cool, but I think they are better suited for more elaborate outputs, or for cases where I'm asking the LLM to do multiple tasks in a single prompt. The "hacky" prompt engineering technique of using Python- or JSON-formatted outputs is a simple way to constrain the output of large language models and make it directly usable by other applications.