---
title: "Constraining LLM outputs" | ||
summary: "" | ||
date: 2024-1-29 | ||
draft: True | ||
tags: ['machine-learning'] | ||
title: "Hacky Prompt Engineering" | ||
summary: "Using Python-Formatted Output to Constrain LLM Responses" | ||
date: 2024-01-02 | ||
draft: False | ||
tags: ['machine-learning', 'llm'] | ||
--- | ||
|
||
LLMs can be notoriously stochastic in their output formats. The smaller the model, the more immune they are to prompts directing them to format their response in a certain way. Doesn't matter if you politely request or COMMAND THEM IN ALL-CAPS - these parrots can tend to fly to their own tune. | ||
|
||
While using llama-cpp-python, I came across Llama Grammars - a novel method of constraining the output of an LLM to a specific format. I am not sure how this exactly works under the hood, but it works! I packed two tasks into one prompt, asked it to output me a JSON, and provided a simple format that the LLM obediently followed. No fluff or filler text, and mostly correct JSON that I could json.loads into my application. | ||
Large language models (LLMs) can be unpredictable in their output formats, making it challenging to direct them to produce specific results. A list of bullet points might be numbered or asterisked, for example. Sometimes - especially with Llama 2 - they also output unnecessary filler text ("Sure! Here is the output you requested...") in a bid to sound conversational. When the output is consumed directly by a human, these inconsistencies are forgivable. When they are to be consumed by another program or within an application, parsing non-uniform outputs can be a challenge. | ||

## Llama Grammars: A Novel Approach to Constraining LLM Outputs
While working with llama.cpp, I learnt about [Llama Grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md), a method that allows us to specify a strict format for the LLM's output. Although I'm not quite clear on how this method works under the hood, it (mostly) works! By providing a schema and prompting the LLM to only answer in JSON, we can obtain mostly correct JSON outputs without any fluff or filler text.
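
For a flavour of what this looks like in practice, here is a minimal sketch, assuming llama-cpp-python's `LlamaGrammar.from_string` helper and `grammar` keyword argument, a hypothetical local GGUF model file, and a `passage` string defined elsewhere:
```
# Minimal sketch: constrain output to a JSON array of strings with a tiny GBNF grammar.
# Assumptions: llama-cpp-python is installed, the model path below is hypothetical,
# and `passage` holds the text to summarize.
from llama_cpp import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string(r'''
root   ::= "[" ws (string (ws "," ws string)*)? ws "]"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")  # hypothetical model file
response = llm(
    "Summarize the following passage as a JSON list of bullet points:\n" + passage,
    grammar=grammar,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```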

The catch - constructing a new grammar file can be somewhat tricky (did you see the notation?!), and even then the LLM finds a way to stray from the expected format. And because I don't yet know how it works, I'm hesitant to use it in my application.

## Python-Formatted Output
So instead of writing a new grammar file, I used a simpler approach to constrain the output of the LLM. The semantics of a programming language are a schema in themselves; there is only one way to represent a python list of strings. By telling the LLM to write its output as if it were a valid data structure returned from a function, we can achieve consistently formatted outputs. For example, if we want the LLM to provide bullet points based on some context, we can prompt it to write them as a python list.

```
prompt = {
    "system": "Given a passage of text, concisely summarize the passage in simple language. Format your response as a python list of bullet points",
    "user": f"PASSAGE: {passage}",
    "output": "SUMMARY: ```python\n summary: List[str] = "
}
```
This ensures my application downstream doesn't have to deal with asterisks or numbers. It can directly `eval` the LLM output into a data structure.
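
As a rough sketch of that downstream step (assuming the completion text is just the list itself), `ast.literal_eval` is a safer stand-in for `eval`, since it only accepts plain Python literals:
```
import ast
from typing import List

# Hypothetical raw completion from the LLM, following the prompted format.
raw_output = '["Japan lifted pizza import tariffs.", "Local chains expect cheaper menus."]'

# literal_eval only evaluates literals, so it is safer than eval for untrusted model output.
summary: List[str] = ast.literal_eval(raw_output.strip())
print(summary[0])
```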

Similarly, if we want an ontology based on the text, we can ask the LLM to format its output as a list of dicts. Providing an example helps ensure that the LLM understands the desired format. Here's an example prompt:
```
prompt = {
    "system": "Write an ontology of entities contained in the passage as a list. Format your response as a python list",
    "user": f"PASSAGE: {passage}",
    "output": "ONTOLOGY: ```python\n# ontology = [{'entity': 'Japan', 'class': 'country'}, {'entity': 'pizza', 'class': 'food'}]\n\nontology = "
}
```
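
One plausible way to stitch these pieces together is sketched below; it assumes a Llama 2-style chat template and a llama-cpp-python model object named `llm`, with the `output` field left dangling as a prefix for the model to complete. The `stop` sequence cuts generation off before the model can close the code fence and start chatting again.
```
import ast

def build_prompt(p: dict) -> str:
    # Llama 2 chat-style template; the "output" value is left dangling so the
    # model continues it instead of starting a fresh, chatty reply.
    return (
        f"[INST] <<SYS>>\n{p['system']}\n<</SYS>>\n\n"
        f"{p['user']} [/INST] {p['output']}"
    )

completion = llm(build_prompt(prompt), max_tokens=512, stop=["```"])
ontology = ast.literal_eval(completion["choices"][0]["text"].strip())
```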

The [Prompt Engineering Guide](https://github.com/facebookresearch/llama-recipes/blob/main/examples/Prompt_Engineering_with_Llama_2.ipynb) does something similar by asking the LLM to output only in JSON.

## Keep It Simple
Llama Grammars are cool, but I think they are better suited for more elaborate outputs, or for cases where I'm asking the LLM to do multiple tasks in a single prompt. The "hacky" prompt engineering technique of using Python- or JSON-formatted outputs is a simple way to constrain the output of large language models and make it directly usable by other applications.