Steer Your LLM: Stateful Control of Large Language Models
LLMs generate words like a firehose with no nozzle: powerful, yet chaotic.
The PSE isn't a filter but a valve, turning a stochastic LLM into a stateful engine capable of driving complex interactions.
The PSE allows the model to "color within the lines".
pip install pse
or, for those in the know:
uv pip install pse
The structuring engine:
- Maintains the real-time state during the LLM's generation,
- Guarantees output structure (e.g., valid syntax, nested schemas; see the sketch below),
- Handles ambiguity and recursion,
- Operates at the token level, striking a balance between flexibility and control,
- Enforces structure without affecting the model's creativity.
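To make "nested schemas" and "recursion" concrete, here is the kind of self-referential JSON Schema a structuring engine has to cope with. The comment-thread shape is a hypothetical illustration, not taken from the PSE docs, and whether your installed version accepts `$ref` specifically is worth verifying; the quickstart below shows how a schema like this is handed to the engine.

```python
# Hypothetical recursive schema: a comment whose replies are themselves comments.
# Illustrative only; check the PSE docs for the schema features your version supports.
comment_schema = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "replies": {
            "type": "array",
            "items": {"$ref": "#"},  # each reply recursively follows this same schema
        },
    },
    "required": ["text"],
}
```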
Move beyond the limitations of prompt engineering, regex, overfit fine-tuning, or index-based masking.
| Feature | Prompt Engineering | Re-try if Invalid | Regex | Simple Templating | Index Based Masking | PSE |
|---|---|---|---|---|---|---|
| Guaranteed Structure | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Handles Recursion | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Native token healing | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Handles Ambiguity | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Flexibility (Content) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Performance | ✅ | ❌ (slow) | ✅ | ✅ | ✅ | ✅ |
| Integration with LLMs | ✅ | ✅ | ✅ | | | ✅ |
| Extensibility | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Stateful | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
Here's a quickstart example using the PSE with a simple schema:
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from pse.engine.structuring_engine import StructuringEngine
from pse.util.torch_mixin import PSETorchMixin
# 1. Apply the PSE mixin to your model
class PSE_Torch(PSETorchMixin, LlamaForCausalLM):
    pass
# 2. Load your model and tokenizer.
model_path = "meta-llama/Llama-3.2-1B-Instruct" # any model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = PSE_Torch.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")
# Ensure padding token is set for generation
model.config.pad_token_id = model.config.eos_token_id[0]
if model.generation_config:
    model.generation_config.pad_token_id = model.config.eos_token_id[0]
# 3. Create the StructuringEngine and configure it with your schema
model.engine = StructuringEngine(tokenizer)
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}
model.engine.configure(schema)
# 4. Create your prompt.
prompt = 'Please respond with a JSON object with the key "answer" and the value "Hello, world!"'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# 5. Generate!
output = model.generate(input_ids, do_sample=True, top_p=None) # disable truncation samplers like top_p
# Example output without the PSE:
# Sure! Here's your answer: { "text": "Hello, world!" } Hope that helps!
#
# Example output with the PSE:
# {"answer": "Hello, world!"}
print(tokenizer.decode(output[0]))
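Because the engine guarantees syntactically valid output, the completion can be parsed directly. Here's a minimal follow-up sketch, assuming the usual Hugging Face convention that `generate` returns the prompt tokens followed by the newly generated ones:

```python
import json

# Keep only the newly generated tokens, then decode.
completion = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# The structured output parses on the first try; no retries or cleanup needed.
data = json.loads(completion)
print(data["answer"])  # -> Hello, world!
```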
Check out the examples/ directory for more examples and advanced usage:

- `quickstart.py` - An interactive quickstart guide to using PSE with a simple example.
- `simple_demo.py` - Basic generation with simple and advanced schemas.
- `thinking_answer.py` - Demonstrates creating a custom state machine to enforce a "chain-of-thought" reasoning process, and showcases how to combine different `StateMachine` types to build complex generation workflows (a rough stand-in for the idea follows this list).
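The bundled example composes custom `StateMachine` types so the model can reason in free-form text before committing to a structured answer. As a much simpler stand-in that uses only the schema API from the quickstart (and reuses `model.engine` from above), you can approximate a think-then-answer shape with an ordinary JSON Schema. Whether the engine forces the reasoning field to appear before the answer depends on how it orders object properties, which is exactly the kind of control the custom state machines in the example provide.

```python
# A simpler stand-in for the think-then-answer pattern, using only the
# schema-based configure() call shown in the quickstart. The bundled example
# composes custom StateMachine types instead, which allows free-form reasoning
# outside of JSON.
reasoning_schema = {
    "type": "object",
    "properties": {
        "chain_of_thought": {"type": "string"},  # the model's working-out
        "answer": {"type": "string"},            # the final answer
    },
    "required": ["chain_of_thought", "answer"],
}
model.engine.configure(reasoning_schema)
```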
PSE works with most modern LLM stacks. We provide mixins for the Transformers library (PyTorch, Flax, TensorFlow) for easy integration, and the structuring engine exposes both `logits_processor` and `sampler` methods, so you can graft PSE into almost any inference pipeline. Need to integrate with a custom setup? Just drop in our logit processor and sampler; no workarounds needed.
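For a hand-rolled decoding loop, the wiring looks roughly like this. It is a minimal sketch: the two hook calls mirror the `logits_processor` and `sampler` methods mentioned above, but their exact names and signatures are assumptions here, so adapt them to your installed version.

```python
import torch

def generate_structured(model, tokenizer, engine, prompt, max_new_tokens=256):
    # Sketch of a custom sampling loop wired to the engine's two hooks.
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]
        # 1. Mask the logits so only structurally valid continuations survive.
        logits = engine.logits_processor(ids, logits)  # assumed signature
        # 2. Sample through the engine so it can advance its internal state machine.
        next_id = engine.sampler(logits, torch.argmax)  # assumed signature
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```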
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
The `pse-core` C++ library is distributed as a pre-built package. Source code availability for `pse-core` will be determined at a later date.
For questions or support, please open an issue on the GitHub repository.