Refactor the Transformers and TransformersVision models
Robin Picard committed Feb 20, 2025
1 parent cac20af commit ff6366c
Showing 9 changed files with 759 additions and 1,334 deletions.
150 changes: 64 additions & 86 deletions docs/reference/models/transformers.md
# Transformers


!!! Installation

You need to install the `transformers` library to be able to use these models in Outlines, or alternatively:

```bash
pip install "outlines[transformers]"
```

## Create a `Transformers` model

The only mandatory argument to instantiate a `Transformers` model is the name of the model to use.
```python
from outlines import models

model = models.Transformers("microsoft/Phi-3-mini-4k-instruct")
```

The model name must be a valid `transformers` model name. You can find a list of all available models on the [HuggingFace Hub](https://huggingface.co/models).

When instantiating a `Transformers` model this way, the class creates a model from the `transformers` library using the `AutoModelForCausalLM` class by default (`transformers.AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)`).

You can also provide keyword arguments in an optional `model_kwargs` parameter. Those will be passed to the `from_pretrained` method of the model class. One such argument is `device_map`, which allows you to specify the device on which the model will be loaded.

For instance:
```python
from outlines import models

model = models.Transformers("microsoft/Phi-3-mini-4k-instruct", model_kwargs={"device_map": "cuda"})
```

## Alternative model classes

If the model you want to use is not compatible with `AutoModelForCausalLM`, you must provide a value for the `model_class` parameter. This value must be a valid `transformers` model class.

For instance:
```python
from outlines import models
from transformers import AutoModelForSeq2SeqLM

model = models.Transformers("facebook/bart-large", model_class=AutoModelForSeq2SeqLM)
```

When you instantiate a `Transformers` model, the class also creates a `Tokenizer` instance from the `AutoTokenizer` class. You can provide keyword arguments in an optional `tokenizer_kwargs` parameter. Those will be passed to the `from_pretrained` method of the tokenizer class: `tokenizer_class.from_pretrained(model_name, **tokenizer_kwargs)`.
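
For example, a minimal sketch passing a tokenizer option through `tokenizer_kwargs` (the `padding_side` value mirrors the Mamba example below and is only illustrative):
```python
from outlines import models

# Keyword arguments forwarded to AutoTokenizer.from_pretrained;
# padding_side is only an illustrative option.
model = models.Transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    tokenizer_kwargs={"padding_side": "left"},
)
```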

Similarly, if your model is not compatible with `AutoTokenizer`, you must provide a value for the `tokenizer_class` parameter.

```python
from outlines import models
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_pile_t5 = models.Transformers(
    model_name="EleutherAI/pile-t5-large",
    model_class=T5ForConditionalGeneration,
    tokenizer_class=T5Tokenizer
)
```


### Mamba

[Mamba](https://github.com/state-spaces/mamba) is a transformers alternative that employs memory-efficient, linear-time decoding.

To use Mamba with Outlines you must first install the necessary requirements:

```bash
pip install causal-conv1d>=1.2.0 mamba-ssm torch transformers
```

Then you can create a `Mamba` Outlines model via:
```python
from outlines import models

model = models.Mamba("state-spaces/mamba-2.8b-hf", model_kwargs={"device_map": "cuda"}, tokenizer_kwargs={"padding_side": "left"})
```


Alternatively, you can use the `Transformers` class to create a `Mamba` model by providing the appropriate `model_class` and `tokenizer_class` arguments.
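
For instance, a minimal sketch of the equivalent setup through the generic `Transformers` class (assuming `MambaForCausalLM` is compatible with the default `AutoTokenizer` for this checkpoint):
```python
from outlines import models
from transformers import MambaForCausalLM

# Same checkpoint as above, loaded through the generic Transformers class
model = models.Transformers(
    "state-spaces/mamba-2.8b-hf",
    model_class=MambaForCausalLM,
)
```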

Read [`transformers`'s documentation](https://huggingface.co/docs/transformers/en/model_doc/mamba) for more information.

You can use encoder-decoder (seq2seq) models like T5 and BART with Outlines.

Be cautious with model selection though: some models, such as `t5-base`, don't include certain characters (`{`), and you may get an error when trying to perform structured generation.

## Use the model to generate text

Once you have created a `Transformers` model, you can use it to generate text by calling the instance of the model.
```python
model("Hello, how are you?")
```

You can also first create a `Generator` and then call it.
```python
from outlines import Generator

generator = Generator(model)
generator("Hello, how are you?")
```

`Transformers` models typically support batching and the generation of several samples at once.

For instance:
```python
model(["Hello, how are you?", "Respond with one word. Not more."], num_return_sequences=2, num_beams=2)
```

This would generate two sequences for each prompt, for a total of four sequences (returned as a list containing two lists of two elements each).

## Use the model to generate structured data

`Transformers` models can generate structured data by providing a value for the parameter `output_type` (the second positional argument of the `generate` method, right after the prompt).

Supported types include `Json`, `Choice`, `Regex` and `CFG`.

For instance:
```python
from outlines.types import Json
from pydantic import BaseModel

class Character(BaseModel):
    name: str

model("Create a character with a name.", Json(Character))
```
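
As another illustration, here is a minimal sketch using the `Regex` type, mirroring the usage shown for the vision models below (the phone-number pattern is purely illustrative):
```python
from outlines.types import Regex

# Constrain the output to a simple phone-number pattern (illustrative only),
# reusing the `model` instance created above.
model("Jenny gave me her number, it's ", Regex(r"\+?[1-9][0-9]{7,14}"))
```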
128 changes: 65 additions & 63 deletions docs/reference/models/transformers_vision.md

Outlines allows seamless use of [vision models](https://huggingface.co/learn/computer-vision-course/en/unit4/multimodal-models/tasks-models-part1).

`outlines.models.TransformersVision` shares interfaces with, and is based on, [outlines.models.Transformers](./transformers.md).

## Create a `TransformersVision` model

`TransformersVision` models inherit from `Transformers` and accept the same initialization parameters.

In addition, they also accept the optional parameters `processor_class` and `processor_kwargs`. Those are used to create a `Processor` instance that is then used to preprocess the images. By default, `AutoProcessor` is used to create the processor as such: `AutoProcessor.from_pretrained(model_name, **processor_kwargs)`.

If your model is not compatible with `AutoProcessor`, you must provide a value for the `processor_class` parameter.
For instance:
```python
from outlines import models
from transformers import CLIPModel, CLIPProcessor

model = models.TransformersVision("openai/clip-vit-base-patch32", model_class=CLIPModel, processor_class=CLIPProcessor)
```
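
Similarly, anything you put in `processor_kwargs` is forwarded to the processor's `from_pretrained` call. A hypothetical sketch (the model name and the `trust_remote_code` flag are only illustrative `from_pretrained` inputs):
```python
from outlines import models

# Keyword arguments forwarded to AutoProcessor.from_pretrained;
# trust_remote_code is only an illustrative option.
model = models.TransformersVision(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    processor_kwargs={"trust_remote_code": True},
)
```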

## Use the model to generate text from prompts and images

When calling the model, the prompt argument you provide must be a dictionary with a key `"prompts"` and a key `"images"`. The associated values must be a string or a list of strings for the prompts, and a PIL image or a list of PIL images for the images. Your prompts must include `<image>` tags to indicate where the images should be inserted. There must be as many `<image>` tags as there are images.

For easier use, we recommend creating a convenience function to load a `PIL.Image` from a URL.
```python
from io import BytesIO
from urllib.request import urlopen
from PIL import Image

def img_from_url(url):
    # Download the image and convert it to an RGB PIL image
    img_byte_stream = BytesIO(urlopen(url).read())
    return Image.open(img_byte_stream).convert("RGB")
```

You can then call the model with your prompts and images to generate text.
```python
from transformers import LlavaForConditionalGeneration
from outlines import models

model = models.TransformersVision("trl-internal-testing/tiny-LlavaForConditionalGeneration", model_class=LlavaForConditionalGeneration)
prompt = {
    "prompts": "<image> detailed description:",
    "images": img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")
}
model(prompt)
```

#### Multiple Images

You can include several images per prompt by adding more `<image>` tags to the prompt. Batching is also supported.
```python
from transformers import LlavaForConditionalGeneration
from outlines import models

model = models.TransformersVision("trl-internal-testing/tiny-LlavaForConditionalGeneration", model_class=LlavaForConditionalGeneration)
prompt = {
    "prompts": ["<image><image>detailed description:", "<image><image>. What animals are present?"],
    "images": [
        img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg"),
        img_from_url("https://upload.wikimedia.org/wikipedia/commons/7/71/2010-kodiak-bear-1.jpg"),
        img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg"),
        img_from_url("https://upload.wikimedia.org/wikipedia/commons/7/71/2010-kodiak-bear-1.jpg"),
    ]
}
model(prompt)
```

Here we have two prompts, each expecting two images, so we provide four images in total. This will generate two descriptions, one for each prompt.

### Use the model for structured generation

You can use the model to generate structured data by providing a value for the parameter `output_type` (the second positional argument of the `generate` method, right after the prompt).

Supported types include `Json`, `Choice`, `Regex` and `CFG`.

For instance, to do classification, you can use the `Regex` type:
```python
pattern = "Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto"
planet_generator = outlines.generate.regex(model, pattern)
from outlines import models
from outlines.types import Regex
from transformers import LlavaForConditionalGeneration

planet_generator(
"What planet is this: <image>",
[img_from_url("https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg")]
)
model = models.TransformersVision("trl-internal-testing/tiny-LlavaForConditionalGeneration", model_class=LlavaForConditionalGeneration)
pattern = "Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto"
prompt = {
"prompts": "<image>detailed description:",
"images": img_from_url("https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg"),
}
model(prompt, Regex(pattern))
```

Another example could be to generate a structured description of an image using the `Json` type:
```python
from outlines import models
from outlines.types import Json
from pydantic import BaseModel
from transformers import LlavaForConditionalGeneration
from typing import List, Optional

class ImageData(BaseModel):
    caption: str
    tags_list: List[str]
    object_list: List[str]
    is_photo: bool

model = models.TransformersVision("trl-internal-testing/tiny-LlavaForConditionalGeneration", model_class=LlavaForConditionalGeneration)
prompt = {
    "prompts": "<image> detailed JSON metadata:",
    "images": img_from_url("https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg"),
}
model(prompt, Json(ImageData))
```


## Resources

### Choosing a model
6 changes: 3 additions & 3 deletions outlines/models/__init__.py
```python
from .mlxlm import MLXLM, mlxlm
from .ollama import Ollama
from .openai import AzureOpenAI, OpenAI
from .transformers import Mamba, Transformers, TransformerTokenizer
from .transformers_vision import TransformersVision
from .vllm import VLLM, vllm

LogitsGenerator = Union[
    Transformers, LlamaCpp, OpenAI, ExLlamaV2Model, MLXLM, VLLM, Ollama
]

LocalModel = Union[LlamaCpp, Transformers]
APIModel = Union[AzureOpenAI, OpenAI, Anthropic, Gemini, Ollama]
```
