OAI compat endpoint w/ Images? #1008

Open · bioshazard opened this issue Dec 27, 2024 · 6 comments
Labels: bug (Something isn't working)

bioshazard commented Dec 27, 2024

Describe the bug

I finally got Llama 3.2 11B working, and /image works great with -i, but using it as an OAI-compatible endpoint doesn't seem to accept base64 images. I get this error:

ERROR mistralrs_core::engine: prompt step - Model failed with error: Msg("The number of images in each batch [0] should be the same as the number of images [1]. The model cannot support a different number of images per patch. Perhaps you forgot a `<|image|>` tag?")

With this messages payload:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "who is this?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "(b64 dataurl)"
        }
      }
    ]
  }
]

I see no mention of image_url support in HTTP.md, so maybe this is not supported for the OAI-compatible endpoint?

https://github.com/EricLBuehler/mistral.rs/blob/master/docs/HTTP.md

Latest commit or version

Using docker: ghcr.io/ericlbuehler/mistral.rs:cuda-86-sha-b38c72c

bioshazard added the bug label on Dec 27, 2024
bioshazard (Author) commented

Hmm, maybe this example gives the hint that I need to include <|image_1|>\n in my text payload? Even if that works, it's very strange for an OAI-compatible endpoint. I'd recommend inferring image_1 etc. in the text payload when necessary. I'll look into the code base in case I can contribute anywhere.

https://github.com/EricLBuehler/mistral.rs/blob/master/examples/server/phi3v_base64.py#L61

bioshazard (Author) commented Dec 27, 2024

Yep, it works if I include <|image|> in my text payload in OpenWebUI, but I have to say that is not the OAI compatibility I'd expect. I can work with this for now, though I'll leave the bug report open since there is room to meet the OAI compat standard more directly; the adjusted payload is shown below. Thanks again for this excellent project, which lets me use 3.2 11B on my 3090.
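
For anyone hitting the same error, this is the earlier payload with the tag prepended to the text part (the base64 data URL is elided as before; the newline after the tag follows the phi3v example and may not be strictly required):

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "<|image|>\nwho is this?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "(b64 dataurl)"
        }
      }
    ]
  }
]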

bioshazard (Author) commented

Found the error line:

"The number of images in each batch {n_images_in_text:?} should be the same as the number of images {n_images_in_images:?}. The model cannot support a different number of images per patch. Perhaps you forgot a `<|image|>` tag?"

bioshazard (Author) commented Dec 29, 2024

So I am thinking that mllama expects those image tokens within the text section. My expectation for what I'd want out of this is to inject the necessary token at the OAI payload-processing step (rather than within the mllama processing step) if the token is not already present when image content is provided.

This would accommodate how I have seen the schema not require the image token in the text content (which is necessary for vanilla OAI-compat consumption by OpenWebUI), and it would also accommodate existing users who already include the token.

I have never messed with Rust, but if someone doesn't beat me to it I might try my hand at what I'm suggesting.

bioshazard (Author) commented Dec 29, 2024

I think I worked this out with Claude. I will attempt to add a step in here to detect and inject image tokens into the text part if they are not present:

async fn parse_request(
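
Roughly the logic I have in mind, as a minimal sketch; the struct and function names here are hypothetical stand-ins, not the actual parse_request internals:

const IMAGE_TAG: &str = "<|image|>";

// Illustrative stand-in for a parsed user turn: the flattened text part
// plus however many image parts the request carried.
struct UserContent {
    text: String,
    image_count: usize,
}

// If the text carries fewer image tags than there are images, prepend the
// missing tags. Payloads that already include the tags pass through untouched,
// so existing users are unaffected.
fn inject_image_tags(content: &mut UserContent) {
    let present = content.text.matches(IMAGE_TAG).count();
    if present < content.image_count {
        let mut prefix = IMAGE_TAG.repeat(content.image_count - present);
        prefix.push('\n');
        content.text = format!("{prefix}{}", content.text);
    }
}

fn main() {
    let mut msg = UserContent { text: "who is this?".into(), image_count: 1 };
    inject_image_tags(&mut msg);
    assert_eq!(msg.text, "<|image|>\nwho is this?");
}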

bioshazard (Author) commented Dec 30, 2024

Opened a PR with a minimal check-inject addition. Hope you can make use of it. It meets at least my own needs for using the endpoint naturally from OpenWebUI.
