OAI compat endpoint w/ Images? #1008

Open · bioshazard opened this issue Dec 27, 2024 · 6 comments
Labels: bug (Something isn't working)

bioshazard commented Dec 27, 2024

Describe the bug

I finally got Llama 3.2 11B working, and /image works great with -i, but using it as an OAI-compatible endpoint doesn't seem to accept base64 images. I get this error:

ERROR mistralrs_core::engine: prompt step - Model failed with error: Msg("The number of images in each batch [0] should be the same as the number of images [1]. The model cannot support a different number of images per patch. Perhaps you forgot a `<|image|>` tag?")

With this messages payload:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "who is this?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "(b64 dataurl)"
        }
      }
    ]
  }
]

I see no mention of image_url support in HTTP.md, so maybe this is not supported for the OAI-compatible endpoint?

https://github.com/EricLBuehler/mistral.rs/blob/master/docs/HTTP.md

Latest commit or version

Using docker: ghcr.io/ericlbuehler/mistral.rs:cuda-86-sha-b38c72c

bioshazard added the bug label on Dec 27, 2024
bioshazard (Author) commented

Hmm, maybe this example gives the hint that I need to include <|image_1|>\n in my text payload? Even if that works, it's very strange for an OAI-compatible endpoint. I'd recommend inferring image_1 etc. in the text payload when necessary. I'll look into the code base in case I can contribute anywhere.

https://github.com/EricLBuehler/mistral.rs/blob/master/examples/server/phi3v_base64.py#L61

bioshazard (Author) commented Dec 27, 2024

Yep, it works if I include <|image|> in my text payload in OpenWebUI, but I have to say that is not the OAI compatibility I'd expect. I can work with this for now, though I'll leave the bug report open since there is room to meet the OAI compat standard more directly; the adjusted payload is shown below. Thanks again for this excellent project, which lets me use 3.2 11B on my 3090.
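
For anyone hitting the same error, this is the earlier payload with the tag prepended to the text part (the base64 data URL is elided as before; the newline after the tag follows the phi3v example and may not be strictly required):

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "<|image|>\nwho is this?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "(b64 dataurl)"
        }
      }
    ]
  }
]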

bioshazard (Author) commented

Found the error line:

"The number of images in each batch {n_images_in_text:?} should be the same as the number of images {n_images_in_images:?}. The model cannot support a different number of images per patch. Perhaps you forgot a `<|image|>` tag?"

bioshazard (Author) commented Dec 29, 2024

So I am thinking that mllama expects those image tokens within the text section. My expectation for what I'd want out of this is to inject the necessary token at the OAI payload-processing step (rather than within the mllama processing step) if the token is not already present when image content is provided.

This would accommodate how I have seen the schema not require the image token in the text content (which is necessary for vanilla OAI-compat consumption by OpenWebUI), and it would also accommodate existing users who already include the token.

I have never messed with Rust, but if someone doesn't beat me to it I might try my hand at what I'm suggesting.

bioshazard (Author) commented Dec 29, 2024

I think I worked this out with Claude. I will attempt to add a step in here to detect and inject image tokens into the text part if they are not present:

async fn parse_request(
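
Roughly the logic I have in mind, as a minimal sketch; the struct and function names here are hypothetical stand-ins, not the actual parse_request internals:

const IMAGE_TAG: &str = "<|image|>";

// Illustrative stand-in for a parsed user turn: the flattened text part
// plus however many image parts the request carried.
struct UserContent {
    text: String,
    image_count: usize,
}

// If the text carries fewer image tags than there are images, prepend the
// missing tags. Payloads that already include the tags pass through untouched,
// so existing users are unaffected.
fn inject_image_tags(content: &mut UserContent) {
    let present = content.text.matches(IMAGE_TAG).count();
    if present < content.image_count {
        let mut prefix = IMAGE_TAG.repeat(content.image_count - present);
        prefix.push('\n');
        content.text = format!("{prefix}{}", content.text);
    }
}

fn main() {
    let mut msg = UserContent { text: "who is this?".into(), image_count: 1 };
    inject_image_tags(&mut msg);
    assert_eq!(msg.text, "<|image|>\nwho is this?");
}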

bioshazard (Author) commented Dec 30, 2024

Opened a PR with a minimal check-inject addition. Hope you can make use of it. It meets at least my own needs for using the endpoint naturally from OpenWebUI.
