[Feature Request] Video VL Support #1290

kleineluka · 2024-12-29T14:47:23Z

Describe the Issue
New multimodal models support not only image captioning (which Kobold implements) but video captioning as well. For examples, see Qwen2-VL or Apollo (which is built on Qwen).

Additional Information:
For UI implementation, a simple "Add video" button beside the "Add img" button would suffice - although I believe getting it working with the API is more important. If there is already a way to achieve this with Kobold and I'm mistaken, please let me know!

Thank you for all the hard work! ^_^

jabberjabberjabber · 2025-01-08T20:07:26Z

The API can in fact already analyze videos. Here is a demo.

kleineluka · 2025-01-09T00:48:27Z

Is this the same way that models like Qwen caption videos? From a brief overview of the repository you linked, it looks like that is just captioning frame-by-frame. Admittedly, I'm not too sure how the native video support works, but I would've expected it to be a different process than sending frame-by-frame and captioning as pictures?

jabberjabberjabber · 2025-01-09T06:10:16Z

Yes, in fact I copied the ffmpeg idea from MiniCPM-V-2.6:

For MiniCPM-V 2.6, we took the approach of extracting frames from the video file and inputting each frame's data sequentially to the model. At the code level, I introduced the open source library ffmpeg to implement video frame extraction, and added the "video" parameter to the args of llama.cpp to read video files.

Support video understanding ggerganov/llama.cpp#9165

jabberjabberjabber · 2025-01-09T06:16:39Z

Is this the same way that models like Qwen caption videos? From a brief overview of the repository you linked, it looks like that is just captioning frame-by-frame. Admittedly, I'm not too sure how the native video support works, but I would've expected it to be a different process than sending frame-by-frame and captioning as pictures?

You change the 'batch-size' to send it multiple images at once. Unfortunately, in its current version KoboldCpp will not allow more than 4 images to be submitted at the same time, so that's our limitation.
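
For anyone who wants to try the frame-by-frame approach described above, here is a rough sketch in Python. It is not the code from the linked demo. It assumes the ffmpeg CLI is on your PATH, that KoboldCpp is listening at the default local address, and that the generate endpoint accepts a list of base64 images in an "images" field. The helper names (extract_frames, chunk, build_payload, caption_batch), the prompt, and the max_length value are all made up for illustration.

```python
import base64
import json
import subprocess
import urllib.request
from pathlib import Path

MAX_IMAGES_PER_REQUEST = 4  # the per-request limit mentioned in this thread


def extract_frames(video_path, out_dir, fps=1.0):
    """Dump frames as JPEGs with the ffmpeg CLI and return the sorted paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video_path), "-vf", f"fps={fps}",
         str(out / "frame_%05d.jpg")],
        check=True,
    )
    return sorted(out.glob("frame_*.jpg"))


def chunk(items, size=MAX_IMAGES_PER_REQUEST):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_payload(frame_bytes_list, prompt):
    """Build a request body; the field names are assumptions, check the API docs."""
    return {
        "prompt": prompt,
        "max_length": 200,
        "images": [base64.b64encode(b).decode("ascii") for b in frame_bytes_list],
    }


def caption_batch(frame_paths, prompt,
                  url="http://localhost:5001/api/v1/generate"):
    """POST one batch of frames; the URL and response shape are assumptions."""
    payload = build_payload([Path(p).read_bytes() for p in frame_paths], prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage would be something like calling extract_frames("clip.mp4", "frames") and then caption_batch(batch, "Describe these frames.") for each batch in chunk(frames). Lowering fps gives fewer frames and therefore fewer requests, at the cost of missing short events.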
