[Feature Request] Video VL Support #1290
The API can in fact already analyze videos. Here is a demo.
Is this the same way that models like Qwen caption videos? From a brief look at the repository you linked, it seems to caption the video frame by frame. Admittedly, I'm not sure how native video support works, but I would have expected it to be a different process from sending individual frames and captioning them as pictures.
Yes, in fact I copied the ffmpeg idea from MiniCPM-V-2.6:
You change the 'batch-size' to send multiple images at once. Unfortunately, the current version of KoboldCpp will not accept more than 4 images in a single submission, so that is our limitation.
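A minimal Python sketch of the frame-batching workaround described above, assuming frames are sampled at evenly spaced timestamps and submitted in groups of at most four. It only builds ffmpeg command lines and groups frame file names; it does not call KoboldCpp, and all names here (`sample_timestamps`, `ffmpeg_frame_command`, `batch`, `clip.mp4`) are hypothetical. The linked demo's actual sampling strategy may differ.

```python
from typing import List

# Per-request image limit reported above for the current KoboldCpp version.
MAX_IMAGES_PER_REQUEST = 4


def sample_timestamps(duration_s: float, num_frames: int) -> List[float]:
    """Return num_frames timestamps (seconds), one at the midpoint of each equal segment."""
    if duration_s <= 0 or num_frames <= 0:
        raise ValueError("duration_s and num_frames must be positive")
    step = duration_s / num_frames
    return [round(step * (i + 0.5), 3) for i in range(num_frames)]


def ffmpeg_frame_command(video_path: str, t: float, out_path: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that extracts one frame at time t."""
    return ["ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", video_path,
            "-frames:v", "1", out_path]


def batch(items: List[str], size: int = MAX_IMAGES_PER_REQUEST) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    times = sample_timestamps(duration_s=20.0, num_frames=10)
    frames = [f"frame_{i:03d}.jpg" for i in range(len(times))]
    commands = [ffmpeg_frame_command("clip.mp4", t, f) for t, f in zip(times, frames)]
    for cmd in commands:
        print(" ".join(cmd))  # could be executed with subprocess.run(cmd, check=True)
    for group in batch(frames):
        print(group)  # each group would be one multi-image request
```

With ten frames and a limit of four, the frames fall into groups of 4, 4 and 2. How each group is actually submitted to KoboldCpp is not shown here.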
Describe the Issue
New multimodal models support not only image captioning (which Kobold implements) but also video captioning. For examples, see Qwen2-VL or Apollo (which is built on Qwen).
Additional Information:
For the UI, a simple "Add video" button beside the "Add img" button would suffice, although I believe getting it working through the API is more important. If there is already a way to achieve this with Kobold and I'm mistaken, please let me know!
Thank you for all the hard work! ^_^