[FEATURE] Support multimodal input #319

RyanMarten · 2025-01-07T19:50:06Z

Example use cases with curator

synthetic data: generating synthetic captions for images
structured data extraction: getting itemized costs from receipts

Tasks:

marianna13 · 2025-01-10T11:08:39Z

Hey @RyanMarten can help with captioning (we have a common project with LAION exactly on that)

vutrung96 · 2025-01-10T18:26:49Z

@marianna13 would be great if you could help! i think we were thinking of support multimodal as a input modality more generally (captioning is an example). we can brainstorm on this :D

RyanMarten · 2025-01-10T18:28:33Z

That would be awesome @marianna13!

madiator · 2025-01-11T04:41:03Z

Thanks Marianna! Sent via Superhuman ( ***@***.*** )

…

On Fri, Jan 10, 2025 at 10:28 AM, Ryan Marten < ***@***.*** > wrote: That would be awesome @ marianna13 ( https://github.com/marianna13 ) ! — Reply to this email directly, view it on GitHub ( #319 (comment) ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AAIJX25V74PKFJN7L7MJWBL2KAGORAVCNFSM6AAAAABUYM4I7SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOBTGUYDIMJWG4 ). You are receiving this because you are subscribed to this thread. Message ID: <bespokelabsai/curator/issues/319/2583504167 @ github. com>

adamoptimizer · 2025-01-28T13:29:58Z

Tasks:

Design an interface for multi modality
Support images/videos as input (url/local path)

marianna13 · 2025-01-28T13:46:42Z

I think loading images/videos from URLs will not be sustainable (for large datasets)

adamoptimizer · 2025-01-28T13:54:11Z

I think loading images/videos from URLs will not be sustainable (for large datasets)

For large datasets, we have batch processing!
we will have a basic support of multi modality starting with OpenAI. (Online mode)
Then progress with other providers along with edge cases!
Thanks

marianna13 · 2025-01-28T15:42:54Z

what do you want to use for batch processing?

RyanMarten added the enhancement New feature or request label Jan 7, 2025

adamoptimizer self-assigned this Jan 28, 2025

adamoptimizer changed the title ~~Request: Support multimodal input~~ [FEATURE] Support multimodal input Jan 28, 2025

adamoptimizer added the epic This is an epic label Jan 28, 2025

kartik4949 assigned kartik4949 and unassigned adamoptimizer Jan 28, 2025

kartik4949 closed this as completed Feb 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Support multimodal input #319

[FEATURE] Support multimodal input #319

RyanMarten commented Jan 7, 2025 •

edited by adamoptimizer

Loading

marianna13 commented Jan 10, 2025

vutrung96 commented Jan 10, 2025

RyanMarten commented Jan 10, 2025

madiator commented Jan 11, 2025 via email

adamoptimizer commented Jan 28, 2025

marianna13 commented Jan 28, 2025

adamoptimizer commented Jan 28, 2025 •

edited

Loading

marianna13 commented Jan 28, 2025

[FEATURE] Support multimodal input #319

[FEATURE] Support multimodal input #319

Comments

RyanMarten commented Jan 7, 2025 • edited by adamoptimizer Loading

marianna13 commented Jan 10, 2025

vutrung96 commented Jan 10, 2025

RyanMarten commented Jan 10, 2025

madiator commented Jan 11, 2025 via email

adamoptimizer commented Jan 28, 2025

marianna13 commented Jan 28, 2025

adamoptimizer commented Jan 28, 2025 • edited Loading

marianna13 commented Jan 28, 2025

RyanMarten commented Jan 7, 2025 •

edited by adamoptimizer

Loading

adamoptimizer commented Jan 28, 2025 •

edited

Loading