Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Support multimodal input #319

Closed
2 tasks done
RyanMarten opened this issue Jan 7, 2025 · 8 comments
Closed
2 tasks done

[FEATURE] Support multimodal input #319

RyanMarten opened this issue Jan 7, 2025 · 8 comments
Assignees
Labels
enhancement New feature or request epic This is an epic

Comments

@RyanMarten
Copy link
Contributor

RyanMarten commented Jan 7, 2025

Example use cases with curator

  • synthetic data: generating synthetic captions for images
  • structured data extraction: getting itemized costs from receipts

Tasks:

@RyanMarten RyanMarten added the enhancement New feature or request label Jan 7, 2025
@marianna13
Copy link
Contributor

Hey @RyanMarten can help with captioning (we have a common project with LAION exactly on that)

@vutrung96
Copy link
Contributor

@marianna13 would be great if you could help! i think we were thinking of support multimodal as a input modality more generally (captioning is an example). we can brainstorm on this :D

@RyanMarten
Copy link
Contributor Author

That would be awesome @marianna13!

@madiator
Copy link
Contributor

madiator commented Jan 11, 2025 via email

@adamoptimizer adamoptimizer self-assigned this Jan 28, 2025
@adamoptimizer
Copy link
Contributor

Tasks:

  • Design an interface for multi modality
  • Support images/videos as input (url/local path)

@marianna13
Copy link
Contributor

I think loading images/videos from URLs will not be sustainable (for large datasets)

@adamoptimizer
Copy link
Contributor

adamoptimizer commented Jan 28, 2025

I think loading images/videos from URLs will not be sustainable (for large datasets)

For large datasets, we have batch processing!
we will have a basic support of multi modality starting with OpenAI. (Online mode)
Then progress with other providers along with edge cases!
Thanks

@marianna13
Copy link
Contributor

what do you want to use for batch processing?

@adamoptimizer adamoptimizer changed the title Request: Support multimodal input [FEATURE] Support multimodal input Jan 28, 2025
@adamoptimizer adamoptimizer added the epic This is an epic label Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request epic This is an epic
Projects
None yet
Development

No branches or pull requests

6 participants