3rd Party multi-modal model gemini-2.0-flash-001 responds with MANY '-' and whitespaces in tables causing >1000 chunks for 1 page #614

Open
JohanBekker opened this issue Feb 11, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug
The 3rd-party multi-modal model gemini-2.0-flash-001 sometimes emits very long runs of '-' characters and whitespace when rendering a markdown table, which inflates the token count for the page. With a chunk size of <=512, this sometimes leads to >1000 chunks for a single page.
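
A possible client-side workaround, as a minimal sketch rather than anything the library does itself: compact the markdown before chunking by shrinking every table delimiter cell to three dashes and collapsing repeated whitespace inside table cells. The function names `compact_table_row` and `compact_markdown` are made up for this example. It assumes table rows start and end with '|' and it does not skip fenced code blocks, so lines that look like table rows inside code fences would be rewritten too.

```python
import re

# A delimiter cell such as "-----", ":----", "----:" or ":---:".
_DELIM_CELL = re.compile(r"^\s*(:?)-{3,}(:?)\s*$")
# Split on pipes that are not escaped as "\|".
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def compact_table_row(line: str) -> str:
    """Compact one markdown table row; any other line is returned unchanged."""
    stripped = line.strip()
    if len(stripped) < 2 or not (stripped.startswith("|") and stripped.endswith("|")):
        return line

    cells = _UNESCAPED_PIPE.split(stripped[1:-1])
    compacted = []
    for cell in cells:
        match = _DELIM_CELL.match(cell)
        if match:
            # Keep the alignment colons, shrink the dash run to the minimum.
            compacted.append(f"{match.group(1)}---{match.group(2)}")
        else:
            # Collapse runs of whitespace and trim cell padding.
            compacted.append(" ".join(cell.split()))
    return "| " + " | ".join(compacted) + " |"


def compact_markdown(text: str) -> str:
    """Apply compact_table_row to every line of a markdown document."""
    return "\n".join(compact_table_row(line) for line in text.split("\n"))


if __name__ == "__main__":
    bloated = "| Item" + " " * 40 + "| Value |\n|" + "-" * 200 + "|" + "-" * 200 + ":|"
    print(compact_markdown(bloated))
```

Running this on the page markdown before splitting it into chunks should bring the token count back to roughly what the table content itself needs, though it only treats the symptom on the client side.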

Files
https://www.asml.com/en/investors/annual-report/2023

Job ID
0d446037-f53b-4cd0-8978-dfd346e50915

Client:
Please remove untested options:

  • Python Library
  • API

Additional context
The response for page 347 of the ASML annual report is attached as a file because it was too large to paste here.

page_347.json

JohanBekker added the bug label on Feb 11, 2025