Page Streaming? #2

xdave · 2025-01-21T05:57:19Z

Hi again. I sometimes have very large PDF documents (1200+ pages in some cases) to convert into markdown. In my current setup (using another parser), I use pypdfium2 to split out a single page at a time and then pass it to docling.

In this setup, I'm carefully managing buffers so that a potentially massive document doesn't cause OOM, and for each page that gets passed in, I return the results lazily using yield, and a consuming function then streams these things back to where it needs to go.
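For illustration, the pattern described above looks roughly like the sketch below. It is a minimal outline, not the actual code: `stream_markdown`, `split_page`, and `convert_page` are hypothetical names, with `split_page` standing in for the pypdfium2 page-splitting step and `convert_page` for the parser call.

```python
from typing import Callable, Iterator


def stream_markdown(
    page_count: int,
    split_page: Callable[[int], bytes],
    convert_page: Callable[[bytes], str],
) -> Iterator[tuple[int, str]]:
    """Lazily yield (page_index, markdown) for one page at a time.

    split_page and convert_page are hypothetical callables supplied by the
    caller; nothing here depends on a specific PDF library.
    """
    for index in range(page_count):
        # Only a single page's bytes are held at this point.
        page_bytes = split_page(index)
        markdown = convert_page(page_bytes)
        # Drop the page buffer before handing control back to the consumer.
        del page_bytes
        yield index, markdown
```

Because this is a generator, no page is split or converted until the consumer asks for it, so a consuming function can forward each page's markdown as soon as it is ready rather than waiting for the whole document.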

How difficult would it be to support something like this in pdf2markdown4llm, where, perhaps while analyzing, extracting, and converting, the markdown results of a single page could be streamed back to the caller, similarly?

Would the requirements of the analysis process be too rigid to support this? Thanks.
