Hi again. I sometimes have very large PDF documents (1200+ pages) to convert into Markdown. In my current setup (using another parser), I use pypdfium2 to split out a single page at a time and then pass it to docling.
In this setup, I carefully manage buffers so that a potentially massive document doesn't cause an OOM, and for each page that gets passed in, I return the results lazily using `yield`; a consuming function then streams them back to wherever they need to go.
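For context, here's a minimal sketch of that setup (the function name and temp-file handling are mine, and the docling usage is only illustrative):

```python
import io
import os
import tempfile

import pypdfium2 as pdfium
from docling.document_converter import DocumentConverter


def stream_pages_as_markdown(pdf_path):
    """Yield (page_index, markdown) one page at a time so memory stays bounded."""
    src = pdfium.PdfDocument(pdf_path)
    converter = DocumentConverter()
    try:
        for index in range(len(src)):
            # Copy a single page into its own one-page PDF.
            single = pdfium.PdfDocument.new()
            single.import_pages(src, pages=[index])
            buf = io.BytesIO()
            single.save(buf)
            single.close()

            # Hand the one-page PDF to the converter via a temp file and yield
            # the result, so the caller can stream it onward immediately.
            fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(buf.getvalue())
                result = converter.convert(tmp_path)
                yield index, result.document.export_to_markdown()
            finally:
                os.remove(tmp_path)
    finally:
        src.close()
```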
How difficult would it be to support something like this in pdf2markdown4llm, where the Markdown for a single page could be streamed back to the caller as pages are analyzed, extracted, and converted?
Would the requirements of the analysis process be too rigid to support this? Thanks.
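To make the ask concrete, here's a purely hypothetical sketch of the kind of interface I have in mind; `convert_pages_iter` does not exist in pdf2markdown4llm today, it just shows the shape of a streaming entry point:

```python
# Hypothetical only: pdf2markdown4llm has no convert_pages_iter today.
def write_markdown_streaming(converter, pdf_path, out_path):
    """Consume per-page markdown as it is produced instead of buffering the whole file."""
    with open(out_path, "w", encoding="utf-8") as out:
        # Imagined generator: yields (page_number, markdown) as each page finishes
        # analysis, extraction, and conversion.
        for page_number, md in converter.convert_pages_iter(pdf_path):
            out.write(md)
            out.write("\n\n")
```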