I run ocrmypdf on a PDF whose first page is already OCRed (and whose footer is already OCRed on every page), forcing OCR on pages 2–99999999. The resulting PDF has selectable text on all pages in a browser, but when I process it in Python (using a library based on poppler and etree), only the text from page 1 is accessible.
VN---3335Y_Y.fs.pdf
VN---3335Y_Y.fs-problem.pdf
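To check the symptom outside the reporter's own tooling, here is a minimal sketch that approximates a "poppler + etree" pipeline. It assumes poppler-utils' `pdftohtml` is on `PATH`. The helper names `pdftohtml_xml` and `text_by_page` are hypothetical. The reporter's actual library isn't named, so this may not reproduce its behavior exactly.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def pdftohtml_xml(pdf_path: str) -> bytes:
    """Run poppler's pdftohtml and return its XML output as raw bytes.

    Assumption: poppler-utils is installed. Flags used:
    -xml (XML output), -i (ignore images), -stdout (write to standard output).
    """
    result = subprocess.run(
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def text_by_page(xml_bytes: bytes) -> dict[int, str]:
    """Map each <page number="N"> element to the concatenated text it contains."""
    root = ET.fromstring(xml_bytes)
    pages: dict[int, str] = {}
    for page in root.iter("page"):
        number = int(page.get("number"))
        # <text> elements may wrap <b>, <i> or <a>; itertext() collects all of it.
        chunks = ["".join(t.itertext()) for t in page.iter("text")]
        pages[number] = " ".join(c.strip() for c in chunks if c.strip())
    return pages


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print how much text each page yields; empty pages 2+ would match the report.
    for number, text in sorted(text_by_page(pdftohtml_xml(sys.argv[1])).items()):
        print(f"page {number}: {len(text)} characters")
```

Running it on the problem PDF and on the two-step output should show whether pages 2+ come back empty only in the former.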
Workaround
Remove the OCR layer from pages 2+, producing an intermediate PDF:
VN---3335Y_Y.fs-step1.pdf
Then run ocrmypdf with --skip-text on the intermediate file:
VN---3335Y_Y.fs.pdf
With this two-step approach, the text of every page (page 2 onwards included) is fully accessible via poppler + etree.
Expected Behavior
Forcing OCR on pages 2+ in a single command should yield the same PDF as the two-step process (first removing the OCR layer from pages 2+, then re-running OCR with --skip-text).
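One way to test this expectation is to compare which pages yield extractable text in the one-step and two-step outputs. The following is a rough sketch that assumes poppler-utils' `pdfinfo` and `pdftotext` are installed. It swaps in `pdftotext` as the extractor, which may not see exactly what a poppler + etree library sees. All function names are hypothetical.

```python
import re
import subprocess
import sys


def page_count(pdf_path: str) -> int:
    """Read the page count from poppler's pdfinfo output (the "Pages: N" line)."""
    out = subprocess.run(
        ["pdfinfo", pdf_path], check=True, capture_output=True, text=True
    ).stdout
    match = re.search(r"^Pages:\s+(\d+)", out, re.MULTILINE)
    if match is None:
        raise ValueError(f"no page count in pdfinfo output for {pdf_path}")
    return int(match.group(1))


def page_text(pdf_path: str, page: int) -> str:
    """Extract one page with pdftotext: -f/-l select the page, '-' means stdout.

    Assumption: pdftotext stands in for the reporter's unnamed library.
    """
    return subprocess.run(
        ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"],
        check=True, capture_output=True, text=True,
    ).stdout


def pages_with_text(texts: list[str]) -> set[int]:
    """1-based numbers of pages whose text is non-blank (form feeds count as blank)."""
    return {i for i, t in enumerate(texts, start=1) if t.strip()}


def coverage_diff(one_step: list[str], two_step: list[str]) -> tuple[set[int], set[int]]:
    """Return (pages with text only in two_step, pages with text only in one_step)."""
    a, b = pages_with_text(one_step), pages_with_text(two_step)
    return b - a, a - b


if __name__ == "__main__" and len(sys.argv) == 3:
    one, two = sys.argv[1], sys.argv[2]
    only_two, only_one = coverage_diff(
        [page_text(one, p) for p in range(1, page_count(one) + 1)],
        [page_text(two, p) for p in range(1, page_count(two) + 1)],
    )
    print("text only in two-step output, pages:", sorted(only_two))
    print("text only in one-step output, pages:", sorted(only_one))
```

If the two outputs behave the same, both sets are empty. A non-empty first set would reproduce the report.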
If I’ve misunderstood anything or missed any important detail, please let me know. I really appreciate your help in troubleshooting this!