Cannot obtain correct PDF version because the "cannot recognize version marker" warning is not present #3226
Comments
```python
>>> import fitz
>>> doc=fitz.open("PDF 2.0 UTF-8 string and annotation.pdf")
>>> doc.metadata
{'format': 'PDF 2.0', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
>>> doc.is_repaired
False
>>> fitz.TOOLS.mupdf_warnings()
''
>>> fitz.__version__
'1.23.26'
>>> doc=fitz.open("PDF 2.0 with offset start.pdf")
>>> doc.metadata
{'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
>>> doc.is_repaired
True
>>> fitz.TOOLS.mupdf_warnings()
'format error: cannot recognize version marker\ntrying to repair broken xref\nrepairing PDF document'
```

I think this behavior is correct, and not a bug at all.
I am sorry, I did not mean to submit the issue in this state; it was an accident I was not even aware of.
I have fixed the title and description, as they were incomplete.
My output is different, @JorjMcKie: the warning I would like to see is not there.

```python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fitz
>>> fitz.__version__
'1.23.26'
>>> doc=fitz.open("PDF 2.0 with offset start.pdf")
>>> doc.metadata
{'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
>>> doc.is_repaired
True
>>> fitz.TOOLS.mupdf_warnings()
'trying to repair broken xref\nrepairing PDF document'
```
This behavior is reinstated in (one of) the next MuPDF versions, here MuPDF 1.24.0:

```python
Python 3.12.1 (tags/v3.12.1:2305ca5, Dec 7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import fitz
In [2]: fitz.__version__
Out[2]: '1.23.26'
In [3]: fitz.mupdf_version_tuple
Out[3]: (1, 24, 0)
In [4]: doc=fitz.open("PDF 2.0 with offset start.pdf")
In [5]: fitz.TOOLS.mupdf_warnings()
Out[5]: 'format error: cannot recognize version marker\ntrying to repair broken xref\nrepairing PDF document'
```
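Until a release with the newer MuPDF is available, code that relies on this warning has to cope with builds that never emit it. The following is a minimal sketch, not part of PyMuPDF: the names `warning_lines`, `marker_warning_expected` and `header_warning_status` are hypothetical, and the version range in which the warning is missing (MuPDF 1.23.x) is an assumption drawn only from the outputs in this thread.

```python
MARKER_WARNING = "cannot recognize version marker"


def warning_lines(warnings: str) -> list[str]:
    """Split the newline-separated text returned by fitz.TOOLS.mupdf_warnings()."""
    return [line for line in warnings.splitlines() if line.strip()]


def marker_warning_expected(mupdf_version: tuple) -> bool:
    """Guess whether this MuPDF build reports an unreadable header at all.

    Assumption based only on this thread: the warning is missing with
    MuPDF 1.23.x and present again from MuPDF 1.24.0 on.
    """
    return not ((1, 23, 0) <= tuple(mupdf_version) < (1, 24, 0))


def header_warning_status(warnings: str, mupdf_version: tuple) -> str:
    """Classify the warnings text as 'bad-header', 'header-ok' or 'unknown'."""
    if any(MARKER_WARNING in line for line in warning_lines(warnings)):
        return "bad-header"
    if marker_warning_expected(mupdf_version):
        return "header-ok"
    # This MuPDF does not emit the warning, so its absence proves nothing.
    return "unknown"
```

With PyMuPDF, the two inputs would be `fitz.TOOLS.mupdf_warnings()` and `fitz.mupdf_version_tuple`, read right after `fitz.open(...)` as in the sessions above. Returning "unknown" instead of "header-ok" keeps a 1.23.x build from silently treating a repaired file as having a valid header.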
Great, so the only thing I can do is wait for future PyMuPDF release(s)?
Description of the bug
File: PDF 2.0 with offset start.pdf (taken from the PDF Association's GitHub page of example PDFs)
Since version v1.23.1 (or v1.23.0; I am not sure, because v1.23.0 contains a critical bug, fixed in v1.23.1, that prevents me from running it), PyMuPDF no longer emits the "cannot recognize version marker" warning. All previous versions emitted it. I use this warning to detect the PDF version myself instead of relying on the one reported by PyMuPDF (see #1435).
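Detecting the version "by itself" can be done without relying on any MuPDF warning by searching for the `%PDF-x.y` header directly. The sketch below is an illustration, not the reporter's code: `find_header` and `HEADER_WINDOW` are hypothetical names, and the 1024-byte window is an assumption based on the PDF Reference's implementation note that Acrobat only requires the header to appear somewhere within the first 1024 bytes.

```python
import re

# Assumption: only the first 1024 bytes are searched, matching the tolerance
# the PDF Reference attributes to Acrobat for headers preceded by junk bytes.
HEADER_WINDOW = 1024
# The header comment, e.g. b"%PDF-1.7" or b"%PDF-2.0".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def find_header(data: bytes, window: int = HEADER_WINDOW):
    """Return (major, minor, offset) of the %PDF-x.y header, or None.

    Bytes before the header (an "offset start") are tolerated as long as
    the header begins within the first `window` bytes.
    """
    match = HEADER_RE.search(data[:window])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.start()
```

For a file on disk, pass the first `HEADER_WINDOW` bytes read in binary mode. A non-zero offset corresponds to the case that triggers the warning in this issue. Note that since PDF 1.4 the document catalog may carry a `/Version` entry that overrides the header, which this sketch ignores.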
Here is a description of the linked PDF file (source)
How to reproduce the bug
To reproduce
Version 1.22.5:
Version 1.23.26:
Expected behaviour
Either the detected PDF version should be 2.0, or at least the "cannot recognize version marker" warning should be emitted.
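As a stopgap for the expected behaviour, one option is to compare the `format` value from `doc.metadata` (which reads 'PDF 1.7' for the repaired file above) with a version read from the header yourself. This is a hedged sketch: `parse_format` and `reconcile_version` are hypothetical helpers, and the header version is assumed to be a `(major, minor)` tuple obtained separately, or `None` if no header was found.

```python
import re


def parse_format(fmt: str):
    """Parse the 'format' value of doc.metadata, e.g. 'PDF 1.7' -> (1, 7)."""
    match = re.fullmatch(r"PDF (\d+)\.(\d+)", fmt.strip())
    return None if match is None else (int(match.group(1)), int(match.group(2)))


def reconcile_version(metadata_format: str, header_version):
    """Return (version, disagrees), preferring the version read from the header.

    `disagrees` is True when both versions are known and differ, which is the
    situation a missing "cannot recognize version marker" warning hides.
    """
    reported = parse_format(metadata_format)
    if header_version is None:
        return reported, False
    return header_version, reported is not None and reported != header_version
```

Preferring the header value reflects the first half of the expected behaviour (report 2.0), while the `disagrees` flag plays the role the warning used to play.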
PyMuPDF version
1.23.26
Operating system
Linux
Python version
3.10