Can not obtain correct PDF version because of warning "cannot recognize version marker" is not present #3226

Matmaus · 2024-03-04T15:19:09Z

Description of the bug

File: PDF 2.0 with offset start.pdf (taken from the PDF Association's GitHub page of example PDFs)

Since v1.23.1 (or possibly v1.23.0; I cannot tell, because v1.23.0 contains a critical bug, fixed in v1.23.1, that prevents me from running it), PyMuPDF no longer emits the "cannot recognize version marker" warning. All previous versions emitted it. I use this warning to decide when to detect the PDF version myself instead of relying on the one reported by PyMuPDF (see #1435).

Here is a description of the linked PDF file (source)

This is an example of a PDF file that was updated from a PDF 1.7 file to a PDF 2.0 file. This shows how an incremental save might be used when an existing PDF 1.7 file is updated and you want to mark the PDF as a PDF 2.0 file. The page should display the string "PDF 2.0 files have spacing" if it is properly parsed and interpreted; a different string will display if the viewer is not capable of reading the incremental save in the file. This example also shows how a PDF "file" may contain more than just PDF data. The comments at the beginning of the file are not in PDF syntax and are not considered as part of the PDF data. Note that file offsets in the PDF cross-reference table are relative to the start of the PDF data, and not to the beginning of the file itself.
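The description above implies two places where a version can be read by hand: the `%PDF-x.y` header, which may not sit at byte 0, and a `/Version` entry in the document catalog, which the PDF specification allows to override the header when it names a later version. Below is a minimal sketch of such a manual check. The function names and the synthetic sample bytes are illustrative assumptions, not part of PyMuPDF, and the raw byte scan only finds a `/Version` entry that is stored uncompressed.

```python
import re

# Header marker such as "%PDF-1.7"; it may be preceded by non-PDF bytes.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# Catalog entry such as "/Version /2.0" (only visible if not compressed).
CATALOG_VERSION_RE = re.compile(rb"/Version\s*/(\d)\.(\d)")


def find_header(data: bytes):
    """Return (offset, (major, minor)) of the first %PDF-x.y marker, or None."""
    m = HEADER_RE.search(data)
    if m is None:
        return None
    return m.start(), (int(m.group(1)), int(m.group(2)))


def effective_version(data: bytes):
    """Best-effort version: the latest of the header and any raw /Version entry."""
    header = find_header(data)
    if header is None:
        return None
    versions = [header[1]]
    for m in CATALOG_VERSION_RE.finditer(data):
        versions.append((int(m.group(1)), int(m.group(2))))
    return max(versions)


# Synthetic stand-in (an assumption) for a file with text before the header.
sample = (
    b"This text is not PDF syntax.\n"
    b"%PDF-1.7\n"
    b"1 0 obj << /Type /Catalog /Version /2.0 >> endobj\n"
    b"%%EOF\n"
)
offset, header_version = find_header(sample)
print(offset, header_version, effective_version(sample))
```

A non-zero offset from `find_header` indicates leading non-PDF data, which is the situation the example file is designed to demonstrate.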

How to reproduce the bug

To reproduce

import fitz as pymupdf

doc = pymupdf.open('PDF 2.0 with offset start.pdf')  # see section above
doc.metadata['format']
doc.is_repaired
pymupdf.TOOLS.mupdf_warnings()

Version 1.22.5:

>>> import fitz as pymupdf
>>> doc = pymupdf.open('PDF 2.0 with offset start.pdf')
>>> doc.is_repaired
True
>>> doc.metadata['format']
'PDF 1.7'
>>> pymupdf.TOOLS.mupdf_warnings()
'cannot recognize version marker\ntrying to repair broken xref\nrepairing PDF document'

Version 1.23.26:

>>> import fitz as pymupdf
>>> doc = pymupdf.open('PDF 2.0 with offset start.pdf')
>>> doc.is_repaired
True
>>> doc.metadata['format']
'PDF 1.7'
>>> pymupdf.TOOLS.mupdf_warnings()
'trying to repair broken xref\nrepairing PDF document'

Expected behaviour

Either the detected PDF version should be 2.0, or at least the "cannot recognize version marker" warning should be emitted.
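For reference, the check that relies on this warning can be sketched as a substring test over the warning text. It matches both the plain wording and the "format error:" prefixed wording that appear in this thread. The helper name is a hypothetical one chosen for illustration; in real use the string would come from `pymupdf.TOOLS.mupdf_warnings()` after opening the document.

```python
MARKER = "cannot recognize version marker"


def header_was_unrecognized(warnings: str) -> bool:
    """Return True if any warning line mentions the version-marker problem."""
    return any(MARKER in line for line in warnings.splitlines())


# Warning strings as reported in this thread.
with_plain = (
    "cannot recognize version marker\n"
    "trying to repair broken xref\n"
    "repairing PDF document"
)
with_prefix = (
    "format error: cannot recognize version marker\n"
    "trying to repair broken xref\n"
    "repairing PDF document"
)
without = "trying to repair broken xref\nrepairing PDF document"

for text in (with_plain, with_prefix, without):
    print(header_was_unrecognized(text))
```

When the warning is absent, as in the 1.23.26 output above, this check cannot tell whether the header was recognized, which is why the missing warning matters here.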

PyMuPDF version

1.23.26

Operating system

Linux

Python version

3.10

JorjMcKie · 2024-03-04T15:29:00Z

import fitz
doc=fitz.open("PDF 2.0 UTF-8 string and annotation.pdf")
doc.metadata
{'format': 'PDF 2.0', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
doc.is_repaired
False
fitz.TOOLS.mupdf_warnings()
''
fitz.__version__
'1.23.26'
doc=fitz.open("PDF 2.0 with offset start.pdf")
doc.metadata
{'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
doc.is_repaired
True
fitz.TOOLS.mupdf_warnings()
'format error: cannot recognize version marker\ntrying to repair broken xref\nrepairing PDF document'

I think this behavior is correct - and no bug at all.

Matmaus · 2024-03-04T15:32:32Z

I am sorry, I did not want to submit the issue in this state, it was an accident I did not even know about.

Matmaus · 2024-03-04T15:39:17Z

I fixed the title and description as it was incomplete.

Matmaus · 2024-03-04T15:46:03Z

My output is different, @JorjMcKie: the warning I would like to see is missing.

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fitz
>>> fitz.__version__
'1.23.26'
>>> doc=fitz.open("PDF 2.0 with offset start.pdf")
>>> doc.metadata
{'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': '', 'creationDate': '', 'modDate': '', 'trapped': '', 'encryption': None}
>>> doc.is_repaired
True
>>> fitz.TOOLS.mupdf_warnings()
'trying to repair broken xref\nrepairing PDF document'

JorjMcKie · 2024-03-04T16:00:40Z

This behavior is re-instated in (one of) the next MuPDF versions:

Python 3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.14.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import fitz
In [2]: fitz.__version__
Out[2]: '1.23.26'
In [3]: fitz.mupdf_version_tuple
Out[3]: (1, 24, 0)
In [4]: doc=fitz.open("PDF 2.0 with offset start.pdf")
In [5]: fitz.TOOLS.mupdf_warnings()
Out[5]: 'format error: cannot recognize version marker\ntrying to repair broken xref\nrepairing PDF document'

Matmaus · 2024-03-04T16:06:18Z

Great, so the only thing I need to do is wait for a future PyMuPDF release?

JorjMcKie added the "not a bug" (not a bug / user error / unable to reproduce) label Mar 4, 2024

JorjMcKie closed this as completed Mar 4, 2024

Matmaus changed the title ~~Can not obtain correct PDF version because~~ Can not obtain correct PDF version because of warning "cannot recognize version marker" is not present Mar 4, 2024
