v1.24 breaks XPS-table example #3295

martin-dreyer-teck · 2024-03-22T15:32:19Z

Description of the bug

With v1.23.26, running page.find_tables() on the provided example file XPS-table.pdf returns one table.
With v1.24, the same call returns zero tables.

How to reproduce the bug

pip install PyMuPDF==1.24

import fitz

doc = fitz.open("XPS-table.pdf")
page = doc[0]
tabs = page.find_tables()
print(f"len={len(tabs.tables)}")

output: len=0

PyMuPDF version

1.24.0

Operating system

Linux

Python version

3.10

julian-smith-artifex-com · 2024-03-23T10:00:48Z

Could you provide the file XPS-table.pdf?

JorjMcKie · 2024-03-23T12:07:32Z

This is not a table-related problem. A closer look shows that no text is extracted from the page at all.
Without any extracted text, no table can be identified.
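One way to check this separately from table detection is to look at the plain-text extraction first. The following is only a minimal sketch: the helper names diagnose_page and diagnose_file are hypothetical and not part of PyMuPDF. It assumes that page.get_text() returns the page text as a string and that page.find_tables() returns an object with a .tables list, as used in the report above.

```python
def diagnose_page(page):
    """Summarize text extraction and table detection for one page.

    `page` is expected to behave like a PyMuPDF page, i.e. to offer
    get_text() and find_tables(). This helper is illustrative only.
    """
    text = page.get_text()                # plain-text extraction
    tables = page.find_tables().tables    # list of detected tables
    return {
        "has_text": bool(text.strip()),   # False if only whitespace or nothing came back
        "text_chars": len(text),
        "table_count": len(tables),
    }


def diagnose_file(path, page_number=0):
    """Open `path` with PyMuPDF and diagnose a single page (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so diagnose_page stays usable without it

    doc = fitz.open(path)
    try:
        return diagnose_page(doc[page_number])
    finally:
        doc.close()
```

Calling diagnose_file("XPS-table.pdf") under each PyMuPDF version and comparing the results would show whether "has_text" is already False when "table_count" is 0, which would point at text extraction rather than at find_tables().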

JorjMcKie · 2024-04-17T17:50:08Z

Closed because of extended time without response.

martin-dreyer-teck · 2024-04-17T17:57:34Z

@JorjMcKie apologies, I missed the messages on this, hence the late response. FYI, the file XPS-table.pdf is in the pymupdf repo and is presented in the examples of how to do table extraction.

https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/table-analysis

JorjMcKie added the Waiting for information label Mar 23, 2024

JorjMcKie closed this as completed Apr 17, 2024
