Jules #3204

julian-smith-artifex-com · 2024-02-25T08:50:58Z

Install pymupdf command that runs fitz/__main__.py:main().

Also made most Package constructor args keyword-only.

This uses pipcl.py's new `entry_points` support. Note that direct installation with `setup.py install` does not implement this.

tests/test_general.py: test_cli() - basic test of command-line `pymupdf` command.

Addresses #3199.
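For readers unfamiliar with console-script entry points, here is a minimal sketch of the standard `entry_points.txt` metadata that maps a command name to a Python callable. It is not pipcl.py's actual implementation: the helper `make_entry_points_txt()` is a hypothetical name, and only the mapping of `pymupdf` to `fitz/__main__.py:main()` comes from the description above.

```python
import configparser
import io


def make_entry_points_txt(entry_points):
    """Render {group: {name: 'module:function'}} as entry_points.txt text.

    Hypothetical helper for illustration; pipcl.py may build this differently.
    """
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep command names case-sensitive
    for group, scripts in entry_points.items():
        parser[group] = dict(scripts)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# The `pymupdf` command should end up calling main() in fitz/__main__.py.
text = make_entry_points_txt(
    {"console_scripts": {"pymupdf": "fitz.__main__:main"}}
)
print(text)
```

An installer that honours this metadata creates a `pymupdf` launcher script on the user's PATH.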

Checks that fitz.mupdf.pdf_subset_fonts2() can be called.

JorjMcKie

Script test_font.py is insufficient as it only confirms that the call to pdf_subset_fonts() does not crash.

Could we include additional tests:

- Fonts are actually being subset, e.g. by comparing file sizes before / after. For this we need examples with non-subset fonts.
- Text extraction is not affected in any way by replacing fonts with their subsets, e.g. by comparing "dict" text extractions before / after.
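The second check could look roughly like the sketch below. It assumes the usual nested layout of a "dict" extraction (blocks, then lines, then spans) and compares only the span texts, on the assumption that font names may legitimately change once fonts are replaced by subsets. The helper names `span_texts()` and `same_text()` are hypothetical, and the two sample dictionaries are hand-written stand-ins, not real extraction output.

```python
def span_texts(text_dict):
    """Return the span texts of a "dict"-style extraction in reading order."""
    texts = []
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line.get("spans", []):
                texts.append(span["text"])
    return texts


def same_text(before, after):
    """True if both extractions contain the same span texts in the same order."""
    return span_texts(before) == span_texts(after)


# Hand-written stand-ins for page.get_text("dict") before / after subsetting.
# The changed font name is an assumption made for this illustration.
before = {"blocks": [{"type": 0, "lines": [
    {"spans": [{"text": "Hello", "font": "Helvetica"}]}]}]}
after = {"blocks": [{"type": 0, "lines": [
    {"spans": [{"text": "Hello", "font": "ABCDEF+Helvetica"}]}]}]}

print(same_text(before, after))
```

In a real test the two dictionaries would be extracted from every page of the document before and after subsetting.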

As per Robin's statement, the array should always contain all document pages, not a subset. One might argue that in this case we should never need more than the PDF argument.

I have tried mupdf.pdf_subset_fonts2 and it did not work: no font subsets were created.

julian-smith-artifex-com · 2024-02-25T11:09:19Z

> Script test_font.py is insufficient as it only confirms that the call to pdf_subset_fonts() does not crash.

It's only intended to check the python bindings are working. Additional checks can be added afterwards.

If you prefer, I could remove the test from this PR for now, until we have better tests written?

[...]

> I have tried mupdf.pdf_subset_fonts2 and it did not work: no font subsets were created.

Unfortunately there's a mistake in the implementation in mupdf; I have a fix.

JorjMcKie · 2024-02-25T11:24:25Z

Ah, I understand.
We need to decide whether we want to safeguard against crashes in MuPDF's pdf_subset_fonts() at all.
If yes, we certainly should process more / many PDFs.
I vote for not doing this: I already confirmed that all our PDFs in the test/resources folder are processed without crash.

Otherwise, I have a test script that makes a fresh PDF with text using almost 10 different fonts, resulting in a file size of 2 MB. Font types are OTF, TTF and CID.
After font subsetting, the file size is only 5% of the original.
If you want, you could include this one now instead ... or we postpone testing this completely.
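A size-based assertion for such a test might be sketched as follows. The function names and the 0.5 threshold are assumptions chosen for illustration; the sample sizes simply mirror the 2 MB and 5% figures mentioned above and are not measured here.

```python
def size_ratio(size_before, size_after):
    """Return the size after subsetting as a fraction of the original size."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return size_after / size_before


def fonts_were_subset(size_before, size_after, max_ratio=0.5):
    """Heuristic: treat a clear drop in file size as evidence of subsetting.

    max_ratio is an arbitrary threshold picked for this sketch.
    """
    return size_ratio(size_before, size_after) <= max_ratio


# Sizes in bytes, e.g. the lengths of the saved PDF before and after.
size_before = 2_000_000  # about 2 MB, as described above
size_after = 100_000     # about 5% of the original

print(fonts_were_subset(size_before, size_after))
```

A threshold well above the observed 5% leaves room for variation between fonts and MuPDF versions.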

julian-smith-artifex-com added 4 commits February 25, 2024 08:28

pipcl.py: added support for entry_points.

9ba5075

Also made most Package constructor args keyword-only.

tests/test_font.py: added test_mupdf_subset_fonts2().

457a7c3

Checks that fitz.mupdf.pdf_subset_fonts2() can be called.

changes.txt: update.

2b29c20

julian-smith-artifex-com requested a review from JorjMcKie February 25, 2024 08:50

JorjMcKie requested changes Feb 25, 2024


JorjMcKie self-requested a review February 27, 2024 08:18

JorjMcKie approved these changes Feb 27, 2024


julian-smith-artifex-com merged commit c41f831 into main Feb 27, 2024
2 checks passed

julian-smith-artifex-com deleted the jules branch February 27, 2024 08:50

github-actions bot locked and limited conversation to collaborators Feb 27, 2024
