This issue was moved to a discussion.

'add_redact_annot' asking for 'need font file or buffer' for an existing font in PDF #3212

Closed
santanuOUP opened this issue Feb 29, 2024 · 2 comments
Labels
not a bug not a bug / user error / unable to reproduce

Comments

@santanuOUP

Description of the bug

I am trying to replace a piece of text in a PDF using the following methods:
add_redact_annot
apply_redactions

The text is replaced, but the replacement does not look like the original (searched) text. I therefore wanted to apply the same font, but PyMuPDF throws an exception saying "need font file or buffer". The searched text uses 'Verdana', so I want the replacement text to use the same font.

Here is the log for the operation:

2024-02-29 11:42:34,188 Filename to process: Sample
2024-02-29 11:42:34,194 fonts: [(7, 'ttf', 'TrueType', 'FAAAAH+TimesNewRomanPSMT', 'FAAAAH', 'WinAnsiEncoding'), (11, 'ttf', 'TrueType', 'FAAABB+Verdana', 'FAAABB', 'WinAnsiEncoding'), (16, 'ttf', 'TrueType', 'FAAABG+Verdana-Bold', 'FAAABG', 'WinAnsiEncoding')]
2024-02-29 11:42:34,204 hits: [Rect(170.0, 252.3948974609375, 222.64393615722656, 265.7635498046875)]
2024-02-29 11:42:34,209 search_text: LocalPath
2024-02-29 11:42:34,209 search_text_size: 11.0 search_text_font: Verdana
2024-02-29 11:42:34,213 Exception raised: need font file or buffer

Here is my code:

import logging
import os

import fitz  # PyMuPDF

logger = logging.getLogger(__name__)

search_text = "LocalPath"
replace_text = "better text"
search_text_size = None
search_text_font = None

try:
    # Open the document ("file" is the name of the PDF being processed)
    with fitz.open(os.path.join("asset", file)) as doc:
        for page in doc:
            logger.info(f'fonts: {page.get_fonts()}')
            hits = page.search_for(search_text)  # list of rectangles where to replace
            logger.info(f'hits: {hits}')
            # Look up the size and font of the span that holds the searched text
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no "lines"
                    for span in line["spans"]:
                        if span["text"] == search_text:
                            search_text_size = span["size"]
                            search_text_font = span["font"]
                            logger.info(f'search_text_size: {search_text_size} search_text_font: {search_text_font}')

            for rect in hits:
                # Requesting 'Verdana' here is what ends in "need font file or buffer"
                page.add_redact_annot(rect, replace_text, fontname='Verdana')

            page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
        doc.save("replaced.pdf", garbage=3, deflate=True)
except Exception as e:
    logger.error(f'Exception raised: {e}')

I have been trying for a long time and need help!

How to reproduce the bug

  1. Run the code above on a PDF file
  2. Try to replace an existing piece of text and apply the font of the existing text to the replacement

PyMuPDF version

1.23.25

Operating system

Windows

Python version

3.8

@JorjMcKie JorjMcKie added the not a bug not a bug / user error / unable to reproduce label Feb 29, 2024
@JorjMcKie
Collaborator

This doesn't work like that!
If you want a font different from the Base14 fonts, you must use 2 or 3 steps:

  1. Extract the text information of the to-be-deleted text. Use the "dict" option with the rect as the clip.
  2. Erase text via apply_redactions.
  3. Provide your desired font in the usual way: font file or buffer, page.insert_font() etc. Then insert new text via e.g. page.insert_textbox() providing information previously extracted in step 1.
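
The three steps above could be sketched roughly as follows. This is a minimal, untested illustration and not the maintainers' reference code. The helper names `find_span` and `replace_keep_font`, the font alias "F0", and the `font_file` path are all assumptions made for the example. It uses page.insert_text() at the original span's origin instead of page.insert_textbox(), simply to avoid the new text not fitting into the old rectangle; color is not carried over.

```python
def find_span(text_dict, needle):
    """Return the first span of a page.get_text("dict") result containing needle, else None."""
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line["spans"]:
                if needle in span["text"]:
                    return span
    return None


def replace_keep_font(pdf_in, pdf_out, search_text, replace_text, font_file):
    """Hypothetical helper: erase search_text, then rewrite it using the font in font_file."""
    import fitz  # PyMuPDF; imported here so find_span() can be used without it

    with fitz.open(pdf_in) as doc:
        for page in doc:
            jobs = []
            for rect in page.search_for(search_text):
                # Step 1: extract the span info of the text that will be deleted
                span = find_span(page.get_text("dict", clip=rect), search_text)
                if span is None:
                    continue  # leave this hit untouched if no span info was found
                jobs.append(span)
                page.add_redact_annot(rect)  # no replacement text, erase only
            if not jobs:
                continue
            # Step 2: erase the old text
            page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
            # Step 3: register the font from a file, then insert the new text
            page.insert_font(fontname="F0", fontfile=font_file)  # "F0" is an arbitrary alias
            for span in jobs:
                page.insert_text(span["origin"], replace_text,
                                 fontname="F0", fontsize=span["size"])
        doc.save(pdf_out, garbage=3, deflate=True)
```

A call might look like replace_keep_font("Sample.pdf", "replaced.pdf", "LocalPath", "better text", "verdana.ttf"), where the font file path is whatever copy of the font is available locally.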

@santanuOUP
Author

Hi @JorjMcKie ,

Thanks for your reply.

As you suggested, I have implemented the following:

page.get_text("dict", rect)
page.add_redact_annot(rect)
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
page.insert_font(search_text_font)
page.insert_textbox(rect, replace_text)

My desired font is 'CIDFont+F2', which I think is not a Base14 font. But it is an existing font within the PDF and is also used by other text, so why does it need to be included again?
page.insert_font throws the exception 'need font file or buffer'.
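
For readers with the same question, one possible way to obtain the "font file or buffer" from the PDF itself is sketched below. It is an untested assumption-laden sketch, not an answer given in this thread: the helper names `find_font_xref` and `reuse_embedded_font` and the alias "F0" are invented for the example. It relies on Document.extract_font() returning a (basename, ext, type, content) tuple and on page.get_fonts() returning 6-item tuples as shown in the log above. Embedded fonts are often subsets (names like 'FAAABB+Verdana'), so glyphs needed for the new text may be missing, and some fonts cannot be extracted at all.

```python
def find_font_xref(fonts, wanted):
    """Return the xref of the page.get_fonts() entry whose base font matches wanted, else None."""
    for xref, ext, ftype, basefont, name, encoding in fonts:
        # Subset fonts are listed as "ABCDEF+RealName"; compare without the prefix
        if basefont.split("+", 1)[-1] == wanted:
            return xref
    return None


def reuse_embedded_font(doc, page, wanted, alias="F0"):
    """Hypothetical helper: register an already embedded font on page under alias.

    Returns True on success, False if the font is not found or not extractable.
    """
    xref = find_font_xref(page.get_fonts(), wanted)
    if xref is None:
        return False
    basename, ext, ftype, content = doc.extract_font(xref)
    if ext == "n/a" or not content:
        return False  # font program is not available for extraction
    page.insert_font(fontname=alias, fontbuffer=content)
    return True
```

If reuse_embedded_font(doc, page, "Verdana") returns True, the alias can then be passed as fontname to page.insert_text() or page.insert_textbox(); if the output shows missing characters, the embedded subset probably lacks those glyphs and a full font file is needed instead.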

@pymupdf pymupdf locked and limited conversation to collaborators Mar 4, 2024
@JorjMcKie JorjMcKie converted this issue into discussion #3224 Mar 4, 2024
