installation.rst: remove need to install tesseract #3266
As the document already states, Tesseract OCR is bundled into the application.
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
I have read the CLA Document and I hereby sign the CLA
On the Dangerzone project, our original reading of the documentation led us to assume that we'd have to install Tesseract OCR on the host, which can be quite challenging to do in a cross-platform way. Later we found that this was not the case, and it made an incredible difference (thanks, PyMuPDF team!). Hopefully this commit helps future users of the library realize that they don't have to install Tesseract.
Thanks for pointing this out. For this one, I think it is better if bullet point 1 remains and says:
Because we can't assume it is installed (even though it probably is, via the MuPDF installation), someone could have removed it. Who knows? Also, further up the section the text says: "PyMuPDF will already contain all the logic to support OCR functions. But it additionally does need Tesseract’s language support data, so installation of Tesseract-OCR is still required." Perhaps that should say: "PyMuPDF will already contain all the logic to support OCR functions. But it additionally does need Tesseract’s language support data, which should already be installed on your system."
You're welcome.
I don't know about this. I'd be inclined to recommend against it, because if I were reading that, I'd assume I had to install it. If it stays like that, there's little point in this PR. It's up to you, but I'd be inclined to remove it.
Fair enough, we can drop the 1st bullet point. If you update this PR to also change the part further up, then we are good. :)
Great point! I can do something like that, but I think we need to tweak that phrase a bit:
This is technically not the case, I think. The Tesseract data is the only thing that's missing. How about something like this:
How does this sound?
Sure, but we try to avoid links that are not descriptive, so "here" is out! Let's say: "PyMuPDF will already contain all the logic to support OCR functions. But it additionally does need Tesseract’s language support data."
Done 👌
Whoops. Sorry about that. Fixed.
LGTM