Out of memory when running OCR for a lot of images #836
Comments
600 MB isn't a lot of memory. You'll probably need to increase that.
Just to be sure, add a call to
Yes. The 600 MB figure came from limiting the heap size to 300M on my local machine to reproduce the problem faster. When we ran the same process on Linux with more memory, it threw the same error at around 8G. Also, when we monitored the Java heap size and the JVM process memory size, there was a huge difference: on my local machine, with the max heap size set to 300M, the Java heap stayed below 250M, but the JVM process could use 2G of memory. We do parallel OCR processing, and each thread OCRs only one image file at a given time, so if memory were cleaned up properly, usage shouldn't keep growing. I will try api.deallocate(). Any other ideas are much appreciated. Thanks!
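For anyone following along, the gap between heap size and process size described above is what you would expect when memory is allocated outside the Java heap. The sketch below is only an illustration of that effect, not a reproduction of this issue: it uses a direct ByteBuffer as a stand-in for native memory (Tesseract allocates through native code, not through direct buffers), and the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    /** Bytes currently used on the Java heap, as the JVM reports them. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Allocates a block outside the Java heap (stand-in for native library memory). */
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        long before = usedHeap();
        ByteBuffer block = allocateOffHeap(16 * 1024 * 1024); // 16 MB off-heap
        long after = usedHeap();

        // The block counts toward process memory, but the heap only holds
        // the small ByteBuffer object that points at it.
        System.out.println("off-heap block size: " + block.capacity());
        System.out.println("heap growth (bytes): " + (after - before));
    }
}
```

The heap growth printed here is typically far smaller than the block itself, which is why heap monitoring alone can look healthy while the process keeps growing.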
Hi @saudet, when I debug, both output and image have a null deallocator at the following statements. Calling output.deallocate() doesn't seem to do anything. Is this the desired behavior?
Thanks.
Yes, those are just pointers returned from native functions, so JavaCPP doesn't know how to deallocate them.
Update: calling TessDeleteText(output) after each OCR greatly helped with the memory issue (not fully resolved yet).
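A rough sketch of the "free the result after every OCR call" pattern from the update above, in case it helps. To keep it runnable without the native libraries, NativeText and recognize() are hypothetical stand-ins: NativeText plays the role of the pointer returned by the OCR call, and free() plays the role of TessDeleteText(output). The point is only the try/finally shape, so the release happens even when something in between throws.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ReleasePerImageSketch {
    /** Number of stand-in native buffers that have not been freed yet. */
    static final AtomicInteger liveBuffers = new AtomicInteger();

    /** Hypothetical stand-in for a natively allocated OCR result. */
    static final class NativeText {
        private final String value;
        private boolean freed;

        NativeText(String value) {
            this.value = value;
            liveBuffers.incrementAndGet();
        }

        String getString() { return value; }

        /** Plays the role of TessDeleteText(output); safe to call twice. */
        void free() {
            if (!freed) {
                freed = true;
                liveBuffers.decrementAndGet();
            }
        }
    }

    /** Hypothetical stand-in for the OCR call; the real one goes through Tesseract. */
    static NativeText recognize(String imagePath) {
        return new NativeText("text of " + imagePath);
    }

    /** OCRs one image and releases the native result before returning. */
    static String doOcr(String imagePath) {
        NativeText output = recognize(imagePath);
        try {
            return output.getString(); // copy into a Java String first
        } finally {
            output.free();             // runs even if the copy throws
        }
    }

    public static void main(String[] args) {
        for (String path : List.of("page1.png", "page2.png", "page3.png")) {
            System.out.println(doOcr(path));
        }
        System.out.println("live native buffers: " + liveBuffers.get()); // prints 0
    }
}
```

With the real bindings the same shape would apply to every native object created per image, not only the text result, since JavaCPP has no deallocator registered for pointers returned from native functions.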
java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (665M) > maxPhysicalBytes (600M)
at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:589)
at org.bytedeco.javacpp.Pointer.init(Pointer.java:125)
at org.bytedeco.tesseract.TessBaseAPI.allocate(Native Method)
at org.bytedeco.tesseract.TessBaseAPI.<init>(TessBaseAPI.java:35)
If I set the heap size higher, it still runs into this error eventually. We follow the basic example: we create a new instance of BytedecoOcrAPI and call init() for each 'document', which consists of multiple image files, and we call doOcr() for each image file.
public class BytedecoOcrAPI implements OcrAPI {
}