add rule for ocrd-all-tool.json, reduce image size, fix+update modules, fix CUDA #362
Conversation
(force-pushed from efe05af to f8cfe20)
Sorry for the noise – just wanted to rebase to master, so CI runs through.
- remove unnecessary steps
- simplify commands to free up space
- add more locations to rm
- use fixed base image ubuntu-latest (only Docker build anyway), remove the respective input
- remove setup-python (only Docker build anyway), remove the respective input
- remove input choices with `-git` (same as without)
- add input boolean upload-github
- log in and push to GHCR, too
- use conditional syntax for the Dockerhub/GitHub options
- add command to generate ocrd-all-tool.json from Docker
- add action to upload ocrd-all-tool.json as artifact (see the sketch below)
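For illustration, a minimal sketch of how the generated file could be pulled out of the built image so that a subsequent artifact-upload step can pick it up. Both the image tag and the in-image path here are assumptions, not necessarily what the workflow actually uses:

```sh
# Sketch only: copy the generated ocrd-all-tool.json out of the built image
# so that a later actions/upload-artifact step can pick it up.
# Both the tag (ocrd/all:maximum) and the path (/build/ocrd-all-tool.json)
# are assumptions for illustration, not the actual workflow values.
docker create --name ocrd-tmp ocrd/all:maximum
docker cp ocrd-tmp:/build/ocrd-all-tool.json ./ocrd-all-tool.json
docker rm ocrd-tmp
```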
Note: the failure of the normal (Circle) CI seems to be an independent, very recent problem coming from nvidia-tensorflow.
It does give us an artifact ocrd-all-tool.json, but unfortunately it's always zipped and therefore cannot be linked directly. IIRC this is a restriction of the GitHub Actions API. I'll now try to build the opposite end of the spectrum – maximum-cuda-git.
As I suspected: nvidia-tensorflow is now spoiling all our builds.
So excluding
Ok, so maximum-cuda-git seems impossible to build on GitHub Actions:
That's despite our efforts to first wipe the VM clean of stuff we don't need (freeing 25 GB). What now?
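For reference, such a cleanup is typically done by deleting the large preinstalled toolchains on GitHub-hosted runners; a sketch of that kind of step (the exact paths removed in this workflow may differ):

```sh
# Free disk space on a GitHub-hosted runner before a large Docker build.
# These paths are the usual space hogs; the actual workflow may remove more or fewer.
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
sudo docker image prune --all --force
df -h /   # check how much space was reclaimed
```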
Building locally results in an image of 36 GB. We should try to find the minimal set of CUDA runtimes we actually need for ocrd/core-cuda.
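One generic way to see where the bulk is (not necessarily the analysis method used here) is to inspect the per-layer sizes of the built image:

```sh
# List the image layers sorted by size (the image tag is an assumption).
docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' ocrd/all:maximum-cuda | sort -h
```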
Here's my analysis:
Perhaps there's more, but that should already yield a significant decrease in image size. Fighting for CUDA support in the various processors has just begun (again)...
OK, you already implemented those in OCR-D/core#1041 AFAICT
OK, if it does not help reduce the size significantly, let's skip that, also already in OCR-D/core#1041
Install a newer git from https://launchpad.net/~git-core/+archive/ubuntu/ppa?
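For reference, installing git from that PPA would look roughly like this on an Ubuntu base (whether it runs via sudo or as root in the Dockerfile depends on the context):

```sh
# Install a newer git from the git-core PPA on Ubuntu
# (may require software-properties-common for add-apt-repository).
sudo add-apt-repository -y ppa:git-core/ppa
sudo apt-get update
sudo apt-get install -y git
```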
OK, so we would clean up the git repos before the
OK, so we revert 7a5ff45 and replace
If it's really just about ocrd_detectron2 and ocrd_typegroups_classifier, can't we align their torch requirement to always install the same version?
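One way such an alignment could be enforced is a shared pip constraints file applied to both modules – a sketch under the assumption that both are installed from requirements files; the pinned version is only a placeholder:

```sh
# Hypothetical: pin torch once and apply the same constraint to both modules.
echo "torch==1.13.1" > torch-constraints.txt   # placeholder version, not a recommendation
pip install -c torch-constraints.txt -r ocrd_detectron2/requirements.txt
pip install -c torch-constraints.txt -r ocrd_typegroups_classifier/requirements.txt
```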
Yes, but it now looks like it's even more complicated. For TF with GPU support, you do need libcudnn8 from the OS. (Unlike Torch, which uses the pip package nvidia-cudnn-cu11, so there will always be two copies of that library, and of libcublas and others, worth around 1 GB.) We could either install this as a
I thought of that, but that in turn would require I now believe we already get close enough by using
Yes, one could do e.g.
But for the CI build, we don't need that – as long as we never initially clone more than needed anyway (hence
Something like that, yes. (But we cannot use
Oh, sure we can. I just added an order-only dependency between the two.
Now includes #365 and depends on OCR-D/core#1055 (recent changes are geared towards better CUDA support in native installations – I still have to update the README).
CI now also runs successfully. Please merge and release! |
@# workaround against breaking changes in Numpy and OpenCV
. $(ACTIVATE_VENV) && $(SEMPIP) pip install "numpy<1.24" "opencv-python-headless<4.5"
@bertsky, could you please document which breaking changes required the old versions of numpy and opencv-python-headless? Those old versions don't work with Python 3.11. So in the long run it will be necessary to work with recent package versions.
Sorry, I don't remember. At the time we had multiple modules which had not yet migrated to the new APIs of these libraries, but I also remember hammering those migrations into quite a few modules before the PR was finished.
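For context (not necessarily the actual failure back then): a well-known breaking change in numpy 1.24 was the removal of the long-deprecated scalar aliases, which older module code still used at the time:

```sh
# Illustration: numpy >= 1.24 removed deprecated aliases such as np.float,
# so code still using them fails with an AttributeError:
python3 -c 'import numpy as np; print(np.float(1.0))'
```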
Since we now have a test workflow backing the deployment, which does cover lots of critical modules, and the quiver diachronic view as a complementary check that can also be run locally in advance, I actually recommend trying to drop this – in a new PR.
In lieu of https://ocr-d.de/js/ocrd-all-tool.json, this generates the file dynamically (to be used locally, or as part of CI – e.g. storing it as an artifact).