
add rule for ocrd-tool-all.json, reduce image size, fix+update modules, fix CUDA #362

Merged – 64 commits merged into OCR-D:master from the add-json-all-tools branch on Jun 14, 2023

Conversation

bertsky (Collaborator) commented Mar 28, 2023

in lieu of https://ocr-d.de/js/ocrd-all-tool.json, this generates the file dynamically

(to be used locally, or as part of CI – e.g. storing as artifact)
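
For illustration, such a rule could boil down to something like the following sketch (not the actual recipe from this PR – the venv path, the use of `compgen`/`jq`, and the output layout keyed by executable name are my assumptions):

```bash
#!/usr/bin/env bash
# Sketch: ask every installed ocrd-* processor for its tool description via
# --dump-json and merge the results into a single ocrd-all-tool.json,
# keyed by executable name. Assumes bash, jq and an activated ocrd_all venv.
set -e
. venv/bin/activate
for exe in $(compgen -c ocrd- | sort -u); do
    # skip anything that is not a proper OCR-D processor
    "$exe" --dump-json 2>/dev/null | jq --arg exe "$exe" '{($exe): .}' || true
done | jq -s 'add' > ocrd-all-tool.json
```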

bertsky requested a review from kba on March 28, 2023 18:45
bertsky force-pushed the add-json-all-tools branch from efe05af to f8cfe20 on March 28, 2023 19:55
bertsky (Collaborator, Author) commented Mar 28, 2023

Sorry for the noise – I just wanted to rebase onto master so CI runs through.

bertsky added 2 commits March 28, 2023 23:04
- remove unnecessary steps
- simplify commands to free up space
- add more locations to rm
- use fixed base image ubuntu-latest (only Docker build anyway), remove respective input
- remove setup-python (only Docker build anyway), remove respective input
- remove input choices with `-git` (same as without)
- add input boolean upload-github
- log in and push to GHCR, too
- use conditional syntax for Dockerhub/Github options
- add command to generate ocrd-all-tool.json from Docker
- add action to upload ocrd-all-tool.json as artifact
bertsky (Collaborator, Author) commented Mar 28, 2023

Note: In 3bc8d6a, I modified @stweil's GitHub Actions workflow for Docker – see the detailed commit message.

I triggered it for minimum (without Dockerhub or GitHub push, because that would not work from my fork anyway) to see whether cleanup, Docker build and artifact uploading work.

bertsky (Collaborator, Author) commented Mar 28, 2023

Note: failure of normal (Circle) CI seems to be an independent, very recent problem coming from nvidia-tensorflow.

bertsky (Collaborator, Author) commented Mar 28, 2023

> I triggered it for minimum (without Dockerhub or GitHub push, because that would not work from my fork anyway) to see whether cleanup, Docker build and artifact uploading work.

It does give us an artifact ocrd-all-tool.json, but unfortunately it's always zipped, and therefore cannot be linked directly. IIRC this is a restriction of the GitHub Actions API.

I'll now try to run the opposite end of the spectrum – maximum-cuda-git.

bertsky (Collaborator, Author) commented Mar 29, 2023

> Note: failure of normal (Circle) CI seems to be an independent, very recent problem coming from nvidia-tensorflow.

> I'll now try to run the opposite end of the spectrum – maximum-cuda-git.

As I suspected: nvidia-tensorflow is now spoiling all our builds.

bertsky (Collaborator, Author) commented Mar 29, 2023

So excluding nvidia-tensorflow==1.15.5+nv23.3 helps, but we have another glitch with protobuf, which I recall seeing in the last release sprint already.
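
For context, one way to exclude a single broken release is a pip constraints file (a sketch only – the actual pin chosen in this PR may differ, and nvidia-tensorflow additionally needs NVIDIA's pip index via nvidia-pyindex):

```bash
# keep the broken 23.3 build out of the venv while still allowing other releases
echo 'nvidia-tensorflow != 1.15.5+nv23.3' >> constraints.txt
pip install -c constraints.txt nvidia-tensorflow
```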

bertsky (Collaborator, Author) commented Mar 29, 2023

OK, so maximum-cuda-git seems impossible to build on GitHub Actions:

`no space left on device`

That's despite our efforts to first wipe the VM clean of stuff we don't need (freeing 25 GB).

What now?

bertsky (Collaborator, Author) commented Mar 30, 2023

Building locally results in an image of 36 GB size. We should try to find the minimal set of CUDA runtimes we actually need for ocrd/core-cuda.

bertsky (Collaborator, Author) commented Apr 1, 2023

> Building locally results in an image of 36 GB size. We should try to find the minimal set of CUDA runtimes we actually need for ocrd/core-cuda.

Here's my analysis:

- concerning ocrd/core:
  - wrong base image: `nvidia/cuda:*-cudnn*-runtime-ubuntu*` – the cudnn variant contains gigabytes of cublas and cudnn that are never actually used in our venvs, because pip needs to install newer/different versions anyway; we still miss devel for things like nvcc (and cudnn-devel actually means development files for cudnn, not for CUDA code)
  - multi-version CUDA runtimes: we probably don't need that if we get a correct CUDA toolkit (devel) and rebuild packages
  - `apt-get autoremove` is ill-conceived: first, it indiscriminately removes packages that we do need, like most of the devel things (nvcc etc.); second, due to layering it does not actually reduce the size; in the non-CUDA build, keeping the extra gcc costs merely ~100 MB
- concerning ocrd/all:
  - git branches, e.g. ~700 MB gh-pages in ocrd_detectron2: as soon as we started shipping complete repos (-git variants with `pip install -e`), it was wrong to allow `git submodule update --init`; instead, we should have created partial clones. Fortunately, git>=2.26 allows using `git submodule update --init --single-branch`. Unfortunately, Ubuntu 20.04 still only ships v2.25 (and I don't know how to simply emulate this behaviour, short of doing an explicit `git clone --single-branch` for each submodule) – see the sketch below
  - `git submodule deinit` and `git clean` in the submodules within the Docker build: this has to happen outside, in the Docker build context, via proper module dependencies of the `docker%` target (or at least before `COPY . .`)
  - abandoning our single-layer principle, e.g. `apt-get -y install automake autoconf libtool pkg-config g++ && make deps-ubuntu && apt-get -y autoremove && apt-get clean` as an extra step: everything must be part of docker.sh, otherwise it does not save space at all
  - torch both in the venv and a sub-venv: we actually started this to accommodate ocrd_detectron2's more elaborate (CUDA-enabled) installation recipe alongside ocrd-typegroups-classifier (which installs whatever torch it can get); pip, dumb as it is, breaks this satisfiable conflict; but the version pulled in by ocrd_detectron2 is always better, so it should be given preference – within the top-level venv

Perhaps there's more, but that should already yield a significant decrease in image size. Fighting for CUDA support in the various processors has just begun (again)...
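
To make the partial-clone idea above concrete (a sketch only – with a new enough git on the outside, and a shallow fallback where only 2.25 is available):

```bash
# with git >= 2.26 (outside the image, i.e. in the Docker build context):
# fetch only the branch each submodule is tracking, not all refs like gh-pages
git submodule update --init --single-branch

# with git 2.25 (e.g. Ubuntu 20.04 inside the image): a shallow clone of each
# submodule gets close enough in terms of size
git submodule update --init --depth 1
```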

kba (Member) commented Apr 3, 2023

> Building locally results in an image of 36 GB size. We should try to find the minimal set of CUDA runtimes we actually need for ocrd/core-cuda.
>
> Here's my analysis:
>
> - concerning ocrd/core:
>   - wrong base image: `nvidia/cuda:*-cudnn*-runtime-ubuntu*` – the cudnn variant contains gigabytes of cublas and cudnn that are never actually used in our venvs, because pip needs to install newer/different versions anyway; we still miss devel for things like nvcc (and cudnn-devel actually means development files for cudnn, not for CUDA code)
>   - multi-version CUDA runtimes: we probably don't need that if we get a correct CUDA toolkit (devel) and rebuild packages

OK, you already implemented those in OCR-D/core#1041 AFAICT

> - `apt-get autoremove` is ill-conceived: first, it indiscriminately removes packages that we do need, like most of the devel things (nvcc etc.); second, due to layering it does not actually reduce the size; in the non-CUDA build, keeping the extra gcc costs merely ~100 MB

OK, if it does not help reduce the size significantly, let's skip that, also already in OCR-D/core#1041

> - concerning ocrd/all:
>   - git branches, e.g. ~700 MB gh-pages in ocrd_detectron2: as soon as we started shipping complete repos (-git variants with `pip install -e`), it was wrong to allow `git submodule update --init`; instead, we should have created partial clones. Fortunately, git>=2.26 allows using `git submodule update --init --single-branch`. Unfortunately, Ubuntu 20.04 still only ships v2.25 (and I don't know how to simply emulate this behaviour, short of doing an explicit `git clone --single-branch` for each submodule)

Install a newer git from https://launchpad.net/~git-core/+archive/ubuntu/ppa?

> - `git submodule deinit` and `git clean` in the submodules within the Docker build: this has to happen outside, in the Docker build context, via proper module dependencies of the `docker%` target (or at least before `COPY . .`)

OK, so we would clean up the git repos before the docker build call?

> - abandoning our single-layer principle, e.g. `apt-get -y install automake autoconf libtool pkg-config g++ && make deps-ubuntu && apt-get -y autoremove && apt-get clean` as an extra step: everything must be part of docker.sh, otherwise it does not save space at all

OK, so we revert 7a5ff45 and replace `apt-get ...` with `echo "apt-get ..." >> docker.sh`?

> - torch both in the venv and a sub-venv: we actually started this to accommodate ocrd_detectron2's more elaborate (CUDA-enabled) installation recipe alongside ocrd-typegroups-classifier (which installs whatever torch it can get); pip, dumb as it is, breaks this satisfiable conflict; but the version pulled in by ocrd_detectron2 is always better, so it should be given preference – within the top-level venv

If it's really just about ocrd_detectron2 and ocrd_typegroups_classifier, can't we align their torch requirement to always install the same version?

bertsky (Collaborator, Author) commented Apr 3, 2023

>> - multi-version CUDA runtimes: we probably don't need that if we get a correct CUDA toolkit (devel) and rebuild packages
>
> OK, you already implemented those in OCR-D/core#1041 AFAICT

Yes, but it now looks like it's even more complicated. For TF with GPU support, you do need libcudnn8 from the OS. (Unlike Torch, which uses the pip package nvidia-cudnn-cu11 – so there will always be two copies of that library, of libcublas and others, worth around 1 GB.)

We could either install this as a FIXUP in core-cuda, or via deps-ubuntu in ocrd_all. In ocrd_all we need some extra workaround anyway: TF now depends on CUDA>=11.8, but we wanted to keep 11.3 (for various reasons). So we need to hold it at tensorflow<2.12, which with stupid pip means preinstalling it...
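
A rough sketch of those two workarounds (package and version names as discussed above; whether they end up in a core-cuda FIXUP or in deps-ubuntu is exactly the open question):

```bash
# 1. provide cuDNN from the OS so TF can find it at runtime
#    (assumes NVIDIA's CUDA apt repository is already configured)
apt-get update && apt-get install -y --no-install-recommends libcudnn8

# 2. hold TensorFlow below 2.12 (which would pull in CUDA >= 11.8) by
#    preinstalling it into the venv before any module requirements resolve
pip install "tensorflow<2.12"
```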

>> - Unfortunately, Ubuntu 20.04 still only ships v2.25 (and I don't know how to simply emulate this behaviour, short of doing an explicit `git clone --single-branch` for each submodule)
>
> Install a newer git from https://launchpad.net/~git-core/+archive/ubuntu/ppa?

I thought of that, but that in turn would require software-properties-common etc. I don't like dragging in hundreds of megabytes worth of extras without being able to remove them afterwards (because we cannot just do autoremove, see above).

I now believe we already get close enough by using `--depth 1`. So for the outside (build context, everything in the Dockerfile before `COPY`), we can use a newer Ubuntu and pass `GIT_DEPTH=--single-branch`. And for the inside (commands echoed into docker.sh) we can use `GIT_DEPTH='--depth 1'`. I hope...
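
So the split could look roughly like this (GIT_DEPTH as referenced above; the concrete make targets here are only placeholders):

```bash
# outside the image (Docker build context, newer git available):
make modules GIT_DEPTH=--single-branch

# inside the image (commands echoed into docker.sh, git 2.25 from Ubuntu 20.04):
make modules GIT_DEPTH='--depth 1'
```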

> OK, so we would clean up the git repos before the docker build call?

Yes, one could do e.g.

`git submodule foreach 'for ref in $(git for-each-ref --no-contains=HEAD --format="%(refname)" refs/remotes/ | sed s,^refs/remotes/,,); do git branch -d -r $ref; done' && git gc`

But for the CI build, we don't need that – as long as we never initially clone more than needed anyway (hence --single-branch or --depth 1).

> OK, so we revert 7a5ff45 and replace `apt-get ...` with `echo "apt-get ..." >> docker.sh`?

Something like that, yes. (But we cannot use autoremove either.)
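
In other words, something along these lines (a sketch of the single-layer idea, not the exact Makefile recipe):

```bash
# collect every command that must run inside the image into docker.sh,
# so the Dockerfile can execute it in a single RUN step (one layer):
echo 'apt-get -y install automake autoconf libtool pkg-config g++' >> docker.sh
echo 'make deps-ubuntu' >> docker.sh
echo 'apt-get clean && rm -rf /var/lib/apt/lists/*' >> docker.sh
# note: no apt-get autoremove here – it would also remove build tools we need
```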

> If it's really just about ocrd_detectron2 and ocrd_typegroups_classifier, can't we align their torch requirement to always install the same version?

Oh, sure we can. I just added an order-only dependency between the two.

bertsky (Collaborator, Author) commented Jun 9, 2023

now includes #365 and depends on OCR-D/core#1055

(recent changes are geared towards better CUDA support in native installations – I still have to update the readme)

bertsky changed the title from "add rule for ocrd-tool-all.json" to "add rule for ocrd-tool-all.json, reduce image size, fix+update modules, fix CUDA" on Jun 12, 2023
bertsky (Collaborator, Author) commented Jun 14, 2023

CI now also runs successfully.

Please merge and release!

kba merged commit 4ecde60 into OCR-D:master on Jun 14, 2023
Comment on lines +510 to +511
@# workaround against breaking changes in Numpy and OpenCV
. $(ACTIVATE_VENV) && $(SEMPIP) pip install "numpy<1.24" "opencv-python-headless<4.5"
Collaborator commented:
@bertsky, could you please document which breaking changes required the old versions of numpy and opencv-python-headless? Those old versions don't work with Python 3.11. So in the long run it will be necessary to work with recent package versions.

bertsky (Collaborator, Author) replied:

Sorry, I don't remember. At the time we had multiple modules which had not yet migrated to the new APIs of these packages, but I also remember hammering such fixes into quite a few modules before the PR was finished.

Since we now have a test workflow backing the deployment, which covers lots of critical modules, and the quiver diachronic view as a complementary check that can also be run locally in advance, I actually recommend trying to drop this – in a new PR.
