-
-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #453
Performance improvements #453
Conversation
Avoid some operations if logging is turned off
Thanks @henrinormak, these are great findings and and an excellent write-up! I'll review this shortly. I will also look into if the polling can be dropped from the exec, I think it would make my day if it could be 😄 |
Locally on macOS and Not sure what kind of an impact (if any) it has on the performance |
@henrinormak if you have that change ready I'd appreciate the PR, and I can see how to get it tested on Windows, no worries if not, I'll look into it when I can |
@cristianrgreco pushed the change to this branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's... beautiful 😄
Just tested this on a Windows machine running Docker 20.10.22 and it works |
|
||
export const imageExists = async (dockerode: Dockerode, imageName: DockerImageName): Promise<boolean> => { | ||
log.debug(`Checking if image exists: ${imageName}`); | ||
return (await listImages(dockerode)).some((image) => image.equals(imageName)); | ||
|
||
return imageCheckLock.acquire(imageName.toString(), async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to provide a timeout here, in case something goes wrong and the lock can't be acquired? Likewise does the lock need to be released in some finally
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock implementation handles the release in case the user code throws, it also allows configuring a timeout, but for now I left it out, as presumably the underlying operations already have timeouts?
I think I found a few places that can be tuned to help increase performance in setups that have many parallel test cases, and to put these into context, I used one of the test sets I have access to where testcontainers is being rolled out.
The test set is fairly large, consisting of about 1300 jest test cases (in about 180 test suites), where almost each test case touches a container - container reuse is enabled, however they still go through
GenericContainer
and the reuse logic there (i.e no memory level resource sharing between test cases). All of the test-cases are markedconcurrent
and through all of my test runs, I used 4 jest workers (so 4 test suites were running in parallel).I ran each scenario 5 times, probably not statistically significant, but as the test set is pretty big, it takes too long to run them more than that. I did discard any test run that failed (there are some test cases that are flaky and fail sometimes), but all in all, <5 runs were discarded.
Here's the baseline I captured with version 9.1.3:
After moving the auth config lookup to only occur when an image actually needs pulling, adding a lock around image pulling and avoiding some logic if logging is disabled (mainly determining the system statistics) [1]:
After optimising the way the image existence check occurs [2]:
Reasoning
[1] - The auth check does not need to occur if we decide to not pull the image, this also fixes a bug, as previously
runInContainer
did not do the auth config checkup, which I assume it should have. The lock I added to avoid concurrent tests starting to pull an image that they all share, this way we only pull the image once and then determine that it already exists for the other checks. Determining the statistics, although spun off as a promise that is not awaited, I still think it does not make sense if logging is not enabled (which is probably in majority of cases).[2] - Previously every single check fetched the entire list of images from Docker, which on certain machines can be quite a large set. This new approach caches the results and again also applies a lock to avoid parallel checks for the same image from doing parallel work. Images that do not exist always update the full list of images we have.
Followup
I suspect there is some more room for improvement, one thing I notice is the default waiting strategy, which uses
Promise.all
for the internal port check commands. It could probably be improved by using a modified approach withPromise.any
, allowing for resolving fast in case one of the commands is faster then the others and gives a positive answer. I left this out asPromise.any
was introduced in Node 15, and if I understand correctly, this project also supports Node 14.It is also possible that the
execContainer
utility could be improved to avoid polling, at least dockerode's issue on this seems to suggest that latest versions of docker have a shim that solves the original issue whystream.on('end')
was not properly working on Windows (not sure whether that is the reason why this library uses the polling approach or not).