Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster testing using pytest-xdist #83

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

tmillenaar
Copy link
Contributor

@tmillenaar tmillenaar commented Oct 23, 2024

This is just a nice-to-have. It speed up the tests on my machine from 4 minutes down to one minute by running tests in parallel. The --dist worksteal is important here, because some of the tests take significantly longer than others. With worksteal enabled, if one process is testing a slow test, other processes can run the other tests in the meantime.

@tmillenaar
Copy link
Contributor Author

tmillenaar commented Oct 23, 2024

It looks like the CICD pytest duration went down from about 4 minutes to 2.5 minutes. I did have to use -n=logical instead of -n=auto. It seems while still a nice gain, the gain in performance is lower than on my own machine. I imagine the virtual CPU we get assigned in the github CICD does not have that many threads available. I think the difference between -n=auto and -n=logical is that logical uses hyperthreading, giving us more threads to work with.

@HannoSpreeuw HannoSpreeuw requested a review from suvayu October 28, 2024 15:56
@suvayu
Copy link
Collaborator

suvayu commented Oct 29, 2024

I would like to hold off on changes like this until we actually attempt to solve the slow test issue by addressing the root cause, converting to an inefficient format unnecessarily. This was there for backwards compatibility, but that appears to be unnecessary now.

@HannoSpreeuw
Copy link
Collaborator

I would like to hold off on changes like this until we actually attempt to solve the slow test issue by addressing the root cause, converting to an inefficient format unnecessarily. This was there for backwards compatibility, but that appears to be unnecessary now.

Right. As soon as we have replaced the source measurements, now collected as a containers.ExtractionResult instance of a list of extract.Detection instances by any data format that aligns well with RAM, there may be a considerable speedup, for TraP unit tests that process parameters from a large number of sources.
Most of the work we have to do is to adapt the PySE unit tests and TraP to accept this new data format that contains the source measurements.

@suvayu will be running TraP in debug mode to figure out exactly which source parameters TraP "eats".
I guess it will be the 17 floats and 1 Boolean that the extract.Detection.serialize produces, but that remains to be verified.
It seems to be the case, since we have

    serialized = [r.serialize(ew_sys_err, ns_sys_err) for r in results]

in tkp/steps/source_extraction.py

If confirmed, that would mean that on the PySE side our work would include collecting these 18 quantities in a single array, with number of rows equal to number of sources.

@tmillenaar
Copy link
Contributor Author

I can adapt to the structure you choose, if you would like to use a 2d numpy array than I can work with that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants