Documentation/Tutorial for multithreading #3078
Should this be in the tskit docs or the tutorials? If the latter, I guess it would come under the "parallelisation" tutorial mooted at tskit-dev/tutorials#151?
The most straightforward way to parallelize is to split the job over windows. After splitting, one can sum the per-window results (or average them with some weighting) to get the final result. I've done some profiling and found a good amount of overhead from memory allocation with this strategy, especially for large problems. This could be avoided if we could pass a preallocated array to the statistics functions and have them update it in place (via …). Any thoughts?
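A minimal sketch of the window-splitting idea with a preallocated output, using plain NumPy rather than tskit. `window_means` and `parallel_window_means` are hypothetical helpers, not tskit API, and the per-window mean of per-site `values` stands in for a real statistic. Each task receives a view of one shared output array and writes its windows' results into it, so no per-chunk result arrays are allocated or concatenated. The loop in `window_means` is pure Python and holds the GIL, so this shows the in-place data flow rather than an actual speed-up; that would require the per-chunk work to release the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def window_means(positions, values, breaks, out):
    """Hypothetical per-window statistic: mean of `values` for sites in each
    window [breaks[i], breaks[i + 1]), written in place into `out`."""
    # Index of the first site at or after each window boundary (positions sorted).
    idx = np.searchsorted(positions, breaks)
    for i in range(len(breaks) - 1):
        lo, hi = idx[i], idx[i + 1]
        out[i] = values[lo:hi].mean() if hi > lo else np.nan


def parallel_window_means(positions, values, breaks, num_threads=4):
    num_windows = len(breaks) - 1
    out = np.empty(num_windows)  # allocated once, filled in place by the workers
    # Split the windows (not the sites) into contiguous chunks, one per task.
    edges = np.linspace(0, num_windows, num_threads + 1).astype(int)
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [
            # breaks[a:b + 1] holds the b - a windows; out[a:b] is a view, not a copy.
            executor.submit(window_means, positions, values, breaks[a:b + 1], out[a:b])
            for a, b in zip(edges[:-1], edges[1:])
            if b > a
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

For example, with sites at positions 0.5, 1.5, 2.5, 3.5 carrying values 1 to 4 and window breaks `[0, 2, 4]`, this returns `[1.5, 3.5]`; a window containing no sites yields `nan`.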
Are you sure it's memory allocation here and not overhead associated with seeking along the sequence? I'd be surprised if malloc overhead were significant here.
Continuing from #3077.
I think this page from the NumPy docs is a good starting point: https://numpy.org/doc/2.2/reference/random/multithreading.html
The bottom line is that one can run multiple computations concurrently with `concurrent.futures.ThreadPoolExecutor`, provided the computation-heavy parts of the program release the GIL.
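A minimal sketch along the lines of the linked NumPy page: each thread gets its own `Generator` (seeded via `SeedSequence.spawn`) and fills a disjoint slice of one preallocated array through the `out` argument of `standard_normal`. The page relies on these `Generator` calls releasing the GIL, which is what lets the threads actually run concurrently. The function name `parallel_standard_normal` is hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def parallel_standard_normal(n, seed=None, num_threads=4):
    """Hypothetical helper: fill an array of n N(0, 1) draws using several threads."""
    out = np.empty(n)  # preallocated; each thread writes only its own slice
    # Independent, reproducible streams: one child SeedSequence per thread.
    rngs = [np.random.default_rng(s)
            for s in np.random.SeedSequence(seed).spawn(num_threads)]
    edges = np.linspace(0, n, num_threads + 1).astype(int)

    def fill(rng, lo, hi):
        if hi > lo:
            # Writes directly into the view out[lo:hi]; no temporary array.
            rng.standard_normal(out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(fill, rng, lo, hi)
                   for rng, lo, hi in zip(rngs, edges[:-1], edges[1:])]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

With a fixed `seed` and `num_threads` the output is reproducible; changing `num_threads` changes how the random streams are split, and therefore the values.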