Chop up ancestors along the genome, for better parallelization #982

Open
hyanwong opened this issue Dec 6, 2024 · 0 comments

hyanwong (Member) commented Dec 6, 2024

Chatting to @nspope, we just had an interesting idea for tsinfer. If you want to get some basic site times out, and don't mind too much about the contiguity of the nodes, you could chop up the older ancestors at arbitrary positions, using the same positions for every chopped ancestor (e.g. at 1 Mb intervals). This would be a bit like running inference separately on 1 Mb chunks of the genome, but with the advantage that you don't need to chop up the long, young ancestors. I think @benjeffery's linesweep algorithm would then see all the chunks as parallelizable, because no chopped ancestor would span a chunk boundary.
This could be a good way of doing a fast first pass to get site times for later re-inference.
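A rough sketch of why shared chop points help, assuming each old ancestor is reduced to a half-open genomic interval `[left, right)`: sweeping over the intervals in order of their left coordinate groups them into clusters linked by overlap. Once every old ancestor is cut at the same breakpoints, those clusters line up with the chunks. This is a toy stand-in (`independent_groups` is a hypothetical helper), not the linesweep algorithm in tsinfer, and it ignores the young, unchopped ancestors and the haplotypes entirely.

```python
from typing import List, Sequence, Tuple


def independent_groups(intervals: Sequence[Tuple[float, float]]) -> List[List[int]]:
    """Group half-open [left, right) intervals into clusters connected by
    overlap. Intervals in different clusters never overlap, so in this
    simplified model they carry no dependencies on each other."""
    # Visit intervals left to right (stable sort keeps ties in input order)
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    groups: List[List[int]] = []
    current: List[int] = []
    current_right = 0.0
    for i in order:
        left, right = intervals[i]
        if current and left < current_right:
            # Overlaps something already in the cluster: extend it
            current.append(i)
            current_right = max(current_right, right)
        else:
            # Gap (or touching boundary): start a new cluster
            if current:
                groups.append(current)
            current = [i]
            current_right = right
    if current:
        groups.append(current)
    return groups
```

With two unchopped old ancestors at `[0, 2.5 Mb)` and `[1.5 Mb, 3 Mb)` you get a single group; chop both at 1 Mb and 2 Mb and you get three groups, one per chunk.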

I was imagining a method on an ancestors instance, analogous to `ancestors.truncate_ancestors`, perhaps `ancestors.chop(min_time, chop_positions)`, where `chop_positions` could be either an integer giving the number of regularly spaced chop positions or an array of floats specifying the positions to use, and only ancestors older than `min_time` are chopped. I think this could be quite easy to implement.
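A minimal sketch of the proposed `chop` semantics in plain Python, assuming ancestors are represented by a hypothetical `Ancestor(left, right, time)` record in genomic coordinates. This is not the tsinfer API: real ancestors are stored as site-index ranges together with haplotypes and focal sites, which a real implementation would also need to slice.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Ancestor:
    # Hypothetical record for illustration, not a tsinfer class
    left: float   # genomic start, inclusive
    right: float  # genomic end, exclusive
    time: float


def resolve_chop_positions(
    chop_positions: Union[int, Sequence[float]], sequence_length: float
) -> List[float]:
    if isinstance(chop_positions, int):
        # n regularly spaced breakpoints split the genome into n + 1 equal chunks
        step = sequence_length / (chop_positions + 1)
        return [step * (k + 1) for k in range(chop_positions)]
    return sorted(float(x) for x in chop_positions)


def chop(
    ancestors: Sequence[Ancestor],
    min_time: float,
    chop_positions: Union[int, Sequence[float]],
    sequence_length: float,
) -> List[Tuple[int, Ancestor]]:
    """Return (original_index, piece) pairs. Only ancestors older than
    min_time are cut, and every cut ancestor uses the same breakpoints."""
    breaks = resolve_chop_positions(chop_positions, sequence_length)
    pieces: List[Tuple[int, Ancestor]] = []
    for i, anc in enumerate(ancestors):
        if anc.time <= min_time:
            pieces.append((i, anc))  # younger ancestors are left whole
            continue
        # Only breakpoints strictly inside this ancestor produce a cut
        inner = [b for b in breaks if anc.left < b < anc.right]
        edges = [anc.left, *inner, anc.right]
        for left, right in zip(edges[:-1], edges[1:]):
            pieces.append((i, Ancestor(left, right, anc.time)))
    return pieces
```

Keeping the original index alongside each piece makes it easy to map pieces back to the ancestor they came from, e.g. when carrying across haplotype slices.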
