Chop up ancestors along the genome, for better parallelization #982

hyanwong · 2024-12-06T09:57:22Z

Chatting to @nspope, we just had an interesting idea for tsinfer. If you only want some basic site times, and don't mind too much about the nodes being contiguous, you could chop up the older ancestors at arbitrary positions, using the same positions for every chopped ancestor (e.g. at 1 Mb intervals). This would be a bit like running inference separately on 1 Mb chunks of the genome, but with the advantage that you don't need to chop up the long, young ancestors. I think @benjeffery's linesweep algorithm would then see all the chunks as parallelizable.
This could be a good way of doing a fast first pass to get site times for later reinference.

I was imagining a method on an ancestors instance, along the lines of `ancestors.truncate_ancestors`: perhaps `ancestors.chop(min_time, chop_positions)`, where `chop_positions` is either an integer giving the number of regularly spaced chop positions or an array of floats specifying the positions to use, and only ancestors older than `min_time` are chopped. I think this could be quite easy to implement.
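To make the proposed semantics concrete, here is a rough sketch of the chopping logic on its own. It is not tsinfer code: the free function `chop`, the helper `resolve_chop_positions`, the `sequence_length` argument, and the representation of an ancestor as a `(start, end, time)` tuple with a half-open interval are all assumptions for illustration. A real implementation would also need to slice each ancestor's haplotype, which is left out here.

```python
import numpy as np


def resolve_chop_positions(chop_positions, sequence_length):
    """Turn the chop_positions argument into sorted interior breakpoints.

    An integer n means n regularly spaced breakpoints (n + 1 equal chunks);
    anything else is treated as an explicit array of positions.
    """
    if isinstance(chop_positions, (int, np.integer)):
        return np.linspace(0, sequence_length, chop_positions + 2)[1:-1]
    return np.unique(np.asarray(chop_positions, dtype=float))


def chop(ancestors, min_time, chop_positions, sequence_length):
    """Split ancestors older than min_time at shared breakpoints.

    `ancestors` is a list of (start, end, time) tuples, a hypothetical
    stand-in for the real ancestors data structure.
    """
    breaks = resolve_chop_positions(chop_positions, sequence_length)
    chopped = []
    for start, end, time in ancestors:
        if time <= min_time:
            # Young ancestors are kept whole.
            chopped.append((start, end, time))
            continue
        # Only breakpoints strictly inside the ancestor split it.
        inside = breaks[(breaks > start) & (breaks < end)]
        edges = np.concatenate(([start], inside, [end]))
        for left, right in zip(edges[:-1], edges[1:]):
            chopped.append((float(left), float(right), time))
    return chopped


ancestors = [
    (0, 3_000_000, 0.9),        # old and long: gets chopped
    (500_000, 2_500_000, 0.1),  # young: left intact
]
pieces = chop(ancestors, min_time=0.5, chop_positions=[1e6, 2e6],
              sequence_length=3e6)
```

Because every old ancestor is cut at the same positions, the resulting pieces in different chunks never overlap, which is the property the parallelization idea relies on.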
