-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a graph.slice()
feature to API
#332
Comments
That's an interesting idea. Can you elaborate on why (post) recapitation isn't your preferred approach? I think for recapitation you would then only need the bottom slice (for fwdpy11), and msprime could be given the full unsliced graph (it would just ignore events younger than the slice time). In any event, the bottom half is likely the more awkward half to construct because we require that demes without ancestors have an infinite start time. Your suggestion to extend the oldest epochs in the botton slice towards inifinty might work, but whether this works will depend on how the simulator chooses the time for the start of the simulation (the time corresponding to the end of the burn in). For SLiM, I choose this based on the oldest non-infinite time listed in the simulation (which might be a pulse Assuming this is the one-true-way for forward simulators to choose the simulation start time, then just extending the bottom-slice demes to infinity leaves the simulation start time as ambiguous, and the simulation start time could end up much more recent than the slice time (but maybe that doesn't matter?). Maybe it's easier to generalise the forward simulation code to accept a time parameter for the simulation start time, and everything before this is just ignored (so slicing would happen during conversion to forwards-time generations)?
Definitely worth it for graphs with ancient events/structure, and non-neutral features in the recent past.
Following our existing conventions, the resulting top and bottom slices would span the open-closed time intervals
Can you provide an example? |
Sounds like an interest idea to me too. One question that occurs to me, though: why can't the simulators deal with this by having |
Because that would require an API modification just to accommodate |
There are use cases were precapitation is preferable, such as simulating sweeps--dropping a mutation at a given frequency in a given deme requires pre-existing ancestry (in fwdpy11).
The start time of the simulation ( For the bottom time slice, one would convert it to the internal representation with
Well, this is tricky. IMO, one wouldn't want to modify the simulator (that's basically |
One issue with the For discrete time, one has to calculate the deme sizes a bit differently with exponential growth, e.g.: double next_size = std::round(static_cast<double>(N0[deme])* std::pow(G[deme],static_cast<double>(t - onset[deme] + 1))); More generally, since |
I'm referring here to API changes for the simulation function (analog of |
Ok, nice!
Yes, by "simulation start time" I was meaning the time you choose to correspond with the end of the burn in. It's natural to choose the oldest non-infinite time in the model, and this may or may not be appropriate for the bottom half of your sliced model.
I don't follow why this would be demes specific, or even required (except perhaps to permit this slice-based-precapitation). Surely being able to simulate a specific time interval of a demographic model is a feature that might be useful regardless of how the demographic model is given to the simulator? |
Perhaps, but it has never come up before, so it is not implemented. When building models using the low-level API, one would just build the model with the events at the "right times" or use Python operations to extract events from a larger model's |
Yes, this is the same approach I took in - name: deme2
epochs:
- {start_size: 75, end_time: 50}
- {start_size: 75, end_time: 20}
- {start_size: 300, end_time: 0} That way in
That was my thought at first, and we have a clear rule about the semi-open intervals that I think makes sense and we don't want to change. But imagine in the IM model above we slice at time = 100. In the top slice we would have the ancestral deme and then at the very end split it into deme1 and deme2, since the split should occur in the [100, inf) part of the graph. But the child demes would have start times of 0 and then existence times = 0. I think at some point it was decided that we don't want to allow that (and We'd either need to allow demes to have zero length epochs (which I think actually makes sense in some other scenarios as well), or disallow slices that cause this effect, or ...?
I hadn't thought of that, but the conversion code in |
What I have in mind here is that you might want to simulate one or more populations and then at time zero they are divided into multiple sub-demes. I could see this scenario coming up in some EE study or maybe doing some simulation based power analysis testing the difference between subdivision with evolution vs sub-demes coming immediately from the same panmictic population. I guess.. I can imagine setting up that scenario with actual organisms, so we might want to also allow simulating that scenario in |
Oooh, that's very clever! Sorry, I missed that in your original post.
I don't think this is a problem for your specific example of wanting to precapitate, because you can pass the unsliced graph to msprime and ask it to start at the slice time. Events at that time should be handled immediately (but I didn't check/test). import demes
import msprime
def demes_alive_at(graph, time):
return [deme for deme in graph.demes if deme.start_time > time >= deme.end_time]
T_slice = 100
graph = demes.load("preslice.yaml")
demog = msprime.Demography.from_demes(graph)
samples = {
deme.name: round(deme.size_at(T_slice)) for deme in demes_alive_at(graph, T_slice)
}
ts = msprime.sim_ancestry(demography=demog, samples=samples, start_time=T_slice)
I don't have any suggestions here, other than to round to an integer however you see fit. Is the maximum error here greater than 1 individual? |
It shouldn't be, if you treat both ends consistently. The problem is that someone may use |
Haha, yeah |
But not always, I think. I happened upon this lovely situation after writing loads of quiz questions for my class in Python and I could not predict the answer.
One usually wants round-to-even here, which |
@grahamgower -- this is the behavior that is irksome. (Had to look at my issues tickets for my course materials). In [1]: import numpy as np
In [2]: x = 2.675
In [3]: round(x, 2)
Out[3]: 2.67
In [4]: np.round(x, 2)
Out[4]: 2.68
In [5]: From the docs:
|
That is surprising, given that 2.675 seems to be represented exactly as a float. |
I think Python may not be using c99? The correct round to even is trivial to implement using |
Did you have any further thoughts here @apragsdale? I think we really do want to keep the current restriction that epoch lengths must be strictly greater than 0. It's fairly common to want to divide some number by an epoch's or deme's time span, so that code would need special cases to deal with zero-length epochs (code in the wild would inevitably be buggy, because such zero-length epochs wouldn't be tested). The example you provide here could be dealt with by just sampling individuals at time zero after the simulation has completed. Or am I missing something? A |
In conversation with @molpopgen, it could be useful to support a method for slicing an existing graph at a given time, which will return a top half describing the demography up to that slice point, and a bottom half with the remainder that ignores the demographic events above the slice. This came up because, for example, we want to be able to simulate the top half with
msprime
and then add selected mutations to only the bottom half (e.g. drop a sweeping mutation at a time after the slice event usingfwdpy11
), in effect to "precapitate" using the faster neutral simulator.In practice this could be a bit tricky. Below is a simple example of what we have in mind - slice the original model, returning two new graphs. The more ancient slice would just cut and reset times. The bottom slice would have demographic events and migrations from the slice to time zero, and above the slice time we would just add epochs to existing demes extending into the past.
Original graph:
Top slice:
Bottom slice:
Illustration using
demesdraw
(not deme widths are not to scale between panels):Some issues/questions:
The text was updated successfully, but these errors were encountered: