You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm running variant calling on non-model organisms, and for some of the downstream analyses (e.g., nucleotide diversity calculations), it is necessary to generate (possibly boolean) accessibility masks that classify sites as accessible for analysis at a single base-pair resolution. Accessibility masks can be generated by summing coverages over all samples and masking out sites with too low or too high coverage. In addition, one could mask sites based on the number/fraction of individuals having sufficient coverage, i.e., absence/presence calls (cf https://onlinelibrary.wiley.com/doi/full/10.1111/mec.16077, Table 3). The genomes in question are so large that it is not possible to generate variant files including monomorphic sites on which to perform filtering.
Until now I have been using the Python API to sum coverages and count the number of indivuduals with coverages within a threshold range for each site. This is somewhat slow so I was wondering whether this functionality could be added directly to the d4 Rust library. I gave it a try based on the merge function, but my Rust knowledge is somewhat limited.
I'm thinking of commands somewhere in the line of
d4tools sum file1.d4 file2.d4 ... fileN.d4 outfile.d4
I'd be happy to submit a pull request if I could get pointers on where to start. What are your thoughts on this - do you prefer cases like these to be handled by external APIs (e.g., Python) or is it amenable to implementation in Rust?
Cheers,
Per
The text was updated successfully, but these errors were encountered:
Hi @percyfal This is something I've been working on for our workflow snpArcher. I'm glad to see there is interest in a function/tool for generating accessibility masks via coverage. I don't have a repo for this yet, but will soon and can let you know when its available.
Thanks for the heads up @cademirch. FYI, I ended up drafting a Python package to perform the tasks detailed above. You can find the code at https://github.com/percyfal/d4utils. BTW, say hi to Erik with whom I previously have collaborated.
Awesome - just took a quick peek and it's looks great! I will definitely share with my colleagues, and perhaps we can integrate this in to our workflow. I'd also be happy to contribute if you are open to it - can discuss in your repo.
Hi,
I'm running variant calling on non-model organisms, and for some of the downstream analyses (e.g., nucleotide diversity calculations), it is necessary to generate (possibly boolean) accessibility masks that classify sites as accessible for analysis at a single base-pair resolution. Accessibility masks can be generated by summing coverages over all samples and masking out sites with too low or too high coverage. In addition, one could mask sites based on the number/fraction of individuals having sufficient coverage, i.e., absence/presence calls (cf https://onlinelibrary.wiley.com/doi/full/10.1111/mec.16077, Table 3). The genomes in question are so large that it is not possible to generate variant files including monomorphic sites on which to perform filtering.
Until now I have been using the Python API to sum coverages and count the number of indivuduals with coverages within a threshold range for each site. This is somewhat slow so I was wondering whether this functionality could be added directly to the d4 Rust library. I gave it a try based on the merge function, but my Rust knowledge is somewhat limited.
I'm thinking of commands somewhere in the line of
d4tools sum file1.d4 file2.d4 ... fileN.d4 outfile.d4
and
d4tools count file1.d4 file2.d4 ... fileN.d4 outfile.d4 --min-coverage 3
I'd be happy to submit a pull request if I could get pointers on where to start. What are your thoughts on this - do you prefer cases like these to be handled by external APIs (e.g., Python) or is it amenable to implementation in Rust?
Cheers,
Per
The text was updated successfully, but these errors were encountered: