General changes:
- Added lines describing the new `--coverage` option for the genome and transcriptome modes of the simulator to the main `README.md` file.
- Added the `-x` or `--coverage` flag for the `simulator.py` script. This option lets users specify a target coverage for the simulation without doing any additional calculations themselves. Coverage is defined as raw read coverage (using the Lander/Waterman equation). The calculation uses kernel density estimation functions for the aligned and unaligned read lengths, which are fitted on empirical data with the `read_analysis.py` script and passed to the simulator with the `--model_prefix` flag. The simulator applies these kernel density estimation functions, together with the aligned/unaligned read ratio, to calculate the mean read length. It then counts the number of bases in the reference and divides that number by the mean read length to obtain the number of reads required for 1x raw read coverage. Finally, it multiplies the number of reads for 1x coverage by the specified raw read coverage to obtain the number of reads to simulate (#242).
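
The read-count arithmetic described above can be sketched as follows. This is a minimal illustration, not the code in `simulator.py`: the function names, the example numbers, and the choice to round up are assumptions, and the mean aligned and unaligned lengths stand in for values that would come from the fitted kernel density estimates. The aligned/unaligned ratio is expressed here as the fraction of reads that align.

```python
import math


def mean_read_length(mean_aligned, mean_unaligned, aligned_fraction):
    """Weighted mean read length over aligned and unaligned reads.

    mean_aligned / mean_unaligned are placeholders for the means of the
    fitted length distributions (hypothetical inputs for this sketch).
    """
    return aligned_fraction * mean_aligned + (1.0 - aligned_fraction) * mean_unaligned


def reads_for_coverage(reference_bases, mean_length, coverage):
    """Number of reads needed for a target raw read coverage.

    Lander/Waterman: coverage = n_reads * mean_length / reference_bases.
    """
    reads_for_1x = reference_bases / mean_length
    # Rounding up is an assumption of this sketch.
    return math.ceil(reads_for_1x * coverage)


# Made-up example values.
mean_len = mean_read_length(mean_aligned=8000, mean_unaligned=2000, aligned_fraction=0.9)
n_reads = reads_for_coverage(reference_bases=4_600_000, mean_length=mean_len, coverage=30)
print(n_reads)  # 18649
```

With a 4.6 Mb reference and a mean read length of about 7,400 bases, roughly 622 reads give 1x coverage, so 30x requires 18,649 reads.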
`genome` mode:
- For the `genome` mode of the `simulator.py` script, the coverage is calculated using the reference genome specified by the `-rg` or `--ref_g` flag.
`transcriptome` mode:
- For the `transcriptome` mode of the `simulator.py` script, the coverage is calculated using the reference transcriptome specified by the `-rt` or `--ref_t` flag.
`metagenome` mode:
- The `--coverage` option is not currently supported in the `metagenome` mode of the `simulator.py` script.
Notes:
- We expect this approach to estimate the coverage with sufficient precision. However, if users specify a minimum, maximum, or mean read length that differs substantially from the empirical data, the calculated coverage may not match the coverage of the simulated output.
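
To illustrate the caveat in this note, the sketch below shows how the realized coverage drifts when the simulated reads have a different mean length from the one used to plan the read count. All names and numbers are hypothetical. It assumes the read count is derived only from the model's mean read length, as described under General changes.

```python
def realized_coverage(n_reads, actual_mean_length, reference_bases):
    """Raw read coverage obtained from the reads that were simulated."""
    return n_reads * actual_mean_length / reference_bases


reference_bases = 1_000_000
target_coverage = 30

# Mean read length implied by the trained model (made-up value).
model_mean = 6000
n_reads = round(target_coverage * reference_bases / model_mean)

# Hypothetical mean length after the user imposes a much shorter maximum length.
constrained_mean = 4000
coverage = realized_coverage(n_reads, constrained_mean, reference_bases)
print(n_reads, coverage)  # 5000 20.0
```

In this example the read count was planned for 30x, but the shorter reads deliver only 20x.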