Release v3.2.3 · bcgsc/NanoSim

General changes:

Documented the new --coverage option for the genome and transcriptome modes of the simulator in the main README.md file.
Added the -x or --coverage flag to the simulator.py script. This option lets users specify a target coverage for the simulation without doing any calculations themselves. Coverage here means raw read coverage, as in the Lander/Waterman equation. The simulator derives the mean read length from two inputs: the kernel density estimation functions for aligned and unaligned read lengths, and the aligned/unaligned reads ratio. These functions are fitted to empirical data by the read_analysis.py script and passed to the simulator with the --model_prefix flag. The simulator then counts the number of bases in the reference and divides that count by the mean read length to get the number of reads required for 1x raw read coverage. Finally, it multiplies that number by the specified coverage to obtain the number of reads to simulate (#242).
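The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not NanoSim's actual code: the function names, the use of precomputed mean lengths in place of the kernel density estimates, and rounding up with `math.ceil` are all assumptions made for the example.

```python
import math


def mean_read_length(aligned_mean, unaligned_mean, aligned_fraction):
    """Weighted mean length of aligned and unaligned reads.

    aligned_fraction is the share of reads that are aligned (0..1).
    The two means stand in for values that would come from the
    fitted length distributions (hypothetical inputs).
    """
    if not 0.0 <= aligned_fraction <= 1.0:
        raise ValueError("aligned_fraction must be between 0 and 1")
    return (aligned_fraction * aligned_mean
            + (1.0 - aligned_fraction) * unaligned_mean)


def reads_for_coverage(reference_bases, mean_length, coverage):
    """Number of reads needed for the requested raw read coverage."""
    if mean_length <= 0:
        raise ValueError("mean_length must be positive")
    reads_per_1x = reference_bases / mean_length  # reads for 1x coverage
    # Rounding up is an assumption of this sketch.
    return math.ceil(reads_per_1x * coverage)


if __name__ == "__main__":
    mean_len = mean_read_length(8000, 2000, 0.75)          # 6500.0
    print(reads_for_coverage(6_500_000, mean_len, 30))     # 30000
```

With a 6.5 Mb reference and a mean read length of 6,500 bases, 1,000 reads give 1x coverage, so 30x requires 30,000 reads.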

genome mode:

For the genome mode of the simulator.py script, the coverage is calculated using the reference genome specified by the -rg or --ref_g flag.

transcriptome mode:

For the transcriptome mode of the simulator.py script, the coverage is calculated using the reference transcriptome specified by the -rt or --ref_t flag.

metagenome mode:

The --coverage option is not currently supported for the metagenome mode of the simulator.py script.

Notes:

We expect this approach to estimate the coverage with sufficient precision. However, users should be aware that if they specify a minimum, maximum, or mean read length that differs substantially from the empirical data, the coverage of the simulated output may deviate from the requested coverage.
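A small worked example shows why length constraints can shift the result. It assumes, purely for illustration, that the read count is computed from the unconstrained mean length while the simulated reads obey a maximum length. The toy lengths, the reference size, and the helper name are hypothetical and are not taken from NanoSim.

```python
def realized_coverage(n_reads, mean_length, reference_bases):
    """Raw read coverage actually produced by n_reads of a given mean length."""
    return n_reads * mean_length / reference_bases


# Toy empirical read lengths (hypothetical data).
empirical_lengths = [1000, 2000, 3000, 10000]
reference_bases = 4_000_000
target_coverage = 10

# Read count derived from the unconstrained empirical mean (4000 bases).
model_mean = sum(empirical_lengths) / len(empirical_lengths)
n_reads = round(target_coverage * reference_bases / model_mean)

# A user-specified maximum length removes the longest reads,
# which lowers the mean length of what is actually simulated.
max_len = 3000
kept = [length for length in empirical_lengths if length <= max_len]
constrained_mean = sum(kept) / len(kept)

actual = realized_coverage(n_reads, constrained_mean, reference_bases)
print(n_reads, constrained_mean, actual)  # 10000 2000.0 5.0
```

In this toy case the requested 10x coverage comes out as 5x, because the maximum length halves the mean read length while the read count stays the same.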
