A series of scripts to generate simulated datasets of different variant composition for SC2 wastewater surveillance efforts.
- python (3.8.1)
- art
- in_silico_pcr.pl
generate_replicates.py
can be used to create replicate datasets each with a random composition of a fixed number of variants
Example: generate_replicates.py -i accessions.txt -n 2 -o test_output
generate_simulated_datasets.py
takes the ouput ofgenerate_replicates.py
and a number of other files/arguments and generates simulated data reflecting the composition of the variants
Example: generate_simulated_datasets.py -f five_sequences.fasta -i test_output_replicate_1.tsv -p neb_vss1a_primer.tsv -n 10
generate_simulated_datasets.py
models differences in variation composition by multiplying the number of each amplicon from each variant by the percent abundance it is in the simulated 'sample'.- The total reads of is determined by the
-f
flag passed toart
which is "the fold of read coverage to be simulated or number of reads/read pairs generated for each amplicon"