smudgeplot hetkmers
This algorithm extracts k-mer pairs from a FastK k-mer database. The most computationally relevant parameter is L, the coverage threshold at or above which k-mers are treated as genomic. A good value usually separates the error k-mers from the first genomic peak of the k-mer spectrum. See the wiki page on choosing L and U for details.
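One simple way to find a value that separates the error k-mers from the first genomic peak is to look for the first valley of the k-mer spectrum. The sketch below is only an illustration of that idea and is not how smudgeplot itself picks L; the function name `first_valley` and the toy histogram are hypothetical, and real spectra are noisy enough that the result should be checked by eye.

```python
def first_valley(hist):
    """Return the coverage at the first local minimum of a k-mer spectrum.

    hist is a list of (coverage, count) tuples sorted by coverage.
    Returns None when no local minimum is found.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: counts stopped falling and start rising again.
        if count <= prev_count and count < next_count:
            return hist[i][0]
    return None


# Made-up histogram for illustration only: an error tail at low coverage,
# followed by a rise towards the first genomic peak.
TOY_HIST = [
    (1, 1000), (2, 400), (3, 120), (4, 60), (5, 45),
    (6, 70), (7, 150), (8, 300), (9, 420), (10, 380),
]

if __name__ == "__main__":
    print("candidate L:", first_valley(TOY_HIST))
```

The returned coverage is only a candidate; the value passed to -L should still be sanity-checked against a plot of the spectrum.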
usage: smudgeplot hetkmers [-h] [-L L] [-t T] [-o O] [-tmp TMP] [--verbose] [infile]
Calculate unique kmer pairs from FastK k-mer database.
positional arguments:
infile Input FastK database (.ktab) file.
options:
-h, --help show this help message and exit
-L L Count threshold below which k-mers are considered erroneous
-t T Number of threads (default 4)
-o O The pattern used to name the output (kmerpairs).
-tmp TMP Directory where all temporary files will be stored (default /tmp).
--verbose verbose mode
The output file is <output_pattern>_text.smu. The coverage file has the following format:
10 10 9196
10 11 15000
10 12 12912
11 11 6324
10 13 10440
11 12 10526
10 14 8578
...
where the three columns correspond to covB (the coverage of the less covered k-mer of a pair), covA (the coverage of the more covered k-mer) and freq, the number of k-mer pairs observed with these two coverages. The less covered k-mer is always in the first column. At this point, it is impossible to retrieve the sequences of the k-mers.
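The three-column format described above is easy to read with a few lines of Python. The following is a minimal sketch, assuming whitespace-separated integer columns exactly as in the sample; the names `PairCount`, `read_smu` and `summarize` are hypothetical helpers, not part of smudgeplot. The derived values (total coverage covA + covB and the minor coverage fraction covB / (covA + covB)) are shown only as an example of what can be computed from the three columns.

```python
import io
from collections import namedtuple

PairCount = namedtuple("PairCount", ["covB", "covA", "freq"])


def read_smu(handle):
    """Parse 'covB covA freq' lines from an open text handle."""
    records = []
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not three columns
        covB, covA, freq = (int(x) for x in fields)
        records.append(PairCount(covB, covA, freq))
    return records


def summarize(records):
    """Return (total coverage, minor coverage fraction, freq) per record."""
    return [
        (r.covA + r.covB, r.covB / (r.covA + r.covB), r.freq)
        for r in records
    ]


# The sample rows shown above, used here in place of a real file.
SAMPLE = """\
10 10 9196
10 11 15000
10 12 12912
11 11 6324
10 13 10440
11 12 10526
10 14 8578
"""

if __name__ == "__main__":
    records = read_smu(io.StringIO(SAMPLE))
    for total, minor_fraction, freq in summarize(records):
        print(total, round(minor_fraction, 3), freq)
```

For a real run, replace the `io.StringIO(SAMPLE)` handle with `open(path)` on the <output_pattern>_text.smu file.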