AlignmentTools.jar pairwise-knn output #1

Open
sheikki opened this issue May 11, 2016 · 2 comments

@sheikki

sheikki commented May 11, 2016

I'm classifying representative sequences of quality-controlled and clustered 16S reads with the command:

java -jar AlignmentTools.jar pairwise-knn query.fq db.fa

The db file is an unaligned prokaryotic subset of RDP 11.4, clustered at 99% (with some sequence length thresholds).

Is this a sensible way to assign taxonomy to my representative sequences?

In the output, I see lines like:
@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;Listeria;genus

As far as I can tell, the columns are QID KNEIGHBOURS STRAND SCORE %ID QSTART QEND QEND QSTART SSTART SID. Is this the correct interpretation? Why are the QSTART and QEND values displayed twice?

@rdpstaffmsu

Hi, sheikki,

The column definitions are in the header of the output file:

#seqname k orientation score ident query_start query_end query_length ref_start ref_end ref_seqid ref_desc

Is "@650A9:00200:00424" a sequence of length 34? If so, this assignment
might be your best bet, but it is too short to be reliable.

Benli Chai

RDP Staff

Ribosomal Database Project
Center for Microbial Ecology
Michigan State University
567 Wilson Rd. Room 2225 A
East Lansing, MI 48824
(517) 353-3842
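
For readers who want to check the column mapping themselves, here is a minimal Python sketch that labels one hit line with the column names from the header quoted in the reply. The function names `parse_hit` and `is_short` are made up for illustration, the numeric types are inferred from the example lines in this thread, and the length cutoff of 50 is an arbitrary placeholder, not an RDP recommendation.

```python
# Column names copied from the "#seqname ..." header quoted in the reply.
COLUMNS = [
    "seqname", "k", "orientation", "score", "ident",
    "query_start", "query_end", "query_length",
    "ref_start", "ref_end", "ref_seqid", "ref_desc",
]
# Assumption: these columns hold integers, as they do in the thread's examples.
INT_FIELDS = ("k", "query_start", "query_end", "query_length",
              "ref_start", "ref_end")
# Assumption: score and ident are parsed as floats to be safe.
FLOAT_FIELDS = ("score", "ident")

# Hit line taken from the original question.
EXAMPLE = (
    "@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 "
    "Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;"
    "Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;"
    "Listeria;genus"
)


def parse_hit(line):
    """Split one hit line into a dict keyed by the header's column names."""
    # ref_desc contains spaces, so split on the first 11 whitespace runs only
    # and keep the remainder of the line as the description.
    parts = line.rstrip("\n").split(None, len(COLUMNS) - 1)
    if len(parts) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(parts)))
    rec = dict(zip(COLUMNS, parts))
    for name in INT_FIELDS:
        rec[name] = int(rec[name])
    for name in FLOAT_FIELDS:
        rec[name] = float(rec[name])
    return rec


def is_short(rec, min_length):
    """Flag hits whose query is shorter than a caller-chosen cutoff."""
    return rec["query_length"] < min_length


rec = parse_hit(EXAMPLE)
print(rec["query_start"], rec["query_end"], rec["query_length"])  # 0 34 34
print(rec["ref_start"], rec["ref_end"], rec["ref_seqid"])         # 0 83 S004055894
print(is_short(rec, 50))  # True with the placeholder cutoff of 50
```

Read this way, the value that looked like a repeated QEND is `query_length`, and the following two numbers are `ref_start` and `ref_end`.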

@sheikki
Author

sheikki commented May 12, 2016

Thank you for the reply. Oddly, in my alignment file, the ref_start value is always zero. A few examples:

@650A9:00007:00316  1   -   265 0.940   0   72  72  0   427 S001099040  Bacillus subtilis; XN-80-5  Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Bacillaceae 1;family;Bacillus;genus
>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TGAGCAACATCTTGCACGGTACTGACT-ACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATAC----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>CACGTGGGTAACCTGCCTGTAAGACTGGGATAACTCCGGGAAACCGGGGCTAATACCGGATGGTTGTTTGAACCGCATGGTTCAGACATAAAAGGTGGCTTCGGCTACCACTTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGGTAAGAACAAGTGCCGTTCAAATA-GGGCGGCACCTTG-ACGGTAC---CTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTCTGACAATCCTAAGAGATAGGACGTCCCCTTCGGGGCAAGGTGACAGGTGGTGGCATTAGGAAGACAAGTCGTTCAATAAGCGGCACTTGACGGTACTACCAGAAAGGCCACGCTAACTACGTGCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTGTCGGAATATTGGGCGTAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTTCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTCTGACAATCCTAGAGATAGGACGTCCCCTTCGGGGGCAGAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCAGCATTCAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGGCAGAACAAAGGGCAGCGAAACCGCGAGGTTAAGCCAATCCCACAAATCTGTTCTCAGTTCGGATCGCAGTCTGCAACTCGACTGCGTGAAGCTGGAATCGTTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCG
@650A9:00009:00308  1   -   449 1.000   0   102 102 0   515 S003301453  Bacillus cereus; B16    Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Bacillaceae 1;family;Bacillus;genus
>---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ACTCTGGTTGTTAGGG-AGAACAAGTAGCTAG-T-AATAGCTGGCACCTTGACGGTACCTAA-CAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATAC-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>TATTTGGGCGGGGGGGGGCCTATCATGCAGTCGAGCGAATGGATTAAGAGCTTGCTCTTATGAAGTTATCGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCCATAAGACTGGGATAACTCCGGGAAACCGGGGCTCTAATACCGGATAACATTTTGAACCGCATGGTTCGAAATTGAAAGGCGGCTTCGGCTGTCACTTATGGATGGACCCGCGTCGCATTAACTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGTAGGCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGCTTTCGGGTCGTAAAACTCT-GTTGTTAGGGAAGAACAAGT-GCTAGTTGAATAGCTGGCACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGTGGTTTCTTAAGTCTGATGTGGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACACTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGAGGGTTTCCGCCCTTTAGTGCTGAAGTTAACGCATTAAGCACTCCGCCTGGGGAGTACGGCCGCAAGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTAATTCGAAGCAACGCGAAGAACCCTACCAGGTCTTGACATCCTCTGAAACCCTAGAGATAGGGCTTCTCCTTCGGGAGCAGAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGTTAAGTCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGGTACAAAGAGCTGCAAGACCGCGAGGTGGAGCTATTCTCATAAAACCGTTCTCAGTTCGGATTGTAGGCTGCAACTCGCCTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGGTTACCGCGGTGAATACGTTCCCGGGCCTTGTACACACCTCCCGTCACACCACGAGAGTTTGTAACACCCGAAGTCGGTGGGGTAACCTTTTGGGAGCCAGCCGGCCTAAAGGGGGAGAAAG
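
One way to cross-check the reported `ref_start` against the printed alignment is to count the leading gap characters on the gapped query line. The sketch below is only that, and it rests on an unconfirmed assumption: that the first `>` line of each record is the gapped query, the second is the reference, and both share the same alignment columns. The helper names `aligned_span` and `ref_offset` and the toy strings are made up for illustration.

```python
def _body(line):
    """Drop the leading '>' marker, if present."""
    return line[1:] if line.startswith(">") else line


def aligned_span(gapped_query, gap="-"):
    """Return (start, end) alignment columns covered by the query.

    start is the index of the first non-gap character and end is one past
    the last non-gap character.
    """
    body = _body(gapped_query.rstrip("\n"))
    trimmed = body.lstrip(gap)
    if not trimmed:
        raise ValueError("query line contains only gap characters")
    start = len(body) - len(trimmed)
    end = len(body.rstrip(gap))
    return start, end


def ref_offset(gapped_ref, column, gap="-"):
    """Count reference residues (non-gap characters) before a given column."""
    body = _body(gapped_ref.rstrip("\n"))
    return sum(1 for ch in body[:column] if ch != gap)


# Toy strings, not taken from the real output.
query_line = ">---ACG-T--"
ref_line = ">AC-GACGTTAA"

start, end = aligned_span(query_line)
print(start, end)                   # 3 8
print(ref_offset(ref_line, start))  # 2
```

If the offset computed this way is non-zero for records whose `ref_start` column reads 0, that would support the observation that the column does not reflect where the query actually lands on the reference.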
