More docs on output files #5

Closed
nick-youngblut opened this issue Jun 18, 2018 · 2 comments

Comments

@nick-youngblut

As far as I can tell from reading the SPARSE docs, there's currently not much information describing the data in the output files. For instance, the Ragna_toy/profile.txt file generated in the toy example contains the following:

Total	3287	2883.0
Unmatched	12.291	0.000
Uncertain_match	7.373	8.406
~0	80.3361	91.5937	Bacteria|-|Proteobacteria|Gammaproteobacteria|Enterobacterales|Enterobacteriaceae (10)
u0	80.3361	91.5937	Bacteria|-|Proteobacteria|Gammaproteobacteria|Enterobacterales|Enterobacteriaceae|Salmonella (10)
s0	80.2903	91.5415	Bacteria|-|Proteobacteria|Gammaproteobacteria|Enterobacterales|Enterobacteriaceae|Salmonella|Salmonella enterica (10)
r0	72.5460	82.7120	Bacteria|-|Proteobacteria|Gammaproteobacteria|Enterobacterales|Enterobacteriaceae|Salmonella|Salmonella enterica|Salmonella enterica subsp. enterica (10)
p6	70.1516	79.9820	Bacteria|-|Proteobacteria|Gammaproteobacteria|Enterobacterales|Enterobacteriaceae|Salmonella|Salmonella enterica|Salmonella enterica subsp. enterica|Salmonella enterica subsp. enterica serovar Paratyphi C str. RKS4594: GCF_000018385.1 (10)

... but I haven't been able to find any information on what each value in this table represents, and it's probably not a good idea to just assume what these values are. It would be very helpful to have a clear description of the information in the output files.
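Until the docs cover this, here is a minimal Python sketch for loading profile.txt into structured rows without guessing what the numbers mean. Based only on the output pasted above, it assumes that fields are tab-separated, that the summary rows (Total, Unmatched, Uncertain_match) carry a label and two numbers, and that the remaining rows add a `|`-separated lineage ending in a count in parentheses. The names `ProfileRow`, `col2`, `col3`, `parse_profile_line` and `parse_profile` are hypothetical, not part of SPARSE. The two numeric columns are deliberately left unnamed because their meaning is exactly what this issue asks about.

```python
import re
from typing import List, NamedTuple, Optional


class ProfileRow(NamedTuple):
    label: str              # first field, e.g. "Total", "Unmatched", "s0"
    col2: float             # second field; meaning not documented here (assumption: numeric)
    col3: float             # third field; meaning not documented here (assumption: numeric)
    lineage: List[str]      # "|"-separated taxonomy path; empty for summary rows
    count: Optional[int]    # number in trailing parentheses, e.g. "(10)", if present


# Matches "<path> (<digits>)" at the end of the lineage field.
_COUNT_RE = re.compile(r"^(.*?)\s*\((\d+)\)$")


def parse_profile_line(line: str) -> ProfileRow:
    """Split one profile.txt line, assuming tab-separated fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    label, col2, col3 = fields[0], float(fields[1]), float(fields[2])
    lineage: List[str] = []
    count: Optional[int] = None
    if len(fields) > 3:
        path = fields[3]
        m = _COUNT_RE.match(path)
        if m:
            path, count = m.group(1), int(m.group(2))
        lineage = path.split("|")
    return ProfileRow(label, col2, col3, lineage, count)


def parse_profile(path: str) -> List[ProfileRow]:
    """Parse every non-blank line of a profile.txt file."""
    with open(path, encoding="utf-8") as fh:
        return [parse_profile_line(line) for line in fh if line.strip()]
```

Leaving the numeric columns unnamed keeps any guess out of the code; once the docs define them, renaming `col2` and `col3` is a one-line change.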

@nick-youngblut
Author

It would also help to have more docs on the query function, particularly the --tag option. The sparse query docs state:

Filter by relationships between different level of barcodes. i.e., "p!=r;p==a" gets references that have the same numbers in p groups and a groups, but different between p groups and r groups

...and I believe that p and r refer to barcode_tags, but it's really unclear what barcode_tags are. I thought they might just list the barcode_dist groupings, but at least in the docs, the barcode_tags list is one value longer than the barcode_dist list. Does "a" stand for "all" or something?
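To make the quoted filter semantics concrete, here is a hedged Python sketch of how an expression like "p!=r;p==a" could be evaluated. It assumes, hypothetically, that each reference's barcode can be represented as a dict mapping a level letter (p, r, a, ...) to its group number at that level; the functions `parse_tag_filter`, `matches` and `filter_references` are illustrative only and are not SPARSE's implementation. Only `==` and `!=` clauses joined by `;` are handled, as in the docs example, and the letters are treated as opaque keys since their meaning is what this thread is asking about.

```python
import operator
from typing import Dict, List, Tuple

# Supported comparison operators for a clause such as "p!=r".
_OPS = {"==": operator.eq, "!=": operator.ne}

Clause = Tuple[str, str, str]  # (left level, operator, right level)


def parse_tag_filter(expr: str) -> List[Clause]:
    """Split e.g. "p!=r;p==a" into [("p", "!=", "r"), ("p", "==", "a")]."""
    clauses: List[Clause] = []
    for clause in expr.split(";"):
        clause = clause.strip()
        if not clause:
            continue
        for op in ("!=", "=="):
            if op in clause:
                left, right = (s.strip() for s in clause.split(op, 1))
                clauses.append((left, op, right))
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return clauses


def matches(barcode: Dict[str, int], clauses: List[Clause]) -> bool:
    """True if the group numbers at the named levels satisfy every clause."""
    return all(_OPS[op](barcode[left], barcode[right]) for left, op, right in clauses)


def filter_references(refs: Dict[str, Dict[str, int]], expr: str) -> List[str]:
    """Return names of references whose (hypothetical) barcodes pass the filter."""
    clauses = parse_tag_filter(expr)
    return [name for name, barcode in refs.items() if matches(barcode, clauses)]


# Hypothetical barcodes: group number per level for three references.
refs = {
    "refA": {"p": 6, "r": 0, "a": 6},  # p != r and p == a  -> kept
    "refB": {"p": 6, "r": 6, "a": 6},  # p == r             -> dropped
    "refC": {"p": 6, "r": 0, "a": 3},  # p != a             -> dropped
}
print(filter_references(refs, "p!=r;p==a"))
```

With these made-up barcodes only `refA` satisfies both clauses, which mirrors the docs' wording: same number in the p and a groups, different numbers between the p and r groups.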

@nluhmann
Collaborator

nluhmann commented Jul 5, 2018

Hi Nick,
thanks a lot for your feedback and sorry it took us a while to answer.
We extended the docs to include more detailed descriptions of the output files; I hope the contents of the profile.txt output file are much clearer now.
Regarding the --tag option, p and r refer to different ANI threshold levels in the clustered database, and all level identifiers are now explained in the docs. We will add more examples of useful --tag filters, similar to the example given in the command docs, very soon.
Please let us know if you have any other issues!

@nluhmann nluhmann closed this as completed Jul 5, 2018