Combining sage + rescoring engines #23
Comments
Thank you for your detailed issue report! At the moment there is indeed no support for rescored Sage results. I will look into how hard it would be to make this possible.
Thank you so much! Looking forward to the updates!
Could you share your ms2rescore output files? I'm having trouble getting ms2rescore to work on my machine...
The attachment does not fit in 25 MB. Do you mind if I send the attachment by email to [email protected]?
Yes, please do!
I have now added support for the ms2rescore mokapot format: https://github.com/kusterlab/picked_group_fdr/tree/develop/data/ms2rescore_example
Please let me know if this works for you. Note that this does not include quantification support. One option to make quantification with rescoring possible is to build in support for FlashLFQ output, which MS2Rescore actually generates an input file for.
EDIT: Note that the 1% FDR quant cutoff is hard-coded in Sage: https://github.com/lazear/sage/blob/73eaf49a4e53179b9bd5329bc1f085835f5f3987/crates/sage/src/lfq.rs#L96
Yes, that actually worked for me, thank you very much! I think support for FlashLFQ would also be very useful in the future. If you want to use FlashLFQ, be aware of the recent Mokapot update; see wfondrie/mokapot#131 for details. Since the updated version is in main but not yet in a release, the FlashLFQ files produced by ms2rescore still contain RT in fractions of a minute, which might confuse you during development.
Since the Sage FDR filtering for LFQ is hard-coded, do you think it would be possible to include an iBAQ calculation in picked group FDR as an analogue? Does this seem feasible, given that we already get all the peptides of the protein group as well as the original FASTA database? I'll try to look into this question and the results obtained with this commit, and come back with a separate issue for iBAQ. Thanks again for the quick fix!
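For reference, the iBAQ idea raised above can be sketched as follows: sum the intensities of a protein's quantified peptides and divide by the number of theoretically observable peptides from the FASTA sequence. This is only a minimal sketch under stated assumptions, not anything picked_group_fdr implements: the function names are hypothetical, the digestion is plain trypsin (cleave after K/R, not before P) with no missed cleavages, and the 6 to 30 residue window for "observable" peptides is the commonly used convention rather than a value taken from this thread.

```python
import re


def tryptic_peptides(sequence):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P.

    Assumption: no missed cleavages are considered.
    """
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def ibaq(sequence, peptide_intensities, min_len=6, max_len=30):
    """Summed peptide intensity divided by the number of observable peptides.

    Assumption: "observable" means a fully tryptic peptide whose length lies
    between min_len and max_len (inclusive).
    """
    n_observable = sum(
        1 for pep in tryptic_peptides(sequence) if min_len <= len(pep) <= max_len
    )
    if n_observable == 0:
        return 0.0
    return sum(peptide_intensities) / n_observable
```

Note that the numerator still needs an intensity for every identified peptide, so the sketch does not by itself provide quantification for peptides that were only identified after rescoring.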
Great, glad it worked! I will definitely look into FlashLFQ support for the future. I'm not sure how we would calculate iBAQ values; it would suffer from the exact same problem of not having quantification information for the newly identified peptides. If you're thinking to use the
Hi,
it would be great if you could add support, or suggest another way to run the pipeline, for combining inputs.
For example, I would like to do quantification and apply picked group FDR to a search that was done with Sage and later rescored with ms2rescore.
At first glance, this appears to be a viable option. As an example, let's take a look at PXD008425, a common benchmark that is mentioned in the Sage example README.
Download the PXD008425 A, B and AB series and convert them to mzML format using ThermoRawFileParser with default options. I also downloaded iprg2016_with_labels.fasta, provided in the ZIP package.
Perform a regular Sage search with your config.
Rescore with ms2rescore, which rescores the PSMs and applies Mokapot:
ms2rescore --psm-file results.sage.tsv --psm-file-type 'sage' --spectrum-path data/ -f iprg2016_with_labels.fasta -n 40
There is a notable break in the workflow here. Since ms2rescore renames the columns of the Mokapot output and your inputs expect unambiguous Mokapot results, I converted the rescored files back to the original Mokapot-like form, keeping the original SpecIDs and Sage results.
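A minimal sketch of such a renaming step, not the script actually used here: it rewrites only the header of a tab-separated file and copies every row unchanged. The left-hand names in `COLUMN_MAP` are hypothetical placeholders for whatever the rescored file contains, and the right-hand names follow what Mokapot's tab-delimited PSM output is assumed to look like; both should be checked against real files.

```python
import csv

# HYPOTHETICAL mapping: the keys are placeholders, not the real ms2rescore
# column names; the values are the assumed Mokapot-style target names.
COLUMN_MAP = {
    "psm_id": "SpecId",
    "peptide": "Peptide",
    "score": "mokapot score",
    "q-value": "mokapot q-value",
    "protein_list": "Proteins",
}


def rename_columns(in_path, out_path, column_map):
    """Copy a TSV file, renaming header columns found in column_map."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader, None)
        if header is None:  # empty input: nothing to rename
            return
        # Columns without an entry in the mapping keep their original name.
        writer.writerow([column_map.get(name, name) for name in header])
        writer.writerows(reader)
```

Rewriting only the header keeps the SpecIDs and scores untouched, so the renamed file differs from the rescored one in its column names alone.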
Now it's time to run picked_group_fdr:
# default run; the only difference is the number of threads (--num_threads)
python3 -u -m picked_group_fdr \
    --fasta iprg2016_with_labels.fasta \
    --sage_results results.sage.tsv \
    --sage_lfq_tsv lfq.tsv \
    --protein_groups_out default_combined_protein.tsv \
    --output_format fragpipe \
    --do_quant --lfq_min_peptide_ratios 1 \
    --methods sage --num_threads 40
# same run, with the rescore results inserted via --perc_evidence
python3 -u -m picked_group_fdr \
    --fasta iprg2016_with_labels.fasta \
    --sage_results results.sage.tsv \
    --perc_evidence as_real_mokapot.proteins.txt as_real_mokapot.proteins.decoy.txt \
    --sage_lfq_tsv lfq.tsv \
    --protein_groups_out with_perc_combined_protein.tsv \
    --output_format fragpipe \
    --do_quant --lfq_min_peptide_ratios 1 \
    --methods sage --num_threads 40
It seems that any Percolator evidence is ignored: the output results are the same (or am I missing the logic, and the results are meant to be the same?).
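One way to confirm that the two outputs really are identical is to compare the files row by row, ignoring row order. The helper below is a sketch using only the standard library; `read_tsv` and `compare_tsv` are hypothetical names, and the only assumption about the files is that they are tab-separated with a header row.

```python
import csv
import sys
from collections import Counter


def read_tsv(path):
    """Return (header, rows) for a tab-separated file."""
    with open(path, newline="") as handle:
        rows = [tuple(row) for row in csv.reader(handle, delimiter="\t")]
    if not rows:
        return (), []
    return rows[0], rows[1:]


def compare_tsv(path_a, path_b):
    """Compare two TSV files, treating the data rows as an unordered multiset."""
    header_a, rows_a = read_tsv(path_a)
    header_b, rows_b = read_tsv(path_b)
    count_a, count_b = Counter(rows_a), Counter(rows_b)
    return {
        "same_header": header_a == header_b,
        "only_in_a": sum((count_a - count_b).values()),
        "only_in_b": sum((count_b - count_a).values()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. default_combined_protein.tsv with_perc_combined_protein.tsv
    print(compare_tsv(sys.argv[1], sys.argv[2]))
```

If both counts are zero and the headers match, the two runs produced the same protein groups and quantities.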
This is strange, because --perc_evidence is supposed to be an alternative to --mq_evidence.
If --methods is not specified, the pipeline skips quantification and fails later:
# --methods deliberately omitted
python3 -u -m picked_group_fdr \
    --fasta iprg2016_with_labels.fasta \
    --perc_evidence as_real_mokapot.proteins.txt as_real_mokapot.proteins.decoy.txt \
    --sage_lfq_tsv lfq.tsv \
    --protein_groups_out with_perc_no_sage_no_method_combined_protein.tsv \
    --output_format fragpipe \
    --do_quant --lfq_min_peptide_ratios 1 \
    --num_threads 40
This happens even though --sage_lfq_tsv is provided. The failure occurs slightly later, here
In summary, would you be so kind as to recommend a workflow for running quantification and the picked protein group FDR approach on Mokapot-rescored Sage results?
Even if you do not support rescoring with ms2rescore, what about combining Sage + Mokapot?
Versions and packages: