Fix modifications #66
Comments
Hi @kimrutherford, just summarising what we discussed in the call today. The pipeline generates files for protein modifications similar to those for alleles, and they can be used to fix the modifications in the PHAF files and in canto. They are in this folder: https://github.com/pombase/allele_qc/tree/master/results

How to use the proposed fixes

For the fixing, the unique identifier of a fix is

Important exceptions

Important for when you write the script that applies the changes from

TLDR: If there is a value in

Another special case to take into account is described in #62. It can happen that someone has reported a modification on a residue that no longer exists in the current gene structure (probably assigned with a high-throughput pipeline). For those cases, I have set the value of the column
But more are likely to happen in the future. These can either be deleted, or kept knowing that they have a sequence error. Related to #63 |
I've now applied these changes to Canto. I'll apply the fixes to the modifications in SVN next.
|
That's done too now. I'll check Chado after tomorrow's load. Edit - these changes were made:
|
Hi @kimrutherford, it seems like most of them went through, except for a few. The one with the "?" (expected), but also some histone_fix ones. https://github.com/pombase/allele_qc/blob/master/results/protein_modification_auto_fix.tsv
|
I think I see why:
|
Hi @kimrutherford, as I said today, some new alleles have appeared in the allele list that did not exist before, for instance
It did not appear before because this allele has no annotations in canto, so it was dropped and not run through the previous pipeline. Not sure how we want to handle that, maybe you can filter that list before exporting it. I am pretty sure there is a lot of garbage among the alleles without annotations. Also, the mysterious unfixed modification |
Hi Manu. The Canto allele export file has an "annotation_count" column. Could you ignore alleles where that column is zero? |
Yes, I can use that to filter them out. |
Hi @kimrutherford, I did this in 3ef4c14. I am not simply removing every allele that has zero annotations in the canto file, because there could be an allele in Canto without annotations but with annotations in the PHAF files. See below to check that it makes sense: https://github.com/pombase/allele_qc/blob/master/filter_alleles_pombase.py |
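For anyone reading along, the filtering rule described above can be sketched roughly as follows. This is only an illustration of the logic, not the contents of filter_alleles_pombase.py: the `annotation_count` column comes from the Canto export mentioned earlier, while the `allele_id` column name, the sample rows and the `filter_alleles` helper are assumptions made up for the example.

```python
import csv
import io


def filter_alleles(canto_rows, phaf_allele_ids):
    """Keep alleles annotated in Canto, or present in the PHAF files.

    canto_rows: dicts read from the Canto allele export (hypothetical layout).
    phaf_allele_ids: set of allele identifiers that have PHAF annotations.
    """
    kept = []
    for row in canto_rows:
        has_canto_annotations = int(row["annotation_count"]) > 0
        in_phaf = row["allele_id"] in phaf_allele_ids
        # Drop an allele only if it has no annotations anywhere.
        if has_canto_annotations or in_phaf:
            kept.append(row)
    return kept


# Made-up export: allele-2 and allele-3 have no Canto annotations.
canto_export = io.StringIO(
    "allele_id\tannotation_count\n"
    "allele-1\t3\n"
    "allele-2\t0\n"
    "allele-3\t0\n"
)
rows = list(csv.DictReader(canto_export, delimiter="\t"))

# allele-3 is annotated in the PHAF files, so it must survive the filter.
phaf_ids = {"allele-3"}

kept_ids = [row["allele_id"] for row in filter_alleles(rows, phaf_ids)]
print(kept_ids)
```

With this rule, an allele is discarded only when it is unannotated in both sources, which is the case the comment above is guarding against.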
I think we can close this one |