Create high confidence datasets for ER signal sequence (true positives and true negatives) #1195
Comments
I suggest using well-known proteins that already have a SigPep for the positive set. The negative set should be fairly easy too.
Here are |
They seem good to me!
Here is a set of true negatives.
Phobius, run on my desktop, predicts that 76 of the 111 likely true positives have signal peptides.
SignalP in "fast" mode predicts 68. It finds one that Phobius doesn't (SPAC959.05c), and Phobius finds 9 that "fast" SignalP doesn't:
|
In slow/accurate mode SignalP finds fewer matches: 63. In that mode there are 14 found by Phobius that SignalP doesn't report:
|
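For anyone wanting to reproduce this kind of tool-versus-tool comparison, here is a minimal sketch using Python sets. It assumes each tool's output has already been reduced to a collection of gene IDs with a predicted signal peptide. The function name and the IDs `geneA`, `geneB` and `geneC` are made up for illustration; only SPAC959.05c comes from this thread.

```python
def compare_predictions(tool_a_hits, tool_b_hits):
    """Split two collections of gene IDs into shared and tool-specific hits."""
    a, b = set(tool_a_hits), set(tool_b_hits)
    return {
        "both": sorted(a & b),    # predicted by both tools
        "only_a": sorted(a - b),  # predicted by tool A only
        "only_b": sorted(b - a),  # predicted by tool B only
    }


# Hypothetical hit lists, NOT the real results from this issue.
phobius_hits = {"geneA", "geneB", "geneC"}
signalp_hits = {"geneB", "SPAC959.05c"}

result = compare_predictions(phobius_hits, signalp_hits)
print("Both:", result["both"])
print("Phobius only:", result["only_a"])
print("SignalP only:", result["only_b"])
```

Sorting the output keeps the lists stable between runs, which makes them easier to paste into a comment and diff later.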
These are all expected to have signal peptides.
And what is the threshold? Or is it a binary cut-off?
Phobius just reports signal peptide or not, but I haven't investigated whether there are any command-line options to tweak things. For SignalP the cutoff seems to be a likelihood of 0.5.
Here are the likelihood scores for the 14 genes that SignalP says don't have signal peptides. Mostly very low. SignalP doesn't report the coordinates if the likelihood is less than the cutoff (0.5).
|
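As a rough illustration of how the 0.5 cutoff behaves, the sketch below applies it to a dictionary of likelihood scores and lists the genes that fall under it, highest score first, so that near misses stand out. The scores and gene names are invented for the example, and the helper names are my own; this is not SignalP's code or output format.

```python
CUTOFF = 0.5  # likelihood cutoff SignalP appears to use, per the comment above


def has_signal_peptide(likelihood, cutoff=CUTOFF):
    """Treat a likelihood below the cutoff as 'no signal peptide'."""
    return likelihood >= cutoff


def below_cutoff(scores, cutoff=CUTOFF):
    """Return (gene, likelihood) pairs under the cutoff, highest first."""
    misses = [(gene, s) for gene, s in scores.items() if s < cutoff]
    return sorted(misses, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, NOT the values from this issue.
example_scores = {"geneA": 0.93, "geneB": 0.42, "geneC": 0.03}

for gene, score in below_cutoff(example_scores):
    print(f"{gene}\t{score:.2f}")
```

Listing the misses in descending order makes it quick to see whether lowering the cutoff slightly would rescue any of them or whether, as here, most scores are far below it.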
I've just tried the "111 likely true positives" sequences in DeepSig and there were 69 matches. All of those were also predicted by Phobius. There were 17 predicted by Phobius that were not predicted by DeepSig. Disappointing!
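To put the counts reported in this thread on a common scale, here is a small sketch that turns them into the fraction of the 111 likely true positives each tool recovers. It assumes every protein in the set really does have a signal peptide, so the fraction can be read as sensitivity; the dictionary labels and the function name are my own.

```python
TOTAL_POSITIVES = 111  # size of the "likely true positives" set

# Counts as reported in the comments above.
predicted = {
    "Phobius": 76,
    "SignalP (fast)": 68,
    "SignalP (slow/accurate)": 63,
    "DeepSig": 69,
}


def sensitivity(found, total):
    """Fraction of the expected positives that a tool predicted."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


for tool, found in predicted.items():
    frac = sensitivity(found, TOTAL_POSITIVES)
    print(f"{tool}: {found}/{TOTAL_POSITIVES} = {frac:.1%}")
```

The same function could be pointed at the true-negative set to count false positives, once those predictions are available.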
around 100? in each set?