ValueError: not enough values to unpack (expected 2, got 0) #29

Open
LiaSerrano opened this issue Aug 25, 2022 · 10 comments

Comments

@LiaSerrano

I tried to format a Prosit library like the TraML library. I am getting an error similar to the one I got with an MGF MassIVE-KB library. I think it's not able to associate peptides with proteins?

I get a “file_corrected” output but the peptide/spectral FDR outputs are empty and there is no proteinFDR output. These outputs appear when I take out the decoys, however.

Let me know if it would be helpful for me to send over the library I was using.
csodiaq_error_August.pdf

@jessegmeyerlab
Member

jessegmeyerlab commented Aug 25, 2022

@AlexandreHutton was going to add direct support for Prosit libraries. Let's see if that works, because that would avoid any problems with library conversion. Lex, can you please update us on where we are with that?

@AlexandreHutton
Member

I found this exact error while working with the FragPipe library. I thought it might be a problem with the library itself, but it sounds like it's an issue with the code.
I think the problem might be with the FDR calculation somewhere. Adding in decoys gets us past that error but then produces an empty proteinFDR file. I'm investigating.

@jessegmeyerlab
Member

Thanks Lex for the update. I have some ideas.

If there were a way to convert a library that we know works into the other formats, we could confirm or rule out that the issue relates to edge cases in the library format.

Since you think the problem is with the FDR calculation, and adding decoys gets past the error but produces empty output, I wonder how the FDR calculation deals with the case where there are no decoy hits. This could happen if the library contains no decoys, or by luck in some rare circumstances.

@LiaSerrano, does your library have decoys?

@AlexandreHutton, does the FragPipe library have decoys?

@LiaSerrano
Author

The library I was using has reverse-sequence decoys predicted by Prosit. This error actually doesn't happen when I take out the decoy entries. Let me know if you would like me to send an example! Thank you

@AlexandreHutton
Member

> @AlexandreHutton, does the FragPipe library have decoys?

It does not. I converted some entries from another (functional) library and added them in, which resulted in the empty output mentioned previously.

@jessegmeyerlab
Member

I wonder if the decoy label in the library is the same as the label CsoDIAq looks for.

It might help us understand if you can share the exact library you're using. You could email it to Lex and me if you want to keep the library private.

@AlexandreHutton
Member

> The library I was using has reverse-sequence decoys predicted by Prosit. This error actually doesn't happen when I take out the decoy entries. Let me know if you would like me to send an example! Thank you

Please do!

@LiaSerrano
Author

I'll shoot over an email, thanks!

@jessegmeyerlab
Member

jessegmeyerlab commented Aug 25, 2022

> @AlexandreHutton, does the FragPipe library have decoys?

> It does not. I converted some entries from another (functional) library and added them in, which resulted in the empty output mentioned previously.

Thanks Lex,

It might be related to where the decoys are hit in the ranking. If it hits a decoy within the first 100 proteins (sorted by MaCC), then I believe it should return an empty list.

I don't remember how @CCranney made it handle the case where it never hits a decoy, but that could be another place to look. It might help you debug if you can look at the intermediate matches list for proteins (that would be in memory) and see where the decoys fall in the order.
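A minimal sketch of that kind of inspection, not CsoDIAq's actual code: the function names are hypothetical, the input is assumed to be a list of decoy flags ordered from best to worst match, and the running FDR is assumed to be the simple ratio of decoys seen so far to hits seen so far.

```python
def decoy_ranks(is_decoy):
    """Return the 1-based ranks at which decoys appear (hypothetical helper)."""
    return [rank for rank, decoy in enumerate(is_decoy, start=1) if decoy]


def running_fdr(is_decoy):
    """Return decoys-so-far / hits-so-far at every rank (assumed FDR definition)."""
    fdrs = []
    decoys = 0
    for rank, decoy in enumerate(is_decoy, start=1):
        decoys += bool(decoy)
        fdrs.append(decoys / rank)
    return fdrs


# Made-up ranking: 49 targets, one decoy at rank 50, then 50 more targets.
flags = [False] * 49 + [True] + [False] * 50
first_decoy = decoy_ranks(flags)[0]
fdr_at_first_decoy = running_fdr(flags)[first_decoy - 1]

# A ranking with no decoys at all: the list of decoy ranks is empty and the
# running FDR never rises above zero, so a cutoff search has nothing to stop on.
no_decoy_flags = [False] * 10
no_decoy_ranks = decoy_ranks(no_decoy_flags)
```

Under these assumptions, a first decoy before rank 100 already pushes the running ratio above 0.01 at that rank, which may be what produces the empty result at a 1% threshold.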

Thanks for looking at this Lex

@CCranney
Member

CCranney commented Nov 5, 2022

Hi all,

I dug into the code, looking specifically for the error @LiaSerrano included in her PDF in the first comment. Backtracing the error, I think no peptides were identified (the _peptideFDR.csv output is completely blank). That, or the library used lacks or has different peptide and/or protein labels, and as such the "peptide" and/or "protein" columns of the peptideFDR output file are blank. This is just me extrapolating what the error could be, but could I have access to the data/GUI settings that led to this error?

Breakdown of my thought process:
The error is found here:

File "C:\Users\lrserrano\Anaconda3\envs\csod\lib\site-packages\csodiaq\idpicker.py",
line 23, in group_nodes_with_same_edge
if first: l1, l2 = map(list,zip(*data))
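The unpacking idiom on that line can be exercised on its own. This is only an isolated sketch: `split_pairs` is a hypothetical wrapper, not a function from idpicker.py, and the sample pairs are illustrative.

```python
def split_pairs(data):
    """Split a list of 2-tuples into two parallel lists (same idiom as the failing line)."""
    l1, l2 = map(list, zip(*data))
    return l1, l2


# With at least one pair, the unpacking succeeds.
pairs = [
    ("EHALLAYTLGVK", "sp|Q5VTE0|EF1A3_HUMAN"),
    ("EHALLAYTLGVK", "sp|Q05639|EF1A2_HUMAN"),
]
peptides, proteins = split_pairs(pairs)

# With an empty list, zip(*data) yields nothing, so there are zero values
# to assign to l1 and l2 and Python raises ValueError.
try:
    split_pairs([])
    message = None
except ValueError as err:
    message = str(err)
```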

It looks like it tried to break data into two lists when data was actually blank. This data variable should have been a list of length-2 tuples, pairing peptides to proteins. So going back to where data came from, it looks like it was created and passed down through the following functions:

  1. Start: file csodiaq_identification_functions.py, the <peptideDf> variable, a dataframe that was used to create the _peptideFDR.csv file in the output.
  2. This variable was passed into the function format_peptide_protein_connections(peptideDf) on line 104.
  3. Each peptide is tied to the proteins in its protein group in a 1:1 fashion as a list of tuples. For example, if the peptide EHALLAYTLGVK was attached to the protein group 3/sp|Q5VTE0|EF1A3_HUMAN/sp|Q05639|EF1A2_HUMAN/sp|P68104|EF1A1_HUMAN, you would expect the following list of tuples to be created. All peptide-protein connections would be put into the same list.
[
('EHALLAYTLGVK', 'sp|Q5VTE0|EF1A3_HUMAN'),
('EHALLAYTLGVK', 'sp|Q05639|EF1A2_HUMAN'),
('EHALLAYTLGVK', 'sp|P68104|EF1A1_HUMAN')
]
  4. This list of tuples is ultimately what the error is occurring on, most likely because the list is empty (no peptide-protein connections). So either the _peptideFDR.csv is completely empty, or the peptide and/or protein columns of the _peptideFDR.csv file are blank. I'm leaning towards the former, but won't know without looking at the data in question.
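The pairing step described above can be sketched roughly as follows. This is an illustration under assumptions, not the real format_peptide_protein_connections: both function names are hypothetical, rows are assumed to be plain (peptide, protein group) tuples rather than a dataframe, and the protein group is assumed to always use the count-prefixed, slash-separated form shown in the example.

```python
def split_protein_group(group):
    """Split '3/A/B/C' into ['A', 'B', 'C']; a blank group yields []."""
    if not group:
        return []
    parts = group.split("/")
    # Assumption: the first field is the number of proteins in the group.
    return [protein for protein in parts[1:] if protein]


def pair_peptides_with_proteins(rows):
    """Build one (peptide, protein) tuple per protein in each row's group."""
    return [
        (peptide, protein)
        for peptide, group in rows
        if peptide
        for protein in split_protein_group(group)
    ]


group = "3/sp|Q5VTE0|EF1A3_HUMAN/sp|Q05639|EF1A2_HUMAN/sp|P68104|EF1A1_HUMAN"
connections = pair_peptides_with_proteins([("EHALLAYTLGVK", group)])

# Both suspected failure modes give an empty list of connections:
no_rows = pair_peptides_with_proteins([])                        # empty peptide table
blank_proteins = pair_peptides_with_proteins([("EHALLAYTLGVK", "")])  # blank protein column
```

Either empty result, passed on to the unpacking step, would be enough to trigger the ValueError, so checking the length of this list before grouping is one possible place to add a guard or a clearer error message.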
