KeyError: 13 when running DESMAN #20
I have fixed this problem by changing line 61 of Output_Results.py to: and line 141 to: This seems like a critical bug, and I think it could even produce inaccurate results without any error message in some circumstances. Is it due to recent changes in pandas?
Sorry Alex, I am a bit confused: those lines look like the original code. Regarding your original question, the tran_df file is the starting estimate for the error matrix; yours looks sensible, as does your variant file. Are you still having problems?
Hi Chris, Everything else has gone smoothly for me. I'm curious what you think of my results below: I'm working with 38 samples at much lower coverages (min coverage of 5x, median coverage of 15x, max of 50x across samples). The haplotype inference script says the best fit is 4 strains, but I noticed this graph looks quite different from the example in your documentation. Are there any other metrics I can rely on to get a feel for the quality of this strain number inference?
Oh wait! I realize I still did make changes, but just incorrectly described them above. I wanted to say that I changed line 61 to:
and line 141 to:
The cast to a list seemed important because I think full_position is a pandas object whose items cannot be accessed by integer position (they are instead looked up by row label), which is what the code attempts here. I think this might have to do with how your
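To illustrate the behaviour described above, here is a minimal sketch. It is not DESMAN's code: the values and row labels are made up, and only the name full_position is borrowed from the comment. It assumes full_position behaves like a pandas Series whose integer row labels no longer run 0, 1, 2, ... (for example after rows were filtered out), so that square-bracket access looks up a label rather than a position.

```python
import pandas as pd

# Hypothetical data: integer row labels with gaps, as might be left
# behind after filtering rows out of a larger table.
full_position = pd.Series([5, 17, 42, 63, 88], index=[3, 7, 9, 20, 31])

# Square brackets on an integer-indexed Series look up the LABEL 1,
# which does not exist here, so pandas raises KeyError.
raised = False
try:
    full_position[1]
except KeyError:
    raised = True

# Casting to a list discards the labels, so [1] is purely positional.
second_via_list = list(full_position)[1]

# .iloc is the explicit positional accessor and avoids the cast.
second_via_iloc = full_position.iloc[1]

print(raised, second_via_list, second_via_iloc)
```

Either the list cast or .iloc makes the positional intent explicit; which one suits the actual script depends on how full_position is built there.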
Thanks for this. I think it may be to do with a pandas update; I will look into it and change the code accordingly. Regarding your posterior deviance, there is one run with G = 2 that is a big outlier, which makes the plot look odd. That is not necessarily a problem; it just reflects a run that got stuck in a very suboptimal solution. Since you use G = 4, it does not matter.
OK! I guess the question that really circulates in my mind is whether there is convincing evidence for N=4 strains in my data, compared to a null hypothesis of 1 strain. I guess this is what the resolve haplotype script does, but I'm wondering if there is a statistic or visual (e.g., a PCA of the filtered variants' frequencies showing strong separation between variants labeled as haplotypes) that would give me a sense of how convincing the N strains hypothesis is. One last thing: I know that DESMAN reports the haplotype assigned to each variant for each run and the abundances of that haplotype in each sample, but I'm having trouble figuring out where this data is in the output of each run. Those are all of the questions I have, but if you'd like, we could move this discussion somewhere away from this Issue so it stays focused on the possible bug above. Thanks!
Hi Alex, These files are contained in the output directory of the selected run. The relative frequencies are in the Gamma files and the haplotypes in the tau files; I tend to use the star files, which are the MAP predictions. Best Wishes,
Hi Chris and other DESMAN devs,
I have been obtaining the error below when running DESMAN. Here's an example of how I'm running it:
desman ../outputsel_var.csv -e ../outputtran_df.csv -o ClusterEC_2_1 -r 1000 -i 100 -g 2 -s 1
And attached are my variant output files. Any idea what gives?
outputtran_df.txt
outputsel_var.txt
PS - What exactly is the outputtran_df file anyways?