ERROR estimateStrainCountDesman (3) with desmanflow #30
Comments
Hi, It could be, though it should run if there is at least one sample. I did not write the Phylosift pipeline, Aaron Darling did. However, if you could upload the files outputsel_var.csv and outputtran_df.csv, I can at least see if they are valid DESMAN input. Thanks,
Hi Chris,
Hi Xabi, Chris just pointed me to your message, sorry I didn't see it earlier. Do you know if your dataset has any variants in these marker genes? It seems like the variant file is empty, possibly because there were no high-quality variant sites for it. How big is your dataset, and do you have any prior idea of the species and strain complexity of the sample? It's also possible that something else has gone wrong, so if you're sure there are strains, we should dig into the steps upstream of where it broke to understand why.
Hi @koadman, the MAG I'm running has about 1.4 SNVs per kb based on anvi'o's output (0 would indicate a clonal population). I manually searched about 10 of the COGs used in the automated DESMAN workflow and couldn't find a single SNV, probably because they are too conserved and the strains are too closely related. I guess that is the problem. The metagenome assembly is based on 2 samples and this MAG has an overall coverage of ~200x. Most of the SNVs correspond to ~10% (5-20% range) of the mapped bases. The fact that they don't appear in highly conserved genes and that the proportion of constant bases is quite stable makes me think that the different bases aren't due to sequencing errors. @chrisquince, is there a way to use the SNV table generated with the Cheers,
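The argument above (a ~10% minor base at ~200x coverage is unlikely to be sequencing error) can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the coverage and minor fraction come from the comment, but the per-base error rate of 1% is an assumed, hypothetical value and the simple binomial model ignores which alternate base an error produces.

```python
from math import comb


def expected_minor_reads(coverage: int, minor_fraction: float) -> float:
    """Expected number of reads carrying the minor base at one position."""
    return coverage * minor_fraction


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


COVERAGE = 200         # from the comment: overall coverage of ~200x
MINOR_FRACTION = 0.10  # from the comment: typical minor-base share (range 5-20%)
ERROR_RATE = 0.01      # ASSUMPTION: hypothetical per-base sequencing error rate

minor_reads = round(expected_minor_reads(COVERAGE, MINOR_FRACTION))
# Probability of seeing at least that many mismatching reads from errors alone.
p_error_only = binomial_tail(COVERAGE, minor_reads, ERROR_RATE)

print(f"minor-base reads at one site: {minor_reads}")
print(f"P(errors alone give >= {minor_reads} reads): {p_error_only:.3g}")
```

Under these assumptions the error-only probability is vanishingly small, which is consistent with the reasoning in the comment; a different error rate or coverage would change the number but can be tested the same way.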
Hi Xabi, You can use the standard SNV table from anvi'o with DESMAN, but the output from quince mode is probably better since I believe it contains all base positions. You just need to convert the anvi'o table into DESMAN format. I do have a script for that, but because the anvi'o fields can be variable it may require tweaking for your case. Can you send me, say, the first 1000 lines of the anvi'o output so I can test it? Best,
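For readers wondering what such a conversion involves, here is a minimal sketch of the pivot step. It is not the script mentioned in this thread. It assumes the anvi'o table is tab-separated with one row per sample and position, with columns named `contig_name`, `pos_in_contig`, `sample_id`, `A`, `C`, `G` and `T`, and that DESMAN expects a wide CSV with a `Contig,Position` prefix followed by four count columns per sample. Both layouts are assumptions to verify against your own files, and the demo data is made up.

```python
import csv
import io
from collections import defaultdict

BASES = ("A", "C", "G", "T")


def anvio_long_to_wide(rows, contig_col="contig_name",
                       pos_col="pos_in_contig", sample_col="sample_id"):
    """Pivot long rows (one per sample and position) into one row per position.

    Column names are ASSUMED defaults; pass your own if the export differs.
    """
    counts = defaultdict(dict)  # (contig, position) -> {sample: [A, C, G, T]}
    samples = []                # samples in order of first appearance
    for row in rows:
        sample = row[sample_col]
        if sample not in samples:
            samples.append(sample)
        key = (row[contig_col], int(row[pos_col]))
        counts[key][sample] = [int(row[b]) for b in BASES]

    header = ["Contig", "Position"] + [f"{s}-{b}" for s in samples for b in BASES]
    table = [header]
    for contig, pos in sorted(counts):
        line = [contig, pos]
        for s in samples:
            # Zeros are only a placeholder when a sample has no row for this
            # position; they are NOT real counts and should be handled properly.
            line.extend(counts[(contig, pos)].get(s, [0, 0, 0, 0]))
        table.append(line)
    return table


# Made-up demo table, not data from this thread.
DEMO = (
    "contig_name\tpos_in_contig\tsample_id\tA\tC\tG\tT\n"
    "c1\t10\tS1\t18\t0\t2\t0\n"
    "c1\t10\tS2\t15\t0\t5\t0\n"
    "c1\t42\tS1\t0\t9\t0\t1\n"
)
rows = list(csv.DictReader(io.StringIO(DEMO), delimiter="\t"))
table = anvio_long_to_wide(rows)
for line in table:
    print(",".join(str(x) for x in line))
```

The zero placeholder is the reason a table that reports every position in every sample is easier to convert: if a sample simply has no row for a position, the counts at that position are unknown rather than zero.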
Thanks Chris, here it is. Do you have the script available anywhere by any chance?
Hi Xabi,
Hi Chris, This one has the header and the last 1000 lines. Thanks
Hi, This script produces DESMAN-compatible output, but the frequencies look a bit odd. Also, without gene calls the test for median coverage employed by DESMAN will not work. To run it, change the extension to .py and then: python ./ConvertANVIO.py Bin_1-SNVs_tail.txt Thanks,
Hi Chris, Just changing L35 to By the way, should I remove the SNV calls from non-coding regions? Without them I still get 722 genes with a total of >5000 SNV positions. Cheers,
I got to the step of running
Log:
Any idea what could be going on? I removed the non-coding SNVs and ran it like this beforehand:
Hi Xabi,
Here you go. Thank you.
Hi @chrisquince
Dear Xabi, Sorry for the delay in responding. The problem is that you are using the wrong input file to DESMAN. Your command above should have been:
desman Bin_1-SNVs_outsel_var.csv -g 5 -i 50 -e Bin_1-SNVs_outtran_df.csv -o Bin_1-SNVs_out_g5
However, DESMAN will have problems resolving haplotypes from just two samples. I ran your data set. Running:
varFile='Bin_1-SNVs_outsel_var.csv'
eFile='Bin_1-SNVs_outtran_df.csv'
for g in 1 2 3 4
This gave two haplotypes:
python $DESMAN/scripts/resolvenhap.py Bin_1-SNVs
2,2,0,0.0430988660524,Bin_1-SNVs_2_0/Filtered_Tau_star.csv
,0,1
But with no difference between your samples, so I would be cautious interpreting these results. Thanks,
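As an illustration of scanning several strain counts, the sketch below only assembles one command line per value of -g, reusing the flags from the corrected command quoted above (-g, -i, -e, -o). It does not reconstruct the loop body that is not shown in this thread, it does not run DESMAN, and the helper name `desman_commands` and the `_out_g<N>` output naming are hypothetical choices modelled on that single command.

```python
import shlex


def desman_commands(var_file, error_file, stub, strain_counts, iterations=50):
    """Build one desman command string per candidate strain count.

    Only flags that appear in the command quoted in this thread are used;
    the output-directory naming is a hypothetical convention.
    """
    commands = []
    for g in strain_counts:
        argv = [
            "desman", var_file,
            "-g", str(g),                 # number of strains to fit
            "-i", str(iterations),        # value passed to -i, as in the thread
            "-e", error_file,             # the *tran_df.csv file from the thread
            "-o", f"{stub}_out_g{g}",     # output directory (hypothetical naming)
        ]
        commands.append(shlex.join(argv))  # quote safely for a shell
    return commands


cmds = desman_commands(
    "Bin_1-SNVs_outsel_var.csv",
    "Bin_1-SNVs_outtran_df.csv",
    "Bin_1-SNVs",
    strain_counts=[1, 2, 3, 4],
)
for cmd in cmds:
    print(cmd)
```

Printing the commands first, rather than launching them, makes it easy to confirm that the variant file and the error file are passed in the right places before spending compute time.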
Thanks Chris, that's more or less what I expected. Even though I know the output might not be completely reliable, I'm still trying to continue so I can set up my anvi'o-to-DESMAN workflow. From the
What I'm not sure about is the structure the
With the columns in this order:
with strand 0 = forward and 1 = reverse (not sure about that). As my input is not based on COGs I cannot provide one so I had to add the empty column (otherwise it complains).
It says something about 1030, which is one of the genes, but it is present in all the input files (except the contigs.fasta). Do you see anything unusual in what I'm doing that needs to be fixed? Again, thanks a lot.
Has there been any continuation of this workflow? I would love to use anvi'o SNVs and an anvi'o bin as my starting data.
Hi Xabi, I am doing that myself at the moment. When I am finished I could commit a workflow for that case. It won't be Nextflow though, possibly just a bash script or ideally a Snakemake workflow. Have you exported the anvi'o SNV profile correctly? Best,
Has anyone tried to import DESMAN results into anvi'o for further analyses and visualization purposes instead? (I don't know if it's worth opening a new thread for that.)
Hi there,
After dealing with the Phylosift issue (#28), I was able to run the test data to completion without problems. Now I'm getting an error when DESMAN tries to estimate the strain count; I'm not sure if it could be due to the limited number of samples. Any idea?
Thank you in advance,
Xabi