
Issue with SelectContigsPos.pl and ExtractCountFrep.pl #16

Open
gaohanlisa opened this issue Jun 16, 2017 · 2 comments

@gaohanlisa

Hi,
I tried to use SelectContigsPos.pl to extract core COGs and ExtractCountFrep.pl to get the variant frequency table. After running ExtractCountFrep.pl, I got an empty variant frequency table. I then checked my position file (ClusterEC_core_cogs.tsv) generated by SelectContigsPos.pl. Its format is contig, start, end. I assumed the correct format should include the COG number, so why is that information missing?

Thanks,

@chrisquince
Owner

Hi,

Sorry for the very slow response. Actually, the format you have for ClusterEC_core_cogs.tsv is correct: it just contains the contig start and end positions, which is all that bam-readcount needs. I have added some discussion of that to the README. The problem is with the Perl script used to collate the base frequencies, which I have now replaced with a more robust Python script. Have a look at the revised README, but the command is:

python $DESMAN/scripts/ExtractCountFreqGenes.py AnnotateEC/ClusterEC_core.cogs Counts --output_file Cluster_esc3_scgs.freq
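For readers wondering what "collating the base frequencies" involves: bam-readcount writes one tab-separated line per position (contig, position, reference base, depth), followed by one `base:count:...` field per base. The sketch below pulls the A/C/G/T counts out of such a line. It is a minimal illustration, assuming that standard bam-readcount layout; it is not ExtractCountFreqGenes.py, the helper names `parse_readcount_line` and `to_freq_row` are hypothetical, the `contig,position,A,C,G,T` row layout is an assumption, and the example line is made up.

```python
from typing import Dict, Tuple

BASES = ("A", "C", "G", "T")


def parse_readcount_line(line: str) -> Tuple[str, int, Dict[str, int]]:
    """Return (contig, position, {base: count}) for one bam-readcount line.

    Assumes the usual layout: contig, position, reference base, depth,
    then one 'base:count:...' field per base ('=', A, C, G, T, N, indels).
    """
    fields = line.rstrip("\n").split("\t")
    contig, pos = fields[0], int(fields[1])
    counts = {b: 0 for b in BASES}
    for field in fields[4:]:
        parts = field.split(":")
        # Keep only the four nucleotides; '=', 'N' and indel fields are skipped.
        if parts[0] in counts and len(parts) > 1:
            counts[parts[0]] = int(parts[1])
    return contig, pos, counts


def to_freq_row(line: str) -> str:
    """Format one position as 'contig,position,A,C,G,T' (hypothetical layout)."""
    contig, pos, counts = parse_readcount_line(line)
    return ",".join([contig, str(pos)] + [str(counts[b]) for b in BASES])


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    example = ("k141_30204\t15\tA\t12\t=:0:0.00\tA:10:60.00"
               "\tC:0:0.00\tG:2:60.00\tT:0:0.00\tN:0:0.00")
    print(to_freq_row(example))
```

If the count table comes out empty, it can help to check whether bam-readcount produced any lines at all for the regions in the position file before suspecting the collation step.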

Does this fix your problem?

Thanks,
Chris

@KevinAMeyer

Hello,

I've also been having issues with SelectContigsPos.pl in my workflow, even when using the StrainMetaSim mock dataset. The Cluster_core.cogs file does not include any gene locations or strand information. As a result, I don't get any base-count files in the subsequent steps.

Command:
while read -r cluster
do
    echo "$cluster"
    ../SelectContigsPos.pl /usr/share/maganalysis/cogs.txt < "Split/${cluster}/${cluster}.cog" > "Split/${cluster}/${cluster}_core.cogs"
done < Concoct/Cluster75.txt

Input .cog file sample
k141_30204_2,COG1555
k141_34559.0_2,COG0210
k141_34559.0_3,COG1391
k141_34559.0_5,COG4304
k141_34559.0_8,COG5266
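Note that each input line pairs only a gene ID with a COG; there is no start, end or strand. The only positional information recoverable from it is the contig name embedded in the gene ID, which matches the second column of the output sample. The minimal sketch below makes that explicit. It assumes Prodigal-style gene IDs where the contig is everything before the final underscore (k141_34559.0_2 -> k141_34559.0); `parse_cog_assignments` is a hypothetical helper. It suggests, but does not confirm, that the missing coordinates have to come from another file, such as the gene predictions.

```python
import csv
from typing import Iterable, List, Tuple


def parse_cog_assignments(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (gene_id, contig, cog) for each 'gene_id,COG' line.

    Assumes Prodigal-style IDs: the contig name is everything before the
    last underscore. IDs without an underscore yield an empty contig.
    """
    assignments = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene_id, cog = row[0].strip(), row[1].strip()
        contig = gene_id.rpartition("_")[0]
        assignments.append((gene_id, contig, cog))
    return assignments


if __name__ == "__main__":
    sample = ["k141_30204_2,COG1555", "k141_34559.0_2,COG0210"]
    for gene_id, contig, cog in parse_cog_assignments(sample):
        # Nothing in the .cog line gives start, end or strand.
        print(f"{cog}\t{contig}\t{gene_id}\tstart=? end=? strand=?")
```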

Output file sample
COG0016,k141_525462.5,,,k141_525462.5_5,
COG0048,k141_441551,,,k141_441551_3,
COG0049,k141_441551,,,k141_441551_4,
COG0051,k141_292107.0,,,k141_292107.0_2,
COG0052,k141_39482.10,,,k141_39482.10_14,
COG0060,k141_99476.3,,,k141_99476.3_4,
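Every output row above has blank fields, which Kevin reads as the missing gene locations and strand. A quick way to see how widespread this is across a whole *_core.cogs file is to list the empty columns per row. The sketch below is a minimal diagnostic, assuming comma-separated rows with the COG in the first column; which blank column means start, end or strand is not stated in the thread, so it reports column indices only. `find_incomplete_rows` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def find_incomplete_rows(lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    """Return (COG, indices of empty columns) for rows containing a blank field."""
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        empty = [i for i, value in enumerate(fields) if value == ""]
        if empty:
            problems.append((fields[0], empty))
    return problems


if __name__ == "__main__":
    sample = [
        "COG0016,k141_525462.5,,,k141_525462.5_5,",
        "COG0048,k141_441551,,,k141_441551_3,",
    ]
    for cog, empty in find_incomplete_rows(sample):
        print(f"{cog}: empty columns {empty}")
```

To check a real file, pass an open file handle instead of the sample list, e.g. `find_incomplete_rows(open("Split/<cluster>/<cluster>_core.cogs"))`.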

My contigs.tsv file looks like this:
k141_4315 1 2705
k141_9378 1 2043
k141_20287 1 5049
k141_30204 1 5940
k141_34559.0 1 10000
k141_34559.1 1 10000
k141_34559.2 1 14020
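This file has the same contig/start/end shape the owner describes for the bam-readcount position file, but each row spans a whole contig (from 1 to its length), so it cannot supply per-gene start, end or strand. One cheap sanity check is that every contig referenced by the .cog file actually appears here. The sketch below assumes whitespace-separated `contig start end` lines; `load_contig_ranges` and `missing_contigs` are hypothetical helpers.

```python
from typing import Dict, Iterable, Set, Tuple


def load_contig_ranges(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Parse whitespace-separated 'contig start end' lines into a dict."""
    ranges = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        contig, start, end = parts
        ranges[contig] = (int(start), int(end))
    return ranges


def missing_contigs(cog_contigs: Iterable[str],
                    ranges: Dict[str, Tuple[int, int]]) -> Set[str]:
    """Contigs referenced by the COG assignments but absent from contigs.tsv."""
    return set(cog_contigs) - set(ranges)


if __name__ == "__main__":
    contigs_tsv = ["k141_30204\t1\t5940", "k141_34559.0\t1\t10000"]
    ranges = load_contig_ranges(contigs_tsv)
    print(missing_contigs(["k141_30204", "k141_34559.0"], ranges))
```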

Any suggestions on what may be happening, or how to include my gene locations in the core.cogs file?

Thanks for your help.
Kevin
