Questions regarding complex synthetic mock data. #31

Open
kmin940 opened this issue Oct 9, 2018 · 0 comments

kmin940 commented Oct 9, 2018

Hi Chris, I want to replicate the complex strain mock on my computer. I also want to run a variant of the complex strain mock that differs only in which bacterial genomes from NCBI are used (that is, different organisms from those in https://complexstrainsim.s3.climb.ac.uk/Strains.tar.gz). I have two questions.

  1. For the Complex Strain Mock, I downloaded the NCBI strains with
    wget https://complexstrainsim.s3.climb.ac.uk/Strains.tar.gz
    In Strains/Strain_35814/GCF_000598125.1 there are 8 files: 1) genomic.cogs, 2) genomic.faa, 3) genomic.fas, 4) genomic.fna, 5) genomic.gbff.gz, 6) genomic.gff, 7) genomic.out, 8) genomic_fas_map.uc. From the NCBI FTP site,
    ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/598/125/GCF_000598125.1_gbhBH006, I downloaded the genomic.* files and ran prodigal and rpsblast to generate the genomic.faa, genomic.fas, and genomic.out files. However, I don't know how to generate the _fas_map.uc files. How do I obtain them?
    I also cannot find SCGs, Select_SCGs, Select_ster7, Cluster7_core_tau.csv, Cluster7_core_tau_map.csv, Cluster7_core_tau_mapU.csv, CountCogs.pl, Hap.txt, IdentH.txt, IdentHG.csv, SCGs.fa, SCGs.gfa, SCGs.tree, strain_map.csv, strain_map_scg.csv, and temp.fa.
    How do I get these files and folders as well?
    Also, is there a way to get the whole Strain_35814 directory at once, without downloading each accession (GCF_000317335.1, GCF_000341465.1, and so on) individually? Since I want to run DESMAN on different synthetic data with different species, I would be grateful for the full code used to process the data and produce the genomic_fas_map.uc files. Could you share the code for generating these files?
    [Two screenshots attached.] The files selected in the screenshots are the ones that I am unable to create.
  2. How much memory is needed to run the complex strain mock with MEGAHIT? I am getting memory errors.
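
For question 1, here is a minimal sketch of how the files still missing from an accession directory could be listed. It assumes the eight names given above are file-name suffixes (the real files may carry an accession prefix). The names `EXPECTED_SUFFIXES` and `missing_files` are hypothetical and are not part of DESMAN.

```python
from pathlib import Path

# Suffixes taken from the list in question 1. Matching by suffix is an
# assumption, since the real file names may carry an accession prefix.
EXPECTED_SUFFIXES = [
    "genomic.cogs",
    "genomic.faa",
    "genomic.fas",
    "genomic.fna",
    "genomic.gbff.gz",
    "genomic.gff",
    "genomic.out",
    "genomic_fas_map.uc",
]


def missing_files(accession_dir):
    """Return the expected suffixes that no file in accession_dir ends with."""
    directory = Path(accession_dir)
    if not directory.is_dir():
        # Nothing is present if the directory itself does not exist.
        return list(EXPECTED_SUFFIXES)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    return [s for s in EXPECTED_SUFFIXES if not any(n.endswith(s) for n in names)]


if __name__ == "__main__":
    # Path follows the layout described in question 1.
    print(missing_files("Strains/Strain_35814/GCF_000598125.1"))
```

Running this on a directory I built myself should report genomic_fas_map.uc (and any other file I have not yet generated) as missing.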

I would be grateful for answers to these questions.
Thank you very much!

kmin940 changed the title from "How to download nr databases? Error occured while diamond formatting nr.faa" to "Many questions regarding data. Bold faced questions are priorities for me. Help!" on Oct 9, 2018
kmin940 changed the title from "Many questions regarding data. Bold faced questions are priorities for me. Help!" to "Questions regarding complex synthetic mock data." on Oct 15, 2018