Gisle Vestergaard edited this page Nov 16, 2020 · 6 revisions

Welcome to the localDB_with_gene_synteny wiki!

Since NCBI does not include gene synteny data, and context matters, I am making this repository to describe how you can build your own database that includes gene synteny information. I will also include simple tools to identify what flanks your favorite genes. A major part of the work is "cleaning" the NCBI databases and records. I seem to find new surprises each time I download another part of NCBI, so the methods I am using may not be enough for your data; remember to include a sanity check at each step. In this example I am creating a database of all plasmids and archaeal genomes.

Creating database

Plasmids

All plasmid sequences can be downloaded from the FTP server and unpacked like this:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plasmid/plasmid.*.genomic.gbff.gz
gunzip *.gbff.gz
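As one possible sanity check on the unpacked files, the sketch below compares the number of record starts with the number of record terminators in each file. It relies only on the GenBank flat file convention that every record begins with a LOCUS line and ends with a line starting with `//`. The function name `check_record_counts` is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): for each GenBank flat
# file, compare the number of LOCUS lines with the number of "//" terminators.
# A mismatch suggests a truncated or corrupted file.
check_record_counts() {
    local f starts ends status=0
    for f in "$@"; do
        # grep -c exits non-zero when the count is 0, hence the "|| true"
        starts=$(grep -c '^LOCUS' "$f" || true)
        ends=$(grep -c '^//' "$f" || true)
        if [ "$starts" -ne "$ends" ]; then
            echo "MISMATCH $f: $starts LOCUS lines, $ends terminators" >&2
            status=1
        else
            echo "$f: $starts records"
        fi
    done
    return "$status"
}
```

It could be called as `check_record_counts plasmid.*.genomic.gbff`; a non-zero exit status means at least one file looks incomplete.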
Now we can start identifying "weird entries". The easiest to find are nonsensical genes, such as those with a length of 0.
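A minimal sketch of such a check, assuming the standard GenBank feature table layout (feature key `gene` indented by five spaces, followed by its location). It reports gene features whose span is shorter than a chosen minimum. The function name `flag_short_genes` and the threshold are hypothetical, and this is not necessarily the filter used in this repository. Only simple `start..end` and `complement(start..end)` locations are measured; `join(...)` locations are skipped, and partial markers (`<`, `>`) are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print gene features
# shorter than MIN_LEN bases as "locus <TAB> location <TAB> length".
# Usage: flag_short_genes MIN_LEN file.gbff [more files...]
flag_short_genes() {
    local min_len="$1"
    shift
    awk -v min="$min_len" '
        # Remember which record we are in
        /^LOCUS/ { locus = $2 }
        # Feature table lines: five spaces, the key "gene", then the location
        /^     gene  / {
            loc = $2
            # Strip the complement() wrapper and partial markers
            gsub(/complement\(|\)|[<>]/, "", loc)
            # Only measure simple start..end locations
            if (loc ~ /^[0-9]+\.\.[0-9]+$/) {
                split(loc, p, /\.\./)
                len = p[2] - p[1] + 1
                if (len < min) print locus "\t" $2 "\t" len
            }
        }
    ' "$@"
}
```

For example, `flag_short_genes 3 plasmid.*.genomic.gbff` would list genes too short to hold a single codon. A location whose end precedes its start gives a zero or negative length and is reported as well.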
