ACSoupir/Simple-Genome-Mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

Simple-Genome-Mining

A simple script that takes protein *.fasta annotation files and compares them against a characteristic key recording whether the genome behind each annotation file displayed a phenotype. Hypothetical proteins are removed, and the annotated proteins of the phenotype-positive genomes are then compared with those of the phenotype-negative genomes to find the differences.
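As a rough illustration of the comparison described above, here is a minimal Python sketch. It is not the repository's code: it assumes each FASTA header carries the product name after the first whitespace (as in `>LOCUS_00001 hypothetical protein`), and it assumes "difference" means products seen in at least one positive genome and in no negative genome. The function names `product_names` and `phenotype_specific` and the sample records are hypothetical.

```python
def product_names(fasta_text):
    """Collect annotated product names from FASTA headers, skipping hypothetical proteins.

    Assumes headers look like '>ID product name' (an assumption about the input).
    """
    names = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split(None, 1)  # split ID from the rest of the header
        if len(parts) == 2:
            product = parts[1].strip()
            if product.lower() != "hypothetical protein":
                names.add(product)
    return names


def phenotype_specific(positive_sets, negative_sets):
    """Products present in any positive genome and absent from every negative genome."""
    in_positive = set().union(*positive_sets)
    in_negative = set().union(*negative_sets)
    return in_positive - in_negative


# Made-up records for one positive and one negative genome.
positive_genome = (
    ">A_00001 hypothetical protein\nMKT\n"
    ">A_00002 Flagellin\nMAQ\n"
    ">A_00003 DNA gyrase subunit A\nMSD\n"
)
negative_genome = (
    ">B_00001 DNA gyrase subunit A\nMSD\n"
    ">B_00002 hypothetical protein\nMKL\n"
)

result = phenotype_specific(
    [product_names(positive_genome)],
    [product_names(negative_genome)],
)
print(sorted(result))  # ['Flagellin']
```

Matching on product name alone means two proteins with the same name are treated as identical regardless of sequence, which is the same limitation noted under Other.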

Annotations

I created the annotations with Prokka and saved the protein sequence FASTA files in a folder called 'Annotations' within the project folder. A characteristic key was also created within the project folder: the first column, under the heading 'Genome', holds the annotation file name, and each following column is a phenotype containing either a 1 or a 0 to indicate whether the bacterium that the genome came from displayed that phenotype.
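To make the key layout concrete, the sketch below reads a key with that shape and splits the annotation files into positive and negative groups for one phenotype. It assumes the key is comma-separated, which the description above does not state; the file names, the phenotype columns `Motility` and `Biofilm`, and the function `split_by_phenotype` are all hypothetical.

```python
import csv
import io

# Hypothetical key contents: a 'Genome' column followed by 1/0 phenotype columns.
EXAMPLE_KEY = """Genome,Motility,Biofilm
strainA.fasta,1,0
strainB.fasta,0,1
strainC.fasta,1,1
"""


def split_by_phenotype(key_file, phenotype):
    """Return (positive, negative) lists of annotation file names for one phenotype."""
    positive, negative = [], []
    for row in csv.DictReader(key_file):
        value = row[phenotype].strip()
        if value == "1":
            positive.append(row["Genome"])
        elif value == "0":
            negative.append(row["Genome"])
        else:
            raise ValueError(f"expected 1 or 0 for {row['Genome']}, got {value!r}")
    return positive, negative


positive, negative = split_by_phenotype(io.StringIO(EXAMPLE_KEY), "Motility")
print(positive)  # ['strainA.fasta', 'strainC.fasta']
print(negative)  # ['strainB.fasta']
```

With a real key on disk, the `io.StringIO` object would be swapped for an open file handle, and the returned names would be looked up inside the 'Annotations' folder.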

Other

This is obviously a very simple genome mining script and shouldn't be compared with genome-wide association study (GWAS) scripts. My motivation for writing it rather than using a GWAS tool was that I didn't have many genomes to work with (18), and the output files from the GWAS program I tried (DBGWAS) were so large that I wasn't able to open them. This script is most likely not very specific in what it finds, and it has no statistical method for determining whether genes with the same name share the same sequence. I think one benefit over the GWAS application, however, is that I can easily run it for multiple phenotypes. It provides a place to begin looking in the genomes of the bacteria, and even with the low number of annotations I am still able to filter down the number of genes a fair bit.
