A simple script that takes protein *.fasta annotation files and compares them against a characteristic key that records whether the genome behind each annotation file displayed a phenotype or not. Hypothetical proteins are removed, and the annotated proteins of the phenotype-positive genomes are then compared with those of the phenotype-negative genomes to find the differences.
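As a rough illustration of the comparison described above, here is a minimal Python sketch. It is not the script itself, and it rests on assumptions: the function names `product_names` and `phenotype_difference` are made up for this example, FASTA headers are assumed to look like `>LOCUS_TAG product name` (the usual layout of Prokka protein FASTA files), and "difference" is read as products annotated in at least one positive genome and in no negative genome.

```python
from typing import Iterable, List, Set


def product_names(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect annotated product names from FASTA header lines.

    Assumes headers of the form ">LOCUS_TAG product name".
    Hypothetical proteins are skipped.
    """
    names = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Everything after the first space is treated as the product name.
        _, _, product = line[1:].strip().partition(" ")
        if product and "hypothetical protein" not in product.lower():
            names.add(product)
    return names


def phenotype_difference(positive: List[Set[str]],
                         negative: List[Set[str]]) -> Set[str]:
    """Products seen in any positive genome but in no negative genome.

    This is one assumed reading of "difference"; a stricter version
    could require the product in every positive genome.
    """
    in_positive = set().union(*positive)
    in_negative = set().union(*negative)
    return in_positive - in_negative


# Tiny made-up example: one positive and one negative genome.
positive_genome = product_names([
    ">A_00001 Catalase",
    "MKVLAAGIVG",
    ">A_00002 hypothetical protein",
    ">A_00003 DNA gyrase subunit A",
])
negative_genome = product_names([
    ">B_00001 DNA gyrase subunit A",
    ">B_00002 hypothetical protein",
])
print(phenotype_difference([positive_genome], [negative_genome]))
```

Because the comparison is on product names only, two proteins with the same name but different sequences are treated as identical.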
My annotations were created using Prokka, and the protein sequence fasta files were saved in a folder called 'Annotations' within the project folder. The characteristic key was also created within the project folder: the first column, under the heading 'Genome', holds the annotation file name, and each following column is a phenotype containing either a 1 or a 0 to indicate whether the bacterium that the genome came from displayed that phenotype.
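To make the key layout concrete, here is a small hedged Python sketch of how such a key could be read and split for one phenotype. The CSV format, the phenotype names `Motile` and `Biofilm`, the file names, and the helper `split_by_phenotype` are all assumptions for illustration; only the 'Genome' heading and the 1/0 coding come from the description above.

```python
import csv
import io


def split_by_phenotype(key_file, phenotype):
    """Return (positive, negative) lists of genome names for one phenotype.

    Assumes a comma-separated key with a 'Genome' column followed by
    one column per phenotype, each cell holding 1 or 0.
    """
    positive, negative = [], []
    for row in csv.DictReader(key_file):
        value = row[phenotype].strip()
        if value == "1":
            positive.append(row["Genome"])
        elif value == "0":
            negative.append(row["Genome"])
        # Any other value (e.g. blank) is left out of both groups.
    return positive, negative


# Made-up key contents; in practice this would be an opened file.
example_key = io.StringIO(
    "Genome,Motile,Biofilm\n"
    "isolate_A.faa,1,0\n"
    "isolate_B.faa,0,0\n"
    "isolate_C.faa,1,1\n"
)

motile, non_motile = split_by_phenotype(example_key, "Motile")
print(motile)      # genomes scored 1 for the hypothetical 'Motile' column
print(non_motile)  # genomes scored 0
```

Running the same function once per phenotype column is enough to cover several phenotypes from a single key.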
This is obviously a really simple genome mining script and shouldn't be compared with genome-wide association study (GWAS) tools. My motivation for writing this script rather than using a GWAS tool was that I didn't have many genomes to work with (18 genomes), and the output files from a GWAS program (DBGWAS) were so large that I wasn't able to open them. This script is most likely not very specific in its discoveries, and it does not include any statistical method for determining whether genes of the same name have the same sequence. I think a benefit of this script over a GWAS application, however, is that I can easily run it for multiple phenotypes. It provides a place to begin looking in the genomes of the bacteria, and even with the low number of annotations I am still able to filter down the number of genes a fair bit.