po2anno

po2anno.pl is a script to create an annotation comparison matrix from Proteinortho5 output.

Synopsis
Description
Usage
Options
- Mandatory options
- Optional options
Output
Run environment
Author - contact
Citation, installation, and license
Changelog

Synopsis

perl po2anno.pl -i matrix.proteinortho -d genome_fasta_dir/ -l -a > annotation_comparison.tsv

Description

Supplement an ortholog/paralog output matrix from a Proteinortho5 calculation with annotation information. The resulting tab-separated annotation comparison matrix (ACM) is mainly intended for transferring high-quality annotations from reference genomes to homologs (orthologs and co-orthologs/paralogs) in a query genome (e.g. in conjunction with tbl2tab.pl). It can also be used for a quick look at the annotation of genes that are present in only a few of the input genomes.

Annotation is retrieved from multi-FASTA files created with cds_extractor.pl. See cds_extractor.pl for a description of the format. The same files serve as input for the Proteinortho5 (PO) analysis and are passed to po2anno.pl via option -d.

Proteinortho5 (PO) has to be run with option -singles so that genes without orthologs, so-called singletons/ORFans, are also included for each genome in the PO matrix (see the PO manual). Additionally, option -selfblast is recommended to enhance paralog detection by PO.
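A quick way to check that -singles took effect is to count singleton rows in the PO matrix. The following is a minimal sketch, assuming the usual Proteinortho5 matrix layout (a header line starting with '#', then columns for the number of species, the number of genes, and the algebraic connectivity, followed by one column per genome with '*' for absent genes). The genome file names, locus tags, and connectivity values are invented for illustration, and count_singletons is a hypothetical helper, not part of po2anno.pl.

```shell
# Build a tiny mock PO matrix (hypothetical IDs; real files come from proteinortho5.pl).
tmp=$(mktemp)
printf '# Species\tGenes\tAlg.-Conn.\tgenomeA.faa\tgenomeB.faa\n' > "$tmp"
printf '2\t2\t1\tA_0001\tB_0001\n' >> "$tmp"
printf '2\t3\t0.5\tA_0002,A_0003\tB_0002\n' >> "$tmp"
printf '1\t1\t1\t*\tB_0003\n' >> "$tmp"

# Count rows that contain exactly one gene from exactly one genome (singletons).
count_singletons() {
    awk -F'\t' '!/^#/ && $1 == 1 && $2 == 1 { n++ } END { print n + 0 }' "$1"
}

singletons=$(count_singletons "$tmp")
echo "singletons: $singletons"
rm -f "$tmp"
```

A count of zero on a real matrix would suggest that PO was run without -singles.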

Each orthologous group (OG) is listed in a row of the resulting ACM. The first column holds the OG numbers from the PO input matrix (i.e. line number minus one). The following columns specify the orthologous CDS for each input genome. For each CDS the ID, optionally the length in bp (option -l), gene, EC number(s), and product are shown, depending on their presence in the CDS's annotation. The ID is in most cases the locus tag (see cds_extractor.pl). If several EC numbers exist for a single CDS, they are separated by ';'. If an OG includes paralogs, i.e. co-orthologs from a single genome, these are printed in the following row(s) without a new OG number in the first column. The order of paralogous CDSs within an OG is arbitrary.
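The OG numbering (line number minus one) and the paralog handling can be traced back to the PO input matrix. The sketch below assumes that Proteinortho5 separates co-orthologs from the same genome with commas inside a genome column and that genome columns start at column four. The mock matrix and the helper ogs_with_paralogs are hypothetical and only serve as illustration.

```shell
# Mock PO matrix with invented locus tags; the header is line 1, so OG n sits on line n + 1.
tmp=$(mktemp)
printf '# Species\tGenes\tAlg.-Conn.\tgenomeA.faa\tgenomeB.faa\n' > "$tmp"
printf '2\t2\t1\tA_0001\tB_0001\n' >> "$tmp"
printf '2\t3\t0.5\tA_0002,A_0003\tB_0002\n' >> "$tmp"
printf '1\t1\t1\t*\tB_0003\n' >> "$tmp"

# Print the OG number (line number minus one) of every row in which
# at least one genome column holds a comma-separated list of paralogs.
ogs_with_paralogs() {
    awk -F'\t' 'NR > 1 {
        for (i = 4; i <= NF; i++) {
            if ($i ~ /,/) { print NR - 1; break }
        }
    }' "$1"
}

paralog_ogs=$(ogs_with_paralogs "$tmp")
echo "OGs with paralogs: $paralog_ogs"
rm -f "$tmp"
```

In the ACM, such an OG would occupy more than one row, with the additional rows lacking an OG number in the first column.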

The OGs are sorted numerically via the query ID (see option -q). If option -a is set, the non-query OGs are appended to the output after the query OGs, sorted numerically via OG number.

Usage

`cds_extractor`

for i in *.[gbk|embl]; do perl cds_extractor.pl -i $i [-p|-n]; done

Proteinortho5

proteinortho5.pl -graph [-synteny] -cpus=# -selfblast -singles -identity=50 -cov=50 -blastParameters='-use_sw_tback [-seg no|-dust no]' *.[faa|ffn]

po2anno

perl po2anno.pl -i matrix.[proteinortho|poff] -d genome_fasta_dir/ -q query.[faa|ffn] -l -a > annotation_comparison.tsv

Options

Mandatory options

-i=str, -input=str

Proteinortho (PO) result matrix (*.proteinortho or *.poff), or piped STDIN (-)

-d=str, -dir_genome=str

Path to the directory including the genome multi-FASTA PO input files (*.faa or *.ffn), created with cds_extractor.pl

Optional options

-h, -help

Help (perldoc POD)

-q=str, -query=str

Query genome (has to be identical to the string in the PO matrix) [default = first one in alphabetical order]

-l, -length

Include length of each CDS in bp

-a, -all

Append non-query orthologous groups (OGs) to the output

-v, -version

Print version number to STDERR

Output

STDOUT

The resulting tab-delimited ACM is printed to STDOUT. Redirect or pipe into another tool as needed (e.g. cut, grep, head, or tail).
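As an illustration of such post-processing, the sketch below runs cut, head, and grep on a mock tab-delimited file. The cell contents are invented placeholders and do not reproduce the exact formatting po2anno.pl writes. Only the tab-separated layout, with the OG number in the first column and one column per genome, is taken from the description above.

```shell
# Mock ACM: OG number, then one column per genome (placeholder cell contents).
acm=$(mktemp)
printf '1\tA_0001 dnaA\tB_0001 dnaA\n' > "$acm"
printf '2\tA_0002 recA\tB_0002 recA\n' >> "$acm"

# Keep only the OG number and the first genome column, and show the first row.
first=$(cut -f 1,2 "$acm" | head -n 1)
echo "$first"

# Count the rows that mention a gene name of interest.
hits=$(grep -c 'dnaA' "$acm")
echo "rows with dnaA: $hits"
rm -f "$acm"
```

With a real run, the same tools can be attached directly to the output of po2anno.pl through a pipe instead of reading from a file.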

Run environment

The Perl script runs under Windows and UNIX flavors.

Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

Citation, installation, and license

For citation, installation, and license information please see the repository main README.md.

Changelog

v0.2.2 (23.10.2015)
- minor syntax changes to po2anno.pl and README
- changed option -g|-genome_dir to -d|-dir_genome for consistency with po2group_stats.pl
v0.2.1 (07.09.2015)
- get rid of underscores in product annotation strings (from cds_extractor.pl)
- debugged hard-coded relative path for $genome_file_path
v0.2 (15.01.2015)
- give number of query-specific OGs and total query singletons/ORFans in final stat output
- changed final stat output to an easier readable format
- fixed bug: %Query_ID_Seen included also non-query IDs, which luckily had no consequences
v0.1 (18.12.2014)
