rename_fasta_id.pl is a script to rename FASTA IDs according to regular expressions.
- Synopsis
- Description
- Usage
- Options
- Output
- Run environment
- Author - contact
- Citation, installation, and license
- Changelog
perl rename_fasta_id.pl -i file.fasta -p "NODE_.+$" -r "K-12_" -n -a c > out.fasta
or
zcat file.fasta.gz | perl rename_fasta_id.pl -i - -p "coli" -r "" -o > out.fasta
This script uses the built-in Perl substitution operator s/// to replace strings in FASTA IDs. To do this, a pattern and a replacement have to be provided (Perl regular expression syntax can be used). The leading '>' character of the FASTA ID is removed before the substitution and added again afterwards. Each FASTA ID is searched for a match with the pattern, and if one is found, the matched text is replaced by the replacement.
IMPORTANT: Enclose the pattern and the replacement in quotation marks (' or ") if they contain characters that would be interpreted by the shell (e.g. pipes '|', brackets etc.).
For simple substitutions without numbering or appended strings on a UNIX OS, you can of course just use sed (see man sed), e.g.:
sed 's/^>pattern/>replacement/' file.fasta
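The sed one-liner above only matches at the very start of an ID. As a minimal sketch (not part of rename_fasta_id.pl), the following restricts a substitution to header lines with a sed address, so the pattern can match anywhere in the ID while sequence lines stay untouched. The function name `rename_ids_sed` is hypothetical, and the pattern follows sed's basic regular expression syntax, not Perl's.

```shell
# Hypothetical helper: substitute within FASTA header lines only.
# $1: sed basic regular expression, $2: replacement (neither may contain '/').
# Unlike rename_fasta_id.pl, sed does not strip the leading '>' first,
# so a pattern that matches '>' would alter it.
rename_ids_sed() {
  sed "/^>/ s/$1/$2/"
}

# Remove the first occurrence of "coli" from every ID, reading FASTA from STDIN.
printf '>Ecoli_K12\nACGT\n' | rename_ids_sed 'coli' ''
```

Only the first match per header is replaced; append `g` to the `s` command for a global replacement.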
perl rename_fasta_id.pl -i file.fasta -p "T" -r "a" -c -g -o
- -i, -input
Input FASTA file, or '-' to read from piped STDIN (e.g. from a gzipped file via zcat)
- -p, -pattern
Pattern to be replaced in FASTA ID
- -r, -replacement
String to replace the pattern with. To remove the pattern entirely, use '' or "" as input for -r.
- -h, -help
Help (perldoc POD)
- -c, -case-insensitive
Match the pattern case-insensitively
- -g, -global
Replace the pattern globally in the ID, i.e. every occurrence instead of only the first
- -n, -numerate
Append a number (the running count of pattern hits) to the replacement. This is useful, for example, to number contigs consecutively in a draft genome.
- -a, -append
Append a string after the numeration, e.g. 'c' for chromosome
- -o, -output
Verbose output of the substitutions that were carried out, printed to STDERR
- -v, -version
Print version number to STDERR
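To make the interplay of -p, -r, -n, and -a concrete, here is a rough awk sketch of the numbering idea. It is not the script's implementation: the function name `number_ids` is hypothetical, the counter is assumed to start at 1 and to increase with every matching ID, and the pattern is an awk extended regular expression rather than a Perl one.

```shell
# Hypothetical sketch of "-p PATTERN -r REPLACEMENT -n -a STRING".
# $1: pattern (awk ERE), $2: replacement, $3: string appended after the number.
number_ids() {
  awk -v pat="$1" -v rep="$2" -v app="$3" '
    /^>/ {
      id = substr($0, 2)            # drop the leading ">" before substituting
      if (id ~ pat) {
        n++                         # assumed: count starts at 1, one step per hit
        sub(pat, rep n app, id)     # replacement + count + appended string
      }
      print ">" id                  # add the ">" back
      next
    }
    { print }                       # sequence lines pass through unchanged
  '
}

# Rename assembler contig IDs to K-12_1c, K-12_2c, ...
printf '>NODE_1_length_100\nACGT\n>NODE_2_length_50\nGG\n' | number_ids 'NODE_.+$' 'K-12_' 'c'
```

The replacement in this sketch must not contain `&` or backslashes, which awk's `sub` treats specially.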
- STDOUT
The FASTA file with substituted ID lines is printed to STDOUT. Redirect or pipe into another tool as needed.
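Because a substitution can make two previously distinct IDs identical (for instance when part of the ID is removed with an empty replacement), it may be worth checking the redirected output. The following is a small sketch of such a check, independent of the script. The function name `duplicate_ids` is hypothetical, and the ID is assumed to be the text between '>' and the first whitespace.

```shell
# Hypothetical helper: print every FASTA ID that occurs more than once in file $1.
# No output means all IDs are unique.
duplicate_ids() {
  grep '^>' "$1" |
    sed 's/^>//; s/[[:space:]].*$//' |   # keep only the ID, drop any description
    sort |
    uniq -d                              # report each repeated ID once
}
```

For example, `duplicate_ids out.fasta` lists the IDs that clash after renaming.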
The Perl script runs under Windows and UNIX flavors.
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
For citation, installation, and license information please see the repository main README.md.
- v0.1 (09.11.2014)