A script to merge multi-sequence RichSeq files into one single-entry 'artificial' sequence file.
- Synopsis
- Description
- Usage
- Output
- Dependencies
- Run environment
- Alternative software
- Author - contact
- Citation, installation, and license
- Changelog
perl cat_seq.pl multi-seq_file.embl
This script concatenates multiple sequences in a RichSeq file (embl or genbank, but also fasta) into a single artificial sequence. The first sequence in the file serves as the base to which the subsequent sequences are appended, along with all their features and annotations.
Optionally, a different output file format can be specified (fasta/embl/genbank).
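To make the idea of appending sequences more concrete, here is a minimal shell sketch that reports where each record of a multi-fasta file would start and end in the concatenated sequence. It only illustrates the coordinate bookkeeping and is not how cat_seq.pl works internally (the script relies on BioPerl). The file names `demo_multi.fasta` and `demo_offsets.tsv` and the record IDs are hypothetical.

```shell
# Hypothetical demo input: two short records (6 bp and 3 bp).
printf '>contig1\nACGT\nAC\n>contig2\nGGG\n' > demo_multi.fasta

# Print each record ID with its 1-based start and end position
# in the concatenated sequence (tab-separated).
awk '
  function report() { if (id != "") printf "%s\t%d\t%d\n", id, start, pos }
  /^>/ { report(); id = substr($1, 2); start = pos + 1; next }
  { pos += length($0) }
  END { report() }
' demo_multi.fasta > demo_offsets.tsv

cat demo_offsets.tsv
```

For the demo input this lists contig1 at positions 1 to 6 and contig2 at positions 7 to 9. Files with Windows line endings would need the carriage returns stripped first, since they are counted as sequence characters here.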
perl cat_seq.pl multi-seq_file.gbk
perl cat_seq.pl multi-seq_file.embl [fasta|genbank]
for i in *.{embl,fasta,gbk}; do perl cat_seq.pl "$i" [embl|fasta|genbank]; done
If you're working only with fasta files, UNIX's grep is a faster way to concatenate sequences:
grep -v ">" seq.fasta > seq_artificial.fasta
Afterwards, use an editor to add a fasta ID line (starting with '>') as the first line of the file.
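As an alternative to adding the ID in an editor, the header can be written in the same step as the concatenation. The following is a minimal sketch; the ID `seq_artificial` and the demo records are hypothetical placeholders, so pick an ID that suits your data.

```shell
# Hypothetical demo input with two records.
printf '>contig1\nACGT\n>contig2\nGGCC\n' > seq.fasta

# Write the new header first, then all non-header lines of the input.
{
  echo ">seq_artificial"
  grep -v '^>' seq.fasta
} > seq_artificial.fasta

# Sanity check: the result should contain exactly one header line.
grep -c '^>' seq_artificial.fasta
```

The pattern `'^>'` only matches lines that begin with '>', so sequence lines are never dropped by accident. The sequence keeps the line wrapping of the input, which fasta parsers read as one continuous sequence.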
- *_artificial.[embl|fasta|genbank]
Concatenated artificial sequence in the input format, or optionally the specified output sequence format.
- BioPerl (tested with version 1.006901)
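To check whether BioPerl is available to your Perl installation, and which version, something along these lines should work. It assumes `perl` is on the PATH and reads the version from the `Bio::Root::Version` module that ships with BioPerl; the variable name `bioperl_version` is just an illustration.

```shell
# Query the BioPerl version; fall back to a note if the module cannot be loaded.
bioperl_version=$(perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION' 2>/dev/null) \
  || bioperl_version="not found"

echo "BioPerl version: $bioperl_version"
```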
The Perl script runs under Windows and UNIX flavors.
The EMBOSS (The European Molecular Biology Open Software Suite) application union can also be used for this task (http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/union.html).
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
For citation, installation, and license information please see the repository main README.md.
- v0.1 (08.02.2013)