
4 Making alignments (trimming and gap removal)

Vasco Elbrecht edited this page Sep 28, 2016 · 2 revisions

YouTube video tutorial: Making alignments with Geneious

PrimerMiner has batch downloaded and processed the sequence data; now it's time to take the OTU consensus sequences and make an alignment. This is required for visualisation and primer evaluation with PrimerMiner, as R does not offer powerful DNA alignment tools and manual editing of the data is often required. The visualisations generated with PrimerMiner can be used to manually find the best primers (we don't believe that this can be solved well with software: you know exactly what you want, and software is too inflexible). You can also use the consensus sequences for primer design with other programs such as primer3, but we recommend a manual approach.

There are different approaches and software packages for generating alignments. We recommend using Geneious with a map-to-reference approach, mapping against a mitochondrial reference consensus sequence generated from the downloaded mitochondrial genomes.

Mapping reads to reference with Geneious

From each Group folder, import the OTU consensus files (Group_name_all_cons_cluster_Majority.fasta) and mitochondrial reads (Group_name_mito.fasta) into Geneious. We recommend using the MAFFT plugin to make an alignment of the mitochondrial COI sequences and extract the consensus sequence from that alignment with a 25% threshold.

The OTU consensus reads can then be mapped against this mitochondrial consensus sequence using Map to reference with High Sensitivity / Medium and 0-3 fine tuning. You might have to try out a few settings for your dataset to get an alignment that is as accurate as possible. Identify your region of interest by searching for the primer binding sites (for example LCO / HCO) and extract the complete region (without the consensus sequence), including XX bp extra depending on the primer length. For example, if you find the LCO primer binding site, add 25 bp on the left, because those will be clipped in the next steps (this is necessary because many sequences from databases still contain partial primer sequences).

You can save the alignment as a fasta file with File -> Export -> Selected Documents... -> FASTA (.*fasta). Make sure to export missing ends as gaps (-) and do not wrap sequences!
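Before moving on, it can be worth checking that the export really is unwrapped and that all sequences have the same length (missing ends exported as gaps). The following is a minimal base R sketch of such a check; `read_fasta()` and `check_alignment()` are hypothetical helper names made up for this illustration, not part of PrimerMiner, and the demo file is toy data.

```r
# Read a FASTA file into a named character vector (base R only).
# Wrapped sequences are joined, so this also works on wrapped files.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  stopifnot(length(lines) > 0, is_header[1])
  record <- cumsum(is_header)
  parts <- split(lines[!is_header], record[!is_header])
  seqs <- vapply(parts, paste, character(1), collapse = "")
  headers <- sub("^>", "", lines[is_header])
  names(seqs) <- headers[as.integer(names(parts))]
  seqs
}

# Report whether sequences are wrapped over several lines and
# whether all sequences have the same (aligned) length.
check_alignment <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  n_headers <- sum(startsWith(lines, ">"))
  seqs <- read_fasta(path)
  list(
    wrapped = length(lines) != 2 * n_headers,
    equal_length = length(unique(nchar(seqs))) == 1
  )
}

# Toy example: two unwrapped sequences padded with gaps at the ends.
demo <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "--ACGT", ">seq2", "TTACG-"), demo)
check_alignment(demo)
```

If `wrapped` is TRUE or `equal_length` is FALSE for your own export, revisit the export settings in Geneious before trimming.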

Remove gaps from the sequence & add selective trimming

Next, we have to remove gaps from the alignment and apply selective clipping to the primer binding regions using selectivetrim(). For example, we added 25 bp extra for the LCO primer and now clip 25 bp to the left of the LCO primer binding region. Selective trimming does not, however, trim past the actual binding site of the LCO primer, because these sequences are likely accurate. This is similar to the clipping applied in the batch download process: you can either apply clipping to all sequences there or apply it here using selectivetrim(). We now recommend using selective trimming.

selectivetrim(read, write, trimL=25, trimR=26, gaps=0.10, minsequL=100)

  • read Input fasta file (can contain gaps in the alignment and untrimmed sequences).
  • write Save the processed alignment as a fasta file.
  • trimL=25 and trimR=26 apply selective trimming on the left and right of the primer binding regions. Make sure your alignment has extra bases to the left and right of the primer binding regions, as otherwise the actual primer binding site will be affected by clipping. Selective trimming only goes up to the 3' end of the primer region, so that information in the actual primer binding region is not clipped away.
  • gaps=0.10 removes gaps from the alignment. Typically just a few sequences have false base pairs that cause gaps in the alignment; these are below 10% abundance and are thus removed.
  • minsequL=100 Sequences shorter than 100 bp are removed after trimming has been applied.
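To make the gaps and minsequL parameters more concrete, here is a small base R sketch of the idea on a toy alignment. It is an illustration under stated assumptions, not PrimerMiner's actual code: `remove_rare_columns()` and `drop_short()` are hypothetical names, and it is assumed that a column is dropped when fewer than the given fraction of sequences have a base there, and that sequence length is counted without gap characters.

```r
# Drop alignment columns in which fewer than `gaps` (fraction) of the
# sequences carry a base. ASSUMPTION: this mirrors the idea behind
# gaps=0.10; the exact rule used by selectivetrim() may differ.
remove_rare_columns <- function(aln, gaps = 0.10) {
  stopifnot(length(unique(nchar(aln))) == 1)
  mat <- do.call(rbind, strsplit(unname(aln), ""))
  base_fraction <- colMeans(mat != "-")
  keep <- base_fraction >= gaps
  out <- apply(mat[, keep, drop = FALSE], 1, paste, collapse = "")
  names(out) <- names(aln)
  out
}

# Remove sequences with fewer than `minsequL` bases.
# ASSUMPTION: gap characters do not count towards the length.
drop_short <- function(aln, minsequL = 100) {
  n_bases <- nchar(gsub("-", "", aln, fixed = TRUE))
  aln[n_bases >= minsequL]
}

# Toy alignment: ten identical sequences plus one with an extra base
# that forces a gap column in all others (1/11 is below 10%).
aln <- c(setNames(rep("ACG-TAC", 10), paste0("otu", 1:10)), odd = "ACGATAC")
cleaned <- remove_rare_columns(aln, gaps = 0.10)
unique(cleaned)

# Length filter with a toy threshold of 3 bases instead of 100.
drop_short(c(a = "ACGT--", b = "AC----"), minsequL = 3)
```

In the toy alignment the extra base in `odd` is the only entry in its column, so that column is removed and all sequences end up with the same six bases.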

The easiest way to apply trimming and gap removal to your sequences is to run this function in a loop (see example data). The cleaned-up alignments can then be visualized.
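Such a loop could look roughly like the sketch below. The group names, folder names, and file naming scheme are hypothetical placeholders, so adapt them to your own exported alignments; only the selectivetrim() arguments are taken from the call shown above. The loop skips input files that do not exist and only runs if PrimerMiner is installed.

```r
# Hypothetical group names and folders; replace with your own.
groups  <- c("Group_A", "Group_B")
in_dir  <- "alignments"          # exported Geneious alignments (assumed)
out_dir <- "alignments_trimmed"  # processed alignments (assumed)

# One row per group: input alignment and output file.
jobs <- data.frame(
  read  = file.path(in_dir, paste0(groups, ".fasta")),
  write = file.path(out_dir, paste0(groups, "_trimmed.fasta")),
  stringsAsFactors = FALSE
)

# Only process alignments that are actually present on disk.
existing <- jobs[file.exists(jobs$read), ]

if (nrow(existing) > 0 && requireNamespace("PrimerMiner", quietly = TRUE)) {
  dir.create(out_dir, showWarnings = FALSE)
  for (i in seq_len(nrow(existing))) {
    PrimerMiner::selectivetrim(
      existing$read[i], existing$write[i],
      trimL = 25, trimR = 26, gaps = 0.10, minsequL = 100
    )
  }
} else {
  message("Nothing to process (no input files found or PrimerMiner not installed).")
}
```

If your primers need different clipping lengths per group, trimL and trimR could be stored as extra columns in `jobs` and passed per row.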