iva_qc output files
The script `iva_qc` requires a prefix for its output files. The following assumes that you used `out` for this prefix.
The main output file is `out.stats.txt`, which is a human-readable tab-delimited list of stats and their values (one stat per line). An explanation of the stats can be found here.
If you prefer to use a spreadsheet application, exactly the same information is in the TSV file `out.stats.tsv`.
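
As a rough illustration of reading the stats file programmatically, the sketch below loads it into a dictionary. It assumes each line holds a stat name and a value separated by a tab, as described above; the function name `read_stats` is hypothetical and not part of IVA, and any header line in the file would simply appear as one more entry.

```python
import csv


def read_stats(path):
    """Return {stat_name: value_string} from a tab-delimited stats file.

    Assumes two tab-separated columns per line (name, value).
    Values are kept as strings; convert them as needed.
    """
    stats = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            stats[row[0]] = row[1]
    return stats
```

For example, `read_stats("out.stats.txt")` would return the stats for a run that used the prefix `out`.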
- `out.assembly_contigs_hit_ref.fasta`: a FASTA file of just the contigs that had a nucmer hit to the reference.
- `out.assembly_vs_ref.coords`: nucmer file of hits between assembly and reference.
- `out.assembly_v_ref.act.sh`: a script that will start ACT, showing the assembly compared against the reference. Choose 'concatenate sequences' when ACT starts. The reference is the top sequence, BLAST hits are in the middle, and the assembly is at the bottom.
- `out.assembly_v_ref.blastn`: BLASTN results of comparing the assembly and reference. Needed to run the previous ACT script.
A PDF file `iva_qc.contig_placement.pdf` is made that shows the layout of the contigs on the reference, plus contig and read coverage information. This is useful, but if you really want to see what is happening then use ACT with the script `out.assembly_v_ref.act.sh`.
The file is made using the R script `iva_qc.contig_placement.R` and the data files `iva_qc.read_coverage_on_ref.fwd` and `iva_qc.read_coverage_on_ref.rev`. Here is an explanation of the plots:
- The main panel at the top shows the nucmer hits of the contigs to the reference, one contig per row. Two (or more) hits on the same row means a contig matches two (or more) distinct places of the reference.
- Roughly, blue means everything looks OK, red means it does not. This is only approximate and really the best thing is to use ACT.
- Contigs are shaded dark blue or red if they match in the forward orientation to the reference, and light blue or red if they match in the reverse orientation.
- The contigs heatmap has three tracks: the top two are black and show presence and absence of contig coverage. The third track is red and shows where there was good read coverage, but not contig coverage, i.e. the assembler should have assembled the region but missed it.
- The reads heatmap has three tracks. The top track (black) is where read coverage is good on both strands. The middle and bottom tracks (red) show low read coverage on the forward and reverse strands.
- The two blue line plots at the bottom show the read depth on the forward and reverse strands.
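
The logic behind the red contigs track (good read coverage but no contig coverage) can be sketched as below. This is only an illustration, not the code iva_qc uses: the function name `missed_regions`, the per-position list inputs, and the `min_depth` threshold are all assumptions made for the example.

```python
def missed_regions(read_depth, contig_covered, min_depth=10):
    """Flag reference positions with good read depth but no contig coverage.

    read_depth:     list of read depths, one per reference position.
    contig_covered: list of booleans, True where a contig covers the position.
    min_depth:      hypothetical threshold for "good" read coverage.
    """
    if len(read_depth) != len(contig_covered):
        raise ValueError("inputs must have one entry per reference position")
    return [
        depth >= min_depth and not covered
        for depth, covered in zip(read_depth, contig_covered)
    ]
```

Positions flagged `True` correspond to regions the assembler might have been expected to assemble but did not.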
### out.gage/
This directory has the results of running the GAGE analysis. Most files are cleaned up after running. Some of the remaining files are:
- `run.sh`: the script used to run the GAGE analysis.
- `gage.out`: stdout from `run.sh`. IVA gets the stats from this file.
- `out.report`: more detailed file made by the GAGE analysis.
This directory contains the results of running RATT. Most files are cleaned after running. The remaining files are as follows.
- `run.sh`: the script used to run RATT, so you can easily rerun it if you would like to make annotation files for your assembly.
- `run.sh.out`: the stdout from running `run.sh`. IVA gets its stats from this file.
- `out.reads_mapped_to_assembly.bam[.bai]`: sorted indexed BAM file of reads mapped to the assembly.
- `.reads_mapped_to_ref.bam[.bai]`: sorted indexed BAM file of reads mapped to the reference.
- `out.ref_cds_seqs.fa`: FASTA file of CDS sequences found in the reference genome.
- `out.ref_cds_seqs_mapped_to_assembly.coords`: nucmer coords file of CDS sequences mapped to the assembly.
- `out.reference.fa`: FASTA file of reference genome.