version 0.0.4
Sebastian Mackowiak authored and mschilli87 committed Apr 13, 2017
1 parent 1af1f0f commit 7076e89
Showing 14 changed files with 2,055 additions and 1,093 deletions.
136 changes: 17 additions & 119 deletions README
@@ -1,5 +1,5 @@
Authors : Marc Friedlaender and Sebastian Mackowiak.
Date: 11/01/2011
Date: 12/08/2009

This is miRDeep2 developed by Marc Friedlaender and Sebastian Mackowiak.
miRDeep2 discovers active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, ...).
@@ -23,7 +23,7 @@ Installation:
perl install.pl


2. without the install.pl script follow the instructions given in Sample Installation
2. without the install mirdeep script follow the instructions given in Sample Installation



@@ -116,6 +116,7 @@ Script Reference:
miRDeep2 analyses can be performed using the three scripts miRDeep2.pl, mapper.pl and quantifier.pl.



name:
miRDeep2.pl

@@ -129,7 +130,7 @@ arf format, an optional fasta file with known miRNAs of the analysing species an
species

output:
A spreadsheet and a html file with an overview of all detected miRNAs in the deep sequencing input data.
A spreadsheet and an html file with an overview of all detected miRNAs in the deep sequencing input data.


options:
@@ -170,21 +171,10 @@ results generated (result.html), a copy of the novel and known miRNAs contained
but in text format which allows easy parsing (result.csv), a copy of the performance survey
contained in the webpage but in text format (survey.csv) and a copy of the miRNA read signatures
contained in the pdfs but in text format (output.mrd).
The ids in files miRBase_mmu_v14.fa and precursors_ref_this_species.fa need to be similar to each other.
This is usually no problem if you downloaded both files from miRBase.
Otherwise it can happen that the quantifier fails to produce results.


Example use 2:

As in example use 1, except that the user has already run quantifier.pl and wants to use this
output to get information on the miRNAs not detected by miRDeep2 included in the html webpage.
miRBase.mrd is a file generated by quantifier.pl:

miRDeep2.pl reads_collapsed.fa genome.fa reads_collapsed_vs_genome.arf miRBase_mmu_v14.fa miRBase_rno_v14.fa -t Mouse -q miRBase.mrd 2>report.log

This command will generate the same type of files as example use 1 above.

Example use 2:

The user wishes to identify miRNAs in deep sequencing data from an animal with no related species
in miRBase:
@@ -218,13 +208,8 @@ Read input file:
-a input file is seq.txt format
-b input file is qseq.txt format
-c input file is fasta format
-e input file is fastq format
-d input file is a config file (see miRDeep2 documentation).
options -a, -b or -c must be given with option -d.


Preprocessing/mapping:
-g three-letter prefix for reads (by default 'seq')
-h parse to fasta format
-i convert rna to dna alphabet (to map against genome)
-j remove all entries that have a sequence that contains letters other than
@@ -295,52 +280,22 @@ mapper.pl reads.fa -c -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -s reads_colla



Example use 5: (experimental)
Example use 5:

The user has already removed 3' adapters in color space and has mapped the reads against the genome
using bwa/bowtie resulting in a sam file.
Note that each genome locus to which a read was aligned has to occur
in its own line. Otherwise only the first genome locus of each line will be taken!
The mapping output file is named mapped.sam.
The user wishes to generate the files 'reads_collapsed.fa'
and 'reads_collapsed_vs_genome.arf' as input to miRDeep2:

perl sam_reads_collapse.pl mapped.sam reads_collapsed.fa
perl bwa_sam_converter.pl -i mapped.sam -t read_1_to_1.txt -o reads_collapsed_vs_genome.arf


If read ids are already collapsed and in correct miRDeep2 format (eg. ">ABC_1_x10", see File Formats at the bottom or consult the online documentation)
then the sam file just needs to be converted:


perl bwa_sam_converter.pl -i mapped.sam -o reads_collapsed_vs_genome.arf


Example use 6:
using the BWA tool. The BWA output file is named reads_vs_genome.sam. Notice that the BWA output
contains extra fields that are not required for SAM format. Our converter requires these fields and
thus may not work with all types of SAM files. The user wishes to generate 'reads_collapsed.fa'
and 'reads_vs_genome.arf' to input to miRDeep2:

The user has sequencing data from different samples e.g. different cell-types. A config.txt file has to be created in which each line
designates file locations and a unique 3 letter code.
For instance:
sequencing_data_sample1.fa sd1
sequencing_data_sample2.fa sd2
sequencing_data_sample3.fa sd3
.
.
.
bwa_sam_converter.pl reads_vs_genome.sam reads.fa reads_vs_genome.arf

The user then wishes to pool these files and use the generated files reads.fa and reads_vs_genome.arf for the miRDeep2 analysis.


mapper.pl config.txt -d -c -i -j -l 18 -m -p genome_index -s reads.fa -t reads_vs_genome.arf

Since reads_vs_genome.arf still contains the three-letter code for each read mapped to the genome, the user can later on
dissect the contribution of the different samples to a predicted or known miRNA.
This can also be used, for example, to define 'high confidence' predictions by filtering the results for miRNAs that have sequencing
evidence from at least two samples.
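
The per-sample breakdown described above can be sketched in Python. This is only an illustration, not part of miRDeep2: the function names are hypothetical, and it assumes the read identifiers supporting one miRNA have already been collected (for example from the first column of the relevant arf lines) and follow the "<tag>_<running number>_x<count>" layout with a three-character alphanumeric tag.

```python
import re
from collections import defaultdict

# Assumed read id layout: "<tag>_<running number>_x<count>", e.g. "sd1_42_x7".
READ_ID = re.compile(r"^(?P<tag>[A-Za-z0-9]{3})_(?P<num>\d+)_x(?P<count>\d+)$")


def reads_per_sample(read_ids):
    """Sum the read counts (the x<count> suffix) contributed by each sample tag."""
    totals = defaultdict(int)
    for read_id in read_ids:
        match = READ_ID.match(read_id)
        if match is None:
            raise ValueError("unexpected read id: %r" % read_id)
        totals[match.group("tag")] += int(match.group("count"))
    return dict(totals)


def has_multi_sample_support(read_ids, min_samples=2):
    """True if reads from at least min_samples different sample tags are present."""
    return len(reads_per_sample(read_ids)) >= min_samples


# Hypothetical read ids supporting a single miRNA.
example_ids = ["sd1_1_x10", "sd1_2_x3", "sd2_7_x5"]
print(reads_per_sample(example_ids))
print(has_multi_sample_support(example_ids))
```

A filter of this kind would be applied per miRNA after the miRDeep2 run; how reads are assigned to a miRNA is left open here.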
mapper.pl reads.fa -c -i -j -l 18 -m -s reads_collapsed.fa

###############################################################################################################################



name:
quantifier.pl

@@ -360,46 +315,18 @@ A 2 column table file called miRNA_expressed.csv with miRNA identifiers and its
miRNAs having 0 read counts, a signature file called miRBase.mrd, a file called expression.html that gives an overview of all miRNAs in the input data
and a directory called pdfs that contains for each miRNA a pdf file showing its signature and structure.

[options]

[mandatory parameters]
-u list all values allowed for the species parameter that have an entry at UCSC

-p precursor.fa miRNA precursor sequences from miRBase
-m mature.fa miRNA sequences from miRBase
-r reads.fa your read sequences

[optional parameters]
-c [file] config.txt file with different sample ids... or just the one sample id
-s [star.fa] optional star sequences from miRBase
-t [species] e.g. Mouse or mmu
if not searching in a specific species all species in your files will be analyzed
else only the species in your dataset is considered
-y [time] optional timestamp; if omitted, a new one is generated
-d if parameter given pdfs will not be generated, otherwise pdfs will be generated
-o if parameter is given, reads are not sorted by sample in the pdf file; default is sorting
-k also considers precursor-mature mappings that have different ids, eg let7c
would be allowed to map to pre-let7a
-n do not do file conversion again
-x do not do mapping against precursor again
-g [int] number of allowed mismatches when mapping reads to precursors, default 1
-e [int] number of nucleotides upstream of the mature sequence to consider, default 2
-f [int] number of nucleotides downstream of the mature sequence to consider, default 5
-j do not create an output.mrd file and pdfs if specified

-w considers the whole precursor as the 'mature sequence'


options:
-t list all values allowed for the species parameter that have an entry at UCSC

example usage:
Assume we want to quantify C.elegans miRNAs then we would run the command
quantifier.pl -p precursors.fa -m mature.fa -r reads.fa -s star.fa -y now -t cel
quantifier.pl precursors.fa mature.fa reads.fa star.fa/none species/none timestamp/none pdf



#####################################################################################################################################



name:
make_html.pl

@@ -1136,32 +1063,3 @@ options:

notes:
-


##########################
File Formats
.fa
The fasta files that contain sequencing reads used by miRDeep2 are ordinary fasta files with a predefined identifier format. The identifier comprises three values separated by underscores. The first value is a three-letter code which is intended to be a tag for the sample a read is coming from. The second value is a running number that is used to make sure that identifiers are uniquely assigned to sequences from the same sample. The third value starts with an 'x' followed by an integer that indicates the number of occurrences of a read sequence in a sample. The sequence in a fasta file that is supplied to miRDeep2 is not allowed to contain characters other than A, C, G, T and N. If the id line or the sequence line does not follow these conventions, miRDeep2 will abort with a warning message. Example entry from a fasta file that can be supplied to miRDeep2:

>PAN_123456_x969696
ATACAATCTACTGTCTTTCCT
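
The identifier and sequence conventions above can be checked with a short Python sketch. This is not miRDeep2's own validation code; the function names are hypothetical, and it assumes that the sample tag is exactly three alphanumeric characters and that sequences are upper case.

```python
import re

# ">" + three-character sample tag + running number + "x" followed by the read count.
READ_HEADER = re.compile(r"^>([A-Za-z0-9]{3})_(\d+)_x(\d+)$")
ALLOWED_BASES = set("ACGTN")


def parse_read_header(header):
    """Split a header such as '>PAN_123456_x969696' into (tag, running number, count)."""
    match = READ_HEADER.match(header.strip())
    if match is None:
        raise ValueError("header does not follow the tag_number_xcount layout: %r" % header)
    tag, running_number, count = match.groups()
    return tag, int(running_number), int(count)


def is_valid_sequence(sequence):
    """True if the sequence is non-empty and uses only A, C, G, T and N."""
    return bool(sequence) and set(sequence) <= ALLOWED_BASES


print(parse_read_header(">PAN_123456_x969696"))
print(is_valid_sequence("ATACAATCTACTGTCTTTCCT"))
```

Running such a check before calling miRDeep2 may make it easier to locate the offending entry than the abort message alone.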

.arf
The arf format is a proprietary file format generated and processed by miRDeep2. It contains information of reads mapped to a reference genome. Each line in such a file contains 13 columns. Example line:

#1 2 3 4 5 6 7 8 9 10 11 12 13
PAN_123456_x969696 21 1 21 ATACAATCTACTGTCTTTCCT chr22 21 46508682 46508702 ATACAATCTACTGTCTTTCCT + 0 mmmmmmmmmmmmmmmmmmmmm

1 read identifier
2 length of read sequence
3 start position in read sequence that is mapped
4 end position in read sequence that is mapped
5 read sequence
6 identifier of the genome part to which a read is mapped. This is either a scaffold id or a chromosome name
7 length of the genome sequence a read is mapped to
8 start position in the genome where a read is mapped to
9 end position in the genome where a read is mapped to
10 genome sequence to which a read is mapped
11 genome strand information. Plus means the read is aligned to the sense-strand of the genome. Minus means it is aligned to the antisense-strand of the genome.
12 Number of mismatches in the read mapping
13 Edit string that indicates matches by lowercase 'm' and mismatches by uppercase 'M'
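
The 13-column layout listed above can be read with a minimal Python sketch. The record type and function name are hypothetical, the code assumes whitespace-separated columns with no whitespace inside a field, and the example line is made up for illustration (it is not taken from real data).

```python
from collections import namedtuple

# Field names follow the column descriptions above; the names themselves are illustrative.
ArfRecord = namedtuple("ArfRecord", [
    "read_id", "read_length", "read_start", "read_end", "read_seq",
    "genome_id", "genome_length", "genome_start", "genome_end", "genome_seq",
    "strand", "mismatches", "edit_string",
])

# Zero-based indices of the columns that hold integers (columns 2-4, 7-9 and 12).
INT_COLUMNS = (1, 2, 3, 6, 7, 8, 11)


def parse_arf_line(line):
    """Parse one arf line into an ArfRecord, converting the numeric columns to int."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError("expected 13 columns, got %d" % len(fields))
    for index in INT_COLUMNS:
        fields[index] = int(fields[index])
    return ArfRecord(*fields)


# Made-up line with one mismatch at the last position of the read.
example_line = "\t".join([
    "abc_1_x5", "21", "1", "21", "ATACAATCTACTGTCTTTCCT",
    "chr1", "21", "100", "120", "ATACAATCTACTGTCTTTCCA",
    "+", "1", "m" * 20 + "M",
])
record = parse_arf_line(example_line)
print(record.genome_id, record.genome_start, record.genome_end, record.mismatches)
```

A named tuple keeps the column order visible while allowing access by name, which is convenient when filtering mappings by strand or mismatch count.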