-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME
255 lines (173 loc) · 8.29 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
____ ____ _ __ __ ____ _ _____ ____
| _ \ _ __ ___ / ___|_ __ __ _ _ __ | |__ | \/ / ___| / \ _|_ _| _ \
| |_) | '__/ _ \| | _| '__/ _` | '_ \| '_ \| |\/| \___ \ / _ \ _| |_| | | |_) |
| __/| | | (_) | |_| | | | (_| | |_) | | | | | | |___) / ___ \_ _| | | _ <
|_| |_| \___/ \____|_| \__,_| .__/|_| |_|_| |_|____/_/ \_\|_| |_| |_| \_\
|_|
Fast and Robust Phylogeny-Aware Multiple Sequence Alignment
+ tandem repeat unit insertions and deletions
___ _
| _ \_ _ _ _ _ _ (_)_ _ __ _
| / || | ' \| ' \| | ' \/ _` |
|_|_\\_,_|_||_|_||_|_|_||_\__, |
|___/
___ ___ _ __ __ ___ _
| _ \_ _ ___ / __|_ _ __ _ _ __| |_ | \/ / __| /_\
| _/ '_/ _ \ (_ | '_/ _` | '_ \ ' \| |\/| \__ \/ _ \
|_| |_| \___/\___|_| \__,_| .__/_||_|_| |_|___/_/ \_\
|_|
The easiest way to run ProGraphMSA with the recommended command line parameters
is to use the wrapper script "ProGraphMSA+TR.sh" included in this package. Just run
./ProGraphMSA+TR.sh <file>
or
./ProGraphMSA+TR.sh -o <output_file> <file>
ProGraphMSA+TR will run using ML distances, the WAG model, and will output an
alignment in FASTA format. Further it will use T-REKS from the file T-Reks.jar
to detect tandem repeats. To use TRUST adjust the installation path in
trust2treks.py and run
./ProGraphMSA+TR.sh --custom_tr_cmd trust2treks.py <file>
___ _ _ _
/ __|___ _ __ _ __ __ _ _ _ __| | | (_)_ _ ___
| (__/ _ \ ' \| ' \/ _` | ' \/ _` | | | | ' \/ -_)
\___\___/_|_|_|_|_|_\__,_|_||_\__,_| |_|_|_||_\___|
_
_ __ __ _ _ _ __ _ _ __ ___| |_ ___ _ _ ___
| '_ \/ _` | '_/ _` | ' \/ -_) _/ -_) '_(_-<
| .__/\__,_|_| \__,_|_|_|_\___|\__\___|_| /__/
|_|
USAGE:
./ProGraphMSA [--ancestral_seqs] [--all_trees] [-i <iterations>] [-T] [-I]
[-M] [-m] [-a] [-C <count>] [-F] [--custom_model <file>] [-w]
[-c <file>] [-r] [--custom_tr_cmd <command>] [--trd_output
<filename>] [--read_repeats <T-Reks format output>] [-R] ...
[--repalign] [--repeat_indel_ext <probability>]
[--repeat_indel_rate <rate>] [-A] [-P <distance>] [-p
<distance>] [-D <distance>] [-d <distance>] [-x <distance>]
[-s <probability>] [-l <distance>] [-E <probability>] [-e
<probability>] [-g <rate>] [-f] [--dna] [--codon] [--topology
<newick file>] [-t <newick file>] [-o <filename>] [--]
[--version] [-h] <fasta file>
Tandem-repeat related parameters:
=================================
-R, --repeats
use T-REKS to identify tandem repeats
--custom_tr_cmd <command>
custom command for detecting tandem-repeats
--trd_output <filename>
write TR detector output to file
--read_repeats <T-REKS format output>
read TR detector output from file
--repalign
re-align detected tandem repeat units
--repeat_indel_ext <probability>
repeat indel extension probability
--repeat_indel_rate <rate>
insertion/deletion rate for repeat units (per site)
Guide tree, distances, and substitution model:
==============================================
-i <iterations>, --iterations <iterations>
number of iterations re-estimating guide tree [default: 2]
-m, --mldist
use distances estimated by a Maximum-Likelihood method
-a, --nwdist
estimate initial distance tree from Needleman-Wunsch alignments
-D <distance>, --max_dist <distance>
maximum distance for alignment
-F, --estimate_aafreqs
estimate equilibrium amino acid frequencies from input data
-w, --darwin
use model of evolution from Darwin (GONNET matrix and different indel model parameters, otherwise WAG will be used)
--custom_model <file>
custom substitution model in qmat format
-c <file>, --cs_profile <file>
path to library of context-sensitive profiles (we distribute a copy in the 3rd_party folder)
-A, --no_force_align_m
do not force alignment of initial Methionine
Parameters for adjusting the indel model:
=========================================
-l <distance>, --edge_halflife <distance>
edge half-life (evolutionary distance at which the probability of re-using an unsused graph is halved)
-E <probability>, --end_indel_prob <probability>
probability of mismatching sequence ends (set to -1 to disable this feature)
-e <probability>, --gap_ext <probability>
gap extension probability
-g <rate>, --indel_rate <rate>
insertion/deletion rate
Input/Output:
=============
-f, --fasta
output fasta format (instead of stockholm)
-t <newick file>, --tree <newick file>
initial guide tree
-o <filename>, --output <filename>
Output file name
-I, --input_order
output sequences in input order (default: tree order)
--dna
align DNA sequence
--codon
align DNA sequence based on a codon model
--ancestral_seqs
output all ancestral sequences
<fasta file>
(required) input sequences
___ _ _ _ _ __
| _ )_ _(_) |__| (_)_ _ __ _ / _|_ _ ___ _ __ ___ ___ _ _ _ _ __ ___
| _ \ || | | / _` | | ' \/ _` | | _| '_/ _ \ ' \ (_-</ _ \ || | '_/ _/ -_)
|___/\_,_|_|_\__,_|_|_||_\__, | |_| |_| \___/_|_|_| /__/\___/\_,_|_| \__\___|
|___/
For building ProGraphMSA from source you need:
CMake >=2.8 (http://www.cmake.org)
tclap >=1.1.0 (http://tclap.sourceforge.net)
Eigen 2.0.x or 3.0.x (http://eigen.tuxfamily.org)
on Debian/Ubuntu you can install these programs/libraries with:
sudo apt-get install build-essential cmake libtclap-dev libeigen3-dev
On a Mac, libraries can be installed with Homebrew:
brew install gcc cmake eigen tclap
Note that homebrew installs these to non-default locations, so cmake should be
called with the following additional options:
-D EIGEN_INCLUDE_DIR=/usr/local/include/eigen3 \
-D CMAKE_C_COMPILER=/usr/local/bin/gcc-7 \
-D CMAKE_CXX_COMPILER=/usr/local/bin/c++-7 \
ProGraphML is written with C++11. Using the default Apple compiler is not
recommended due to libc++ incompatibilities.
Then perform the following command to configure/build/install ProGraphMSA:
cd BUILD
cmake .. (press "c" to configure and "g" to generate the Makefile, see below
for additional configuration options)
make ProGraphMSA
make install
An example build script is included (`build_ProGraphMSA.sh`) to illustrate this process.
It can be modified for your particular system.
Additional CMake configuration options (in "ccmake .."):
EIGEN2_INCLUDE_DIR: set this to the path, where Eigen is installed, if you use
Eigen 2.0.x or if Eigen has been installed at a non-default
location (default location: /usr/include/eigen3)
WITH_EIGEN3: set this to ON, if you want to compile ProGraphMSA with
Eigen 3.0.x
CMAKE_CXX_FLAGS: add options for the C++ compiler, like optimization flags,
or additional search paths for include files (-I <path>)
WITH_SSE: disable this option, if you build ProGraphMSA for a machine
that does not support SSE2
_ _ _ _
| || (_)__| |_ ___ _ _ _ _
| __ | (_-< _/ _ \ '_| || |
|_||_|_/__/\__\___/_| \_, |
|__/
ProGraphMSA was written by Adam Szalkowski:
Adam M. Szalkowski and Maria Anisimova. Graph-based modeling of tandem repeats
improves global multiple sequence alignment.
Nucleic Acids Research, Volume 41, Issue 17, 1 September 2013, Page e162,
https://doi.org/10.1093/nar/gkt628
The original source was hosted at Source Forge:
https://sourceforge.net/projects/prographmsa/
A copy of this is also hosted on Adam's github:
https://github.com/butzist/ProGraphMSA
ProGraphMSA is now maintained by the Anisimova group at
https://github.com/acg-team/ProGraphMSA
___ _ _ _ _
/ __|___ _ _| |_ _ _(_) |__ _ _| |_ ___ _ _ ___
| (__/ _ \ ' \ _| '_| | '_ \ || | _/ _ \ '_(_-<
\___\___/_||_\__|_| |_|_.__/\_,_|\__\___/_| /__/
Adam Szalkowski (@butzist) - Original author
Spencer Bliven (@sbliven)