MMseqs2 Release 2-23394
martin-steinegger released this on 05 Mar 16:28
Changes since 1-c7a89 Release
New Features
- Translated searches (blastx- and tblastn-like search modes)
- Improved splitting of input sequences in `kmermatcher` (less memory needed for `linclust`)
- `linclust` supports nucleotide sequences (experimental feature, k-mer length is not yet optimized)
- `search` supports nucleotide-nucleotide searches (preview, not stable yet)
- `pssm2profile` module to print human-readable profiles
- `msa2profile` has a gap match mode to convert multiple sequence alignments without a representative sequence into profile databases
- Compute sequence identity in a similar way to BLAST if `--alignment-mode 3` is used
- `apply` module to execute an arbitrary program on each entry of an MMseqs2 database, like map in MapReduce
- `extractorf` can use start/stop codons from alternative translation tables
- `filterdb` can now append entries from other databases by looking them up
- `proteinaln2nucl` maps a protein alignment back to its original nucleotide sequences
- `taxonomy` can now blacklist nodes (by default, the "unclassified" and "other" nodes)
- The tmp folder is created automatically; all intermediate workflow results are placed in a subfolder whose name is based on a hash of all paths and parameters
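With `--alignment-mode 3`, sequence identity is computed in a similar way to BLAST. The sketch below illustrates the usual BLAST-style definition under our own assumptions: identical columns divided by the number of alignment columns, with gap columns included. It is not MMseqs2's code; the helper `blast_like_identity` and its gapped-string input are hypothetical.

```python
def blast_like_identity(aln_query: str, aln_target: str) -> float:
    """Fraction of identical columns over all alignment columns.

    Hypothetical helper: takes two gapped alignment strings of equal length,
    using '-' as the gap character. Gap columns count toward the length,
    which is what makes the ratio BLAST-like.
    """
    if len(aln_query) != len(aln_target):
        raise ValueError("aligned strings must have equal length")
    columns = 0
    identical = 0
    for q, t in zip(aln_query, aln_target):
        if q == "-" and t == "-":
            continue  # skip all-gap columns, they carry no information
        columns += 1
        if q == t:
            identical += 1  # q == t here implies neither side is a gap
    return identical / columns if columns else 0.0


if __name__ == "__main__":
    print(round(blast_like_identity("ACD-EF", "ACDGEY"), 4))
```

Running the file prints 0.6667 (4 identical columns out of 6). Because gap columns count toward the denominator, gaps lower the reported identity, unlike normalizing by the length of one of the two sequences.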
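Translated searches (the blastx-like mode) compare a nucleotide query against proteins by translating it in all six reading frames. The sketch below shows only that translation step, using the standard genetic code (NCBI translation table 1). The function names are hypothetical and this is not MMseqs2's internal implementation.

```python
# Standard genetic code (NCBI translation table 1); codons are ordered by
# first, second and third base, each iterating over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (input is upper-cased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2).

    Codons containing anything other than A/C/G/T become 'X';
    a trailing partial codon is dropped.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    rc = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


if __name__ == "__main__":
    print(six_frame_translation("ATGGCC"))
```

Running it prints `['MA', 'W', 'G', 'GH', 'A', 'P']`. Replacing `AMINO_ACIDS` with the 64-letter string of another NCBI translation table (same TCAG codon ordering) yields an alternative genetic code.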
Performance Regressions Fixed
- Fixed a regression when multiple mmseqs instances were running at the same time
Breaking Command Line Interface Changes
- Incremented the index version; old precomputed indices have to be regenerated
- New profile format; databases generated through `convertprofiledb` and `msa2profile` have to be regenerated
- The clustering workflow is now cascaded by default. We replaced the `--cascaded` flag with `--single-step-clustering`
- Max sequence length of 32768 is now actually validated and enforced
- Each sequence database now has a dbtype file (AA=0, NUC=1, PROFILE=2)
- `extractorf` was reworked:
  * `--skip-incomplete` was split into two parameters, `--contig-start-mode` and `--contig-end-mode`
  * `--longest-orf` was reworked into `--orf-start-mode`
  * removed the `--extend-min` parameter
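Each sequence database now has a dbtype file (AA=0, NUC=1, PROFILE=2), and the maximum sequence length of 32768 is enforced. The sketch below decodes such a type code and performs a length check. The on-disk layout (a single little-endian 32-bit integer), whether a length of exactly 32768 is allowed, and all names are assumptions made for illustration only.

```python
import struct
from enum import IntEnum

# Maximum length from the release notes; treating it as inclusive is an ASSUMPTION.
MAX_SEQ_LEN = 32768


class DbType(IntEnum):
    """Database type codes listed in the release notes."""
    AA = 0
    NUC = 1
    PROFILE = 2


def parse_dbtype(raw: bytes) -> DbType:
    """Decode a dbtype payload.

    ASSUMPTION: the file stores one little-endian signed 32-bit integer.
    Raises ValueError for codes outside AA/NUC/PROFILE.
    """
    (code,) = struct.unpack("<i", raw[:4])
    return DbType(code)


def check_sequence_length(seq: str) -> None:
    """Reject sequences longer than MAX_SEQ_LEN."""
    if len(seq) > MAX_SEQ_LEN:
        raise ValueError(f"sequence length {len(seq)} exceeds {MAX_SEQ_LEN}")


if __name__ == "__main__":
    print(parse_dbtype(b"\x01\x00\x00\x00").name)
```

Mapping the raw integer onto an enum makes an unknown or stale type code fail loudly instead of silently being treated as a protein database.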
Others
- The clustering workflow is four times faster
- Improve speed of `linclust` by a factor of two
- Remove 'X' from prefilter index (reduces memory and improves speed at the same sensitivity)
- Fix bugs in query coverage mode (`--cov-mode 2`)
- Clustering results are now the same between the single- and multi-threaded versions
- Speed up `kmermatcher`
- Fix bug in `clusthash`; it can now cluster at 1.0 sequence identity
- Improve target profile search; set max-seqs to infinite for alignments
- Improve speed of `align` if the prefilter result fits into memory
- Many usability improvements
- Improved bash completion suggestions
- Expert modules are hidden by default; use the `-h` flag to show everything
- Speed up `mergeclusters` considerably
- Fix sequence identity printout bug when the identity is below 10%
- The MPI `RUNNER` variable can now correctly contain additional parameters (previously `RUNNER="mpirun -np 4"` did not work)
- Enforce GCC 4.6 compatibility in our continuous integration
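Query coverage mode (`--cov-mode 2`) filters alignments by the fraction of the query that the alignment spans. The sketch below computes that fraction and applies a threshold. The `Alignment` record, its 1-based inclusive coordinates, and the function names are hypothetical assumptions, not MMseqs2's data structures.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    # Hypothetical record; query coordinates are 1-based and inclusive.
    target_id: str
    q_start: int
    q_end: int
    q_len: int


def query_coverage(aln: Alignment) -> float:
    """Fraction of query residues spanned by the alignment."""
    return (aln.q_end - aln.q_start + 1) / aln.q_len


def filter_by_query_coverage(alns: list[Alignment], min_cov: float) -> list[Alignment]:
    """Keep alignments whose query coverage is at least min_cov (query-coverage style)."""
    return [a for a in alns if query_coverage(a) >= min_cov]


if __name__ == "__main__":
    hits = [Alignment("t1", 1, 80, 100), Alignment("t2", 11, 50, 100)]
    print([a.target_id for a in filter_by_query_coverage(hits, 0.5)])
```

Only the query length appears in the denominator, so a short target fully contained in a long query can still fail the filter.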
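Clustering at 1.0 sequence identity amounts to grouping exact duplicates. A minimal sketch of hash-based duplicate clustering: bucket sequences by a hash (CRC32 here, chosen only for illustration), then confirm exact equality within each bucket so that hash collisions never merge different sequences. The function name and the first-seen-member-as-representative rule are assumptions, not MMseqs2's `clusthash` implementation.

```python
import zlib
from collections import defaultdict


def cluster_identical(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Map representative id -> ids of all sequences identical to it."""
    buckets = defaultdict(list)  # crc32 value -> representative ids with that hash
    clusters: dict[str, list[str]] = {}
    for seq_id, seq in seqs.items():
        h = zlib.crc32(seq.encode())
        for rep in buckets[h]:
            if seqs[rep] == seq:  # confirm true identity; CRC32 values can collide
                clusters[rep].append(seq_id)
                break
        else:
            # No identical sequence seen yet: this id becomes a new representative.
            buckets[h].append(seq_id)
            clusters[seq_id] = [seq_id]
    return clusters


if __name__ == "__main__":
    db = {"a": "MKV", "b": "MKL", "c": "MKV", "d": "MKL", "e": "MAA"}
    print(cluster_identical(db))
```

The explicit equality check is what guarantees 100% identity within a cluster; relying on the hash alone would occasionally merge distinct sequences.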
Developers
- MMseqs2 can now be included in other projects as a subproject in framework mode
- DBReader has a SHUFFLE mode