MMseqs2 Release 4-0b8cc
martin-steinegger released this 04 Sep 15:31
Changes since release 3-be8f6
New features
- Alternative alignments in search (`--alt-ali`). Find alignments by masking out previously found regions in the target sequence
- Added `map` workflow for fast near-exact mapping of reads
- Added `easy-linclust` workflow, which works on FASTA files
- Sequence lengths longer than 32k are now supported (default sequence length limit is now 65535)
- `createdb` shuffles the order of entries by default (`--dont-shuffle` to disable), useful for database splits, where one split could take much longer than the others
- `linclust` now supports MPI
- `linclust` adds one hash for the whole sequence, to improve exact sequence matching
- New sequence identity computation modes, where the normalization happens on the query or target length instead of the alignment length
- New `--cov-mode` that computes the coverage based only on sequence lengths (`--cov-mode 3`)
- `search`/`cluster`/`linclust` workflows have learned `--alignment-mode 4` for faster ungapped alignments
- Translated `search` now sorts results by E-value and aggregates all ORFs under the corresponding contig identifier
- `prefiltering` can now sort hits with score > 255 correctly
- `convertalis` now works with profiles
- Added generalized database transposition tool `swapdb` (`swapresults` only makes sense for prefiltering/alignment results)
Performance
- Speed up `extractorf` with vectorization
- Many performance improvements to reduce overhead for web server mode
- `createtsv` writes output in parallel
- Avoid many unnecessary memory allocations in various modules
Bug fixes
- `convertmsa` now correctly parses STOCKHOLM files without accession keys
- In `search`, when using splits, fewer than `--max-seqs` sequences would be the limit; the limit is now correctly computed as (max-seqs/Splits + 4*sqrt(max-seqs/Splits))
- Fix bug in MsaFilter where wrong sequences would be filtered
- `swapresults` will add an empty entry if a target entry has no corresponding query match, instead of no entry at all
- `createindex` now correctly creates a tmp directory if no directory exists already
- Fix query split runs for small input databases
- `result2stats` was reading the wrong first sequence (from query instead of target database)
- `result2repseq` now writes the correct `.dbtype` file
- `convertalis` now reads the correct `dbtype` for the target sequence
- Fix empty REG_EMPTY bug on macOS
- Fix possible memory corruption when searching against a database indexed by `createindex`
- Report error if -DHAVE_MPI was set and MPI is not installed on the system
- Avoid race condition in `kmermatcher` (invalid parallel writing to vector)
- Fix `msa2profile` header output format
- `msa2profile` uses the FASTA read-in mode by default now
- Target profile databases and databases built with `--exact-kmer-matching` now correctly extract all k-mers
- Fix identical score computation of alignment if clustering using profiles
- Nucleotide backtranslation `translateaa` would produce invalid codons for X
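
The corrected limit for split searches, max-seqs/Splits + 4*sqrt(max-seqs/Splits), can be worked through with a short sketch. The function name is hypothetical, and rounding the result up to an integer is an assumption; the release notes give only the formula.

```python
import math


def split_result_limit(max_seqs, splits):
    """Evaluate max_seqs/splits + 4*sqrt(max_seqs/splits).

    Rounding up with math.ceil is an assumption made for this sketch;
    the release notes do not state how the value is rounded.
    """
    if max_seqs < 0 or splits < 1:
        raise ValueError("max_seqs must be >= 0 and splits must be >= 1")
    per_split = max_seqs / splits
    return math.ceil(per_split + 4 * math.sqrt(per_split))


# Example: --max-seqs 300 with 3 splits gives 100 + 4 * 10 = 140.
print(split_result_limit(300, 3))
```

The square-root term adds headroom above the even share max-seqs/Splits, so a split that holds more than its share of the best hits is less likely to cut them off.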
Others
- removed `--early-exit`
- Output name of program called
Experimental new modules
- new fast alignment method `alignbykmer`
Developers
- CMake flag `-DHAVE_GPROF` for profiling MMseqs2 using gprof
- Fixed most warnings
- SSTR does not use stringstreams anymore
- Refactored time measuring
- `Debug::INFO/WARNING/ERROR` is now used consistently across the codebase
- If available, [shellcheck](https://github.com/koalaman/shellcheck) will critique shell scripts and fail the compilation