MMseqs2 Release 4-0b8cc · soedinglab/MMseqs2

Changes since release 3-be8f6

New features

Alternative alignments in search (--alt-ali). Find alignments by masking out previously found regions in the target sequence.
Added map workflow for fast near-exact mapping of reads
Added easy-linclust workflow, which works directly on FASTA files
Sequences longer than 32k are now supported (the default sequence length limit is now 65535)
createdb now shuffles the order of entries by default (--dont-shuffle to disable). This is useful for database splits, where one split could otherwise take much longer than the others
linclust now supports MPI
linclust adds one hash for the whole sequence, to improve exact sequence matching
New sequence identity computation modes, where the normalization happens on the query or target length instead of alignment length
New --cov-mode that computes the coverage only based on sequence lengths (--cov-mode 3)
search/cluster/linclust workflows have learned --alignment-mode 4 for faster ungapped alignments
Translated search now sorts results by E-value and aggregates all ORFs under the corresponding contig identifier
prefiltering can now sort hits with score > 255 correctly
convertalis now works with profiles
Added generalized database transposition tool swapdb (swapresults only makes sense for prefiltering/alignment results)
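
The sequence identity normalization modes listed above differ only in the denominator. A minimal sketch of that idea, not MMseqs2 code: the function name `sequence_identity` and the `normalize_by` labels are hypothetical, and these notes do not state how each mode is selected on the command line.

```python
def sequence_identity(matches, aln_len, query_len, target_len,
                      normalize_by="alignment"):
    """Fraction of identical positions, normalized by a chosen length.

    normalize_by is a hypothetical label, not an MMseqs2 option:
      "alignment" -> divide by the alignment length
      "query"     -> divide by the query length
      "target"    -> divide by the target length
    """
    lengths = {
        "alignment": aln_len,
        "query": query_len,
        "target": target_len,
    }
    if normalize_by not in lengths:
        raise ValueError(f"unknown normalization: {normalize_by!r}")
    denominator = lengths[normalize_by]
    if denominator <= 0:
        raise ValueError("normalizing length must be positive")
    return matches / denominator


if __name__ == "__main__":
    # Made-up example: 80 identical positions in a 100-column alignment
    # between a 200-residue query and a 160-residue target.
    for mode in ("alignment", "query", "target"):
        print(mode, sequence_identity(80, 100, 200, 160, normalize_by=mode))
```

The same alignment scores 0.8 by alignment length but only 0.4 by query length, so a short local hit in a long sequence is rated much lower when the normalization uses a full sequence length.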

Bug fixes

convertmsa now correctly parses Stockholm files without accession keys
In search, when using splits, the effective limit would be fewer than --max-seqs sequences; the limit is now computed correctly (max-seqs/Splits + 4*sqrt(max-seqs/Splits))
Fix bug in MsaFilter where wrong sequences would be filtered
swapresults will add an empty entry if a target entry has no corresponding query match, instead of no entry at all
createindex now correctly creates a tmp directory if none exists already
Fix query split runs for small input databases
result2stats was reading the wrong first sequence (from query instead of target database)
result2repseq now writes the correct .dbtype file
convertalis now reads the correct dbtype for the target sequence
Fix empty REG_EMPTY bug on macOS
Fix possible memory corruption when searching against database indexed by 'createindex'
Report error if -DHAVE_MPI was set and MPI is not installed on the system
Avoid race condition in kmermatcher (invalid parallel writing to vector)
Fix msa2profile header output format
msa2profile now uses the FASTA read-in mode by default
Target profile databases and databases built with --exact-kmer-matching now correctly extract all k-mers
Fix identical score computation of alignment if clustering using profiles
Nucleotide backtranslation translateaa would produce invalid codons for X
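
The corrected split limit from the fix above (max-seqs/Splits + 4*sqrt(max-seqs/Splits)) can be worked through in a few lines. This is a sketch under assumptions: `per_split_limit` is a hypothetical name, and rounding up to a whole number is an assumption, since these notes do not state how MMseqs2 rounds the value.

```python
import math


def per_split_limit(max_seqs, splits):
    """Limit from the release note: max_seqs/splits + 4*sqrt(max_seqs/splits).

    Rounding up with math.ceil is an assumption of this sketch.
    """
    if max_seqs <= 0 or splits <= 0:
        raise ValueError("max_seqs and splits must be positive")
    share = max_seqs / splits            # even share of hits per split
    margin = 4 * math.sqrt(share)        # extra headroom on top of the share
    return math.ceil(share + margin)


if __name__ == "__main__":
    # 300 / 3 = 100, sqrt(100) = 10, so the limit is 100 + 4 * 10 = 140.
    print(per_split_limit(300, 3))  # prints 140
```

Under this reading, a search with --max-seqs 300 over 3 splits keeps up to 140 hits per split rather than 100, which leaves headroom for a split that holds more than its even share of the best hits.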

Developers

Added CMake flag -DHAVE_GPROF for profiling MMseqs2 using gprof
Fixed most warnings
SSTR does not use stringstreams anymore
Refactored time measuring
Debug::INFO/WARNING/ERROR is now used consistently across the codebase
If available, shellcheck (https://github.com/koalaman/shellcheck) will critique shell scripts and fail the compilation