29 Nov 23:43

martin-steinegger

4e23d5f

MMseqs2 Release 7-4e23d

Changes since release 6-f5a1c

New features

Simplified taxonomy. We added the tool createtaxdb to create taxonomically annotated databases. Result databases can be filtered by taxonomy with filtertaxdb, and addtaxonomy appends taxonomy information to result databases
index (createindex) support for translated target database searches
add nucleotide search (experimental)
support NEON CPU architecture (experimental)
improve performance of prefilter if L2 is greater than 256K
easy-search automatically computes backtrace if requested by --format-output
Create search-2m workflow, similar to 2bLCA but without the LCA computation
We add a database preload mode (0: auto, 1: fread, 2: mmap, 3: mmap+touch). Processing time per query is 15% faster with fread, but the read-in is slower. mmap is used for the MMseqs2 webserver; it enables instant searches if the database is already in memory. mmap+touch uses mmap and touches every page.
We add a new tool touchdb, which loads the database into memory. This can be useful for --db-load-mode 2.
add local hard disk support (--local-tmp) for MPI runs. This reduces pressure on the NFS
Introduce sortresult tool to sort an unordered sequence db (e.g. from mergeresult)
prefilter now supports indexes with k-mer ranges > 2^31
convertkb can read multiple files
speed up mmap memory touch function
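
The database preload modes listed above can be illustrated with a small sketch. This is not MMseqs2 code: the names DB_LOAD_MODES, touch_pages and make_demo_file are hypothetical, and the function only shows the general "mmap, then touch every page" idea by reading one byte per memory page of a mapped file.

```python
import mmap
import os
import tempfile

# Mode numbers and labels as given in the release note; the dict name is hypothetical.
DB_LOAD_MODES = {0: "auto", 1: "fread", 2: "mmap", 3: "mmap+touch"}


def touch_pages(path):
    """Map a file read-only and read one byte per page so the OS loads it.

    Returns the number of pages touched. Illustrative only; it does not
    reproduce what MMseqs2 does internally.
    """
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    touched = 0
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for offset in range(0, len(mapped), mmap.PAGESIZE):
                _ = mapped[offset]  # reading one byte faults the page in
                touched += 1
    return touched


def make_demo_file(num_bytes):
    """Write num_bytes of zeros to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * num_bytes)
        return tmp.name
```

Calling touch_pages on a database file before a search would, under these assumptions, leave its pages in the operating system's page cache so that a later mmap of the same file does not have to wait for disk reads.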

Breaking changes

new index version. Recomputation of old indexes is needed
--format-output is now comma separated
changed taxonomy database format, old taxonomy databases are not supported anymore
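
Because --format-output now takes a comma-separated value, scripts that assemble the argument from a list of columns need to join with commas. A minimal sketch, assuming a hypothetical helper format_output_value; the column names in the example call are placeholders and should be checked against the convertalis help text.

```python
def format_output_value(columns):
    """Join column names into one comma-separated --format-output value."""
    cleaned = [name.strip() for name in columns]
    for name in cleaned:
        # A comma or whitespace inside a name would change how the value splits.
        if not name or "," in name or any(ch.isspace() for ch in name):
            raise ValueError("invalid column name: %r" % name)
    return ",".join(cleaned)


# Placeholder column names, used only for illustration.
example_value = format_output_value(["query", "target", "evalue"])
```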

Default parameter change

extractorfs default is now --orf-start-mode 1. This is important for translated searches in organisms with introns.

Bug fixes

Fix wrong alignment positions for translated searches
Fix off-by-one error in extractalignedregion
Fix bug in NcbiTaxonomy tool
Fix e-value threshold if -e < --e-profile

Developer

Update to newest ALP version

09 Oct 01:40

martin-steinegger

6-f5a1c

f5a1cdb

MMseqs2 Release 6-f5a1c

Changes since release 5-9375b

New features

Support user defined output format in convertalis.
Add parameters for gap open and gap extension costs.
Improve substitution matrix support. Letters of the alphabet can now be chosen freely.
Add a few PAM matrices to the data folder. Choose them with the --sub-mat parameter.
Support IUPAC codes in translated search.
Add parameter to define a spaced k-mer pattern.
Add a new module ungappedprefilter. It computes an optimal ungapped score using a vectorized algorithm.

Bug fixes

Fix easy-linclust parameter parsing issue.
Fix coverage filtering in align when the parameter --realign is set.
Fix sequence identity computation in rescorediagonal --rescore-mode 2.
Fix apply MPI support.
Fix representative sequence output bug in result2repseq.
Fix possible MPI issues in modules creating symlinks.
Fix slightly wrong E-value computed in alignall module.

Known Issues

easy-search output has only one column. Workaround: Add parameter --format-output "".

04 Sep 23:20

martin-steinegger

5-9375b

9375baf

MMseqs2 Release 5-9375b

Changes since release 4-0b8cc

Bug fixes

bool flag parameters (e.g. -a) work again
swapresults will deterministically rank results
shellcompletion does not report run time anymore

04 Sep 15:31

martin-steinegger

4-0b8cc

0b8ccee

MMseqs2 Release 4-0b8cc

Changes since release 3-be8f6

New features

Alternative alignments in search (--alt-ali). Find alignments by masking out previously found regions in the target sequence.
Added map workflow for fast near-exact mapping of reads
Added easy-linclust workflow, that works on FASTA files
Sequence lengths longer than 32k are now supported (default sequence length limit is now 65535)
createdb shuffles the order of entries by default (--dont-shuffle to disable), useful for database splits, where one split could take much longer than others
linclust now supports MPI
linclust adds one hash for the whole sequence, to improve exact sequence matching
New sequence identity computation modes, where the normalization happens on the query or target length instead of alignment length
New --cov-mode that computes the coverage only based on sequence lengths (--cov-mode 3)
search/cluster/linclust workflows have learned --alignment-mode 4 for faster ungapped alignments
Translated search now sorts results by E-value and aggregates all ORFs under the corresponding contig identifier
prefiltering can now sort hits with score > 255 correctly
convertalis now works with profiles
Added generalized database transposition tool swapdb (swapresults only makes sense for prefiltering/alignment results)

Performance

Speedup extractorf with vectorization
Many performance improvements to reduce overhead for web server mode
createtsv writes output in parallel
Avoid many unnecessary memory allocations in various modules

Bug fixes

convertmsa now correctly parses STOCKHOLM files without accession keys
In search, when using splits, the limit could end up below --max-seqs sequences; the limit is now computed correctly (max-seqs/Splits + 4*sqrt(max-seqs/Splits))
Fix bug in MsaFilter where wrong sequences would be filtered
swapresults will add an empty entry if a target entry has no corresponding query match, instead of no entry at all
createindex now correctly creates a tmp directory if none exists already
Fix query split runs for small input databases
result2stats was reading the wrong first sequence (from query instead of target database)
result2repseq now writes the correct .dbtype file
convertalis now reads the correct dbtype for the target sequence
Fix empty REG_EMPTY bug on macOS
Fix possible memory corruption when searching against database indexed by 'createindex'
Report error if -DHAVE_MPI was set and MPI is not installed on the system
Avoid race condition in kmermatcher (invalid parallel writing to vector)
Fix msa2profile header output format
msa2profile uses the FASTA readin mode by default now
Target profile databases and databases built with --exact-kmer-matching now correctly extract all k-mers
Fix identical score computation of alignment if clustering using profiles
Nucleotide backtranslation translateaa would produce invalid codons for X
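
The split limit from the search fix above, max-seqs/Splits + 4*sqrt(max-seqs/Splits), can be worked through numerically. The sketch below only evaluates that expression; the function name per_split_limit is hypothetical, and since the note does not say how MMseqs2 rounds the result, the value is returned as a float.

```python
import math


def per_split_limit(max_seqs, splits):
    """Evaluate max_seqs/splits + 4*sqrt(max_seqs/splits) from the release note."""
    if max_seqs < 0 or splits < 1:
        raise ValueError("max_seqs must be >= 0 and splits must be >= 1")
    share = max_seqs / splits  # even share of the result limit per split
    return share + 4 * math.sqrt(share)  # square-root term adds headroom


# With --max-seqs 300 and 3 splits: 100 + 4 * 10 = 140 per split.
example_limit = per_split_limit(300, 3)
```

With these numbers each split may keep up to 140 hits rather than an even share of 100, so a split that happens to contain more of the best hits is less likely to cut them off.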

Others

removed --early-exit
Output name of program called

Experimental new modules

new fast alignment method alignbykmer

Developers

Cmake flag -DHAVE_GPROF for profiling MMseqs2 using gprof
Fixed most warnings
SSTR does not use stringstreams anymore
Refactored time measuring
Debug::INFO/WARNING/ERROR is now used consistently across the codebase
If available, shellcheck (https://github.com/koalaman/shellcheck) will critique shell scripts and fail the compilation

28 May 08:11

martin-steinegger

3-be8f6

be8f616

MMseqs2 Release 3-be8f6

Changes since 2-23394 Release

New Features

Create simple workflows (FASTA/FASTQ in, flat file out) for clustering (easy-cluster) and searching (easy-search)
Add a new greedy incremental clustering algorithm to the clust module, which needs less memory
Make the new low memory clustering algorithm default if --cov-mode 1 is used in linclust and cluster
Add alignall module for all-against-all alignments of e.g. clusters
Improved Windows support
filterdb learned new modes

Bug fixes

Fix wrong merging code in linclust
Fix e-value issues in target-split case
Fix seg. fault in rescore diagonal if 'z' is used
Fix seg. fault when using masking in kmermatcher
Fix wrong filterdb default mode
prefilter overestimated the required amount of memory and refused to run
prefilter scores would saturate too early, now they have the full 2^16 range

Others

Profile searches now produce fewer high-scoring false positives through better compositional bias correction and masking of low complexity regions of profiles
Clustering now supports the whole 2^32 range instead of the previous 2^31
Speed up clustering when using --cov-mode 1
Rework symlinks to the header databases
Support profiles on query and target side in result2profile

05 Mar 16:28

martin-steinegger

2-23394

2339462

MMseqs2 Release 2-23394

Changes since 1-c7a89 Release

New Features

Translated searches (blastx and tblastn like search modes)
Improved splitting of input sequences in kmermatcher (less memory needed for linclust)
linclust supports nucleotide sequences (experimental feature, k-mer length is not yet optimized)
search supports nucleotide-nucleotide searches (preview, not stable yet)
pssm2profile module to print human readable profiles
msa2profile has a gap match mode to convert multiple sequence alignments without representative sequence to profile databases
Compute sequence identity in a similar way to BLAST if --alignment-mode 3 is used
apply module to execute an arbitrary program on each entry of an mmseqs database. Like map from MapReduce.
extractorf can use start/stop codons from alternative translation tables
filterdb now can append entries from other databases by looking them up
proteinaln2nucl maps a protein alignment back to its original nucleotide sequences
taxonomy now can blacklist nodes (per default the unclassified and others nodes)
tmp folder is automatically created, all workflow intermediate results are placed in a subfolder based on the hash of all paths and parameters

Performance Regressions Fixed

Fixed regression when multiple mmseqs instances were running at the same time

Breaking Command Line Interface Changes

Incremented index version, old precomputed indices have to be regenerated
New Profile format, databases generated through convertprofiledb and msa2profile have to be regenerated
Clustering workflow is now by default cascaded. We replaced the --cascaded flag with --single-step-clustering
Max sequence length of 32768 is now actually validated and enforced
Each sequence database has now a dbtype file (AA=0, NUC=1, PROFILE=2)
extractorf was reworked:
* --skip-incomplete was split into two parameters --contig-start-mode and --contig-end-mode
* --longest-orf was reworked into --orf-start-mode
* removed --extend-min parameter
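
The dbtype codes listed above (AA=0, NUC=1, PROFILE=2) can be written as a small lookup. This is a sketch only: the names DbType and describe_dbtype are hypothetical, and the on-disk encoding of the dbtype file is not described in these notes, so no file reading is attempted.

```python
from enum import IntEnum


class DbType(IntEnum):
    """Sequence database types with the numeric codes from the release note."""
    AA = 0
    NUC = 1
    PROFILE = 2


def describe_dbtype(code):
    """Return the type name for a numeric code, or 'unknown' if unlisted."""
    try:
        return DbType(code).name
    except ValueError:
        return "unknown"
```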

Others

Four times faster clustering workflow
Improve speed of linclust by a factor of two
Remove 'X' from prefilter index (reduces memory and improves speed at the same sensitivity)
Fix bugs for Query coverage mode (--cov-mode 2)
Clustering is now the same between single and multi threaded version
Speedup of kmermatcher
Fix bug in Clust hash. It can now cluster to 1.0 sequence identity
Improve target profile search, set max-seqs to infinite for alignments.
Improve speed of align if prefilter result fit into memory
Many usability improvements
Improved suggestions of bash completion
Expert modules are hidden by default, use -h flag to show everything
Speed up mergeclusters by a lot
Fix sequence identity print out bug if the id is less than 10%
MPI Runner variable can now correctly contain further parameters (RUNNER="mpirun -np 4" was not working)
Enforcing GCC 4.6 compatibility in our continuous integration

Developers

MMseqs2 can now be included in framework mode to subprojects
DBReader has a SHUFFLE mode

29 Oct 10:04

martin-steinegger

1-c7a89

c7a89b6

MMseqs2 Release 1-c7a89

Changes since vNatBiotech Release

New Features

Taxonomy classification workflow with robust 2bLCA computation and fast LCA computation in O(N log N)
Support reading .bz2 archives for createdb
createdb can turn multiple FASTA files into one database now
Extend prefilter score range to improve order of best hits after prefiltering.
Automatically split input sequence set based on system RAM in kmermatcher. Linclust can now run with less memory.
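
For readers unfamiliar with the LCA (lowest common ancestor) mentioned in the taxonomy workflow above, the sketch below shows what is being computed on a toy tree. It is a naive parent-pointer walk, not the O(N log N) method or the 2bLCA procedure from the release; the function name and the node numbers are hypothetical and are not real taxon identifiers.

```python
def lowest_common_ancestor(parent, nodes):
    """Return the deepest node that is an ancestor of every node in `nodes`.

    `parent` maps each node to its parent; the root maps to itself.
    A node counts as its own ancestor.
    """
    def path_to_root(node):
        path = [node]
        while parent[node] != node:
            node = parent[node]
            path.append(node)
        return path

    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    # Ordered from the first node up to the root.
    common = path_to_root(nodes[0])
    for node in nodes[1:]:
        ancestors = set(path_to_root(node))
        common = [n for n in common if n in ancestors]
    return common[0]  # first survivor is the deepest shared ancestor


# Toy tree: 1 is the root; 2 and 3 are its children; 4 and 5 sit under 2; 6 under 3.
toy_parent = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}
```

In a taxonomy setting the nodes would be taxa and the query's hits would supply the node list, so hits spread over distant lineages resolve to a higher, less specific rank.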

Performance Regressions Fixed

Fixed underperforming iterative-sequence-profile search without a precomputed index table

Breaking Command Line Interface Changes

Iterative-non-profile-search --sens-step-size changed to --sens-steps (Number of Iterations) (Does not break nested workflows anymore)

Others

Query coverage mode (--cov-mode 2) for searching
Clustering is now the same between single and multi threaded version
Bug fixes in rescorediagonal
Speedup of kmermatcher
Speedup and memory reduction of swapresults
Many usability improvements

Developers

MMseqs2 can now be included in framework mode to subprojects

08 Aug 16:06

milot-mirdita

vNatBiotech

50ecd38

Nature Biotechnology Release

Release for Nature Biotechnology

Releases: soedinglab/MMseqs2
