The Minish Lab

Solving big problems with small models

Hello, we're minish!

We're a two-person (@pringled and @stephantul) open-source company, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Ingest the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: make tiny models that are still really, really good.
  • potion: the best small model in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate, approximate deduplication for your text datasets.

You can also find us on: 🤗 huggingface 👽 LinkedIn

Pinned

  1. model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python 753 34

  2. semhash Public

    Fast Semantic Text Deduplication

    Python 441 18

  3. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 221 6

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python 38 2

Repositories

Showing 9 of 9 repositories
  • model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python 753 MIT 34 1 1 Updated Jan 25, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python 221 MIT 6 0 0 Updated Jan 24, 2025
  • semhash Public

    Fast Semantic Text Deduplication

    Python 441 MIT 18 0 1 Updated Jan 24, 2025
  • minishlab.github.io
    SCSS 0 MIT 0 0 0 Updated Jan 17, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python 38 MIT 2 1 1 Updated Jan 15, 2025
  • .github Public

    Readme

    0 0 0 0 Updated Jan 5, 2025
  • korok Public

    Lightweight Hybrid Search and Reranking

    Python 7 MIT 1 0 0 Updated Dec 26, 2024
  • watertemplate Public template

    Template

    Makefile 1 MIT 1 0 0 Updated Dec 9, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python 9 MIT 0 0 0 Updated Sep 25, 2024
