Swift-based vector database for on-device RAG using MLTensor and MLX Embedders

# VecturaKit

VecturaKit is a Swift-based vector database for on-device applications. It stores and retrieves vectors locally, which lets apps build search and retrieval features without a server. Inspired by [Dripfarm's SVDB](https://github.com/Dripfarm/SVDB), **VecturaKit** uses `MLTensor` and [`swift-embeddings`](https://github.com/jkrukowski/swift-embeddings) to generate and manage embeddings. It provides two main modules: `VecturaKit`, which supports different embedding models through `swift-embeddings`, and `VecturaMLXKit`, which uses Apple's MLX framework for accelerated processing.

## Features

-   **On-Device Storage:** Store and manage vector embeddings directly on the device for enhanced privacy and reduced latency.
-   **Hybrid Search:** Combines vector similarity with BM25 text search for more comprehensive and relevant search results (`VecturaKit`).
-   **Batch Processing:** Efficiently add multiple documents in parallel for faster indexing.
-   **Persistent Storage:** Automatically saves and loads document data between app sessions.
-   **Configurable Search:** Customize search results with adjustable thresholds, result limits, and hybrid search weights.
-   **Custom Storage Location:** Specify a custom directory for database storage to suit specific app requirements.
-   **MLX Support:** Utilizes Apple's MLX framework for accelerated embedding generation and search capabilities (`VecturaMLXKit`).
-   **CLI Tool:** Includes a command-line interface for easy database management, testing, and debugging for both `VecturaKit` and `VecturaMLXKit`.
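
As a small illustration of the custom storage location feature, the sketch below prepares a directory that could serve as the database location. It uses only Foundation; the helper name `makeStorageDirectory` and the folder name `VecturaDemo` are hypothetical and not part of VecturaKit.

```swift
import Foundation

/// Builds (and creates, if needed) a folder under Application Support.
/// `subdirectory` is a hypothetical folder name chosen for illustration.
func makeStorageDirectory(subdirectory: String,
                          fileManager: FileManager = .default) throws -> URL {
    // Application Support is a conventional place for app-managed data.
    let base = try fileManager.url(
        for: .applicationSupportDirectory,
        in: .userDomainMask,
        appropriateFor: nil,
        create: true
    )
    let directory = base.appendingPathComponent(subdirectory, isDirectory: true)
    // With intermediate directories enabled this does not fail
    // if the folder already exists.
    try fileManager.createDirectory(
        at: directory,
        withIntermediateDirectories: true,
        attributes: nil
    )
    return directory
}

let storageURL = try makeStorageDirectory(subdirectory: "VecturaDemo")
print(storageURL.path)
```

The resulting URL could then be passed as the `directoryURL` argument of `VecturaConfig` (shown under Usage) in place of `nil`.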

## Supported Platforms

-   macOS 14.0 or later
-   iOS 17.0 or later
-   tvOS 17.0 or later
-   visionOS 1.0 or later
-   watchOS 10.0 or later

## Installation

### Swift Package Manager

To integrate VecturaKit into your project using Swift Package Manager, add the following dependency in your `Package.swift` file:

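```swift
dependencies: [
    .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main"),
],
```

## Dependencies

VecturaKit relies on other Swift packages, including [`swift-embeddings`](https://github.com/jkrukowski/swift-embeddings) for embedding generation.

## Usage

### Core VecturaKit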

1. Import VecturaKit

    ```swift
    import VecturaKit
    ```
2. Create Configuration and Initialize Database

    ```swift
    import Foundation
    import VecturaKit

    let config = VecturaConfig(
        name: "my-vector-db",
        directoryURL: nil,  // Optional custom storage location
        dimension: 384,     // Matches the default BERT model dimension
        searchOptions: VecturaConfig.SearchOptions(
            defaultNumResults: 10,
            minThreshold: 0.7,
            hybridWeight: 0.5,  // Balance between vector and text search
            k1: 1.2,            // BM25 parameters
            b: 0.75
        )
    )

    let vectorDB = try await VecturaKit(config: config)
    ```
3. Add Documents

    Single document:

    ```swift
    let text = "Sample text to be embedded"
    let documentId = try await vectorDB.addDocument(
        text: text,
        id: UUID(),  // Optional, will be generated if not provided
        model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional, this is the default
    )
    ```

    Multiple documents in batch:

    ```swift
    let texts = [
        "First document text",
        "Second document text",
        "Third document text"
    ]
    let documentIds = try await vectorDB.addDocuments(
        texts: texts,
        ids: nil,  // Optional array of UUIDs
        model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional model
    )
    ```
4. Search Documents

    Search by text (hybrid search):

    ```swift
    let results = try await vectorDB.search(
        query: "search query",
        numResults: 5,      // Optional
        threshold: 0.8,     // Optional
        model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional
    )

    for result in results {
        print("Document ID: \(result.id)")
        print("Text: \(result.text)")
        print("Similarity Score: \(result.score)")
        print("Created At: \(result.createdAt)")
    }
    ```

    Search by vector embedding:

    ```swift
    let results = try await vectorDB.search(
        query: embeddingArray,  // [Float] matching config.dimension
        numResults: 5,  // Optional
        threshold: 0.8  // Optional
    )
    ```
5. Document Management

    Update document:

    ```swift
    try await vectorDB.updateDocument(
        id: documentId,
        newText: "Updated text",
        model: .id("sentence-transformers/all-MiniLM-L6-v2")  // Optional
    )
    ```

    Delete documents:

    ```swift
    try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
    ```

    Reset database:

    ```swift
    try await vectorDB.reset()
    ```
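
The `hybridWeight`, `k1`, and `b` options in the configuration above control how hybrid search balances vector similarity against BM25 text relevance. This README does not spell out how the two scores are combined, so the following is only a minimal sketch under stated assumptions: a standard Okapi BM25 score, a naive tokenizer, BM25 scores normalized by their maximum, and a linear blend in which `hybridWeight = 1` means vector similarity only. The function names `tokenize`, `bm25Scores`, and `hybridScore` are hypothetical and not part of the VecturaKit API.

```swift
import Foundation

/// Naive tokenizer (assumption): lowercase, split on non-alphanumerics.
func tokenize(_ text: String) -> [String] {
    text.lowercased()
        .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
        .map(String.init)
}

/// Okapi BM25 score of every document for the given query.
func bm25Scores(query: String, documents: [String],
                k1: Double = 1.2, b: Double = 0.75) -> [Double] {
    let docs = documents.map(tokenize)
    guard !docs.isEmpty else { return [] }
    let averageLength = Double(docs.map(\.count).reduce(0, +)) / Double(docs.count)
    let queryTerms = Set(tokenize(query))

    // Document frequency: how many documents contain each query term.
    var documentFrequency: [String: Int] = [:]
    for term in queryTerms {
        documentFrequency[term] = docs.filter { $0.contains(term) }.count
    }

    return docs.map { doc in
        var score = 0.0
        for term in queryTerms {
            let tf = Double(doc.filter { $0 == term }.count)
            guard tf > 0 else { continue }
            let df = Double(documentFrequency[term] ?? 0)
            let idf = log((Double(docs.count) - df + 0.5) / (df + 0.5) + 1)
            // b controls how strongly longer documents are penalized.
            let lengthNorm = 1 - b + b * Double(doc.count) / averageLength
            // k1 controls how quickly repeated terms saturate.
            score += idf * tf * (k1 + 1) / (tf + k1 * lengthNorm)
        }
        return score
    }
}

/// Linear blend (assumed convention): weight 1 = vector only, 0 = text only.
func hybridScore(vector: Double, text: Double, hybridWeight: Double) -> Double {
    hybridWeight * vector + (1 - hybridWeight) * text
}

let docs = ["swift vector database", "cooking pasta at home", "vector search in swift"]
let textScores = bm25Scores(query: "swift vector", documents: docs)
let maxTextScore = textScores.max() ?? 0
// Scale BM25 into 0...1 so it is comparable with a similarity score.
let normalizedText = textScores.map { maxTextScore > 0 ? $0 / maxTextScore : 0 }
let vectorScores = [0.9, 0.1, 0.8]  // made-up similarity values for illustration
let blended = zip(vectorScores, normalizedText).map { pair in
    hybridScore(vector: pair.0, text: pair.1, hybridWeight: 0.5)
}
print(blended)
```

Under these assumptions, raising `hybridWeight` favors semantic matches, while lowering it favors documents that share exact terms with the query.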

### VecturaMLXKit (MLX Version)

VecturaMLXKit utilizes Apple's MLX framework for accelerated processing, offering optimized performance for on-device machine learning tasks.

1. Import VecturaMLXKit

    ```swift
    import VecturaMLXKit
    ```
2. Initialize Database

    ```swift
    import VecturaMLXKit
    import MLXEmbedders

    let config = VecturaConfig(
        name: "my-mlx-vector-db",
        dimension: 768  // nomic_text_v1_5 model outputs 768-dimensional embeddings
    )
    let vectorDB = try await VecturaMLXKit(config: config, modelConfiguration: .nomic_text_v1_5)
    ```
3. Add Documents

    ```swift
    let texts = [
        "First document text",
        "Second document text",
        "Third document text"
    ]
    let documentIds = try await vectorDB.addDocuments(texts: texts)
    ```
4. Search Documents

    ```swift
    let results = try await vectorDB.search(
        query: "search query",
        numResults: 5,      // Optional
        threshold: 0.8      // Optional
    )

    for result in results {
        print("Document ID: \(result.id)")
        print("Text: \(result.text)")
        print("Similarity Score: \(result.score)")
        print("Created At: \(result.createdAt)")
    }
    ```
5. Document Management

    Update document:

    ```swift
    try await vectorDB.updateDocument(
        id: documentId,
        newText: "Updated text"
    )
    ```

    Delete documents:

    ```swift
    try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
    ```

    Reset database:

    ```swift
    try await vectorDB.reset()
    ```
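
Both modules return a similarity score per result and accept `numResults` and `threshold` when searching. The sketch below shows one way such a ranking could work in plain Swift. It assumes cosine similarity as the metric, which this README does not state, and the helper names `cosineSimilarity` and `rank` are hypothetical rather than library API.

```swift
import Foundation

/// Cosine similarity of two vectors. Returns nil when the dimensions
/// differ, the vectors are empty, or either vector has zero length.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : nil
}

/// Hypothetical in-memory ranking: keep scores at or above `threshold`,
/// sort best first, and return at most `numResults` entries.
func rank(query: [Float], stored: [[Float]],
          numResults: Int, threshold: Float) -> [(index: Int, score: Float)] {
    var scored: [(index: Int, score: Float)] = []
    for (index, vector) in stored.enumerated() {
        guard let score = cosineSimilarity(query, vector),
              score >= threshold else { continue }
        scored.append((index: index, score: score))
    }
    return Array(scored.sorted { $0.score > $1.score }.prefix(max(0, numResults)))
}

// Tiny 3-dimensional vectors stand in for real 384- or 768-dimensional embeddings.
let stored: [[Float]] = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0]]
let hits = rank(query: [1, 0, 0], stored: stored, numResults: 5, threshold: 0.8)
for hit in hits {
    print("index \(hit.index): \(hit.score)")
}
```

The dimension check mirrors the requirement that a query vector match `config.dimension`: a 384-dimensional query cannot be compared against 768-dimensional stored embeddings.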

## Command Line Interface

VecturaKit includes a command-line interface for both the standard and MLX versions, facilitating easy database management.

### Standard CLI Tool

```bash
# Add documents
vectura add "First document" "Second document" "Third document" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Search documents
vectura search "search query" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 5 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Update document
vectura update <document-uuid> "Updated text content" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Delete documents
vectura delete <document-uuid-1> <document-uuid-2> \
  --db-name "my-vector-db" \
  --dimension 384

# Reset database
vectura reset \
  --db-name "my-vector-db" \
  --dimension 384

# Run demo with sample data
vectura mock \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 10 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"
```

Common options:

-   `--db-name`, `-d`: Database name (default: `"vectura-cli-db"`)
-   `--dimension`, `-v`: Vector dimension (default: `384`)
-   `--threshold`, `-t`: Minimum similarity threshold (default: `0.7`)
-   `--num-results`, `-n`: Number of results to return (default: `10`)
-   `--model-id`, `-m`: Model ID for embeddings (default: `"sentence-transformers/all-MiniLM-L6-v2"`)

### MLX CLI Tool

```bash
# Add documents
vectura-mlx add "First document" "Second document" "Third document" --db-name "my-mlx-vector-db"

# Search documents
vectura-mlx search "search query" --db-name "my-mlx-vector-db" --threshold 0.7 --num-results 5

# Update document
vectura-mlx update <document-uuid> "Updated text content" --db-name "my-mlx-vector-db"

# Delete documents
vectura-mlx delete <document-uuid-1> <document-uuid-2> --db-name "my-mlx-vector-db"

# Reset database
vectura-mlx reset --db-name "my-mlx-vector-db"

# Run demo with sample data
vectura-mlx mock --db-name "my-mlx-vector-db"
```

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

## License

VecturaKit is released under the MIT License. See the LICENSE file for more information.