feat: knowledge - re-use embeddings on document layer (copy embeddings from docs with same content) #444

Merged

iwilltry42
Contributor

@iwilltry42 iwilltry42 commented Feb 14, 2025

This PR: re-use existing embeddings on a per-document (chunk) basis when the content matches a document already in the DB.
The embedding model used for those existing embeddings must also match.

Follow-up: re-use existing documents on a per-file basis by comparing file checksums.

Flow

  1. User A ingests file.txt into dataset foo using text-embedding-3-large -> embeddings get generated
  2. User B ingests file.txt into dataset bar using text-embedding-3-large -> embeddings from documents in dataset foo get re-used -> no embeddings get generated
  3. User C ingests file.txt into dataset spam using text-embedding-3-small -> embeddings get generated
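
The flow above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `EmbeddingStore`, `EmbedFunc`, and `GetOrEmbed` are hypothetical names, the map is an in-memory stand-in for the DB, and matching content through a SHA-256 hash is an assumption. Datasets are left out because re-use depends only on the model and the chunk content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// cacheKey identifies an embedding by the model that produced it and a
// hash of the chunk content. Both must match for an embedding to be re-used.
type cacheKey struct {
	model       string
	contentHash string
}

// EmbedFunc is a hypothetical stand-in for a call to an embedding provider.
type EmbedFunc func(model, content string) []float32

// EmbeddingStore is a hypothetical in-memory stand-in for the DB lookup.
type EmbeddingStore struct {
	embeddings map[cacheKey][]float32
	Generated  int // how many times the provider was actually called
}

// NewEmbeddingStore returns an empty store.
func NewEmbeddingStore() *EmbeddingStore {
	return &EmbeddingStore{embeddings: make(map[cacheKey][]float32)}
}

// hashContent returns the hex-encoded SHA-256 digest of the chunk content.
func hashContent(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// GetOrEmbed returns a copy of a stored embedding when one exists for the
// same model and content (reused == true). Otherwise it calls embed, stores
// a copy of the result, and returns it (reused == false).
func (s *EmbeddingStore) GetOrEmbed(model, content string, embed EmbedFunc) (vec []float32, reused bool) {
	key := cacheKey{model: model, contentHash: hashContent(content)}
	if existing, ok := s.embeddings[key]; ok {
		return append([]float32(nil), existing...), true
	}
	vec = embed(model, content)
	s.embeddings[key] = append([]float32(nil), vec...)
	s.Generated++
	return vec, false
}
```

With this sketch, ingesting the same chunk twice with `text-embedding-3-large` calls the provider once, while a third ingest with `text-embedding-3-small` misses the cache and generates a new embedding.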

Issue obot-platform/obot#1799

@iwilltry42 iwilltry42 force-pushed the feat/knowledge-embeddings-cache branch from ffc4d96 to 42b4326 Compare February 18, 2025 10:15
@iwilltry42 iwilltry42 force-pushed the feat/knowledge-embeddings-cache branch from 23fb895 to 8ebb151 Compare February 18, 2025 10:21
@iwilltry42 iwilltry42 changed the title feat: re-use embeddings on document layer (copy embeddings from docs with same content) feat: knowledge - re-use embeddings on document layer (copy embeddings from docs with same content) Feb 20, 2025
@iwilltry42 iwilltry42 marked this pull request as ready for review February 20, 2025 17:02
@iwilltry42 iwilltry42 merged commit 73689ed into obot-platform:main Feb 21, 2025
2 checks passed
@iwilltry42 iwilltry42 deleted the feat/knowledge-embeddings-cache branch February 21, 2025 21:29
3 participants