
Commit

Replace null tokens and new lines in imported text
woodwardmw committed Jan 23, 2024
1 parent 61a4233 commit 0bd47b6
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions app/core/vectordb/postgres4langchain.py
@@ -136,6 +136,12 @@ def add_to_collection(self, docs: List[schema.Document], **kwargs) -> None:
         """Loads the document object as per chroma DB formats into the collection"""
         data_list = []
         for doc in docs:
+            doc.text = (doc.text
+                        .replace("\n", " ")
+                        .replace("\r", " ")
+                        .replace("\t", " ")
+                        .replace('\x00', '')
+                        )
             cur = self.db_conn.cursor()
             cur.execute(
                 "SELECT 1 FROM embeddings WHERE source_id = %s", (doc.docId,))
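The six added lines normalise each document's text before it is written to PostgreSQL: line breaks and tabs become spaces, and NUL characters are dropped, since PostgreSQL text values cannot contain the NUL (0x00) character. The sketch below pulls that replace chain out into a standalone function so it can be checked in isolation. `clean_text` and `clean_text_translate` are hypothetical names used only for illustration; they are not part of the repository.

```python
def clean_text(text: str) -> str:
    """Hypothetical helper mirroring the replace chain added to add_to_collection.

    Newlines, carriage returns and tabs each become one space;
    NUL characters are removed entirely.
    """
    return (text
            .replace("\n", " ")
            .replace("\r", " ")
            .replace("\t", " ")
            .replace("\x00", ""))


# Hypothetical alternative: the same mapping applied in a single pass
# with a translation table (None means "delete this character").
_TABLE = str.maketrans({"\n": " ", "\r": " ", "\t": " ", "\x00": None})


def clean_text_translate(text: str) -> str:
    return text.translate(_TABLE)


sample = "line one\r\nline\ttwo\x00"
print(repr(clean_text(sample)))  # → 'line one  line two'
```

Note that a Windows-style `\r\n` pair turns into two spaces, because each character is replaced separately; the committed chain behaves the same way. The `str.translate` variant produces identical output while scanning the string only once.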