- src/: Source code for the project.
  - record_manager_embedding.py: Module for document processing and embedding management using a record manager.
  - path_embedding.py: Module for document processing and embedding management by updating a JSON file of documents that have already been embedded.
  - query.py: Module for creating and managing conversational retrieval chains to fetch relevant information.
  - app.py: Module for creating the Streamlit chatbot interface.
- examples/: Example scripts demonstrating the usage of GENAI.
  - example.py: A sample script showing the model's output for a given query.
- docs/: HTML documentation for the library.
- embedding_models/: Open-source text embedding models from HuggingFace.
  - baai/: Models from BAAI.
  - sentence-transformers/: Models from Sentence Transformers.
- tests/: Test suite for the GENAI library.
- requirements.txt: Lists all the Python dependencies for the project.
- raw/: Directory where raw source documents reside.
- chroma_index/: Directory of the persisted Chroma vectorstore.
- archive/: Notebooks used for testing.
Clone the repository and install the dependencies:

```bash
git clone https://github.com/fwenyin/GenAI.git
cd GenAI
pip install -r requirements.txt
```
To embed new documents, add them to the raw/ folder and run the path_embedding.py or record_manager_embedding.py module (see the sketch after these notes).
- If a vectorstore already exists, make sure the same embedding function is used; otherwise, replace the pre-existing vectorstore.
- The current chroma_index was created with HuggingFace embeddings via the path_embedding.py module; the archived chroma_index was created with OpenAI embeddings.
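For orientation, here is a minimal sketch of the kind of indexing step these modules perform, written against the classic LangChain API. The loader, splitter settings, and model name are assumptions for illustration, not the project's actual configuration:

```python
# Illustrative only: embed documents from raw/ into a persisted Chroma index.
# Assumes the classic LangChain API; the project's own modules may differ.
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load raw source documents and split them into chunks for embedding.
documents = DirectoryLoader("raw/").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(documents)

# Embed the chunks with a HuggingFace model (model name is an assumption)
# and persist the resulting vectorstore to chroma_index/.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_index")
vectorstore.persist()
```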
To query the existing vectorstore, use the QueryHandler class in query.py; an example of its use is in examples/example.py.
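Since query.py is described as managing conversational retrieval chains, the pattern it wraps looks roughly like the following sketch against the classic LangChain API. The LLM choice, model name, and question are assumptions, not the project's exact setup:

```python
# Illustrative only: a conversational retrieval chain over the persisted index.
# QueryHandler in query.py wraps this kind of logic; details here are assumed.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Reopen the persisted Chroma index with the same embedding function used to build it.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
retriever = Chroma(persist_directory="chroma_index", embedding_function=embeddings).as_retriever()

# Build the chain and ask a question, passing prior turns as chat history.
# ChatOpenAI requires an OpenAI API key in the environment.
chain = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), retriever=retriever)
result = chain({"question": "What do the raw documents say about this topic?", "chat_history": []})
print(result["answer"])
```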
To run the Streamlit chatbot interface, run:

```bash
streamlit run src/app.py
```
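For reference, a minimal Streamlit chat loop of the kind app.py provides might look like the sketch below; the widget layout and the placeholder answer are assumptions for illustration, not the actual contents of src/app.py:

```python
# Illustrative only: a minimal Streamlit chat loop around a query handler.
# The real src/app.py may structure its interface differently.
import streamlit as st

st.title("GenAI Chatbot")

# Keep the conversation across reruns in session state.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay prior turns.
for role, text in st.session_state.messages:
    st.chat_message(role).write(text)

# Read a new question, answer it, and record both turns.
if question := st.chat_input("Ask a question about the documents"):
    st.chat_message("user").write(question)
    answer = "(call into query.py's QueryHandler here)"  # placeholder answer
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [("user", question), ("assistant", answer)]
```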
The full documentation is available in the docs/ directory. Open the index.html file in a web browser to access it.