- Given a combination of datasets (PDFs, web pages, docs), build a system that can answer questions generatively, with a reference to the relevant section of the doc or page.
This project implements a question-answering system using LangChain, Chroma, and Ollama. The application takes a user query, searches a document database, and returns a relevant answer along with its sources.
- Navigate to the `nlp rag` folder.
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install dependencies:
  pip install -r requirements.txt
To populate the database with documents, use the populate_database.py script. This script will load documents from the data directory, split them into chunks, and store them in a Chroma database.
python populate_database.py --reset # Use --reset to clear the existing database
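The "split them into chunks" step can be illustrated with a minimal, standard-library sketch. This is not the code in populate_database.py, which presumably relies on a LangChain text splitter; the function names, the chunk size, the overlap, and the `source:index` ID scheme below are all assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def make_chunk_records(source: str, text: str, **kwargs) -> list[dict]:
    """Attach a hypothetical 'source:index' ID to each chunk so answers can cite it."""
    return [
        {"id": f"{source}:{index}", "source": source, "text": chunk}
        for index, chunk in enumerate(split_into_chunks(text, **kwargs))
    ]
```

Overlapping windows keep a sentence that straddles a boundary visible in both neighbouring chunks, and a stable per-chunk ID is one simple way to point an answer back at the section it came from.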
To populate the database with web pages, use the same populate_database.py script with --source 'WEB'. The script will load pages from the given URL, split them into chunks, and store them in a Chroma database.
python populate_database.py --reset --source 'WEB' --url <url> --limit <limit>
The --limit argument is optional. For example:
python populate_database.py --reset --source 'WEB' --url 'https://www2.gov.bc.ca/gov/content/governments/organizational-structure/ministries-organizations/ministries/children-and-family-development' --limit 100
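The flags used above (--reset, --source, --url, --limit) could be parsed along the following lines. This is a sketch with argparse, not the script's actual argument handling: the default source name "LOCAL", the integer type of --limit, and the rule that --url is required for the WEB source are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Populate the Chroma database.")
    parser.add_argument("--reset", action="store_true",
                        help="Clear the existing database before loading.")
    # "LOCAL" is a hypothetical default meaning "read from the data directory".
    parser.add_argument("--source", default="LOCAL",
                        help="Where to load from, e.g. WEB for web pages.")
    parser.add_argument("--url", default=None,
                        help="URL to load when --source is WEB.")
    # Assumed to be an integer; it is optional, so it defaults to None.
    parser.add_argument("--limit", type=int, default=None,
                        help="Optional limit for the web source.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: the web source cannot work without a URL.
    if args.source.upper() == "WEB" and not args.url:
        parser.error("--url is required when --source is WEB")
    return args
```

Passing `argv` explicitly, as in `parse_args(["--reset", "--source", "WEB", "--url", "https://example.com"])`, makes the parsing easy to exercise without touching `sys.argv`.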
You can query the database using the query_data.py script. Pass your query as an argument to get an answer based on the context from the documents.
python query_data.py "Your query here"
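To make "an answer based on the context from the documents" concrete, here is a small sketch of how retrieved chunks might be folded into a prompt and how their sources might be listed with the answer. It is not the code in query_data.py: the prompt wording, the chunk dictionary shape (`text`, `source`), and both function names are assumptions, and the retrieval and model calls are left out.

```python
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Question: {question}
"""


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Join the retrieved chunk texts into one context block and fill the template."""
    context = "\n\n---\n\n".join(chunk["text"] for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def format_response(answer: str, chunks: list[dict]) -> str:
    """Append the distinct chunk sources, in retrieval order, to the model's answer."""
    sources = []
    for chunk in chunks:
        if chunk["source"] not in sources:
            sources.append(chunk["source"])
    return f"{answer}\nSources: {', '.join(sources)}"
```

In a full pipeline the string from `build_prompt` would be sent to the model served by Ollama, and the model's reply would be passed to `format_response` together with the same chunks.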
You can also interact with the question-answering system using a Streamlit web interface. Run the app.py script to start the web application.
streamlit run app.py
- query_data.py: handles querying the database and returning responses based on the context.
- Embedding script: provides the embedding function used to embed documents and queries.
- populate_database.py: handles loading documents, splitting them into chunks, and populating the Chroma database.
- app.py: implements a Streamlit web application for querying the database interactively.
Please send an email to [email protected] for a demo or to learn more about this prototype.