
Comparative Analysis of RAG

NEO4J vs PINECONE

Refer to the Pre-requisite Set up section below for the environment setup required before starting this experiment.

  1. Clone this repository.
  2. Add the relevant API keys and configuration to the .env file.
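
For reference, a minimal .env might look like the following. The variable names here are assumptions; confirm them against how the scripts actually read their configuration:

OPENAI_API_KEY=...
PINECONE_API_KEY=...
NEO4J_URI=neo4j+s://<your-instance>.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
TOGETHER_API_KEY=...
HUGGINGFACEHUB_API_TOKEN=...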

Technology Stack

Neo4j, Pinecone, LangChain, TogetherAI, OpenAI, Hugging Face

Overview

Purpose: Conducting a comprehensive analysis of a RAG (Retrieval-Augmented Generation) pipeline using Neo4j and Pinecone to test:

  1. Performance of graph vs. vector databases
  2. Effects of splitting (chunking) methods
  3. Retrieval methods and strategies
  4. Quality of responses across various RAG metrics
  5. Reranking and multi-query on failed responses

Significance

  1. Insights into the capabilities and limitations of a RAG pipeline
  2. Understanding of the available technology stack
  3. Exploration of non-traditional uses of databases
  4. Implementation of practical projects
  5. Suitability for specific use cases

Experiment Design

DATA

In this project, all experiments are run on a PDF of the Constitution of Pakistan. To evaluate the RAG pipeline, questions with ground-truth answers can be found here: questions.json
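
The evaluation scripts pair each question with a reference answer. A minimal sketch of the expected shape, where the field names are assumptions (check questions.json for the actual schema):

[
  {
    "question": "...",
    "ground_truth": "..."
  },
  ...
]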

Running PINECONE RAG

Select your desired experiment by defining the following options in RAG_with_Pinecone.py:

retrieval_method = 'cosine'  #whatever you defined at the time of Pinecone index creation

chunker = 'recursive'  #recursive, semantic, sentence, character, paragraph

embeddingtype = 'openai'  #openai, HF, langchain, spacy; empty string will invoke gpt4all

llmtype = 'gpt4'  #gpt4, llama2, llama3, Qwen; empty string will invoke Mixtral

embedding_dimension = 1536  #use 384 for gpt4all embeddings

index_name = pinecone_index  #the Pinecone index created during setup

This script loads the input file's embeddings into the Pinecone index, generates responses to the questions, and writes them to a JSON file in the output folder.
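
Under the hood, the Pinecone leg of the pipeline amounts to roughly the following. This is a minimal sketch against the current Pinecone Python client, not the script verbatim; `chunks` and `question` stand in for the splitter output and an evaluation question:

# Sketch: embed chunks, upsert them to Pinecone, then retrieve context for a question
from pinecone import Pinecone
from langchain_openai import OpenAIEmbeddings

pc = Pinecone(api_key="...")        # normally read from the .env file
index = pc.Index("pinecone_index")  # its dimension must match embedding_dimension

embedder = OpenAIEmbeddings()       # 1536-dimensional vectors
texts = [c.page_content for c in chunks]
index.upsert(vectors=[
    (f"chunk-{i}", vec, {"text": texts[i]})
    for i, vec in enumerate(embedder.embed_documents(texts))
])

# Retrieval: embed the question and fetch the top-k most similar chunks
hits = index.query(vector=embedder.embed_query(question), top_k=4, include_metadata=True)
context = "\n".join(m.metadata["text"] for m in hits.matches)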

Running SimpleNEO4J RAG

You may run into errors while running this on your local system; if so, run it on Google Colab.

Setup: Traditional method where the exact data indexed is the data retrieved. Two approaches are used (see the sketch after this list):

  • Similarity search: retrieves only the matches with the top similarity scores (cosine or MMR)
  • Hybrid search: takes prominent keywords into account in addition to the similarity score
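
In LangChain terms, the difference between the two approaches is essentially the search_type passed to Neo4jVector. A hedged sketch (the keyword index name is an assumption; connection details are read from the .env variables):

# Sketch: similarity-only vs hybrid retrieval with LangChain's Neo4jVector
from langchain_community.vectorstores import Neo4jVector
from langchain_community.vectorstores.neo4j_vector import SearchType
from langchain_openai import OpenAIEmbeddings

# Similarity search: queries the vector index only
store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    index_name="vector",
)

# Hybrid search: combines the vector index with a keyword (full-text) index
hybrid = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    index_name="vector",
    keyword_index_name="keyword",
    search_type=SearchType.HYBRID,
)

docs = hybrid.similarity_search("What are the fundamental rights of citizens?", k=4)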

Select your desired experiment by defining the following options in SimpleNeo4J_RAG.ipynb:

retrieval_method = 'cosine'  #euclidean, mmr, cosine (mmr was running into an error)

chunker = 'recursive'  #recursive, semantic, sentence, character, paragraph

embeddingtype = 'langchain'  #openai, HF, langchain, spacy; empty string will invoke gpt4all

llmtype = 'gpt4'  #gpt4, llama2, llama3, Qwen; empty string will invoke Mixtral

embedding_dimension = 3072  #use 384 for gpt4all embeddings

index_name = "vector"  #default index name

This script loads the input file's embeddings into the Neo4j instance, generates responses to the questions, and writes them to a JSON file in the output folder.
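
Loading amounts to creating chunk nodes with embeddings in the Neo4j instance. A minimal sketch, assuming the LangChain integration and credentials in .env:

# Sketch: index the chunked PDF into Neo4j as a vector store
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

store = Neo4jVector.from_documents(
    chunks,               # output of the chosen splitter
    OpenAIEmbeddings(),
    index_name="vector",  # matches index_name above
)  # NEO4J_URI / NEO4J_USERNAME / NEO4J_PASSWORD are read from the environment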

Running AdvancedNEO4J RAG

The advanced Neo4j RAG strategies are inspired by Implementing-Advanced-Retrieval-RAG-Strategies-With-Neo4j. You may run into errors while running this on your local system; if so, run it on Google Colab.

Setup: The following advanced retrieval strategies were implemented to balance precise embeddings against context retention (see the sketch after this list):

  • Parent retriever: instead of indexing entire documents, the data is divided into smaller chunks, referred to as parent and child documents. Child documents are indexed to better represent specific concepts, while parent documents are retrieved to ensure context retention.
  • Hypothetical questions: documents are processed to determine the potential questions they might answer. These questions are then indexed to better represent specific concepts, while the parent documents are retrieved to ensure context retention.
  • Summaries: instead of indexing the entire document, a summary of the document is created and indexed; as before, the parent document is what is retrieved in the RAG application.
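
The parent-retriever pattern is typically realized with a custom retrieval query: child chunks carry the embeddings, but the query walks back to the parent node and returns its text. A hedged Cypher sketch; the label, relationship, and index names are assumptions, not the notebook's exact schema:

# Sketch: index children, retrieve parents -- schema names are assumed
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

retrieval_query = """
MATCH (node)<-[:HAS_CHILD]-(parent)  // node = the matched child chunk
RETURN parent.text AS text, score, {} AS metadata
"""

parent_store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    index_name="parent_document",  # assumed index name
    retrieval_query=retrieval_query,
)

The hypothetical-questions and summary strategies follow the same shape: the indexed node differs, but the retrieval query always returns the parent text.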

Select your desired experiment by defining the following options in AdvancedNeo4J_RAG_with_strategies.ipynb:

retrieval_method = 'cosine'  #take it from LoadingDatatoNeo4j

chunker = 'semantic'  #take it from LoadingDatatoNeo4j: recursive, semantic, sentence, character, paragraph

embeddingtype = 'langchain'  #openai, HF, langchain, spacy; empty string will invoke gpt4all

llmtype = 'gpt4'  #gpt4, llama2, llama3, Qwen; empty string will invoke Mixtral

embedding_dimension = 1536  #use 384 for gpt4all embeddings

RAG Evaluation

Description: Ragas provides several metrics for evaluating different aspects of a RAG system. Six of them are used to evaluate the pipeline results:

  1. Faithfulness
  2. Answer relevancy
  3. Context precision
  4. Context recall
  5. Answer correctness
  6. Answer similarity

Run Evaluation_Metrics.py to generate evaluation scores for each question. Here is what the evaluation output looks like: scores.xlsx
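
Internally, the evaluation boils down to a Ragas evaluate() call over the generated answers. A minimal sketch; the four lists are assumed to be loaded from questions.json and the pipeline's output JSON, and the column names follow the Ragas convention:

# Sketch: score generated answers with Ragas
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (faithfulness, answer_relevancy, context_precision,
                           context_recall, answer_correctness, answer_similarity)

dataset = Dataset.from_dict({
    "question":     questions,      # from questions.json
    "answer":       answers,        # generated by the RAG pipeline
    "contexts":     contexts,       # list of retrieved chunks per question
    "ground_truth": ground_truths,  # reference answers
})

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision,
                                    context_recall, answer_correctness, answer_similarity])
print(result)            # aggregate score per metric
df = result.to_pandas()  # per-question rows, ready to export to scores.xlsx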

Pre-requisite Set up

  1. Set up accounts on Neo4j, Pinecone, OpenAI, Hugging Face, and TogetherAI.
  2. Add the API keys and configuration to the .env file.
  3. Create an instance in Neo4j.
  4. Create indexes in Pinecone. You can create up to 5 indexes on the Pinecone free tier; create a new index for each embedding dimension.

TogetherAI is used to facilitate inference from several LLMs from a local machine.
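
For illustration, this is roughly how a Together-hosted model plugs in through LangChain's community integration; the model ID is illustrative, not necessarily the exact one the scripts use:

# Sketch: call a Together-hosted LLM through LangChain
from langchain_community.llms import Together

llm = Together(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # e.g. the Mixtral fallback
    max_tokens=512,
)  # TOGETHER_API_KEY is read from the environment
answer = llm.invoke("Answer using only the retrieved context: ...")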
