Note: Before proceeding with the steps below, ensure you follow the instructions outlined in the README.md file.
Step 1: Configure the OpenAI key in the data_indexing/.env environment file as follows.
openai_key=key
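Before running anything, it can help to confirm that the .env file is readable and contains the expected key. The following is a minimal sketch that uses only the standard library. The helper names parse_env and missing_keys are hypothetical, and the indexing script may load its configuration differently.

```python
def parse_env(text):
    """Parse simple KEY=value lines into a dict, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in az_connection_str="...".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]
```

For example, reading the text of data_indexing/.env, passing it to parse_env, and then calling missing_keys(env, ["openai_key"]) should return an empty list once Step 1 is done.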
Step 2: Install Python 3 (Python 3.11+)
Step 3: Create and activate a virtual environment
$ pip install virtualenv
$ virtualenv genai-env
$ source genai-env/bin/activate
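To verify Steps 2 and 3, a short check like the one below can be run with the environment's interpreter. It is only a sketch: the function names are hypothetical, and the environment check relies on sys.prefix differing from sys.base_prefix, which holds for environments created by venv and by current virtualenv releases.

```python
import sys


def meets_minimum(version_info, minimum=(3, 11)):
    """Return True if the given version is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def in_virtual_env():
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("Python 3.11+:", meets_minimum(sys.version_info))
print("Virtual environment active:", in_virtual_env())
```

Both lines should report True after genai-env has been activated.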
Now, depending on the source (Local, Azure, or AWS), follow one of the processes below.
Follow the steps below to index local files.
Step 1: Install Dependencies
$ cd data_indexing
$ pip install -r requirements.txt
Step 2: Set up files
Put the files to be indexed in the data_indexing/data directory.
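To see which files would be available for indexing, the directory can be listed first. This is a sketch under assumptions: the helper name list_indexable_files is hypothetical, and whether the indexing script recurses into subdirectories or skips hidden files is not stated here.

```python
from pathlib import Path


def list_indexable_files(directory):
    """Return all regular files under `directory`, sorted, skipping hidden files."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory not found: {root}")
    return sorted(
        path
        for path in root.rglob("*")  # recursion is an assumption, see above
        if path.is_file() and not path.name.startswith(".")
    )
```

Calling list_indexable_files("data") from inside data_indexing returns the paths found, or raises FileNotFoundError if the directory has not been created yet.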
Step 3: Run Script
# python3 data_indexing.py <source> <local_directory_name> <index_name>
$ python3 data_indexing.py local data test_index_1
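The command takes three positional arguments: the source, the location (a local directory, Azure container, or S3 bucket name), and the index name. The sketch below shows one way such a command line could be parsed with argparse. It is not the actual data_indexing.py; the argument name location and the function parse_args are assumptions made for illustration.

```python
import argparse

# Source names as used in the commands of this guide.
SOURCES = ("local", "azure", "aws")


def parse_args(argv=None):
    """Parse <source> <location> <index_name> from the command line."""
    parser = argparse.ArgumentParser(
        prog="data_indexing.py",
        description="Index files from a local directory, Azure container, or S3 bucket.",
    )
    parser.add_argument("source", choices=SOURCES, help="where the files are stored")
    parser.add_argument(
        "location",
        help="local directory, Azure container, or S3 bucket name",
    )
    parser.add_argument("index_name", help="name of the index to create")
    return parser.parse_args(argv)
```

With this layout, parse_args(["local", "data", "test_index_1"]) yields source "local", location "data", and index_name "test_index_1", while an unknown source makes argparse exit with a usage error.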
Follow the steps below to index files stored in Azure Blob containers.
Step 1: Install Dependencies
$ cd data_indexing
$ pip install -r requirements.txt
$ pip install azure-storage-blob
Step 2: Configure Azure credentials
- Log in to the Azure Portal.
- Go to the Storage accounts section and select the appropriate storage account.
- In the left panel, under Security + networking, select Access keys.
- Configure the Connection string in data_indexing/.env as follows.
az_connection_str="connection_string"
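A quick way to check the connection string before indexing is to parse it and then list the blobs in the container. The sketch below assumes the string copied from the Access keys page has the usual semicolon-separated Key=Value form with AccountName and AccountKey entries. The helper names are hypothetical; the calls into azure-storage-blob are BlobServiceClient.from_connection_string, get_container_client, and list_blobs.

```python
def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    pairs = {}
    for item in conn_str.strip().split(";"):
        if "=" not in item:
            continue
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, value = item.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def list_blob_names(conn_str, container_name):
    """Return the names of all blobs in the given container."""
    # Imported here so parse_connection_string works without the package installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    container = service.get_container_client(container_name)
    return [blob.name for blob in container.list_blobs()]
```

If parse_connection_string shows both AccountName and AccountKey, list_blob_names(conn_str, "rag-index") should return the files that the indexing run will see.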
Step 3: Run Script
# python3 data_indexing.py <source> <container_name> <index_name>
$ python3 data_indexing.py azure rag-index test_index_1
Follow the steps below to index files stored in an AWS S3 bucket.
Step 1: Install Dependencies
$ cd data_indexing
$ pip install -r requirements.txt
$ pip install boto3
Step 2: Configure AWS credentials
- Log in to the AWS Console.
- In the upper right corner of the console, choose your account name or number.
- Choose Security Credentials.
- In the Access keys section, choose Create access key.
- Configure the Access key and Secret access key in data_indexing/.env as follows.
aws_access_key_id=access_key_id
aws_secret_access_key=secret_access_key
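To confirm that the credentials work, the bucket contents can be listed with boto3 before indexing. This is a sketch only: env stands for a dict of the values stored in data_indexing/.env, and the helper names are hypothetical. The boto3 calls used are boto3.client with explicit credentials and the list_objects_v2 paginator.

```python
REQUIRED_AWS_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def s3_client_kwargs(env):
    """Build the credential keyword arguments for boto3.client from env values."""
    missing = [key for key in REQUIRED_AWS_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing AWS settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_AWS_KEYS}


def list_object_keys(bucket_name, env):
    """Return the keys of all objects in the bucket, following pagination."""
    # Imported here so s3_client_kwargs works without boto3 installed.
    import boto3

    client = boto3.client("s3", **s3_client_kwargs(env))
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket_name):
        # "Contents" is absent from the response when the bucket is empty.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

A call such as list_object_keys("rag-index22", env) should return the object keys in the bucket; an access error at this point usually means the keys in .env are wrong or lack permission for the bucket.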
Step 3: Run Script
# python3 data_indexing.py <source> <bucket_name> <index_name>
$ python3 data_indexing.py aws rag-index22 test_index_1