Indexer
This documentation outlines the usage of the API endpoints for a NestJS application built around two main controllers, the Indexer Controller and the Chat Controller, together with a set of discovery endpoints. Each group handles specific operations within the application, with a total of 10 endpoints described below.
Start the NestJS application with the correct access configured for your environment.
- Node>=21.x.x
- yarn>=0.5.x
yarn build
yarn start:dev
sequenceDiagram
participant ui as Web SDK
participant api as Index API
participant consumer as KafkaConsumer
participant compose as ComposeDB
participant indexer as Indexer API
participant gpt as OpenAI
participant chroma as ChromaDB
participant unstructured as Unstructured.IO
Note over ui, unstructured: Indexing Flow
Note left of ui: Crawl
ui->>+api: Index a document
api->>+indexer: Crawl the document
indexer->>+unstructured: Format document as text
unstructured->>+indexer: Fetch formatted document
indexer->>+api: Return content
api->>-compose: Index document to Ceramic
compose->>+consumer: Successfully indexed to Ceramic
consumer->>+api: Able to index Ceramic
api->>+ui: Successfully initializes indexing
Note left of ui: Embedding
consumer->>+indexer: Request embeddings for indexed document
indexer->>+gpt: Get embeddings
indexer->>+consumer: Return Embeddings
consumer->>+compose: Index Embeddings
compose->>+consumer: Successfully indexed to Ceramic
Note left of ui: Index
consumer->>+indexer: Index to ChromaDb with the embeddings
indexer->>+chroma: Index docs with metadata and embeddings
chroma->>+indexer: Return Success
indexer->>+consumer: Publish Success
Note over ui, unstructured: Discovery Flow
Note left of ui: Discovery
ui->>+api: Search query
api->>indexer: Request documents
indexer->>+gpt: Request suggestions
gpt->>+indexer: Return suggestions
indexer->>+indexer: Update query with suggestions
indexer->>+chroma: Request documents
chroma->>+indexer: Return docs with similarity
indexer->>+api: Get most relevant indexItemIds
api->>+compose: Get metadata for ids
compose->>+api: Return metadata
api->>+ui: Forward to Web SDK
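The indexing flow in the diagram can be approximated from a client by calling the three Indexer endpoints in order. The following is a minimal sketch, not the application's own implementation: it skips the Kafka consumer and ComposeDB steps, `http_post` and `index_web_page` are hypothetical helper names, and the response keys `content` and `embedding` are taken from the example responses in this document.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:3012"  # port used by the examples in this document


def http_post(path: str, payload: dict, api_key: str = "your_api_key_here") -> dict:
    """POST a JSON body and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def index_web_page(url: str, metadata: dict,
                   post: Callable[[str, dict], dict] = http_post) -> dict:
    """Crawl, embed, then index a page, in the order shown in the diagram."""
    # 1. Crawl: fetch the page text (assumed response key: "content").
    content = post("/indexer/crawl", {"url": url})["content"]
    # 2. Embedding: turn the text into a vector (assumed response key: "embedding").
    vector = post("/indexer/embeddings", {"content": content})["embedding"]
    # 3. Index: send metadata, content and vector together.
    document = {**metadata, "webPageUrl": url, "webPageContent": content, "vector": vector}
    return post("/indexer/index", document)
```

Passing a different `post` callable makes it possible to try the sequence without a running server, for example `index_web_page("https://example.com", {"indexId": "1"}, post=my_fake_post)`.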
The Indexer Controller is designed for crawling, embedding extraction, and indexing operations. Below are the details of its endpoints:
- Method: POST
- Endpoint: /indexer/crawl
- Description: Crawls the document content from a given URL using the Unstructured.io API.
- Body Parameters:
  - url (string): The URL of the document to crawl.
- Response: Returns a key-value pair of 'content' (string) representing the textual content of the document.
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/indexer/crawl \
        -H "Content-Type: application/json" \
        -H 'X-API-KEY: your_api_key_here' \
        -d '{"url": "https://example.com"}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/crawl'
      payload = {'url': 'https://example.com'}
      response = requests.post(url, json=payload)
      print(response.text)
      # { "content" : "This is a sample content...." }
- Method: POST
- Endpoint: /indexer/embeddings
- Description: Extracts embeddings for the given document using OpenAI embeddings.
- Body Parameters:
  - content (string): The textual content of the document.
- Response: Returns a list of floats representing the embedding vector.
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/indexer/embeddings \
        -H "Content-Type: application/json" \
        -H 'X-API-KEY: your_api_key_here' \
        -d '{"content": "Document content goes here"}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/embeddings'
      payload = {'content': 'Document content goes here'}
      response = requests.post(url, json=payload)
      print(response.text)
      # { "embedding" : [0.0012, 0.21, ... ] }
- Method: POST
- Endpoint: /indexer/index
- Description: Adds a document to the ChromaDB database with the appropriate metadata and content.
- Body Parameters: An object containing the following keys:
  - indexId (string): The id string of the index.
  - indexTitle (string): The title of the index.
  - indexCreatedAt (date): The creation timestamp of the index.
  - indexUpdatedAt (date): The last update timestamp of the index.
  - indexDeletedAt (date): The deletion timestamp of the index.
  - indexControllerDID (string): The owner key of the index.
  - webPageId (string): The id string of the web page.
  - webPageTitle (string): The title of the web page.
  - webPageUrl (string): The URL of the web page.
  - webPageCreatedAt (date): The creation timestamp of the web page.
  - webPageContent (string): The text content of the web page.
  - webPageUpdatedAt (date): The last update timestamp of the web page.
  - webPageDeletedAt (date): The deletion timestamp of the web page.
  - vector (number[]): The embedding of webPageContent.
- Response: Returns a success or error message.
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/indexer/index \
        -H "Content-Type: application/json" \
        -H 'X-API-KEY: your_api_key_here' \
        -d '{"indexId": "1", "indexTitle": "Title", ...}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/index'
      payload = {
          'indexId': '1',
          'indexTitle': 'Title',
          # Add other fields as necessary
      }
      response = requests.post(url, json=payload)
      print(response.text)
      # 200, { "message": "Index item IndexItemID_0 succesfully upddated" }
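Because the index body has many keys, a small builder can catch misspelled key names before a request is sent. This is a sketch under stated assumptions: `build_index_document` is a hypothetical helper, and serializing dates as ISO 8601 strings is assumed from the timestamp format used in the discovery filter example rather than confirmed by the API.

```python
from datetime import datetime, timezone

# Keys listed under "Body Parameters" for /indexer/index.
INDEX_DOCUMENT_KEYS = (
    "indexId", "indexTitle", "indexCreatedAt", "indexUpdatedAt", "indexDeletedAt",
    "indexControllerDID", "webPageId", "webPageTitle", "webPageUrl",
    "webPageCreatedAt", "webPageContent", "webPageUpdatedAt", "webPageDeletedAt",
    "vector",
)


def build_index_document(**fields) -> dict:
    """Return a JSON-ready body containing only the supplied, documented keys."""
    unknown = sorted(set(fields) - set(INDEX_DOCUMENT_KEYS))
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")
    # Assumption: date fields travel as ISO 8601 strings.
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in fields.items()
    }


body = build_index_document(
    indexId="1",
    indexTitle="Title",
    indexCreatedAt=datetime(2024, 2, 28, 11, 8, 59, tzinfo=timezone.utc),
    vector=[0.1, 0.2],
)
```

Since only the supplied keys are included, the same helper can also produce a partial body containing just the keys that should change.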
- Method: PUT
- Endpoint: /indexer/index
- Description: Updates the metadata or content of the given document using the supplied subset of keys and their new values.
- Body Parameters: An object containing a subset of keys from the document model and their new values.
- Response: Returns a success or error message.
- Example Usage
  - curl request:
      curl -X PUT http://localhost:3012/indexer/index \
        -H "Content-Type: application/json" \
        -d '{"indexId": "1", "indexTitle": "Updated Title"}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/index'
      payload = {'indexId': '1', 'indexTitle': 'Updated Title'}
      response = requests.put(url, json=payload)
      print(response.text)
      # 200, { "message": "Index item IndexItemID_0 succesfully upddated" }
- Method: DELETE
- Endpoint: /indexer/index
- Description: Deletes all index items that belong to the given "indexId".
- Body Parameters: An object with the key indexId.
  - indexId (string): The id string of the index.
- Response: Returns a success or error message.
- Example Usage
  - curl request:
      curl -X DELETE http://localhost:3012/indexer/index \
        -H "Content-Type: application/json" \
        -d '{"indexId": "1"}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/index'
      payload = {'indexId': '1'}
      response = requests.delete(url, json=payload)
      print(response.text)
      # 200, { "message": "Index IndexID_0 wirh IndexItemIDS [ 'IndexItemID_3', 'IndexItemID_5' ] succesfully deleted" }
- Method: DELETE
- Endpoint: /indexer/item
- Description: Deletes the index item identified by the given "indexId" and "indexItemId".
- Body Parameters: An object with the keys indexId and indexItemId.
  - indexId (string): The id string of the index.
  - indexItemId (string): The id string of the index item.
- Response: Returns a success or error message.
- Example Usage
  - curl request:
      curl -X DELETE http://localhost:3012/indexer/item \
        -H "Content-Type: application/json" \
        -d '{"indexId": "1", "indexItemId": "2"}'
  - Python request:
      import requests

      url = 'http://localhost:3012/indexer/item'
      payload = {'indexId': '1', 'indexItemId': '2'}
      response = requests.delete(url, json=payload)
      print(response.text)
      # 200, { "message": "Index item IndexItemID_0 succesfully deleted" }
The Chat Controller handles operations related to generating content based on a given input and querying the database.
- Method: POST
- Endpoint: /chat/stream
- Description: For a given "question" and "chat_history", generates content for the question.
- Body Parameters: An object containing
  - question (string): The string of the last chat input.
  - chat_history (string): The list of input objects from both user and agent with message role and content.
  - indexIds (string[]): The list of id strings to ask.
- Response: Returns "answer" text and "source" which are the list of "webPageId".
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/chat/stream \
        -H "Content-Type: application/json" \
        -d '{"question": "What is AI?", "chat_history": "...", "index_id": "1", "model_type": "...", "chain_type": "..."}'
  - Python request:
      import requests

      url = 'http://localhost:3012/chat/stream'
      payload = {
          'question': 'What is AI?',
          'chat_history': '...',
          'index_id': '1',
          'model_type': '...',
          'chain_type': '...'
      }
      response = requests.post(url, json=payload)
      print(response.text)
      # {
      #   "content": "AI is the current trend....",
      #   "sources": [ "webPageId_9", "webPageId_20", ... ]
      # }
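The parameter list describes chat_history as a list of input objects carrying a message role and content. One possible way to assemble such a body is sketched below. The helper name `build_chat_payload`, the role labels "user" and "assistant", and the object keys "role" and "content" are assumptions for illustration; the body keys follow the Body Parameters list above.

```python
from typing import Iterable, Tuple


def build_chat_payload(question: str,
                       turns: Iterable[Tuple[str, str]],
                       index_ids: Iterable[str]) -> dict:
    """Assemble a chat request body from (role, content) pairs."""
    # Assumed shape of each history entry: {"role": ..., "content": ...}.
    chat_history = [{"role": role, "content": content} for role, content in turns]
    return {
        "question": question,
        "chat_history": chat_history,
        "indexIds": list(index_ids),
    }


payload = build_chat_payload(
    "What is AI?",
    [("user", "Hi"), ("assistant", "Hello, how can I help?")],
    ["1"],
)
```

The resulting dictionary can be passed as the `json` argument of a POST request to /chat/stream.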
Below, you will find detailed descriptions and usage instructions for our three main endpoints: query, search, and autocomplete.
- Method: POST
- URL: discovery/query
- Description: Returns a list of item results for user searches within the specified index(es) using a given query string. It also supports metadata filtering through ChromaDB.
- Body Parameters: An object containing
  - query (string): The query string to search.
  - indexIds (string[]): Array of index IDs to search within.
  - page (int): The page number of results to return.
  - limit (int): The number of results per page.
  - filters (Object): Filters to apply on the search results (ChromaFilter).
  - sort (string): The field to sort the results by.
  - desc (boolean): Whether the sorting should be in descending order.
- Response: Returns "items" which are the list of "webPageId".
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/discovery/query \
        -H "Content-Type: application/json" \
        -d '{
          "query": "string",
          "indexIds": ["string"],
          "page": 0,
          "limit": 0,
          "filters": { "indexCreatedAt": { "$gte": "2024-02-28T11:08:59.353Z" } },
          "sort": "string",
          "desc": true
        }'
  - Python request:
      import requests

      url = 'http://localhost:3012/discovery/query'
      payload = {
          "query": "string",
          "indexIds": ["string"],
          "page": 0,
          "limit": 0,
          "filters": { "indexCreatedAt": { "$gte": "2024-02-28T11:08:59.353Z" } },
          "sort": "string",
          "desc": True
      }
      response = requests.post(url, json=payload)
      print(response.text)
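The page and limit parameters suggest that results can be walked page by page. The sketch below shows one way to do that, assuming pages start at 0 (as in the example body) and that a page shorter than limit marks the end of the results; neither is confirmed by this document. `build_query` and `iter_results` are hypothetical helpers, and `post` stands for any callable that sends a JSON body to a path and returns the decoded response.

```python
from typing import Callable, Iterable, Iterator, Optional


def build_query(query: str, index_ids: Iterable[str], page: int = 0, limit: int = 10,
                created_after: Optional[str] = None,
                sort: Optional[str] = None, desc: bool = False) -> dict:
    """Build a discovery/query body, adding filters and sorting only when requested."""
    body = {"query": query, "indexIds": list(index_ids), "page": page, "limit": limit}
    if created_after is not None:
        # Metadata filter in the same shape as the example: indexCreatedAt >= timestamp.
        body["filters"] = {"indexCreatedAt": {"$gte": created_after}}
    if sort is not None:
        body["sort"] = sort
        body["desc"] = desc
    return body


def iter_results(post: Callable[[str, dict], dict], query: str, index_ids: Iterable[str],
                 limit: int = 10, max_pages: int = 100) -> Iterator:
    """Yield items page by page until a short page is returned."""
    ids = list(index_ids)
    for page in range(max_pages):  # assumption: the first page is 0
        response = post("/discovery/query", build_query(query, ids, page=page, limit=limit))
        items = response.get("items", [])
        yield from items
        if len(items) < limit:  # assumption: a short page means no more results
            return
```

The `max_pages` cap keeps the loop bounded if the server keeps returning full pages.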
-
Method:
POST
-
URL:
discovery/{db}/search/{indexIds}
-
Description: Performs a search using embeddings in ChromaDB. This endpoint is similar to the query endpoint but focuses on embedding-based searches and will support multiple embedding models in the future.
-
Body Parameters: An object containing
-
embedding
: The embedding to search. -
model
: The embedding model, eg.text-embedding-ada-002
. -
indexIds
: Array of index IDs to search within. -
page
: The page number of results to return. -
limit
: The number of results per page. -
filters
: Filters to apply on the search results (ChromaFilter). -
sort
: The field to sort the results by. -
desc
: Boolean indicating whether the sorting should be in descending order.
-
-
Response: Returns "items" which are the list of "webPageId".
-
Example Usage
- curl request:
curl -X POST http://localhost:3012/discovery/search \ -H "Content-Type: application/json" \ -H 'X-API-KEY: your_api_key_here' \ -d '{ "embedding": [0.1, 0.2, ....], "indexIds": ["string"], "page": 0, "limit": 0, "filters": "ChromaFilter", "sort": "string", "desc": true }'
- Python request:
import requests url = 'http://localhost:3012/discovery/search' payload = { "embedding": [0.1, 0.2, ...], "indexIds": ["string"], "page": 0, "limit": 0, "filters": "ChromaFilter", "sort": "string", "desc": true } response = requests.post(url, json=payload) print(response.text) # 200, { "items": [ # { "webPageItemId": "sknfljdfd", "similarity": 0.92 }, # { "webPageItemId": "mcdafşdş", "similarity": 0.81 }, # ...... # ] }
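Since each returned item carries a similarity score, a client may want to keep only the strongest matches. The following sketch assumes the response shape shown in the example comment (an "items" list of objects with "webPageItemId" and "similarity"); `top_matches` and the 0.85 threshold are illustrative choices, not part of the API.

```python
def top_matches(response: dict, min_similarity: float = 0.85) -> list:
    """Return ids of items at or above the threshold, best match first."""
    items = response.get("items", [])
    kept = [item for item in items if item["similarity"] >= min_similarity]
    kept.sort(key=lambda item: item["similarity"], reverse=True)
    return [item["webPageItemId"] for item in kept]


sample = {
    "items": [
        {"webPageItemId": "page_b", "similarity": 0.81},
        {"webPageItemId": "page_a", "similarity": 0.92},
    ]
}
print(top_matches(sample))                      # ['page_a']
print(top_matches(sample, min_similarity=0.5))  # ['page_a', 'page_b']
```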
- Method: POST
- URL: discovery/autocomplete
- Description: Expands the given query via the openai.chat.completions endpoint to increase the specificity of the semantic content. It utilizes the content of the given index documents to provide autocomplete suggestions.
- Body Parameters:
  - indexIds: Array of index IDs to search within for autocomplete suggestions.
  - query: The initial query string for which to provide autocomplete suggestions.
  - n: The number of autocomplete suggestions to return.
- Response: Returns "answer" text and "source" which are the list of "webPageId".
- Example Usage
  - curl request:
      curl -X POST http://localhost:3012/discovery/autocomplete \
        -H "Content-Type: application/json" \
        -H 'X-API-KEY: your_api_key_here' \
        -d '{ "indexIds": ["string"], "query": "string", "n": 0 }'
  - Python request:
      import requests

      url = 'http://localhost:3012/discovery/autocomplete'
      payload = { "indexIds": ["string"], "query": "string", "n": 0 }
      response = requests.post(url, json=payload)
      print(response.text)
      # 200, { "items": [
      #   { "webPageItemId": "sknfljdfd", "similarity": 0.92 },
      #   { "webPageItemId": "mcdafşdş", "similarity": 0.81 },
      #   ......
      # ] }