Vector database with ChromaDB #217

Merged
merged 2 commits into from
Jun 24, 2024
843 changes: 843 additions & 0 deletions examples/chromadb/Vectordb_with_chroma.ipynb
@@ -0,0 +1,843 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Tce3stUlHN0L"
},
"source": [
"##### Copyright 2023 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "tuOe1ymfHZPu"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CsVPnR8VbXE6"
},
"source": [
"# Gemini API: Document Q&A with ChromaDB"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "awKO767lQIWh"
},
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/chromadb/Vectordb_with_chroma.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YtwZ8DZGJfUv"
},
"source": [
"## Overview\n",
"\n",
"This tutorial demonstrates how to use the Gemini API to create a vector database and retrieve answers to questions from it. You will use [ChromaDB](https://docs.trychroma.com/), an open-source Python tool for creating embedding databases. ChromaDB allows you to:\n",
"\n",
"* Store embeddings as well as their metadata\n",
"* Embed documents and queries\n",
"* Search through the database of embeddings\n",
"\n",
"In this tutorial, you'll use embeddings to retrieve an answer from a database of vectors created with ChromaDB."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "akuOzK4dJl3j"
},
"source": [
"## Setup\n",
"\n",
"First, download and install ChromaDB and the Gemini API Python library."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "JbXe7Oodc5dP"
},
"outputs": [
],
"source": [
"!pip install -U -q google-generativeai\n",
"!pip install -q chromadb"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jwmKt115PxK8"
},
"source": [
"Then import the modules you'll use in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "muuhsDmmKdHi"
},
"outputs": [],
"source": [
"import textwrap\n",
"import chromadb\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import google.generativeai as genai\n",
"\n",
"from IPython.display import Markdown\n",
"from chromadb import Documents, EmbeddingFunction, Embeddings"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FQOGMejVu-6D"
},
"source": [
"## Configure your API key\n",
"\n",
"To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "ysayz8skEfBW"
},
"outputs": [],
"source": [
"from google.colab import userdata\n",
"api_key = userdata.get('GOOGLE_API_KEY')\n",
"\n",
"genai.configure(api_key=api_key)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fegnGFpMS4AI"
},
"source": [
"Key Point: Next, you will choose a model. Any embedding model will work for this tutorial, but for real applications it's important to choose a specific model and stick with it. The outputs of different models are not compatible with each other.\n",
"\n",
"**Note**: At this time, the Gemini API is [only available in certain regions](https://ai.google.dev/available_regions)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "Km5d13_FS2Q_"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"models/embedding-001\n",
"models/text-embedding-004\n"
]
}
],
"source": [
"for m in genai.list_models():\n",
"  if 'embedContent' in m.supported_generation_methods:\n",
"    print(m.name)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3XWKXoXwOGxS"
},
"source": [
"### Data\n",
"\n",
"Here is a small set of documents you will use to create an embedding database:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "k8nsbhFJKmG-"
},
"outputs": [],
"source": [
"DOCUMENT1 = \"Operating the Climate Control System Your Googlecar has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console. Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it.\"\n",
"DOCUMENT2 = \"Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon. For example, you can touch the \\\"Navigation\\\" icon to get directions to your destination or touch the \\\"Music\\\" icon to play your favorite songs.\"\n",
"DOCUMENT3 = \"Shifting Gears Your Googlecar has an automatic transmission. To shift gears, simply move the shift lever to the desired position. Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions.\"\n",
"\n",
"documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yDzxArLeOexD"
},
"source": [
"## Creating the embedding database with ChromaDB\n",
"\n",
"You will create a [custom function](https://docs.trychroma.com/embeddings#custom-embedding-functions) for performing embedding using the Gemini API. When you pass a set of documents to this custom function, it returns their vectors, or embeddings.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UoHhS32txd_r"
},
"source": [
"### API changes to Embeddings with model embedding-001\n",
"\n",
"For the new embeddings model, embedding-001, there is a new task type parameter and an optional title (only valid with `task_type=RETRIEVAL_DOCUMENT`).\n",
"\n",
"These new parameters apply only to the newest embeddings models. The task types are:\n",
"\n",
"Task Type | Description\n",
"--- | ---\n",
"RETRIEVAL_QUERY | Specifies the given text is a query in a search/retrieval setting.\n",
"RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting.\n",
"SEMANTIC_SIMILARITY | Specifies the given text will be used for Semantic Textual Similarity (STS).\n",
"CLASSIFICATION | Specifies that the embeddings will be used for classification.\n",
"CLUSTERING | Specifies that the embeddings will be used for clustering."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "mF7Uu1kCQsT0"
},
"outputs": [],
"source": [
"class GeminiEmbeddingFunction(EmbeddingFunction):\n",
"  def __call__(self, input: Documents) -> Embeddings:\n",
"    # Chroma calls this function for both stored documents and query texts,\n",
"    # so every input is embedded with the retrieval_document task type.\n",
"    model = 'models/embedding-001'\n",
"    title = \"Custom query\"\n",
"    return genai.embed_content(model=model,\n",
"                               content=input,\n",
"                               task_type=\"retrieval_document\",\n",
"                               title=title)[\"embedding\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HrDWLyopPNBf"
},
"source": [
"Now you will create the vector database. In the `create_chroma_db` function, you will instantiate a [Chroma client](https://docs.trychroma.com/getting-started). From there, you will create a collection, which is where you store your embeddings, documents, and any metadata. Note that the embedding function from above is passed as an argument to `create_collection`.\n",
"\n",
"Next, you use the `add` method to add the documents to the collection."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "OITXgxZlLoXU"
},
"outputs": [],
"source": [
"def create_chroma_db(documents, name):\n",
"  chroma_client = chromadb.Client()\n",
"  db = chroma_client.create_collection(name=name, embedding_function=GeminiEmbeddingFunction())\n",
"\n",
"  for i, d in enumerate(documents):\n",
"    db.add(\n",
"      documents=d,\n",
"      ids=str(i)\n",
"    )\n",
"  return db"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "RJ3Fq0yzL10B"
},
"outputs": [],
"source": [
"# Set up the DB\n",
"db = create_chroma_db(documents, \"googlecarsdatabase\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2QbwFgfXp-fL"
},
"source": [
"Confirm that the data was inserted by looking at the database:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "kQ9PHUL_l-hf"
},
"outputs": [
{
"data": {
"text/plain": [
" ids embeddings metadatas \\\n",
"0 0 [-0.020994942635297775, -0.03876612335443497, ... None \n",
"1 1 [0.017410801723599434, -0.04757162556052208, -... None \n",
"2 2 [-0.03194405511021614, -0.023281503468751907, ... None \n",
"\n",
" documents uris data included \n",
"0 Operating the Climate Control System Your Goo... None None embeddings \n",
"1 Your Googlecar has a large touchscreen display... None None metadatas \n",
"2 Shifting Gears Your Googlecar has an automatic... None None documents "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(db.peek(3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Tu5zRErgsQ8u"
},
"source": [
"## Getting the relevant document\n",
"\n",
"`db` is a Chroma collection object. You can call `query` on it to perform a nearest-neighbor search that finds the stored embeddings, and their documents, most similar to the query.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "gQdJMbTSLtKE"
},
"outputs": [],
"source": [
"def get_relevant_passage(query, db):\n",
"  passage = db.query(query_texts=[query], n_results=1)['documents'][0][0]\n",
"  return passage"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "nWYXXKJ6t6Hy"
},
"outputs": [
{
"data": {
"text/markdown": [
"Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon. For example, you can touch the \"Navigation\" icon to get directions to your destination or touch the \"Music\" icon to play your favorite songs."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Perform embedding search\n",
"passage = get_relevant_passage(\"touch screen features\", db)\n",
"Markdown(passage)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s8PNRMpOQkm5"
},
"source": [
"Now that you have found the relevant passage in your set of documents, you can use it to make a prompt to pass into the Gemini API."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"id": "Qkhu4iazLy3G"
},
"outputs": [],
"source": [
"def make_prompt(query, relevant_passage):\n",
"  escaped = relevant_passage.replace(\"'\", \"\").replace('\"', \"\").replace(\"\\n\", \" \")\n",
"  prompt = (\"\"\"You are a helpful and informative bot that answers questions using text from the reference passage included below. \\\n",
"  Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \\\n",
"  However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \\\n",
"  strike a friendly and conversational tone. \\\n",
"  If the passage is irrelevant to the answer, you may ignore it.\n",
"  QUESTION: '{query}'\n",
"  PASSAGE: '{relevant_passage}'\n",
"\n",
"  ANSWER:\n",
"  \"\"\").format(query=query, relevant_passage=escaped)\n",
"\n",
"  return prompt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hMEjbz4EswQ6"
},
"source": [
"Build the prompt from a query and the retrieved passage:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "b6_Y-GOymaXu"
},
"outputs": [
{
"data": {
"text/markdown": [
"You are a helpful and informative bot that answers questions using text from the reference passage included below. Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. However, you are talking to a non-technical audience, so be sure to break down complicated concepts and strike a friendly and conversational tone. If the passage is irrelevant to the answer, you may ignore it.\n",
" QUESTION: 'How do you use the touchscreen in the Google car?'\n",
" PASSAGE: 'Your Googlecar has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon. For example, you can touch the Navigation icon to get directions to your destination or touch the Music icon to play your favorite songs.'\n",
"\n",
" ANSWER:\n",
" "
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"How do you use the touchscreen in the Google car?\"\n",
"prompt = make_prompt(query, passage)\n",
"Markdown(prompt)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VRy6yXzcPxLB"
},
"source": [
"Now use the `generate_content` method to generate a response from the model."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "EwfyxFM6Giy9"
},
"outputs": [
{
"data": {
"text/markdown": [
"The Google car has a big touchscreen that you can use to control things like navigation, music, and even the temperature! It's super simple to use; you just tap the icon you want. For example, to get directions, you'd tap the navigation icon, and to play music, you'd tap the music icon. \n"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = genai.GenerativeModel('gemini-1.5-flash-latest')\n",
"answer = model.generate_content(prompt)\n",
"Markdown(answer.text)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ThTbjAJ7eGP5"
},
"source": [
"## Next steps\n",
"\n",
"To learn more about how you can use the embeddings, check out the [examples](https://ai.google.dev/examples?keywords=embed) available. To learn how to use other services in the Gemini API, visit the [Python quickstart](https://ai.google.dev/gemini-api/docs/get-started/python)."
]
}
],
"metadata": {
"colab": {
"name": "Vectordb_with_chroma.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
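
The retrieval step in the notebook above, `db.query(query_texts=[query], n_results=1)`, is a nearest-neighbor search over embedding vectors. The sketch below illustrates that idea in plain Python, with no API calls. The three-dimensional toy vectors, the helper names `cosine_similarity` and `nearest_document`, and the use of cosine similarity are all assumptions made for illustration. Real Gemini embeddings have many more dimensions, and Chroma uses its own index and a configurable distance metric that may differ from this one.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_document(query_vector, doc_vectors):
    """Return the id of the stored vector most similar to the query vector."""
    return max(
        doc_vectors,
        key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
    )


# Hypothetical toy embeddings, keyed by string ids like the ones the notebook uses.
toy_vectors = {
    "0": [0.9, 0.1, 0.0],  # stands in for the climate control document
    "1": [0.1, 0.9, 0.1],  # stands in for the touchscreen document
    "2": [0.0, 0.2, 0.9],  # stands in for the shifting gears document
}

# Stands in for the embedding of a query such as "touch screen features".
toy_query = [0.2, 0.8, 0.0]

best_id = nearest_document(toy_query, toy_vectors)
print(best_id)
```

With these toy values the query vector points mostly along the second axis, so the document with id `"1"` is selected. In the notebook, the equivalent work is done by the collection itself: it embeds the query text with the collection's embedding function and returns the closest stored document.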