Merge branch 'main' into thinking

google-gemini · Dec 19, 2024 · 2c813a4 · 2c813a4
2 parents a3d177a + 1005832
commit 2c813a4
Show file tree

Hide file tree

Showing 9 changed files with 755 additions and 39 deletions.
diff --git a/examples/Analyze_a_Video_Historic_Event_Recognition.ipynb b/examples/Analyze_a_Video_Historic_Event_Recognition.ipynb
@@ -268,7 +268,7 @@
       "source": [
         "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-flash\", safety_settings=safety_settings,\n",
         "                              system_instruction=system_prompt)\n",
-        "response = model.generate_content([video_file])\n",
+        "response = model.generate_content([\"Analyze that video please\",video_file])\n",
         "print(response.text)"
       ]
     },

diff --git a/examples/Analyze_a_Video_Summarization.ipynb b/examples/Analyze_a_Video_Summarization.ipynb
@@ -269,7 +269,7 @@
       ],
       "source": [
         "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-flash\", system_instruction=system_prompt)\n",
-        "response = model.generate_content([video_file])\n",
+        "response = model.generate_content([\"Summarise that video please.\",video_file])\n",
         "print(response.text)"
       ]
     },

diff --git a/examples/README.md b/examples/README.md
@@ -4,36 +4,36 @@
 
 This is a collection of fun examples for the Gemini API. 
 
-* [Agents and Automatic Function Calling](https://github.com/google-gemini/cookbook/blob/main/examples/Agents_Function_Calling_Barista_Bot.ipynb): Create an agent (Barrista-bot) to take your coffee order. 
-* [Classify and Analyze a Video](https://github.com/google-gemini/cookbook/blob/main/examples/Analyze_a_Video_Classification.ipynb): This notebook uses multimodal capabilities of the Gemini model to classify the species of animals shown in a video.
-* [Anomaly Detection](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings to detect anomalies in your datasets.
-* [Analyze a Video with Summarization](https://github.com/google-gemini/cookbook/blob/main/examples/Analyze_a_Video_Summarization.ipynb): This notebook shows how you can use Gemini API's multimodal capabilities for video summarization.
-* [Apollo 11 - long context example](https://github.com/google-gemini/cookbook/blob/main/examples/Apollo_11.ipynb): Search a 400 page transcript from Apollo 11.
-* [Clasify text with emeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API with Keras.
-* [Guess the shape](https://github.com/google-gemini/cookbook/blob/main/examples/Guess_the_shape.ipynb): A simple example of using images in prompts.
-* [Market a Jet Backpack](https://github.com/google-gemini/cookbook/blob/main/examples/Market_a_Jet_Backpack.ipynb): Create a marketing campaign from a product sketch.
-* [Object detection](https://github.com/google-gemini/cookbook/blob/main/examples/Object_detection.ipynb): Extensive examples with object detection, including with multiple classes, OCR, visual question answering, and even an interactive demo.  
-* [Opossum search](https://github.com/google-gemini/cookbook/blob/main/examples/Opossum_search.ipynb): Code generation with the Gemini API. Just for fun, you'll prompt the model to create a web app called "Opossum Search" that searches Google with "opossum" appended to the query.
-* [Search Wikipedia with ReAct](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb): Use ReAct prompting with Gemini 1.5 Flash to search Wikipedia interactively.
-* [Search Re-ranking with Embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings to re-rank search results.
-* [Story writing with prompt chaining.ipynb](https://github.com/google-gemini/cookbook/blob/main/examples/Story_Writing_with_Prompt_Chaining.ipynb): Write a story using two powerful tools: prompt chaining and iterative generation.
-* [Talk to documents](https://github.com/google-gemini/cookbook/blob/main/examples/Talk_to_documents_with_embeddings.ipynb): This is a basic intro to Retrieval Augmented Generation (RAG). Use embeddings to search through a custom database.
-* [Upload files to Colab](https://github.com/google-gemini/cookbook/blob/main/examples/Upload_files_to_Colab.ipynb): This is a helper notebook that shows how to upload files from your local computer to Colab. Note: to upload files to the Gemini API (text, code, images, audio, video), check out the [Files quickstart](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).
-* [Voice Memos](https://github.com/google-gemini/cookbook/blob/main/examples/Voice_memos.ipynb): You'll use the Gemini API to help you generate ideas for your next blog post, based on voice memos you recorded on your phone, and previous articles you've written.
-* [Translate a public domain](https://github.com/google-gemini/cookbook/blob/main/examples/Translate_a_Public_Domain_Book.ipynb): In this notebook, you will explore Gemini model as a translation tool, demonstrating how to prepare data, create effective prompts, and save results into a `.txt` file.
-* [Working with Charts, Graphs, and Slide Decks](https://github.com/google-gemini/cookbook/blob/main/examples/Working_with_Charts_Graphs_and_Slide_Decks.ipynb): Gemini models are powerful multimodal LLMs that can process both text and image inputs. This notebook shows how Gemini 1.5 Flash model is capable of extracting data from various images.
-* [Entity extraction](https://github.com/google-gemini/cookbook/blob/main/examples/Entity_Extraction.ipynb): Use Gemini API to speed up some of your tasks, such as searching through text to extract needed information. Entity extraction with a Gemini model is a simple query, and you can ask it to retrieve its answer in the form that you prefer.
-* [Generate a company research report using search grounding](https://github.com/google-gemini/cookbook/blob/main/examples/search_grounding_for_research_report.ipynb): Use search grounding to write a company research report with Gemini 1.5 Flash.
+* [Agents and Automatic Function Calling](./Agents_Function_Calling_Barista_Bot.ipynb): Create an agent (Barrista-bot) to take your coffee order. 
+* Video Analysis: Three notebooks using multimodal capabilities of the Gemini model to [classify the species of animals](./Analyze_a_Video_Classification.ipynb) for a video, [summarize one](./Analyze_a_Video_Summarization.ipynb) or [recognizing when it happened](./Analyze_a_Video_Historic_Event_Recognition.ipynb), 
+* [Anomaly Detection](./Anomaly_detection_with_embeddings.ipynb): Use embeddings to detect anomalies in your datasets.
+* [Analyze a Video with Summarization](./Analyze_a_Video_Summarization.ipynb): This notebook shows how you can use Gemini API's multimodal capabilities for video summarization.
+* [Apollo 11 - long context example](./Apollo_11.ipynb): Search a 400 page transcript from Apollo 11.
+* [Clasify text with emeddings](./Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API with Keras.
+* [Guess the shape](./Guess_the_shape.ipynb): A simple example of using images in prompts.
+* [Market a Jet Backpack](./Market_a_Jet_Backpack.ipynb): Create a marketing campaign from a product sketch.
+* [Object detection](./Object_detection.ipynb): Extensive examples with object detection, including with multiple classes, OCR, visual question answering, and even an interactive demo.  
+* [Opossum search](./Opossum_search.ipynb): Code generation with the Gemini API. Just for fun, you'll prompt the model to create a web app called "Opossum Search" that searches Google with "opossum" appended to the query.
+* [Search Wikipedia with ReAct](./Search_Wikipedia_using_ReAct.ipynb): Use ReAct prompting with Gemini 1.5 Flash to search Wikipedia interactively.
+* [Search Re-ranking with Embeddings](./Search_reranking_using_embeddings.ipynb): Use embeddings to re-rank search results.
+* [Story writing with prompt chaining.ipynb](./Story_Writing_with_Prompt_Chaining.ipynb): Write a story using two powerful tools: prompt chaining and iterative generation.
+* [Talk to documents](./Talk_to_documents_with_embeddings.ipynb): This is a basic intro to Retrieval Augmented Generation (RAG). Use embeddings to search through a custom database.
+* [Upload files to Colab](./Upload_files_to_Colab.ipynb): This is a helper notebook that shows how to upload files from your local computer to Colab. Note: to upload files to the Gemini API (text, code, images, audio, video), check out the [Files quickstart](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb).
+* [Voice Memos](./Voice_memos.ipynb): You'll use the Gemini API to help you generate ideas for your next blog post, based on voice memos you recorded on your phone, and previous articles you've written.
+* [Translate a public domain](./Translate_a_Public_Domain_Book.ipynb): In this notebook, you will explore Gemini model as a translation tool, demonstrating how to prepare data, create effective prompts, and save results into a `.txt` file.
+* [Working with Charts, Graphs, and Slide Decks](./Working_with_Charts_Graphs_and_Slide_Decks.ipynb): Gemini models are powerful multimodal LLMs that can process both text and image inputs. This notebook shows how Gemini 1.5 Flash model is capable of extracting data from various images.
+* [Entity extraction](./Entity_Extraction.ipynb): Use Gemini API to speed up some of your tasks, such as searching through text to extract needed information. Entity extraction with a Gemini model is a simple query, and you can ask it to retrieve its answer in the form that you prefer.
+* [Generate a company research report using search grounding](./search_grounding_for_research_report.ipynb): Use search grounding to write a company research report with Gemini 1.5 Flash.
 
 ### Integrations
 
 * [Personalized Product Descriptions with Weaviate](weaviate/personalized_description_with_weaviate_and_gemini_api.ipynb): Load data into a Weaviate vector DB, build a semantic search system using embeddings from the Gemini API, create a knowledge graph and generate unique product descriptions for personas using the Gemini API and Weaviate.
 
 ### Folders
 
-* [Prompting examples](https://github.com/google-gemini/cookbook/tree/main/examples/prompting): A directory with examples of various prompting techniques. 
-* [JSON Capabilities](https://github.com/google-gemini/cookbook/tree/main/examples/json-capabilities): A directory with guides containing different types of tasks you can do with JSON schemas.
-* [Automate Google Workspace tasks with the Gemini API](https://github.com/google-gemini/cookbook/tree/main/examples/Apps_script_and_Workspace_codelab): This codelabs shows you how to connect to the Gemini API using Apps Script, and uses the function calling, vision and text capabilities to automate Google Workspace tasks - summarizing a document, analyzing a chart, sending an email and generating some slides directly. All of this is done from a free text input.
-* [Langchain examples](https://github.com/google-gemini/cookbook/tree/main/examples/langchain): A directory with multiple examples using Gemini with Langchain.
+* [Prompting examples](./prompting): A directory with examples of various prompting techniques. 
+* [JSON Capabilities](./json-capabilities): A directory with guides containing different types of tasks you can do with JSON schemas.
+* [Automate Google Workspace tasks with the Gemini API](./Apps_script_and_Workspace_codelab): This codelabs shows you how to connect to the Gemini API using Apps Script, and uses the function calling, vision and text capabilities to automate Google Workspace tasks - summarizing a document, analyzing a chart, sending an email and generating some slides directly. All of this is done from a free text input.
+* [Langchain examples](./langchain): A directory with multiple examples using Gemini with Langchain.
 
-There are even more examples in the [quickstarts](https://github.com/google-gemini/cookbook/tree/main/quickstarts) folder and in the [Awesome Gemini page](../Awesome_gemini.md).
+There are even more examples in the [quickstarts](../quickstarts) folder and in the [Awesome Gemini page](../Awesome_gemini.md).
diff --git a/gemini-2/README.md b/gemini-2/README.md
@@ -15,10 +15,11 @@ Explore Gemini 2.0’s capabilities through the following notebooks using Google
 * [Search tool](./search_tool.ipynb) \- Quick start using the Search tool with the unary and Live APIs in the GenAI SDK
 * [Spatial understanding](./spatial_understanding.ipynb) \- Comprehensive overview of 2D spatial understanding capabilities with the GenAI SDK
 * [Spatial understanding (3D)](./spatial_understanding_3d.ipynb) \- Comprehensive overview of 3D spatial understanding capabilities with the GenAI SDK
+* [Video understanding](./video_understanding.ipynb) \- Comprehensive overview Gemini 2.0 video understanding capabilities with the GenAI SDK
 * [Gemini 2.0 Flash Thinking](./thinking.ipynb) \- Introduction to the experimental Gemini 2.0 Flash Thinking model with the GenAI SDK
 
 Or explore on your own local machine.
 
-* [Live API starter script](./live_api_starter.py) \- A locally runnable Python script using GenAI SDK that supports streaming audio in and audio + video out from your machine
+* [Live API starter script](./live_api_starter.py) \- A locally runnable Python script using GenAI SDK that supports streaming audio + video (camera or screen) in and audio from your machine
 
 Also find websocket-specific examples in the [`websockets`](./websockets/) directory.
diff --git a/gemini-2/get_started.ipynb b/gemini-2/get_started.ipynb
@@ -11,7 +11,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 1,
+      "execution_count": null,
       "metadata": {
         "cellView": "form",
         "id": "9r9Ggw012g9c"
@@ -379,7 +379,7 @@
       "source": [
         "## Configure model parameters\n",
         "\n",
-        "You can include parameter values in each call that you send to a model to control how the model generates a response. Learn more about [experimenting with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values)."
+        "You can include parameter values in each call that you send to a model to control how the model generates a response. Learn more about [experimenting with parameter values](https://ai.google.dev/gemini-api/docs/text-generation?lang=node#configure)."
       ]
     },
     {
@@ -431,7 +431,7 @@
       "source": [
         "## Configure safety filters\n",
         "\n",
-        "The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the [Configure safety filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters) page for details."
+        "The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the [Configure safety filters](https://ai.google.dev/gemini-api/docs/safety-settings) page for details."
       ]
     },
     {
@@ -714,7 +714,7 @@
       "source": [
         "## Generate JSON\n",
         "\n",
-        "The [controlled generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output) capability in Gemini API allows you to constraint the model output to a structured format. You can provide the schemas as Pydantic Models or a JSON string."
+        "The [controlled generation](https://ai.google.dev/gemini-api/docs/structured-output?lang=python#generate-json) capability in Gemini API allows you to constraint the model output to a structured format. You can provide the schemas as Pydantic Models or a JSON string."
       ]
     },
     {
@@ -962,7 +962,7 @@
       "source": [
         "## Function calling\n",
         "\n",
-        "[Function calling](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling) lets you provide a set of tools that it can use to respond to the user's prompt. You create a description of a function in your code, then pass that description to a language model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with."
+        "[Function calling](https://ai.google.dev/gemini-api/docs/function-calling) lets you provide a set of tools that it can use to respond to the user's prompt. You create a description of a function in your code, then pass that description to a language model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with."
       ]
     },
     {
@@ -1912,7 +1912,7 @@
       "source": [
         "## Use context caching\n",
         "\n",
-        "[Context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) lets you to store frequently used input tokens in a dedicated cache and reference them for subsequent requests, eliminating the need to repeatedly pass the same set of tokens to a model.\n",
+        "[Context caching](https://ai.google.dev/gemini-api/docs/caching?lang=python) lets you to store frequently used input tokens in a dedicated cache and reference them for subsequent requests, eliminating the need to repeatedly pass the same set of tokens to a model.\n",
         "\n",
         "Context caching is only available for stable models with fixed versions (for example, `gemini-1.5-pro-002`). You must include the version postfix (for example, the `-002` in `gemini-1.5-pro-002`). It is not yet available of Gemini 2.0 because it is an experimental model. You can find more caching examples [here](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Caching.ipynb)."
       ]
@@ -2107,7 +2107,7 @@
       "source": [
         "## Get text embeddings\n",
         "\n",
-        "You can get text embeddings for a snippet of text by using `embed_content` method. All models produce an output with 768 dimensions by default. However, some models give users the option to choose an output dimensionality between 1 and 768. See [Vertex AI text embeddings API](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) for more details."
+        "You can get text embeddings for a snippet of text by using `embed_content` method. All models produce an output with 768 dimensions by default. However, some models give users the option to choose an output dimensionality between 1 and 768. See the [embeddings guide](https://ai.google.dev/gemini-api/docs/embeddings) for more details."
       ]
     },
     {

diff --git a/gemini-2/live_api_starter.ipynb b/gemini-2/live_api_starter.ipynb
@@ -656,7 +656,7 @@
         "- If you're interested in the low level details of using the websockets directly, see the [websocket version of this tutorial](../gemini-2/websockets/live_api_starter.ipynb).\n",
         "- Try the [Tool use in the live API tutorial](../gemini-2/live_api_tool_use.ipynb) for an walkthrough of Gemini-2's new tool use capabilities.\n",
         "- There is a [Streaming audio in Colab example](../gemini-2/websockets/live_api_streaming_in_colab.ipynb), but this is more of a **demo**, it's **not optimized for readability**.\n",
-        "- Other nice Gemini 2.0 examples can also be found in the [Cookbook](https://github.com/google-gemini/cookbook/blob/main/gemini-2/).\n"
+        "- Other nice Gemini 2.0 examples can also be found in the [Cookbook](https://github.com/google-gemini/cookbook/blob/main/gemini-2/), in particular the [video understanding](./video_understanding.ipynb) and the [spatial understanding](./spatial_understanding.ipynb) ones.\n"
       ]
     }
   ],

diff --git a/gemini-2/spatial_understanding.ipynb b/gemini-2/spatial_understanding.ipynb