Starter applet for real-time voice interaction with Gemini. #12

danielrosehill · 2024-12-31T23:58:36Z

Description of the feature request:

Hello, Gemini team!

I'm sure this is already on your radar, but I thought I would drop in with a quick request for a starter applet for a voice project, perhaps a speech-to-speech app with Gemini.

What problem are you trying to solve with this feature?

I would be very interested in exploring the Realtime API for voice interaction with Gemini and, more usefully, in bundling transcripts into organized documents and syncing them to Google Drive.
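To make the "bundling" part concrete, here is a minimal sketch of turning conversation turns into one Markdown document. It assumes the transcript is already available as plain text per turn. The `Turn` type, the speaker labels, and `bundle_transcript` are hypothetical names for illustration, not part of any Gemini or Drive API, and the upload step is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Turn:
    speaker: str  # hypothetical label, e.g. "user" or "model"
    text: str


def bundle_transcript(title: str, turns: list[Turn], when: datetime) -> str:
    """Render conversation turns as a single Markdown document."""
    lines = [f"# {title}", "", f"Captured: {when.date().isoformat()}", ""]
    for turn in turns:
        # One paragraph per turn, with the speaker in bold.
        lines.append(f"**{turn.speaker}:** {turn.text.strip()}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


doc = bundle_transcript(
    "Commute notes",
    [Turn("user", " hi "), Turn("model", "hello")],
    datetime(2024, 12, 31, tzinfo=timezone.utc),
)
```

The resulting string could then be handed to whatever storage backend the applet uses.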

Any other information you'd like to share?

Sure!

For some time I've been thinking about a gap in many current LLM apps: actually storing outputs. For all the worthy attention paid to managing prompts, it's a pity that more thought hasn't gone into where to store the often useful things we get from interacting with models.

Since Gemini sits naturally within the Workspace ecosystem, and as a Workspace user myself, it occurred to me that a great app could be built by combining the Realtime API with some backend logic to route and store outputs in Google Drive.

I think this combination could be extremely powerful and support many interesting use cases for personal users and, in particular, for business users.

If you want a slightly better pitch, here are a couple of use cases that I would have in mind. These are intended to highlight how an app like this could be an excellent addition to hybrid workflows:

A business commuter interacts with the Gemini Realtime API during their commute, then uses voice commands to bundle up the entire conversation or, more usefully, parts of it, and save it to Google Drive.
Assuming an integration with Google Drive can be achieved easily, users get some settings to make the best use of it. For example, specific save words could be configured to route outputs into specific folders, so that users choose where each type of interaction with the model ends up.
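The save-word idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the phrases, the folder names, and `route_utterance` are all hypothetical, and the sketch stops at choosing a folder rather than calling any Drive API.

```python
from typing import Optional

# Hypothetical save words mapped to hypothetical Drive folder names.
SAVE_WORD_FOLDERS = {
    "save to ideas": "Ideas",
    "save to meetings": "Meeting Notes",
}
# Hypothetical fallback folder for a generic "save this" command.
DEFAULT_FOLDER = "Gemini Transcripts"


def route_utterance(utterance: str) -> Optional[str]:
    """Return the target folder if the utterance contains a save word, else None."""
    # Lowercase and collapse whitespace so matching is tolerant of transcription quirks.
    normalized = " ".join(utterance.lower().split())
    for phrase, folder in SAVE_WORD_FOLDERS.items():
        if phrase in normalized:
            return folder
    if "save this" in normalized:
        return DEFAULT_FOLDER
    return None
```

In a real applet the mapping would presumably come from user settings rather than a hard-coded dictionary, and plain substring matching would likely need to be replaced with something more robust to speech recognition errors.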

This could be a great way, in my opinion, to bring together the power of voice workflows with traditional workflows, allowing those capturing ideas and working with Gemini on the go to bring those forward to on-site team members.

Giom-V · 2025-01-06T16:47:41Z

Hello @danielrosehill,

You should be able to find multiple code examples for the real-time API in the Cookbook, in particular [this Python code](https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_starter.py).

Giom-V added the enhancement (New feature or request) label on Jan 6, 2025
