Starter applet for real-time voice interaction with Gemini. #12

Open
danielrosehill opened this issue Dec 31, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@danielrosehill

Description of the feature request:

Hello, Gemini team!

I'm sure this is already on your radar, but I thought I would drop in with a quick request for a starter applet for a voice project, perhaps a speech-to-speech app with Gemini.

What problem are you trying to solve with this feature?

I would be extremely interested in exploring this aspect of the Realtime API for interacting with Gemini and, more usefully, in bundling transcripts into organized documents and syncing them to Google Drive.
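
To make the "bundling" step concrete, here is a minimal sketch of turning a list of conversation turns into a single Markdown document. It assumes the transcript is already available as text; the `Turn` class and `bundle_transcript` function are hypothetical names for illustration, not part of any Gemini or Google Drive API, and the upload to Drive is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Turn:
    """One utterance in the conversation (hypothetical structure)."""
    speaker: str
    text: str


def bundle_transcript(title: str, turns: list[Turn], started_at: datetime) -> str:
    """Render conversation turns as a Markdown document ready to be stored."""
    lines = [f"# {title}", "", f"Recorded: {started_at:%Y-%m-%d %H:%M}", ""]
    for turn in turns:
        # One paragraph per turn, with the speaker name in bold.
        lines.append(f"**{turn.speaker}:** {turn.text.strip()}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    doc = bundle_transcript(
        "Commute notes",
        [Turn("User", "Summarize my ideas for the launch."),
         Turn("Gemini", "Here are the three main points.")],
        datetime(2025, 1, 6, 8, 30),
    )
    print(doc)
```

The resulting string could then be handed to whatever storage layer the app uses.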

Any other information you'd like to share?

Sure!

I've been engaged for some time with the question of how to better address the gap in many LLM apps at the moment when it comes to actually storing outputs. I think that for all the worthy attention paid to managing prompts, it's a pity that more thought hasn't been put into where to actually store the often useful things we get from interacting with models.

Since Gemini sits perfectly within the Workspace ecosystem, and I am a Workspace user myself, it occurred to me that a great app could be built by combining the Realtime API with some backend logic to route and store outputs in Google Drive.

I think that this combination could be extremely powerful and support many interesting use cases for personal users and, in particular, for business users.

If you want a slightly better pitch, here are a couple of use cases that I would have in mind. These are intended to highlight how an app like this could be an excellent addition to hybrid workflows:

  • A business commuter interacts with the Gemini Realtime API during their commute, then uses voice commands to bundle up the entire conversation or, more usefully, parts of it, and save it to Google Drive.

  • Assuming that an integration with Google Drive could be easily achieved, provide users with some settings to make the best use of this interaction. For example, specific save words could be configured to route outputs into specific folders so that users could choose where to direct different types of interaction with the model.
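
The save-word idea in the second use case could be sketched roughly as follows. This is only an illustration under stated assumptions: the phrases, folder names, and the `route_transcript` function are all hypothetical, and the sketch only decides which folder a transcript belongs in; it does not call the Drive API.

```python
import re

# Hypothetical user settings: spoken "save words" mapped to Drive folder names.
SAVE_WORD_FOLDERS = {
    "save to meetings": "Meetings",
    "save to ideas": "Ideas",
}
DEFAULT_FOLDER = "Inbox"  # hypothetical fallback when no save word is spoken


def route_transcript(transcript, routes=SAVE_WORD_FOLDERS, default=DEFAULT_FOLDER):
    """Return (folder_name, cleaned_text) for a transcript.

    The first configured save word found in the text (case-insensitive)
    selects the folder, and that phrase is removed from the stored text.
    """
    for phrase, folder in routes.items():
        match = re.search(re.escape(phrase), transcript, flags=re.IGNORECASE)
        if match:
            cleaned = transcript[:match.start()] + transcript[match.end():]
            return folder, cleaned.strip()
    return default, transcript.strip()


if __name__ == "__main__":
    folder, text = route_transcript("Draft the Q3 agenda. Save to meetings")
    print(folder, "|", text)
```

Keeping the routing table as plain data means it could be exposed directly as a user setting.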

This could be a great way, in my opinion, to bring together the power of voice workflows with traditional workflows, allowing those capturing ideas and working with Gemini on the go to bring those forward to on-site team members.

@Giom-V
Contributor

Giom-V commented Jan 6, 2025

Hello @danielrosehill,

You should be able to find multiple code examples for the real-time API in the Cookbook, in particular [this Python code](https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_starter.py).

@Giom-V Giom-V added the enhancement New feature or request label Jan 6, 2025