Description of the feature request:
Hello, Gemini team!
I'm sure this is already on your radar, but I thought I would drop in with a quick request for a starter applet for a voice project, perhaps a speech-to-speech app with Gemini.
What problem are you trying to solve with this feature?
I would be very interested in exploring the voice capabilities of the Realtime API for interacting with Gemini and, more usefully, in bundling up transcripts into organized documents and syncing them to Google Drive.
Any other information you'd like to share?
Sure!
For some time I've been thinking about how to address a gap in many current LLM apps: actually storing outputs. For all the worthy attention paid to managing prompts, it's a pity that more thought hasn't been put into where to store the often useful things we get from interacting with models.
Since Gemini sits naturally within the Workspace ecosystem, and since I'm a Workspace user myself, it occurred to me that a great app could be built by bringing the Realtime API together with some kind of backend logic to route and store outputs in Google Drive.
I think this combination could be extremely powerful and support many interesting use cases for personal users, and particularly for business users.
If you want a slightly better pitch, here are a couple of use cases that I would have in mind. These are intended to highlight how an app like this could be an excellent addition to hybrid workflows:
A business commuter interacts with the Gemini Realtime API during their commute, then uses voice commands to bundle up the entire conversation or, more usefully, selected parts of it, and save it to Google Drive.
Assuming that an integration with Google Drive could be easily achieved, the app could give users some settings to make the best use of it. For example, specific save words could be configured to route outputs into specific folders, so that users could choose where to direct different types of interaction with the model.
This could be a great way, in my opinion, to bring together the power of voice workflows with traditional workflows, allowing those who capture ideas and work with Gemini on the go to share them with on-site team members.
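To make the save-word idea a bit more concrete, here is a minimal Python sketch of how the routing could work. Everything in it is an assumption for illustration only: the save phrases, the folder paths, and the function names `pick_folder` and `bundle_transcript` are hypothetical, and the actual upload to Google Drive is deliberately left out.

```python
# Hypothetical save-word routing; phrases and folder paths are illustrative only.
SAVE_ROUTES = {
    "save to meeting notes": "Work/Meeting Notes",
    "save to ideas": "Personal/Ideas",
}
DEFAULT_FOLDER = "Voice Transcripts"


def pick_folder(command, routes=SAVE_ROUTES, default=DEFAULT_FOLDER):
    """Return the folder for the first configured save word found in the command."""
    spoken = command.lower()
    for save_word, folder in routes.items():
        if save_word in spoken:
            return folder
    # No save word matched, so fall back to a catch-all folder.
    return default


def bundle_transcript(title, turns):
    """Join (speaker, text) turns into one plain-text document."""
    lines = [title, ""]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    return "\n".join(lines)


turns = [
    ("User", "Draft an agenda for Monday."),
    ("Gemini", "1. Budget 2. Hiring"),
]
folder = pick_folder("Okay, save to meeting notes please")
document = bundle_transcript("Monday agenda", turns)
print(folder)
print(document)
```

In a real applet, the `folder` and `document` values would be handed to whatever Drive integration the app uses. Simple substring matching is just the easiest thing to show here; a real app might prefer the model's own function calling to detect the save intent.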