Given that your tool is exceptionally well-programmed and works seamlessly across applications, it would be great to add voice typing (speech-to-text) powered by Gemini or another suitable LLM.
It would be fantastic to have a dedicated button in your app rather than relying on inferior voice-typing solutions. Perhaps my perspective is mistaken; if so, please correct me.
This is an interesting request that we could think about adding in the future.
There's actually a very nice dedicated model for this from OpenAI called Whisper. However, running it locally requires roughly 4 GB of VRAM/RAM, and most people certainly wouldn't be able to run that alongside a local LLM. I did find one project that does what you're requesting with it: https://github.com/savbell/whisper-writer
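For reference, local transcription with Whisper would look roughly like this. This is just a minimal sketch assuming the `openai-whisper` Python package and ffmpeg are installed; the model size and audio file name are placeholders:

```python
# Minimal local Whisper sketch (assumes `pip install openai-whisper` and ffmpeg).
import whisper

# "base" is one of the smaller checkpoints; the larger ones are what need ~4 GB+.
model = whisper.load_model("base")

# "recording.wav" is a placeholder path to the captured audio.
result = model.transcribe("recording.wav")
print(result["text"])
```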
A way to get free and accessible state-of-the-art transcription would be to use the Gemini API and ask a multimodal Gemini 2.0 model for a transcript. However, I'm unsure what the latency would be like.
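Something like the following could work. This is only a rough sketch assuming the `google-generativeai` Python SDK and its Files API; the model name, prompt, and file path are illustrative, not a final design:

```python
# Rough sketch of transcription via the Gemini API (assumes `pip install google-generativeai`).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")     # placeholder key
audio = genai.upload_file("recording.wav")  # placeholder audio file

# Model name is illustrative; any multimodal Gemini 2.0 model that accepts audio would do.
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content(["Transcribe this audio verbatim.", audio])
print(response.text)
```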
This is not something I can immediately work on, and I'd also like to hear what others think about this proposal first.