Added voice commands demo. #52

Open: wants to merge 1 commit into master


DebrisHauler

I wanted to get voice commands working for my Unity game, so I've put together a bare-bones scene to share what I've got working. Simply open 6 - Voice Commands.unity, run the scene, and start talking into the mic.

Some notes:
- It listens indefinitely without running out of memory. I added a LoopingMicrophone class that leverages Unity's looping audio clip, which overwrites itself as it wraps around. The clip length is set in a public variable and defaults to a 10-second loop.
- For matching the transcription to commands, I used a custom longest common substring algorithm. The custom part is that I recursively exclude older spoken words, which forgives anything the user rambled about before the command.
- A similarity threshold is set on the VoiceCommandsManager. Lowering it to around 0.6 allows the user to deviate a bit from the command, say by throwing in an unnecessary word. A value of 1.0 requires the phrase to be spoken exactly.
- I know this may not fit the coding style Macoron had set up, but it is working for me and I just want to share. Please take it, rework it, or leave it.
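
As a rough illustration of the matching idea in the notes above, here is a minimal Python sketch (not the C# code from this PR). It assumes the similarity score is the longest common substring length divided by the longer of the two strings, and that "excluding older words" means retrying with the oldest spoken word dropped each time. The function names and the scoring formula are assumptions for illustration; the actual VoiceCommandsManager may score differently.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(spoken, command):
    """Assumed score: shared substring length over the longer string (0.0 to 1.0)."""
    if not spoken or not command:
        return 0.0
    return longest_common_substring(spoken, command) / max(len(spoken), len(command))


def best_similarity(transcript, command):
    """Score every tail of the transcript, dropping the oldest word each time."""
    words = transcript.lower().split()
    command = " ".join(command.lower().split())
    best = 0.0
    for start in range(len(words)):
        tail = " ".join(words[start:])
        best = max(best, similarity(tail, command))
    return best


def match_command(transcript, commands, threshold=0.6):
    """Return the best-scoring command, or None if nothing reaches the threshold."""
    best_command, best_score = None, 0.0
    for command in commands:
        score = best_similarity(transcript, command)
        if score > best_score:
            best_command, best_score = command, score
    return best_command if best_score >= threshold else None
```

With this scoring, `match_command("um so anyway open the door", ["open the door", "close the door"], threshold=1.0)` returns `"open the door"`, because dropping the leading words leaves an exact match. A trailing extra word, as in `"so um open the door please"`, only passes at a lower threshold such as 0.6.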

@Macoron
Owner

Macoron commented Aug 15, 2023

First of all, very impressive work! Here is my quick test of it:

command.mp4

Integrating this into the main repository will require some extra work, such as:

  1. Moving looping microphone to existing MicrophoneRecord
  2. Making this work with WhisperStream and standard WhisperManager
  3. Making commands set editable from inspector

It's OK if you want to share it as it is. I'll leave this PR open until I or someone else wants to rework it. Thank you for sharing this!

@DebrisHauler
Author

It was awesome to see your video & response today! Those three points make sense. I won't have the time to integrate, but feel free to change anything to better fit your framework.

@konsnos

konsnos commented Dec 2, 2024

Some notes on this.

I have implemented voice commands with some changes to the Streaming Demo. It's a bit laggy and doesn't seem as good as @DebrisHauler's work, but I think it's decent.

I'm only listening to OnSegmentUpdated from the stream and checking whether the command words are included in the segment text.
In the MicrophoneRecord settings I reduced Max Length Sec to 8 and Vad Context Sec to 10, and it keeps listening for a long time while I'm also running other ML algorithms on an iPhone 12 Pro.
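
The "check if the words are included" approach can be sketched roughly as follows, in Python rather than the project's C#. Everything here is hypothetical: the class, the method names, and the idea of firing each command at most once per segment are assumptions for illustration, not whisper.unity API. In the real project the text would come from the stream's OnSegmentUpdated event.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_phrase(segment_words, phrase_words):
    """True if phrase_words appears as a contiguous run inside segment_words."""
    n = len(phrase_words)
    if n == 0:
        return False
    return any(
        segment_words[i:i + n] == phrase_words
        for i in range(len(segment_words) - n + 1)
    )


class SegmentCommandListener:
    """Hypothetical handler fed with partial transcripts of the current segment."""

    def __init__(self, commands):
        self._commands = {command: normalize(command) for command in commands}
        self._fired = set()  # commands already reported for the current segment

    def on_segment_updated(self, text):
        """Return commands newly detected in the updated segment text."""
        words = normalize(text)
        hits = []
        for command, phrase_words in self._commands.items():
            if command not in self._fired and contains_phrase(words, phrase_words):
                self._fired.add(command)
                hits.append(command)
        return hits

    def on_segment_finished(self):
        """Forget fired commands so they can trigger again in the next segment."""
        self._fired.clear()
```

Because a segment update repeats the text recognized so far, tracking which commands have already fired avoids triggering the same command on every update.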
