AI-Waifu is an IoT-based voice assistant designed as Task 2 for the ECL-108 IoT Workshop 2. The project employs two ESP32 DevKit boards communicating via UART to capture and process voice input, then produce synthesized speech output. It utilizes a MEMS microphone (INMP441) and an I2S audio amplifier (MAX98357A) for audio input and output, respectively. The Google Text-to-Speech (TTS) API generates responses with a Japanese female voice.
- Real-time Speech Recognition: Captures audio from a MEMS microphone and converts it to text.
- Speech Synthesis: Outputs natural-sounding speech through a speaker using an I2S audio amplifier.
- Dual ESP32 Communication: Coordinates tasks between two ESP32 boards over UART.
- I2S Protocol: Ensures high-fidelity audio transfer for both input and output.
- 2 × ESP32 DevKit Boards
- INMP441 MEMS Microphone: I2S-based, digital MEMS microphone.
- MAX98357A Audio Amplifier: I2S-based audio amplifier.
- Speaker: Connected to the MAX98357A.
- UART Communication Cables
- Jumper Wires
-
Audio Capture and Processing (ESP32 #1):
- Captures real-time audio input via the INMP441 microphone.
- Uses the I2S protocol to convert analog signals into digital data.
- Transmits the converted text data to the second ESP32 using UART.
-
Text-to-Speech and Output (ESP32 #2):
- Receives text data through UART.
- Sends the text to the Google TTS API to generate audio output.
- Plays the synthesized speech through the speaker using the MAX98357A amplifier.
ai-waifu/
├── ChatGPT Voice Assistant/
│ ├── ESP32_Speech_to_Text_t/
│ │ ├── Audio.cpp
│ │ ├── Audio.h
│ │ ├── CloudSpeechClient.cpp
│ │ ├── CloudSpeechClient.h
│ │ ├── ESP32_Speech_to_Text_t.ino
│ │ ├── I2S.cpp
│ │ ├── I2S.h
│ │ ├── network_param.h
│ ├── ESP32_Text_to_Speech/
│ │ ├── ESP32_Text_to_Speech.ino
├── ckt.png
├── flowchart.png
├── w2t2.pdf
INMP441 Pin | ESP32 Pin |
---|---|
VCC | 3.3V |
GND | GND |
WS | IO15 |
SCK | IO14 |
SD | IO32 |
MAX98357A Pin | ESP32 Pin |
---|---|
VIN | 3.3V |
GND | GND |
BCK | IO27 |
LRCK | IO25 |
DIN | IO26 |
ESP32 #1 Pin | ESP32 #2 Pin |
---|---|
TX (IO17) | RX (IO16) |
RX (IO16) | TX (IO17) |
GND | GND |
- Install Arduino IDE.
- Add ESP32 board support.
- Obtain a Google Cloud TTS API Key.
- Speech-to-Text (ESP32 #1):
UploadESP32_Speech_to_Text_t.ino
to the first ESP32. - Text-to-Speech (ESP32 #2):
UploadESP32_Text_to_Speech.ino
to the second ESP32.
- Edit
network_param.h
:#define WIFI_SSID "your_wifi_ssid" #define WIFI_PASSWORD "your_wifi_password" #define API_KEY "your_google_tts_api_key"
- Initializes the I2S interface for the INMP441 microphone.
- Captures audio and sends it for cloud-based speech recognition.
- Transmits recognized text data over UART.
- Receives text data from UART.
- Requests speech synthesis from Google TTS API.
- Outputs the received audio stream through MAX98357A using I2S.
-
No Audio Input Detection:
- Verify microphone connections.
- Check I2S pin configuration in
I2S.cpp
.
-
UART Communication Failure:
- Ensure both boards have matching baud rates.
- Confirm TX/RX connections.
- Avoid connecting GND pins of both ESP32 Devkit boards together.
-
Google TTS Errors:
- Validate your API key.
- Check internet connectivity.
Check out the project in action: AI-Waifu Demo on LinkedIn
- Support for additional languages and voices.
- Integration with more advanced NLP models for contextual responses.