This is a simple Node.js web application that allows users to upload an audio file and transcribe it using OpenAI's Whisper API.
The app processes the audio, returns the transcription to the user, and also provides the option to download the transcription as a plain text file.
The filename of the text file is dynamically generated based on the first three words of the transcription and a timestamp.
Important:
- using this app requires setting up billing with open AI and paying for credits.
- Audio Upload:
Users can upload audio files via a web form. - OpenAI Whisper Integration:
The app sends the uploaded audio to OpenAI's Whisper API for transcription. - Transcription Display:
Once transcribed, the text is displayed on the web page. - Downloadable Transcription:
Users can download the transcription as a.txt
file, with a filename that includes the first three words of the transcription and a timestamp. - Supports Various Audio Formats:
The app supports common audio file formats, such asm4a
,mp3
,wav
,ogg
, etc.
- Installation
- Usage
- Environment variables
- Dynamically creating filename from transcription + timestamp
- Downloadable transcription as plain text file
- Downloadable transcription
- Error handling
- Supported audio formats
- Known issues
- Contributing
- License
- Appendices
- Notes
- Acknowledgments
Follow these steps to install and run the app on your local machine:
- Node.js (version 16 or higher)
- An OpenAI API key (You can generate this by signing up for OpenAI here)
- Clone the repository:
git clone https://github.com/your-username/ai-whisper.git
cd ai-whisper
- Install dependencies: Run the following command to install the required packages:
npm install
- Set up environment variables:
Create a
.env
file in the root of your project and add your OpenAI API key:
touch .env
Add the following content to the .env
file:
OPENAI_API_KEY=your-openai-api-key-here
- Run the application: Start the Express server:
node server.js
- Access the application:
Open your browser and go to
http://localhost:${PORT}
to see the audio upload form.
- Open your browser and navigate to
http://localhost:${PORT}
. - Upload an audio file in any of the supported formats (see Supported Audio Formats below).
- Click "Upload and Transcribe".
- Once the transcription is complete, the result will be displayed on the screen.
- A download link will appear allowing you to download the transcription as a
.txt
file.
The application uses a .env
file to store sensitive information like the OpenAI API key. The following environment variable needs to be set:
OPENAI_API_KEY
: Your API key from OpenAI, which is required to use Whisper.
We use Multer to handle file uploads. Files are temporarily stored in the uploads/
directory. The app processes the uploaded file by sending it to the Whisper API for transcription.
Here’s a quick breakdown of the process:
- Users upload a file through the web form.
- Multer stores the file in the
uploads/
directory. - The server reads the file and sends it to OpenAI’s Whisper API for transcription.
The Whisper API supports the following audio file formats:
.flac
.m4a
.mp3
.mp4
.mpeg
.mpga
.oga
.ogg
.wav
.webm
Once the transcription is successfully generated, a .txt
file containing the transcription is saved in the uploads/
directory. The filename is dynamically generated based on:
- The first three words¹ of the transcription (spaces are replaced by underscores
_
). - A timestamp in the format
YYYY-MM-DD--minutes-seconds
.
For example, a transcription that starts with "Four score and seven" and was transcribed at 2024-10-15 14:25:05
will be saved as:
Four_score_and_2024-10-15--25-05.txt
Once the transcription is generated, the app provides a download link so users can download the transcription in plain text format. The text file contains only the transcription text and no additional content (e.g., no labels like "Transcription Result").
The link will look like:
<a href="/download-transcription/Four_score_and_2024-10-15--25-05.txt">
Download Transcription
</a>
The application includes basic error handling to manage potential issues:
- File format validation: If a file with an unsupported format is uploaded, the app returns an error message.
- API request errors: If there is an issue with the request to OpenAI's API (e.g., an invalid API key or network issue), the app logs the error and informs the user.
- General errors: Other errors, such as file system issues, are logged and the user is informed that transcription failed.
If an error occurs during transcription, the user will see a message like:
<h1>Transcription Failed</h1>
<p>{"error": "An error occurred during transcription"}</p>
__dirname Not Defined
: Since this project uses ES modules, we manually define__dirname
usingfileURLToPath
anddirname
from theurl
module.- Large File Handling: Currently, the app stores uploaded files temporarily in memory. For larger audio files, you might want to extend file handling capabilities (e.g., adding file size validation or streaming).
Come on down! Just follow these steps:
- Fork the repository.
- Create a new feature branch (
git checkout -b feature-name
). - Commit your changes (
git commit -am 'Add feature'
). - Push the branch (
git push origin feature-name
). - Create a Pull Request.
Please follow the project's coding conventions. Also include appropriate tests.
MIT.
- To test all this, I used the Gettysburg address by Abraham Lincoln. If you see things like four score and seven years ago, that's where that's coming from.
Whisper AI by OpenAI
At various points I tested the transcription quality with poems by Osip Mandelstam.
Thank you! • SL