WhisperNote

A simple Python script that transcribes audio and performs speaker diarization using OpenAI's Whisper and pyannote.audio.

Based on Majdoddin's work, which is discussed on GitHub and available as a Google Colab notebook.
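
To illustrate how transcription and diarization results can be combined, here is a minimal sketch that labels each transcript segment with the speaker whose turn overlaps it the most. It is not taken from the script itself: the tuple layouts, the `assign_speakers` helper, and the `SPEAKER_00`-style labels are assumptions made for the example, and the actual script may merge the two outputs differently.

```python
from typing import List, Tuple

# Hypothetical data shapes (assumptions for this sketch, not the script's own):
# a transcript segment is (start_seconds, end_seconds, text),
# a speaker turn is (start_seconds, end_seconds, speaker_label).
Segment = Tuple[float, float, str]
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds shared by two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for start, end, text in segments:
        best_speaker = "UNKNOWN"  # used when no turn overlaps the segment
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = speaker
        labelled.append((best_speaker, text.strip()))
    return labelled


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    segments = [(0.0, 4.0, " Welcome to the show."), (4.0, 9.5, " Thanks for having me.")]
    turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 10.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Picking the turn with the largest overlap is a simple heuristic; it does not split a segment in which two people speak.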

Running the script

This project was tested only on Linux, in both CPU-only and GPU configurations. It is expected to work on other platforms, but this is not guaranteed.
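
Whisper and pyannote.audio both run on PyTorch, so a script that supports CPU-only and GPU setups usually has to choose a device at startup. The following is a minimal sketch of that choice, not the project's actual code; `pick_device` is a hypothetical helper name introduced for this example.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {pick_device()}")
```

The returned string can then be passed to whichever model-loading calls the script uses.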

Benchmarks

Input file: a 10-minute .mp3 recording of an interview between two people.

Transcription

CPU: 2.36 minutes

GPU: 2.05 minutes
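
To put these timings in context, the sketch below computes the real-time factor (processing time divided by audio duration). It assumes the figures above are decimal minutes rather than minutes and seconds; `real_time_factor` is a hypothetical helper written for this example.

```python
def real_time_factor(audio_minutes: float, processing_minutes: float) -> float:
    """Processing time divided by audio duration; values below 1 are faster than real time."""
    if audio_minutes <= 0:
        raise ValueError("audio_minutes must be positive")
    return processing_minutes / audio_minutes


if __name__ == "__main__":
    AUDIO_MINUTES = 10.0  # length of the benchmark input file
    # Timings from the benchmark, assumed to be decimal minutes.
    for name, minutes in [("CPU", 2.36), ("GPU", 2.05)]:
        rtf = real_time_factor(AUDIO_MINUTES, minutes)
        print(f"{name}: real-time factor {rtf:.3f}")
```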

Citations

pyannote/speaker-diarization

pyannote/segmentation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}
