A simple Python script to transcribe audio and perform speaker diarization using OpenAI's Whisper and pyannote.audio.
Based on Majdoddin's work, which is discussed on GitHub and available as a Google Colab notebook.
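Transcription and diarization produce two separate timelines: timed text from Whisper and timed speaker turns from pyannote.audio. This README does not describe how the script merges them. One common approach, sketched below under that assumption, labels each transcript segment with the speaker whose turns overlap it the most. The code is pure Python so it runs without either library. `Segment`, `overlap` and `assign_speakers` are hypothetical names, and the inputs are plain lists standing in for the real model outputs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds from the start of the audio
    end: float         # seconds from the start of the audio
    text: str = ""
    speaker: str = ""


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker that overlaps it most.

    transcript: list of Segment carrying text (hypothetical stand-in for Whisper output)
    turns: list of (start, end, speaker) tuples (stand-in for diarization output)
    """
    labelled = []
    for seg in transcript:
        # Sum the overlap per speaker, since one speaker may have several turns.
        totals = {}
        for t_start, t_end, speaker in turns:
            ov = overlap(seg.start, seg.end, t_start, t_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        best = max(totals, key=totals.get) if totals else unknown
        labelled.append(Segment(seg.start, seg.end, seg.text, best))
    return labelled


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 4.0, "Hello, thanks for joining us."),
        Segment(4.0, 9.5, "Happy to be here."),
    ]
    turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.8, "SPEAKER_01")]
    for seg in assign_speakers(transcript, turns):
        print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
    # [0.0-4.0] SPEAKER_00: Hello, thanks for joining us.
    # [4.0-9.5] SPEAKER_01: Happy to be here.
```

With the real libraries, the transcript list could be built from the `"segments"` entries (each with `"start"`, `"end"`, `"text"`) returned by Whisper's `model.transcribe(...)`. The turns could come from `[(turn.start, turn.end, spk) for turn, _, spk in diarization.itertracks(yield_label=True)]` on a pyannote diarization result. Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when a speaker's turn is split into fragments.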
This project has been tested only on Linux, in both CPU-only and GPU configurations. It is expected to work on other platforms, but this is not guaranteed.
Input file: a 10-minute .mp3 recording of an interview between two people.
CPU: 2.36 minutes
GPU: 2.05 minutes
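These timings are easier to compare with other recordings when expressed as a real-time factor (RTF), which is processing time divided by audio length. The sketch below assumes the figures are end-to-end processing times in decimal minutes (2.36 minutes, not 2 min 36 s). `real_time_factor` is a hypothetical helper and not part of the script.

```python
AUDIO_MINUTES = 10.0                   # length of the benchmark interview
timings = {"CPU": 2.36, "GPU": 2.05}   # assumed decimal minutes of processing time


def real_time_factor(processing_minutes, audio_minutes):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_minutes / audio_minutes


if __name__ == "__main__":
    for device, minutes in timings.items():
        rtf = real_time_factor(minutes, AUDIO_MINUTES)
        print(f"{device}: RTF {rtf:.3f} ({1 / rtf:.1f}x faster than real time)")
    print(f"GPU speed-up over CPU: {timings['CPU'] / timings['GPU']:.2f}x")
```

Under that assumption, the CPU run has an RTF of about 0.24 (roughly 4.2x faster than real time) and the GPU run about 0.21 (roughly 4.9x). The GPU is therefore only about 1.15x faster than the CPU on this input.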
Models: pyannote/speaker-diarization, pyannote/segmentation
@inproceedings{Bredin2021,
Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
Booktitle = {Proc. Interspeech 2021},
Address = {Brno, Czech Republic},
Month = {August},
Year = {2021},
}