The pyannote.audio module makes diarization (speaker identification) quite easy, and the results appear to be reasonably accurate. However, distributing a python program to non-technical users that includes such code proved elusive. I found many online posts that essentially implied that distribution of the pytorch modules using PyInstaller couldn't be done. In addition, I found it challenging to successfully embed pyannote's libraries, eliminating the need to explain to my users how to supply a pyannote access token.
This works for me. Your mileage may vary. I welcome your input in an effort to expand this example to cover more circumstances.
- So far, I've got it working on Windows 11 and macOS 13.7.1 (Ventura). I don't expect that Linux will be much different than macOS.
- My main Windows computer has a GPU that uses CUDA 12. If you need CUDA 11, you will need a different version of PyTorch.
- My main macOS computer is a Mac Mini with an Apple M2 chip.
- I'm using Python 3.11.6, pyannote.audio 3.3.2, PyTorch 2.6.0, pyannote models "speaker-diarization-3.1" and "segmentation-3.0", and PyInstaller 6.11.1. These are all current module versions as I write this.
- On Windows, I've tested the distributable executable on Windows 10 and Windows 11 computers. One test computer is an 8 year old laptop with a CUDA 12 GPU, and another is a very old (> 10 years) desktop with an NVidia GeForce GT 545 video card. No GPU is accessible on this second test computer, so it used "cpu" rather than "cuda" during diarization. On macOS, I tested on a Mac Mini with an M2 processor and a MacBook with an M1 processor.
- Processing time varies widely between computers.
At present, create a virtual environment and use pip to import requirements.txt. Then run "Diarize.py" to see what the program does. There is a boolean variable
called EMBED that switches between the normal mode that downloads models from HuggingFace and the embedded or "offline" mode. You will need to create a file called
"pyannote_access_token.txt" that contains your pyannote access token to use the normal mode. Then use "setup-Diarize.py" to create a distributable build.
Alternately, you can use pip to import "requirements with faster whisper.txt" and use "Diarized_Transcript.py". At this time, this program requires a
pyannote access token for normal on-line diarization and will mix diarization information with transcription information created by faster-whisper. I have not
written a PyInstaller file for this version yet.
Unfortunately, two of the four sample audio files this sample user are too large for GitHub. You'll have to substitute your own files and alter the code.
I found several descriptions of how to embed the pyannote models, including pyannote.audio's tutorial's section for offline use. While these descriptions seemed straight forward, I could not get them to work. They assumed a little too much, said a little too little, and were unclear about what exactly needed to be included and edited. There appears to be just a touch of magic in how one renames the pytorch binary model files, and I was using model names that made sense to me rather than what is required. I eventually stumbled upon a bug report that made clear how important the precise names of the model files were.
- I created a directory called "models".
- I copied the "config.yaml" file from pyannote's speaker diarization 3.1 module, which I edited and renamed "speaker-diarization-3.1-config.yaml". (This file name is probably not mandatory. See the file in the repository for details.)
- I copied the "pytorch_model.bin" file from pyannote's wespeaker-voxceleb-resnet34-LM module, which I renamed pyannote_model_wespeaker-voxceleb-resnet34-LM.bin. (I have not included this file in my repository.)
- I copied the "pytorch_model.bin" file from oyannote's segmentation-3.0 module, which I renamed pytorch_model_segmentation-3.0.bin. (I have not included this file in my repository.)
- You must use these exact new file names for the two models. Otherwise, as I understand it, the models won't load properly.
- I then changed the pyannote Pipeline command in my code from:
to:
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=ACCESS_TOKEN)
pipeline = Pipeline.from_pretrained("models/speaker-diarization-3.1-config.yaml")
PyInstaller could not initially build an executable for my sample code that would run. What I describe may not be the optimal way to accomplish this task, but this is what worked after several failed attempts that should have worked. Suggestions for improving my techniques are welcome. To test the PyInstaller executable on Windows, I loaded a Command Prompt and ran the executable there. This process allowed me to see where the executable was failing. Essentially the same process worked on macOS. I built the distributable executable and ran it in a Terminal window using the command line. This revealed 4 python modules that needed to be included in the distribution but were not. See setup-Diarize.py. I added the following to my PyInstaller setup command: On Windows:
--add-binary=venv311/Lib/site-packages/asteroid_filterbanks:asteroid_filterbanks
--add-binary=venv311/Lib/site-packages/lightning_fabric:lightning_fabric
--add-binary=venv311/Lib/site-packages/pyannote:pyannote
--add-binary=venv311/Lib/site-packages/speechbrain:speechbrain
On macOS:
--add-binary=venv311/lib/python3.11/site-packages/asteroid_filterbanks:asteroid_filterbanks
--add-binary=venv311/lib/python3.11/site-packages/lightning_fabric:lightning_fabric
--add-binary=venv311/lib/python3.11/site-packages/pyannote:pyannote
--add-binary=venv311/lib/python3.11/site-packages/speechbrain:speechbrain
Following the PyInstaller call, I added the following to copy the necessary model files to the distribution:
shutil.copytree('models', 'dist/Diarize/models')
I hope this provides enough information that you can figure out how to get this working for you.