Speaker Identification Toolkit

This repository provides a comprehensive toolkit for processing audio and video files, with a focus on speaker diarization, speaker identification, audio extraction, and dataset creation. By leveraging tools like ffmpeg, pyannote.audio, and other Python libraries, the scripts enable efficient and accurate workflows for handling audio data.

Workflow

1. Extract Audio from Videos - STABLE

Run dataset-creation.py to extract English audio tracks from video files.
Now uses PyDub instead of ffmpeg and extracts 44.1 kHz (44,100 Hz), 16-bit PCM, mono WAV files.
Uses multi-threading for faster, scalable performance.
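
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the contents of dataset-creation.py: the function names, the extension list, and the worker count are assumptions, and selecting the English audio track is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

VIDEO_EXTS = {".mkv", ".mp4", ".avi"}  # assumed set of video extensions


def plan_outputs(video_dir, wav_dir):
    """Pair each video file with the WAV path it should be written to."""
    video_dir, wav_dir = Path(video_dir), Path(wav_dir)
    return [
        (src, wav_dir / (src.stem + ".wav"))
        for src in sorted(video_dir.iterdir())
        if src.suffix.lower() in VIDEO_EXTS
    ]


def extract_one(src, dst):
    """Decode one file and export it as 44.1 kHz, 16-bit, mono WAV."""
    from pydub import AudioSegment  # lazy import; PyDub decodes video via ffmpeg

    audio = AudioSegment.from_file(str(src))
    audio = audio.set_frame_rate(44100).set_channels(1).set_sample_width(2)
    audio.export(str(dst), format="wav")
    return dst


def extract_all(video_dir, wav_dir, workers=4):
    """Convert every video in video_dir using a small thread pool."""
    Path(wav_dir).mkdir(parents=True, exist_ok=True)
    jobs = plan_outputs(video_dir, wav_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: extract_one(*job), jobs))
```

Calling extract_all("base-folder/videos", "base-folder/wavs") would write one WAV per video. Threads are a reasonable fit here because most of the time is spent in the external decoder rather than in Python code.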
Optional Script: organize-videos.py will extract Season and Episode info from the file names, and rename them accordingly.
- This keeps your video files and the generated WAV and JSON files uniquely named, e.g. S02E13.wav and S02E13.json.
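
To illustrate the renaming idea, here is a small sketch that pulls a season/episode tag out of a file name. The patterns and the episode_tag helper are hypothetical; organize-videos.py may recognise different naming schemes.

```python
import re

# Assumed patterns: "S02E13", "s2e13", or "2x13" anywhere in the file name.
_PATTERNS = [
    re.compile(r"[Ss](\d{1,2})[ ._-]?[Ee](\d{1,3})"),
    re.compile(r"(?<!\d)(\d{1,2})x(\d{1,3})(?!\d)"),
]


def episode_tag(filename):
    """Return a tag such as 'S02E13', or None if no season/episode is found."""
    for pattern in _PATTERNS:
        match = pattern.search(filename)
        if match:
            season, episode = int(match.group(1)), int(match.group(2))
            return f"S{season:02d}E{episode:02d}"
    return None
```

A file that yields no tag is best left untouched rather than renamed on a guess.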

1B. Use UVR or a similar vocal isolation project - UVR Project

Diarizing audio that contains background noise or music gives very poor results; for example, singing in the background music will be labeled as a speaker. Isolate the vocals first.

2. Generate Speaker Diarization Data - STABLE

Run diarize-dataset.py to process the extracted WAV files and produce JSON files containing diarization data.
Uses PyDub instead of ffmpeg. Requires a Hugging Face access token.
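
As a hedged sketch of what this step might look like with pyannote.audio 3.x: the model name, the HF_TOKEN environment variable, and the JSON layout (a list of start/end/speaker records) are all assumptions, not the format diarize-dataset.py is known to produce.

```python
import json
import os


def segments_to_json(segments, json_path):
    """Write (start, end, speaker) tuples as a list of JSON objects."""
    records = [
        {"start": round(start, 3), "end": round(end, 3), "speaker": speaker}
        for start, end, speaker in segments
    ]
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
    return records


def diarize(wav_path, json_path, model="pyannote/speaker-diarization-3.1"):
    """Run a pretrained diarization pipeline and save its speaker turns."""
    from pyannote.audio import Pipeline  # lazy import: heavy dependency

    token = os.environ["HF_TOKEN"]  # assumed variable name for the token
    # pyannote.audio 3.x keyword; other versions may name this argument differently.
    pipeline = Pipeline.from_pretrained(model, use_auth_token=token)
    diarization = pipeline(wav_path)
    segments = [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
    return segments_to_json(segments, json_path)
```

Keeping the JSON writer separate from the model call makes the output format easy to check without downloading a model.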

3. Identify the Target Speaker - BETA RELEASE - ACTIVE PROJECT

Run identify-speaker.py to play audio segments from diarization files and interactively map the target speaker.
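
One way the interactive mapping could work is sketched below. It assumes diarization records shaped like {"start", "end", "speaker"} and a two-column mappings.csv (episode, speaker label); both are illustrative guesses, and audio playback is left as a comment.

```python
import csv


def longest_segment_per_speaker(segments):
    """Pick each speaker's longest segment as the sample to listen to."""
    best = {}
    for seg in segments:
        length = seg["end"] - seg["start"]
        current = best.get(seg["speaker"])
        if current is None or length > current["end"] - current["start"]:
            best[seg["speaker"]] = seg
    return best


def choose_target(segments, ask=input):
    """Ask about one sample per speaker; return the label the user confirms."""
    samples = longest_segment_per_speaker(segments)
    for speaker, seg in sorted(samples.items()):
        # A real tool would play seg["start"]..seg["end"] from the WAV here.
        prompt = f"Is {speaker} ({seg['start']:.1f}-{seg['end']:.1f}s) the target? [y/N] "
        if ask(prompt).strip().lower() == "y":
            return speaker
    return None


def append_mapping(csv_path, episode, speaker_label):
    """Record which diarization label is the target speaker for an episode."""
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerow([episode, speaker_label])
```

Passing the prompt function in as ask keeps the selection logic testable without a terminal.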

4. Isolate and Clean Up the Audio - BETA RELEASE

Run isolate-trim.py to extract and trim the target speaker's audio segments, preparing them for dataset creation.
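
A minimal sketch of the isolation step, assuming the same {"start", "end", "speaker"} records as input: nearby segments of the target speaker are merged and very short spans are dropped. The min_len and max_gap thresholds and both function names are hypothetical, not values taken from isolate-trim.py.

```python
def target_spans(segments, target, min_len=1.0, max_gap=0.3):
    """Merge the target speaker's nearby segments and drop very short spans."""
    spans = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if seg["speaker"] != target:
            continue
        if spans and seg["start"] - spans[-1][1] <= max_gap:
            # Close enough to the previous span: extend it.
            spans[-1][1] = max(spans[-1][1], seg["end"])
        else:
            spans.append([seg["start"], seg["end"]])
    return [(start, end) for start, end in spans if end - start >= min_len]


def export_spans(wav_path, spans, out_prefix):
    """Cut each (start, end) span in seconds out of the WAV and save it."""
    from pydub import AudioSegment  # lazy import

    audio = AudioSegment.from_wav(wav_path)
    paths = []
    for index, (start, end) in enumerate(spans):
        clip = audio[int(start * 1000):int(end * 1000)]  # PyDub slices in ms
        path = f"{out_prefix}_{index:04d}.wav"
        clip.export(path, format="wav")
        paths.append(path)
    return paths
```

Note that merging across a gap ignores whether another speaker talked during it, so a small max_gap is the safer choice.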

Dependencies

Ensure the following are installed:

Python 3.9
ffmpeg: Install via your system's package manager or from the official site.
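
A quick way to check these prerequisites before running the scripts is sketched below. The check_environment helper is hypothetical, and treating 3.9 as a minimum version rather than an exact requirement is an assumption.

```python
import shutil
import sys


def check_environment(min_python=(3, 9)):
    """Report whether the interpreter and ffmpeg look usable."""
    return {
        "python_ok": sys.version_info >= min_python,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```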

Configuration

The scripts create any missing directories automatically, then pause so you can populate them with the required data. The expected layout is:

Video Input Directory: base-folder/videos
WAV Output Directory: base-folder/wavs
JSON Output Directory: base-folder/jsons
Speaker Mapping File: base-folder/mappings.csv
Processed Speaker Output Directory: base-folder/targeted
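
The layout above could be created up front with a few lines of Python. This is only a sketch: ensure_layout is a hypothetical helper, and the scripts themselves already create missing directories.

```python
from pathlib import Path


def ensure_layout(base_folder):
    """Create the expected directories and return every path by name."""
    base = Path(base_folder)
    layout = {
        "videos": base / "videos",
        "wavs": base / "wavs",
        "jsons": base / "jsons",
        "targeted": base / "targeted",
        "mappings": base / "mappings.csv",  # a file, so it is not created here
    }
    for name, path in layout.items():
        if name != "mappings":
            path.mkdir(parents=True, exist_ok=True)
    return layout
```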

Repository Files

Latest commit: ThatJeffGuy, "Update README.md", adc960e, Dec 29, 2024 (48 commits)

Name	Last commit message	Last commit date
LICENSE	Initial commit	Dec 15, 2024
README.md	Update README.md	Dec 29, 2024
dataset-creation.py	Add files via upload	Dec 18, 2024
diarize-dataset.py	Update diarize-dataset.py	Dec 18, 2024
identify-speaker.py	Add files via upload	Dec 23, 2024
identify-speaker.py.old	Add files via upload	Dec 20, 2024
isolate-trim.py	Add files via upload	Dec 18, 2024
organize-videos.py	Add files via upload	Dec 17, 2024