Models can now be stored alongside their pipelines in the same repository, streamlining the gating mechanism:
- accept `pyannote/speaker-diarization-x.x` pipeline user agreement
- accept `pyannote/segmentation-3.0` model user agreement
- accept `pyannote/wespeaker-voxceleb-resnet34-LM` model user agreement
- load pipeline with `Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token=True)`
Clipping and speaker/source alignment issues in speech separation pipeline have been fixed.
- BREAKING(hub): rename `use_auth_token` to `token`
- BREAKING(cache): rely on `huggingface_hub` caching directory (`PYANNOTE_CACHE` is no longer used)
- BREAKING(inference): `Inference` now only supports already instantiated models
- BREAKING(task): drop support for `multilabel` training in `SpeakerDiarization` task
- BREAKING(task): drop support for `warm_up` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `weigh_by_cardinality` option in `SpeakerDiarization` task
- BREAKING(task): drop support for `vad_loss` option in `SpeakerDiarization` task
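
Code that still passes `use_auth_token` can be adapted with a small shim. The sketch below is a hypothetical helper (`migrate_token_kwargs` is not part of `pyannote.audio`); it only renames the keyword in a dictionary of keyword arguments and assumes nothing else about the call changes.

```python
def migrate_token_kwargs(kwargs):
    """Return a copy of `kwargs` with `use_auth_token` renamed to `token`.

    Hypothetical migration helper, not part of pyannote.audio.
    """
    migrated = dict(kwargs)  # never mutate the caller's dictionary
    if "use_auth_token" in migrated:
        if "token" in migrated:
            # Both spellings at once is ambiguous, so refuse to guess.
            raise TypeError("pass either `token` or `use_auth_token`, not both")
        migrated["token"] = migrated.pop("use_auth_token")
    return migrated


old_call = {"use_auth_token": True}
new_call = migrate_token_kwargs(old_call)
print(new_call)  # {'token': True}
```

The migrated dictionary can then be unpacked into the loading call, for example `Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", **new_call)`.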
- improve(hub): add support for pipeline repos that also include underlying models
- feat(clustering): add support for `k-means` clustering
- feat(model): add `wav2vec_frozen` option to freeze/unfreeze `wav2vec` in `SSeRiouSS` architecture
- feat(task): add support for manual optimization in `SpeakerDiarization` task
- feat(utils): add `hidden` option to `ProgressHook`
- feat(utils): add `FilterByNumberOfSpeakers` protocol files filter
- feat(core): add `Calibration` class to calibrate logits/distances into probabilities
- feat(metric): add detection, precision, and recall diarization metrics
- improve(model): improve WavLM (un)freezing support for `SSeRiouSS` architecture (@clement-pages)
- improve(task): improve `SpeakerDiarization` training with manual optimization (@clement-pages)
- fix(model): improve WavLM (un)freezing support for `ToTaToNet` architecture (@clement-pages)
- fix(separation): fix clipping issue in speech separation pipeline (@joonaskalda)
- fix(separation): fix alignment between separated sources and diarization (@Lebourdais and @clement-pages)
- fix(separation): prevent leakage removal collar from being applied to diarization (@clement-pages)
- fix(separation): fix `PixIT` training with manual optimization (@clement-pages)
- fix(doc): fix link to pytorch (@emmanuel-ferdman)
- fix(task): fix corner case with small (<9) number of validation samples (@antoinelaurent)
- fix(doc): fix default embedding in `SpeechSeparation` and `SpeakerDiarization` docstring (@razi-tm)
- setup: drop support for Python 3.8
- fix: fix support for `numpy==2.x` (@ibevers)
- fix: fix support for `speechbrain==1.x` (@Adel-Moumen)
`pyannote.audio` does speech separation: multi-speaker audio in, one audio channel per speaker out!
pip install pyannote.audio[separation]==3.3.0
- feat(task): add `PixIT` joint speaker diarization and speech separation task (with @joonaskalda)
- feat(model): add `ToTaToNet` joint speaker diarization and speech separation model (with @joonaskalda)
- feat(pipeline): add `SpeechSeparation` pipeline (with @joonaskalda)
- feat(io): add option to select torchaudio `backend`
- fix(task): fix wrong train/development split when training with (some) meta-protocols (#1709)
- fix(task): fix metadata preparation with missing validation subset (@clement-pages)
- improve(io): when available, default to using `soundfile` backend
- improve(pipeline): do not extract embeddings when `max_speakers` is set to 1
- improve(pipeline): optimize memory usage of most pipelines (#1713 by @benniekiss)
- feat(task): add option to cache task training metadata to speed up training (with @clement-pages)
- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with @Bilal-Rahou)
- feat(model): add `fbank_only` property to `WeSpeaker` models
- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with @FrenchKrab)
- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
- feat(metric): add `reduce` option to `diarization_error_rate` metric (with @Bilal-Rahou)
- feat(pipeline): add `Waveform` and `SampleRate` preprocessors
- fix(task): fix random generators and their reproducibility (with @FrenchKrab)
- fix(task): fix estimation of training set size (with @FrenchKrab)
- fix(hook): fix `torch.Tensor` support in `ArtifactHook`
- fix(doc): fix typo in `Powerset` docstring (with @lukasstorck)
- fix(doc): remove mention of unsupported `numpy.ndarray` waveform (with @Purfview)
- improve(metric): add support for number of speakers mismatch in `diarization_error_rate` metric
- improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
- improve(io): switch to `torchaudio >= 2.2.0`
- improve(doc): update tutorials (with @clement-pages)
- BREAKING(model): get rid of `Model.example_output` in favor of `num_frames` method, `receptive_field` property, and `dimension` property
- BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)
- community: add tutorial for offline use of `pyannote/speaker-diarization-3.1` (by @simonottenhauskenbun)

Providing `num_speakers` to `pyannote/speaker-diarization-3.1` now works as expected.
- fix(pipeline): fix support for setting `num_speakers` in `pyannote/speaker-diarization-3.1` pipeline

`pyannote/speaker-diarization-3.1` no longer requires unpopular ONNX runtime
- feat(model): add WeSpeaker embedding wrapper based on PyTorch
- feat(model): add support for multi-speaker statistics pooling
- feat(pipeline): add `TimingHook` for profiling processing time
- feat(pipeline): add `ArtifactHook` for saving internal steps
- feat(pipeline): add support for list of hooks with `Hooks`
- feat(utils): add `"soft"` option to `Powerset.to_multilabel`
- fix(pipeline): add missing "embedding" hook call in `SpeakerDiarization`
- fix(pipeline): fix `AgglomerativeClustering` to honor `num_clusters` when provided
- fix(pipeline): fix frame-wise speaker count exceeding `max_speakers` or detected `num_speakers` in `SpeakerDiarization` pipeline
- improve(pipeline): compute `fbank` on GPU when requested
- BREAKING(pipeline): rename `WeSpeakerPretrainedSpeakerEmbedding` to `ONNXWeSpeakerPretrainedSpeakerEmbedding`
- BREAKING(setup): remove `onnxruntime` dependency.
  You can still use ONNX `hbredin/wespeaker-voxceleb-resnet34-LM` but you will have to install `onnxruntime` yourself.
- BREAKING(pipeline): remove `logging_hook` (use `ArtifactHook` instead)
- BREAKING(pipeline): remove `onset` and `offset` parameter in `SpeakerDiarizationMixin.speaker_count`
  You should now binarize segmentations before passing them to `speaker_count`
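
The "binarize first, then count" order can be illustrated on plain Python lists. This is only a sketch under simplifying assumptions: the function names are hypothetical, a single fixed threshold stands in for whatever binarization you actually use, and the real `speaker_count` works on the library's own segmentation objects rather than nested lists.

```python
def binarize(scores, threshold=0.5):
    """Turn per-frame, per-speaker scores into 0/1 activity decisions."""
    return [[1 if s > threshold else 0 for s in frame] for frame in scores]


def count_speakers(binary_frames):
    """Count active speakers in each frame of already binarized activity."""
    return [sum(frame) for frame in binary_frames]


# One row per frame, one column per speaker (made-up illustrative scores).
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.1],
    [0.2, 0.6, 0.4],
    [0.1, 0.2, 0.3],
]

binary = binarize(scores)
print(count_speakers(binary))  # [1, 2, 1, 0]
```

Keeping the thresholding step outside the counting function means the counting logic no longer needs to know how activity decisions were made.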
- fix(pipeline): fix WeSpeaker GPU support
- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add powerset support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`
  If, for some weird reason, you wrote some custom code based on that, you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
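
The two `io` changes above (0-indexed channels and the new `mono` default) are mechanical to migrate. The sketch below encodes the three replacement rules as a lookup; both function names are hypothetical helpers, not part of `pyannote.audio`, and the code only builds keyword arguments without importing the library.

```python
_UNSET = object()  # sentinel meaning "the old code passed no `mono` argument"


def migrate_audio_kwargs(mono=_UNSET):
    """Map an old-style `mono` argument to equivalent new-style kwargs.

    Hypothetical helper implementing the three replacement rules above.
    """
    if mono is _UNSET or mono is True:
        # Audio() and Audio(mono=True) both become Audio(mono="downmix")
        return {"mono": "downmix"}
    if mono is False:
        # Audio(mono=False) becomes Audio()
        return {}
    # Anything else (e.g. an already migrated "downmix") is passed through.
    return {"mono": mono}


def migrate_channel(one_based_channel):
    """Convert an old 1-indexed channel number to the new 0-indexed one."""
    if one_based_channel < 1:
        raise ValueError("old-style channel numbers start at 1")
    return one_based_channel - 1


print(migrate_audio_kwargs())       # {'mono': 'downmix'}
print(migrate_audio_kwargs(False))  # {}
print(migrate_channel(1))           # 0
```

The returned dictionary is meant to be unpacked into the constructor, as in `Audio(**migrate_audio_kwargs(mono=True))`.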
- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags
- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+
- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add min_cluster_size hparams to AgglomerativeClustering
- feat(hub): add support for private/gated models
- setup(hub): switch to latest huggingface_hub API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where HMM.fit finds too few states
- BREAKING: complete rewrite
- feat: much better performance
- feat: Python-first API
- feat: pretrained pipelines (and models) on Huggingface model hub
- feat: multi-GPU training with pytorch-lightning
- feat: data augmentation with torch-audiomentations
- feat: Prodigy recipe for model-assisted audio annotation
- fix: make sure master branch is used to load pretrained models (#599)
- last release before complete rewriting
- fix: fix regression in Precomputed.call (#110, #105)
- chore: switch from keras to pytorch (with tensorboard support)
- improve: faster & better training (`AutoLR`, advanced learning rate schedulers, improved batch generators)
- feat: add tunable speaker diarization pipeline (with its own tutorial)
- chore: drop support for Python 2 (use Python 3.6 or later)
- feat: add python 3 support
- chore: rewrite neural speaker embedding using autograd
- feat: add new embedding architectures
- feat: add new embedding losses
- chore: switch to Keras 2
- doc: add tutorial for (MFCC) feature extraction
- doc: add tutorial for (LSTM-based) speech activity detection
- doc: add tutorial for (LSTM-based) speaker change detection
- doc: add tutorial for (TristouNet) neural speaker embedding
- feat: add LSTM-based speech activity detection
- feat: add LSTM-based speaker change detection
- improve: refactor LSTM-based speaker embedding
- feat: add librosa basic support
- feat: add SMORMS3 optimizer
- feat: add 'covariance_type' option to BIC segmentation
- chore: rename sequence generator in preparation of the release of TristouNet reproducible research package.
- first public version