Documentation: EE698R DEC based Diarization Model

Team Name: TensorSlow

Members: Aditya Singh (@adityajaas) and Shashi Kant Gupta (@shashikg)

This speaker diarization model uses Deep Embedding Clustering (DEC), with a deep neural network initialized via a Residual Autoencoder, to assign speaker labels to segments of the raw audio signal. Clustering is performed on x-vectors extracted using Desplanques et al.'s ECAPA-TDNN framework. We use Silero-VAD for voice activity detection.

Baseline Model: Spectral clustering is used for audio-label assignment.

API Documentation

Index

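Defined in: utils.py
- class DiarizationDataset
- func make_rttm
- func get_metrics
- func plot_annot
Defined in: baselineMethods.py
- func diarizationOracleNumSpkrs
- func diarizationEigenGapNumSpkrs
Defined in: optimumSpeaker.py
- class eigengap
- class AffinityRefinementOperation
- class CropDiagonal
- class GaussianBlur
- class RowWiseThreshold
- class Symmetrize
- class Diffuse
- class RowWiseNormalize
Defined in: DEC.py
- class ResidualAutoEncoder
- func load_encoder
- class ClusteringModule
- class DEC
- func diarizationDEC
Defined in: colab_demo_utils.py
- func downloadYouTube
- func loadVideoFile
- func read_rttm
- func combine_audio
- func createAnnotatedVideo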

Defined in: utils.py

class DiarizationDataset()

Defined in utils.py

class DiarizationDataset(dataset_name=None,
                 data_dir=None,
                 sr=16000,
                 window_len=240,
                 window_step=120,
                 transform=None,
                 batch_size_for_ecapa=512,
                 vad_step=4,
                 split='full',
                 use_precomputed_vad=True,
                 use_oracle_vad=False,
                 skip_overlap=True)

Create a class for loading the dataset. This class applies the necessary pre-processing and x-vector feature extraction, and returns each audio file as a sequence of segment-level x-vector features that can be passed directly to the clustering algorithm to predict speaker labels. Pre-computed x-vectors are used if available; otherwise they are extracted at runtime.

Parameters:

Argument	Detail
`dataset_name:`	str, Name of the pre-existing dataset to use. Options: `ami`, `ami_dev`, `voxconverse`
`data_dir:`	str, Directory for any dataset other than the options specified in `dataset_name`. `dataset_name` and `data_dir` cannot both be None
`sr:`	int, Sampling rate of the audio signal
`window_len:`	int, Window length (in ms) of each of the audio segments to be passed for feature extraction
`window_step:`	int, Step (in ms) between two windows of audio segments to be passed for feature extraction
`transform:`	list, List of transforms like `mel-transform` to be performed on audio while preprocessing, default = None
`batch_size_for_ecapa:`	int, Batch size of audio segments while performing feature extraction using ECAPA-TDNN
`vad_step:`	int, Number of windows to split each audio chunk into. Argument used by Silero-VAD module
`split:`	str, Argument defining type of split of dataset, default = 'full' indicates no split
`use_precomputed_vad:`	bool, If True, downloads precomputed Voice Activity Detection label output for the dataset. Only available for dataset options specified in `dataset_name`
`use_oracle_vad:`	bool, If True, model does Voice Activity Detection directly from groundtruth rttm files bypassing the Silero VAD module.
`skip_overlap:`	bool, If True, model skips the windows with multiple speakers speaking by inspecting the groundtruth rttm files
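To make the `window_len` and `window_step` arguments concrete, the sketch below converts them from milliseconds to samples and lists the sample ranges a sliding window would cover. The helper name `window_bounds` is hypothetical, and dropping the trailing partial window is an assumption; the class itself may pad or treat the final window differently.

```python
def window_bounds(n_samples, sr=16000, window_len=240, window_step=120):
    """Return (start, end) sample indices of each full sliding window.

    window_len and window_step are in milliseconds, as in the table above.
    Assumption: a trailing partial window is dropped.
    """
    win = sr * window_len // 1000    # samples per window
    step = sr * window_step // 1000  # samples between window starts
    bounds = []
    start = 0
    while start + win <= n_samples:
        bounds.append((start, start + win))
        start += step
    return bounds


# One second of 16 kHz audio with the default 240 ms / 120 ms settings.
bounds = window_bounds(16000)
print(len(bounds), bounds[0], bounds[-1])
```

With the defaults, each window spans 3840 samples and consecutive windows overlap by half.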

Class Functions:

__getitem__: def __getitem__(self, idx)

Parameters:

Argument	Detail
`idx:`	int, Index to the required audio in the list of audio in root directory

Returns:

Variable	Detail
`audio_segments:`	torch.Tensor, (n_windows, features_len) Tensor of feature vectors of each audio segment window
`diarization_segments:`	torch.Tensor, (n_windows, n_spks) Tensor containing ground truth of speaker labels, 1 if i-th window has j-th speaker speaking, else 0
`speech_segments:`	torch.Tensor, (n_windows,) Tensor with i-th value 1 if VAD returns presence of speech audio in i-th window, else 0
`label_path:`	str, Path of the rttm file containing labels for the 'idx' wav audio

read_rttm: def read_rttm(self, path)

Parameters:

Argument	Detail
`path:`	str, Path to the RTTM diarization file

Returns:

Variable	Detail
`rttm_out:`	numpy.ndarray, (..., 3) Array with column 1 holding start time of speaker, column 2 holding end time of speaker, and column 3 holding speaker label
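As an illustration of the three columns described above, here is a minimal sketch of parsing RTTM text into (start, end, speaker) rows. It assumes the standard space-separated RTTM layout in which field 4 is the onset, field 5 is the duration, and field 8 is the speaker name. The function name `parse_rttm_lines` is hypothetical and it returns a plain list rather than the `numpy.ndarray` that `read_rttm` returns.

```python
def parse_rttm_lines(lines):
    """Parse RTTM lines into (start_time, end_time, speaker_label) tuples."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any record that is not a SPEAKER entry.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        rows.append((start, start + duration, fields[7]))
    return rows


example = [
    "SPEAKER IS1009a 1 0.50 1.25 <NA> <NA> spk_A <NA> <NA>",
    "SPEAKER IS1009a 1 2.00 0.50 <NA> <NA> spk_B <NA> <NA>",
]
rows = parse_rttm_lines(example)
print(rows)
```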

def make_rttm()

def make_rttm(out_dir, name, labels, win_step):

Defined in utils.py

Create RTTM Diarization files for non-overlapping speaker labels in var labels. Assumes non-speech part to have value -1 and speech part to have some speaker label (0, 1, 2, ...).

Parameters:

Argument	Detail
`out_dir:`	str, Directory where the output RTTM diarization files are saved
`name:`	str, name for the audio files for which diarization was predicted
`labels:`	int, Speaker/ Non-speech labels assigned to different audio segments based on the win_step used to extract feature vectors
`win_step:`	int, Step (in ms) between two windows of audio segments used for feature extraction

Returns:

Variable	Detail
`return variable:`	str, Path to the saved RTTM diarization file
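The conversion from per-window labels to RTTM segments can be sketched as follows. This is an illustration only: the helper `labels_to_rttm_lines` is hypothetical, it builds the lines in memory instead of writing a file, and it assumes each window contributes exactly `win_step` milliseconds to a segment, which may differ from how `make_rttm` treats window boundaries.

```python
def labels_to_rttm_lines(name, labels, win_step):
    """Merge runs of identical speaker labels into RTTM SPEAKER lines.

    labels: one integer per window, -1 for non-speech.
    win_step: step between windows in milliseconds.
    """
    lines = []
    start = None
    current = -1
    # A trailing -1 sentinel flushes the last open speech segment.
    for i, label in enumerate(list(labels) + [-1]):
        if label == current:
            continue
        if current != -1:
            start_ms = start * win_step
            dur_ms = (i - start) * win_step
            lines.append(
                f"SPEAKER {name} 1 {start_ms / 1000:.3f} {dur_ms / 1000:.3f} "
                f"<NA> <NA> {current} <NA> <NA>"
            )
        start, current = i, label
    return lines


rttm_lines = labels_to_rttm_lines("IS1009a", [0, 0, -1, 1, 1, 1], win_step=120)
print("\n".join(rttm_lines))
```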

def get_metrics()

def get_metrics(groundtruth_path, hypothesis_path, collar=0.25, skip_overlap=True):

Defined in utils.py

Evaluate the diarization results of all the predicted RTTM files present in the hypothesis directory against the groundtruth RTTM files present in the groundtruth directory.

Parameters:

Argument	Detail
`groundtruth_path:`	str, directory of groundtruth rttm files
`hypothesis_path:`	str, directory of hypothesis rttm files
`collar:`	float, Duration (in seconds) of collars removed from evaluation around boundaries of reference segments
`skip_overlap:`	bool, If True, calculates Diarization Error Rate ignoring the overlapped region

Returns:

Variable	Detail
`metric:`	pyannote.metrics, Pyannote metric class having diarization DERs for all the files.
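For intuition about what the returned metric measures, the sketch below computes a simplified frame-level Diarization Error Rate: missed speech, false alarm, and speaker confusion divided by total reference speech. It is not the pyannote implementation: it ignores the collar and overlap handling, works on per-frame integer labels (-1 for non-speech), and finds the best speaker mapping by brute force, which is only practical for a handful of speakers. The function name `frame_der` is hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER for equal-length label sequences."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    total = sum(r != -1 for r, _ in pairs)
    if total == 0:
        raise ValueError("reference contains no speech frames")
    missed = sum(r != -1 and h == -1 for r, h in pairs)
    false_alarm = sum(r == -1 and h != -1 for r, h in pairs)

    ref_ids = sorted({r for r, _ in pairs if r != -1})
    hyp_ids = sorted({h for _, h in pairs if h != -1})
    # Pad with None so surplus hypothesis speakers map to "no reference speaker".
    targets = ref_ids + [None] * max(0, len(hyp_ids) - len(ref_ids))
    best_confusion = None
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        confusion = sum(
            r != -1 and h != -1 and mapping[h] != r for r, h in pairs
        )
        if best_confusion is None or confusion < best_confusion:
            best_confusion = confusion
    return (missed + false_alarm + best_confusion) / total


der = frame_der([0, 0, 1, 1, -1, -1], [1, 1, 0, -1, 0, -1])
print(der)
```

In this example one frame is missed and one is a false alarm out of four reference speech frames; the speaker labels themselves match after remapping.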

def plot_annot()

def plot_annot(name="IS1009a", collar=0.25, skip_overlap=True, groundtruth_path=None, hypothesis_path=None):

Defined in utils.py

Calculate the Diarization Error Rate for the specified file, and plot the groundtruth and hypothesis time series.

Parameters:

Argument	Detail
`name:`	str, Name of the file whose time series plot is to be generated. File must be present in the `hypothesis_path` folder
`collar:`	float, Duration (in seconds) of collars removed from evaluation around boundaries of reference segments
`skip_overlap:`	bool, If True, calculates Diarization Error Rate ignoring the overlapped region
`groundtruth_path:`	str, Directory of groundtruth rttm files
`hypothesis_path:`	str, Directory of hypothesis rttm files

Defined in baselineMethods.py

def diarizationOracleNumSpkrs()

def diarizationOracleNumSpkrs(audio_dataset, method="KMeans"):

Defined in baselineMethods.py

Predict the diarization labels using the oracle number of speakers for all the audio files in audio_dataset with KMeans/ Spectral clustering algorithm.

Parameters:

Argument	Detail
`audio_dataset:`	utils.DiarizationDataset, Diarization dataset
`method:`	str, Name of the method to be used for clustering part. Supports: "KMeans" or "Spectral"

Returns:

Variable	Detail
`hypothesis_dir:`	str, Directory where all the predicted RTTM diarization files are saved

def diarizationEigenGapNumSpkrs()

def diarizationEigenGapNumSpkrs(audio_dataset):

Defined in baselineMethods.py

Predict the diarization labels for all the audio files in audio_dataset with the Spectral clustering algorithm. It uses the eigen-gap principle to predict the optimal number of speakers. The module uses the existing spectral clustering implementation from here: https://github.com/wq2012/SpectralCluster

Parameters:

Argument	Detail
`audio_dataset:`	utils.DiarizationDataset, Diarization dataset

Returns:

Variable	Detail
`hypothesis_dir:`	str, Directory where all the predicted RTTM diarization files are saved

Defined in optimumSpeaker.py

Inspired from https://github.com/wq2012/SpectralCluster

class eigengap()

class eigengap(min_clusters=1, 
               max_clusters=100, 
               p_percentile=0.9, 
               gaussian_blur_sigma=2, 
               stop_eigenvalue=1e-2,
               thresholding_soft_multiplier=0.01, 
               thresholding_with_row_max=True)

Defined in optimumSpeaker.py

Utility function to decide the optimal number of speakers for clustering based on maximization of eigen-gap of the affinity matrix

Parameters:

Argument	Detail
`min_clusters:`	int, Minimum number of output clusters
`max_clusters:`	int, Maximum number of output clusters
`p_percentile:`	float, Parameter for computing the p-th percentile in percentile-based thresholding
`gaussian_blur_sigma:`	float, sigma value for standard deviation of gaussian kernel in scipy gaussian filter
`stop_eigenvalue:`	float, Minimum value of eigenvalue of Affinity matrix for its eigenvector to be considered in clustering
`thresholding_soft_multiplier:`	float, Factor by which cells with a value below the threshold are multiplied in row-max/percentile thresholding. A value of 0.0 sets cells below the threshold to zero in the matrix
`thresholding_with_row_max:`	bool, True for row-max thresholding, False for percentile thresholding

Class Functions:

_get_refinement_operator:

def _get_refinement_operator(self, name)

Parameters:

Argument	Detail
`name:`	str, Get the input refinement operator. Available refinements- `'CropDiagonal'`, `'GaussianBlur'`, `'RowWiseThreshold'`, `'Symmetrize'`, `'Diffuse'`, `'RowWiseNormalize'`

Returns:

Variable	Detail
`CropDiagonal()`/`GaussianBlur()`/ `RowWiseThreshold()`/`Symmetrize()`/ `Diffuse()`/`RowWiseNormalize()`	optimumSpeaker.AffinityRefinementOperation, Returns specified refinement method class

compute_affinity_matrix:

def compute_affinity_matrix(self, X)

Compute the affinity matrix for a matrix X, with one instance per row and one feature per column, as the cosine similarity between each pair of l2-normalized rows of X

Parameters:

Argument	Detail
`X:`	numpy.ndarray, (n_windows, n_features) Input matrix with one window per row and one feature per column; the affinity is computed between pairs of rows

Returns:

Variable	Detail
`affinity:`	numpy.ndarray, (n_windows, n_windows) Symmetric array with (i,j)th value equal to cosine similarity between i-th and j-th row
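A minimal NumPy sketch of this computation is shown below. The function name `cosine_affinity` and the small epsilon guarding against all-zero rows are assumptions for illustration, not taken from the module.

```python
import numpy as np


def cosine_affinity(X):
    """Cosine similarity between every pair of rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.maximum(norms, 1e-12)  # l2-normalize each row
    return X_unit @ X_unit.T               # (n_windows, n_windows)


X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
affinity = cosine_affinity(X)
print(affinity)
```

Rows 0 and 2 point in the same direction and get similarity 1, while rows 0 and 1 are orthogonal and get similarity 0.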

compute_sorted_eigenvectors:

def compute_sorted_eigenvectors(self, A)

Parameters:

Argument	Detail
`A:`	numpy.ndarray, (n_windows, n_windows) Symmetric array with (i,j)th value equal to cosine similarity between i-th and j-th row

Returns:

Variable	Detail
`w:`	numpy.ndarray, Decreasing order sorted eigen values of affinity matrix A
`v:`	numpy.ndarray, Eigen vectors corresponding to eigen values returned
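One way to obtain eigenvalues in decreasing order with matching eigenvectors is sketched below. It assumes the affinity matrix is symmetric so that `numpy.linalg.eigh` applies; the module may use a different eigen-solver, and the name `sorted_eigen` is hypothetical.

```python
import numpy as np


def sorted_eigen(A):
    """Eigenvalues of symmetric A in decreasing order, with their eigenvectors."""
    w, v = np.linalg.eigh(A)     # eigh returns eigenvalues in ascending order
    order = np.argsort(w)[::-1]  # indices for descending order
    return w[order], v[:, order]


A = np.array([[2.0, 0.0], [0.0, 3.0]])
w, v = sorted_eigen(A)
print(w)
```

Column `v[:, k]` is the eigenvector belonging to `w[k]`.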

compute_number_of_clusters:

def compute_number_of_clusters(self, eigenvalues, max_clusters, stop_eigenvalue)

Parameters:

Argument	Detail
`eigenvalues:`	numpy.ndarray, Decreasing order sorted eigen values of affinity matrix between different windows
`max_clusters:`	int, Maximum number of clusters required. Default `'None'` puts no such limit to the number of clusters
`stop_eigenvalue:`	float, Minimum value of eigenvalue to be considered for deciding number of clusters. Eigenvalues below this value are discarded

Returns:

Variable	Detail
`max_delta_index:`	int, Index to the eigenvalue such that eigen gap is maximized. It gives the number of clusters determined by the function
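The eigen-gap search can be sketched as below. This version measures the gap as the ratio of consecutive eigenvalues, which is an assumption modeled on the SpectralCluster project that this file cites as inspiration; the module might measure the gap differently. The sketch also assumes strictly positive eigenvalues, and the name `eigengap_num_clusters` is hypothetical.

```python
def eigengap_num_clusters(eigenvalues, max_clusters=None, stop_eigenvalue=1e-2):
    """Pick the cluster count at the largest ratio between consecutive eigenvalues.

    eigenvalues must be sorted in decreasing order and be strictly positive.
    """
    max_ratio = 0.0
    max_ratio_index = 1
    for i in range(1, len(eigenvalues)):
        if max_clusters is not None and i > max_clusters:
            break
        if eigenvalues[i - 1] < stop_eigenvalue:
            break  # remaining eigenvalues are too small to consider
        ratio = eigenvalues[i - 1] / eigenvalues[i]
        if ratio > max_ratio:
            max_ratio = ratio
            max_ratio_index = i
    return max_ratio_index


k = eigengap_num_clusters([5.0, 4.5, 4.0, 0.5, 0.4])
print(k)
```

Here the largest drop is between the third and fourth eigenvalues, so three clusters are selected.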

find

def find(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, (n_windows, n_features) Input matrix with one window per row and one feature per column; the affinity is computed between pairs of rows

Returns:

Variable	Detail
`k:`	int, Number of clusters calculated after creating the affinity matrix, applying refinements, and using eigen-gap maximization. `self.min_clusters` ≤ `k` ≤ `self.max_clusters`

class AffinityRefinementOperation()

class AffinityRefinementOperation(metaclass=abc.ABCMeta)

Defined in optimumSpeaker.py

Abstract base class for the refinement operations to be performed on the data

Class Functions:

check_input:

def check_input(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be refined by refinement operators

Raises:

Exception	Detail
`TypeError()`/`ValueError()`	TypeError/ValueError, `TypeError` if X is not a numpy array. `ValueError` if X is not a 2D square matrix

refine:

def refine(self, X)

Abstract function redefined in various child classes of class AffinityRefinementOperation

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be refined by refinement operators

class CropDiagonal()

class CropDiagonal(AffinityRefinementOperation)

Defined in optimumSpeaker.py

Operator to replace diagonal element by the max non-diagonal value of row. Post operation, the matrix has similar properties to a standard Laplacian matrix. This also helps to avoid the bias during Gaussian blur and normalization.

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be refined by refinement operators

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output array with Crop diagonal refinement applied
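A minimal NumPy sketch of this refinement follows. The function name `crop_diagonal` is hypothetical, and masking the diagonal with negative infinity before taking the row maximum is one possible way to exclude it, not necessarily what the class does.

```python
import numpy as np


def crop_diagonal(X):
    """Replace each diagonal element by the largest off-diagonal value in its row."""
    Y = np.array(X, dtype=float)        # work on a copy
    np.fill_diagonal(Y, -np.inf)        # exclude the diagonal from the row max
    np.fill_diagonal(Y, Y.max(axis=1))  # write each row max onto the diagonal
    return Y


X = np.array([[1.0, 0.2, 0.5],
              [0.2, 1.0, 0.3],
              [0.5, 0.3, 1.0]])
Y = crop_diagonal(X)
print(Y)
```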

class GaussianBlur()

class GaussianBlur(AffinityRefinementOperation)
      def __init__(self, sigma = 1)

Defined in optimumSpeaker.py

Operator to apply gaussian filter to the input array. Uses scipy.ndimage.gaussian_filter

Parameters:

Argument	Detail
`sigma:`	float, Standard deviation for Gaussian kernel

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be refined by refinement operators

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output array with gaussian filter applied

class RowWiseThreshold()

class RowWiseThreshold(AffinityRefinementOperation)
      def __init__(self,
                 p_percentile=0.95,
                 thresholding_soft_multiplier=0.01,
                 thresholding_with_row_max=False)

Defined in optimumSpeaker.py

Operator to apply row wise thresholding based on either percentile or row-max thresholding.

Parameters:

Argument	Detail
`p_percentile:`	float, Parameter for computing the p-th percentile in percentile-based thresholding
`thresholding_soft_multiplier:`	float, Factor by which cells with a value below the threshold are multiplied in row-max/percentile thresholding. A value of 0.0 sets cells below the threshold to zero in the matrix
`thresholding_with_row_max:`	bool, `True` applies row-max based thresholding, `False` applies percentile based thresholding

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be refined by refinement operators

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output array with row wise threshold applied
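The two thresholding modes can be sketched as follows. How `p_percentile` enters the row-max mode (here, as a fraction of the row maximum) is an assumption modeled on the SpectralCluster project rather than something stated in this documentation, and the function name `row_wise_threshold` is hypothetical.

```python
import numpy as np


def row_wise_threshold(X, p_percentile=0.95, soft_multiplier=0.01,
                       with_row_max=False):
    """Shrink the entries of each row that fall below a per-row threshold."""
    Y = np.array(X, dtype=float)  # work on a copy
    if with_row_max:
        # Assumed rule: threshold is a fraction of the row maximum.
        threshold = Y.max(axis=1, keepdims=True) * p_percentile
    else:
        # Threshold is the p-th percentile of each row.
        threshold = np.percentile(Y, p_percentile * 100, axis=1, keepdims=True)
    below = Y < threshold
    Y[below] *= soft_multiplier   # 0.0 would zero these entries out
    return Y


X = np.array([[1.0, 0.5],
              [0.2, 1.0]])
Y = row_wise_threshold(X, p_percentile=0.9, with_row_max=True)
print(Y)
```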

class Symmetrize()

class Symmetrize(AffinityRefinementOperation)

Defined in optimumSpeaker.py

Operator to return a symmetric matrix based on max{ X, X^T } from a given input matrix X.

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be used to create a symmetric matrix

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output symmetric array
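The element-wise maximum described above can be sketched in one NumPy call. The function name `symmetrize` is hypothetical.

```python
import numpy as np


def symmetrize(X):
    """Return max(X, X^T) element-wise, which is always symmetric."""
    X = np.asarray(X, dtype=float)
    return np.maximum(X, X.T)


X = np.array([[1.0, 0.2],
              [0.7, 1.0]])
Y = symmetrize(X)
print(Y)
```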

class Diffuse()

class Diffuse(AffinityRefinementOperation)

Defined in optimumSpeaker.py

Operator to return a diffused symmetric matrix X^TX from a given input matrix X.

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be used to create a diffused symmetric matrix

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output diffused symmetric array

class RowWiseNormalize()

class RowWiseNormalize(AffinityRefinementOperation)

Defined in optimumSpeaker.py

Operator to normalize each row of input matrix X by the maximum value in the corresponding rows.

Class Functions:

refine:

def refine(self, X)

Parameters:

Argument	Detail
`X:`	numpy.ndarray, Input array to be row normalized

Returns:

Variable	Detail
`Y:`	numpy.ndarray, Output row normalized array
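Row-wise normalization by the row maximum can be sketched as below. The function name `row_wise_normalize` is hypothetical, and the sketch assumes every row has a non-zero maximum.

```python
import numpy as np


def row_wise_normalize(X):
    """Divide each row by its maximum so the largest entry per row becomes 1."""
    X = np.asarray(X, dtype=float)
    return X / X.max(axis=1, keepdims=True)


X = np.array([[2.0, 1.0],
              [1.0, 4.0]])
Y = row_wise_normalize(X)
print(Y)
```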

Defined in DEC.py

class ResidualAutoEncoder()

class ResidualAutoEncoder(ip_features,
                          hidden_dims=[500, 500, 2000, 30])

Defined in DEC.py

Create a torch.nn.Module for a deep autoencoder composed of Residual Neural Network (ResNet) blocks as the encoder and decoder layers. The activation used is ReLU. The bottleneck encoder output and the final decoder output are not activated, to avoid data loss due to ReLU activation.

Parameters:

Argument	Detail
`ip_features:`	int, Input features size
`hidden_dims:`	list of int, List of hidden dimension features. Last element on the list is the output dimension of bottleneck of the autoencoder

Returns:

Variable	Detail
`z:`	torch.Tensor, Output from the bottleneck encoder of the deep autoencoder network.
`xo:`	list of torch.Tensor, Output from each encoder except the bottleneck encoder of the deep autoencoder. First item of the list is the input given to the system.
`xr:`	list of torch.Tensor, Reconstruction of inputs to each encoder layer of autoencoder. xr is reversed so that i-th item in list xr is the reconstruction of i-th item in list xo. E.g. the first item of xo is the input to the ResidualAutoEncoder network, and the first item of xr is the reconstruction from the ResidualAutoEncoder network.

def load_encoder()

def load_encoder():

Defined in DEC.py

Load weights from the ResidualAutoEncoder trained on the training data.

Returns:

Variable	Detail
`model:`	ResidualAutoEncoder, Model with input feature size of 192, and hidden layers of size 500, 500, 2000, 30. Weights of the model are initialized to the weights of the autoencoder trained on training data.

class ClusteringModule()

class ClusteringModule(nn.Module):
    def __init__(self,
                 num_clusters,
                 encoder, data,
                 cinit = "KMeans"):

Defined in DEC.py

Clustering module of the deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC Clustering network. Kmeans is used to initialize centroids in the latent space.

Parameters:

Argument	Detail
`num_clusters:`	int, Number of clusters to create from the algorithm
`encoder:`	nn.Module, Pre-trained encoder for initializing the centroids. Encoder transforms data to the latent space for clustering
`cinit:`	str, Initialization method of centroids of clusters. Default `KMeans`

Returns:

Variable	Detail
`q:`	torch.Tensor, Tensor of similarity between embedding points z_i and centroid mu_j. Assumes Student's t distribution as the kernel
`p:`	torch.Tensor, Tensor of target distribution based on soft assignment of q_i
`xo[0]`	torch.Tensor, Input data to the ResidualAutoEncoder
`xr[0]`	torch.Tensor, Reconstructed input by the ResidualAutoEncoder
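The `q` and `p` outputs can be illustrated with the standard DEC formulation: a Student's t kernel for the soft assignment and a sharpened, frequency-normalized target distribution. The NumPy sketch below follows that formulation with one degree of freedom (`alpha=1.0`), which is an assumption, as are the function names; the module itself works on `torch.Tensor` objects.

```python
import numpy as np


def soft_assignment(z, centroids, alpha=1.0):
    """q[i, j]: similarity of embedding z[i] to centroid j (Student's t kernel)."""
    # Squared Euclidean distance between every embedding and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1


def target_distribution(q):
    """p[i, j]: squared q, normalized by soft cluster frequency, then per row."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
q = soft_assignment(z, centroids)
p = target_distribution(q)
print(q.round(3))
print(p.round(3))
```

Squaring `q` makes `p` more confident than `q` for points that are already close to a centroid, which is what drives the clusters to tighten during training.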

Class Functions:

init_centroid:

def init_centroid(self,
                  data,
                  method = "KMeans")

Returns clustered data after calculating the optimal number of speakers using eigen-gap method, and then clustering the data based on the method specified.

Parameters:

Argument	Detail
`data:`	torch.Tensor, Input data to be clustered
`method:`	str, Clustering method. Default `KMeans`. Options `KMeans`/`Spectral`

Returns:

Variable	Detail
`output:`	torch.Tensor, Tensor containing initialized centroids for the dataset

class DEC()

class DEC(self,
          num_clusters,
          encoder, data,
          cinit = "KMeans"):

Defined in DEC.py

Deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC Clustering network. It calls ClusteringModule class to initialize the centroids.

Parameters:

Argument	Detail
`encoder:`	nn.Module, Pre-trained encoder for initializing the centroids. Encoder transforms data to the latent space for clustering
`num_clusters:`	int, Number of clusters to create from the algorithm. Default `None` uses eigengap to determine number of clusters
`cinit:`	str, Initialization method of centroids of clusters. Default `KMeans`. Options `KMeans`/`Spectral`

Class Functions:

fit:

def fit(self,
        data,
        y_true = None,
        niter = 150,
        lrEnc = 1e-4,
        lrCC = 1e-4,
        verbose = False)

Trains the model by minimizing the KL divergence between the target and observed distributions. The ResidualAutoEncoder is updated in parallel using an MSE loss to improve the latent-space projection of the data for better clustering. Both updates use the Adam optimizer, and the objective function is a linear combination of the KL divergence between the target and observed distributions and the MSE loss between the input data and its reconstruction by the ResidualAutoEncoder.

Parameters:

Argument	Detail
`data:`	torch.Tensor, Input data to be clustered
`y_true:`	numpy.ndarray, True labels of the data we aim to cluster. `predict()` and `clusterAccuracy()` functions are invoked only if y_true is not `None`
`niter:`	int, Number of epochs to train the model for
`lrEnc:`	float, Learning rate for updating the encoder
`lrCC:`	float, Learning rate for updating the cluster centres
`verbose:`	bool, `True` value activates the tqdm progress bar while training. `False` returns no updates when training
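The combined objective described for `fit` can be sketched numerically as below. The weight `recon_weight` on the reconstruction term and the choice to sum the KL divergence over all entries are assumptions for illustration; the actual coefficients and reductions used in training are not stated here. The function name `dec_objective` is hypothetical.

```python
import numpy as np


def dec_objective(p, q, x, x_rec, recon_weight=1.0, eps=1e-12):
    """Return (total, kl, mse) for KL(p || q) + recon_weight * MSE(x, x_rec)."""
    kl = float(np.sum(p * np.log((p + eps) / (q + eps))))
    mse = float(np.mean((x - x_rec) ** 2))
    return kl + recon_weight * mse, kl, mse


p = np.array([[0.9, 0.1]])    # target distribution
q = np.array([[0.5, 0.5]])    # observed soft assignment
x = np.array([[1.0, 2.0]])    # input features
x_rec = np.array([[1.0, 4.0]])  # autoencoder reconstruction
total, kl, mse = dec_objective(p, q, x, x_rec, recon_weight=0.5)
print(total, kl, mse)
```

Both terms vanish when the soft assignment matches the target and the reconstruction matches the input.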

predict: def predict(self, data)

Predict the cluster label of each data point as the cluster for which the observed distribution is maximized.

Parameters:

Argument	Detail
`data:`	torch.Tensor, Input data to be labeled after clustering

Returns:

Variable	Detail
`y_pred:`	numpy.ndarray, Soft prediction labels of the data

clusterAccuracy: def clusterAccuracy(self, y_pred, y_true)

Compute the clustering accuracy as the maximum accuracy between y_pred and y_true over all permutations of the y_pred labels. The best permutation is found with scipy's linear_sum_assignment optimization function.

Parameters:

Argument	Detail
`y_pred:`	numpy.ndarray, Prediction of the labels by DEC algorithm
`y_true:`	numpy.ndarray, True labels of the data

Returns:

Variable	Detail
`accuracy:`	float, Cluster assignment accuracy
`reassignment:`	dict, dictionary with key as rows and value as cols indices for the optimal assignment
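To show what the permutation-invariant accuracy means, the sketch below searches every label mapping by brute force. This is a stand-in for `linear_sum_assignment`, chosen to keep the example dependency-free; it is only practical for a few clusters. The function name `cluster_accuracy` is hypothetical, and the returned dictionary maps predicted labels to true labels, which may differ from the row/column index convention of the actual `reassignment` output.

```python
from itertools import permutations


def cluster_accuracy(y_pred, y_true):
    """Best achievable accuracy over all one-to-one relabelings of y_pred."""
    pred_ids = sorted(set(y_pred))
    true_ids = sorted(set(y_true))
    # Pad with None so surplus predicted clusters map to "no true label".
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best_hits, best_map = -1, {}
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for p, t in zip(y_pred, y_true))
        if hits > best_hits:
            best_hits, best_map = hits, mapping
    return best_hits / len(y_true), best_map


accuracy, reassignment = cluster_accuracy([1, 1, 0, 0, 2], [0, 0, 1, 1, 2])
print(accuracy, reassignment)
```

The predicted clusters here are a pure relabeling of the true ones, so the accuracy is perfect once labels 0 and 1 are swapped.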

def diarizationDEC()

def diarizationDEC(audio_dataset,
                   num_spkr = None,
                   hypothesis_dir = None)

Defined in DEC.py

Compute diarization labels based on oracle number of speakers if num_spkr = 'oracle'. Used as an optimal benchmark for performance of DEC. If num_spkr = None, uses eigen-gap maximization in the ClusteringModule to determine the number of speakers.

Parameters:

Argument	Detail
`audio_dataset:`	utils.DiarizationDataset, Test diarization dataset
`num_spkr:`	str, `None` for calculating the optimal number of speakers from eigen-gap maximization. `oracle` for using the number of speakers in each window given with the data.
`hypothesis_dir:`	str, Directory to store the predicted speaker labels in the audio segments in an rttm file. `None` stores it in `./rttm_output/` directory

Returns:

Variable	Detail
`hypothesis_dir:`	str, Directory to the rttm files containing predicted speaker labels with their timestamps

Defined in: colab_demo_utils.py

def downloadYouTube()

def downloadYouTube(videourl, path):

Defined in colab_demo_utils.py

Download video from YouTube in .mp4 format using Video URL.

Parameters:

Argument	Detail
`videourl:`	str, URL of the YouTube video to download
`path:`	str, directory to save the YouTube video. If directory does not exist, it is created.

Returns:

Variable	Detail
`save_dir:`	str, Save directory location

def loadVideoFile()

def loadVideoFile(playvideo_file=False):

Defined in colab_demo_utils.py

Load video file either from YouTube or from your local directory into your current session working directory. Also extracts and stores its audio file in .wav format.

Parameters:

Argument	Detail
`playvideo_file:`	bool, If True, plays the video after loading in the working directory. Default=`False`

Returns:

Variable	Detail
`video_dir:`	str, Returns the path to the saved video

def read_rttm()

def read_rttm(path):

Defined in colab_demo_utils.py

Create hypothesis labels for each window using .rttm file.

Parameters:

Argument	Detail
`path:`	str, Path to the rttm file

Returns:

Variable	Detail
`hypothesis_labels:`	numpy.ndarray, (n_instances, 3) i-th row's first, second and third column contains start, end, and speaker id of the i-th instance of speech.

def combine_audio()

def combine_audio(vidname, audname, outname, fps):

Defined in colab_demo_utils.py

Combine cv2 processed silent video with its audio file to output the complete annotated video.

Parameters:

Argument	Detail
`vidname:`	str, Path to the silent video
`audname:`	str, Path to the audio file to be attached
`outname:`	str, Output video file name
`fps:`	int, Frame rate of the video

def createAnnotatedVideo()

def createAnnotatedVideo(audio_dataset, hypothesis_dir):

Defined in colab_demo_utils.py

Use cv2 to put annotations in the video using the hypothesis labels.

Parameters:

Argument	Detail
`audio_dataset:`	utils.DiarizationDataset, Dataset pipeline
`hypothesis_dir:`	str, Path to the directory with hypothesis labels rttm files

Returns:

Variable	Detail
`op_video_name:`	str, Annotated output video filename
