
Defined in DEC.py

Index

  - class ResidualAutoEncoder()
  - def load_encoder()
  - class ClusteringModule()
  - class DEC()
  - def diarizationDEC()

class ResidualAutoEncoder()

class ResidualAutoEncoder(ip_features,
                          hidden_dims=[500, 500, 2000, 30])

Defined in DEC.py

Creates a torch.nn.Module for a deep autoencoder composed of Residual Neural Network (ResNet) blocks as the encoder and decoder layers. The activation used is ReLU. The bottleneck encoder output and the final decoder output are not activated, to avoid data loss due to the ReLU activation.
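For intuition, the sketch below shows one way a fully connected residual block of this kind can be built. The block structure, skip-connection projection, and layer sizes here are illustrative assumptions, not necessarily the exact blocks implemented in DEC.py.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a fully connected residual block of the kind the
# encoder/decoder layers could be built from. The actual block in DEC.py may
# differ (normalisation, dropout, how the skip connection is projected, etc.).
class ResidualBlock(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # Project the skip connection when the dimensions change.
        self.skip = (nn.Identity() if in_features == out_features
                     else nn.Linear(in_features, out_features, bias=False))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.fc(x) + self.skip(x))
```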

Parameters:

| Argument | Detail |
| --- | --- |
| ip_features | _int_, Input feature size |
| hidden_dims | _list of int_, List of hidden dimension sizes. The last element of the list is the output dimension of the autoencoder bottleneck |

Returns:

| Variable | Detail |
| --- | --- |
| z | _torch.Tensor_, Output from the bottleneck encoder of the deep autoencoder network |
| xo | _list of torch.Tensor_, Outputs from each encoder layer except the bottleneck encoder of the deep autoencoder. The first item of the list is the input given to the network |
| xr | _list of torch.Tensor_, Reconstructions of the inputs to each encoder layer of the autoencoder. xr is reversed so that the i-th item of xr is the reconstruction of the i-th item of xo. E.g., the first item of xo is the input to the ResidualAutoEncoder network, and the first item of xr is the reconstruction produced by the ResidualAutoEncoder network |
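A minimal usage sketch of how these returns fit together, assuming the forward pass returns the tuple (z, xo, xr) described above and using 192 input features as in load_encoder() below:

```python
import torch
from DEC import ResidualAutoEncoder

# Assumes forward() returns (z, xo, xr) as documented in the table above.
ae = ResidualAutoEncoder(ip_features=192, hidden_dims=[500, 500, 2000, 30])
x = torch.randn(8, 192)        # batch of 8 input vectors
z, xo, xr = ae(x)

print(z.shape)                 # (8, 30): bottleneck embedding
print(xo[0].shape)             # (8, 192): first item of xo is the input itself
print(xr[0].shape)             # (8, 192): reconstruction of the input
```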

def load_encoder()

def load_encoder():

Defined in DEC.py

Loads the weights of the ResidualAutoEncoder trained on the training data and returns the resulting model.

Returns:

| Variable | Detail |
| --- | --- |
| model | _ResidualAutoEncoder_, Model with an input feature size of 192 and hidden layers of size 500, 500, 2000, 30. The weights of the model are initialized to the weights of the autoencoder trained on the training data |
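A short usage sketch, assuming the same forward interface as described for ResidualAutoEncoder above:

```python
import torch
from DEC import load_encoder

model = load_encoder()          # pre-trained ResidualAutoEncoder
x = torch.randn(4, 192)         # 192 input features, as documented above
z, xo, xr = model(x)            # z holds the 30-dimensional latent embeddings
```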

class ClusteringModule()

class ClusteringModule(nn.Module):
    def __init__(self,
                 num_clusters,
                 encoder, data,
                 cinit = "KMeans"):

Defined in DEC.py

Clustering module of the deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC clustering network. KMeans is used to initialize the centroids in the latent space.

Parameters:

| Argument | Detail |
| --- | --- |
| num_clusters | _str_, Number of clusters to create from the algorithm |
| encoder | _nn.Module_, Pre-trained encoder for initializing the centroids. The encoder transforms the data to the latent space for clustering |
| data | _torch.Tensor_, Data used to initialize the cluster centroids in the latent space |
| cinit | _str_, Initialization method for the cluster centroids. Default KMeans |

Returns:

| Variable | Detail |
| --- | --- |
| q | _torch.Tensor_, Tensor of similarities between the embedded points z_i and the centroids mu_j. Assumes a Student's t-distribution as the kernel |
| p | _torch.Tensor_, Tensor of the target distribution based on the soft assignments q_i |
| xo[0] | _torch.Tensor_, Input data to the ResidualAutoEncoder |
| xr[0] | _torch.Tensor_, Reconstruction of the input by the ResidualAutoEncoder |
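For reference, the sketch below shows the standard DEC soft-assignment and target-distribution formulas (Xie et al., 2016) with the Student's t degree of freedom alpha = 1. It is assumed, not verified, that ClusteringModule computes q and p this way.

```python
import torch

def soft_assignment(z, mu, alpha=1.0):
    # q_ij ∝ (1 + ||z_i - mu_j||^2 / alpha) ** -((alpha + 1) / 2)
    dist_sq = torch.cdist(z, mu) ** 2
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)
```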

Class Functions:

  1. init_centroid:
def init_centroid(self,
                  data,
                  method = "KMeans")

Calculates the optimal number of speakers using the eigen-gap method, then clusters the data with the specified method and returns the resulting centroids.

Parameters:

| Argument | Detail |
| --- | --- |
| data | _torch.Tensor_, Input data to be clustered |
| method | _str_, Clustering method. Default KMeans. Options: KMeans/Spectral |

Returns:

| Variable | Detail |
| --- | --- |
| output | _torch.Tensor_, Tensor containing the initialized centroids for the dataset |
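A hedged sketch of the idea behind this initialization: estimate the number of clusters from the eigen-gap of a normalized graph Laplacian, then initialize the centroids with KMeans. The cosine-similarity affinity used here is an illustrative choice and may differ from what DEC.py builds.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_num_clusters(X, max_clusters=10):
    # Cosine-similarity affinity (illustrative choice), clipped to be non-negative.
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    A = np.clip(X @ X.T, 0.0, None)
    d = A.sum(axis=1)
    # Normalized graph Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))
    eigvals = np.sort(np.linalg.eigvalsh(L))
    gaps = np.diff(eigvals[:max_clusters + 1])
    return int(np.argmax(gaps)) + 1          # index of the largest eigen-gap

def init_centroids(X, num_clusters=None):
    k = num_clusters or estimate_num_clusters(X)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    return km.cluster_centers_
```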

class DEC()

class DEC(num_clusters,
          encoder, data,
          cinit = "KMeans")

Defined in DEC.py

Deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC clustering network, and calls the ClusteringModule class to initialize the centroids.

Parameters:

| Argument | Detail |
| --- | --- |
| encoder | _nn.Module_, Pre-trained encoder for initializing the centroids. The encoder transforms the data to the latent space for clustering |
| num_clusters | _str_, Number of clusters to create from the algorithm. Default None uses the eigen-gap method to determine the number of clusters |
| data | _torch.Tensor_, Data passed to the ClusteringModule to initialize the cluster centroids |
| cinit | _str_, Initialization method for the cluster centroids. Default KMeans. Options: KMeans/Spectral |

Class Functions:

  1. fit:
def fit(self,
        data,
        y_true = None,
        niter = 150,
        lrEnc = 1e-4,
        lrCC = 1e-4,
        verbose = False)

Trains the algorithm by measuring the KL divergence between the target and observed distributions. It also updates the ResidualAutoEncoder in parallel using an MSE loss, to improve the latent space projection of the data for better clustering. Both updates use the Adam optimizer, and the objective function is a linear combination of the KL divergence between the target and observed distributions and the MSE loss between the input data and its reconstruction by the ResidualAutoEncoder. A minimal sketch of this joint update appears after the parameter table below.

Parameters:

| Argument | Detail |
| --- | --- |
| data | _torch.Tensor_, Input data to be clustered |
| y_true | _numpy.ndarray_, True labels of the data we aim to cluster. The predict() and clusterAccuracy() functions are invoked only if y_true is not None |
| niter | _int_, Number of epochs to train the model for |
| lrEnc | _float_, Learning rate for updating the encoder |
| lrCC | _float_, Learning rate for updating the cluster centres |
| verbose | _bool_, True activates the tqdm progress bar while training. False prints no updates during training |
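The sketch below illustrates the joint update described above: a KL(p || q) clustering loss plus an MSE reconstruction loss, each with its own Adam optimizer using lrEnc and lrCC. The combination weight gamma, the parameter grouping, and the update order are illustrative assumptions, not necessarily those used in DEC.py.

```python
import torch
import torch.nn.functional as F

def fit_sketch(cluster_module, encoder, data, niter=150,
               lrEnc=1e-4, lrCC=1e-4, gamma=0.1):
    optEnc = torch.optim.Adam(encoder.parameters(), lr=lrEnc)
    # Cluster-centre parameters; the selection here is simplified.
    optCC = torch.optim.Adam(cluster_module.parameters(), lr=lrCC)
    for _ in range(niter):
        # ClusteringModule returns (q, p, xo[0], xr[0]); see its Returns table.
        q, p, x_in, x_rec = cluster_module(data)
        kl = F.kl_div(q.log(), p, reduction="batchmean")
        mse = F.mse_loss(x_rec, x_in)
        loss = kl + gamma * mse            # linear combination of the two losses
        optEnc.zero_grad(); optCC.zero_grad()
        loss.backward()
        optEnc.step(); optCC.step()
```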
  2. predict:
def predict(self, data)

Predicts the cluster label for each data point by taking the label at which the observed distribution is maximized.

Parameters:

| Argument | Detail |
| --- | --- |
| data | _torch.Tensor_, Input data to be labelled after clustering |

Returns:

| Variable | Detail |
| --- | --- |
| y_pred | _numpy.ndarray_, Predicted cluster labels of the data, taken from the soft assignment distribution |
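Concretely, the prediction is the argmax over the soft assignment (illustrative one-liner; assumes q is the soft-assignment tensor described in the ClusteringModule returns):

```python
# Hard cluster labels from the soft assignment q (num_points x num_clusters).
y_pred = q.argmax(dim=1).cpu().numpy()
```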
  3. clusterAccuracy:
def clusterAccuracy(self, y_pred, y_true)

Computes the cluster assignment accuracy as the maximum accuracy between y_pred and y_true over all permutations of y_pred. The optimal permutation is found with scipy's linear_sum_assignment optimization function.

Parameters:

| Argument | Detail |
| --- | --- |
| y_pred | _numpy.ndarray_, Labels predicted by the DEC algorithm |
| y_true | _numpy.ndarray_, True labels of the data |

Returns:

| Variable | Detail |
| --- | --- |
| accuracy | _float_, Cluster assignment accuracy |
| reassignment | _dict_, Dictionary with row indices as keys and column indices as values for the optimal assignment |
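A sketch of the computation described above, using the standard confusion-matrix plus linear_sum_assignment approach; it is assumed to correspond to what clusterAccuracy does, though the in-code details may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy_sketch(y_pred, y_true):
    # Confusion matrix between predicted and true labels.
    D = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        cost[p, t] += 1
    # linear_sum_assignment minimizes, so negate counts to maximize matches.
    rows, cols = linear_sum_assignment(cost.max() - cost)
    reassignment = dict(zip(rows, cols))
    accuracy = cost[rows, cols].sum() / y_pred.size
    return accuracy, reassignment
```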

def diarizationDEC()

def diarizationDEC(audio_dataset,
                   num_spkr = None,
                   hypothesis_dir = None)

Defined in DEC.py

Computes diarization labels. If num_spkr = 'oracle', the oracle number of speakers is used; this serves as an optimal benchmark for the performance of DEC. If num_spkr = None, eigen-gap maximization in the ClusteringModule is used to determine the number of speakers.

Parameters:

| Argument | Detail |
| --- | --- |
| audio_dataset | _utils.DiarizationDataset_, Test diarization dataset |
| num_spkr | _str_, None to calculate the optimal number of speakers by eigen-gap maximization; 'oracle' to use the number of speakers given with the data for each window |
| hypothesis_dir | _str_, Directory in which to store the predicted speaker labels for the audio segments as RTTM files. None stores them in the ./rttm_output/ directory |

Returns:

| Variable | Detail |
| --- | --- |
| hypothesis_dir | _str_, Directory containing the RTTM files with the predicted speaker labels and their timestamps |
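A usage sketch of the two modes described above. The DiarizationDataset construction is hypothetical (its actual arguments are defined in utils); only the diarizationDEC call itself follows the documented signature.

```python
from DEC import diarizationDEC
from utils import DiarizationDataset

# Hypothetical dataset construction; see utils for the real arguments.
test_set = DiarizationDataset()

# Oracle number of speakers, as an optimal benchmark for DEC.
hyp_dir = diarizationDEC(test_set, num_spkr="oracle", hypothesis_dir="./rttm_output/")

# Eigen-gap estimate of the number of speakers (num_spkr=None, the default).
hyp_dir = diarizationDEC(test_set)
print(hyp_dir)   # directory containing the predicted RTTM files
```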