DEC
class ResidualAutoEncoder(ip_features,
hidden_dims=[500, 500, 2000, 30]):
Defined in DEC.py
Creates a torch.nn.Module for a deep autoencoder composed of Residual Neural Network (ResNet) blocks in the encoder and decoder layers. The activation used is ReLU. The bottleneck encoder output and the final decoder output are not activated, to avoid information loss from the ReLU activation. A usage sketch follows the returns table below.
Parameters:
Argument | Detail
---|---
ip_features | int, Input feature size
hidden_dims | list of int, List of hidden dimension sizes. The last element of the list is the output dimension of the autoencoder bottleneck
Returns:
Variable | Detail
---|---
z | torch.Tensor, Output of the bottleneck encoder of the deep autoencoder network
xo | list of torch.Tensor, Outputs of each encoder layer except the bottleneck encoder. The first item of the list is the input given to the network
xr | list of torch.Tensor, Reconstructions of the inputs to each encoder layer. xr is reversed so that the i-th item of xr is the reconstruction of the i-th item of xo, e.g. the first item of xo is the input to the ResidualAutoEncoder and the first item of xr is the reconstruction produced by the network
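A minimal usage sketch, assuming DEC.py is importable and that the forward pass returns the three values documented above; the batch shape is illustrative.

```python
import torch
from DEC import ResidualAutoEncoder  # assumes DEC.py is importable from the working directory

# Hypothetical batch of 64 feature vectors of size 192.
x = torch.randn(64, 192)

# Instantiate with the documented defaults: 192 -> 500 -> 500 -> 2000 -> 30.
model = ResidualAutoEncoder(ip_features=192, hidden_dims=[500, 500, 2000, 30])

# The forward pass returns the bottleneck embedding z plus the per-layer
# inputs (xo) and their reconstructions (xr), as documented above.
z, xo, xr = model(x)

print(z.shape)      # expected: torch.Size([64, 30])
print(xo[0].shape)  # the original input, torch.Size([64, 192])
print(xr[0].shape)  # its reconstruction, torch.Size([64, 192])
```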
def load_encoder():
Defined in DEC.py
Load weights from the ResidualAutoEncoder trained on the training data.
Returns:
Variable | Detail
---|---
model | ResidualAutoEncoder, Model with an input feature size of 192 and hidden layers of size 500, 500, 2000, 30. The weights are initialized to those of the autoencoder trained on the training data
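A short usage sketch, assuming DEC.py is importable and that the loaded model has the same forward signature as ResidualAutoEncoder above; the batch is illustrative.

```python
import torch
from DEC import load_encoder  # assumes DEC.py is importable

# Load the pre-trained 192 -> 500 -> 500 -> 2000 -> 30 autoencoder.
model = load_encoder()
model.eval()

# Hypothetical batch of 192-dimensional embeddings.
x = torch.randn(16, 192)

with torch.no_grad():
    z, xo, xr = model(x)  # 30-d bottleneck embeddings and reconstructions
```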
class ClusteringModule(nn.Module):
def __init__(self,
num_clusters,
encoder, data,
cinit = "KMeans"):
Defined in DEC.py
Clustering module of the deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC clustering network. KMeans is used to initialize the centroids in the latent space. A sketch of the q and p computations follows the returns table below.
Parameters:
Argument | Detail
---|---
num_clusters | int, Number of clusters to create from the algorithm
encoder | nn.Module, Pre-trained encoder for initializing the centroids. The encoder transforms data to the latent space for clustering
data | torch.Tensor, Input data used to initialize the cluster centroids
cinit | str, Initialization method for the cluster centroids. Default KMeans
Returns:
Variable | Detail
---|---
q | torch.Tensor, Tensor of similarities between embedded points z_i and centroids mu_j, assuming a Student's t-distribution as the kernel
p | torch.Tensor, Tensor of the target distribution based on the soft assignments q_i
xo[0] | torch.Tensor, Input data to the ResidualAutoEncoder
xr[0] | torch.Tensor, Reconstruction of the input by the ResidualAutoEncoder
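The two distributions above can be written out as a short sketch of the standard DEC formulas; this is illustrative and may differ in details (e.g. the degrees of freedom alpha) from the implementation in DEC.py.

```python
import torch

def soft_assignment(z, mu, alpha=1.0):
    """Student's t-kernel similarity q_ij between embeddings z_i and centroids mu_j."""
    dist2 = torch.cdist(z, mu).pow(2)                     # (N, K) squared distances
    q = (1.0 + dist2 / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)                 # rows sum to 1

def target_distribution(q):
    """Sharpened target distribution p_ij built from the soft assignments q."""
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)        # q_ij^2 / f_j
    return weight / weight.sum(dim=1, keepdim=True)

z = torch.randn(100, 30)   # hypothetical bottleneck embeddings
mu = torch.randn(4, 30)    # hypothetical centroids for 4 clusters
q = soft_assignment(z, mu)
p = target_distribution(q)
```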
Class Functions:
def init_centroid(self,
data,
method = "KMeans")
Returns the initial centroids after calculating the optimal number of speakers using the eigen-gap method and then clustering the data with the specified method. An illustrative KMeans sketch follows the returns table below.
Parameters:
Argument | Detail
---|---
data | torch.Tensor, Input data to be clustered
method | str, Clustering method. Default KMeans. Options KMeans / Spectral
Returns:
Variable | Detail
---|---
output | torch.Tensor, Tensor containing the initialized centroids for the dataset
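For reference, centroid initialization in the latent space can be sketched with scikit-learn's KMeans; this mirrors the behaviour described above for method = "KMeans" but is not the code in DEC.py, and the eigen-gap speaker-count estimation is omitted.

```python
import torch
from sklearn.cluster import KMeans

def kmeans_centroids(data, num_clusters):
    """Illustrative KMeans centroid initialization in the latent space."""
    km = KMeans(n_clusters=num_clusters, n_init=10).fit(data.cpu().numpy())
    return torch.as_tensor(km.cluster_centers_, dtype=data.dtype)

z = torch.randn(200, 30)             # hypothetical latent embeddings
centroids = kmeans_centroids(z, 4)   # (4, 30) tensor of initial centroids
```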
class DEC(nn.Module):
def __init__(self,
num_clusters,
encoder, data,
cinit = "KMeans"):
Defined in DEC.py
Deep embedding clustering (DEC) algorithm. It uses the trained encoder of the ResidualAutoEncoder to initialize the DEC clustering network and calls the ClusteringModule class to initialize the centroids. A construction sketch follows the parameter table below.
Parameters:
Argument | Detail
---|---
encoder | nn.Module, Pre-trained encoder for initializing the centroids. The encoder transforms data to the latent space for clustering
num_clusters | int or None, Number of clusters to create from the algorithm. Default None uses eigen-gap maximization to determine the number of clusters
data | torch.Tensor, Input data used to initialize the cluster centroids
cinit | str, Initialization method for the cluster centroids. Default KMeans. Options KMeans / Spectral
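A minimal construction sketch, assuming DEC.py is importable; whether DEC expects the full autoencoder returned by load_encoder() or only its encoder half is an assumption to verify against DEC.py.

```python
import torch
from DEC import DEC, load_encoder

# Pre-trained autoencoder (or its encoder half; see the note above).
encoder = load_encoder()

data = torch.randn(500, 192)   # hypothetical matrix of 192-d embeddings

# num_clusters=None lets eigen-gap maximization pick the number of clusters.
dec = DEC(num_clusters=None, encoder=encoder, data=data, cinit="KMeans")
```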
Class Functions:
def fit(self,
data,
y_true = None,
niter = 150,
lrEnc = 1e-4,
lrCC = 1e-4,
verbose = False)
Trains the algorithm by minimizing the KL divergence between the target and observed distributions. In parallel, it also updates the ResidualAutoEncoder using MSE loss to improve the latent-space projection of the data for better clustering. Both updates use the Adam optimizer, and the objective function is a linear combination of the KL divergence between the target and observed distributions and the MSE loss between the input data and its reconstruction by the ResidualAutoEncoder. A sketch of this combined objective follows the parameter table below.
Parameters:
Argument | Detail
---|---
data | torch.Tensor, Input data to be clustered
y_true | numpy.ndarray, True labels of the data to be clustered. The predict() and clusterAccuracy() functions are invoked only if y_true is not None
niter | int, Number of epochs to train the model for
lrEnc | float, Learning rate for updating the encoder
lrCC | float, Learning rate for updating the cluster centres
verbose | bool, True activates the tqdm progress bar during training; False prints no updates while training
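The combined objective described above can be summarized with the sketch below; it is an illustration of the KL-plus-MSE loss, not the exact code in DEC.py, and the weighting term gamma is hypothetical.

```python
import torch.nn.functional as F

def dec_objective(q, p, x, x_rec, gamma=0.1):
    """Linear combination of the clustering and reconstruction losses.
    gamma is a hypothetical weight; DEC.py may balance the terms differently."""
    kl = F.kl_div(q.log(), p, reduction="batchmean")  # KL(p || q), q given as probabilities
    mse = F.mse_loss(x_rec, x)                        # reconstruction error
    return kl + gamma * mse

# A typical training call, using the parameters documented above:
# dec.fit(data, niter=150, lrEnc=1e-4, lrCC=1e-4, verbose=True)
```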
def predict(self,
data)
Predicts the cluster label for each data point by picking the cluster for which the observed distribution q is maximized, as sketched after the returns table below.
Parameters:
Argument | Detail
---|---
data | torch.Tensor, Input data to be labelled after clustering
Returns:
Variable | Detail
---|---
y_pred | numpy.ndarray, Predicted cluster labels of the data, obtained from the soft assignments
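The labelling rule amounts to an argmax over the soft assignment matrix q; an illustrative sketch, not the code in DEC.py:

```python
import torch

def hard_labels(q):
    """Pick, for each point, the cluster whose soft assignment q_ij is largest."""
    return torch.argmax(q, dim=1).cpu().numpy()
```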
def clusterAccuracy(self,
y_pred,
y_true)
Computes the cluster assignment accuracy as the maximum accuracy between y_pred and y_true over all permutations of the labels in y_pred. The optimal permutation is found with scipy's linear_sum_assignment optimization function, as sketched after the returns table below.
Parameters:
Argument | Detail
---|---
y_pred | numpy.ndarray, Labels predicted by the DEC algorithm
y_true | numpy.ndarray, True labels of the data
Returns:
Variable | Detail
---|---
accuracy | float, Cluster assignment accuracy
reassignment | dict, Dictionary mapping row indices to column indices of the optimal assignment
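An illustrative reimplementation of this matching, using scipy's linear_sum_assignment as described; the actual clusterAccuracy() in DEC.py may differ in details.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(y_pred, y_true):
    """Best accuracy over all cluster-label permutations (Hungarian algorithm)."""
    n = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((n, n), dtype=np.int64)        # co-occurrence counts
    for yp, yt in zip(y_pred, y_true):
        cost[yp, yt] += 1
    rows, cols = linear_sum_assignment(-cost)      # maximize matched counts
    reassignment = dict(zip(rows, cols))
    accuracy = cost[rows, cols].sum() / y_pred.size
    return accuracy, reassignment
```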
def diarizationDEC(audio_dataset,
num_spkr = None,
hypothesis_dir = None)
Defined in DEC.py
Computes diarization labels based on the oracle number of speakers if num_spkr = 'oracle'. This serves as an optimal benchmark for the performance of DEC. If num_spkr = None, the number of speakers is determined with eigen-gap maximization in the ClusteringModule. A usage sketch follows the returns table below.
Parameters:
Argument | Detail
---|---
audio_dataset | utils.DiarizationDataset, Test diarization dataset
num_spkr | str, None to calculate the optimal number of speakers with eigen-gap maximization; oracle to use the number of speakers given with the data for each window
hypothesis_dir | str, Directory in which to store the predicted speaker labels for the audio segments as rttm files. None stores them in the ./rttm_output/ directory
Returns:
Variable | Detail
---|---
hypothesis_dir | str, Directory of the rttm files containing the predicted speaker labels with their timestamps
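A usage sketch; the DiarizationDataset constructor arguments are omitted here and are an assumption to fill in from the utils documentation.

```python
from utils import DiarizationDataset
from DEC import diarizationDEC

test_set = DiarizationDataset()  # hypothetical construction; pass the actual dataset arguments

# Oracle number of speakers, as an upper-bound benchmark.
hyp_dir = diarizationDEC(test_set, num_spkr="oracle", hypothesis_dir="./rttm_output/")

# Eigen-gap estimate of the number of speakers.
hyp_dir = diarizationDEC(test_set, num_spkr=None)
print(hyp_dir)  # directory containing the predicted rttm files
```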