This repo lists the reference papers of "Speaker Recognition Based on Deep Learning: An Overview".
For an analysis of the paper, see: 基于深度学习的说话人识别概述 (An Overview of Speaker Recognition Based on Deep Learning)
All references are listed in the order in which they appear in the original paper, and the PDF files can be downloaded from the Release
on the right.
1 An overview of text-independent speaker recognition From features to supervectors.pdf
2 Forensic speaker recognition.pdf
3 The inference of identity in forensic speaker recognition.pdf
4 an overview of speaker identification accuracy and robustness issues.pdf
5 Talker‐Recognition Procedure Based on Analysis of Variance.pdf
6 Speaker Verification Using Adapted Gaussian Mixture Models.pdf
7 Support vector machines using GMM supervectors for speaker verification.pdf
8 Joint factor analysis versus eigenchannels in speaker recognition.pdf
9 Front-end factor analysis for speaker verification.pdf
10 Bayesian speaker verification with heavy tailed priors.pdf
11 Analysis of i-vector length normalization in speaker recognition systems.pdf
12 A novel scheme for speaker recognition using a phonetically-aware deep neural network.pdf
13 Deep neural networks for small footprint text-dependent speaker verification.pdf
14 Xvectors Robust DNN embeddings for speaker recognition.pdf
15 Voxceleb a large-scale speaker identification dataset.pdf
17 Speaker recognition by machines and humans A tutorial review.pdf
18 An overview of automatic speaker recognition technology.pdf
19 An overview of statistical pattern recognition techniques for speaker verification.pdf
20 Spoofing and countermeasures for speaker verification A survey.pdf
21 Speaker diarization A review of recent research.pdf
22 The Attacker's Perspective on Automatic Speaker Verification An Overview.pdf
23 Speaker verification using deep neural networks A review.pdf
24 Learning Discriminative Features for Speaker Identification and Verification.pdf
25 Centroid-based deep metric learning for speaker recognition.pdf
26 An End-to-End Text-Independent Speaker Identification System on Short Utterances.pdf
27 Combining deep embeddings of acoustic and articulatory features for speaker identification.pdf
28 A Memory Augmented Architecture for Continuous Speaker Identification in Meetings.pdf
29 X-vectors Robust neural embeddings for speaker recognition.pdf
30 A study of interspeaker variability in speaker verification.pdf
31 Deep neural network based posteriors for text-dependent speaker verification.pdf
32 Deep Neural Networks and Hidden Markov Models in i-vector-based Text-Dependent Speaker Verification.pdf
33 A deep neural network speaker verification system targeting microphone speech.pdf
34 Exploiting sequence information for text-dependent speaker verification.pdf
35 Text-dependent speaker verification based on i-vectors, neural networks and hidden Markov models.pdf
36 Deep neural network approaches to speaker and language recognition.pdf
37 Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition.pdf
38 Time delay deep neural network-based universal background models for speaker recognition.pdf
39 Advances in deep neural network approaches.pdf
40 Phone-centric local variability vector for text-constrained speaker verification.pdf
41 A unified deep neural network for speaker and language recognition.pdf
42 The IBM 2016 speaker recognition system.pdf
43 Application of convolutional neural networks to speaker recognition in noisy conditions.pdf
44 Insights into deep neural networks for speaker recognition.pdf
45 Exploring robustness of DNN RNN for extracting speaker Baum-Welch statistics in mismatched conditions.pdf
47 Exploring the role of phonetic bottleneck features for speaker and language recognition.pdf
48 Analysis and Optimization of Bottleneck Features for Speaker Recognition.pdf
49 Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data.pdf
50 Combination of cepstral and phonetically discriminative features for speaker verification.pdf
51 Deep Neural Network Embeddings for Text-Independent Speaker Verification.pdf
52 Towards directly modeling raw speech signal for speaker verification using CNNs.pdf
53 Speaker recognition from raw waveform with sincnet.pdf
54 Avoiding speaker overfitting in end-to-end dnns using raw waveform for text-independent speaker verification.pdf
55 A complete end-to-end speaker verification system using deep neural networks From raw signals to verification result.pdf
56 Rawnet Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification.pdf
57 Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings.pdf
58 Ensemble additive margin softmax for speaker verification.pdf
59 Voxceleb2 Deep speaker recognition.pdf
60 Utterance-level aggregation for speaker recognition in the wild.pdf
61 Frequency and temporal convolutional attention for text-independent speaker recognition.pdf
62 End-to-end text-independent speaker verification with triplet loss on short utterances.pdf
63 Text-independent speaker verification based on triplet convolutional neural network embeddings.pdf
64 Seq2seq attentional siamese neural networks for text-dependent speaker verification.pdf
65 Orthogonal training for text-independent speaker verification.pdf
66 Orthogonality regularizations for end-to-end speaker.pdf
67 JHU-HLTCOE system for the VoxSRC speaker recognition challenge.pdf
68 Multi-resolution multi-head attention in deep speaker embedding.pdf
69 Deep speaker an end-to-end neural speaker embedding system.pdf
70 Deep speaker representation using orthogonal decomposition and recombination for speaker verification.pdf
71 Magneto X-vector magnitude estimation network plus offset for improved speaker recognition.pdf
72 Deep Speaker Embeddings for Short-Duration Speaker Verification.pdf
73 Deep Discriminative Embeddings for Duration Robust Speaker Verification.pdf
74 Boundary discriminative large margin cosine loss for text-independent speaker verification.pdf
75 Text-independent speaker verification using 3d convolutional neural networks.pdf
76 Deep speaker feature learning for text-independent speaker verification.pdf
77 Attention-based models for text-dependent speaker verification.pdf
78 Generalized end-to-end loss for speaker verification.pdf
79 End-to-end text-dependent speaker verification.pdf
80 Improving deep CNN networks with long temporal context for text-independent speaker verification.pdf
81 Deep speaker embedding learning with multi-level pooling for text-independent speaker verification.pdf
82 Speaker recognition for multi-speaker conversations using x-vectors.pdf
83 Speaker embedding extraction with phonetic information.pdf
84 Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification.pdf
85 Attentive statistics pooling for deep speaker embedding.pdf
86 Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification.pdf
87 Gaussian-constrained training for speaker verification.pdf
88 Margin matters Towards more discriminative deep neural network embeddings for speaker recognition.pdf
89 State-of-the-Art Speaker Recognition for Telephone and Video Speech The JHU-MIT Submission for NIST SRE18.pdf
90 Statistics pooling time delay neural network based on X-vector for speaker verification.pdf
91 Bayesian x-vector Bayesian Neural Network based x-vector System for Speaker Verification.pdf
92 Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function.pdf
93 An improved deep embedding learning method for short duration speaker verification.pdf
94 An Effective Deep Embedding Learning Architecture for Speaker Verification.pdf
95 Speaker characterization using tdnn-lstm based speaker embedding.pdf
96 A time delay neural network architecture for efficient modeling of long temporal contexts.pdf
97 Self-supervised speaker embeddings.pdf
98 JHU-HLTCOE system for the VoxSRC speaker recognition challenge.pdf
99 Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks.pdf
100 The JHU Speaker Recognition System for the VOiCES 2019 Challenge.pdf
101 Densely Connected Time Delay Neural Network for Speaker Verification.pdf
102 Compact Speaker Embedding lrx-vector.pdf
103 Deep residual learning for image recognition.pdf
104 Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms.pdf
105 Wav2Spk A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms.pdf
106 BERTPHONE Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition.pdf
107 Self-attention encoding and pooling for speaker recognition.pdf
108 Autospeech Neural architecture search for speaker recognition.pdf
109 Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification.pdf
110 A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings.pdf
111 State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and speakers in the wild evaluations.pdf
112 Am-mobilenet1d A portable model for speaker recognition.pdf
113 Attention is all you need.pdf
114 A structured self-attentive sentence embedding.pdf
115 Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification.pdf
116 An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales.pdf
117 Cnn with phonetic attention for text-independent speaker verification.pdf
118 Self multi-head attention for speaker recognition.pdf
119 Vector-Based Attentive Pooling for Text-Independent Speaker Verification.pdf
120 NetVLAD CNN architecture for weakly supervised place recognition.pdf
121 Ghostvlad for set-based face recognition.pdf
122 Exploring the encoding layer and loss function in end-to-end speaker and language recognition system.pdf
123 Spatial pyramid encoding with convex length normalization for text-independent speaker verification.pdf
124 Total variability layer in deep neural network embeddings for speaker verification.pdf
125 Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.pdf
126 Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System.pdf
127 A deep neural network for short-segment speaker recognition.pdf
128 Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances.pdf
129 Sphereface Deep hypersphere embedding for face recognition.pdf
130 Angular Softmax for Short-Duration Text-independent Speaker Verification.pdf
131 On deep speaker embeddings for text-independent speaker recognition.pdf
132 Unified hypersphere embedding for speaker recognition.pdf
133 Dynamic Margin Softmax Loss for Speaker Verification.pdf
134 Large margin softmax loss for speaker verification.pdf
135 On Parameter Adaptation in Softmax-based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-based Speaker Recognition.pdf
136 Additive margin softmax for face verification.pdf
137 Cosface Large margin cosine loss for deep face recognition.pdf
138 Arcface Additive angular margin loss for deep face recognition.pdf
139 Discriminative neural embedding learning for short-duration text-independent speaker verification.pdf
140 A discriminative feature learning approach for deep face recognition.pdf
141 Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech.pdf
142 Multi-task learning for text-dependent speaker verification.pdf
143 Deep feature for text-dependent speaker verification.pdf
144 DNN based speaker embedding using content information for text-dependent speaker verification.pdf
145 On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction.pdf
146 Collaborative joint training with multitask recurrent model for speech and speaker recognition.pdf
147 SNR-invariant multitask deep neural networks for robust speaker verification.pdf
148 Multi-task network for noise-robust keyword spotting and speaker verification using CTC-based soft VAD and global query attention.pdf
149 End-to-end attention based text-dependent speaker verification.pdf
150 Deep neural network-based speaker embeddings for end-to-end speaker verification.pdf
151 End-to-end DNN based speaker recognition inspired by i-vector and PLDA.pdf
153 Optimization of False Acceptance Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems.pdf
154 Joint i-vector with end-to-end system for short duration text-independent speaker verification.pdf
155 Tristounet triplet loss for speaker turn embedding.pdf
156 Speaker verification by partial AUC optimization with mahalanobis distance metric learning.pdf
157 End-to-end Text-dependent Speaker Verification Using Novel Distance Measures.pdf
158 Facenet A unified embedding for face recognition and clustering.pdf
159 Prototypical networks for few-shot learning.pdf
160 Few shot speaker recognition using deep neural networks.pdf
161 In defence of metric learning for speaker recognition.pdf
162 Meta-learning for short utterance speaker recognition with imbalance length pairs.pdf
163 Angular Margin Centroid Loss for Text-independent Speaker Recognition.pdf
164 End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification.pdf
165 DIHARD II is still hard Experimental results and discussions from the DKU-LENOVO team.pdf
166 Speaker diarization using deep neural network embeddings.pdf
167 Diarization is Hard Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.pdf
168 Speaker diarization using latent space clustering in generative adversarial network.pdf
169 Speaker diarization with lstm.pdf
170 Speaker segmentation using deep speaker vectors for fast speaker change scenarios.pdf
171 Context and Uncertainty Modeling for Online Speaker Change Detection.pdf
172 Pre-training of speaker embeddings for low-latency speaker change detection in broadcast news.pdf
173 Speaker, environment and channel change detection and clustering via the bayesian information criterion.pdf
174 Automatic segmentation, classification and clustering of broadcast news audio.pdf
176 An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks.pdf
178 Convolutional neural network for speaker change detection in telephone speaker diarization system.pdf
179 Speaker diarization with PLDA i-vector scoring and unsupervised calibration.pdf
180 Speaker diarization with i-vectors from DNN senone posteriors.pdf
181 Bayesian HMM Based x-Vector Clustering for Speaker Diarization.pdf
182 But system for the second dihard speech diarization challenge.pdf
183 Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge.pdf
184 A comparison of neural network feature transforms for speaker diarization.pdf
185 Speaker diarisation using 2D self-attentive combination of embeddings.pdf
186 Speaker embeddings incorporating acoustic conditions for diarization.pdf
187 Speaker diarization with session-level speaker embedding refinement using graph neural networks.pdf
188 LSTM based similarity measurement with spectral clustering for speaker diarization.pdf
189 Self-attentive similarity measurement strategies in speaker diarization.pdf
190 Multilayer bootstrap networks.pdf
191 Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering.pdf
192 An Investigation of Speaker Clustering Algorithms in Adverse Acoustic Environments.pdf
193 Fully supervised speaker diarization.pdf
194 DNN-based speaker clustering for speaker diarisation.pdf
195 Active learning based constrained clustering for speaker diarization.pdf
196 Discriminative neural clustering for speaker diarisation.pdf
197 Supervised online diarization with sample mean loss for multi-domain data.pdf
198 Diarization resegmentation in the factor analysis subspace.pdf
199 Speaker Diarization based on Bayesian HMM with Eigenvoice Priors.pdf
200 Neural speech turn segmentation and affinity propagation for speaker diarization.pdf