This project implements an automatic spoken Language Identification (LID) system that identifies the language of a speech recording from a short sample. We use four languages: Indian English, Hindi, Bangla, and Telugu. Artificial neural networks are used to distinguish between speech samples of the different languages.
The speech features used are Mel Frequency Cepstral Coefficients (MFCCs), and MFCCs combined with Shifted Delta Cepstral coefficients (SDCs). SDCs capture more of the temporal information than MFCCs alone. We also use simple Gaussian Mixture Models (GMMs) as a baseline approach to this problem. Please refer to the report for more detailed information.
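
To illustrate how SDCs add temporal context to MFCCs, here is a minimal, dependency-free sketch of the usual SDC construction. It is not taken from this project's code: the function names `shifted_delta_cepstra` and `mfcc_with_sdc`, the default parameters (d=1, P=3, k=7, a common choice in the LID literature), and the edge handling (repeating the first and last frame) are all assumptions made for illustration. The actual configuration used here is described in the report.

```python
def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Compute SDC features from per-frame cepstra (illustrative sketch).

    cepstra: list of frames, each a list of N cepstral coefficients.
    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    Returns one row of N * k values per input frame.
    """
    num_frames = len(cepstra)
    if num_frames == 0:
        return []

    def frame(t):
        # Assumption: clamp out-of-range indices to the first/last frame.
        return cepstra[min(max(t, 0), num_frames - 1)]

    sdc = []
    for t in range(num_frames):
        row = []
        for i in range(k):
            # Block i is the delta centred on frame t + i*p:
            # c(t + i*p + d) - c(t + i*p - d)
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            row.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(row)
    return sdc


def mfcc_with_sdc(mfcc, d=1, p=3, k=7):
    """Append the SDC vector to each MFCC frame."""
    sdc = shifted_delta_cepstra(mfcc, d, p, k)
    return [list(m) + s for m, s in zip(mfcc, sdc)]
```

With 7 MFCCs per frame and these defaults, each frame yields 49 SDC values, so the combined MFCC plus SDC vector has 56 dimensions. Each frame's features therefore summarise how the cepstra change over roughly the next (k-1)*P + d frames, which is the extra temporal information mentioned above.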