Pema Galey. Development of automatic speech recognition for Dzongkha language using deep learning. Master's Degree (Electrical and Software Systems Engineering (International Program)). King Mongkut's University of Technology North Bangkok. Central Library: King Mongkut's University of Technology North Bangkok, 2022.
Development of automatic speech recognition for Dzongkha language using deep learning
Abstract:
Natural Language Processing (NLP) is a field of computer science and computational linguistics concerned with communication between humans and computers in natural language. Among the many areas of NLP, speech recognition, which detects speech and converts it to text, has grown in popularity. However, this study is the first of its kind on automatic speech recognition (ASR) for Dzongkha.
The Dzongkha ASR system is trained on a Dzongkha speech corpus of 10,566 utterances, totaling 19 hours of audio, together with a lexicon of 12,605 unique Dzongkha words. Ten percent of the corpus is used for testing and the remaining 90% for training. The ASR model was first trained with traditional statistical Gaussian mixture model (GMM) methods, namely monophone and triphone models, and was then improved with deep learning using deep neural network (DNN) and time-delay neural network (TDNN) models.
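The 90/10 partition described above can be illustrated with a minimal sketch. The abstract does not say how utterances were assigned to each set, so the seeded random shuffle, the function name `split_corpus`, and the placeholder utterance IDs below are all assumptions made for illustration. In practice, ASR corpora are often split by speaker rather than by utterance.

```python
import random


def split_corpus(utterance_ids, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs reproducibly and return (train, test) lists.

    Hypothetical helper: the thesis does not describe its split procedure.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Placeholder IDs standing in for the 10,566 utterances in the corpus.
all_ids = [f"utt{i:05d}" for i in range(10566)]
train_ids, test_ids = split_corpus(all_ids)
print(len(train_ids), len(test_ids))
```

Because 10% of 10,566 is not a whole number, the test set size depends on the rounding rule chosen; this sketch rounds to the nearest utterance.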
The DNN model outperformed the GMM models, and the TDNN model outperformed both. The TDNN achieved a word error rate (WER) of 24.07%. To improve performance further, data augmentation by speed perturbation was applied to increase the amount of training data, and i-vector features were extracted for speaker adaptation. The TDNN + i-vector model yields a WER of 23.24%, a relative improvement of 3.45% over the TDNN without i-vectors. Overall, the TDNN + i-vector model achieved the best WER, and it is assumed to be the state of the art for the Dzongkha language.
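For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It also shows that the 3.45% figure is consistent with a relative reduction between the two WER values quoted above. The function names are hypothetical, and the sketch assumes transcripts are already segmented into space-separated words. It is not the scoring tool used in the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two space-separated transcripts."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline_wer, improved_wer):
    """Return the relative WER reduction in percent."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Toy transcripts: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))          # 50.0
# WER values quoted in the abstract.
print(round(relative_reduction(24.07, 23.24), 2))   # 3.45
```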