Compared with the previous generation of technology used in smart speakers and smart home devices, the new DFSMN model trains three times faster with deep learning and performs speech recognition twice as fast.
Xie Lei, a well-known speech recognition expert and professor at Northwestern Polytechnical University, said: "The DFSMN model open-sourced by Alibaba is a breakthrough in speech recognition accuracy and one of the most representative achievements of deep learning in the speech recognition field in recent years. It will have a major impact on the global academic community and on the application of AI technology."
Speech recognition technology has always been an important part of human-computer interaction. With it, machines can listen and speak like humans, and can think, understand, and respond. In recent years, with the adoption of deep learning, the performance of speech recognition systems based on deep neural networks has improved dramatically and become practical. Technologies built on speech recognition, such as voice input, voice transcription, voice search, and speech translation, are now in wide use.
At present, mainstream speech recognition systems generally adopt an acoustic model based on a deep neural network combined with a hidden Markov model (DNN-HMM); its structure is shown in the figure. The input to the acoustic model consists of spectral features such as PLP, MFCC, or FBANK, extracted from the speech waveform after windowing and framing. The output of the model generally uses acoustic modeling units of different granularities, such as monophones, monophone states, and tied triphone states. From input to output, different neural network structures can be used to map the input acoustic features to posterior probabilities over the output modeling units, which are then combined with the HMM for decoding to obtain the final recognition result.
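The pipeline above (window and frame the waveform, extract spectral features, map each frame to posteriors over modeling units) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the actual DFSMN implementation: the feature extraction is a crude FBANK-like log spectrum, the network is a tiny random-weight feed-forward net, and all sizes (40 feature bins, 256 hidden units, 2000 tied states) are assumptions for demonstration.

```python
import numpy as np

def frame_signal(wave, frame_len=400, hop=160):
    """Split a 1-D waveform into overlapping, Hamming-windowed frames
    (25 ms frames with a 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([wave[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

def log_spectral_features(frames, n_bins=40):
    """Crude FBANK-like features: log magnitude of the lowest FFT bins.
    A real front end would apply a mel filterbank here."""
    spec = np.abs(np.fft.rfft(frames, axis=1))[:, :n_bins]
    return np.log(spec + 1e-8)

def dnn_posteriors(feats, weights, biases):
    """Tiny feed-forward net: ReLU hidden layers, then a softmax that
    yields per-frame posteriors over the tied-state modeling units."""
    h = feats
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)               # ReLU
    logits = h @ weights[-1] + biases[-1]
    logits -= logits.max(axis=1, keepdims=True)      # numeric stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)                    # 1 s of fake 16 kHz audio
feats = log_spectral_features(frame_signal(wave))

dims = [40, 256, 2000]                               # input -> hidden -> tied states
weights = [rng.standard_normal((a, b)) * 0.01 for a, b in zip(dims, dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
post = dnn_posteriors(feats, weights, biases)
print(post.shape)                                    # (n_frames, n_tied_states)
```

In a full recognizer, these per-frame posteriors would be converted to scaled likelihoods and fed into an HMM decoder (e.g. Viterbi search over a word graph) to produce the final transcript; that decoding step is omitted here.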