Recently, the Machine Intelligence Lab at Alibaba DAMO Academy released DFSMN, a new-generation speech recognition model that pushes the global record for speech recognition accuracy to 96.04% (measured on LibriSpeech, the world's largest free speech recognition dataset).
The speech recognition team at the DAMO Academy Machine Intelligence Lab led the development of the model and has open-sourced it for companies and individuals worldwide. Compared with LSTM models, the most widely used in the industry, DFSMN trains faster and recognizes speech more accurately. For smart speakers and smart home devices that adopt the new DFSMN model, deep learning training is three times faster than with the previous generation of technology, and speech recognition is twice as fast.
Figure: Alibaba has open-sourced its self-developed DFSMN speech recognition model on GitHub.
At the recent Yunqi Conference Wuhan Summit, an AI cashier equipped with the DFSMN speech recognition model went head to head with a human clerk and accurately recognized customers' spoken orders in a noisy environment, taking orders for 34 cups of coffee within seconds. In addition, automatic ticket machines equipped with this speech recognition technology have gone into service on the Shanghai Metro.
Xie Lei, a well-known speech recognition expert and professor at Northwestern Polytechnical University, said: "The DFSMN model that Alibaba has open-sourced is a breakthrough improvement in speech recognition accuracy. It is one of the most representative results of deep learning in speech recognition in recent years, and it has a huge impact on both the global academic community and AI applications." Industry sources said that DFSMN is expected to become one of the most important acoustic models in the global speech recognition field, following the traditional LSTM model.