
Tencent AI Lab's annual report card: Letting AI save lives and farm the land

via: 博客园     time: 2019/1/30 17:02:15


The following is our review of Tencent AI Lab's key work in 2018. We wish you all a healthy and auspicious Spring Festival.

Industry Application: Technology for Good


Putting AI Technology into Practice


Tencent Intelligent Microscope


For Tencent Video, we provide super-resolution and video classification technology. We have also explored the deep understanding, editing and generation of video content. For example, a machine can analyze a video in depth, identify the people, objects and scenes in it, work out the relationships among them, and recognize the actions and events in temporal order, so that it can generate sentences expressing the rich semantic content of the video.


Frontiers: Deep Exploration

Defining the Next Generation of Intelligent Interaction - 3D Virtual Human


In addition, our deep reinforcement learning agent defeated the level-10 built-in AI in StarCraft II. In cooperation with Tsinghua University, we also won the first Chinese championship in the history of ViZDoom, the first-person shooter (FPS) AI competition.


Using Robots as a Carrier to Connect the Virtual and the Real



Open Source Collaboration

Besides publishing papers and other research results, we also share Tencent's accumulated technical capabilities (especially AI capabilities) with the entire industry through open-source code and data, hoping to promote the shared development and prosperity of the industry ecosystem.

In October, we also released a large-scale, high-quality Chinese word vector dataset containing more than 8 million Chinese words. It improves on previous datasets in coverage, freshness and accuracy.
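As a rough illustration of how word vectors of this kind are typically used, the sketch below parses a few lines in the common word2vec text layout (a header with the vocabulary size and dimension, then one word and its vector per line) and ranks words by cosine similarity. The file layout, the toy three-dimensional vectors and the helper names are assumptions made for illustration; the article does not describe the released file format.

```python
import math

# Toy stand-in for an embedding file in word2vec text layout (assumed format):
# first line "<vocab_size> <dim>", then "<word> <v1> ... <vdim>" on each line.
TOY_FILE = """4 3
猫 0.9 0.1 0.0
狗 0.8 0.2 0.1
汽车 0.0 0.9 0.4
卡车 0.1 0.8 0.5
"""

def load_vectors(text):
    """Parse word2vec-style text into a {word: [floats]} dictionary.
    Assumes entries contain no internal spaces."""
    lines = text.strip().splitlines()
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        word, values = parts[0], [float(x) for x in parts[1:]]
        assert len(values) == dim, "vector length must match the header"
        vectors[word] = values
    assert len(vectors) == vocab_size, "word count must match the header"
    return vectors

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(vectors, word, top_k=1):
    """Return the top_k words closest to `word` by cosine similarity."""
    query = vectors[word]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]
```

With the toy data, `most_similar(load_vectors(TOY_FILE), "猫")` picks the other animal, because its vector points in nearly the same direction.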

In November, we launched PocketFlow, an automated framework for compressing and accelerating deep learning models. It integrates a variety of model compression and acceleration algorithms and can use reinforcement learning to search automatically for suitable compression parameters. We hope this framework will lower the technical barrier to model compression and support the development of mobile AI applications.
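To make the idea of searching for compression parameters concrete, here is a minimal sketch of one generic compression technique, magnitude-based weight pruning, combined with a plain grid search over pruning ratios. It does not use or reproduce PocketFlow's API: the function names, the single linear unit and the grid search (standing in for the reinforcement-learning search) are all assumptions for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def output_error(weights, pruned, inputs):
    """Mean absolute change in a single linear unit's output after pruning."""
    total = 0.0
    for x in inputs:
        before = sum(w * v for w, v in zip(weights, x))
        after = sum(w * v for w, v in zip(pruned, x))
        total += abs(before - after)
    return total / len(inputs)

def search_sparsity(weights, inputs, tolerance,
                    candidates=(0.0, 0.25, 0.5, 0.75)):
    """Pick the largest candidate sparsity whose output error stays within
    `tolerance`. A grid search, used here in place of a learned search policy."""
    best = 0.0
    for s in candidates:
        pruned = prune_by_magnitude(weights, s)
        if output_error(weights, pruned, inputs) <= tolerance:
            best = max(best, s)
    return best

# Hypothetical weights of one linear unit and two calibration inputs.
weights = [0.01, -0.8, 0.02, 0.5]
inputs = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 1.0, -1.0]]
```

Here the two near-zero weights can be removed with little effect on the output, while pruning a third weight changes it substantially, so a tolerance of 0.05 selects a sparsity of 0.5.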

In university cooperation, we have conducted joint research with professors from world-renowned universities such as MIT, Oxford, Stanford, HKUST, Tsinghua and Harbin Institute of Technology. Through special research projects, visiting scholar programs, young scholar funds and joint laboratories, we have explored academic frontiers and rapidly applied the research to Tencent Cloud, the Tencent Open Platform and other businesses.


Basic Research: Pushing the Boundaries

Our basic research covers four main directions: machine learning, computer vision, speech processing and natural language processing. In 2018, we published more than 150 academic papers at top academic conferences such as NeurIPS, ICML, CVPR, ECCV, ACL, InterSpeech and ICASSP, ranking first among domestic enterprises.


In the future, we will continue to focus on frontier research topics, promote interdisciplinary, multimodal and cross-domain research, and keep exploring the boundaries of research with an open, cooperative and win-win attitude.

Machine Learning

The ability to learn is one of the core skills that distinguish intelligent machines from ordinary automated machines, and it is also a necessary step toward artificial general intelligence (AGI). Our research covers reinforcement learning, transfer learning, imitation learning, optimization algorithms, weakly supervised and semi-supervised learning, adversarial learning and multi-task learning.

We have explored automated machine learning (AutoML), one of the frontier directions in machine learning. For example, we propose a hyperparameter optimization algorithm based on transfer across data distributions [1]. This method uses distribution similarity to transfer the hyperparameter optimization results obtained on other data, which gives the optimization on new data a warm start. We have further developed the FastBO algorithm and found that it outperforms manual parameter tuning in many scenarios, such as healthcare and games.
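The warm-start idea can be sketched as follows. This is not the algorithm of [1] or FastBO: summarizing a dataset by its mean and standard deviation, the nearest-neighbor lookup and the small hill-climbing search are simplifying assumptions, and all names are hypothetical.

```python
import math
import statistics

def describe(dataset):
    """Summarize a dataset (a list of numbers) by mean and standard deviation.
    A deliberately crude proxy for its data distribution."""
    return (statistics.mean(dataset), statistics.pstdev(dataset))

def warm_start(new_data, history):
    """Return the best-known hyperparameter of the most similar past dataset.
    `history` is a list of (dataset, best_hyperparameter) pairs."""
    target = describe(new_data)
    _, best_param = min(
        history, key=lambda item: math.dist(describe(item[0]), target)
    )
    return best_param

def local_search(objective, start, step=0.1, rounds=20):
    """Tiny 1-D hill climbing from the warm-start point (lower is better)."""
    x = start
    for _ in range(rounds):
        x = min((x - step, x, x + step), key=objective)
    return x

# Hypothetical record of past tuning runs: (dataset, best hyperparameter).
history = [
    ([0.0, 1.0, 2.0], 0.9),
    ([100.0, 110.0, 120.0], 0.3),
]
new_data = [98.0, 105.0, 125.0]

start = warm_start(new_data, history)   # reuses the result of the similar dataset
tuned = local_search(lambda x: (x - 0.5) ** 2, start)
```

Starting the search from the hyperparameter of the most similar past dataset means the local search begins near a plausible optimum instead of at an arbitrary point.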

For multi-task problems, we propose a learning framework, L2MT [2], which can automatically discover an optimal multi-task learning model. We also propose a learning-to-transfer method, L2T [3], which can significantly reduce the computational cost and domain knowledge required for transfer learning.


L2MT framework

We also propose several improvements for reinforcement learning, such as a neural network with a meta-rule component that describes how to construct reinforcement learning policies from the environment and the task, which makes it possible to synthesize policies that adapt to different environments and tasks. We also use demonstrations to improve exploration in reinforcement learning (POfD) [5] and study fully decentralized multi-agent reinforcement learning with networked agents [6].

In computer security and public safety, we have developed algorithms for automatic feature learning, group classification and graph feature enhancement. They can identify and combat black-market users, crime-related groups and malicious users (with a marker coverage rate above 90%) and accurately identify users with credit risk, helping to prevent and control financial risk.

Computer Vision

Computer vision has very broad application prospects and is an indispensable part of important applications such as intelligent healthcare, autonomous driving, augmented reality and mobile robots. We are constantly looking for ways to give machines stronger visual capabilities so that they can understand the world in real time, robustly and accurately.

In 2018, we explored real-time 3D localization using data from cameras and other sensors, tracking and segmentation of objects in video using both traditional spatio-temporal modeling (MRF) and deep learning (CNN), and several new methods for video description generation. We also define a new task called video re-localization [4], which finds, within a long video, the segments that are semantically related to a given query video. We also propose TVNet [5], an end-to-end neural network for motion representation in video.
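The video re-localization task can be illustrated with a simple feature-matching baseline. The sketch below is not the model proposed in [4]: it assumes each frame has already been turned into a feature vector, averages the query clip's features, and slides a window of the same length over the long video to find the most similar segment. The function names and toy features are hypothetical.

```python
import math

def mean_vector(frames):
    """Average a list of per-frame feature vectors into one vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def relocalize(query_frames, video_frames):
    """Return (start, end) of the window in `video_frames` most similar to
    the query clip. Assumes the video is at least as long as the query."""
    window = len(query_frames)
    query = mean_vector(query_frames)
    best_start, best_score = 0, -1.0
    for start in range(len(video_frames) - window + 1):
        segment = mean_vector(video_frames[start:start + window])
        score = cosine(query, segment)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + window

# Toy 2-D frame features: frames 2 and 3 of the long video resemble the query.
video = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
query = [[0.1, 0.9], [0.0, 1.0]]
```

A learned model would replace both the hand-made features and the fixed-length window, but the input and output of the task have the same shape as in this baseline.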

In addition to helping machines understand the world, we are also exploring video generation. For example, we propose a method for automatically generating time-lapse videos [6], which shows possible dynamic changes by predicting subsequent image frames. We also explore the use of a multi-stage dynamic generative adversarial network (MD-GAN) [7] for this task.


MD-GAN Framework

Speech Processing

Our speech solutions have been used in the Tencent Tingting speaker, the Jiguang TV box and the Dingdang speaker. In 2018, we proposed several new methods and improvements and made progress in speech enhancement, speech separation, speech recognition, speech synthesis and other directions.

For voice wake-up, we propose a new model that addresses false wake-ups, wake-up in noisy environments, wake-up with fast speech and wake-up by children. It significantly improves the quality of keyword detection, performs well in noisy environments and significantly reduces the power consumption of the front end and the keyword detection module. We also propose a speaker (voiceprint) recognition framework based on Inception-ResNet [2], which learns more robust and discriminative embedding features.


Left: baseline keyword detection architecture. Right: text-dependent speech enhancement architecture.

Speech synthesis is an important technology for natural communication between machines and humans. Tencent has deep technical experience in speech synthesis and has developed new technology for end-to-end synthesis and accent synthesis. In 2018, Tencent AI Lab made further progress on tasks such as modifying intonation and prosody and transferring speech style.

Natural Language Processing

Tencent AI Lab conducts broad and focused research in natural language processing, covering text understanding, text generation, human-computer dialogue, machine translation and other directions.

The models we trained ranked first on several reading comprehension datasets, such as RACE (from CMU), ARC (Easy/Challenge) and OpenBookQA.

In neural machine translation, we address the low fidelity of translations by improving the multi-layer, multi-head self-attention mechanism used in current mainstream translation models and by proposing a fidelity-based training framework. We also propose a joint learning method for dropped pronouns in spoken language translation, and explore how to integrate external translation memory into neural translation models.

We have also released an AI-assisted translation product [5], designed as a tribute to human translation. It uses industry-leading interactive machine translation and assisted translation input, together with bilingual parallel data, to give users real-time intelligent translation assistance and help them complete translation tasks better and faster. As a future form of translation tool, the product has already been adopted in many translation classes at colleges and universities.

We study text and dialogue generation, and propose a reply generation model based on a reinforcement learning framework [6], which can automatically generate multiple different replies to the same input. Our cross-lingual neural belief tracking framework XL-NBT [7] has significant practical potential for cross-lingual dialogue systems (such as multilingual automated customer service). In addition, we have improved the conditional variational autoencoder to increase the diversity of automatic replies [8].


Reply Generation Model Based on Reinforcement Learning

It is worth mentioning that we have also been exploring the combination of Chinese classical culture and modern technology. During the 2018 Spring Festival, we launched Tencent AI Spring Festival couplets, which generate a couplet from two Chinese characters provided by the user. We also explored how to build a machine poet, and proposed a poetry generation method (CVAE-D) [9] based on an adversarial conditional variational autoencoder, which made good progress in thematic consistency and novelty of wording.

Looking Ahead

In the past three years, Tencent AI Lab has established laboratories in Shenzhen and Seattle, USA. At present, the team has more than 70 top AI scientists and 300 experienced engineers, focusing on four major research directions.


The road ahead is long and difficult. We will keep moving forward and use science and technology to brighten the light of humanity.
