EMNLP is one of the top conferences in natural language processing. Its full name is the Conference on Empirical Methods in Natural Language Processing; it is organized by SIGDAT, a special interest group of the Association for Computational Linguistics (ACL), and will be held in Brussels, Belgium from October 31 to November 4 this year.
This year marks Tencent AI Lab's second appearance at EMNLP, with a total of 16 papers accepted, covering research topics such as language understanding, language generation, and machine translation. A summary of each paper follows.
In addition, Tencent AI Lab has had many papers accepted at other top academic conferences this year, ranking among the leading domestic companies: NIPS (20 papers) and ICML (16 papers) in machine learning, CVPR (21 papers) and ECCV (19 papers) in computer vision, and Interspeech (8 papers) in speech.
1. QuaSE: Sequence editing under quantifiable guidance
QuaSE: Sequence Editing under Quantifiable Guidance
This paper was led by Tencent AI Lab in collaboration with the Chinese University of Hong Kong. It introduces the task of sequence editing under quantifiable guidance (QuaSE): editing an input sequence to produce an output sequence that attains a given numerical value quantifying a particular attribute, while preserving the input's main content. The input might be a sentence such as a review or an advertisement; for a review, the value could be the rating, and for an ad, the click-through rate. A main challenge of QuaSE is identifying the words relevant to the outcome and editing only those. The proposed framework models two latent factors, an outcome factor and a content factor, so that an input sentence can be conveniently edited to change its numerical outcome while preserving its content. The framework exploits pseudo-parallel sentence pairs by modeling their content similarity and outcome difference, which better disentangles the latent factors and allows the model to generate outputs that both satisfy the desired numerical outcome and retain the content. A dual reconstruction structure further improves generation quality by exploiting the couplings among the latent factors of pseudo-parallel pairs. For evaluation, the researchers prepared a dataset of Yelp review sentences, using the rating as the numerical outcome. The paper reports and discusses experimental results in depth to illustrate the characteristics of the framework.
2. Neural machine translation with deep representations
Exploiting Deep Representations for Neural Machine Translation
This paper was led by Tencent AI Lab in collaboration with Nanjing University. Neural machine translation systems typically consist of multi-layer encoders and decoders, which allows them to model complex functions and capture complex linguistic structures. In general, however, the translation process uses only the top layer of the encoder and decoder, missing the opportunity to exploit useful information in the other layers. In this work, the researchers propose exposing and transmitting all of these signals simultaneously through layer aggregation and a multi-layer attention mechanism. The paper also introduces an auxiliary regularization term to encourage different layers to capture different information. Experiments on the widely used WMT14 English-German and WMT17 Chinese-English translation data demonstrate the effectiveness and generality of the approach.
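The paper does not spell out its aggregation functions here; as a minimal sketch of the layer-aggregation idea, one could combine the hidden states of all encoder layers with learned softmax-normalized weights instead of taking only the top layer (function names and toy shapes below are illustrative, not the paper's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def aggregate_layers(layer_states, layer_logits):
    """Combine the hidden states of all encoder layers with learned
    softmax-normalized weights, instead of using only the top layer.

    layer_states: list of (seq_len, d_model) arrays, one per layer.
    layer_logits: (num_layers,) learnable scores, one per layer.
    """
    weights = softmax(layer_logits)
    return sum(w * h for w, h in zip(weights, layer_states))

# Toy example: 3 layers, 2 positions, d_model = 4.
rng = np.random.default_rng(0)
states = [rng.normal(size=(2, 4)) for _ in range(3)]
logits = np.zeros(3)  # equal weights before any training
agg = aggregate_layers(states, logits)
```

With zero (untrained) logits the weights are uniform, so the aggregate equals the plain mean of the layers; training would learn to favor the more informative layers.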
3. Modeling localness for self-attention networks
Modeling Localness for Self-Attention Networks
This paper was led by Tencent AI Lab in collaboration with the University of Macau. The self-attention model attends to all input elements and has proven able to capture global dependencies in many tasks. However, because this dependency information is captured through a weighted summation, the model may overlook relationships between adjacent elements. This paper builds a localness model into the self-attention network to strengthen its ability to learn local context. Specifically, the localness model is designed as a learnable Gaussian bias that represents the local scope to be enhanced. The resulting Gaussian bias is then used to modify the original attention distribution, yielding a locally enhanced weight distribution. In addition, the authors found that in a multi-layer self-attention network, lower layers tend to focus on smaller local regions, while higher layers attend more to global information. Therefore, to preserve the model's ability to capture long-distance dependencies while strengthening its modeling of local information, localness modeling is applied only to the lower layers. Quantitative and qualitative analyses on Chinese-English and English-Chinese translation tasks demonstrate the effectiveness and applicability of the proposed method.
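A rough illustration of how such a Gaussian bias can reshape attention weights follows. In the paper the center and window are predicted by the network; here they are passed in as given, and all names are chosen for this sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention_weights(scores, center, window):
    """Bias attention scores toward a local window.

    scores: (q_len, k_len) raw attention logits.
    center: (q_len,) central position for each query.
    window: (q_len,) local scope (window size) for each query.

    A Gaussian penalty -(j - center)^2 / (2 * (window/2)^2) is added to
    the logits before the softmax, so distant keys are down-weighted
    while nearby keys keep most of the probability mass.
    """
    k_len = scores.shape[1]
    positions = np.arange(k_len)
    sigma = window / 2.0
    bias = -((positions[None, :] - center[:, None]) ** 2) / (2.0 * sigma[:, None] ** 2)
    return softmax(scores + bias)
```

For uniform raw scores, the resulting distribution simply peaks at each query's predicted center, which is the intended local enhancement.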
4. Topic memory networks for short text classification
Topic Memory Networks for Short Text Classification
This paper was led by Tencent AI Lab in collaboration with the Chinese University of Hong Kong. Many classification models perform poorly on short texts, mainly because of the data sparsity short texts induce. To address this, the paper proposes a novel topic memory mechanism that encodes topic representations useful for classification, improving short-text classification performance. Prior work mainly focused on augmenting text features with external knowledge or with pre-trained topic models. In contrast, the proposed model jointly learns topic representations and text classification end to end within a memory network framework. Experimental results on four benchmark datasets show that the model not only outperforms state-of-the-art models on short text classification but also produces meaningful topics.
5. A hybrid approach to automatically constructing a Chinese spelling check corpus
A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
This paper was led by Tencent AI Lab in collaboration with Tsinghua University and Tencent SNG. Automatic Chinese spelling check is a challenging and meaningful task: it is used in the preprocessing stage of many natural language processing applications and also greatly facilitates everyday reading and writing. Data-driven approaches are very effective for Chinese spelling check but face a shortage of training data. This work proposes a method for automatically constructing a spelling check dataset, simulating spelling errors by generating visually similar and phonetically similar characters with OCR-based and ASR-based methods, respectively. Using the proposed method, the researchers constructed a large-scale dataset for training different automatic spelling check models. Experimental results on three standard test sets demonstrate the reasonableness and effectiveness of the automatic dataset construction method.
6. Generating classical Chinese poems with a conditional variational autoencoder and adversarial training
Generating Classical Chinese Poems via Conditional Variational Autoencoder and Adversarial Training
This paper is a Tencent Rhino-Bird collaboration project, completed with Peking University. Automatically composing poems that read smoothly and elegantly remains a difficult problem for computers. Although previous research has achieved remarkable results, automatically generated poetry still falls well short of poets' own work, especially in topical consistency and novelty of wording. In this paper, the researchers propose generating poems by combining a conditional variational autoencoder with adversarial training. Experimental results show significant improvements for the proposed model on both automatic metrics and human evaluation.
7. Iterative document representation learning for summarization with polishing
Iterative Document Representation Learning Towards Summarization with Polishing
This paper is a Tencent Rhino-Bird collaboration project, completed with Peking University. Motivated by the observation that humans read and digest a document multiple times when writing a summary, the paper proposes a summarization model with a polishing mechanism based on iterative document representation learning. Because current summarization techniques compute the document representation only once, most representations cannot reach a globally optimal result. The proposed method instead iteratively selects sentences, updates them, and refines the corresponding document representation. Experimental results on the CNN/DailyMail and DUC2002 datasets show that the proposed method surpasses previous best models in both automatic and human evaluation.
8. Variational autoregressive decoder for neural response generation
Variational Autoregressive Decoder for Neural Response Generation
Tencent AI Lab participated in this paper, which was completed in collaboration with Harbin Institute of Technology. By combining the advantages of probabilistic graphical models and neural networks, the conditional variational autoencoder (CVAE) has shown excellent performance in many natural language processing applications, such as open-domain dialogue response generation. However, conventional CVAE models typically generate a response from a single latent variable, which makes it hard to accurately model the diversity of possible responses. To address this, the paper introduces serialized latent variables into the dialogue generation process. The researchers use a recurrent neural network (RNN) to parameterize the approximate posterior distribution; a backward RNN helps the model better capture long-distance dependencies during text generation. To train the proposed model more effectively, the researchers add an auxiliary objective of predicting the subsequent bag of words. Experiments on the OpenSubtitles and Reddit dialogue generation datasets show that the proposed model significantly improves the relevance and diversity of generated responses.
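The core difference from a standard CVAE is that a latent variable is drawn at every decoding step, with each step's prior conditioned on the previous latent. A heavily simplified, hypothetical sketch of such a serialized latent chain (a real model would learn the prior/posterior networks and condition them on the RNN state and the input utterance):

```python
import numpy as np

def autoregressive_latent_chain(num_steps, latent_dim, seed=0):
    """Sketch of serialized latent variables: instead of one latent
    vector for the whole response, a latent z_t is drawn at every
    decoding step, with its (toy, linear) prior mean conditioned on the
    previous latent. All parameters here are random stand-ins.
    """
    rng = np.random.default_rng(seed)
    transition = rng.normal(scale=0.1, size=(latent_dim, latent_dim))
    z = np.zeros(latent_dim)
    chain = []
    for _ in range(num_steps):
        mean = transition @ z                    # prior mean depends on z_{t-1}
        z = mean + rng.normal(size=latent_dim)   # reparameterized sample
        chain.append(z)
    return chain
```

Each z_t would then condition the decoder at step t, letting diversity vary across the response rather than being fixed by a single draw.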
9. Multi-head attention with disagreement regularization
Multi-Head Attention with Disagreement Regularization
This paper was led by Tencent AI Lab in collaboration with the Chinese University of Hong Kong and the University of Macau. The multi-head attention mechanism is popular among researchers for its ability to learn different representations in different subspaces. In this work, the paper introduces disagreement regularization to explicitly encourage diversity among the attention heads. Specifically, it proposes three disagreement terms that encourage each attention head to differ from the others in, respectively, the input subspace, the attention alignment matrix, and the output representation. Experiments on the widely used WMT14 English-German and WMT17 Chinese-English translation data demonstrate the effectiveness and generality of the method.
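As an illustration of the third variant, a disagreement term on the output representations can be written as the negative average pairwise cosine similarity between heads; adding it to the training loss pushes heads apart. A minimal sketch (names and exact form chosen for this example, not taken from the paper):

```python
import numpy as np

def output_disagreement(head_outputs, eps=1e-9):
    """Disagreement regularization on the output representations:
    the negative average cosine similarity between every pair of heads.
    Adding this term to the loss encourages heads to differ.

    head_outputs: (num_heads, d) array, one output vector per head.
    """
    norms = np.linalg.norm(head_outputs, axis=1, keepdims=True) + eps
    unit = head_outputs / norms
    sim = unit @ unit.T                    # pairwise cosine similarities
    h = head_outputs.shape[0]
    # average over all ordered pairs, excluding self-similarity
    avg_sim = (sim.sum() - np.trace(sim)) / (h * (h - 1))
    return -avg_sim                        # more negative = more diverse
```

Identical heads give a value near -1 (maximally penalized when minimizing loss plus this term would be reversed: here the term is already negated so identical heads yield the largest regularization value in magnitude toward -1, and orthogonal heads yield 0).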
10. Jointly translating and predicting dropped pronouns with a shared reconstruction mechanism
Learning to Jointly Translate and Predict Dropped Pronouns with a Shared Reconstruction Mechanism
This paper was led by Tencent AI Lab in collaboration with Dublin City University. In pro-drop languages (e.g., Chinese), pronouns are frequently omitted, which poses significant challenges to the completeness of machine translation output. Recently, Wang et al. (2018) (Translating Pro-Drop Languages with Reconstruction Models) proposed a reconstruction mechanism to alleviate the dropped-pronoun problem in neural machine translation. This paper strengthens the original reconstruction model in two ways. First, it proposes a shared reconstructor that makes full use of both encoder and decoder representations. Second, to avoid error propagation from an external dropped-pronoun annotation system, it proposes an end-to-end model that jointly learns dropped-pronoun prediction and translation.
11. A statistical re-weighting method for reducing generic responses in neural conversation models
Towards Less Generic Responses in Neural Conversation Models: A Statistical Re-weighting Method
This paper was led by Tencent AI Lab in collaboration with Soochow University and Wuhan University. Sequence-to-sequence (Seq2Seq) neural generation models have performed well on short-text dialogue generation. However, these models tend to produce generic, dull responses, which greatly harms the conversational experience. The researchers observed that in a dialogue task, each input utterance usually corresponds to multiple reasonable responses, i.e., a one-to-n relationship (or m-to-n from the perspective of the whole corpus). In this case, under the standard Seq2Seq objective, the model parameters are easily dominated by generic (high-frequency) response patterns. Inspired by this, the paper proposes a statistical re-weighting method that assigns different weights to the multiple acceptable responses of each input and trains a standard neural generation model with the re-weighted objective. Experimental results on a large Chinese dialogue corpus show that the proposed method significantly reduces generic responses while increasing the acceptance rate of generated responses.
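The re-weighting idea amounts to replacing the uniform per-example loss with a weighted one. A minimal sketch, treating the corpus-statistics weighting function as a black box (the weighting scheme itself is the paper's contribution; everything below is illustrative):

```python
import math

def reweighted_nll(batch, response_weight):
    """Weighted negative log-likelihood over a batch of
    (input, response, token_probs) examples.

    `response_weight(response)` returns a weight estimated from corpus
    statistics, down-weighting generic (high-frequency) responses so
    they no longer dominate the gradient. `token_probs` stands in for
    the model's per-token probabilities of the reference response.
    """
    total, norm = 0.0, 0.0
    for _, response, token_probs in batch:
        w = response_weight(response)
        nll = -sum(math.log(p) for p in token_probs)
        total += w * nll
        norm += w
    return total / norm
```

Passing `response_weight = lambda r: 1.0` recovers the standard (unweighted) mean NLL, which makes the effect of any non-trivial weighting easy to compare.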
12. Translating a math word problem into an expression tree
Translating a Math Word Problem to an Expression Tree
This paper was led by Tencent AI Lab in collaboration with the University of Electronic Science and Technology of China and the Chinese University of Hong Kong. Sequence-to-sequence models have been successful at automatically solving math word problems. Although this approach is simple and effective, it has a drawback: a math problem can be correctly solved by multiple different equations, and this non-deterministic mapping impairs maximum likelihood estimation. This paper proposes an equation normalization method that exploits the uniqueness of the expression tree. It also analyzes the performance of the three most popular sequence-to-sequence models on the solving task; finding that each model has its own strengths and weaknesses, the paper further proposes an ensemble model to combine their advantages. Experiments on the Math23K dataset show that the ensemble model with equation normalization significantly outperforms state-of-the-art methods.
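To make the normalization idea concrete: equivalent surface equations such as "b + a" and "a + b" should map to one canonical target. A simplified illustration, sorting the operands of commutative operators in the tree (the paper's actual procedure is more involved; names here are for this sketch only):

```python
def serialize(tree):
    """Postfix serialization of a (possibly nested) expression tree.
    A tree is a number/variable string, or a tuple (op, left, right)."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return serialize(left) + " " + serialize(right) + " " + op

def normalize(tree):
    """Recursively normalize an expression tree so that different but
    equivalent equations map to one canonical form: operands of the
    commutative operators '+' and '*' are sorted by their serialized
    form; non-commutative operators are left untouched.
    """
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    left, right = normalize(left), normalize(right)
    if op in "+*" and serialize(left) > serialize(right):
        left, right = right, left
    return (op, left, right)
```

After normalization, the decoder sees a single target sequence per problem instead of several interchangeable ones, which is exactly what maximum likelihood training needs.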
13. Estimating marginal probabilities of n-grams under recurrent neural network language models
Estimating Marginal Probabilities of n-grams for Recurrent Neural Language Models
This paper is a Tencent AI Lab Rhino-Bird Gift Fund project, completed in collaboration with Northwestern University. Recurrent neural network language models (RNNLMs) are the mainstream approach to statistical language modeling today. However, RNNLMs only assign probabilities to complete text sequences, while some practical applications require the probability of a phrase independent of its context. In this paper, the researchers explore how to compute marginal probabilities with RNNLMs: how the model can estimate the probability of a short text in the absence of its preceding context. The paper proposes a modified RNNLM training procedure that lets the model compute marginal probabilities more accurately. Experimental results show that the proposed technique outperforms baseline systems such as the conventional RNNLM and importance sampling. The paper also shows how marginal probabilities can be used to improve the RNNLM itself, namely by training the model's marginal probabilities to be close to the n-gram probabilities estimated from a large dataset.
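The underlying quantity is P(phrase) = Σ_h P(h) · P(phrase | h), a sum over all possible preceding contexts h, which in practice must be approximated. A toy Monte-Carlo sketch of that definition (both callables are hypothetical stand-ins for a trained RNNLM, not an API from the paper):

```python
import math
import random

def marginal_log_prob(lm_step, sample_context, phrase, num_samples=100, seed=0):
    """Monte-Carlo estimate of the marginal probability of a phrase
    under a model that only defines P(token | full history):
        P(phrase) ~= mean over sampled contexts h of P(phrase | h).

    lm_step(history, token): returns P(token | history).
    sample_context(): draws a context from the model's own distribution.
    """
    random.seed(seed)
    total = 0.0
    for _ in range(num_samples):
        h = sample_context()
        log_p = 0.0
        for i, tok in enumerate(phrase):
            log_p += math.log(lm_step(h + list(phrase[:i]), tok))
        total += math.exp(log_p)
    return math.log(total / num_samples)
```

This sampling estimator is expensive and high-variance, which is precisely why the paper trains the model to produce accurate marginals directly rather than estimating them at query time.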
14. Agreement/disagreement inference in online debates with a hybrid attention mechanism
Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates
Tencent AI Lab participated in this paper, which was completed in collaboration with Harbin Institute of Technology. Inferring agreement/disagreement relations between texts in online debates is one of the fundamental tasks of argumentation mining. Expressions of agreement or disagreement usually depend both on argumentative expressions in the text and on interactions between debate participants, but past work has often lacked the ability to model these two factors jointly. To address this, the paper proposes a neural hybrid attention mechanism that combines self-attention and cross-attention to locate salient text segments using both the textual context and the interaction information between users. Experimental results on three online debate datasets show that the proposed model outperforms existing state-of-the-art models.
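Schematically, the hybrid mechanism attends within one post (self-attention) and across to the post it responds to (cross-attention), then fuses the two views. A bare unparameterized dot-product sketch, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(text_a, text_b):
    """Combine self-attention within one post with cross-attention to
    the replied-to post, then fuse the two context vectors by simple
    concatenation (a schematic sketch only).

    text_a, text_b: (len_a, d) and (len_b, d) token representations.
    Returns (len_a, 2 * d) fused representations for text_a.
    """
    self_ctx = softmax(text_a @ text_a.T) @ text_a    # within-post context
    cross_ctx = softmax(text_a @ text_b.T) @ text_b   # between-post context
    return np.concatenate([self_ctx, cross_ctx], axis=-1)
```

A classifier on top of the fused representation could then use both the argumentative content and the interaction signal, which is the joint modeling the paper argues for.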
15. XL-NBT: A cross-lingual neural belief tracking framework
XL-NBT: A Cross-Lingual Neural Belief Tracking Framework
This paper is a Tencent AI Lab Rhino-Bird Gift Fund project, completed in collaboration with The Ohio State University and the University of California, Santa Barbara. Cross-lingual dialogue systems are of great practical significance in applications such as automatic customer service. Existing approaches to multilingual support usually require separate annotation for each language. To avoid this large annotation cost, and as a first step toward the ultimate goal of a universal multilingual dialogue system, this paper studies a cross-lingual neural belief tracking framework that requires no annotation in the new language. Specifically, it assumes a trained belief tracker in the source language (e.g., English) and no annotated training data for the target language (e.g., German or Italian). The source-language tracker serves as a teacher network and the target-language tracker as a student network; the structure of the belief tracker is decoupled, and parallel corpora are used to assist transfer learning. The paper discusses two different types of parallel corpora in detail, which lead to two different transfer learning strategies. In experiments with English as the source language and German and Italian as the target languages, the results verify the effectiveness of the proposed method.
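The teacher-student transfer at the heart of this setup is typically trained with a soft-target cross-entropy: the English teacher's slot-value distribution supervises the target-language student on utterances aligned through the parallel corpus. A generic distillation sketch (not the paper's exact objective):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Cross-entropy of the student's slot-value distribution against
    the teacher's soft targets: the basic teacher-student objective.
    In the cross-lingual setting, teacher_probs come from the
    source-language tracker on an utterance parallel to the one that
    produced student_logits.
    """
    student_probs = softmax(student_logits)
    return -float(np.sum(teacher_probs * np.log(student_probs + eps)))
```

When the student's distribution matches the teacher's exactly, the loss bottoms out at the teacher's entropy; minimizing it pulls the target-language tracker toward the source-language tracker's behavior without any target-language labels.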
16. Temporally grounding natural language sentences in video
Temporally Grounding Natural Sentence in Video
This paper was led by Tencent AI Lab in collaboration with the National University of Singapore. It describes an effective and efficient method for localizing, in a long untrimmed video, the segment corresponding to a natural language sentence. Specifically, the paper proposes a novel Temporal GroundNet (TGN) that captures the evolving fine-grained frame-by-word interactions between the video and the sentence. Based on these mined interactions, TGN scores a set of candidate segments at each frame and finally localizes the video segment corresponding to the sentence. Unlike traditional methods that process overlapping segments separately in a sliding-window fashion, TGN takes historical information into account and produces the final localization after a single pass over the video. The researchers extensively evaluated TGN on three public datasets. Experiments show that TGN significantly improves over the prior state of the art; further comparative and speed tests demonstrate TGN's effectiveness and efficiency.