Machine Heart
Today, Baidu upgraded ERNIE and released ERNIE 2.0, a continual-learning framework for semantic understanding, along with a pre-training model built on this framework. The model surpassed BERT and XLNet on a total of 16 Chinese and English tasks, achieving SOTA results.
Over the past two years, unsupervised pre-training techniques represented by BERT and XLNet have driven breakthroughs in many natural language processing tasks. Unsupervised pre-training on large-scale data has become critical to the field of natural language processing.
Baidu observed that previous work mainly used word- or sentence-level co-occurrence signals to construct language-modeling tasks for pre-training. For example, BERT is pre-trained with a masked language model and a next sentence prediction task, while XLNet builds a permutation language model and pre-trains it autoregressively.
However, beyond language co-occurrence information, corpora contain other valuable signals: lexical, syntactic, and semantic information. Examples include conceptual knowledge about words such as names of people, places, and institutions; structural knowledge such as ordering and distance relations between sentences; and semantic knowledge such as textual similarity and discourse logic. So if a model keeps learning tasks of these kinds, can its performance improve further? This is what ERNIE 2.0 sets out to explore.

ERNIE 2.0: the next-generation upgrade

Machine Heart previously covered Baidu's open-sourcing of ERNIE 1.0. Today, Baidu also open-sourced the ERNIE 2.0 fine-tuning code and English pre-training model. So what has been upgraded in ERNIE 2.0 compared to 1.0?
ERNIE 2.0 open-source address: https://github.com/PaddlePaddle/ERNIE
Of course, these are only the most visible updates; many core ideas and tuning techniques are hidden inside the model. Let's take a look at the main ideas and concrete structure of ERNIE 2.0.

What is ERNIE 2.0?

As mentioned earlier, text carries a great deal of valuable information, and to exploit it Baidu proposes ERNIE 2.0, a continual-learning framework for semantic understanding.
The framework supports incrementally introducing custom pre-training tasks from different angles to capture lexical, syntactic, semantic, and other information in the corpus. These tasks train and update the model through multi-task learning. Whenever a new task is introduced, the framework can learn it without forgetting previously learned information.
ERNIE 2.0 paper address: https://arxiv.org/pdf/1907.12412v1.pdf
The ERNIE framework supports introducing custom tasks at any time; the tasks share the same encoding network and are trained jointly through multi-task learning. This multi-task approach lets the lexical, syntactic, and semantic information encoded by different tasks be learned together.
In addition, when new tasks arrive, the ERNIE 2.0 framework can incrementally learn distributed representations on top of previously pre-trained weights. As Figure 1 from the original paper shows below, the ERNIE 2.0 framework is built on the pre-training/fine-tuning paradigm that is now very popular in NLP.
ERNIE 2.0 differs from classic pre-training methods such as BERT or XLNet in that pre-training is not limited to a small number of tasks; instead, a large number of pre-training tasks are introduced to help the model efficiently learn lexical, syntactic, and semantic representations.
Moreover, and importantly, the ERNIE 2.0 framework continually updates the pre-training model through multi-task learning, which is what "continual pre-training" means. For each fine-tuning run, ERNIE first initializes the model with the pre-trained weights and then fine-tunes it on data for the specific task.
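The pre-train/fine-tune flow described above can be sketched in a few lines. This is a toy illustration only: `PretrainedEncoder`, `fine_tune`, and the dummy update rule are hypothetical stand-ins, not Baidu's actual API.

```python
import copy

class PretrainedEncoder:
    """Toy stand-in for a shared pre-trained encoder (hypothetical)."""
    def __init__(self):
        # weights learned during (continual) pre-training
        self.weights = [0.1, 0.2]

def fine_tune(pretrained, task_data, lr=0.01):
    # Step 1: initialize from the pre-trained weights.
    model = copy.deepcopy(pretrained)
    # Step 2: update on task-specific data (dummy update rule
    # standing in for gradient descent on the task loss).
    for example in task_data:
        model.weights = [w + lr * example for w in model.weights]
    return model

base = PretrainedEncoder()
sentiment_model = fine_tune(base, task_data=[1.0, -1.0, 1.0])
qa_model = fine_tune(base, task_data=[0.0, 2.0])
# Both task models start from the same pre-trained weights;
# the base encoder itself is left untouched.
```

The point of the sketch is that every downstream task gets its own copy of the shared pre-trained weights, so one fine-tuning run never disturbs another.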
Figure 1: The ERNIE 2.0 framework. ERNIE 2.0 learns different tasks sequentially and adapts to a variety of new natural language understanding tasks through fine-tuning.
Note that the continual pre-training process can be divided into two steps: constructing unsupervised pre-training tasks, and incrementally updating the ERNIE model through multi-task learning. The tasks form a sequence, so the model can retain previously learned knowledge while learning each new task.
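A minimal sketch of this sequential multi-task scheme follows. The task names and the "training step" are hypothetical placeholders; the part that mirrors ERNIE 2.0 is the loop structure: whenever a new task is introduced, the shared encoder keeps training on all tasks seen so far, rather than only on the newest one.

```python
# Hypothetical sketch of continual multi-task pre-training.
def train_step(shared_state, task_name):
    # Placeholder for one optimization step on this task's loss;
    # here we just count how often each task is trained.
    shared_state[task_name] = shared_state.get(task_name, 0) + 1

def continual_pretrain(task_sequence, rounds_per_stage=3):
    shared_state = {}   # stands in for the shared encoder parameters
    seen_tasks = []
    for new_task in task_sequence:
        seen_tasks.append(new_task)
        # Key idea: train on ALL tasks seen so far, so earlier
        # knowledge is refreshed instead of forgotten.
        for _ in range(rounds_per_stage):
            for task in seen_tasks:
                train_step(shared_state, task)
    return shared_state

state = continual_pretrain(
    ["knowledge_masking", "sentence_reordering", "ir_relevance"])
```

Earlier tasks naturally accumulate more updates than later ones, since they participate in every subsequent stage.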
Figure 2 below shows the continual pre-training architecture. It contains a series of shared text-encoding layers that encode contextual information; the encoder can be built from a recurrent neural network or a Transformer, and its parameters are updated across all pre-training tasks.
Figure 2: Architecture for multitasking pre-training in ERNIE 2.0.
Finally, to verify the effectiveness of this pre-training approach, Baidu researchers built pre-training models from a series of unsupervised natural language processing tasks. As the concrete ERNIE 2.0 model structure in Figure 3 below shows, the model includes three types of pre-training tasks: word-aware tasks teach the model to capture lexical-level information, structure-aware tasks teach it to capture syntactic-level information, and semantic-aware tasks are responsible for providing semantic information.
Figure 3: The specific structure of the ERNIE 2.0 model, which can be divided into three categories.
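To make the word-aware category concrete, here is a simplified illustration of span-level masking, in the spirit of ERNIE's knowledge masking: whole named entities or phrases are masked rather than isolated subwords, forcing the model to use broader context to recover them. The function name and the example spans are illustrative, not ERNIE's actual implementation.

```python
def knowledge_mask(tokens, entity_spans, mask_token="[MASK]"):
    """Mask whole entity/phrase spans (given as [start, end) pairs)
    instead of random single tokens -- a toy word-aware task."""
    masked = list(tokens)
    for start, end in entity_spans:
        for i in range(start, end):
            masked[i] = mask_token
    return masked

tokens = ["Harry", "Potter", "was", "written", "by", "J.", "K.", "Rowling"]
spans = [(0, 2), (5, 8)]   # "Harry Potter", "J. K. Rowling"
print(knowledge_mask(tokens, spans))
# → ['[MASK]', '[MASK]', 'was', 'written', 'by', '[MASK]', '[MASK]', '[MASK]']
```

Predicting the full span "Harry Potter" requires entity-level knowledge, whereas a single-token mask could often be guessed from the adjacent subword alone.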
It is worth noting that, compared with models such as BERT, ERNIE 2.0 adds a task embedding. This embedding is unique to each task, so the model can clearly tell which pre-training task it is currently learning, including during fine-tuning. Thus the ERNIE 2.0 pre-training model built on this framework not only achieves SOTA results, but also offers developers a recipe for customizing their own NLP models.
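A sketch of how such a task embedding slots into the input layer, assuming (as in BERT-style models) that the input representation is a sum of per-token embeddings; the dimensions and lookup tables here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, N_TASKS, DIM = 100, 16, 7, 8

# Lookup tables; in a real model these are learned parameters.
token_emb = rng.normal(size=(VOCAB, DIM))
pos_emb   = rng.normal(size=(MAX_LEN, DIM))
seg_emb   = rng.normal(size=(2, DIM))
task_emb  = rng.normal(size=(N_TASKS, DIM))  # one vector per pre-training task

def embed(input_ids, segment_ids, task_id):
    n = len(input_ids)
    # Input representation = token + position + segment + task embeddings;
    # the task embedding is the ERNIE 2.0 addition over BERT's three terms.
    return (token_emb[input_ids]
            + pos_emb[:n]
            + seg_emb[segment_ids]
            + task_emb[task_id])

x = embed([5, 9, 23], [0, 0, 1], task_id=3)
```

Because the task vector is added to every position, the same token sequence yields a different input representation under different tasks, which is how the shared encoder can tell the tasks apart.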
How well does ERNIE 2.0 perform?

Baidu researchers compared ERNIE 2.0 against the current best pre-training models on Chinese and English tasks. On English, ERNIE 2.0 beat BERT and XLNet on seven tasks of the GLUE natural language understanding benchmark. On Chinese, ERNIE 2.0 surpassed BERT on nine different datasets, spanning reading comprehension, sentiment analysis, and question answering, and set new SOTA results.
To facilitate comparison with BERT, Baidu researchers used the same Transformer model settings as BERT. The ERNIE 2.0 base model was trained on 48 NVIDIA V100 GPUs and the large model on 64 NVIDIA V100 GPUs. ERNIE 2.0 is implemented in PaddlePaddle, Baidu's deep learning framework.

Data

The English training data comes from Wikipedia and BookCorpus, with some additional data from Reddit; Baidu also used the Discovery data as discourse-relation data. The Chinese data includes encyclopedia, news, dialogue, information retrieval, and discourse-relation data from the Baidu search engine. The detailed data statistics are shown in the following table:
Performance of ERNIE 2.0 on English tasks

The performance of the ERNIE 2.0 model on the GLUE benchmark is shown in the following table. The ERNIE 2.0_BASE model outperformed BERT_BASE on all 10 tasks, scoring 80.6. ERNIE 2.0_LARGE outperformed BERT_LARGE and XLNet_LARGE on every task except MNLI-m, and exceeded BERT_LARGE on all task test sets with a score of 83.6, a 3.1% improvement over the previous SOTA model BERT_LARGE.
Table 1: Results on GLUE. Development-set results are the median of five runs; test-set results were scored by the GLUE evaluation service.
Performance of ERNIE 2.0 on Chinese tasks

The researchers ran experiments on nine Chinese NLP tasks, including machine reading comprehension, named entity recognition, natural language inference, semantic similarity, sentiment analysis, and question answering. So how does it do in practice? The following table shows the performance of ERNIE 2.0 and other models on these Chinese tasks.
Table 2: Results on 9 common Chinese NLP tasks. Model results are the median of five runs; boldface indicates the SOTA result.
ERNIE 2.0 outperformed BERT_BASE on all nine tasks, and ERNIE 2.0_LARGE achieved optimal performance on these Chinese tasks, creating new SOTA results.