Cerebras, the company best known for making the CS-2 Wafer Scale Engine, the world's largest accelerator chip, has announced that it has used this “giant core” to take an important step forward in AI training: it has trained the largest NLP (natural language processing) AI model ever trained on a single chip.
The model has 2 billion parameters and was trained on the CS-2 chip.
The accelerator chip, the world's largest, is etched from a single square wafer on a 7 nm process.
Hundreds of times the size of mainstream chips and drawing 15 kW of power, it integrates 2.6 trillion 7 nm transistors and packs 850,000 cores and 40 GB of memory.
Figure 1 CS-2 Wafer Scale Engine chip
A new record for training a large AI model on a single chip
The development of NLP models is an important field in artificial intelligence. With an NLP model, an AI can “understand” the meaning of text and act on it. OpenAI's DALL-E is a typical example: it converts the text a user enters into image output.
For example, when the user types “Avocado Shaped Armchair”, the AI automatically generates several images corresponding to the phrase.
Figure: Images generated after the AI receives the prompt “Avocado Shaped Armchair”
More than that, the model enables AI to understand complex things like species, geometry, and historical eras.
However, none of this is easy to achieve, because developing NLP models has traditionally carried high computational costs and a high technical threshold.
In fact, when it comes to numbers alone, Cerebras' model, with its two billion parameters, looks rather tame by comparison with its peers.
The aforementioned DALL-E model has 12 billion parameters, while one of the largest models, Gopher, which DeepMind launched late last year, has 280 billion parameters.
But beyond the staggering numbers, Cerebras' achievement contains one big breakthrough: it makes NLP models easier to develop.
How does the “giant core” beat the GPU?
The traditional process of developing an NLP model requires developers to split a large NLP model into functional parts and spread their workloads over hundreds or thousands of graphics processing units.
Hundreds or thousands of GPUs mean huge costs for vendors.
Technical difficulties plague vendors as well.
Sharding a model is a custom job: each neural network, the specifications of each GPU, and the network that connects (or interconnects) them are unique, so a sharding plan is not portable across systems.
Vendors must work through all these factors before the first training run.
The work is extremely complicated and sometimes takes months to complete.
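To make the portability problem concrete, here is a minimal sketch of one simple way to shard a model: greedily packing consecutive layers into stages, one stage per device. The layer sizes, device capacities, and the `partition_layers` helper are all hypothetical and chosen only for illustration; real sharding also has to account for activations, interconnect bandwidth, and load balance.

```python
def partition_layers(layer_sizes_gb, device_memory_gb):
    """Greedily pack consecutive layers into stages, one stage per device.

    Hypothetical helper for illustration; not taken from any vendor toolkit.
    """
    stages = []
    current, used = [], 0.0
    for index, size in enumerate(layer_sizes_gb):
        if size > device_memory_gb:
            raise ValueError(f"layer {index} ({size} GB) does not fit on one device")
        # Start a new stage when the next layer would overflow this device.
        if used + size > device_memory_gb:
            stages.append(current)
            current, used = [], 0.0
        current.append(index)
        used += size
    if current:
        stages.append(current)
    return stages


# Assumed model: eight layers with made-up memory footprints in GB.
layers = [6.0, 6.0, 6.0, 4.0, 4.0, 8.0, 8.0, 2.0]

plan_16 = partition_layers(layers, device_memory_gb=16.0)
plan_24 = partition_layers(layers, device_memory_gb=24.0)

print("16 GB devices:", plan_16)
print("24 GB devices:", plan_24)
```

The same model needs four devices in one case and two in the other, with different layer boundaries. A plan worked out for one cluster therefore has to be redone for another, which is the kind of rework the article describes.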
Cerebras says this is one of the most painful aspects of NLP model training. Very few companies have the resources and expertise needed to develop NLP models. For other companies in the AI industry, NLP training is too expensive, too time-consuming, and out of reach.
But if a single chip can support a 2 billion parameter model, there is no need to spread the training workload across a large number of GPUs. This saves vendors the training costs and the associated hardware and scaling requirements of thousands of GPUs, as well as the pain of sharding models and distributing their workloads across those GPUs.
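A rough back-of-the-envelope check shows why a model of this size is plausible on one chip. The parameter count and the 40 GB memory figure come from the article; the bytes-per-parameter values are assumptions (2 bytes for 16-bit weights, and roughly 16 bytes per parameter as a common rule of thumb for weights, gradients, and Adam optimizer state), not Cerebras figures. Activations are ignored.

```python
PARAMETERS = 2_000_000_000   # model size quoted in the article
CHIP_MEMORY_GB = 40          # chip memory quoted in the article

# Assumed bytes per parameter; illustrative values, not Cerebras figures.
BYTES_WEIGHTS_ONLY = 2       # 16-bit weights
BYTES_TRAINING_STATE = 16    # weights + gradients + Adam state (rule of thumb)


def footprint_gb(parameters, bytes_per_parameter):
    """Memory footprint in decimal gigabytes."""
    return parameters * bytes_per_parameter / 1e9


weights_gb = footprint_gb(PARAMETERS, BYTES_WEIGHTS_ONLY)
training_gb = footprint_gb(PARAMETERS, BYTES_TRAINING_STATE)

print(f"weights only:   {weights_gb:.0f} GB")
print(f"training state: {training_gb:.0f} GB")
print(f"fits in {CHIP_MEMORY_GB} GB: {training_gb <= CHIP_MEMORY_GB}")
```

Under these assumptions the full training state stays below the quoted memory capacity, so no sharding across devices is required.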
Cerebras is also not obsessed with numbers alone. The number of parameters is not the only criterion for evaluating a model.
Rather than hoping that models born on the “giant core” will be more “hard-working”, Cerebras would prefer that they be “smart”.
Cerebras has been able to achieve explosive growth in the number of parameters because of its weight streaming technology. This technique decouples compute from memory usage and allows memory to be expanded to store however many parameters the AI workload adds.
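The general idea of keeping parameters in memory that is separate from the compute engine can be sketched with a toy example. This is not Cerebras' implementation or API; the `ExternalWeightStore` class and the tiny layers below are hypothetical, and the point is only that the compute loop holds one layer's weights at a time while the store can grow independently.

```python
class ExternalWeightStore:
    """Stands in for parameter memory that lives outside the compute engine."""

    def __init__(self):
        self._layers = []

    def add_layer(self, weights):
        self._layers.append(weights)

    def total_parameters(self):
        return sum(len(row) for layer in self._layers for row in layer)

    def stream(self):
        # Hand over one layer's weights at a time.
        for weights in self._layers:
            yield weights


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def forward(store, inputs):
    """Run a forward pass while tracking the peak number of resident weights."""
    activations = inputs
    peak_resident = 0
    for weights in store.stream():
        resident = sum(len(row) for row in weights)
        peak_resident = max(peak_resident, resident)
        activations = matvec(weights, activations)
    return activations, peak_resident


store = ExternalWeightStore()
store.add_layer([[1, 0], [0, 1], [1, 1]])   # maps 2 values to 3
store.add_layer([[1, 1, 1], [0, 1, 0]])     # maps 3 values to 2
store.add_layer([[2, 0], [0, 2]])           # maps 2 values to 2

outputs, peak = forward(store, [1, 2])
print("outputs:", outputs)
print("total parameters:", store.total_parameters(), "peak resident:", peak)
```

Adding more layers to the store raises the total parameter count but not the peak number of weights the compute loop holds at once, which is the sense in which compute and parameter memory are decoupled.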
Because of this breakthrough, the time needed to set up a model has been cut from months to minutes, and developers can switch between models such as GPT-J and GPT-Neo with just a few keystrokes. This makes NLP development much easier.
This brings a new dimension to NLP.
As Dan Olds, Chief Research Officer of Intersect360 Research, commented on Cerebras' achievement: “Cerebras' ability to bring large language models to the masses in a cost-effective and easily accessible way opens up an exciting new era for artificial intelligence.”