Half a year later, MLPerf has released its latest results, MLPerf Inference v1.0. Version 1.0 introduces new power-measurement techniques, tools and metrics to complement the performance benchmarks. The new metrics make it easier to compare systems' energy consumption, performance and power draw.
In v1.0, the data center (cloud) inference benchmark still covers workloads such as recommendation, natural language processing, speech recognition and medical imaging, while the edge AI inference benchmark does not include recommendation.
MLPerf Inference v1.0
All major OEMs submitted MLPerf results. Among them, NVIDIA, which holds a dominant position in AI, was the only company this round to submit results for every MLPerf benchmark from the data center to the edge, and it set new records with the A100 GPU.
Moreover, more than half of the systems that submitted results were built on NVIDIA's AI platform.
However, few startups submitted inference benchmarks for their AI chips.
Top AI inference performance improved by up to 45% in half a year
As Lei Feng Net reported when the MLPerf Inference v0.7 results were released, the A100 Tensor Core GPU that NVIDIA launched in May last year delivered 237 times the data center inference performance of the most advanced Intel CPU.
After six months of optimization, NVIDIA further improved performance on the DLRM recommendation model, the RNN-T speech recognition model and the 3D U-Net medical imaging model by up to 45%, widening the performance gap with CPUs to 314 times.
Architecturally, the GPU's advantage for inference is not obvious, but NVIDIA still set new MLPerf records for AI inference in the cloud and at the edge through its architecture design and software optimization.
The MLPerf results confirm the A100's performance, but its high price puts it out of reach for many companies.
This round, the more cost-effective NVIDIA A30 (165 W) and A10 (150 W) GPUs also made their debut in MLPerf Inference v1.0.
The A30 is geared toward compute and supports a wide range of AI inference and mainstream enterprise computing workloads, such as recommendation systems, conversational AI and computer vision.
The A10 focuses more on graphics performance: it accelerates deep learning inference, interactive rendering, computer-aided design and cloud gaming, and supports mixed AI and graphics workloads. The A30 and A10, which can be used for both AI inference and training, will ship in a variety of servers this summer.
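The two reported ratios can be compared with a line of arithmetic. This sketch assumes the 237x and 314x figures rest on comparable CPU baselines, which the article does not state explicitly; the variable names are our own.

```python
# Reported A100-vs-CPU inference speedups (figures quoted in this article).
gap_v07 = 237  # MLPerf Inference v0.7
gap_v10 = 314  # MLPerf Inference v1.0

# How much the ratio itself widened between the two rounds.
growth = gap_v10 / gap_v07
print(f"The gap widened {growth:.2f}x, or about {(growth - 1) * 100:.0f}%")
```

The widening (about 32%) need not match the "up to 45%" figure, since that is a maximum across individual models and the CPU baseline may also have changed between rounds.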
A100 cloud AI inference performance reaches 314 times that of CPUs
After half a year of optimization, the performance gap between the A100 and CPUs grew from 237 times to 314 times.
Specifically, in the data center inference benchmarks, the A100 is 1 to 3 times faster than the newly released A10 in the Offline scenario and up to 5 times faster in the Server scenario. In both scenarios, the A30 outperforms the A10.
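MLPerf scores the two data center scenarios differently: Offline hands the system all samples at once and reports throughput, while Server issues queries at random (Poisson) arrival times and counts a target rate as valid only if tail latency stays under a bound. The sketch below is a toy model, not MLPerf LoadGen; the single FIFO worker, the fixed service time and all function names are hypothetical, chosen only to show why a Server score can sit below Offline throughput.

```python
import math
import random


def offline_throughput(num_samples: int, total_seconds: float) -> float:
    # Offline: every sample is available at t=0, so the score is plain throughput.
    return num_samples / total_seconds


def percentile(values, pct):
    # Nearest-rank percentile; a simplification of what LoadGen computes.
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def server_run_is_valid(target_qps, service_time_s, latency_bound_s,
                        pct=99, num_queries=10_000, seed=0):
    # Server: queries arrive as a Poisson process at target_qps.
    # Hypothetical system: one worker, FIFO queue, fixed service time per query.
    rng = random.Random(seed)
    arrival = 0.0
    worker_free_at = 0.0
    latencies = []
    for _ in range(num_queries):
        arrival += rng.expovariate(target_qps)      # exponential inter-arrival gap
        start = max(arrival, worker_free_at)        # wait if the worker is busy
        worker_free_at = start + service_time_s
        latencies.append(worker_free_at - arrival)  # queueing delay + service time
    # The run only counts if the tail latency meets the bound.
    return percentile(latencies, pct) <= latency_bound_s
```

With a 1 ms service time the toy system's Offline throughput is 1,000 samples/s, yet a Server run at 2,000 queries/s fails its latency bound because the queue grows without limit. Queueing delay near saturation is one reason Server results trail Offline ones.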
Notably, Intel's Ice Lake, its most powerful third-generation Xeon Scalable CPU released earlier this month, shows significantly better inference performance than the previous-generation Cooper Lake in the Offline tests of ResNet-50 and SSD-Large, but it still trails the A100 GPU by 17 to 314 times.
Qualcomm AI 100 performed well in data center AI inference in MLPerf Inference v1.0. Its ResNet-50 and SSD-Large results in the Offline and Server scenarios show higher inference performance than NVIDIA's new A10 GPU; Qualcomm did not submit results for other models.
In performance per watt, the Qualcomm AI 100 beats the A100 on ResNet-50 and SSD-Large, the models for which it submitted results, but its absolute performance is lower than the A100's.
Xilinx's VCK5000 FPGA performed well in the ResNet-50 image classification test.
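The Qualcomm comparison involves two different rankings: absolute throughput, and throughput divided by power. The numbers in this sketch are hypothetical, chosen only to show how one chip can lead on performance per watt while trailing on raw performance; they are not MLPerf results, and MLPerf's official power metrics may be defined differently in detail.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    throughput: float   # samples per second (hypothetical value)
    avg_power_w: float  # average system power in watts (hypothetical value)

    @property
    def perf_per_watt(self) -> float:
        # Efficiency metric: samples per second delivered per watt drawn.
        return self.throughput / self.avg_power_w


# Hypothetical numbers for illustration only; NOT MLPerf data.
chip_a = Result("chip_a", throughput=30_000, avg_power_w=300)
chip_b = Result("chip_b", throughput=15_000, avg_power_w=75)

fastest = max([chip_a, chip_b], key=lambda r: r.throughput)
most_efficient = max([chip_a, chip_b], key=lambda r: r.perf_per_watt)
print(fastest.name, most_efficient.name)
```

Here chip_a wins on raw throughput while chip_b, at half the speed but a quarter of the power, delivers twice the performance per watt.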
The Jetson series is the only chip family with results for every edge inference test
The A-series GPUs' performance advantage in cloud AI inference extends to the edge. MLPerf's edge AI inference benchmark is divided into Single-Stream and Multi-Stream scenarios. In Single-Stream, the A100 PCIe, A30 and A10 show significant performance advantages across all models, while the Qualcomm AI 100 holds a clear advantage over the A10 on one model; however, that is the only model for which Qualcomm submitted results.
The AGX Xavier and Xavier NX in NVIDIA's Jetson family are better suited to edge scenarios. According to the submitted data, Centaur has a clear advantage on the ResNet-50 model, and its performance on SSD-Small is comparable to that of the Jetson Xavier NX.
In the Multi-Stream edge inference benchmark, only NVIDIA submitted results. The A100 PCIe version delivered the highest performance, 60 times higher than the Jetson AGX Xavier and Xavier NX.
Many of NVIDIA's submissions this round were based on the Triton Inference Server, which supports models from all major frameworks and runs on both GPUs and CPUs. It is also optimized for different query types, such as batch, real-time and streaming, which reduces the complexity of deploying AI in applications.
Lei Feng Net learned that, with comparable configurations, Triton-based submissions came close to custom-optimized implementations, reaching 95% of the performance of the optimized GPU results and 99% of the optimized CPU results.
In addition, NVIDIA used the Multi-Instance GPU (MIG) capability of the Ampere architecture to run all seven MLPerf Offline tests simultaneously on seven MIG instances of a single GPU, with each instance achieving almost the same performance as a single MIG instance running alone.
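In the Single-Stream scenario, as we understand the v1.0 rules, the load generator sends one sample, waits for the answer, then sends the next, and the reported score is the 90th-percentile latency (lower is better). This is a simplified sketch of that measurement loop, not LoadGen itself; `infer`, the injectable `clock` and the nearest-rank percentile are assumptions for illustration.

```python
import math
import time


def percentile(values, pct):
    # Nearest-rank percentile; LoadGen's exact rounding may differ.
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def single_stream_p90(infer, samples, clock=time.perf_counter):
    # Single-Stream: issue one sample, wait for its result, then issue the next.
    latencies = []
    for sample in samples:
        start = clock()
        infer(sample)  # hypothetical inference call on one sample
        latencies.append(clock() - start)
    # Score is the 90th-percentile latency in seconds.
    return percentile(latencies, 90)
```

Passing a fake clock, as in a unit test, makes the result deterministic; with the default `time.perf_counter` it times real calls.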
Summary
Continuously updated MLPerf benchmark results provide a valuable reference for enterprises investing in IT infrastructure and also promote the adoption of AI.
In this process, software is crucial to improving AI performance, as the A100 shows: targeted optimization delivered up to 45% more performance in half a year.
It is also clear that NVIDIA is maintaining its leadership in AI through continuous hardware and software optimization and a broader product portfolio, and surpassing NVIDIA in AI appears increasingly difficult.
- THE END -
Link to the original text: Lei Feng Net