GPU latency comparison: AMD RDNA2 beats NVIDIA

Most readers have probably seen CPU cache and memory latency tests, but few people run the same tests on GPUs.

Chips and Cheese ran a dedicated test to compare the cache and memory latency of AMD and NVIDIA GPU architectures.
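The article does not describe how the latency test works. Tests of this kind are commonly built on pointer chasing: a buffer holds a random chain of indices, and each load depends on the previous one, so the average time per step approximates load latency for that working-set size. The following is a minimal Python sketch of that technique only, assuming it resembles what such tools do; the helper names `build_chase_chain`, `chase` and `ns_per_access` are hypothetical. In CPython the interpreter overhead swamps real cache latency, so real tests run as compiled code or GPU kernels.

```python
import random
import time


def build_chase_chain(n: int, seed: int = 0) -> list[int]:
    """Return a random single-cycle permutation using Sattolo's algorithm.

    chain[i] is the index to visit after i, so following the chain from any
    start visits all n slots before returning. The random order is meant to
    defeat simple hardware prefetchers.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i guarantees one single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain: list[int], steps: int) -> int:
    """Follow the chain for `steps` dependent loads; return the final index."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]  # each load needs the result of the previous one
    return idx


def ns_per_access(chain: list[int], steps: int) -> float:
    """Average wall-clock nanoseconds per dependent step (hypothetical helper)."""
    start = time.perf_counter_ns()
    chase(chain, steps)
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    # Sweep the working-set size. On real hardware, latency steps up each time
    # a cache level is exceeded; here the interpreter dominates the result.
    for exp in range(10, 17):
        n = 2 ** exp
        chain = build_chase_chain(n)
        print(f"{n:>6} elements  {ns_per_access(chain, 200_000):.1f} ns/step")
```

Sattolo's variant of the shuffle is used because an ordinary shuffle can split the indices into several short cycles, which would leave part of the buffer untouched and understate the working set.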


First, the matchup between AMD RDNA2 and NVIDIA Ampere, represented here by the RX 6900 XT and the RTX 3090: the former wins at almost every level of the memory hierarchy.

The RDNA2 architecture introduces the new Infinity Cache, and its latency is also impressive: an Infinity Cache hit adds only about 20 ns over an L2 hit, which is still significantly lower than Ampere's L2 latency.

Even more remarkably, RDNA2's memory latency is almost identical to Ampere's, even though Ampere has only two levels of cache in front of memory and RDNA2 has four.

Ampere's cache hierarchy is more conventional. Going from the SM-private L1 cache to the L2 cache adds more than 100 ns of latency, while on RDNA2 going from the L0 cache to the L2 cache adds only about 66 ns. GA102's very large die appears to be a direct contributor to this extra latency.

This may help explain why RDNA2 delivers better performance and efficiency at low resolutions: its L2 and L3 (Infinity Cache) latencies are very low, which suits lighter workloads. Conversely, Ampere has a clear advantage under heavy loads such as 4K.
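One way to see why low cache latency matters most for small workloads is the textbook average-memory-access-time model: each level serves some fraction of the requests that reach it, and the average latency is the hit-weighted sum. The sketch below assumes that simple model; `average_latency_ns` is a hypothetical helper, and every latency and hit rate in the usage example is made up for illustration, not a measurement from the test. The model ignores bandwidth, which also matters at high resolutions.

```python
def average_latency_ns(levels: list[tuple[str, float, float]]) -> float:
    """Average load latency for a cache hierarchy.

    levels: (name, latency_ns, local_hit_rate) from fastest to slowest.
    local_hit_rate is the fraction of requests *reaching* that level that
    it serves; the last level (memory) should use 1.0.
    """
    remaining = 1.0  # fraction of requests not yet served
    total = 0.0
    for _name, latency_ns, hit_rate in levels:
        served = remaining * hit_rate
        total += served * latency_ns
        remaining -= served
    return total


if __name__ == "__main__":
    # Hypothetical numbers only: a small working set hits the caches often,
    # a large one (e.g. higher resolution) spills to memory more often.
    small = [("L2", 100.0, 0.8), ("LLC", 120.0, 0.6), ("DRAM", 230.0, 1.0)]
    large = [("L2", 100.0, 0.4), ("LLC", 120.0, 0.3), ("DRAM", 230.0, 1.0)]
    print(f"small working set: {average_latency_ns(small):.1f} ns")
    print(f"large working set: {average_latency_ns(large):.1f} ns")
```

As hit rates fall, the average converges toward memory latency, which the test found to be nearly the same on both architectures, so RDNA2's cache-latency edge shrinks.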


So much for GPU versus GPU. What happens when we put a GPU next to a CPU? Take the RX 6900 XT and the Intel Core i7-4770 as examples.

Naturally, CPU caches are in a different league, so a linear Y-axis is used here. The CPU's latency stays far below RDNA2's across the whole test: with DDR3-1600 CL9, its memory latency is only 63 ns, versus 226 ns for the RX 6900 XT with GDDR6, and the last-level cache latencies are 53.42 ns and 123.2 ns respectively.
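To put the memory-latency gap in proportion, a few lines of arithmetic on the two figures quoted above are enough. Only the 63 ns and 226 ns values come from the test; the constant and function names are hypothetical.

```python
CPU_DRAM_NS = 63.0   # Core i7-4770 with DDR3-1600 CL9 (quoted above)
GPU_DRAM_NS = 226.0  # RX 6900 XT with GDDR6 (quoted above)


def latency_ratio(slow_ns: float, fast_ns: float) -> float:
    """How many times longer the slower memory access takes."""
    return slow_ns / fast_ns


ratio = latency_ratio(GPU_DRAM_NS, CPU_DRAM_NS)
extra_ns = GPU_DRAM_NS - CPU_DRAM_NS
print(f"GPU memory latency is {ratio:.1f}x the CPU's ({extra_ns:.0f} ns more)")
```

A GPU hides this extra latency by keeping many threads in flight rather than by making each access fast, which is why the comparison looks so lopsided.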


Now look at earlier NVIDIA generations: the Maxwell-based GTX 980 Ti, the Pascal-based GTX 1080, and the Turing-based RTX 2060 mobile.

Maxwell and Pascal are almost identical, with Maxwell slightly higher overall, possibly because of its larger die and lower core clock.
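The clock-speed explanation follows from a simple conversion: at a clock of f GHz one cycle lasts 1/f ns, so the same latency in cycles costs more nanoseconds on a slower-clocked chip. The sketch below shows that conversion; the function names and the 300-cycle figure and clocks in the usage example are hypothetical, not measured values for either GPU.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds (1 cycle = 1/f ns)."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds back to clock cycles."""
    return ns * clock_ghz


if __name__ == "__main__":
    # Hypothetical: the same 300-cycle path at two different core clocks.
    for clock in (1.0, 1.6):
        print(f"{clock} GHz: {cycles_to_ns(300, clock):.1f} ns")
```

A larger die works in the same direction by adding distance, and therefore cycles, between the cores and the cache.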

Turing already looks much like Ampere: its L1 latency is far lower than its L2 latency. Strangely, the measured VRAM latency rises further once the test size exceeds 32 MB, and the reason is unknown.


On the AMD side, the test covered the TeraScale-based HD 5850 and HD 6950, the GCN-based HD 7970, and the RX 6900 XT. Latency clearly drops generation by generation, with every cache level improving at the same time.


