
Beyond 64 cores and 7nm, what else about AMD's Rome architecture deserves attention?

via: CnBeta    time: 2019/8/13 9:31


Facing global partners, customers, and the media, AMD CEO Dr. Lisa Su declared without hesitation: AMD will bring competition back to the data center market.

Dr. Lisa Su, CEO of AMD, delivering the keynote

Up to 64 cores and 128 threads, boost clocks of up to 3.4GHz, up to 4TB of memory in a dual-socket system, 512MB of L3 cache, DDR4-3200 memory support across the entire lineup, and 128 lanes of PCI-E 4.0 give AMD every reason to say so with confidence.

Leadership defined by AMD

In terms of basic characteristics, the second-generation EPYC processors announced here bring three major features: the 7nm process, the Zen 2 core, and the Chiplet architecture.

7nm: the last member of the fleet falls into place

It must be said that AMD was held back by process technology for a long time, and GlobalFoundries' perennially lagging nodes bear much of the blame. Under Dr. Su's sweeping reforms, however, AMD finally shed GF's shackles and handed foundry work to the more advanced TSMC. The results are the Ryzen 7 3000 series processors, the Radeon RX 5700 series graphics cards, and the second-generation EPYC processors launched here. Consumer processors, graphics cards, and data center processors: all three pillars of AMD's business now enjoy the power reduction and die-size shrink that 7nm brings.

Although the process node alone does not decide a chip's success or failure, the power and area reductions it brings give AMD room to deliver products with better energy efficiency, more cores, and higher frequencies. And these are exactly the prerequisites for AMD to confront Intel head-on again.

Chiplet architecture: a new understanding of the chip

The chiplet approach abandons the traditional single-die design philosophy of earlier CPUs: functional blocks are designed and manufactured separately, and the modules exchange data through dedicated interconnects and packaging technology.

AMD's proud Chiplet design

At present, in Rome, AMD adopts a design that separates the compute cores from the IO. At the center of the processor sits an IO die integrating the memory controllers, the PCI-E 4.0 controller, and the Infinity Fabric interconnect controller. Around the IO die sit several core chiplets (AMD calls these CCDs; each groups 8 cores with their L3 cache and an Infinity Fabric link, so a 64-core product simply arranges 8 CCDs around the IO die). This design has several benefits:

First, the different parts can be manufactured separately, which effectively limits the cost of yield problems. For example, suppose that under the same process conditions a wafer picks up 20 random defect points during etching; in the worst case that means 20 dies are scrapped. In a traditional large-die design, if a wafer yields only 100 complete dies, those 20 defects drag the yield down to 80%. With the chiplet approach, AMD can fit far more dies on the same wafer (die count grows roughly with the inverse square of the die edge length): halving the edge length yields at least 4 times as many dies, so the same 20 defects reduce the yield to no worse than 95% (a minimal sketch of this arithmetic appears after the third point below). Obviously, this greatly reduces AMD's manufacturing costs.

Second, because the functional modules are completely independent of one another, upgrading or stepping any one of them is easier and cheaper.

Third, the chiplet approach also lets AMD integrate chips with different IP, from different companies, with different functions, and on different process nodes in the same package rather than on the same die, quickly creating new processors or SoCs that meet market demand.
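To make the first point concrete, here is a minimal sketch of the yield arithmetic described above, using the article's simplified worst-case model in which every random defect lands on a different die and scraps it:

```python
def worst_case_yield(dies_per_wafer: int, defects: int) -> float:
    """Worst case: each of the wafer's random defects kills a different die."""
    return max(0.0, (dies_per_wafer - defects) / dies_per_wafer)

# Monolithic design: ~100 large dies per wafer, 20 random defects -> 80% yield
print(worst_case_yield(100, 20))   # 0.80

# Chiplet design: halving the die edge length roughly quadruples the die count,
# so the same 20 defects cost a much smaller share of the wafer -> 95% yield
print(worst_case_yield(400, 20))   # 0.95
```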

For processors, the chiplet is a rather advanced design idea that greatly reduces cost and shortens time to market for new products. But chiplets are not a cure-all. To build high-performance CPUs this way, AMD and TSMC have to solve many packaging challenges: the electrical characteristics of the interconnects, achieving higher data bandwidth through a smaller cross-section, and routing more pins within a limited area. Given that Rome reaches 3.4GHz and a TDP of up to 225W under this scheme, AMD and TSMC have clearly made considerable progress here. The second-generation EPYC is among the highest-performance, highest-power, and highest-frequency chips built in the chiplet style today; the scale of the challenge speaks for itself.

Surprises brought by the Zen 2 architecture

The second generation EPYC CPU design architecture

With the good start made by the first-generation EPYC, AMD has clearly found the right architectural direction and gone even further in the second generation. Zen 2 brings a new TAGE branch predictor, doubled op-cache capacity, an optimized L1 instruction cache, nearly doubled L1 bandwidth, a third address generation unit, doubled floating-point datapath bandwidth, and doubled L3 cache capacity.

This sweeping architectural overhaul delivers a claimed 23% improvement in per-core execution efficiency. Combined with eight DDR4-3200 memory channels and up to 4TB of memory support (up to 64GB of memory per core), AMD can claim a performance advantage in many memory-sensitive applications. On the day of the launch, Dr. Lisa Su said that by using the second-generation EPYC, AMD's partners and users had already broken 80 performance records worldwide.
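As a rough illustration (these figures are theoretical peaks, not numbers quoted at the launch), the bandwidth of an eight-channel DDR4-3200 configuration works out as follows:

```python
def ddr4_peak_bandwidth_gbs(transfers_mt_s: int, channels: int, bus_width_bytes: int = 8) -> float:
    """Theoretical peak: transfers/s x 8 bytes per 64-bit channel x channel count."""
    return transfers_mt_s * 1e6 * bus_width_bytes * channels / 1e9

# One DDR4-3200 channel: 3200 MT/s x 8 B = 25.6 GB/s
print(ddr4_peak_bandwidth_gbs(3200, 1))   # 25.6
# Eight channels on a second-generation EPYC socket
print(ddr4_peak_bandwidth_gbs(3200, 8))   # 204.8
```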

PCI-E 4.0: a force multiplier

At the same time, the second-generation EPYC is the world's first x86 processor to support PCI-E 4.0. With twice the bandwidth of PCI-E 3.0, devices with high data throughput can perform noticeably better.

Xilinx ALVEO U50 Network Accelerator Card Supporting PCI-E 4.0

Broadcom 200G Ethernet Card

Mellanox ConnectX-6 200G Infiniband NIC

Although most current devices (GPUs, accelerator cards, network cards, HBAs, and so on) cannot yet fully exploit the doubled bandwidth, for some high-throughput FPGAs (such as the Xilinx ALVEO U50 used to accelerate Spark queries) the wider PCI-E bus clearly improves single-card performance: in the live demonstration, the Xilinx Spark query accelerator card's data throughput increased by a factor of 1.7 after moving to the PCI-E 4.0 bus.

In addition, next-generation NICs also need PCI-E 4.0 as their bus. (For now, 100G NICs are the new generation that has just reached the market, so 200G is naturally the generation after that.) A 100G NIC already translates to roughly 12.8GB/s of bus bandwidth, approaching the ceiling of PCI-E 3.0 x16, so a 200G NIC naturally needs the doubled bandwidth of PCI-E 4.0. In the demonstration, Broadcom's 200G NIC running on PCI-E 4.0 x16 roughly doubled its data throughput in a one-to-one bidirectional read test, from 192Gbps to 381Gbps; the performance gain is immediate.
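A back-of-the-envelope sketch of why a 200G NIC wants PCI-E 4.0; the numbers below count only 128b/130b line-coding overhead and ignore packet and protocol overhead, so they are rough upper bounds rather than measured figures:

```python
def pcie_per_direction_gbs(gt_per_s: float, lanes: int) -> float:
    """Per-direction PCI-E throughput in GB/s after 128b/130b encoding."""
    return gt_per_s * lanes * (128 / 130) / 8

def nic_per_direction_gbs(line_rate_gbps: float) -> float:
    """Payload a NIC must move per direction, in GB/s."""
    return line_rate_gbps / 8

print(pcie_per_direction_gbs(8.0, 16))    # PCI-E 3.0 x16: ~15.8 GB/s
print(pcie_per_direction_gbs(16.0, 16))   # PCI-E 4.0 x16: ~31.5 GB/s
print(nic_per_direction_gbs(100))         # 100G NIC: 12.5 GB/s
print(nic_per_direction_gbs(200))         # 200G NIC: 25.0 GB/s, beyond PCI-E 3.0 x16
```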

Mellanox's ConnectX-6 200G Infiniband NIC performed similarly, with a one-to-one bidirectional write test rising from 202Gbps to 395Gbps.

Of course, as flash technology and controller performance improve further, PCI-E 4.0 will also matter in the long run for many high-performance NVMe storage devices (although for full adoption, PCI-E 4.0 switches and the supporting standards still need to follow and mature).

Because the PCI-E 4.0 controller is integrated into the IO die, and the IO die is identical across all products, every second-generation EPYC supports up to 128 lanes of PCI-E 4.0 regardless of core count or frequency. (Note that even in a two-socket system this number does not double after installing two second-generation EPYCs; AMD's explanation is that a portion of each IO die's links must be reserved as Infinity Fabric connections for communication between the two processors.)
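A tiny sketch of that lane budget. The 64-lane-per-socket Infinity Fabric figure below is an assumption drawn from commonly published Rome two-socket topologies, not a number given in this article:

```python
LANES_PER_SOCKET = 128  # every second-generation EPYC IO die exposes 128 high-speed lanes

def usable_pcie_lanes(sockets: int, if_lanes_per_socket: int = 64) -> int:
    """Lanes left for PCI-E after reserving inter-socket Infinity Fabric links.
    The 64-lane figure is an assumption for illustration, not from the article."""
    if sockets == 1:
        return LANES_PER_SOCKET                                # 1P: all 128 lanes
    return sockets * (LANES_PER_SOCKET - if_lanes_per_socket)  # 2P: 2 x 64 = 128

print(usable_pcie_lanes(1), usable_pcie_lanes(2))              # 128 128
```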

In any case, such a design clearly offers many cost-effective options for IO-heavy applications such as hyper-converged infrastructure, cold storage, firewalls, and AI clusters.

The ecosystem battle will decide victory or defeat

A huge leap in performance and functionality has put AMD back into the data center mainstream on the product side. But more than a decade of falling behind has left AMD with many ecosystem lessons to make up, and that is not something settled in one stroke the way a product launch is.

60+ partners

During the launch, companies and organizations including HPE, Dell, New H3C, Cisco, Supermicro, ASUS, Gigabyte, ASRock, Tyan, and Open19 showed their products and designs, and at the user level, cloud service providers including AWS, Azure, and Google Cloud sent executives to stand alongside AMD on stage. In China, the three major cloud giants of BAT have likewise been working with AMD on deployments of the second-generation EPYC (New H3C's products were ready and shown at the venue). But even this impressive list of partners is still not enough for data center applications.

Advantages of AMD EPYC

Beyond system vendors and end users, for the EPYC line to help AMD back into the game it also needs support from a long list of data center components, operating systems, applications, and open standards. In AMD's non-public partner list we can already see mainstream flash and controller makers such as Samsung, Micron, SK Hynix, and Western Digital, controller vendors such as PMC, network equipment makers such as Broadcom and Mellanox, operating system providers such as Microsoft, SUSE, and Red Hat, enterprise core application providers such as SAP, Oracle, MongoDB, VMware, and Citrix, and open source projects such as OpenStack, Docker, Spark, and Java. But for the data center ecosystem as a whole, this is still not enough.

So when does an enterprise ecosystem count as complete? I think it is when you no longer need to, or are no longer able to, enumerate a list of partner companies. AMD still has a long way to go before reaching that state.

If AMD can keep delivering leading (or at least on-par) performance, core counts, and features in future generations, then more and more partners can be expected to extend the olive branch to AMD of their own accord, and the ecosystem will grow stronger and stronger. At that point, AMD can carry the fight back to Intel's city gates.

But then again, the arrival of more cores, higher frequencies, a more advanced process, and the Chiplet architecture shows that AMD is already moving in the right direction. That gives us something to look forward to in the future of the EPYC series, and of AMD in the data center.
