How strong is AMD EPYC Rome? It could shake Intel's position in the server market

via: Leiphone (雷锋网)     time: 2019/8/8 23:31:46

The new processor has set 80 world records and has been called the strongest x86 processor in history. Based on current information, the second-generation EPYC processor, built on the Zen 2 architecture, compares well with Intel's Xeon Scalable processors in architecture, performance and security.

AMD has released its first 7nm EPYC processor, which also marks the first time AMD has won a process advantage in its competition with Intel, and that advantage should not be underestimated. Thanks to the combination of TSMC's 7nm process and AMD's Zen 2 microarchitecture, the EPYC Rome processor is widely regarded as a turning point, one that lets AMD stand out in a data center market where Intel holds about 95% of the share.

Winning even 20% of the server market would have a truly transformative impact on AMD, which has long been the underdog. When you consider that Intel's daily profit exceeds AMD's profit for an entire quarter, you can appreciate how steep the odds are that AMD has had to overcome.

The debut of the AMD EPYC Rome processor is the payoff of the company's multi-year bets, savvy marketing strategy and ingenious engineering. It also marks the beginning of the biggest upheaval in semiconductor history.

As always, it all starts with the chip, but much more is needed to win the data center, such as operating system and software optimization, relationships with OEMs, and a strong hardware ecosystem. For a new and unique architecture like Zen, the difficulty is doubled.

AMD's first-generation EPYC Naples processors familiarized the industry with the new Zen microarchitecture. Although they had some advantages over Intel Xeon processors, a large-scale shift takes time, and Naples lacked a killer feature that would spur the industry to turn to AMD, especially in industries that are very conservative about adopting new architectures.

After the Naples processor debuted in 2017, AMD had to make a major decision: it could move EPYC to the faster, more efficient 12nm process used for its desktop chips, or go directly to the 7nm process.

AMD chose to go straight to 7nm, which gave it a killer feature and laid the foundation for fundamental improvements in density and power consumption.

The 7nm process has a density advantage over Intel's 14nm process, which translates into more cores. It also brings lower power consumption and more work per watt (a key consideration in the data center), as well as higher clock frequencies, larger caches and competitive pricing. Add the cost and yield advantages of the chiplet design, the roughly 15% improvement in instructions per cycle (IPC) from the refined Zen 2 architecture, a rapid move to PCIe 4.0, and industry-leading memory channels and x86 throughput, and EPYC is no longer seen as merely an "alternative" to Intel. Its capabilities now attract industry giants; in HPC, for example, Rome processors are being chosen for supercomputers.

Today the reasons behind AMD's decision become clear: AMD and its partners are announcing 80 world records, the most ever for an AMD data center processor. Impressively, these records show gains ranging from 40-50% up to 80% in many real-world workloads. The performance boost comes from four times the floating-point performance and a larger L3 cache, which also help AI/ML workloads, as well as leading I/O capabilities that double the throughput available to GPU accelerators (not to mention support for more accelerators per server). PCIe 4.0 also benefits storage devices, especially primary storage.

The desktop PC market has drawn a lot of attention, as the coverage of Ryzen 3000 shows, but there is no doubt that the data center is where the big profits are.

If AMD wants to win the broader war with Intel, it must win the battle for the data center. But Intel is not sitting still. Let's look at how the battle for the data center may unfold over the next few years.

AMD EPYC Rome processor

The EPYC Rome processor has a distinctive design: up to eight 7nm compute dies, each with eight cores, connected via Infinity Fabric to a central 12nm I/O die that houses the memory and PCIe controllers. AMD varies the number of compute dies and active cores for each model.

Source: Tom's Hardware

The processor uses the Socket SP3 (FCLGA 4094) interface, so it is backward compatible with Naples platforms (although it loses PCIe 4.0 connectivity in that case) and forward compatible with the next-generation EPYC Milan processors. Custom platforms can provide up to 162 PCIe 4.0 lanes through clever configuration, while most platforms provide 128.

AMD continues to offer some models for two-socket (2P) servers and others specifically for single-socket servers (indicated by the "P" suffix).

Rome's core counts range from 8 cores and 16 threads up to an x86-leading 64 cores and 128 threads. We would normally expect boost frequencies to fall as core count and TDP rise, as Rome's base clocks do, but AMD has bucked this trend for boost clocks: its highest-core-count model actually has the highest boost frequency.

Base clocks range from 2.0 GHz to 3.2 GHz, while boost clocks range from 3.0 GHz to 3.4 GHz, an across-the-board improvement in peak frequency over the Naples predecessors. That is impressive given that some models have twice as many cores, and AMD says the higher base frequencies should offset some of Intel's single-core performance advantage.

AMD's power-aware boost algorithm also sustains high frequencies across many cores: the EPYC 7742 holds 3.2 GHz with all cores loaded. Meanwhile, Intel's largest mainstream Cascade Lake Xeon has 28 cores and 56 threads, and that will not change until sometime in the first half of 2020, when Intel introduces its new 56-core Cooper Lake models.

All Rome processors support up to 4TB of memory per socket across eight DDR4-3200 channels, a significant improvement over Xeon's six DDR4-2933 channels. With up to 64 cores sharing eight channels, some have raised concerns about per-core memory throughput, but AMD claims performance scales as core count increases, even across two sockets. Intel expects to support eight DDR4 channels next year with its 14nm Cooper Lake chips.
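The channel comparison above, and the per-core bandwidth concern, can be checked with simple arithmetic. The sketch below assumes the usual 64-bit (8-byte) DDR4 channel, ignores ECC bits, and computes theoretical peaks only; sustained bandwidth in practice is lower.

```python
# Peak theoretical DRAM bandwidth = channels x transfer rate x bytes per transfer.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide; ECC bits are ignored.

BYTES_PER_TRANSFER = 8  # 64-bit DDR4 channel


def peak_bandwidth_gbs(channels: int, mega_transfers: int) -> float:
    """Peak bandwidth of one socket in GB/s (10**9 bytes per second)."""
    return channels * mega_transfers * 1e6 * BYTES_PER_TRANSFER / 1e9


rome = peak_bandwidth_gbs(channels=8, mega_transfers=3200)  # 8 x DDR4-3200
xeon = peak_bandwidth_gbs(channels=6, mega_transfers=2933)  # 6 x DDR4-2933

print(f"Rome per socket: {rome:.1f} GB/s, two sockets: {2 * rome:.1f} GB/s")
print(f"Xeon per socket: {xeon:.1f} GB/s, two sockets: {2 * xeon:.1f} GB/s")

# Per-core share if every core streams from memory at the same time.
print(f"Rome, 64 cores: {rome / 64:.2f} GB/s per core")
print(f"Xeon, 28 cores: {xeon / 28:.2f} GB/s per core")
```

The two-socket totals land at roughly 410 GB/s and 282 GB/s, and the last two lines show why per-core bandwidth was a concern: Rome has more total bandwidth, but each of its 64 cores gets a smaller slice than each of Xeon's 28.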

Every Rome model offers 128 PCIe 4.0 lanes, including the single-socket models, and some two-socket configurations reach 162. Notably, single- and dual-socket servers alike expose 128 PCIe 4.0 lanes to users (162 in those special configurations). PCIe 4.0 offers twice the throughput of PCIe 3.0, something Intel's current products cannot match. Intel is rumored to support PCIe 4.0 with its Ice Lake processors, but they will not ship until the second quarter of 2020, which leaves Intel's lineup weak for high-speed I/O devices that support PCIe 4.0, such as new GPUs, network adapters and storage devices.
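The "twice the throughput" claim follows from the signaling rate. This sketch assumes the standard rates of 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0, both with 128b/130b line encoding, and ignores packet and protocol overhead, so real transfer rates are somewhat lower.

```python
# Approximate usable PCIe bandwidth per direction.
# Assumptions: PCIe 3.0 = 8 GT/s, PCIe 4.0 = 16 GT/s, both with 128b/130b
# encoding; packet/protocol overhead is ignored.

ENCODING = 128 / 130


def lane_gbs(gt_per_s: float) -> float:
    """Usable GB/s per lane, per direction (1 bit per transfer, 8 bits per byte)."""
    return gt_per_s * ENCODING / 8


def link_gbs(gt_per_s: float, lanes: int) -> float:
    return lane_gbs(gt_per_s) * lanes


for gen, rate in (("PCIe 3.0", 8.0), ("PCIe 4.0", 16.0)):
    print(f"{gen} x16: {link_gbs(rate, 16):.1f} GB/s per direction")

# Raw aggregate over 128 lanes, both directions, before encoding overhead.
raw_total = 16.0 * 128 / 8 * 2
print(f"128 lanes of PCIe 4.0, raw, bidirectional: {raw_total:.0f} GB/s")
```

An x16 PCIe 4.0 slot delivers about 31.5 GB/s each way versus about 15.8 GB/s for PCIe 3.0, and 128 lanes at the raw 16 GT/s rate in both directions add up to 512 GB/s.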

Rome's L3 cache varies by model, reaching 256MB on the 64-core models. AMD also offers 48-core models with 192MB or 256MB of L3 and 32-core models with 64MB or 128MB, showing that AMD can tailor higher-performance models to specific workloads. The top Rome models provide half a gigabyte of L3 cache (2 x 256MB) in a two-socket server.

AMD divides its Rome lineup into five TDP levels, from 120W to 225W. TDPs can be adjusted on a SKU-by-SKU basis, letting users get more performance from each model with a configurable TDP (cTDP) of up to 240W. Higher TDPs typically require a custom platform, so not all previous-generation servers can support 240W. The new peak TDP exceeds that of the previous generation, but this is expected since Rome doubles the core count.

AMD EPYC Rome Pricing

AMD has not yet announced official pricing for the EPYC Rome lineup, but Tom's Hardware has obtained pricing data from its sources. AMD's goal is to offer more performance, more cores, more memory bandwidth and more I/O at every price point, and thus a better total cost of ownership than Intel.

This is not a complete comparison with Intel's Xeon Scalable lineup, and Intel has no products above 28 cores to compete with AMD's, but the basic picture holds: AMD offers more cores and threads in every market segment, with three times Intel's L3 cache, at a lower price. In fact, Intel's 28-core model costs more than AMD's flagship 64-core, 128-thread model.

AMD's TDPs are lower than those of Intel's high-core-count models, while the two companies' lower-core-count products have similar TDPs. It is worth noting that although AMD's 7nm compute dies are efficient, the large 12nm I/O die adds some power. As always, TDP is not a measure of actual power consumption, so we will have to wait for third-party results to compare the power efficiency of the two lineups.

AMD's processors also do not require a chipset on the motherboard, mainly because the processor itself provides a large number of PCIe 4.0 lanes. This reduces both cost and platform power consumption.

AMD EPYC Rome Performance

On the 7nm process, AMD's Zen 2 architecture adds new features and significantly improves performance over the original Zen microarchitecture. AMD also said it will introduce the Zen 3 microarchitecture, on the 7nm+ process, in 2021.

AMD claims that per-socket performance has doubled compared with Naples, and that by also doubling 256-bit AVX throughput, theoretical peak floating-point performance (FLOPS) has quadrupled. Rome offers 204 GB/s of memory throughput and supports up to 4TB of RAM per socket. PCIe 4.0 provides peak I/O throughput of 512 GB/s. Rome is the first x86 server processor to support PCIe 4.0, although IBM's POWER architecture already supports faster standards.
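The "quadrupled" FLOPS figure is the product of two doublings: twice the cores and twice the vector width per core. The sketch below assumes details not stated in the article: two FMA pipes per core, 128-bit pipes on Zen 1 (so a 256-bit AVX instruction is split in two) and 256-bit pipes on Zen 2, with one FMA counted as two floating-point operations. The clock speed is a hypothetical value, held equal for both chips.

```python
# Why doubling cores AND doubling vector width gives ~4x peak FLOPS.
# Assumptions (not from the article): two FMA pipes per core; Zen 1 pipes are
# 128 bits wide, Zen 2 pipes are 256 bits wide; an FMA counts as 2 FLOPs.

def dp_flops_per_cycle(fma_pipes: int, vector_bits: int) -> int:
    """Double-precision FLOPs per core per cycle."""
    doubles_per_vector = vector_bits // 64
    return fma_pipes * doubles_per_vector * 2  # multiply + add


def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    return cores * ghz * flops_per_cycle


ZEN1 = dp_flops_per_cycle(fma_pipes=2, vector_bits=128)  # 8
ZEN2 = dp_flops_per_cycle(fma_pipes=2, vector_bits=256)  # 16

# Hypothetical identical clock for both, so only cores and width differ.
clock_ghz = 2.0
naples = peak_gflops(32, clock_ghz, ZEN1)
rome = peak_gflops(64, clock_ghz, ZEN2)
print(f"Naples 32c: {naples:.0f} GFLOPS, Rome 64c: {rome:.0f} GFLOPS, "
      f"ratio {rome / naples:.1f}x")
```

Real chips run at different clocks and rarely sustain peak FMA throughput, so this only explains where the theoretical 4x comes from.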

Unlike Intel Xeon, which has nearly a hundred SKUs, AMD has streamlined its lineup into four core-count segments (8, 12/16, 24/32 and 48/64 cores) with 19 SKUs in total. And unlike Intel, AMD does not cut features such as PCIe lanes or memory speed and channel count to differentiate its stack.

AMD claims that a single-socket server with a 64-core model can outperform a dual-socket server with Intel Xeon 8280M or higher-end processors.

AMD EPYC Rome Security

AMD has built Spectre v2 mitigations into the silicon to reduce their performance impact, and it also supports the IBRS and IBPB mitigations and addresses Spectre v4. Rome is less exposed than Intel's processors to the various speculative execution vulnerabilities disclosed last year. Rome also supports Secure Memory Encryption (SME).

AMD's root of trust is a secure processor that runs its own code on a separate ISA. The memory controller includes an AES-128 engine whose keys are managed by the secure processor, so the keys are isolated from x86 software. The chip supports up to 509 keys. SME protects against physical memory attacks and can be applied at the hardware level or at the virtual machine/hypervisor level. SEV (Secure Encrypted Virtualization) builds on SME by giving each guest its own key, managed solely by the secure processor, which isolates guests from the hypervisor.
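The key arrangement described above (one key per guest, a fixed number of key slots) can be pictured with a toy model. Everything in this sketch is hypothetical and for illustration only: the class and method names are invented, and real SEV key management happens inside the secure processor's firmware, not in host software like this.

```python
# Toy model of per-guest memory-encryption keys with a fixed number of slots.
# Hypothetical illustration only: names are invented, and real SEV keys are
# generated and held by the AMD secure processor, never exposed to the host.
import secrets
from typing import Dict

MAX_KEYS = 509  # key count quoted in the article


class KeyTable:
    def __init__(self, capacity: int = MAX_KEYS) -> None:
        self.capacity = capacity
        self._keys: Dict[str, bytes] = {}  # guest name -> 128-bit key

    def launch_guest(self, guest: str) -> bool:
        """Give a guest its own fresh 128-bit key; False when slots run out."""
        if guest in self._keys:
            raise ValueError(f"{guest} already has a key")
        if len(self._keys) >= self.capacity:
            return False
        self._keys[guest] = secrets.token_bytes(16)
        return True

    def destroy_guest(self, guest: str) -> None:
        # Discarding the key leaves that guest's encrypted memory unreadable.
        self._keys.pop(guest)

    def in_use(self) -> int:
        return len(self._keys)


table = KeyTable()
launched = [table.launch_guest(f"vm{i}") for i in range(510)]
print(launched.count(True), launched[-1])  # 509 False
```

The point of the model is the capacity limit: once every slot holds a guest key, another encrypted guest can only start after an existing one is torn down and its key discarded.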

AMD also added x2APIC extensions to better support high core counts, quality-of-service controls for memory bandwidth and L3 cache access, and support for non-volatile memory.

AMD EPYC Rome Zen 2 microarchitecture

EPYC Rome uses the same basic microarchitecture as the Ryzen 3000 series and brings the same improvements, such as a roughly 15% increase in instructions per cycle (IPC).

The 7nm process offers twice the density and either up to 1.25 times the frequency at the same power, or half the power at the same performance as the previous generation.

Zen 2 brings many refinements, but the high-level improvements include a new TAGE branch predictor that serves as a second-stage predictor alongside the perceptron-based prediction unit. AMD also doubled L3 cache capacity and moved to an 8-way set-associative L1 instruction cache, which let it shrink the L1 instruction cache while doubling the micro-op cache.
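To make "perceptron-based prediction" concrete, here is a minimal textbook perceptron branch predictor in the style of Jiménez and Lin (2001). It is a sketch under stated assumptions, not AMD's design: the history length, table size and threshold are illustrative, and it does not model the TAGE stage that Zen 2 adds.

```python
# Minimal perceptron branch predictor (textbook style, Jimenez and Lin 2001).
# Not AMD's implementation: sizes and threshold are illustrative assumptions.
from typing import List


class PerceptronPredictor:
    def __init__(self, history_len: int = 8, table_size: int = 64) -> None:
        self.history_len = history_len
        self.table_size = table_size
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        self.history = [1] * history_len  # +1 = taken, -1 = not taken
        # Training threshold suggested in the original paper: 1.93 * h + 14.
        self.theta = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> int:
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        # Shift the actual outcome into the global history register.
        self.history = [t] + self.history[:-1]


def accuracy(pattern: List[bool], repeats: int = 200) -> float:
    """Fraction of correct predictions for one branch repeating `pattern`."""
    p = PerceptronPredictor()
    hits = total = 0
    for _ in range(repeats):
        for taken in pattern:
            hits += p.predict(pc=0x40) == taken
            total += 1
            p.update(pc=0x40, taken=taken)
    return hits / total


print(f"alternating branch: {accuracy([True, False]):.3f}")
```

A perceptron learns patterns that are linear functions of recent history, such as a strictly alternating branch, after a handful of mispredictions; a TAGE predictor, by contrast, indexes tables with several history lengths, which is one reason the two are used together.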

AMD has always supported 256-bit AVX, but it previously had to split each such instruction into two 128-bit operations. With Zen 2, AMD doubled the data path width and the vector register file. Changes to the load/store unit include a larger store queue and a larger L2 DTLB. AMD also widened the load and store paths to 256 bits and tripled load/store bandwidth.

Each compute die (CCD) consists of two standard quad-core CCXs, which now have twice the L3 cache, helping reduce accesses to main memory. AMD also uses a new NUMA arrangement to effectively reduce memory latency.
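Because L3 lives in each CCX, a model's L3 size depends on how many compute dies are populated rather than on how many cores are active. The sketch below assumes 16MB of L3 per Zen 2 CCX (a figure not given in the article) and at most four active cores per CCX; under those assumptions it reproduces the 256MB and 192MB figures quoted for the 64-core and 48-core models.

```python
# L3 capacity as a function of populated compute dies (CCDs).
# Assumptions: 16MB of L3 per Zen 2 CCX, two CCXs per CCD, at most four
# active cores per CCX, and cores spread evenly across CCXs.

L3_PER_CCX_MB = 16
CCX_PER_CCD = 2
CORES_PER_CCX_MAX = 4


def config(ccds: int, cores: int) -> dict:
    ccx = ccds * CCX_PER_CCD
    if cores % ccx or cores // ccx > CORES_PER_CCX_MAX:
        raise ValueError("cores must spread evenly, at most 4 per CCX")
    return {"cores": cores, "ccds": ccds, "cores_per_ccx": cores // ccx,
            "l3_mb": ccx * L3_PER_CCX_MB}


for ccds, cores in ((8, 64), (8, 48), (6, 48), (8, 32)):
    c = config(ccds, cores)
    print(f"{c['cores']} cores on {c['ccds']} CCDs "
          f"({c['cores_per_ccx']} per CCX): {c['l3_mb']} MB L3")
```

The two 48-core variants differ only in whether six or eight dies are populated: the same core count spread over more CCXs carries more L3.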

AMD EPYC Rome Multi-Chip Hybrid Architecture

As before, Rome is an SoC design, but AMD has moved to a 12nm I/O die bundled with eight compute dies. The chiplet design is similar to the consumer Ryzen 3000 series and provides a cost advantage thanks to the inherently higher yields of smaller dies. It also lets AMD put more silicon in a socket, because the reticle limit no longer applies when the compute cores are spread across multiple dies. As a result, AMD can fit up to ~1000 square millimeters of silicon in a single package, equivalent to 32 billion transistors.
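The yield advantage of small dies can be illustrated with the simple Poisson yield model, Y = exp(-A x D0), where A is die area and D0 is defect density. The numbers below are illustrative assumptions, not AMD or TSMC data: a defect density of 0.2 per cm², a compute die of roughly 74 mm², and a hypothetical 700 mm² monolithic die.

```python
# Poisson yield model: Y = exp(-A * D0).
# Illustrative assumptions only: D0 = 0.2 defects/cm^2, a ~74 mm^2 compute
# die, and a hypothetical ~700 mm^2 monolithic die.
import math

D0_PER_CM2 = 0.2


def poisson_yield(area_mm2: float, d0_per_cm2: float = D0_PER_CM2) -> float:
    """Fraction of dies with zero defects."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)  # mm^2 -> cm^2


chiplet = poisson_yield(74)
monolithic = poisson_yield(700)
print(f"74 mm^2 chiplet yield:     {chiplet:.1%}")
print(f"700 mm^2 monolithic yield: {monolithic:.1%}")
```

Since a defect now scraps only one small die instead of an entire large chip, far less wafer area is lost per good core, and partially defective dies can still be used in models with fewer active cores.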

The 12nm I/O die connects to the eight compute dies. The DDR4 and PCIe 4.0 controllers sit on the I/O die, which gives the processor more uniform memory access latency instead of the three-tier latency profile of the previous generation. This also improves NUMA behavior: there are now only two NUMA domains, compared with three on Naples, with latencies of 104ns and 201ns for the two domains, reductions of 19% and 14%, respectively. The chips can also be configured as three NUMA domains, which lowers the latency of the nearest domain further to 94ns.
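The percentage reductions also imply what the previous-generation latencies were. Since new = old x (1 - reduction), old = new / (1 - reduction). Only the 104ns/201ns values and the 19%/14% reductions come from the text; the outputs are derived estimates, not measured Naples numbers.

```python
# Back-computing previous-generation latencies from the quoted reductions:
# new = old * (1 - reduction)  =>  old = new / (1 - reduction).
# Inputs are from the article; outputs are derived estimates only.

def implied_previous(new_ns: float, reduction: float) -> float:
    return new_ns / (1.0 - reduction)


for label, new_ns, cut in (("lower-latency domain", 104, 0.19),
                           ("higher-latency domain", 201, 0.14)):
    prev = implied_previous(new_ns, cut)
    print(f"{label}: {new_ns} ns now, ~{prev:.0f} ns implied before")
```

This works out to roughly 128ns and 234ns for the earlier chips.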

AMD added a dynamic uncore DVFS (dynamic voltage and frequency scaling) system that lowers uncore power when it is idle or underutilized, and the power saved can be redirected to the compute cores. Unlike Intel, AMD does not lower frequency based on the type of instruction being processed; instead it manages power, which helps Rome sustain higher boost clocks on high-core-count models. This is especially helpful for top models, as reflected in the higher maximum frequency of the 7742.

Besides doubling the number of cores per socket, AMD roughly doubled Infinity Fabric bandwidth, which supports 10.7 GT/s between the two processors in a two-socket system; platforms optimized for Rome can reach up to 18 GT/s. AMD doubled Infinity Fabric's read width to 32B per clock but kept the write width at 16B. Infinity Fabric also has a link-width management system that saves power during periods of low utilization, and the same technique applies to the memory subsystem.
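Per-clock widths translate into bandwidth once a fabric clock is chosen. The 32B read and 16B write widths are from the text; the 1.5 GHz fabric clock used below is a hypothetical value picked only for illustration, since the article does not give one.

```python
# Infinity Fabric bandwidth of one compute-die link from per-clock widths.
# The 32 B read / 16 B write widths come from the text; the fabric clock
# is a hypothetical value chosen for illustration.
from typing import Tuple

READ_BYTES_PER_CLK = 32
WRITE_BYTES_PER_CLK = 16


def fabric_gbs(fclk_ghz: float) -> Tuple[float, float]:
    """(read, write) bandwidth in GB/s at the given fabric clock."""
    return READ_BYTES_PER_CLK * fclk_ghz, WRITE_BYTES_PER_CLK * fclk_ghz


read, write = fabric_gbs(1.5)
print(f"read {read:.1f} GB/s, write {write:.1f} GB/s at a 1.5 GHz fabric clock")
```

Whatever the clock, the asymmetry stays 2:1, which suits workloads where cores read much more data than they write.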

Rome offers up to 410 GB/s of memory throughput in a two-socket system, easily beating Intel's peak of 282 GB/s.

Rome's I/O links can be configured for different purposes: they can be dedicated to socket-to-socket connections or used as standard PCIe links. This is how the company supports 128 lanes in a single-socket system. The PCIe subsystem also supports bifurcation, allowing up to 8 devices per x16 link. In a clever move associated with Radeon Instinct GPU systems, some 2P systems can gain more I/O lanes by disabling a socket-to-socket link, providing up to 162 PCIe 4.0 lanes in a two-socket server. These configurations require a dedicated platform that is not compatible with first-generation Naples systems.
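The lane trade-off between socket-to-socket links and PCIe can be sketched as arithmetic. This assumes details not stated in the article: each socket has 128 configurable lanes in x16 groups, a two-socket system uses up to four socket-to-socket links, and each link consumes one x16 group on each socket. The model yields 160 lanes with three links; the two additional lanes in the 162 figure are not modeled here.

```python
# How a two-socket system can expose 128 or 160 PCIe lanes.
# Assumptions: 128 configurable lanes per socket in x16 groups, up to four
# socket-to-socket links, each consuming one x16 group per socket. The two
# extra lanes in the 162-lane figure are not modeled.

LANES_PER_SOCKET = 128
LANES_PER_LINK = 16
MAX_LINKS = 4


def usable_pcie_lanes(sockets: int, inter_socket_links: int) -> int:
    if sockets not in (1, 2):
        raise ValueError("model covers 1P and 2P systems only")
    if sockets == 1 and inter_socket_links != 0:
        raise ValueError("a single socket has no socket-to-socket links")
    if not 0 <= inter_socket_links <= MAX_LINKS:
        raise ValueError("link count out of range")
    per_socket = LANES_PER_SOCKET - inter_socket_links * LANES_PER_LINK
    return sockets * per_socket


for sockets, links in ((1, 0), (2, 4), (2, 3)):
    print(f"{sockets}P with {links} links: "
          f"{usable_pcie_lanes(sockets, links)} PCIe lanes")
```

With all four links in place, each socket keeps 64 lanes, so a 2P server offers the same 128 lanes as a 1P server; giving up one link returns 16 lanes per socket.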

All Rome processors can run in single-socket servers, but AMD keeps some models exclusively for single-socket systems to drive that particular ecosystem.


The AMD EPYC Rome processor looks powerful and offers an unprecedented core count. We will have to wait for third-party lab verification, but if the chip lives up to expectations, Rome may prove to be AMD's turning point in the data center.

Intel is busy promoting its platform-level benefits, such as tight integration with accelerators and Optane DC persistent memory, which can be seen either as complementary, value-adding products or simply as vendor lock-in, depending on your point of view.

Intel will also make sure its partners and customers know that it has its own high-core-count products, namely the 14nm 56-core Cooper Lake models, but these chips will not be released until next year, and Intel's current products are still built on 14nm without PCIe 4.0 connectivity. Clearly, facing such competition, Intel is trying to keep customers from investing in EPYC Rome processors.

For data center and enterprise customers, qualifying the software stack and hardware configuration requires considerable validation, especially for mission-critical applications. Given the time and money required to develop systems around new hardware, AMD has to be confident it can convince customers to switch. That is why AMD communicates its roadmap and strategy: it wants prospective customers to know that these investments will pay off in the long run.

As AMD wisely did with its first-generation Naples processors, its goal is to serve hyperscale cloud service providers and help them reduce overhead. Winning over CSPs (cloud service providers) also fosters an ecosystem of cloud instances that potential customers can use to test the new hardware without an upfront investment.

If Rome delivers on its promise, Intel's main advantage may be the strong relationships it has built with large OEMs and ODMs, which underpin its dominant position in the data center; Intel has been reminding us of this advantage over the past few weeks. But the industry has long wanted real competition to keep prices in check. Rome undoubtedly helps here, and if the chip is as good as most analysts expect, AMD could change the entire data center market.
