Back in the days of the dual-core Pentium D, "glued-together" multi-core drew nothing but scorn. But in 2018 it is not only AMD that relies on MCM glue: Intel, which just last year attacked AMD's EPYC processors over the supposed weaknesses of their glued design, such as high latency and poor performance, has itself launched a glued 48-core processor this year, with more glued multi-core products on the way. The technology is flourishing.
So is glue multi-core actually good? That is not a simple question to answer. In today's Super Class we will look at the past and future of MCM glue multi-core technology.
Moore's Law is failing: raising frequencies and adding cores are both getting harder
For CPUs, people ultimately pursue only three things: higher performance, lower power consumption and a lower price. Price depends not just on technology but on the manufacturer's business strategy, so it cannot be explained by technology alone; performance and power consumption, however, are directly tied to technology, and of these, performance improvement matters most.
Intel's consumer CPUs have already reached 18 cores
Under current conditions there are only two ways to raise CPU performance: increase the operating frequency, or increase the number of cores. But with semiconductor technology facing bottlenecks, neither is easy, and it is harder still when both are demanded at once, because today we want CPUs that are both high-frequency and many-core.
The 28-core Skylake-SP die layout is already very complex, with a die area as large as 698 mm²
In the past two years, with a push from AMD, Intel has accelerated its move to more cores. Until last year its desktop line topped out at 10 cores and 20 threads; in 2017 it launched the 18-core, 36-thread Core i9-7980XE, and on the server line the 28-core, 56-thread Skylake-SP. But Intel paid a considerable price: the 28-core parts use the XCC die layout, which is extremely complex, with a die area of 698 mm², while ordinary 4- and 6-core desktop dies are still between 100 and 200 mm².
Obviously, if we need more CPU cores, the die area is bound to grow. So what is the harm in a larger die? A complete answer would take a long time, but the simplest explanation is that on the same process and wafer, the larger the die, the fewer dies you get per wafer, and a larger die is more likely to contain a defect, reducing the effective yield further.
The impact of die size on yield (source: Stack Exchange)
This is an answer from Stack Exchange explaining why image sensors are not made larger. The smaller the chip, the more efficiently the wafer is used: higher yield, less waste, lower cost. The larger the die, the greater the waste and the lower the yield. The example concerns image sensors, but the same holds for all semiconductor chips.
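The relationship can be made concrete with a back-of-the-envelope calculation. Below is a minimal sketch using the standard Poisson die-yield model and a common dies-per-wafer approximation; the 300 mm wafer size and the defect density of 0.2 defects/cm² are illustrative assumptions, not vendor data:

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate gross dies per wafer (ignores scribe lines)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Second term corrects for partial dies lost at the wafer edge.
    return int(wafer_area / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies with zero random defects (Poisson model)."""
    defects_per_die = defect_density_per_cm2 * die_area_mm2 / 100.0
    return math.exp(-defects_per_die)

for area in (100, 200, 700):  # small desktop die vs a large server die, mm^2
    print(area, dies_per_wafer(area), round(poisson_yield(area, 0.2), 2))
```

With these assumptions, going from a 100 mm² die to a 700 mm² one cuts the candidate dies per wafer by roughly 8x while the defect-free fraction drops from about 82% to about 25%, which is exactly the double penalty the sensor answer describes.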
Some will ask: doesn't a more advanced process help raise frequency, reduce power consumption and shrink die area? Yes, that is exactly the role of Moore's Law: semiconductor manufacturers improve CPU performance and die area by improving the process. But the problem is that Moore's Law has long since broken down, and not merely at today's 14nm and 10nm nodes; in the strict sense it broke after 28nm. Intel has not publicly acknowledged this, but judging by the actual results of most semiconductor manufacturers and existing chips, Moore's Law has delivered little in recent years, and the gains in transistor density and performance from each new process are smaller and smaller.
On the other hand, the R&D and manufacturing cost of advanced processes keeps climbing, and that increase applies to overall cost, especially after 28nm; moving from the 14/16nm processes to the 7nm node is a big jump. AMD gives an example comparing the cost of a 250 mm² die across processes from 45nm to 7nm. Taking 45nm as the 100% baseline, 28nm costs about 1.8x, the 20nm node 2.0x, and 14/16nm slightly above 2.0x, but at the 7nm node the cost rises to 4.0x, roughly double today's 14/16nm process cost.
According to an article previously published by the professional website Semiconductor Engineering, developing a chip at the 28nm node requires only about $51.3 million, while a 16nm chip needs about $100 million and a 7nm chip about $297 million.
Even leaving aside the huge investment, producing chips at 10nm and below is technically much harder. Intel has still not brought its 10nm process to production, and although TSMC and Samsung have reached 7nm, their process capability and current applications remain some way from the high-performance processors discussed here. In short, we cannot count on new process nodes to solve the problems of CPU frequency and die area.
When one chip can't win the fight alone, send a group: MCM multi-chip design is back in favor
With semiconductor processes gradually approaching physical limits, we cannot expect future 7nm, 5nm or even 3nm processes to come to the rescue. But the drawbacks described above all concern monolithic chips. If a single chip is hard to scale up, why not use several? That is the MCM (multi-chip module) design, the very "glue" multi-core that everyone ridicules.
The MCM multi-chip module is nothing new; the technology has decades of history, and many different MCM approaches have been derived over the years. So although they all look like "glue" multi-core, different "glues" perform differently, and chip packaging technology has kept advancing all the while.
Between AMD and Intel, it was actually Intel that first used MCM glue multi-core: it employed MCM packaging as early as the Pentium Pro. But the example everyone remembers is the glued dual-core Pentium D. In that era, in order to be first to market with a dual-core processor, Intel resorted to MCM gluing on the Presler-based Pentium 4 design, and thereby claimed the honor of launching dual-core first.
Of course, the Pentium D's market performance was unsatisfactory, but that had little to do with MCM; the blame lies with the Pentium 4 architecture itself, which simply wasn't up to it. MCM merely amplified everyone's dissatisfaction.
After that, Intel and AMD seldom used MCM in their processors and stuck with native multi-core designs; after all, that is how a multi-core processor ought to be built. But as core counts grew from single digits into the tens, the limitations of monolithic multi-core became more and more serious: not only is manufacturing difficult and yield low, the approach is also inflexible, because beyond the core count a processor must accommodate IO blocks such as memory channels and PCIe lanes. The Skylake-SP lineup shows this: to cover different core counts, Intel had to use three different internal die layouts, XCC, HCC and LCC, which undoubtedly adds to the chips' complexity.
Monolithic design is becoming ever more complex and expensive. Intel, with its wealth and technological lead, may be able to keep going down that road, but AMD cannot. Whether in desktop or server processors, AMD has to fight Intel on price; more cores at a lower price are its weapons, so the monolithic route was unsustainable, and AMD turned to MCM for its Ryzen and EPYC processors.
In this architecture AMD combines two CCX units into one module to make an 8-core, 16-thread processor; that is the desktop Ryzen 7, while the first-generation EPYC reaches 32 cores and 64 threads by packaging four such 8-core modules together. We gave a detailed technical introduction in our earlier first-look review, so we won't repeat it here; let's look instead at why AMD did it.
The answer is simple: to save money. AMD has compared the pros and cons of MCM and monolithic designs for the 32-core EPYC. A native monolithic 32-core die would be only 777 mm², while the current MCM design uses four 213 mm² dies, 852 mm² in total, wasting about 10% of die area compared with a single chip.
But building four small 213 mm² dies is far easier than one large 777 mm² die, whose yield would be too low. How low? Per figures AMD published this year, a fully working monolithic 32-core die would yield below 17%, a cost AMD could not afford.
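AMD's numbers can be sanity-checked with the standard Poisson die-yield model. In the sketch below, the defect density of 0.23 defects/cm² is my own illustrative assumption, chosen so the monolithic yield lands near AMD's quoted "below 17%"; only the die areas come from the article:

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies free of random defects (Poisson model)."""
    return math.exp(-defect_density_per_cm2 * die_area_mm2 / 100.0)

D0 = 0.23  # defects/cm^2 -- assumed, fitted to AMD's ~17% monolithic figure

mono = poisson_yield(777, D0)     # hypothetical native 32-core die
chiplet = poisson_yield(213, D0)  # one 8-core MCM die
overhead = 4 * 213 / 777 - 1      # extra silicon spent by the MCM layout

print(f"monolithic 777 mm2 yield: {mono:.0%}")
print(f"per-chiplet 213 mm2 yield: {chiplet:.0%}")
print(f"area overhead of 4 x 213 mm2 vs 777 mm2: {overhead:.0%}")
```

Under this assumed defect density the monolithic die yields about 17% while each chiplet yields over 60%, at the cost of roughly 10% extra silicon, which is exactly the trade AMD describes: a modest area penalty in exchange for a dramatic yield improvement.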
Besides wasting some die area, the MCM design also has a latency problem: communication inside a native multi-core die travels a much shorter distance than communication between separate dies, which is why early Ryzen processors were criticized for memory latency. Yet even with these two drawbacks, AMD pressed ahead with MCM; a roughly 40% reduction in chip manufacturing and testing costs is more than enough to outweigh the negatives, and problems such as latency can be mitigated by other means so that the impact is not significant.
Intel, too, has said that AMD's MCM modules have performance and latency problems
In contrast with AMD's shift to MCM, Intel had insisted on native multi-core design in recent years. Intel's chief architect even wrote an article dissing glue multi-core, arguing that native multi-core has many advantages and, unlike glued designs, makes no performance compromises. Yet not long afterwards Intel launched a glued part of its own: the 48-core Cascade Lake-AP, which is in fact two 24-core Cascade Lake dies joined via MCM, not a native 48-core chip.
Intel's 48-core Cascade Lake-AP is clearly a stopgap. Although Intel's 28-core processors perform no worse than AMD's 32-core parts, they are far more expensive, and AMD has also announced the 64-core 7nm Rome processor this year, further widening the core-count gap with Intel's Xeon line. Intel may manage to ship a 10nm server chip in 2020, but a native 64-core processor will still be hard to build, so more glued multi-core parts from Intel are only a matter of time.
All roads lead to the same place: AMD and Intel are both heading toward heterogeneous MCM
Is MCM glue multi-core stuck in its current form? No. AMD recently announced the Rome processor built on the 7nm Zen 2 architecture, whose headline feature is raising the core count to 64 cores and 128 threads, double today's figure. To reach such a count, AMD will continue with MCM glue multi-core, but this time the MCM is different.
According to the information AMD has released, the 7nm Rome processor uses an 8+1 MCM layout. In this multi-chip architecture AMD separates the CPU cores from the IO: the eight small dies around the edge are pure CPU chiplets, while the IO blocks, such as the DDR memory controllers, the PCIe controller and the Infinity Fabric controller, are consolidated into a single separate die.
Besides separating the CPU cores from the IO, the 7nm Rome processor also mixes processes: the central IO die is built on a 14nm process and fabbed by GlobalFoundries, while the surrounding CPU chiplets use a 7nm process fabbed by TSMC. This too is done to cut costs, since the IO die does not need such an advanced process.
AMD's MCM structure on the Rome processor recalls Intel's earlier EMIB multi-chip packaging technology; in this respect the two companies have arrived at the same idea. Both integrate dies from different processes in a single processor package: in Intel's EMIB packaging, the CPU and integrated-graphics dies can be 10nm while the communications and other IP dies are 14nm or even 22nm.
Intel has also compared EMIB packaging with traditional 2.5D packaging, saying that EMIB offers normal package yield, requires no extra process steps, simplifies design, and so on.
Conclusion: MCM glue multi-core will probably be the norm for future processors
From ridicule back to favor, the MCM multi-chip module has once again become a powerful weapon for multi-core processors, above all in servers with tens of cores. And today's MCM designs are no longer the simple, crude glue of old: Intel has said the latency of its EMIB technology is only about 10% higher than that of a monolithic chip, where other approaches can add as much as 50%.
That said, the impact of MCM multi-chip technology on mainstream desktop processors will not be as great; for the next two years, high-end desktop parts should still top out around 8 cores and 16 threads. Even so, whether AMD's next-generation Ryzen 3000 desktop processors will adopt the separated core/IO design is well worth watching.