Cadence is a player in the AI industry whose Tensilica IP products remain highly visible, appearing in popular SoCs such as HiSilicon's Kirin line and MediaTek chipsets. As the industry shifts cloud-based AI inference onto edge devices themselves, the market for on-device neural network inference is exploding, driven by the need for lower power and lower latency.
Lei Feng has learned that this week Cadence pointed to demand for inference across a wide range of applications, from the Internet of Things and mobile AR/VR to intelligent monitoring and automotive, and announced a new dedicated neural network accelerator IP, the DNA 100, aimed at accelerating neural network inference at the edge.
Cadence says applications such as automotive will carry a large number of sensors, including cameras, lidar, and ultrasound, and the need for inference performance there is urgent. Standard DSPs will handle the main signal-processing tasks, but the tasks that actually extract meaning from the data will be handed to neural network accelerators such as the DNA 100, which processes perception and decision-making workloads.
Cadence claims the DNA 100 holds a 4.7x performance advantage over competing solutions with similarly sized MAC engines. It achieves this with its sparse compute architecture: only non-zero activations and weights are computed, yielding higher effective MAC utilization than competitors.
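The compute saving from sparsity can be illustrated with a toy multiply-accumulate loop that skips zero operands. This is a hypothetical software simplification for intuition only; the DNA 100's actual sparse compute engine is a hardware design whose internals Cadence has not disclosed at this level.

```python
def sparse_dot(activations, weights):
    """Toy model of sparse compute: multiply-accumulate only where
    both the activation and the weight are non-zero, and count how
    many MAC operations were actually needed."""
    acc = 0
    macs_used = 0
    for a, w in zip(activations, weights):
        if a != 0 and w != 0:
            acc += a * w
            macs_used += 1
    return acc, macs_used

# A pruned network produces many zero weights; ReLU produces many
# zero activations, so most MAC slots can be skipped.
result, macs = sparse_dot([1, 0, 2, 0], [3, 4, 0, 5])
print(result, macs)  # 3 1  -- one real MAC instead of four
```

On heavily pruned networks, the fraction of skipped operations is large, which is where the claimed utilization advantage over dense architectures comes from.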
Architecturally, the DNA 100 resembles other inference accelerators, and its most important processing capability lies in what Cadence calls the sparse compute engine.
The MACs natively operate on 8-bit integers, running quantized models at full throughput, but they also support 16-bit integers at half rate and 16-bit floating point at quarter throughput. A single MAC/sparse-compute engine scales across 256, 512, or 1024 MACs, and the IP can be extended further by adding more engines, up to four. This means the maximum configuration of a single DNA 100 hardware block contains up to 4096 MACs.
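The configuration space described above can be captured in a small back-of-the-envelope model. The relative throughput ratios (full rate for int8, half for int16, quarter for fp16) follow the description in the text; the function itself is just illustrative arithmetic, not a Cadence API.

```python
def peak_macs_per_cycle(macs_per_engine, engines, dtype="int8"):
    """Peak MAC operations per cycle for a DNA 100 block, per the
    configurations described: 256/512/1024 MACs per engine, up to
    four engines, with reduced rates for wider data types."""
    assert macs_per_engine in (256, 512, 1024)
    assert 1 <= engines <= 4
    rate = {"int8": 1.0, "int16": 0.5, "fp16": 0.25}[dtype]
    return int(macs_per_engine * engines * rate)

print(peak_macs_per_cycle(1024, 4))          # 4096 int8 MACs/cycle (max config)
print(peak_macs_per_cycle(1024, 4, "fp16"))  # 1024 fp16 MACs/cycle
```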
Cadence is well aware that some application scenarios or neural network models may not map onto fixed-function IP, so it offers the option of coupling the DNA 100 with its existing DSP IP. The two products are tightly coupled: the DSP can handle specific network layers the accelerator does not support, passing the kernels back to the DNA 100. This keeps the solution future-proof and extensible to the custom layers customers want.
Bandwidth is a key bottleneck for neural network inference hardware, so compression is essential to getting the best performance out of a constrained platform. Besides reducing bandwidth by compressing weights and activations, the DNA 100 offers very wide interface options for raw bandwidth: one to four AXI interfaces, each 128 or 256 bits wide, for a total bus width of up to 1024 bits in the widest configuration.
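To see why compressing sparse weights and activations saves bandwidth, consider a simple zero-run-length encoding. This scheme is an assumption for illustration; Cadence has not published the DNA 100's actual compression format.

```python
def compress_zero_runs(values):
    """Sketch of zero-skipping compression: runs of zeros collapse
    into a single (0, run_length) token, while non-zero values are
    kept as (1, value) tokens. Hypothetical format, not Cadence's."""
    out = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            j = i
            while j < len(values) and values[j] == 0:
                j += 1
            out.append((0, j - i))  # run of (j - i) zeros
            i = j
        else:
            out.append((1, values[i]))
            i += 1
    return out

# A pruned weight row with 60% zeros shrinks noticeably:
print(compress_zero_runs([5, 0, 0, 0, 7]))  # [(1, 5), (0, 3), (1, 7)]
```

The sparser the tensor, the fewer tokens cross the memory interface, which is exactly the regime a pruned network puts the accelerator in.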
To scale the IP beyond 4096 MACs, multiple hardware blocks can simply be placed side by side on the SoC, greatly increasing theoretical compute capability. Software plays a key role here, as it must distribute workloads properly across the different blocks. Cadence explains that this approach can also be used to speed up a single kernel/inference, and it envisions possible multi-chip scaling through chip-to-chip communication.
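One simple way software can spread work across multiple accelerator blocks is batch-level round-robin dispatch. This is a generic illustration of the idea, not a description of Cadence's scheduler.

```python
def split_batch(requests, num_blocks):
    """Round-robin assignment of inference requests to hardware
    blocks: block i gets requests i, i + num_blocks, i + 2*num_blocks, ...
    (hypothetical dispatch policy for illustration)."""
    return [requests[i::num_blocks] for i in range(num_blocks)]

# Eight queued inferences spread across four DNA 100 blocks:
print(split_batch(list(range(8)), 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Speeding up a single inference, by contrast, requires partitioning one kernel's work (e.g. by output channels or tiles) across blocks, which is a harder scheduling problem than batch parallelism.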
Regarding the DNA 100's performance, Cadence again stresses that the architecture's real-world performance is significantly higher than that of competing architectures with the same number of MACs. As a demonstration, Cadence shows ResNet-50 results with the DNA 100 in its maximum 4K-MAC configuration, giving 4 TMAC/s of raw hardware throughput. According to official figures, the DNA 100 reaches 2,550 fps versus 538 fps for a competing solution, a 4.7x performance advantage, along with a 2.3x advantage in energy efficiency. Of course, the network in the test was pruned to achieve the best result on the DNA 100.
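The quoted frame rates are internally consistent with the headline speedup claim, as a quick check shows:

```python
# Official figures quoted in the announcement:
dna100_fps = 2550
competitor_fps = 538

speedup = dna100_fps / competitor_fps
print(round(speedup, 2))  # 4.74, matching the claimed ~4.7x advantage
```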
On the software side, Cadence provides a complete software stack and a neural network compiler to take full advantage of the hardware, including a network analyzer and optimizer as well as the required device drivers. Cadence also recently announced support for Facebook's Glow, a cross-platform machine learning compiler.
The DNA 100 hardware IP will be available for licensing in early 2019, and products built on it could appear as early as the end of 2020.