Xilinx has announced that Versal, its new generation of chips based on the ACAP architecture, has begun shipping to its first customers, with broader availability expected in the second half of this year. This means the new architecture Xilinx has spent years developing has finally reached the stage of market testing.
Since Xilinx unveiled the ACAP architecture in 2018, it has drawn considerable attention from the industry. ACAP stands for "Adaptive Compute Acceleration Platform". Besides the configurable logic of an FPGA, it includes ARM processor cores, AI Engines, and DSP Engines. Chips built on the ACAP architecture can therefore serve three kinds of needs: the ARM cores run general-purpose, less performance-critical tasks such as the operating system; the FPGA's configurable logic implements custom logic; and the AI Engines run high-performance, AI-specific computations such as matrix operations.
The AI Engine, the newest addition in the ACAP architecture, is the focus of attention. According to information released by Xilinx, the AI Engine is an array of SIMD cores. Each core contains a complete RISC processor, a fixed-point SIMD unit, a floating-point SIMD unit, and local memory, and the cores are connected to one another through on-chip interconnect, allowing highly flexible data flow. Such an architecture is, in effect, a many-core architecture. The Versal AI Core series in this release integrates 128 to 400 AI Engines, delivering 43 to 133 TOPS of INT8 fixed-point compute.
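As a rough consistency check on these figures, the short Python sketch below divides the quoted INT8 throughput by the number of AI Engines, then recomputes peak throughput from an assumed per-engine rate. The rate of 128 INT8 multiply-accumulates (MACs) per cycle and the 1.3 GHz clock are our assumptions for illustration, not figures from this article; each MAC is counted as two operations, the usual convention for TOPS ratings.

```python
# Endpoints quoted for the Versal AI Core series: (AI Engine count, INT8 TOPS).
SERIES_ENDPOINTS = [(128, 43), (400, 133)]

# Assumptions for illustration only (not stated in the article):
ASSUMED_INT8_MACS_PER_CYCLE = 128
ASSUMED_CLOCK_GHZ = 1.3


def tops_per_engine(engines: int, int8_tops: float) -> float:
    """Average INT8 throughput contributed by one AI Engine, in TOPS."""
    return int8_tops / engines


def peak_tops(engines: int, macs_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput in TOPS, counting each MAC as 2 ops (multiply + add)."""
    ops_per_second = engines * macs_per_cycle * 2 * clock_ghz * 1e9
    return ops_per_second / 1e12


if __name__ == "__main__":
    for engines, quoted in SERIES_ENDPOINTS:
        estimate = peak_tops(engines, ASSUMED_INT8_MACS_PER_CYCLE, ASSUMED_CLOCK_GHZ)
        print(
            f"{engines} engines: quoted {quoted} TOPS, "
            f"{tops_per_engine(engines, quoted):.3f} TOPS/engine, "
            f"estimated peak {estimate:.1f} TOPS"
        )
```

Under these assumptions the estimate lands close to both quoted endpoints (about 42.6 and 133.1 TOPS), which suggests that throughput scales roughly linearly with engine count; the actual per-engine rate and clock may vary by device and speed grade.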
Why has Xilinx added a dedicated AI Engine to the FPGA? To answer that, let's review how FPGAs have evolved.
When FPGAs first appeared, their biggest selling point was that they could be configured as arbitrary logic, as long as the design did not exceed the FPGA's logic capacity. Initially, FPGAs were simply expected to meet functional requirements.
Under this expectation, a system using an FPGA could be divided into two parts: the main processing part, implemented on a general-purpose processor, and the special-function part, implemented on the FPGA. In other words, the FPGA's main role was that of a coprocessor.
However, as user needs grew and FPGA vendors expanded their businesses, being only a coprocessor was no longer enough. From the user's point of view, this master-slave processor system is cumbersome to design and integrate: building a reliable interconnect between the master and slave processors and designing the data communication protocol take considerable effort. As FPGA capacity grew, many users tried to implement a processor core in the FPGA fabric to run the OS and control the rest of the programmable logic. This approach is inefficient, however, because the processor core is a general-purpose processor, while FPGAs are best suited to custom logic rather than general-purpose processing. From the business perspective of Xilinx and Altera, the profit margin of the FPGA-as-coprocessor market is clearly lower than that of a complete system. Driven by both supply and demand, Xilinx therefore introduced the Zynq series, which combines ARM hard cores with FPGA fabric, and Altera introduced a similar concept with its SoC family.
Products such as Zynq and Altera SoC contain both an ARM processor hard core and programmable logic. Because the processor is a hard core, it can run at clock frequencies in the GHz range, whereas a soft core implemented in FPGA fabric typically runs at only around 100 MHz. The integrated ARM hard core thus greatly improves the performance of the general-purpose processor, so that it is no longer the system's performance bottleneck. In addition, because the ARM hard core and the FPGA sit on the same chip, abundant high-speed interconnect resources link them, making data exchange between the ARM core and the FPGA logic efficient. The toolchain also provides good support for the ARM core and the FPGA working together. Combined, these elements amount to an important breakthrough: in many scenarios, a chip containing an ARM core and an FPGA can work directly as a complete system without relying on a separate host processor. An FPGA with an integrated ARM core is therefore no longer just a coprocessor; it can serve as the main processor.
Today, as FPGAs target the artificial intelligence market, the same situation is playing out again. Many users want to use FPGAs for AI computation, but the FPGA fabric's ability to implement such computation is limited. On the other hand, mainstream AI computation is quite regular and is based largely on matrix operations. Xilinx has therefore added to the ACAP architecture a hard core that can flexibly support such operations, the AI Engine, hoping to support AI computation as a complete system and gradually move from chip vendor to system vendor.
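To make "regular matrix operations" concrete, here is a plain Python sketch of an INT8 matrix multiply, the kind of kernel the AI Engine's fixed-point SIMD units are meant to speed up. The function names are hypothetical, and the 32-bit accumulator is our assumption about common practice: a product of two INT8 values needs 16 bits, and summing many such products needs extra headroom.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def matmul_int8(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Multiply INT8 matrices a (m x k) and b (k x n) with a 32-bit accumulator.

    Python ints never overflow, so the INT8 input range and the INT32
    accumulator range are checked explicitly to mimic fixed-point hardware.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    for matrix in (a, b):
        for row in matrix:
            for value in row:
                if not INT8_MIN <= value <= INT8_MAX:
                    raise ValueError(f"{value} is outside the INT8 range")

    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
                if not INT32_MIN <= acc <= INT32_MAX:
                    raise OverflowError("accumulator exceeded the INT32 range")
            out[i][j] = acc
    return out


def mac_count(m: int, k: int, n: int) -> int:
    """Number of MACs in an (m x k) by (k x n) matrix multiply."""
    return m * k * n


if __name__ == "__main__":
    a = [[1, -2], [3, 4]]
    b = [[5, 6], [-7, 8]]
    print(matmul_int8(a, b))
    print(mac_count(2, 2, 2))
```

The innermost loop is where the regularity pays off: every output element is the same sequence of MACs over predictable data, which a SIMD unit can process many lanes at a time instead of one at a time as in this sketch.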
ACAP's impact on the artificial intelligence market
ACAP's main markets are cloud and edge computing. In the cloud, the target is data center servers, where ACAP is expected to enter as an accelerator card for AI computation. Amazon AWS already deploys a number of servers with Xilinx FPGAs, and with the release of ACAP, more FPGAs are expected to enter the cloud computing market. Note that ACAP's main strength is low-precision fixed-point (INT8) computation, so its AI applications will center on inference; training, another important cloud AI workload, still requires GPUs, which excel at floating-point operations. For AI inference, FPGAs offer lower power consumption and greater customizability than GPUs, so they will compete with GPUs for market share in this area. However, compared with the mature GPU developer ecosystem, the FPGA development ecosystem is still immature, so the ecosystem, including the toolchain, may be the key to unlocking the FPGA's full potential.
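Inference can run in INT8 because a trained model's weights and activations can usually be mapped onto 8-bit integers with a scale factor, often with only a small loss of accuracy. Below is a minimal Python sketch of symmetric per-tensor quantization, one common scheme among several; it is our illustration, not necessarily what Xilinx's toolchain does, and the function names are hypothetical. The dot product runs entirely on integers and is rescaled to floating point only once at the end.

```python
def quantize_symmetric(values: list[float]) -> tuple[list[int], float]:
    """Map floats to INT8 using one scale per tensor and a zero point of 0.

    The largest magnitude maps to 127; codes are clamped to [-127, 127]
    so that positive and negative values are treated symmetrically.
    """
    if not values:
        raise ValueError("cannot quantize an empty tensor")
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in quantized]


def quantized_dot(x: list[float], w: list[float]) -> float:
    """Dot product computed with integer MACs, then rescaled once."""
    qx, sx = quantize_symmetric(x)
    qw, sw = quantize_symmetric(w)
    acc = sum(a * b for a, b in zip(qx, qw))  # integer-only accumulation
    return acc * sx * sw


if __name__ == "__main__":
    x = [0.1, -0.4, 0.9]
    w = [0.5, 0.3, -0.2]
    exact = sum(a * b for a, b in zip(x, w))
    print(f"float dot: {exact:.5f}, INT8 dot: {quantized_dot(x, w):.5f}")
```

For these sample vectors the INT8 result differs from the floating-point result by well under 1 percent, while every multiply in the hot loop is an 8-bit integer operation of the kind fixed-point SIMD hardware handles cheaply.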
Beyond the cloud, edge computing will be a key market for ACAP. Edge computing sits between end devices and the cloud: edge servers are deployed near IoT terminals, collect data from those devices, process it locally, and upload only the necessary information to the cloud. The Internet of Things, 5G, and autonomous driving have greatly expanded the potential of the edge computing market, but they also impose new requirements: besides networking and data storage, edge computing must perform real-time AI computation. This is the market ACAP mainly targets. Communication base stations are already one of the major markets for FPGAs, so as 5G converges with edge computing and AI, ACAP is clearly well positioned for this trend and is expected to win a larger share of the communications market as 5G-based applications take off.
For AI chip startups, ACAP will be a strong competitor in both cloud and edge computing. Given ACAP's computing power (43 to 133 TOPS INT8), we believe it will be very competitive in edge computing. In the cloud, demand for computing power will soon exceed 100 TOPS, so there ACAP is more likely to compete on configurability than on raw compute. Because FPGAs are expensive, we believe ACAP will mainly target the high-end market; overall, however, we expect ACAP to become Xilinx's vehicle for entering the AI market.
As for Xilinx's old rival Intel, its strategy for dealing with ACAP is not to launch a chip with similar functions but to exploit its packaging technology and chiplets. In April of this year, Intel released its next-generation FPGA architecture, Agilex, which uses EMIB packaging to combine the FPGA with other chiplets in a single package. If a customer integrates an AI acceleration chiplet into an Agilex system, it can achieve an effect similar to Xilinx's ACAP. In this respect, Intel's Agilex approach is more flexible than Xilinx's: ACAP is optimized only for AI-related computation, whereas what an Agilex system is optimized for depends on which chiplets the customer chooses to integrate. However, the chiplet ecosystem is not yet mature, so cost, toolchain, and production capacity remain open questions. We therefore believe that Agilex's first customers will mainly be large customers that cooperate deeply with Intel, use EMIB technology, and have the engineering strength to understand or even customize chiplet-based systems. In our view, Xilinx's ACAP is a bet on the AI market that aims to reach more customers quickly, while Intel is reluctant to give up the FPGA's flexibility: it does not want to tie FPGAs closely to AI this early and instead hopes FPGAs can pioneer its chiplet ecosystem. This also reflects how the two companies are organized. At Intel, FPGAs are just one division, and their product definition must follow Intel's overall strategy; at Xilinx, FPGAs are the whole business, so everything revolves around them.
For Chinese chip companies, it is also important to recognize the trend of integrating FPGAs with dedicated accelerators and processor hard cores. Although China still needs to catch up in the FPGA field, the FPGA SoC architecture represents a horizontal expansion rather than a vertical deepening of FPGA technology, so it can be pursued in parallel with FPGA R&D, helping Chinese companies keep pace with future FPGA development. From another perspective, the next step in FPGA development is not only making FPGA logic larger and faster but also combining FPGAs with other components to form SoCs with sufficient flexibility and performance. As the FPGA IP market develops, Chinese chip companies could also consider licensing FPGA IP together with dedicated accelerator IP to build their own "ACAP".