Mention FPGAs and most people think first of the major vendors Xilinx and Altera (now part of Intel). But there are other distinctive FPGA vendors, such as Achronix, a company that supplies FPGA-based hardware acceleration devices and high-performance embedded FPGA (eFPGA) semiconductor intellectual property (IP).
Thanks to the rapid development of artificial intelligence and machine learning in recent years, new algorithms keep appearing, which is driving rapid growth in the programmable FPGA market. According to market research firm Semico Research, the market for FPGAs in AI applications will triple over the next four years to $5.2 billion.
The latest IP market analysis report from market research firm IPnest shows that Achronix was the world's fastest-growing IP provider in 2018, with year-on-year growth of 250%, which shows how quickly its business is expanding.
To further meet the growing demand for artificial intelligence/machine learning (AI/ML) and high-bandwidth data acceleration applications, Achronix introduced a new FPGA family, the Speedster7t series, in May of this year.
New architecture: the perfect combination of ASIC and FPGA
It is well known that, for AI acceleration, the computing power and efficiency of an ASIC are customized directly to the needs of a specific algorithm, unlike general-purpose chips such as CPUs and GPUs or programmable FPGAs. An ASIC can therefore offer small size, low power consumption, high reliability, strong confidentiality, and high computational performance and efficiency. In its specific application area, the energy efficiency of an ASIC far exceeds that of general-purpose chips such as CPUs and GPUs, as well as programmable FPGAs.
But, as mentioned earlier, AI algorithms are still being updated and iterated rapidly, and the range of numerical precisions to choose from keeps growing. At the same time, with the rapid development of AI application scenarios, new solutions must address different needs in terms of high performance, flexibility, and time to market.
An ASIC is designed to accelerate a specific algorithm, which makes it far less flexible than an FPGA, which can be reprogrammed quickly to adapt to new software algorithms. However, FPGAs cannot match ASICs in size, energy efficiency, or cost. So is there a product that combines the advantages of FPGAs and ASICs? Achronix's Speedster7t series may be such a product.
Achronix claims that the Speedster7t series is based on a highly optimized new architecture that combines ASIC-like performance with FPGA flexibility and adds enhancements that go far beyond traditional FPGA solutions.
▲Achronix CEO Robert Blake
Achronix Semiconductor President and CEO Robert Blake said: "Speedster7t is the most exciting release in Achronix history. It represents the innovation and experience accumulated over four generations of architecture in hardware and software development, together with close cooperation with our leading customers. Speedster7t is a fusion of flexible FPGA technology and ASIC core efficiency, providing a new 'FPGA+' chip category that can dramatically push the limits of high-performance technology."
Speedster7t FPGA Series Details
According to Achronix, the Speedster7t FPGA family is designed for high-bandwidth applications and features a revolutionary new 2D network on chip (NoC) and a high-density array of new machine learning processor (MLP) blocks. By combining FPGA programmability with ASIC routing structures and compute engines, the Speedster7t family creates a new class of "FPGA+" technology.
The Speedster7t series also includes high-bandwidth GDDR6 interfaces, 400G Ethernet ports, and PCI Express Gen5 interfaces, all of which are interconnected to provide ASIC-level bandwidth while preserving the full programmability of an FPGA.
These devices must take in large amounts of data from multiple high-speed sources, distribute that data to programmable on-chip algorithmic and processing units, and then deliver the results with as little latency as possible. For the process technology, Achronix therefore chose TSMC's latest 7nm FinFET process to manufacture Speedster7t devices.
New Machine Learning Processor Array
The AI performance of a traditional FPGA built around DSP blocks is relatively limited, because DSP blocks support the required numerical precisions inefficiently. Building AI/ML functions then requires external LUTs and memory, which consumes additional logic and memory resources, and performance is limited by FPGA routing.
In contrast, Speedster7t FPGAs feature a massively parallel array of programmable compute elements in the new machine learning processors (MLPs), which deliver the industry's highest FPGA-based compute density. The MLP is a highly configurable, compute-intensive block with up to 32 multipliers per MAC unit, which feed variable-precision adders/accumulators. It supports fixed-point formats from 4 to 24 bits and efficient floating-point modes, including a 16-bit floating-point format.
Each MLP is also tightly coupled to memory blocks, including 72 Kbit of RAM and 2 Kbit of register file. This tight coupling of compute and storage lets the MLP implement more complex AI algorithms without using FPGA routing resources.
In addition, the MLPs sit directly next to the embedded memory blocks. Eliminating the delays associated with FPGA routing in traditional designs ensures that data is delivered to the MLP at its maximum performance of 750 MHz.
This combination of high-density compute and high-performance data delivery gives the processor array the highest available FPGA-based compute throughput, measured in tera-operations per second (TOPS).
Ultra-high throughput memory bandwidth and interface
The key to high-performance computing and machine learning systems is high off-chip memory bandwidth, which sources and buffers multiple data streams. Speedster7t devices are the only FPGAs that support GDDR6 memory, the highest-bandwidth external memory. Each GDDR6 memory controller supports 512 Gbps of bandwidth. With up to eight GDDR6 controllers, a Speedster7t device supports an aggregate GDDR6 bandwidth of 4 Tbps, providing memory bandwidth equivalent to that of an HBM-based FPGA at a fraction of the cost.
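The aggregate figure follows directly from the per-controller number. The short sketch below only redoes that arithmetic; the function and constant names are illustrative, not part of any Achronix tool.

```python
# Figures quoted in the article.
GBPS_PER_CONTROLLER = 512   # bandwidth of one GDDR6 controller, in Gbit/s
MAX_CONTROLLERS = 8         # maximum number of GDDR6 controllers per device


def aggregate_gddr6_gbps(controllers: int = MAX_CONTROLLERS) -> int:
    """Total GDDR6 bandwidth in Gbit/s for a given number of controllers."""
    if not 0 <= controllers <= MAX_CONTROLLERS:
        raise ValueError("controller count out of range")
    return controllers * GBPS_PER_CONTROLLER


total_gbps = aggregate_gddr6_gbps()
total_tbps = total_gbps / 1000          # decimal tera, about 4 Tbps
total_gbytes_per_s = total_gbps / 8     # the same bandwidth in gigabytes per second

print(f"{total_gbps} Gbps = {total_tbps} Tbps = {total_gbytes_per_s} GB/s")
```

Eight controllers at 512 Gbps each give 4,096 Gbps, which the article rounds to 4 Tbps.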
"Micron is pleased to work with Achronix on the world's first FPGA with directly attached GDDR6 for high-bandwidth memory requirements," said Mal Humphrey, vice president of marketing for Micron's Compute and Networking Business Unit. "Innovative and scalable solutions like this will drive differentiation in the field of artificial intelligence, where heterogeneous computing options and high-performance memory are essential to accelerating the extraction of insight from data."
In addition to this ultra-high memory bandwidth, Speedster7t devices include the industry's highest-performance interface ports to support extremely high-bandwidth data streams. The devices feature up to 72 of the industry's highest-performance SerDes, which operate from 1 to 112 Gbps. They also include hard 400G Ethernet MACs with forward error correction (FEC) that support 4x 100G and 8x 50G configurations, as well as hard PCI Express Gen5 controllers with 8 or 16 lanes per controller.
"Achronix's new Speedster7t FPGA family is an excellent example of an innovative chip architecture created to process large amounts of data directly for AI applications," said Rich Wawrzyniak, chief market analyst for ASIC and SoC at Semico Research. "By integrating math functions, memory, and programmability into its machine learning processors, combined with the cross-chip 2D NoC structure, Achronix has found a great way to eliminate bottlenecks and ensure the free flow of data throughout the device. In AI/ML applications, memory bandwidth is everything, and Achronix's Speedster7t provides impressive performance metrics in this area."
New 2D on-chip network: providing ultra-efficient data movement
The 2D NoC connects to all of the FPGA's high-speed data and memory interfaces. Like a network of elevated highways superimposed on the city-street system of the FPGA interconnect, the Speedster7t NoC supports the high-bandwidth communication required between on-chip processing engines. Each row or column in the NoC is implemented as two 256-bit, unidirectional, industry-standard AXI channels operating at 2 GHz, which delivers 512 Gbps of data traffic in each direction.
By implementing a dedicated 2D NoC, Speedster7t greatly simplifies high-speed data movement and ensures that data streams can easily be directed to any custom processing engine anywhere in the FPGA fabric. Most importantly, the NoC eliminates the congestion and performance bottlenecks that arise in traditional FPGAs when programmable routing and LUT resources are used to move data streams across the device. This high-performance network not only increases the total bandwidth capacity of the Speedster7t FPGA, but also increases the effective LUT capacity while reducing power consumption.
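The per-direction figure can be checked from the channel width and clock. This is a minimal sketch of that arithmetic, with illustrative names and no protocol overhead modeled.

```python
def axi_channel_gbps(width_bits: int, clock_ghz: float) -> float:
    """Raw throughput of one unidirectional channel in Gbit/s.

    bits per cycle * cycles per nanosecond = Gbit/s
    """
    return width_bits * clock_ghz


# Figures quoted in the article: 256-bit channels clocked at 2 GHz.
per_direction = axi_channel_gbps(256, 2.0)

# Each NoC row or column has two such channels, one per direction.
per_row_or_column = 2 * per_direction

print(f"{per_direction} Gbps per direction, {per_row_or_column} Gbps per row or column")
```

A 256-bit channel at 2 GHz moves 512 Gbit every second, which matches the quoted 512 Gbps in each direction.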
Take the clock frequency needed to carry a 400G Ethernet stream as an example. The best a traditional FPGA can do is a 1024-bit bus, but the required frequency is then 724 MHz, which a traditional FPGA cannot achieve. Clearly, traditional FPGAs do not run fast enough to handle the full bandwidth of a 400G Ethernet stream.
In contrast, a Speedster7t FPGA can use the 2D NoC to split the stream across four 256-bit buses running at 506 MHz.
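To see why bus width and clock frequency trade off against each other, the sketch below computes the lowest clock at which a bus can carry a given line rate. It ignores all packet and protocol overhead, which is a simplifying assumption; the 724 MHz quoted above for a 1024-bit bus is higher than this raw minimum, presumably because real traffic does not fill every bus cycle. The function name is illustrative.

```python
def min_clock_mhz(line_rate_gbps: float, bus_bits: int, lanes: int = 1) -> float:
    """Lowest clock in MHz at which `lanes` parallel buses of `bus_bits` bits
    each can carry `line_rate_gbps`, ignoring packet and protocol overhead."""
    if bus_bits <= 0 or lanes <= 0:
        raise ValueError("bus width and lane count must be positive")
    total_bits_per_cycle = bus_bits * lanes
    return line_rate_gbps * 1000 / total_bits_per_cycle


single_wide_bus = min_clock_mhz(400, 1024)          # one 1024-bit bus
single_narrow_bus = min_clock_mhz(400, 256)         # one 256-bit bus
four_narrow_buses = min_clock_mhz(400, 256, lanes=4)  # four 256-bit buses in parallel

print(single_wide_bus, single_narrow_bus, four_narrow_buses)
```

Four 256-bit buses have the same raw floor as one 1024-bit bus, so the benefit of splitting lies in each narrower bus being easier to route and to keep full, not in a lower theoretical minimum.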
Security features for security-critical and hardware-assured applications
To counter threats from third-party attacks, the Speedster7t FPGA family offers state-of-the-art bitstream security, with multiple layers of defense that protect the confidentiality and integrity of the bitstream. Keys are encrypted using a tamper-resistant physically unclonable function (PUF), and bitstreams are encrypted and authenticated with 256-bit AES-GCM. To defend against side-channel attacks, the bitstream is segmented, each segment uses a separately derived key, and the decryption hardware employs differential power analysis (DPA) countermeasures. In addition, a 2048-bit RSA public-key authentication protocol is used to activate the decryption and authentication hardware. Users can be confident that when they load a secure bitstream, it is the intended configuration, because it has been authenticated by the RSA public key, the AES-GCM private key, and a CRC.
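The idea of giving each bitstream segment its own derived key can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the article does not describe Achronix's actual key-derivation function, so HMAC-SHA256 stands in for it here, the segment size and all names are hypothetical, and the AES-GCM step itself is omitted because it is not in the Python standard library.

```python
import hashlib
import hmac


def derive_segment_key(root_key: bytes, segment_index: int) -> bytes:
    """Derive a 256-bit key for one segment (illustrative KDF, an assumption)."""
    label = b"segment-" + segment_index.to_bytes(4, "big")
    return hmac.new(root_key, label, hashlib.sha256).digest()


def split_segments(bitstream: bytes, size: int) -> list[bytes]:
    """Cut the bitstream into fixed-size segments (the last one may be shorter)."""
    return [bitstream[i:i + size] for i in range(0, len(bitstream), size)]


root_key = bytes(32)                  # placeholder; a real device would use a PUF-protected key
bitstream = bytes(range(256)) * 4     # 1024 bytes of dummy configuration data
segments = split_segments(bitstream, 256)
segment_keys = [derive_segment_key(root_key, i) for i in range(len(segments))]

# Every segment gets a different 32-byte key, so little data is ever processed under one key.
print(len(segments), len(set(segment_keys)))
```

Because each key protects only one segment, an attacker measuring power consumption sees few operations under any single key, which is the property that segmentation is meant to provide against side-channel analysis.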
Four Speedster7t FPGA Family Products
The Speedster7t FPGA family currently has four products, ranging in size from 363K to 2.6M 6-input look-up tables (LUTs).
In terms of specific performance figures, Achronix revealed that the 7t1500, the fastest device in the Speedster7t FPGA series, can recognize up to 8,600 images per second on the ResNet-50 model when running at its maximum frequency of 750 MHz with 80% utilization and with each MLP block performing 16 Int8 operations. With the YOLOv2 algorithm, the 7t1500 can recognize 1,600 images per second.
According to Achronix CEO Robert Blake, the ACE design tools that support all Achronix products are available now, including support for Speedcore eFPGA and Speedchip™ FPGA chiplets. The first Speedster7t FPGA devices and evaluation boards will be available in the fourth quarter of 2019.
From the above, it is easy to see that the Speedster7t FPGA series combines FPGA programmability with ASIC routing structures and compute engines, mainly through its new 2D NoC and its high-density array of new machine learning processor blocks. This is similar to the ACAP architecture that Xilinx introduced last year.
It should be noted that Achronix is the only company that offers both stand-alone FPGA chips and Speedcore™ embedded FPGA (eFPGA) semiconductor intellectual property (IP). In other words, chip design vendors can license Achronix's Speedcore™ eFPGA IP, integrate it into their own chip designs, and build chips that meet their needs.
Achronix uses the same technology in Speedcore eFPGA IP as in Speedster7t FPGAs, which supports a seamless transition from Speedster7t FPGAs to ASICs. This also means that chip designers can start with the latest Speedster7t FPGA technology and convert their design to an ASIC by working with Achronix. Achronix CEO Robert Blake said this approach is expected to help customers cut power consumption by up to 50% and reduce costs by 90%.