We covered AI basics, the development of AI, and AI scholars in "Summary of Tsinghua AI Chip Report One" and "Summary of Tsinghua AI Chip Report two". In this part, we look at the AI chip industry and its trends, as well as representative enterprises in the field.
As AI chips continue to develop, their applications will expand in many directions over time. At present, AI chip applications are concentrated in a few fields, and here we introduce several of them.
In September 2017, Huawei released the Kirin 970 chip at the IFA consumer electronics show in Berlin, Germany. Equipped with a Cambricon NPU, it was billed as the "first AI chip for smartphones in the world". In mid-October 2017, the Mate 10 series, powered by the Kirin 970, was launched. With the NPU on board, Huawei Mate 10 series smartphones have strong deep learning and on-device inference capabilities, so photography and image processing applications based on deep neural networks can give users a better experience.
Apple has released phones such as the iPhone X with the built-in A11 Bionic chip. The Neural Engine in the A11 Bionic, an independently developed dual-core design, can perform up to 600 billion neural network operations per second. The Neural Engine made the A11 Bionic a real AI chip. The A11 Bionic greatly improved the photo-taking experience on the iPhone X and enabled some innovative new uses.
ADAS (advanced driver assistance systems) is one of the most attractive AI applications. It needs to process a large amount of real-time data collected by lidar, millimeter-wave radar, cameras and other sensors. Compared with traditional vehicle control methods, intelligent control methods rely on models of the controlled object and on learning from combined sensor information, and include neural network control and deep learning methods. Thanks to the rapid development of AI chips, these algorithms are gradually being applied in vehicle control.
Devices that rely on computer vision, such as smart cameras, unmanned aerial vehicles, dashcams, face recognition greeting robots and smart handwriting boards, often need local inference. If they could only work while connected to the network, the user experience would undoubtedly suffer. At present, computer vision appears to be one of the most fertile fields for artificial intelligence applications, and computer vision chips have broad market prospects.
The representative VR device chip is the HPU, which Microsoft developed and customized for its own device, HoloLens. The chip, manufactured by TSMC, can process data from five cameras, one depth sensor and a motion sensor at the same time. It also accelerates the matrix operations and CNN operations used in computer vision. This allows VR devices to reconstruct high-quality 3D images of people and transmit them anywhere in real time.
For voice interaction device chips, there are two companies in China: Qiying Tailun (Chipintelli) and Yunzhisheng (Unisound). Their chips all have built-in deep neural network acceleration optimized for speech recognition, which enables offline speech recognition. This stable recognition ability makes it possible to deploy voice technology in real products. At the same time, the key steps of voice interaction have seen significant breakthroughs. Speech recognition has moved beyond a single isolated capability: from far-field recognition to speech analysis and semantic understanding, it now forms a complete interaction solution.
Both home robots and commercial service robots need AI solutions that combine dedicated software and chips. A typical company in this field is Horizon Robotics, founded by Yu Kai, former head of Baidu's Institute of Deep Learning. Horizon Robotics also provides artificial intelligence solutions for other embedded systems, such as ADAS and smart home.
This article introduces representative enterprises in the field of AI chip technology in China and abroad, listed in no particular order. The representative Chinese enterprises include Cambricon (Zhongke Cambrian), Vimicro (Zhongxingwei), Horizon Robotics, DeePhi Tech (Shenjian Technology), Lingxi Technology, Qiying Tailun (Chipintelli), Baidu, Huawei, etc. Overseas, they include Nvidia, AMD, Google, Qualcomm, Nervana Systems, Movidius, IBM, ARM, CEVA, MIT/Eyeriss, Apple, Samsung and so on.
Cambricon. Cambricon Technologies was founded in 2016 and is headquartered in Beijing. Its founders are the brothers Chen Tianshi and Chen Yunji of the Institute of Computing Technology, Chinese Academy of Sciences. The company is committed to building core processor chips for intelligent cloud servers, intelligent terminals and intelligent robots. Backed by investors including Alibaba Venture Capital, Lenovo Venture Capital, National Science Investment, China Koturing, Yuanhe Origin and Chungbao Investment, it is the first unicorn startup in the global AI chip field.
Cambricon is described as the first AI chip company in the world to succeed with mature products. It has two product lines: terminal AI processor IP and cloud high-performance AI chips. Cambricon-1A, released in 2016, is the world's first commercial processor dedicated to deep learning. It targets terminal devices such as smartphones, security monitoring, UAVs, wearable devices and intelligent driving, and its performance-per-watt when running mainstream intelligent algorithms surpasses that of traditional processors.
Vimicro (Zhongxingwei). In 1999, a group of entrepreneurs with PhDs from Silicon Valley established Zhongxing Microelectronics Co., Ltd. (Vimicro) in Zhongguancun Science and Technology Park, Beijing. They initiated and undertook the national strategic project "Starlight China Core Project", devoted to the development, design and industrialization of digital multimedia chips.
In early 2016, Vimicro (Zhongxingwei) launched the world's first SVAC video codec SoC with an integrated neural network processor (NPU). It encodes intelligent analysis results together with the video data, forming a structured video stream. This technology is widely used in video surveillance cameras and opens a new era of intelligent security surveillance. The self-designed embedded NPU adopts a "data-driven parallel computing" architecture and is optimized for deep learning algorithms. It offers high performance, low power consumption, high integration and small size, and is especially suitable for front-end intelligence in the Internet of Things.
Figure: Internal structure of the VC0616 neural network processor with integrated NPU.
Horizon Robotics. Horizon Robotics was founded in 2015 and is headquartered in Beijing. Its founder is Yu Kai, former director of Baidu's Institute of Deep Learning. The BPU (Brain Processing Unit) is an efficient artificial intelligence processor architecture IP designed and developed independently by Horizon Robotics. It can be implemented on ARM/GPU/FPGA/ASIC, and focuses on specific areas such as autonomous driving and face image recognition. In 2017, Horizon released an embedded AI solution based on the Gauss architecture, to be applied in intelligent driving, intelligent life and public security. The first-generation BPU chip, Pangu, has entered the tape-out stage and is expected to launch in the second half of 2018. It supports 1080p high-definition image input, processing 30 frames per second while detecting and tracking hundreds of targets. Horizon's first-generation BPU uses TSMC's 40 nm process. Compared with a traditional CPU/GPU, its energy efficiency can be 2-3 orders of magnitude higher (about 100-1,000 times).
DeePhi Tech (Shenjian Technology). DeePhi Tech was established in 2016 and is headquartered in Beijing. It was founded by leading deep learning hardware researchers from Tsinghua University and Stanford University. DeePhi was acquired by Xilinx in July 2018. DeePhi calls its FPGA-based neural network processor the DPU. So far, DeePhi has published two DPU architectures: the Aristotle architecture and the Descartes architecture. The Aristotle architecture is designed for convolutional neural networks (CNNs). The Descartes architecture is designed for DNN/RNN networks and provides highly efficient hardware acceleration for sparse neural networks after structural compression. Compared with an Intel Xeon CPU and an Nvidia Titan X GPU, processors using the Descartes architecture are 189 times and 13 times faster, respectively, and 24,000 times and 3,000 times more energy efficient.
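Why a compressed, sparse network can be accelerated is easy to see in a small sketch. The example below is an illustration only, not the company's actual design: it stores a hypothetical pruned weight matrix in compressed sparse row (CSR) form, which is one common sparse format and an assumption here, and multiplies it by a vector while touching only the nonzero weights. The names dense_to_csr and csr_matvec, and the matrix values, are invented for this example.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)   # keep only nonzero weights
                col_idx.append(j)  # remember which column each one came from
        row_ptr.append(len(values))  # marks where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x and count the multiply-accumulate operations used."""
    y, macs = [], 0
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
            macs += 1
        y.append(acc)
    return y, macs


# Hypothetical 4x4 weight matrix after pruning (12 of 16 weights are zero).
W = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
    [0, 0, 4, 0],
]
x = [1, 2, 3, 4]

values, col_idx, row_ptr = dense_to_csr(W)
y, macs = csr_matvec(values, col_idx, row_ptr, x)
dense_macs = len(W) * len(W[0])  # a dense multiply would need 16 MACs
print(y, macs, dense_macs)
```

In this toy case the sparse product needs 4 multiply-accumulate operations instead of 16. Hardware built for sparse networks tries to turn that reduction in work into real speed and energy savings.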
Lingxi Technology. Lingxi Technology was founded in Beijing in January 2018. Its co-founders include leading brain-inspired computing researchers from Tsinghua University. The company is committed to developing a new generation of neural network processor (Tianjic). The chip efficiently supports today's popular machine learning algorithms (including CNN, MLP, LSTM and other network architectures) as well as spiking neural network algorithms, which are more brain-like and have more room to grow, so the chip offers both high computational power and high growth potential. Its advantages are multi-task parallelism and low power consumption. The software tool chain supports mapping and compiling neural networks directly from Caffe, TensorFlow and other algorithm platforms, and provides a friendly user interface. Tianjic can be used in cloud computing and terminal application scenarios to help deploy and promote AI.
Qiying Tailun (Chipintelli). Qiying Tailun, founded in Chengdu in November 2015, is a speech recognition chip developer. Its CI1006 is an artificial intelligence speech recognition chip based on an ASIC architecture. It contains a hardware unit for brain-inspired neural network processing, fully supports DNN computing and carries out high-performance data-parallel computation. It can greatly improve the efficiency with which deep learning speech technology processes large amounts of data.
Baidu. Baidu released the XPU at the Hot Chips conference in August 2017. It is a 256-core, FPGA-based cloud computing accelerator chip, developed with Xilinx as the partner. The XPU uses a new generation of AI processing architecture, combines the versatility of a GPU with the high efficiency and low energy consumption of an FPGA, and is highly optimized to accelerate Baidu's deep learning platform PaddlePaddle. The XPU reportedly focuses on compute-intensive, rule-based and diverse computing tasks, aiming to improve efficiency and performance while offering flexibility similar to a CPU.
Huawei. The neural network processor (NPU) on the Kirin 970 uses Cambricon IP, as shown in Figure 12. The Kirin 970 is built on TSMC's 10 nm process, has 5.5 billion transistors, and consumes 20% less power than the previous generation. The CPU combines four Cortex-A73 cores and four Cortex-A53 cores, and its energy efficiency is 20% higher than that of the previous generation. The GPU is a 12-core Mali-G72 MP12, which improves graphics performance by 20% and energy efficiency by 50%. The NPU uses the HiAI mobile computing architecture and delivers up to 1.92 TFLOPS at FP16. Compared with four Cortex-A73 cores, it offers about 50 times the energy efficiency and 25 times the performance on the same AI tasks.
Nvidia. Nvidia was founded in 1993 and is headquartered in Santa Clara, California, USA. In 1999, Nvidia invented the GPU, redefined modern computer graphics and transformed parallel computing. Deep learning places very stringent demands on computing speed, and Nvidia's GPUs allow a large number of processors to operate in parallel, running ten or even dozens of times faster than a CPU, so they have become the first choice of most AI researchers and developers. Since Google Brain used 16,000 GPU cores to train DNN models and achieved great success in voice and image recognition, Nvidia has become the undisputed leader of the AI chip market.
AMD. AMD was founded in 1969. It specializes in designing and manufacturing microprocessors and related chips (CPU, GPU, APU, motherboard chipsets, TV card chips, etc.) for the computer, communications and consumer electronics industries, and also provides flash memory and low-power processor solutions. AMD is committed to providing standards-based, customer-centric solutions for technology users, from businesses and government agencies to individual consumers.
In December 2017, Intel and AMD announced that they would jointly launch a laptop chip combining Intel processors and AMD graphics units. At present, AMD's AI and machine learning offerings include the high-performance Radeon Instinct accelerator cards and the open software platform ROCm.
Google. In 2016, Google announced that it had independently developed a new processing system called the TPU. The TPU is a chip designed specifically for machine learning applications. By reducing the chip's computational precision, it needs fewer transistors for each operation and can therefore perform more operations per second, so a fine-tuned machine learning model runs faster on the chip and users get intelligent results sooner. AlphaGo, which defeated Lee Sedol in March 2016 and Ke Jie in May 2017, ran on Google's TPU chips.
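The trade-off between precision and throughput described above can be illustrated with a small sketch. It assumes symmetric linear quantization to signed 8-bit integers, which is one common scheme and not necessarily what the TPU does internally. The function names and the sample values are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with one shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(a, b):
    """Dot product computed with integer multiplies, rescaled at the end."""
    qa, scale_a = quantize(a)
    qb, scale_b = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only accumulate
    return acc * scale_a * scale_b


# Hypothetical weights and inputs, for illustration only.
weights = [0.12, -0.50, 0.33, 0.91]
inputs = [1.0, 0.25, -0.75, 0.5]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(exact, approx, abs(exact - approx))
```

The integer result stays close to the floating-point one, while each multiply works on 8-bit operands, which need far less silicon and energy than 32-bit floating-point multiplies.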
At the Google I/O 2018 developer conference, TPU 3.0, the third generation of Google's AI processor, was officially released. TPU 3.0 uses 8-bit low-precision computation to reduce the number of transistors needed. This has little impact on accuracy but saves considerable power and increases speed. It also uses a systolic array design to optimize matrix multiplication and convolution, and larger on-chip memory to reduce dependence on system memory. Peak performance can reach 100 PFLOPS (one PFLOPS is 1,000 trillion floating-point operations per second).
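A systolic array can be sketched in a few lines of simulation. The version below is a simplified, output-stationary model chosen for clarity and is an assumption, not a description of Google's hardware: each processing element (i, j) keeps one running sum, rows of A flow in from the left and columns of B from the top, each delayed by one cycle per row or column. The function name systolic_matmul is invented for this example.

```python
def systolic_matmul(A, B):
    """Simulate C = A @ B on an n x m grid of multiply-accumulate cells."""
    n, k_dim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]  # one accumulator per cell
    total_cycles = n + m + k_dim - 2   # time for the last operand pair to arrive
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                # Because of the skewed input, cell (i, j) sees the operand
                # pair with index k = t - i - j on cycle t.
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += A[i][k] * B[k][j]
    return acc, total_cycles


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, cycles = systolic_matmul(A, B)
print(C, cycles)
```

Each operand is fetched from memory once and then passed from cell to cell, which is why this layout suits matrix multiplication and convolution and reduces memory traffic.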
Qualcomm. Qualcomm, which dominates the smartphone chip market, is also actively positioning itself in AI chips. According to information provided by Qualcomm, it has invested in Clarifai and in China's Yunzhisheng (Unisound), which "focuses on artificial intelligence services for the Internet of Things". As early as CES 2015, Qualcomm launched a flying robot, Snapdragon Cargo, powered by a Snapdragon SoC. Qualcomm believes it can play a role in computer vision for industrial and agricultural monitoring, as well as in new demands for aerial photography, photography and video. In addition, Qualcomm's Snapdragon 820 chip is used in VR headsets. In fact, Qualcomm is already developing mobile device chips that can perform deep learning locally.
Nervana Systems. Nervana was founded in 2014. The Nervana Engine is an ASIC chip designed and optimized specifically for deep learning. The design benefits from a new memory technology called High Bandwidth Memory, which offers both high capacity and high speed, providing 32 GB of on-chip storage and 8 TB per second of memory access speed. The company currently offers an artificial intelligence service in the cloud, which it claims is the fastest in the world and which is used by financial services firms, health care providers and government agencies. Its new chips are intended to keep the Nervana cloud platform the fastest for the next few years.
Movidius (acquired by Intel). In September 2016, Intel announced its acquisition of Movidius. Movidius focuses on developing high-performance vision processing chips. Its latest-generation Myriad 2 vision processor consists mainly of a SPARC processor as the main controller, plus dedicated DSP processors and hardware acceleration circuits that process vision and image signals. It is a vision processor based on a DSP architecture. It has very high energy efficiency in vision-related applications and can bring visual computing to almost any embedded system.
The chip has been widely used in products such as phones from Google's 3D project Tango, DJI UAVs, FLIR intelligent infrared cameras, Hikvision "Deep Eye" series cameras and Huarui intelligent industrial cameras.
IBM. IBM released Watson long ago and has deployed it in many practical applications. In addition, it has launched development of a brain-like chip, TrueNorth. TrueNorth is the latest result of IBM's participation in the DARPA research project SyNAPSE, which stands for Systems of Neuromorphic Adaptive Plastic Scalable Electronics. The ultimate goal of SyNAPSE is to develop a computer architecture that breaks away from the von Neumann architecture.
ARM. With ARM's new chip architecture DynamIQ, the performance of AI chips is expected to increase 50 times in the next three to five years.
ARM's new CPU architecture combines multiple processing cores that can be configured through software for different tasks, including a processor designed specifically for AI algorithms. Chip manufacturers will be able to configure up to eight cores in the new processors. To make mainstream AI workloads run better on its processors, ARM will also release a series of software libraries.
CEVA. CEVA is an IP supplier specializing in DSPs, with many product lines. Among them, the image and computer vision DSP CEVA-XM4 is the first programmable DSP to support deep learning. Its successor, the CEVA-XM6, offers better performance, more computing power and lower energy consumption. CEVA says its main target markets are smartphones, automobiles, security, and commercial applications such as UAVs and automation.
MIT/Eyeriss. Eyeriss is actually an MIT project, not a company, although if it goes well it is likely to spin off a new company in the long run. Eyeriss is a highly energy-efficient hardware accelerator for deep convolutional neural networks (CNNs). The chip has 168 built-in cores dedicated to running neural networks, and its efficiency is 10 times that of a typical GPU. The key technique is to minimize how often data is exchanged between the cores and memory, an operation that usually costs a lot of time and energy. In a typical GPU, the cores share a single memory, whereas each core of Eyeriss has its own memory.
At present, Eyeriss is mainly aimed at face recognition and speech recognition. It can be used in smartphones, wearable devices, robots, automobiles and other Internet of Things applications.
Apple. At the launch of the iPhone 8 and iPhone X, Apple made it clear that the A11 processor integrates dedicated machine learning hardware, the Neural Engine, capable of up to 600 billion operations per second. The chip improves the performance of Apple devices on tasks that require artificial intelligence, such as face recognition and speech recognition.
Samsung. In 2017, Huawei HiSilicon launched the Kirin 970 chip. According to people familiar with the situation, Samsung has developed several kinds of AI chips to compete with Huawei. Samsung plans to use AI chips in new smartphones within the next three years, and it will also build a new components business for AI devices. Samsung has also invested in artificial intelligence chip companies such as Graphcore and DeePhi Tech (Shenjian Technology).
At present, the core of mainstream AI chips is a MAC (multiply-accumulate) array that accelerates convolution, the most important operation in CNNs (convolutional neural networks). This generation of AI chips has three main problems.
(1) The huge amount of data that deep learning computation has to move makes memory bandwidth the bottleneck of the whole system. This is the so-called "memory wall" problem.
(2) Related to the first problem, the large number of memory accesses and the large number of MAC array operations increase the overall power consumption of AI chips.
(3) Deep learning requires high computational power, and the best way to provide it is hardware acceleration. At the same time, deep learning algorithms are changing rapidly, and new algorithms may not be well supported by a fixed hardware accelerator. This is the trade-off between performance and flexibility.
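The first two problems can be made concrete with some rough arithmetic. The sketch below counts the MAC operations in one convolution layer and compares two extreme memory-traffic estimates: a naive one in which every MAC fetches its input and its weight from off-chip memory, and an idealized one in which each value is moved exactly once. The layer size, the one-byte value width, and the function name conv_layer_cost are assumptions chosen for illustration. Real chips fall somewhere between the two bounds.

```python
def conv_layer_cost(h, w, c_in, c_out, k, bytes_per_value=1):
    """Estimate MACs and memory traffic for a stride-1, unpadded convolution."""
    out_h, out_w = h - k + 1, w - k + 1
    # One MAC per output pixel, output channel, input channel and kernel tap.
    macs = out_h * out_w * c_out * c_in * k * k
    # Naive bound: each MAC reads one input value and one weight from memory.
    naive_bytes = 2 * macs * bytes_per_value
    # Ideal bound: inputs, weights and outputs each cross the memory bus once.
    ideal_values = h * w * c_in + k * k * c_in * c_out + out_h * out_w * c_out
    ideal_bytes = ideal_values * bytes_per_value
    return macs, naive_bytes, ideal_bytes


# Hypothetical layer: 32x32 input, 16 input channels, 32 output channels, 3x3 kernel.
macs, naive_bytes, ideal_bytes = conv_layer_cost(32, 32, 16, 32, 3)
print("MACs:", macs)
print("naive traffic (bytes):", naive_bytes)
print("ideal traffic (bytes):", ideal_bytes)
print("naive / ideal:", round(naive_bytes / ideal_bytes, 1))
```

For this layer the gap between the two bounds is more than two orders of magnitude. Closing that gap through on-chip data reuse is what most accelerator designs work toward, since off-chip accesses cost both time and energy.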