So we need new chips to meet the latency and power requirements of always-on, always-watching, always-listening AI on our devices. Each class of device will need a dedicated chip, and Apple is likely the model for the industry: the A series for the iPhone, H series for AirPods, M series for laptops, R series for mixed-reality headsets, and S series for watches. We can expect further specialization over the next few years. These SoCs combine CPUs, GPUs, wireless comms, memory, and a so-called neural engine, which I think we are all now calling a neural processing unit (NPU), onto which the CPU and GPU can offload some AI tasks. As with data center AI designs, there is plenty of room for optimization. We will almost certainly see edge AI accelerators in the market optimized for a broad range of AI offloading tasks. The challenge is deciding how specific the design needs to be to meet performance, latency, and power requirements whilst maintaining some level of programmability to run a variety of algorithms. So we may see the NPU become dominant as a “general-purpose” AI accelerator, or a variety of “application-specific accelerators” for voice conversations, image recognition, or biosignal tracking. The most interesting areas are in-memory computing (IMC), neuromorphic accelerators such as event-based cameras, and analog accelerators.

In-memory computing (IMC)

In-memory computing (IMC) attempts to address the fundamental bottleneck of data movement between processing and memory units. This approach promises to revolutionize the energy efficiency of AI computations, particularly for large-scale models. At its core, IMC leverages the physical properties of memory devices to perform computations directly within the memory array. While both digital and analog implementations exist, digital IMC currently dominates due to its compatibility with existing CMOS processes and the challenges associated with analog-to-digital conversion in analog IMC.

Digital IMC architectures typically employ emerging non-volatile memory (NVM) technologies such as Resistive RAM (ReRAM) or Magnetoresistive RAM (MRAM). These devices offer the dual functionality of data storage and computation, enabling the creation of computational memory arrays. In ReRAM-based IMC, for instance, the resistance states of memristive devices are used to represent synaptic weights, while current summation along bitlines implements the multiply-accumulate (MAC) operations fundamental to neural network computations. The efficiency of IMC stems from its ability to parallelize MAC operations across vast arrays of memory cells. In a typical implementation, input activations are applied as voltages to wordlines, while synaptic weights are encoded in the conductance of memory elements. The resulting currents are summed along bitlines, directly realizing the dot product operation. This parallelism, combined with the elimination of data movement, enables IMC architectures to achieve energy efficiencies up to 10 TOPS/W in sub-28nm nodes, a significant improvement over conventional digital designs.
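
To make the crossbar picture concrete, here is a minimal NumPy sketch of the idealized computation: signed weights are mapped onto pairs of conductances, inputs drive the wordline voltages, and the summed bitline currents recover the dot product. The conductance range and differential-pair mapping are illustrative assumptions rather than a model of any particular device; real arrays add device noise, wire resistance, and ADC quantization on top of this.

```python
import numpy as np

def weights_to_conductances(w, g_min=1e-6, g_max=1e-4):
    """Map signed weights onto a differential pair of conductances.

    Real ReRAM cells only have positive conductance, so a common trick is
    to encode each weight as the difference of two devices (G+ - G-).
    The conductance range here is purely illustrative.
    """
    w_max = np.max(np.abs(w)) + 1e-12
    scale = (g_max - g_min) / w_max
    g_pos = g_min + scale * np.clip(w, 0, None)
    g_neg = g_min + scale * np.clip(-w, 0, None)
    return g_pos, g_neg, scale

def crossbar_mac(x, g_pos, g_neg, scale, v_read=0.2):
    """Idealized analog MAC: input activations drive wordline voltages,
    and bitline currents sum the products (Kirchhoff's current law)."""
    v = v_read * x                              # activations as voltages
    i_pos = v @ g_pos                           # current summed along each bitline
    i_neg = v @ g_neg
    return (i_pos - i_neg) / (v_read * scale)   # currents back to a dot product

# Compare against a plain digital dot product
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))                   # 64 inputs x 16 output columns
x = rng.normal(size=64)

g_pos, g_neg, scale = weights_to_conductances(W)
print(np.allclose(crossbar_mac(x, g_pos, g_neg, scale), x @ W))  # True for ideal devices
```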

However, IMC faces several technical challenges. Device variability and noise, particularly in analog implementations, can impact computational accuracy. Current research focuses on developing error-resilient algorithms and circuit-level techniques to mitigate these issues. For example, iterative programming schemes and closed-loop weight update mechanisms have been proposed to enhance the precision of weight storage in ReRAM arrays. Another critical challenge is the development of efficient analog-to-digital converters (ADCs) for reading out computation results. Recent advancements in this area include the use of time-domain ADCs and novel circuit topologies that leverage the inherent properties of NVM devices for conversion. The scalability of IMC architectures is also an active area of research. As array sizes increase, issues such as sneak path currents and voltage drops along bitlines become more pronounced. Hierarchical designs and novel crossbar architectures are being explored to address these challenges and enable the scaling of IMC to larger model sizes. On the algorithmic front, there's ongoing work to adapt neural network training algorithms to the constraints of IMC hardware. This includes developing quantization techniques that match the precision capabilities of IMC arrays and exploring novel neural network architectures that are inherently more tolerant to the non-idealities of IMC computations.
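
As a rough illustration of the closed-loop programming idea mentioned above, here is a sketch of a program-and-verify write loop against a toy noisy device model. The noise levels, tolerance, and pulse limit are invented for illustration and are not taken from any published scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def write_conductance(target, read_noise=0.02, write_noise=0.05,
                      tol=0.01, max_pulses=20):
    """Closed-loop (program-and-verify) conductance write.

    Toy device model: each corrective pulse lands near the requested change
    with multiplicative write noise, and each read-back adds read noise.
    The loop keeps pulsing until the read value is within tolerance.
    """
    g = 0.0
    for _ in range(max_pulses):
        g_read = g * (1 + read_noise * rng.standard_normal())   # verify step
        error = target - g_read
        if abs(error) <= tol * max(abs(target), 1e-9):
            break
        g += error * (1 + write_noise * rng.standard_normal())  # imperfect pulse
    return g

target = 1.0
programmed = write_conductance(target)
print(f"target={target:.3f}, programmed={programmed:.3f}")
```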

Recent benchmarks have demonstrated that many AI workloads, particularly those in inference tasks, can tolerate the inherent imprecision of IMC without significant accuracy loss. For instance, studies have shown that convolutional neural networks implemented on ReRAM-based IMC can achieve near-software accuracy levels for image classification tasks while offering orders of magnitude improvement in energy efficiency. Looking ahead, the integration of IMC with advanced packaging technologies, such as 3D stacking, promises to further enhance performance and energy efficiency. Additionally, the exploration of hybrid architectures that combine IMC with traditional digital processing elements is emerging as a promising direction for balancing the strengths of both paradigms.
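
These tolerance studies typically work by injecting device-like noise into trained weights and re-measuring accuracy. Here is a minimal sketch of that experiment, using a toy linear classifier and synthetic data as stand-ins for a real network and benchmark; the noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: a linear classifier on synthetic data.
# The point is the experiment structure, not the model itself.
X = rng.normal(size=(1000, 32))
true_w = rng.normal(size=(32, 4))
y = np.argmax(X @ true_w, axis=1)
W = true_w + 0.05 * rng.normal(size=true_w.shape)   # "trained" weights

def accuracy(weights):
    return np.mean(np.argmax(X @ weights, axis=1) == y)

def noisy_accuracy(weights, sigma, trials=20):
    """Inject multiplicative weight noise (a stand-in for conductance
    variability) and report mean accuracy across random draws."""
    accs = [accuracy(weights * (1 + sigma * rng.standard_normal(weights.shape)))
            for _ in range(trials)]
    return np.mean(accs)

for sigma in (0.0, 0.05, 0.1, 0.2, 0.4):
    print(f"relative noise {sigma:.2f}: accuracy {noisy_accuracy(W, sigma):.3f}")
```

Accuracy typically degrades gracefully up to some noise level and then falls off, which is the behaviour the benchmarks above exploit.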

Worth watching

Neuromorphic computing

Neuromorphic computing represents a shift in computing that draws inspiration from biological neural systems. This approach stands in contrast to both traditional von Neumann architectures and IMC designs, offering advantages for certain classes of AI applications. At the core of neuromorphic computing are Spiking Neural Networks (SNNs), which utilize discrete event-based communication (spikes) rather than continuous values. The Loihi 2 chip, Intel's latest iteration, demonstrates the ability to implement up to 1 million neurons per chip, each capable of integrating up to 4,096 synaptic inputs. The neuromorphic architecture employs a fundamentally different computational model compared to traditional AI accelerators or IMC systems. While IMC focuses on efficient matrix multiplication within memory arrays, neuromorphic systems emulate asynchronous, event-driven processing. This results in extremely low power consumption during periods of inactivity, a crucial advantage for always-on edge devices.

Loihi 2 incorporates several key features that set it apart from traditional computing architectures. Its asynchronous spike-based computation allows neurons to communicate via discrete spikes, eliminating the need for a global clock and enabling fine-grained power gating. The chip also implements on-chip learning capabilities, supporting various synaptic plasticity rules for online learning and adaptation. Furthermore, Loihi 2 offers programmable neuron models, allowing researchers to implement a variety of neuron models ranging from simple leaky integrate-and-fire to more complex biologically-inspired models. The high fan-in and fan-out capabilities of each neuron enable the implementation of complex network topologies, with each neuron able to receive inputs from and send outputs to thousands of other neurons.
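
To make "leaky integrate-and-fire" concrete, here is a minimal discrete-time LIF neuron in NumPy. The time constant, threshold, and reset values are arbitrary illustrative choices, not Loihi 2 parameters.

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
               v_thresh=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron.

    The membrane potential leaks toward v_rest, integrates the input
    current, and emits a spike (then resets) whenever it crosses v_thresh.
    Parameters are illustrative, not tied to any particular chip.
    """
    v = v_rest
    spikes = np.zeros_like(input_current)
    for t, i_in in enumerate(input_current):
        v += (-(v - v_rest) + i_in) * (dt / tau)
        if v >= v_thresh:
            spikes[t] = 1.0
            v = v_reset
    return spikes

# A constant input drives a regular spike train; stronger input spikes faster.
steps = 200
weak = lif_neuron(np.full(steps, 1.2))
strong = lif_neuron(np.full(steps, 3.0))
print(f"weak input: {int(weak.sum())} spikes, strong input: {int(strong.sum())} spikes")
```

With a constant input, the spike rate rises with input strength, the simplest form of rate coding such neurons support; programmable neuron models generalize this basic dynamic.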

The event-driven nature of neuromorphic systems aligns well with emerging sensor technologies, particularly event-based vision sensors like Prophesee's Metavision. These sensors operate on a fundamentally different principle from traditional frame-based cameras, registering only changes in pixel intensity. This approach achieves temporal resolutions below 10 microseconds, enabling the capture of high-speed events that would be missed by conventional cameras.
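
A rough sketch of the event-generation principle: each pixel compares its (log) intensity against the value at its last event and emits a positive or negative event only when the change exceeds a contrast threshold. The threshold and synthetic frames below are illustrative assumptions and bear no relation to any specific sensor, which performs this comparison asynchronously in per-pixel analog circuitry.

```python
import numpy as np

def frames_to_events(frames, threshold=0.15):
    """Convert a stack of intensity frames into sparse change events,
    mimicking how an event-based sensor reports only per-pixel changes.

    Returns a list of (t, y, x, polarity) tuples. This is a digital
    approximation of what the sensor does asynchronously in analog.
    """
    log_frames = np.log1p(frames.astype(np.float64))
    reference = log_frames[0].copy()          # intensity at each pixel's last event
    events = []
    for t in range(1, len(log_frames)):
        diff = log_frames[t] - reference
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, y, x, 1 if diff[y, x] > 0 else -1))
            reference[y, x] = log_frames[t, y, x]   # reset per-pixel reference
    return events

# A bright dot moving across an otherwise static scene produces very few events.
frames = np.zeros((10, 32, 32))
for t in range(10):
    frames[t, 16, 3 * t] = 255.0
events = frames_to_events(frames)
print(f"{len(events)} events from {frames.size} pixel samples")
```

The static background generates nothing at all, which is where the bandwidth and power savings of event-based sensing come from.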

Neuromorphic computing faces several challenges in its path to widespread adoption. Creating efficient algorithms for SNNs and event-based data processing requires new approaches, as traditional deep learning methods are not directly applicable. The unique properties of SNNs necessitate new programming tools and frameworks, presenting a barrier to adoption. Finding sufficiently large and valuable niche applications to drive production volumes and reduce costs remains a challenge. Additionally, bridging the gap between neuromorphic hardware and traditional computing systems requires significant engineering effort.

Recent advancements in neuromorphic computing include the development of larger-scale systems, such as Intel's Pohoiki Springs with 100 million neurons, and the exploration of novel materials and devices for implementing neuromorphic principles at the hardware level. Research is also ongoing in developing hybrid systems that combine neuromorphic elements with traditional digital processors or IMC units, aiming to leverage the strengths of each paradigm.

As the field progresses, neuromorphic computing holds promise for enabling new classes of AI applications, particularly in scenarios requiring real-time processing of temporal data streams, adaptive learning, and ultra-low power consumption. While it may not replace traditional AI accelerators or IMC systems in the near term, neuromorphic computing is poised to carve out its own niche in the AI hardware landscape, potentially revolutionizing areas such as autonomous systems, brain-computer interfaces, and intelligent sensor networks.