“DRAM doesn’t scale anymore. In the glory days, memory bit density doubled every 18 months – outpacing even logic. That translates to just over 100x density increase every decade. But in this last decade, scaling has slowed so much that density has increased just 2x”. So says Dylan Patel.

While logic chips continue to improve dramatically in density and cost per transistor, DRAM improvements have been minor, and bandwidth gains have come from expensive packaging rather than scaling. Memory is unlikely to be a hard technical bottleneck to AI data center scaling; the real question is economic viability. The DRAM roadmap hints at brutal trade-offs in cost and power to achieve the throughput required for trillion+ parameter models. Today, High Bandwidth Memory (HBM) is the solution for almost every AI accelerator. It prioritizes bandwidth and power efficiency but is expensive, at roughly 3x the price of DDR5 per GB. HBM3e delivers 36 GB of capacity and about 1.2 TB/s of bandwidth per stack, and HBM is the only game in town for data center AI accelerators. Other DRAM varieties like DDR5, LPDDR5X, and GDDR6X target different cost, performance, and power requirements, and some companies pair high-performance, high-cost HBM with lower-performance, lower-cost LPDDR. This all works, but the truth is that HBM is a hack: a packaging solution that increases density to work around DRAM’s inherent bandwidth and power problems.
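A quick sanity check on that arithmetic, as a minimal sketch in Python: a doubling every 18 months compounds to 2^(120/18) ≈ 102x per decade, while the last decade’s roughly 2x implies the doubling period has stretched to about ten years. Only the cadences quoted above go in; nothing else is assumed.

```python
# Compound a fixed doubling cadence into a per-decade density multiplier.
def decade_multiplier(doubling_months: float) -> float:
    """Density growth over one decade (120 months) at one doubling per `doubling_months`."""
    return 2 ** (120 / doubling_months)

print(decade_multiplier(18))   # ~101.6x -- the "glory days" cadence
print(decade_multiplier(120))  # 2.0x   -- roughly the pace of the last decade
```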
Opportunities
Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM)
Resistive Random Access Memory (ReRAM)
Others
FeRAM: Employs ferroelectric materials (typically PZT or HfO2) that maintain polarization states for non-volatile storage. Achieves read/write speeds of 20-80ns with extremely high endurance (>10¹⁴ cycles) and low power consumption thanks to voltage-based switching. Cell size remains large (15-40F²) due to capacitor structure requirements, limiting density to 1-2 Gb/cm². Manufacturing challenges include integrating ferroelectric materials with CMOS and scaling below 130nm nodes. Cost remains 4-5x higher than DRAM due to specialized materials and process complexity, confining use to niche applications like industrial control systems and automotive. Development focus has shifted to HfO2-based implementations for better CMOS compatibility and scaling potential.
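To make the density figure concrete: a cell’s footprint is its cell-size factor times F², so areal density is 1/(factor × F²). A minimal sketch; the node choices below are illustrative assumptions picked to bracket the 1-2 Gb/cm² range, not published FeRAM specs.

```python
# Areal density from a cell-size factor (in F^2) and feature size F.
def density_gb_per_cm2(cell_factor: float, feature_nm: float) -> float:
    f_cm = feature_nm * 1e-7               # nm -> cm
    cell_area_cm2 = cell_factor * f_cm**2  # one bit cell's footprint
    return 1 / cell_area_cm2 / 1e9         # bits/cm^2 -> Gb/cm^2

print(density_gb_per_cm2(15, 65))   # ~1.6 Gb/cm^2 (assumed 65 nm node)
print(density_gb_per_cm2(40, 130))  # ~0.15 Gb/cm^2 at a 130 nm node
```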
3D DRAM stacking: Employs through-silicon vias (TSVs) to vertically integrate multiple DRAM dies. Achieves bandwidth up to 900 GB/s and capacities of 24-48 GB per stack. Reduces power consumption by 50-70% compared to planar DRAM thanks to shorter interconnects. Thermal density rises steeply with die count, requiring advanced cooling solutions. Manufacturing complexity hurts yields, particularly in TSV formation and die-thinning steps. Cost remains 2-3x higher than conventional DRAM due to complex integration and lower yields.
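The headline bandwidth is just interface width times per-pin data rate. A minimal sketch; the 1024-bit TSV interface and ~7 Gb/s pin rate are assumptions chosen to reproduce the ~900 GB/s figure, not a specific product’s spec.

```python
# Peak stack bandwidth in GB/s = interface width (bits) * per-pin rate (Gb/s) / 8.
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

print(stack_bandwidth_gbs(1024, 7.0))  # 896.0 GB/s, close to the quoted 900 GB/s
```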
Hybrid Memory Systems in AI accelerators: Combines HBM (1-2 TB/s bandwidth, 4-64 GB capacity) with GDDR6/6X (up to 1 TB/s, 8-32 GB) or DDR5 (up to 460 GB/s, 128+ GB). Implements multi-level memory hierarchy with software-managed data movement between tiers. Utilizes cache coherence protocols and page migration algorithms to optimize data placement. Requires sophisticated memory controllers capable of managing multiple interfaces and protocols simultaneously. Enables heterogeneous compute architectures with specialized memory subsystems for different AI operations (e.g., HBM for matrix multiply, DDR for embedding tables). Increases design complexity and power management challenges due to disparate voltage domains and timing requirements.
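To illustrate what software-managed data movement between tiers looks like, here is a minimal sketch of a two-tier placement policy with threshold-based promotion. Everything here is hypothetical: the `Tensor` and `Tier` structures, the capacities, and the hotness threshold are illustrative stand-ins, not any vendor’s controller logic.

```python
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    size_gb: float
    accesses: int = 0  # touch count; stands in for real access telemetry

@dataclass
class Tier:
    name: str
    capacity_gb: float
    resident: list = field(default_factory=list)

    def used_gb(self) -> float:
        return sum(t.size_gb for t in self.resident)

    def fits(self, t: Tensor) -> bool:
        return self.used_gb() + t.size_gb <= self.capacity_gb

class TieredMemory:
    """Two-tier hierarchy: hot data in HBM, cold or oversized data in DDR5."""
    HOT_THRESHOLD = 100  # accesses before promotion; illustrative value

    def __init__(self):
        self.hbm = Tier("HBM", capacity_gb=64)    # assumed capacities
        self.ddr = Tier("DDR5", capacity_gb=512)

    def allocate(self, t: Tensor) -> None:
        # First-touch placement: prefer HBM, spill to DDR5 when full.
        (self.hbm if self.hbm.fits(t) else self.ddr).resident.append(t)

    def touch(self, t: Tensor) -> None:
        t.accesses += 1
        # Promote hot DDR5-resident tensors into HBM when space allows.
        if t in self.ddr.resident and t.accesses >= self.HOT_THRESHOLD and self.hbm.fits(t):
            self.ddr.resident.remove(t)
            self.hbm.resident.append(t)

mem = TieredMemory()
mem.allocate(Tensor("attention_weights", size_gb=24))  # lands in HBM
mem.allocate(Tensor("embedding_table", size_gb=200))   # spills to DDR5
```

A real controller would also demote cold data out of HBM and account for migration bandwidth; this sketch only shows the placement and promotion decisions.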
Questions