It’s cheaper to design chips now, so people are out there experimenting and shipping interesting stuff. Great. And batteries are “better”. But we could also just, like, make the models smaller, no? Seems like the easiest option imo. GPT-4 is estimated to have over 1 trillion parameters, requiring hundreds of gigabytes of memory just to store the weights and potentially consuming up to 2,000 watts during inference. For context, the iPhone 15 Pro has 8GB of RAM and a total power consumption under 5 watts, while even the most advanced smartwatches have less than 2GB of RAM and operate on milliwatts of power. In the past two years we’ve seen Gemini Nano, GPT-4o mini, Claude Haiku, and TinyLlama. These models can be 10-100x smaller than their larger siblings while offering good-enough performance. But the larger models are still much, much better. So while hardware engineers work on optimized chips and better batteries, there’s substantial work to be done on the algorithmic front: the “algorithmic overhang”, as it’s known.
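A quick back-of-envelope sketch makes the gap concrete. The parameter counts and bytes-per-parameter below are assumptions (the GPT-4 figure is an unofficial estimate), but the arithmetic shows why a frontier model is nowhere near fitting in 8GB of phone RAM, while a few-billion-parameter model quantized to 4 bits plausibly can:

```python
# Back-of-envelope: memory needed just to hold model weights, vs. phone RAM.
# Parameter counts are rough public estimates, not official figures.

GIB = 1024**3

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone (ignores KV cache and activations)."""
    return num_params * bytes_per_param / GIB

models = {
    "GPT-4-class (~1.8T params, est.)": 1.8e12,
    "7B open model": 7e9,
    "3B 'nano'-class model": 3e9,
}

precisions = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

iphone_ram_gib = 8  # iPhone 15 Pro

for name, n_params in models.items():
    for prec, bytes_pp in precisions.items():
        mem = weight_memory_gib(n_params, bytes_pp)
        verdict = "fits" if mem < iphone_ram_gib else "does not fit"
        print(f"{name:34s} @ {prec}: {mem:9.1f} GiB -> {verdict} in {iphone_ram_gib} GiB of RAM")
```

Even at 4-bit precision, a trillion-plus-parameter model needs hundreds of gigabytes for weights alone, so shrinking the model itself (not just the hardware) is the lever the rest of this section is about.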

6.1. Model Compression

6.2. Efficient Architectures

6.3. Adaptive Inference