What up? Last week I said the Magnificent Seven have an advantage in the next leg of the AI build-out because test-time compute introduces operating costs. Test-time compute, remember, is using compute at runtime to come up with a better response instead of relying exclusively on what was learned during training. Test-time compute is basically what will enable practical agents to do stuff on the Internet.
My claim is that venture capital as an asset class is less attractive than it used to be [can’t include timeline without more research]. Because:
I want to explain number three because it’s the load-bearing argument.
TL;DR
AI investing is closer to deep tech than to the Internet/SaaS investing of the past 20 years. The shift to reasoning-heavy AI models introduces significant operating expenses at scale, challenging the traditional software valuation playbook. While higher revenues may offset some costs as agents replace human labor, startups face massive challenges: higher opex, lower margins, and intense competition from incumbents. VCs (we) need to adapt or risk suboptimal returns. But considering the amount already invested, it’s likely the returns are already in the post. Good luck out there.
1. Historical Context
In the era of internet services and mobile apps, computational processes during user interactions were lightweight. Google Search and Facebook News Feed relied on pre-built indexes, efficient algorithms, and cached data. This allowed these businesses to scale globally at near-zero marginal cost, leading to incredibly high gross margins (often 75%-80%). The economics of SaaS followed a similar trajectory, with relatively low customer acquisition costs (CAC) and high annual contract values (ACVs), creating a highly scalable model. It was the best of times.
2. The AI Transition: From Capex to Opex
Old World: Between 2018 and 2024, AI scaling followed the pre-training scaling law, which was capex-intensive. Larger models, trained on massive GPU clusters, dominated the landscape, but inference costs were relatively modest. Models like GPT-3 and Claude were built for static inference: once trained, the computational burden of serving each query was manageable.
New World: From 2024 onwards, starting with o1 and now o3, a shift emerged: post-training scaling laws emphasized runtime compute. Techniques like chain-of-thought (CoT) reasoning, retrieval-augmented generation, and adaptive inference increased performance but required significantly more computation per query. While these approaches improved reasoning and decision-making—core features for AI agents—they also introduced a meaningful, recurring opex burden for each user interaction.
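To make the opex shift concrete, here is a back-of-the-envelope sketch comparing the per-query cost of a classic single-pass model against a reasoning model that burns a long chain of thought before answering. All token counts and per-million-token prices below are hypothetical, chosen only to show the shape of the effect, not any vendor's actual pricing.

```python
# Illustrative per-query inference cost: static answer vs. long-CoT reasoning.
# All token counts and prices are hypothetical, for intuition only.

def query_cost(input_tokens, output_tokens, reasoning_tokens,
               price_in_per_m=2.50, price_out_per_m=10.00):
    """Dollar cost of one query; reasoning tokens billed like output tokens."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * price_in_per_m
            + billed_output * price_out_per_m) / 1_000_000

static = query_cost(1_000, 500, 0)           # classic single-pass answer
reasoning = query_cost(1_000, 500, 20_000)   # long chain of thought first

print(f"static:    ${static:.4f}")
print(f"reasoning: ${reasoning:.4f} ({reasoning / static:.0f}x)")
```

The point is not the exact numbers but the structure: reasoning tokens multiply the variable cost of every single interaction, and that cost recurs on each query rather than being amortized like a training run.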
3. Linear or Near-Linear Scaling Costs
Unlike SaaS, where incremental user costs approach zero at scale, reasoning-heavy AI systems incur meaningful costs for every user or request. Inference costs no longer fade into the background; they rise in proportion to usage. The cost of serving billions of users grows in step with the compute they consume, fundamentally altering the scaling economics.
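This difference in cost structure can be sketched as a toy gross-margin model. Every number here is made up for illustration: the SaaS business has a mostly fixed serving cost and a tiny per-user cost, while the reasoning-AI business carries a large per-user inference cost at the same price point.

```python
# Toy gross-margin model (all numbers hypothetical):
# SaaS: fixed serving cost dominates, per-user cost is near zero.
# Reasoning AI: per-user inference compute dominates COGS.

def gross_margin(users, price_per_user, fixed_cost, cost_per_user):
    """Gross margin as a fraction of revenue."""
    revenue = users * price_per_user
    cogs = fixed_cost + users * cost_per_user
    return (revenue - cogs) / revenue

for users in (1_000, 100_000, 10_000_000):
    saas = gross_margin(users, 20.0, fixed_cost=50_000, cost_per_user=0.10)
    ai = gross_margin(users, 20.0, fixed_cost=50_000, cost_per_user=8.00)
    print(f"{users:>10,} users  SaaS {saas:6.1%}  AI {ai:6.1%}")
```

As users grow, the SaaS margin climbs toward 100% because the fixed cost is amortized away, while the AI margin plateaus at whatever spread exists between price and per-query compute. That plateau is the "linear or near-linear" problem in one picture.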
Evidence
1. Compute-Heavy Models
OpenAI’s o3 reportedly costs upwards of $1,000 per query in its most intensive reasoning modes. While this is likely an outlier tied to early experimentation, it underscores the growing burden of runtime compute. Even optimized production models may require custom silicon and aggressive cost management to remain viable.