Will Scaling Laws Hold? 2025 and the Future of AI
Scaling Laws and Application Layers: The Defining AI Questions of 2025
Another year has flown by, bringing us closer to a future shaped by AI. I've always been wary of predictions—too often, they're wrong. As I said in my post about the AI market, this isn't about playing the expert; it's about cutting through the noise to uncover first principles. Let's see what 2025 may bring.
Before diving into 2025, a quick note: I've included my 2024 predictions scorecard at the end of this post (spoiler alert: I didn't do badly).
The Trillion-Dollar Question: Will Scaling Laws Hold for 2025?
2025 will test whether the scaling laws (more data and more compute leading to better performance) continue to hold. This is, quite literally, the trillion-dollar question, and it has been talked about ad nauseam over the past few months. With hundreds of billions of dollars committed to data center infrastructure, the stakes couldn't be higher.
If the scaling laws persist, investment and progress will surge ahead. But if they plateau or break, we could face the blow-up scenario I outlined here: a significant infrastructure write-off. Paradoxically, that could be good for the Infinity Games, since people would finally feel they can stop throwing billions at the problem.
One of the most influential essays I read this year is "Infinity Missions". In this essay, Packy McCormick explores the concept of "Infinity Games," emphasizing the importance of pursuing ambitious, long-term projects that aim for boundless positive impact. Drawing inspiration from Blaise Pascal's philosophical ideas, McCormick argues that dedicating resources to these visionary endeavors is rational, as the potential infinite rewards far outweigh the finite risks. He encourages individuals and organizations to engage in "Infinity Missions"—ventures that, despite uncertain outcomes, have the potential to drive significant progress and innovation.
To give my take on the 2025 scaling laws, let's break them into their two main components: compute and data. Progress is expected on both fronts.
Compute
We haven't seen real compute scaling since the Gen3 models (starting with GPT-4 in March 2023). Consequently, the last 18 months have felt like incremental progress, or even stagnation, to some. That perception is, by the way, a big fallacy: if you talked to GPT-3, or even an early version of GPT-4, today, you would not believe how dumb they were.
What's true is that we are missing the kind of substantial scaling that fueled the leap between GPT-3 and GPT-4. In 2025, three key developments will shape the compute landscape:
Bigger Single-Location Clusters
Elon Musk’s announcement of a 100k H100 data center was soon followed by plans for a 1M H100 facility. In 2025, we will see the first Gen4 models trained on clusters of at least 100k H100s. This will be the true acid test for scaling.
Higher GPU Performance
Nvidia’s B100 and B200 GPUs are set to hit the market, offering roughly double the performance for the same power consumption. This means each data center will deliver significantly higher compute output.
Multi-Location Data Centers
Energy remains the bottleneck for scaling single-location data centers, but multi-location setups may provide a solution. Google is believed to have used this approach for training the Gemini family, and more players are likely to follow suit.
Data
While much of the available internet text has been mined and used, we have yet to fully tap the potential of multimodal data (think YouTube video and audio) and synthetic data (think o3).
Multimodal Data: Platforms like YouTube (which Google is already exploiting) hold immense untapped potential, as their video and audio content can complement traditional text-based training data.
Synthetic Data: Synthetic data, particularly that derived from reasoning models, will increasingly feed into training pipelines. Simulation models such as Sora or Veo2 are set to become major sources of high-quality synthetic data, expanding the training horizons.
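To make the synthetic-data pipeline concrete, here is a minimal rejection-sampling sketch under toy assumptions: a teacher model proposes reasoning traces, a programmatic verifier keeps only the ones that reach the correct answer, and the survivors become training data. The `teacher_generate` stub is hypothetical; a real pipeline would sample a reasoning model.

```python
import random

def teacher_generate(question: str, truth: int, n: int = 8) -> list[dict]:
    """Hypothetical stand-in for sampling n reasoning traces from a teacher model.
    Faked here: each trace lands on the right answer only some of the time."""
    traces = []
    for _ in range(n):
        final = truth if random.random() < 0.6 else truth + random.choice([-1, 1])
        traces.append({"reasoning": f"step-by-step work for: {question}", "final": final})
    return traces

def verify(trace: dict, truth: int) -> bool:
    """Programmatic grading: for math-style tasks, check the final answer exactly."""
    return trace["final"] == truth

# Toy tasks with known ground truth; cheap verification is what makes this scale.
tasks = [("What is 17 * 24?", 408), ("What is 13 + 29?", 42)]

dataset = []
for question, truth in tasks:
    for trace in teacher_generate(question, truth):
        if verify(trace, truth):  # keep only traces that reach the right answer
            dataset.append({"prompt": question, "completion": trace["reasoning"]})

print(f"kept {len(dataset)} verified traces for the next training run")
```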
So, Will Scaling Laws Hold?
Here's my take: while pre-training gains may become harder to achieve, we will continue to see significant model improvements driven by capacity build-up and the integration of synthetic data. Scaling isn't just a question of adding resources; it is also about leveraging new forms of data and smarter compute solutions (what OpenAI calls reasoning). If these advancements align, 2025 could mark another leap forward for AI.
Being Specific: Where Do I See Capabilities Progress?
What Aspects Should We Expect Foundational Models to Improve On?
1. IQ
Better models mean higher IQ. This improvement will stem from two areas:
Pre-training: Gains will be incremental for the new generation of frontier models like GPT-next (I have no clue about the actual name) or Grok 3. I suspect, however, that pre-training will be a lesser factor in 2025.
Inference-Time Compute: Technologies like OpenAI's reasoning models (o1, o3) and Google's reasoning breakthroughs are poised to steal the spotlight. These innovations allow models to perform more complex reasoning during inference, enabling leaps in capability (a minimal sketch of the idea follows below).
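Here is that sketch: a toy self-consistency loop that trades extra inference compute for accuracy by sampling several answers and taking the majority vote. The `sample_answer` stub is hypothetical; a real system would sample a model at non-zero temperature.

```python
import random
from collections import Counter

def sample_answer(question: str) -> int:
    """Hypothetical stub for one stochastic model sample: right ~60% of the time."""
    return 42 if random.random() < 0.6 else random.randint(0, 100)

def self_consistency(question: str, n_samples: int = 15) -> int:
    """Spend more inference compute: sample n answers, return the majority vote."""
    votes = Counter(sample_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

question = "What is the answer to life, the universe and everything?"
trials = 1000
single = sum(sample_answer(question) == 42 for _ in range(trials)) / trials
voted = sum(self_consistency(question) == 42 for _ in range(trials)) / trials
print(f"one sample: ~{single:.0%} correct | majority of 15: ~{voted:.0%} correct")
```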
If this trend continues, AI will surpass human-level performance in economically valuable tasks such as coding, mathematics, and even some research areas. Hopefully, these advances will translate into tangible progress in fields like scientific discovery.
Why Do I Think This? Compute is already committed for the next five years (ingredient 1 in the scaling potion), and AGI labs have spent the last 18 months developing pipelines to extract value from synthetic data (ingredient 2). Thus, we have both ingredients needed for scaling.
2. Reliability
While hallucinations can enhance creativity, they’re a liability in agentic flows. These flows involve models autonomously performing multiple actions to achieve a goal. For example, imagine Luzia booking your flight. You’d expect the best deal to Paris—not an accidental trip to Kathmandu.
For agentic flows to succeed, models need to become significantly more reliable. Errors compound in multi-turn interactions: with a model that is 98% accurate per step, the success rate of a 10-step process drops to about 82%. If agentic flows take hold, AI will unlock a new dimension of value creation with full automation of whole workflows.
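The arithmetic behind that 82% figure is simple compounding: if each step succeeds independently with probability p, a k-step chain succeeds with probability p^k. A quick check in Python:

```python
# Per-step accuracy p compounds over k independent steps: P(success) = p ** k
for p in (0.98, 0.99, 0.999):
    print(f"p={p}: 10-step success rate = {p ** 10:.1%}")
# p=0.98 -> 81.7%, p=0.99 -> 90.4%, p=0.999 -> 99.0%
```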
Why Do I Think This? Agentic flows represent a massive opportunity for AI. Closing the reliability gap so that we start seeing agentic applications is critical to justifying the growing investment in foundational models. If you can disrupt a whole SaaS industry, the willingness to pay for these models will be orders of magnitude higher (a $2,000-a-month subscription?). Revenues for labs will continue their upward trajectory, and their CEOs will gather enough resources and support for Gen5 and Gen6 models.
3. Real Multimodality
Multimodality will become mainstream in 2025. Foundational models will be capable of seamlessly integrating and understanding inputs from multiple modalities—text, image, audio, and video—to perform complex tasks. Multimodality will also reach consumers.
Why Do I Think This? Humans naturally work across multiple modalities, and there’s a heck of a lot of untapped multimodal data available (think YouTube and podcasts). Labs are incentivized to overcome the technical challenges of building true multimodal models to unlock this rich data trove.
What Will Happen to Open Source and Small Models?
4. Open Source
Open-source models will continue to evolve as long as Meta (and others) funds their training. While there might be a lag between frontier models and their open-source counterparts, the training methodologies are well understood. It's only a matter of time and funding before reasoning models comparable to o1 emerge in the open-source ecosystem.
Why Do I Think This? Meta has openly committed to advancing open-source AI, using it as a strategy to commoditize foundational AI technology, set de facto standards, and disrupt competitors.
5. Small/Tiny Models
Small models will continue to close the gap with larger ones, albeit with a 12-18 month delay. We'll see the rise of ultra-compact models (1B-2B parameters) designed for edge devices like smartphones and cars. Think about how powerful your car's voice assistant would be with even a GPT-3.5-level model (have you tried talking to your car lately?).
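For a sense of how accessible small models already are, here is a minimal sketch using Hugging Face's transformers pipeline. The model ID is just one example of a sub-1B open model; swap in whichever small model you prefer.

```python
# pip install transformers torch
from transformers import pipeline

# A model under ~1B parameters runs comfortably on a laptop CPU, and once
# quantized it can fit on phones or in-car hardware.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

reply = generator(
    "In two sentences, why does on-device AI matter for privacy?",
    max_new_tokens=80,
)
print(reply[0]["generated_text"])
```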
Why Do I Think This? Two accelerating trends, with no sign of slowing down, are driving the push for on-device computing:
Privacy: Companies like Apple emphasize user privacy, better served by processing data locally.
Scalability: Inference is inherently energy-intensive. The only scalable solution is to shift the cost to the user by leveraging edge devices.
Where My Assumptions Might Fail
I can see two non-zero-probability scenarios that could challenge this forecast. They are opposite sides of the same reality.
1. Gen4 Falls Short
If Gen4 models (GPT-next) fail to meet expectations and reasoning data isn’t enough to deliver an “AlphaGo moment,” it’ll be back to the drawing board. This could delay meaningful progress as researchers pivot to new approaches.
2. Gen4 Is Too Good
If Gen4 models are astonishingly powerful, they might not be released to the public. This could happen due to:
Profit Motives: Companies may restrict access to maintain competitive advantages.
Regulatory Barriers: Governments could impose restrictions to mitigate risks, slowing deployment.
And finally, any predictions for the rest of the AGI labs?
Five major labs are pushing the frontier of AI: OpenAI, Anthropic, xAI, Google, and Meta. Here are my thoughts on each of these:
OpenAI
OpenAI will continue leading the pack in frontier models, aggressively releasing products to the public to maintain its perceived leadership, a critical factor for securing funding. On the product side, OpenAI is likely to double down on direct-to-consumer offerings, strengthening its brand presence. For APIs, knowing that foundational models are rapidly becoming commoditized, OpenAI will likely push API consumers toward customization as a lock-in mechanism (e.g., fine-tuning, RLHF).
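To illustrate the lock-in mechanic: once a customer has curated data and paid for a fine-tune, switching providers means redoing that work. A minimal sketch using the openai Python SDK (the file name is a placeholder, and the model snapshot is one that was fine-tunable at the time of writing):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of example conversations (placeholder filename).
training_file = client.files.create(
    file=open("my_company_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job. The resulting custom model lives only on this
# account, which is precisely the lock-in dynamic described above.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```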
Anthropic
Despite less funding (compared to OpenAI) and a narrower market adoption (vs. ChatGPT), Anthropic will remain competitive due to its high-quality product, well-positioned models—their tone is appealing for specific tasks—and its ability to attract AI talent. Anthropic’s more conservative release strategy could appeal to developers and researchers wary of OpenAI’s aggressive approach. I see a non-zero chance of Anthropic being acquired (or "acquired") by one of the hyperscalers.
xAI
If scaling laws hold, xAI stands to be one of the biggest winners, thanks to Elon Musk’s ability to achieve what often seems impossible. Grok—xAI’s model—could leverage two key opportunities: higher IQ through scaling and exclusive access to real-time information from Twitter’s feed.
Google
Google's main advantages for the next 12 months are its access to vast amounts of data (YouTube is a goldmine) and its vertical integration, which will fuel the improvement of its models. Google will continue integrating AI across its suite of products, with the search engine being the most critical. Personally, I've gone from using Google Search hundreds of times a day to not opening it for weeks. While this might reflect early-adopter behavior, it's an existential threat for Google, and the company is certainly reacting to it.
Meta
Meta will continue championing open-source AI and distributing its models through its own platforms, such as Instagram and WhatsApp. If scaling laws stagnate, Meta could emerge as the biggest winner as it does not depend on the profits coming from model usage to win. The company might aggressively pursue alternative AI products, such as search (is a Perplexity acquisition on the horizon?).
Other Competitors
For smaller competitors with a high density of talent, 2025 is likely to be a consolidation year. Companies like Mistral may face acquisitions or strategic pivots as the market matures and competition intensifies.
Finally, I need to confess that I am a bit lost about Amazon's AI strategy. Yes, they released their own family of models (decently good Gen3 and tiny models), and yes, they have invested over $8B in Anthropic, but I don't know the exact direction they are taking. With the highest capex among all hyperscalers, my best guess is that they are betting on the commoditization of the foundational layer (so they would benefit from the AI-stagnation scenario) and on a rise in inference from production applications running in their data centers, where many clients already concentrate their cloud spending.
I hope Amazon also decides to revamp Alexa; right now, it is a beautiful product with the IQ of a potato.
Now, let's talk about the application layer
We've barely scratched the surface of what current AI can do. Even if we stopped training new foundational models, we would still have years of innovation ahead. 2025 will be a defining year for AI as we begin to witness the maturation of the application layer. The industry needs to show, at last, that the billions of dollars of investment serve an economic purpose.
The application layer is broad, encompassing everything from tools and user interfaces to industries like education and legal services. It’s where AI stops being just technology and starts solving real-world problems. In 2025, two themes will dominate this layer: process optimization and UI innovation.
Process Optimization: Better, Faster, Smarter
AI’s promise to enhance human productivity comes into sharp focus here. Whether through human-in-the-loop workflows (e.g., legal research and drafting documents) or fully automated processes (e.g., deploying code to production without human intervention), the goal is clear: take tasks humans do today and make them better or faster.
Low-risk applications will proliferate as businesses test and refine these systems. Mistakes in these areas are less critical thanks to built-in redundancies, making them ideal candidates for early adoption.
UI Innovation: From Invisible AI to Radical Experiments
We’re entering an era where interfaces will evolve dramatically. Some applications will make AI invisible, seamlessly integrating it into user experiences where the technology itself is secondary. Others will push boundaries, experimenting with social AI platforms and creative tools like canvas or generative art applications. This duality—practicality and bold experimentation—will define the next wave of user interaction.
In summary
2025 will test the resilience of scaling laws, as the industry grapples with whether increasing compute power and data availability can sustain the dramatic performance improvements seen in recent years. Meanwhile, the application layer will take center stage, focusing on process optimization and innovative interfaces to deliver real-world value. Low-risk automation will proliferate, and UI experiments will redefine how humans interact with AI. It’s a year where AI will move from abstract potential to practical, transformative use.
Looking Ahead
In the past twelve months, I've had conversations with machines that sometimes surpass human level, witnessed building-sized rockets caught by "chopsticks," and ridden in robo-taxis, not to mention used satellite internet for the same price as fiber! Chips are being implanted into brains, and nuclear energy is resurging. In this whirlwind of progress, even water bottle lids have been improved, thanks to EU regulations aimed at reducing waste and improving sustainability. Jokes aside, AI is no longer a distant promise but a tangible force shaping our reality. The coming year promises to push these boundaries further, delivering breakthroughs we can barely imagine today. Can't wait to write the 2026 version of this post!
2024 Report Card: How Did I Do?
Time for some accountability. Last year I made some bold calls about AI's trajectory. Let's see if I earned my prediction stripes or if I need to hang up my fortune-telling hat...
Size Matters…
☝ Nailed this one. The progression in model capabilities matched expectations, with significant leaps in performance and efficiency (90% cost reduction!). Benchmarks once considered AGI-proof are now being shattered. Models today boast Ph.D.-level capabilities in physics, mathematics, and coding, and more fields are falling rapidly. This evolution has also led to a shakeout in the market: "less-funded" (over a frickin' billion dollars!) AGI labs are dissolving or struggling to keep pace (Inflection and Character), and the barrier to entry has skyrocketed. Dario Amodei's remarks about the next generation requiring tens of billions underscore this reality.
Quality Over Quantity
☝ Nailed this one too. While the headlines focus on frontier model advancements, there is a quiet revolution in smaller models (under 70B parameters) reaching Gen3's (early GPT-4) initial performance levels (e.g., Nvidia's 70B). This has been made possible through high-quality synthetic data (see the Phi models) and distillation techniques (big models training smaller ones). The "AlphaGo moment" I forecast (here is Karpathy's explanation of what I mean) has not fully materialized, but OpenAI's reasoning pipeline is inching toward it.
Multimodality Takes Off
➡️ Technically correct but with caveats. In 2024, we glimpsed the promise of true multimodality. Models can now speak (advanced mode) and see (e.g., screen share in Gemini or OpenAI). However, this progress often relies on modality transitions (e.g., audio transcription or image captioning). A critical next step will be UI and UX innovation to unlock multimodality's full potential—a challenge that 2025 will likely tackle.
Architecture Breakthroughs
➡️ Mixed results. Gains from unwobbling were substantial but overshadowed by sheer size. Mamba, the architecture I predicted would dominate 2024 thanks to its efficient handling of long contexts, has not gained traction. However, the trend toward increasing context length is undeniable and remains pivotal.
“If we dream into the distant future… we’ll have context length of several billion. You will feed in all of your information, all your history over time, and it will just get to know you better and better.” - Sam Altman in a Lex Fridman Interview
Post-Training Optimizations
☝ Absolutely nailed it. Let me quote myself:
"Among these optimizations, the most promising is allowing models more time to 'think.' A well-known method for this is Chain of Thought (CoT). CoT instructs the model to take its time, articulate its thought process, and solve the problem.
The results are extraordinary when combined with step-by-step verification (using another LLM to analyze each step). Leveraging current technology and applying variations of this concept, Google researchers have improved solutions for mathematical optimization problems previously unsolved (hinting at new beyond-human-level data for the model…you see the connection?)."
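For readers who want the mechanics behind that quote, here is a minimal sketch of Chain of Thought plus step-by-step verification, with both model calls stubbed out as hypothetical functions:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stub standing in for a real model API call."""
    if prompt.startswith("Is this step correct?"):
        return "VALID"  # a real verifier model would actually check the arithmetic
    return ("Step 1: 17 * 20 = 340. Step 2: 17 * 4 = 68. "
            "Step 3: 340 + 68 = 408. Answer: 408")

def solve_with_cot(question: str) -> str:
    # Chain of Thought: ask the model to show its work before answering.
    return call_llm(f"{question}\nThink step by step, then give the final answer.")

def verify_steps(solution: str) -> bool:
    # Step-by-step verification: a second pass grades each step independently.
    for step in [s for s in solution.split(". ") if s.startswith("Step")]:
        verdict = call_llm(f"Is this step correct? Reply VALID or INVALID.\n{step}")
        if "INVALID" in verdict:
            return False
    return True

solution = solve_with_cot("What is 17 * 24?")
print(solution, "| all steps verified:", verify_steps(solution))
```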
We’re now entering an era where optimizing outputs post-training yields massive gains. The synergy between reasoning and advanced training techniques is driving unparalleled leaps across benchmarks.