2026.02.20 · 10 min

Three Routes / The End of the Unified Narrative

The 'universal model' narrative is fracturing. Three companies stopped taking the same exam — and the fracture itself is the most important signal.

Two weeks. Three companies. Four models.

Your feed just exploded again. The replies are screaming "everything changed." But the real signal this time isn't who scored higher — it's that these three companies stopped taking the same exam.

We are witnessing a civilizational fork: the three most powerful technological forces in human history, after sprinting down the same track for two years, have suddenly veered off in three different directions. This isn't a product strategy divergence. It's three fundamentally different diagnoses of the question: "Where, exactly, is the bottleneck of intelligence?"

The "universal model" narrative is fracturing. And the fracture itself is the most important signal.

---

1. Anthropic: Skip the Benchmarks, Ship the Workflow

February 5. Opus 4.6. The headline isn't the model itself — it's Agent Teams: orchestrated AI squads inside Claude Code that auto-decompose tasks, assign roles, and coordinate execution. Four architectures: Leader, Swarm, Pipeline, Watchdog.

The model-level numbers are there too: ARC-AGI-2 leapt from 37.6% to 68.8%, a 31-percentage-point single-generation jump — the largest in Opus family history. Million-token context retrieval accuracy hit 76.0%, while Gemini 3 Pro scored 26.3% on the same test.

But pure coding (Terminal-Bench 2.0) came in at 65.4%, below GPT-5.3 Codex's 77.3%. Anthropic's edge isn't code generation speed — it's understanding complex business logic.

The business signals are louder than the benchmarks: Claude Code hit $1B ARR in six months, then more than doubled to over $2.5B by February. Roughly 4% of public GitHub commits now come from Claude Code.

Anthropic's bet: The bottleneck to AI commercialization isn't model capability — it's workflow orchestration.

---

2. OpenAI: Custom Silicon, Maximum Speed

February 12. GPT-5.3 Codex Spark. OpenAI's first production model that doesn't run on Nvidia hardware — fully deployed on Cerebras wafer-scale engines. Inference at 1,000+ tok/s, 15x conventional GPU speed.
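To make the speed claim concrete, here is a rough back-of-the-envelope sketch. It takes the two figures above (1,000 tok/s as a floor, and the 15x multiple) at face value; the 2,000-token response size is purely an assumption for illustration, and the calculation ignores network, queueing, and time-to-first-token.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent streaming the output, ignoring network and queueing delay."""
    return output_tokens / tokens_per_second


SPARK_TOKENS_PER_SECOND = 1_000.0  # "1,000+ tok/s" from the article, used as a floor
SPEEDUP_OVER_GPU = 15.0            # "15x conventional GPU speed" from the article
# Implied GPU baseline, derived from the two figures above (about 67 tok/s).
GPU_TOKENS_PER_SECOND = SPARK_TOKENS_PER_SECOND / SPEEDUP_OVER_GPU

RESPONSE_TOKENS = 2_000  # assumption: a hypothetical mid-sized code edit

spark_seconds = generation_seconds(RESPONSE_TOKENS, SPARK_TOKENS_PER_SECOND)
gpu_seconds = generation_seconds(RESPONSE_TOKENS, GPU_TOKENS_PER_SECOND)

print(f"Spark: {spark_seconds:.1f} s, implied GPU baseline: {gpu_seconds:.1f} s")
```

Under those assumptions the same response takes about 2 seconds instead of about 30, which is the difference between staying in flow and switching tabs.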

But speed has a cost. Terminal-Bench 2.0: 58.4% (full Codex: 77.3%). Context window shrunk to 128K. Text-only.

The strategy matters more than the product. Behind it: a multi-year Cerebras deal (up to 750MW compute), plus AMD (6GW agreement) and Broadcom (custom accelerators). The industry is moving from Nvidia monoculture to chip diversification.

OpenAI's bet: The bottleneck isn't reasoning depth — it's latency.

This is an uncomfortable judgment. It implies that for the vast majority of users, models are already smart enough — the problem is they respond too slowly and cost too much.

---

3. Google: Forget Code Wars, Go Solve What Humans Can't

Google threw a one-two punch in two weeks. February 12: Deep Think reasoning mode upgrade. February 19: Gemini 3.1 Pro, injecting that reasoning power into a general-purpose model.

Deep Think runs a "System 2" architecture: rather than answering in a single token-by-token pass, it pauses, evaluates multiple logic paths, then answers. ARC-AGI-2: 84.6%, leading second place by 15.8 percentage points.

But yesterday's 3.1 Pro is the bigger chess move. It leads on 13 of 16 benchmarks, and the killer number is price: input $2 / output $12 per million tokens, 1/7th of Opus 4.6.
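For a sense of scale, the quoted per-million-token prices can be turned into a daily bill. This is only a sketch: the two prices come from the paragraph above, while the request shape (50,000 input tokens, 5,000 output tokens) and the volume of 1,000 requests per day are hypothetical numbers chosen for illustration.

```python
INPUT_USD_PER_MILLION = 2.0    # input price quoted in the article
OUTPUT_USD_PER_MILLION = 12.0  # output price quoted in the article


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Assumptions: a hypothetical long-context request and daily volume.
INPUT_TOKENS = 50_000
OUTPUT_TOKENS = 5_000
REQUESTS_PER_DAY = 1_000

per_request = request_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS)
daily_cost = per_request * REQUESTS_PER_DAY

print(f"per request: ${per_request:.2f}, per day: ${daily_cost:.2f}")
```

Swapping in your own token counts is the quickest way to see whether the price gap actually matters for your workload.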

Google's bet: The bottleneck isn't workflow — it's reasoning depth. And the play is to make that depth infrastructure-cheap.

---

4. Three Diagnoses — They Can't All Be Right, but More Than One Might Be

Let's place the three bets side by side:

- Anthropic says: the bottleneck is orchestration.

- OpenAI says: the bottleneck is speed.

- Google says: the bottleneck is reasoning depth.

For two years, these three companies chased each other on the same set of benchmarks. Now they've effectively abandoned head-to-head comparison. This isn't accidental differentiation marketing. Each company has diagnosed a different bottleneck in AI commercialization.

This alone is worth sitting with. It means the concept of the "universal model" — one model strongest on all dimensions — is shifting from consensus to illusion.

---

5. Chip Diversification: A Long Bet with Short-Term Pain

Spark proved non-Nvidia silicon can deliver extreme inference speed, but capability took a hit: Terminal-Bench 2.0 fell from 77.3% to 58.4%, about 19 points, or roughly a quarter in relative terms. The real test: can Cerebras maintain the speed edge while closing that gap?
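The size of that hit depends on whether it is read in percentage points or as a relative drop. A small sketch using only the two Terminal-Bench 2.0 scores quoted earlier in this piece (the variable names are just labels for illustration):

```python
FULL_CODEX_SCORE = 77.3  # GPT-5.3 Codex on Terminal-Bench 2.0, as quoted above
SPARK_SCORE = 58.4       # GPT-5.3 Codex Spark on the same benchmark

# Absolute gap, in percentage points.
absolute_drop_points = FULL_CODEX_SCORE - SPARK_SCORE

# Relative gap, as a share of the full model's score.
relative_drop_percent = absolute_drop_points / FULL_CODEX_SCORE * 100

print(f"absolute: {absolute_drop_points:.1f} points")
print(f"relative: {relative_drop_percent:.1f}%")
```

The gap is just under 19 points in absolute terms and a bit under 25% relative to the full model, so the two framings should not be used interchangeably.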

There's a bigger question here: for the past decade, Nvidia's CUDA ecosystem has been the shared infrastructure of the entire AI industry. Breaking that single dependency will inevitably introduce fragmentation costs in the short term. But in the long run, no healthy industry should be built on a single supplier.

---

6. The Agent War Goes Live

Anthropic's Agent Teams. Google's Gemini 3.1 Pro scoring 33.5% on APEX-Agents. OpenAI's enterprise agent platform, Frontier, arriving in the same window. All three are betting on agents, but through entirely different entry points.

An agent is not an upgraded chatbot — it's a redefinition of how work gets done. It means AI is no longer a tool you "use" but an employee you "deploy."

---

7. After the Fork, What Should You Do?

You might be thinking: three companies, three diverging routes — which one do I bet on?

The question itself is wrong.

You don't need to bet. You need to understand what the divergence itself means.

It means AI capability has expanded to the point where no single dimension produces an absolute winner. It means tool selection is no longer "pick the strongest" — it's "pick the best fit for your use case."

Competitive divergence means you're in a buyer's market. Three companies sprinting in three directions, and the ultimate beneficiary is whoever understands their own needs well enough to pick the right tool.

Don't try to keep up with every model. Keep up with your own workflow demands. That's enough.

We opened by saying the "universal model" narrative is fracturing. But look at it from another angle: fracture isn't necessarily bad news. The end of a unified narrative is precisely the beginning of individual choice.

Don't Panic. Accelerate.
