I was in San Francisco for the Build 2026 opening keynote. 

Here's what Microsoft actually announced, and what I think it means for the rest of us who make this stuff real:

Every developer keynote has a thesis, and you can usually spot it by how often the speaker repeats one phrase until it stops sounding like English. This year, Satya Nadella's phrase was "unmetered intelligence." By the third mention, I'd accepted it. By the tenth, I was a believer. Somewhere around the fifteenth, I started mentally pricing out a Surface.

The bigger idea underneath it: stop consuming a frontier model and start participating at the frontier. It's a statement about behavior, not procurement.

The edge got greedy (in a good way)

Microsoft's opening move was to point at all the silicon already sitting on our desks and ask why most of it is busy rendering Teams notifications. So they expanded the Windows AI APIs to run onboard models across far more hardware and shipped two new local models: Aion 1.0 Instruct, a small, efficient model for everyday text work, and Aion 1.0 Plan, a 14-billion-parameter reasoning and tool-calling model included in-box. The headline isn't the models, though. It's the "local agentic loop": you can now hand a model tools and let it run autonomously on the machine, with no round trip to the cloud and no meter running. Hence: unmetered.
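
To make "local agentic loop" concrete, here is a minimal sketch of the general pattern. It assumes nothing about Microsoft's actual Windows AI APIs. On each turn the model either asks for a tool or gives a final answer, and the loop runs the requested tool on the device and feeds the result back. `stub_model`, `run_local_agent` and the `word_count` tool are hypothetical stand-ins. In a real setup, an on-device model such as Aion 1.0 Plan would take the place of the stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class Step:
    # One model turn: either a tool request or a final answer.
    tool_call: Optional[ToolCall] = None
    final_answer: Optional[str] = None


Model = Callable[[str, List[str]], Step]


def stub_model(task: str, observations: List[str]) -> Step:
    """Hypothetical stand-in for an on-device planner model; not a real Aion API."""
    if not observations:
        return Step(tool_call=ToolCall(name="word_count", argument=task))
    return Step(final_answer=f"The task text has {observations[-1]} words.")


def run_local_agent(task: str, model: Model, tools: Dict[str, Callable[[str], str]],
                    max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        step = model(task, observations)
        if step.final_answer is not None:
            return step.final_answer
        call = step.tool_call
        if call is None or call.name not in tools:
            raise ValueError(f"model requested an unknown tool: {call}")
        # The tool runs on this machine: no cloud round trip, no per-token meter.
        observations.append(tools[call.name](call.argument))
    raise RuntimeError("agent did not finish within max_steps")


TOOLS = {"word_count": lambda text: str(len(text.split()))}

print(run_local_agent("summarize these meeting notes", stub_model, TOOLS))
# -> The task text has 4 words.
```

The `max_steps` cap matters more when nothing is metering you. An autonomous loop with free tokens still needs a stopping rule.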

Then came the hardware parade: AMD Ryzen, Intel's Panther Lake, Qualcomm's Snapdragon X2 Elite and the sub-$500 Snapdragon C, plus the one that got the room going, NVIDIA's RTX Spark™, a system-on-chip that fuses CPU, GPU and AI with unified memory. The first device on it is the Surface Laptop Ultra (128GB unified memory, a 2,000-nit display and all-day battery, arriving this fall). And because a fast laptop wasn't enough of a flex, there's the Surface RTX Spark Dev Box: one petaflop of AI compute and 128GB of memory, capable of running 120B-parameter models locally. Satya called it the dream machine, then admitted he's on the waitlist for it. Always reassuring when the CEO can't get his own product.

The genuinely wild bit: Windows is coming to NVIDIA's DGX Station™, a "desktop data center" that Satya said can run a trillion-parameter model locally and is roughly equivalent to the supercomputer they used to train GPT-2.5/3. So the class of machine that trained a frontier model a few years ago now sits next to your monitor and your sad little succulent.

What it means: Inference is moving back to the edge because the economics demand it. Once a real share of your tokens are free and local, the build-vs-buy math changes for every software vendor, and "your data never leaves the device" becomes a sentence that closes deals.

The developer experience demo was a quiet love letter to anyone who's spent a decade toggling between operating systems: a distraction-free, dark-mode-by-default Windows, an Intelligent Terminal with Copilot built in, native Linux command-line utilities, native Starship and Homebrew, and first-class WSL containers. After years of "I use Windows but live in WSL," Microsoft has decided to stop fighting it and start catering to it. Smart.

The cloud got bigger, drier and weirder

Up in the cloud, the equation Satya kept returning to was tokens per dollar per watt: electrons in one end, tokens out the other. Azure now spans 500-plus data centers across 80 regions, having added more capacity in the last 18 months than in its entire first decade. The new flagship design, Fairwater, is a two-story "AI super-factory" across Georgia and Wisconsin whose closed-loop cooling uses, and I quote, about as much water per year as a single restaurant. 

On silicon, Microsoft is hedging beautifully. Azure is one of the first clouds to validate the NVIDIA Vera Rubin NVL72, the work with AMD continues, and Microsoft's own Maia 200 accelerator is now in production in Iowa and Arizona, with more regions coming, and slated to power Microsoft 365 Copilot. They previewed Cobalt 200, their next-generation Arm CPU, and, tellingly, benchmarked it on agentic traces from GitHub Copilot rather than human workloads, posting 33% lower latency on agent calls. They also detailed Multipath Reliable Connection (MRC), an open networking protocol co-developed with AMD, Broadcom, Intel, OpenAI and NVIDIA that lets giant synchronous workloads route around failures without expensive stalls.

The theme worth flagging: the CPU is back, because agents are impatient. NVIDIA founder and CEO Jensen Huang, beaming in from Taipei at an ungodly hour, put it plainly: old CPUs were designed for humans, and humans are patient. Agents are not. They want answers now so they can fire off the next call, which is exactly what Vera Rubin was built for.

What it means: "AI infrastructure" is splitting into three workloads with different requirements: training, inference and the agent runtime. The old assumption that AI equals a pile of GPUs is already outdated, and the agent runtime is CPU-hungry in ways a lot of capacity plans aren't sized for.

A new platform, or: we reinvented the lanyard

Project Solara is Microsoft's swing at purpose-built devices: a chip-to-cloud platform for what they call an agent-first world. As Microsoft framed it, the next computer isn't one device but a constellation of them working as one system, with your agent showing up wherever you need it.

Two reference designs: a desk device on a MediaTek SoC that signs you in as you walk up, and a portable badge on Qualcomm wearable silicon with fingerprint unlock and a camera. The badge demo, panning the camera across the room and telling Copilot to grab and send some shots, was a tidy glimpse of "a computer where there wasn't one before." They named Best Buy, Target, Levi's and others as early explorers, with healthcare as the marquee scenario.

The room's general response to "we reimagined the access badge" was a raised eyebrow, but the argument holds up: plenty of workflows are a bad fit for a laptop or phone, and if your agent can ride along on a $40 wearable, you reach places computing never fit. Crucially, the pitch is an open, horizontal platform where you bring your own agent. We've all seen what happens when these ecosystems go vertical and single-vendor. An open one would be new.

What it means: Mostly "watch this space." These ecosystems live or die on developer and OEM pull, and this is an early look. Worth tracking, not yet worth budgeting for.

The intelligence layer is where the real work is

Here's where many in the room sat up, because this is the part that touches what most of us do.

Foundry now hosts 11,000-plus models, from OpenAI to Anthropic (Claude Opus 4.8 landed just last week) to Microsoft's own MAI family. But model choice was almost a throwaway. The real message: context is the hard problem, and the data tier has to be rebuilt for agents that continuously store, retrieve, reason, act and learn. So they shipped Horizon DB, a ground-up managed PostgreSQL service (roughly 3x the throughput of self-managed Postgres), and a GPU-accelerated Fabric data warehouse posting 7x gains. Tying it together is an "IQ" layer: Web IQ for fresh, MCP-native web grounding, and Microsoft IQ, which stitches Foundry, Fabric and Microsoft 365 into one living model of your organization, spanning the outside world, your operational data (Fabric IQ) and your people and policies (Work IQ).

The demo, which ran a grid-operations incident, made it concrete: a single question answered from the live web, the real-time state of the grid and the team's actual SharePoint playbook, with no stale uploads. When the procedure changes, the answer changes with it.

What it means: This is the unglamorous heart of enterprise AI. Models were always the easy part; grounding them in your data without a brittle pile of one-off integrations is the hard one. If the IQ layer delivers, the value moves from "which model" to "how good is your context plumbing."

Agents, and the small matter of letting them touch your files

If you're going to let an autonomous thing run code on your machine, you'd like guardrails. Enter Microsoft Execution Containers (MXC), a policy layer baked into the OS that enforces isolation from lightweight process-level containment up to full Windows 365 sandboxes. The point of putting it in the operating system is that containment holds regardless of who built the agent.

Which set up the demo everyone was excited about: OpenClaw on Windows. OpenClaw, the open-source agent that took off after launching last November, now runs natively on Windows with a WinUI 3 companion app, sandboxed by MXC, which is designed to mitigate the very real security challenges OpenClaw presents. Microsoft demoed it by deliberately telling it to delete every file on the desktop. It tried. Repeatedly. With real conviction. The read-only sandbox swatted away every attempt while the demo machine's messy desktop looked on. Nothing got deleted, and the crowd loved it.

Then they brought out Peter Steinberger, the creator of OpenClaw, who made the real enterprise point between jokes: six months ago, that delete-everything command would have worked. The recent work has been about turning "an agent with access to everything" into "an agent with the access you grant it," with granular permissions, approvals and observability, so companies can finally say yes. He's launched an OpenClaw Foundation to keep it open and model-neutral, and made the harness pluggable, so you bring your own Copilot or Codex.
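
The shift from "access to everything" to "the access you grant it" is easy to picture as a gate that every agent action passes through. The sketch below is hypothetical. It is not the MXC or OpenClaw API. `Grant` and `PermissionGate` are illustrative names and the paths are made up. Real OS-level containment is enforced by the platform, not by a Python check inside the agent. The sketch only shows the three ideas Steinberger listed: scoped permissions, human approvals and an audit log for observability.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Grant:
    # Hypothetical grant: which operations an agent may perform under one folder.
    root: str
    operations: frozenset
    # Operations that additionally need an explicit human "yes".
    needs_approval: frozenset = frozenset()


@dataclass
class PermissionGate:
    grants: list
    audit_log: list = field(default_factory=list)

    def check(self, operation: str, path: str, approved: bool = False) -> bool:
        """Return True only if some grant covers this operation on this path."""
        target = PurePosixPath(path)
        decision = "deny"  # default-deny: anything not granted is refused
        for grant in self.grants:
            root = PurePosixPath(grant.root)
            in_scope = target == root or root in target.parents
            if not in_scope or operation not in grant.operations:
                continue
            if operation in grant.needs_approval and not approved:
                decision = "pending_approval"
                continue
            decision = "allow"
            break
        # Observability: every decision is recorded, allowed or not.
        self.audit_log.append((operation, path, decision))
        return decision == "allow"


gate = PermissionGate(grants=[
    Grant(root="/home/demo/Desktop", operations=frozenset({"read"})),
    Grant(root="/home/demo/reports", operations=frozenset({"read", "write"}),
          needs_approval=frozenset({"write"})),
])

print(gate.check("delete", "/home/demo/Desktop/notes.txt"))            # False
print(gate.check("read", "/home/demo/Desktop/notes.txt"))              # True
print(gate.check("write", "/home/demo/reports/q3.md"))                 # False
print(gate.check("write", "/home/demo/reports/q3.md", approved=True))  # True
```

Default-deny is the design choice that matters here. The delete request from the keynote demo fails not because someone anticipated it, but because nothing granted it.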

Microsoft also shipped Foundry-hosted agents as a managed runtime for long-running agents, a Fireworks AI partnership for open-weight models, and a new GitHub Copilot app with the speed of a CLI and the reach of an IDE, because nobody's cognitive load survives 100 open terminal sessions.

What it means: This is the most important enterprise thread of the keynote. The blocker for agents at work was never capability; it was containment and governance, and baking that into the OS is how an agent goes from a fun demo to something the security review board approves. That's the line between shadow AI and sanctioned AI.

MAI, Frontier Tuning and your actual moat

Mustafa Suleyman delivered the line enterprises should tattoo somewhere visible: with frontier tuning, your reinforcement-learning environments become your moat.

Microsoft Frontier Tuning customizes the MAI models on your own data and workflows using reinforcement learning environments (RLEs): company-specific training gyms where a model hill-climbs on your tasks. The proof points were pointed: a MAI model tuned on Excel tasks now matches GPT-5.4 at 10x lower cost, and one tuned on McKinsey's tasks beat GPT-5.5 at the same efficiency. The kicker: rather than renting intelligence from a shared model that learns from everyone, you keep the tuned model and the data behind it, and nobody else does.
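
"Hill-climbing on your tasks" has a simple shape underneath. An environment scores attempts against your own examples, and an optimizer keeps whatever scores higher. This toy sketch illustrates that idea under heavy assumptions. It is not how Frontier Tuning works. `ToyTaskEnv` and `hill_climb` are hypothetical names. The "policy" is a two-parameter linear rule rather than a model's weights, and it is searched with plain steepest-ascent hill climbing rather than reinforcement learning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # One company-specific example: an input and the answer your experts expect.
    x: int
    expected: int


class ToyTaskEnv:
    """Hypothetical, toy stand-in for a company-specific RL environment."""

    def __init__(self, tasks):
        self.tasks = tasks

    def reward(self, params):
        # Higher is better: negative mean absolute error of a tiny linear "policy".
        w, b = params
        errors = [abs(w * t.x + b - t.expected) for t in self.tasks]
        return -sum(errors) / len(errors)

    def accuracy(self, params):
        w, b = params
        return sum(w * t.x + b == t.expected for t in self.tasks) / len(self.tasks)


def hill_climb(env, params=(0, 0), max_steps=100):
    """Steepest-ascent hill climbing over integer parameters."""
    current, current_reward = params, env.reward(params)
    for _ in range(max_steps):
        w, b = current
        neighbors = [(w + 1, b), (w - 1, b), (w, b + 1), (w, b - 1)]
        best = max(neighbors, key=env.reward)
        if env.reward(best) <= current_reward:
            break  # no neighbor scores higher: we are at a (local) peak
        current, current_reward = best, env.reward(best)
    return current, current_reward


# The environment encodes a hidden "house rule" (2x + 1) that only your data knows.
env = ToyTaskEnv([Task(x, 2 * x + 1) for x in range(10)])
params, reward = hill_climb(env)
print(params, reward, env.accuracy(params))  # -> (2, 1) 0.0 1.0
```

The optimizer here is generic and replaceable. The environment, meaning your tasks and your definition of a correct answer, is the part nobody else has. That is the moat argument in miniature.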

Tanaya Yadav's demo used the most gloriously mundane example imaginable: Land O'Lakes using an RL environment to perfect its butter reporting, hill-climbing past 90% accuracy at an estimated 10x efficiency. "Frontier Tuning as smooth as butter," she said, and honestly, with respect. They capped it with seven new models across image, voice, transcription and coding, which match or beat recent-generation models such as NanoBanana2, GPT-5.4 and GPT-5.5, Sonnet 4.6, Haiku 4.6 and Opus 4.6.

They also discussed a Mayo Clinic partnership to build a healthcare frontier model trained on decades of real clinical practice, the kind of judgment that isn't in any journal the models have already read.

What it means: This is "participate at the frontier" made concrete, and the most strategically important idea of the day. The question shifts from "which model do we standardize on" to "what can we build that nobody can copy because it's trained on our own knowledge?" That expertise is the moat now.

And then they discovered new proteins and built a better qubit, as one does

To keep the rest of us humble about our own productivity, the keynote closed on Microsoft Discovery (now GA), an agentic loop for science that pairs models and HPC with scientific knowledge graphs and automated labs. The demo had Discovery design new proteins to recycle PET plastic, generate the DNA sequences and submit the job to a real automated lab at Cambridge Consultants. They called running a lab through a Copilot interface "like being Iron Man, but for chemistry."

Then came Majorana 2, the next generation of Microsoft's topological quantum chip, built on a material stack they used Discovery to help design. The numbers are something: qubit lifetimes of 20 seconds to a minute versus microseconds for other approaches (about 1,000x better than Majorana 1), one-microsecond operations, and a form factor that could fit a million qubits on a chip smaller than a credit card. Majorana 1 proved the physics; Majorana 2 starts the engineering scale-up.

What it means: Not much this quarter, but the meta-story matters: Microsoft is using its own discovery loop to speed up its hardest research, quantum included. AI that accelerates the next breakthrough, which makes the next AI better.

What I'm actually taking away

A few things stand out to me as worth following over the next few quarters:

  • Inference is moving to the edge, and that changes the economics and the privacy story for everyone building AI features.
  • "AI infrastructure" is now three workloads, not one, and the agent runtime is its own animal, hungry for CPU and low latency.
  • Context, not models, is the hard problem. The IQ layer is the least glamorous and most valuable part of the stack.
  • Governance is the unlock for agents. OS-level containment and granular permissions are how agents get approved for production rather than tolerated in a sandbox.
  • And the big one: participating at the frontier beats consuming it. The companies that win will be the ones who turn their own expertise into a tuned, defensible capability.

Satya closed with the two stories you can tell about this moment. In one, the technology concentrates power and leaves everyone else to absorb the consequences. In the other, it opens up opportunities for developers, scientists and the communities around them. "Our job," he said, "is to make the second story true."

That's a good North Star. It's also a lot of work, which is conveniently what we do.

Let's go make a new world happen.
