What NVIDIA announced (and why it matters)

The first announcement is NVIDIA NemoClaw, an open-source stack for OpenClaw that installs in a single command, adding policy-based privacy and security controls to run secure, always-on AI agents anywhere. Jensen Huang described it simply: "the operating system of agentic computers." That framing is deliberate. Just like Windows gave every business a standard way to run software, NemoClaw gives every enterprise a standard way to run AI agents safely, at scale, with controls your legal and compliance teams will actually accept.

NemoClaw includes security sandboxes, policy engines, and privacy modes that keep data local when it needs to stay local. At its core is NVIDIA OpenShell™, the component that provides the sandboxing and policy-based guardrails. Developers can define exactly what agents are allowed to do—and what they're not. That last part has been missing from every enterprise AI conversation I've had for the past two years. Seventeen major platforms have already adopted the stack, including Adobe, Salesforce, SAP, ServiceNow and Siemens. That's not a pilot. That's a supply chain forming.
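NemoClaw's actual policy API hasn't been published, but the idea of policy-based guardrails is easy to sketch. The following is a purely hypothetical illustration (the `AgentPolicy` class, tool names, and `keep_data_local` flag are all my own inventions, not NemoClaw's API) of what "define exactly what agents are allowed to do" looks like in code:

```python
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    """Hypothetical policy: which tools an agent may call, and whether data must stay local."""
    allowed_tools: set = field(default_factory=set)
    keep_data_local: bool = True

    def permits(self, tool: str, sends_data_offsite: bool = False) -> bool:
        if tool not in self.allowed_tools:
            return False  # tool is not on the allow-list
        if self.keep_data_local and sends_data_offsite:
            return False  # privacy mode: data must not leave the boundary
        return True

# A customer-service agent that may search tickets and draft replies, nothing else.
policy = AgentPolicy(allowed_tools={"search_tickets", "draft_reply"})
print(policy.permits("draft_reply"))                              # True
print(policy.permits("send_email"))                               # False: not allow-listed
print(policy.permits("search_tickets", sends_data_offsite=True))  # False: privacy mode
```

The point of the sketch is the shape, not the API: the agent's boundaries are data, checked before every action, rather than a hope expressed in a prompt.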

The second announcement is the NVIDIA Vera Rubin + NVIDIA Groq 3 LPU architecture, which is about raw physics.

NVIDIA Vera Rubin is a rack-scale AI platform engineered for agentic AI and reasoning, delivering 10x lower cost per token versus the NVIDIA Blackwell architecture. NVIDIA Groq 3 LPU is a purpose-built AI inference accelerator with 150 TB/s on-chip SRAM bandwidth, delivering up to 35x higher throughput per megawatt for token generation.

Current AI inference is too slow for agents. Traditional inference runs around 100 tokens per second, which is fine when a human is reading the output. But AI agents don't talk to humans. They talk to each other, run multi-step plans, and manage long conversations in parallel. At 100 tokens per second, you're not building autonomous systems—you're building a bottleneck.

The Groq 3 LPU, born from NVIDIA's $20 billion licensing and talent agreement with Groq, takes a completely different approach. Instead of relying on traditional high-bandwidth memory, it uses 500 MB of on-chip SRAM per die. The result: 150 terabytes per second of memory bandwidth per chip. That's seven times faster than NVIDIA's own Rubin GPU at 22 TB/s. A rack of 256 LPUs targets up to 300 tokens per second for agentic workloads—and the whole system delivers 35x higher inference throughput per megawatt versus NVIDIA Blackwell NVL72 for decode-heavy workloads.
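The headline figures above are easy to sanity-check. Using only the numbers quoted in this article (150 TB/s vs. 22 TB/s, 256 LPUs at 500 MB of SRAM each), the back-of-envelope arithmetic looks like this:

```python
# Sanity-checking the figures quoted above.
lpu_bandwidth_tbs = 150   # Groq 3 LPU on-chip SRAM bandwidth, TB/s
gpu_bandwidth_tbs = 22    # Rubin GPU memory bandwidth, TB/s
speedup = lpu_bandwidth_tbs / gpu_bandwidth_tbs
print(round(speedup, 1))  # 6.8 -- the "seven times faster" claim

rack_sram_gb = 256 * 500 / 1000  # 256 LPUs x 500 MB of SRAM each
print(rack_sram_gb)              # 128.0 GB of on-chip SRAM per rack
```

That 128 GB figure also explains the architectural bet: the model's working set lives in SRAM rather than being shuttled in from high-bandwidth memory on every decode step.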

Tokens per second per watt. I wrote about this metric at last year's GTC. It just became the most important number in enterprise AI.

Why these two together create an inflection point

Here's what the market has been missing. For AI agents to work in the real world, you need two things simultaneously:

The speed to act. Agents that wait seconds between steps aren't autonomous—they're just slow chatbots. The NVIDIA Vera Rubin + NVIDIA Groq 3 LPU architecture removes that ceiling. When your infrastructure can process 300 tokens per second at a fraction of the energy cost, multi-agent workflows stop being a research project and start being a product decision.
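The latency argument is worth making concrete. For sequential agent plans, per-step token rate multiplies across the chain; assuming (hypothetically) ten sequential steps of roughly 400 generated tokens each:

```python
# A simple latency model for a sequential agent chain.
# The 10 steps and 400 tokens per step are illustrative assumptions, not vendor figures.
steps, tokens_per_step = 10, 400

def chain_seconds(tokens_per_second: float) -> float:
    """End-to-end generation time for the whole chain, ignoring tool-call latency."""
    return steps * tokens_per_step / tokens_per_second

print(chain_seconds(100))  # 40.0 s at traditional inference speeds
print(chain_seconds(300))  # ~13.3 s at the agentic target rate
```

Forty seconds per plan is a demo; thirteen is something you can put in front of a workflow. And since real agents run many chains in parallel, the throughput-per-megawatt number compounds the same way.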

The control to deploy. Speed without governance is a liability. NVIDIA NemoClaw solves the problem that has kept most enterprises in pilot mode: "We don't know what the agent will do." With policy engines, sandboxes, and defined operational boundaries, that answer changes. You know what it will do. You decided.

Most organizations have been waiting for one of these pieces to arrive. NVIDIA just delivered both in the same week.

Jensen Huang called it an inflection point. I believe him, not because of the hardware specs, but because the combination of software governance and inference throughput removes the last two structural excuses for not deploying agents in production.

The opportunity is asymmetric

The math is starting to work in ways it didn't 18 months ago.

When inference throughput per megawatt improves by 35x, use cases that were too expensive last year suddenly make sense. Customer service, compliance monitoring, supply chain coordination, clinical documentation—these aren't just cheaper to run, they're fundamentally different to design. You can build agents that operate continuously, across systems, without human handoffs. That's not an incremental improvement. It's a different business model.

Companies that move now have a real advantage: they'll be building on production infrastructure while competitors are still debating pilots. The 17 platform partners that adopted NemoClaw aren't waiting to see how this plays out. They're building moats.

The risks are just as real

I don't want to oversell this.

The governance work is on you. NVIDIA NemoClaw gives you the tools to control agents, but the hard part—defining what agents should and shouldn't do, mapping them to your business processes, auditing their outputs—that's organizational work, not a software install. Most companies don't have the internal clarity yet to answer the question: "What decisions are we comfortable letting an agent make?" You need to answer that before you build, not after.

The infrastructure investment is significant. The NVIDIA Vera Rubin platform ships later in 2026. Getting to 35x inference throughput per megawatt isn't a cloud configuration change—it's a capital commitment with a multi-year horizon. And the technology is still moving. What you design for today needs to be flexible enough to adapt as the stack evolves.

Most organizations still don't have the baseline. I said it last year: you can't build a token strategy if you don't know how many tokens you're using. The same applies here. Before you invest in agentic infrastructure, you need to understand your current agent surface area—what workflows could actually benefit, what data is involved, and what governance gaps exist today.
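That baseline doesn't require anything exotic. A minimal sketch, assuming you can export per-request records from your existing LLM gateway or usage logs (the log format and workflow names here are hypothetical), is just a tally of tokens by workflow:

```python
# A minimal token-baseline audit: total tokens per workflow from usage logs.
# The records below are hypothetical; substitute your own gateway's export.
from collections import Counter

usage_log = [  # (workflow, tokens_in, tokens_out)
    ("customer_service", 1_200, 350),
    ("customer_service", 900, 400),
    ("compliance_monitoring", 2_500, 150),
]

tokens_by_workflow = Counter()
for workflow, tokens_in, tokens_out in usage_log:
    tokens_by_workflow[workflow] += tokens_in + tokens_out

# Workflows ranked by total token volume: your first agent candidates.
for workflow, total in tokens_by_workflow.most_common():
    print(workflow, total)
```

Even this crude a view answers the questions the paragraph above raises: which workflows dominate usage, and therefore where agentic infrastructure would actually pay off.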

Prudent steps forward

If you're a business or technology leader trying to translate this into action, here's where to start:

Map your highest-value agent use cases now. Not hypothetically—specifically. Which workflows involve repetitive, rule-based decisions with known data sources? Those are your first candidates. Prioritize by business impact and data sensitivity. The governance conversation is much easier when it's grounded in a real use case rather than a theoretical one.

Treat agent governance as infrastructure, not policy. NemoClaw exists because NVIDIA's customers demanded it. The same demand should exist internally. Before you approve any agent deployment, your organization should have a clear answer to: who defines the boundaries, how they are enforced, and who audits the outcomes. Build that capability now while the scale is manageable.

Get into the ecosystem before it closes. Salesforce, SAP and ServiceNow have already integrated NemoClaw. If your enterprise runs on any of these platforms, agentic capabilities are coming whether you plan for them or not. The organizations that have already thought through their governance and use case strategy will be able to capture value immediately. Those that haven't will spend the first six months catching up.

Start planning for trillion-parameter inference. The NVIDIA Vera Rubin + Groq 3 LPU architecture is designed for models at a scale most enterprises haven't needed to think about yet. But that changes when agents start orchestrating other agents. Your data, networking and compute architecture needs to be designed for that future—not retrofitted into it.

The bottom line

AI agents have been stuck in pilot mode for two years, not because the models weren't good enough, but because the infrastructure was too slow and the governance was nonexistent. NVIDIA NemoClaw and the NVIDIA Vera Rubin + NVIDIA Groq 3 LPU architecture remove both of those barriers in a single product cycle.

The inflection point Jensen Huang talked about isn't coming. It arrived in San Jose at GTC.

The question isn't whether your organization will run AI agents at scale. It's whether you'll be ready when the window opens or scrambling to catch up after it does.

What's your organization's plan for agentic AI? I'd love to hear where you are in the journey.
