Blog • June 5, 2026 • 7 minute read

With Mythos, Finding is Solved. Remediation is the Race.

Fifty-five days after the Mythos announcement, the threat landscape has changed and so must our response. Anthropic, OpenAI and Microsoft have shipped frontier cyber capability. A fraction of what they have found has been patched. Here is how security leaders should move now.

Fifty-five days is a long time in cybersecurity

It has been long enough for Anthropic's Project Glasswing partners to surface vulnerabilities in widely used software. Long enough for OpenAI to announce their vision, and for Microsoft to land MDASH at the top of the CyberGym leaderboard with an 88.45% score on 1,507 real-world vulnerability tasks — and 16 new Windows CVEs in a single Patch Tuesday. And long enough to make one thing painfully clear: only a small fraction of what these systems have found has actually been patched.

Anthropic states that across 1000+ open-source projects, they found 1,094 confirmed high/critical severity vulnerabilities, but only 75 have been patched (65 with public advisories) at the time of the blog.

That last statistic is the whole story.
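To put those figures in proportion, here is the arithmetic as a short sketch. It uses only the numbers quoted above from Anthropic; the variable names are illustrative.

```python
# Figures quoted above from Anthropic; variable names are illustrative.
confirmed = 1094    # confirmed high/critical severity vulnerabilities
patched = 75        # patched at the time of the blog
advisories = 65     # patched with public advisories

patched_share = patched / confirmed
advisory_share = advisories / confirmed
unpatched = confirmed - patched

print(f"Patched: {patched_share:.1%}")                # Patched: 6.9%
print(f"With public advisory: {advisory_share:.1%}")  # With public advisory: 5.9%
print(f"Still unpatched: {unpatched}")                # Still unpatched: 1019
```

In other words, roughly 93% of the confirmed high/critical findings were still unpatched.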

For two months, the security industry has watched frontier AI find vulnerabilities at machine speed. AI capability is shifting operating models from vulnerability management to defensive operations. Headlines focused on capability. Finding the bugs is no longer hard. Validating them is only partly solved: accuracy is high, but the volume is overwhelming. Fixing them is where defense actually happens, and that is where defenders are losing ground.

The 2026 Verizon Data Breach Investigations Report confirms: "Exploitation of vulnerabilities is now the most common initial access vector for breaches. It has risen to 31% in this year's reporting dataset, while credential abuse—the previous leader—is down to 13%. Only 26% of critical vulnerabilities— defined as being in the Cybersecurity Infrastructure and Security Agency Known Exploited Vulnerabilities (CISA KEV) catalog—were fully remediated by organizations in 2025, a drop from the previous year's 38%. The median time for full resolution went up to 43 days, almost two weeks more than the previous year's 32 days. In the median case, organizations had 50% more critical vulnerabilities to patch in this year's reporting dataset compared to the previous year."

The harness is the product

The most important insight from the past 55 days is not about the models. It is about everything around them.

Microsoft's MDASH did not top the CyberGym leaderboard because it had a new model. It got there because it orchestrates more than 100 specialized AI agents through a structured pipeline: preparing code, scanning, debating findings, validating, deduplicating, and producing proofs. Microsoft's own framing is that the durable advantage lies in the agentic system around the model rather than in any single model. This matters for budget. Models will change every quarter, sometimes dramatically, while the orchestration around them persists. Investments in scaffolding, pipelines, context, controls, prioritization, and remediation workflows compound over time. Investments in any single model do not.
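One way to picture why the harness outlives the model is a pipeline whose stages take the model as a swappable argument. The sketch below is a minimal illustration, not MDASH's implementation: it covers only three of the stages named above (scan, validate, deduplicate), and every name in it (`Finding`, `run_harness`, `toy_model`, `noisy_model`, `in_shipped_code`) is hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable

@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    issue: str
    validated: bool = False

# Any callable mapping (path, source text) to candidate findings can act as the model.
Model = Callable[[str, str], list[Finding]]

def scan(model: Model, sources: dict[str, str]) -> list[Finding]:
    """Run the model over every source file and collect candidates."""
    return [f for path, text in sources.items() for f in model(path, text)]

def validate(findings: list[Finding], check: Callable[[Finding], bool]) -> list[Finding]:
    """Keep only findings that pass an independent check, and mark them validated."""
    return [replace(f, validated=True) for f in findings if check(f)]

def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Drop exact duplicates while preserving order."""
    return list(dict.fromkeys(findings))

def run_harness(model: Model, sources: dict[str, str],
                check: Callable[[Finding], bool]) -> list[Finding]:
    return deduplicate(validate(scan(model, sources), check))

def toy_model(path: str, text: str) -> list[Finding]:
    """Stand-in for a real model call: flags every strcpy( call."""
    return [Finding(path, n, "unbounded strcpy")
            for n, src in enumerate(text.splitlines(), start=1)
            if "strcpy(" in src]

def noisy_model(path: str, text: str) -> list[Finding]:
    """A second stand-in that reports every finding twice."""
    return toy_model(path, text) * 2

sources = {
    "src/a.c": "int main() {\n  strcpy(buf, s);\n}",
    "tests/t.c": "strcpy(x, y);",
}
in_shipped_code = lambda f: f.path.startswith("src/")  # hypothetical validation rule

results = run_harness(noisy_model, sources, in_shipped_code)
print(results)
```

Swapping `noisy_model` for `toy_model` changes nothing else in the pipeline, and the validated, deduplicated output is the same. That is the budget argument in miniature: the stages are the durable asset.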

Five shifts to make now

WWT's Global Cyber Security Practice has spent the past 55 days inside hundreds of customer engagements. The pattern is consistent. Five shifts separate the security programs that are absorbing this moment from the ones that are still arguing about it.

First, security teams need to evaluate harnesses, the operating systems around the models, alongside model selection. Build pipelines, plugins, and validation layers your team controls. The model under them will change every quarter.

Second, move from periodic scans to continuous validation. The Gartner Continuous Threat Exposure Management cycle — scope, discover, prioritize, validate, mobilize — is a daily practice, not a quarterly project. The new question is not 'does this vulnerability exist?' It is 'can it be used against us, right now?'

Third, treat patching as a fleet hygiene capability, not a scheduled process. Each protective layer you can stand up buys time for the next. You may not be able to do a clean source rebuild in 72 hours, but when a weakness is identified, apply whatever protections you can to buy time for that rebuild. The goal is to prevent the exploit window from collapsing.

Fourth, move prioritization from CVSS to the lens of exploitability combined with reachability. The vast majority of findings are unreachable, unexploitable, or already mitigated by existing controls. Raw CVSS prioritization in this environment is noise, not strategy. Score every finding by reachability, exploitability, compensating controls, and business impact.
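A scoring function along these lines can be sketched in a few lines. This is an illustration under stated assumptions, not a standard: the `Finding` fields, the 1-to-5 business impact scale, the zeroing of unreachable or unexploitable findings, and the 0.5 discount for compensating controls are all hypothetical choices you would tune to your own environment.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    finding_id: str
    cvss: float           # base score, 0.0 to 10.0
    reachable: bool       # is the vulnerable code path exposed in this environment?
    exploitable: bool     # has exploitability been validated?
    mitigated: bool       # does a compensating control already cover it?
    business_impact: int  # hypothetical scale: 1 (low) to 5 (business-critical)

def priority(f: Finding) -> float:
    """Hypothetical score: zero unless reachable and exploitable, halved when mitigated."""
    if not (f.reachable and f.exploitable):
        return 0.0
    score = f.cvss * f.business_impact
    return score * 0.5 if f.mitigated else score

# Made-up sample findings for illustration only.
findings = [
    Finding("FINDING-A", 9.8, reachable=False, exploitable=True, mitigated=False, business_impact=5),
    Finding("FINDING-B", 7.5, reachable=True, exploitable=True, mitigated=False, business_impact=4),
    Finding("FINDING-C", 8.8, reachable=True, exploitable=True, mitigated=True, business_impact=3),
]

ranked = sorted(findings, key=priority, reverse=True)
print([f.finding_id for f in ranked])  # ['FINDING-B', 'FINDING-C', 'FINDING-A']
```

Note that the finding with the highest CVSS score lands last because it is not reachable, which is exactly the reordering that raw CVSS sorting misses.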

Fifth, move human expertise upstream. AI creates more need for expert triage, not less. Hallucinations, drift, and inconsistent refusals all remain persistent issues. The human's targeted job-to-be-done is shifting from running the loop to tuning the harness, setting risk thresholds and authorizing actions. Humans authorize the consequential calls. Humans determine if automation or AI is best suited for the task at hand. AI executes and proves.
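The split between what AI executes and what humans authorize can be expressed as a simple gate. The sketch below assumes a hypothetical 1-to-5 risk scale and a hypothetical `approve` callback standing in for a human decision; neither comes from a specific product.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    description: str
    risk: int  # hypothetical scale: 1 (routine) to 5 (consequential)

def decide(action: Action, threshold: int, approve: Callable[[Action], bool]) -> str:
    """Automate below the risk threshold; at or above it, require human approval."""
    if action.risk < threshold:
        return "auto-executed"
    return "executed-with-approval" if approve(action) else "held"

routine = Action("apply virtual patch rule", risk=2)
consequential = Action("isolate production database", risk=5)

print(decide(routine, threshold=3, approve=lambda a: False))        # auto-executed
print(decide(consequential, threshold=3, approve=lambda a: False))  # held
print(decide(consequential, threshold=3, approve=lambda a: True))   # executed-with-approval
```

Setting `threshold` is itself one of the upstream human decisions described above: it is the risk threshold, expressed as a number the harness can enforce.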

A 90-day playbook

The first 30 days are about discipline. Get a defensible priority-zero triage framework into the SOC that replaces hours of guessing. Run a patch-harness gap assessment by layer. Establish asset and SBOM visibility baselines. Use this time to deploy known prevention mechanisms and operate with proof.

The next 30 days are about scaffolding. Pilot AI-augmented vulnerability management with the right pre- and post-processing layers. Bring reachability scoring into production.

The final 30 days are about scale. Embed continuous validation in release pipelines. Stand up closed-loop containment on tier-one assets. And shift the board conversation entirely: replace finding-volume metrics with remediation-velocity metrics that speak to risk, not activity.

Concretely, that means reporting on a different set of numbers. Instead of counting how many findings the scanners produced, track how fast and how completely you reduce real exposure:

Time to validate exploitability: How quickly a finding is confirmed as reachable and exploitable, not just present.
Time to containment: How fast a confirmed exposure is isolated or blocked.
Time to compensating control: How quickly an interim mitigation is in place when a full fix isn't yet possible.
Time to full remediation: How long until the underlying issue is resolved.
Remediation velocity: The throughput of validated fixes over time, not the size of the backlog.
% of critical assets with closed-loop coverage: The share of tier-one assets covered by validation-to-containment-to-remediation automation.
Exposure reduction: The net decrease in real, validated risk across the environment.
Exposure reduction by business-critical system: The same, broken out by the systems the business depends on.
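Several of these metrics fall out of timestamps you likely already collect. The sketch below assumes a hypothetical record per finding with one timestamp per lifecycle stage; the field names and the sample data are made up for illustration.

```python
from datetime import datetime
from statistics import median

# Made-up sample records: one per finding, one timestamp per lifecycle stage.
records = [
    {"found": datetime(2026, 6, 1, 8), "validated": datetime(2026, 6, 1, 12),
     "contained": datetime(2026, 6, 1, 20), "remediated": datetime(2026, 6, 4, 8)},
    {"found": datetime(2026, 6, 2, 9), "validated": datetime(2026, 6, 2, 11),
     "contained": datetime(2026, 6, 2, 15), "remediated": None},  # fix still pending
    {"found": datetime(2026, 6, 3, 10), "validated": datetime(2026, 6, 3, 16),
     "contained": datetime(2026, 6, 4, 10), "remediated": datetime(2026, 6, 8, 10)},
]

def median_hours(rows, start, end):
    """Median elapsed hours between two stages, skipping rows that have not reached `end`."""
    spans = [(r[end] - r[start]).total_seconds() / 3600 for r in rows if r.get(end)]
    return median(spans) if spans else None

time_to_validate = median_hours(records, "found", "validated")
time_to_containment = median_hours(records, "found", "contained")
time_to_remediation = median_hours(records, "found", "remediated")
remediated_count = sum(1 for r in records if r["remediated"])

print(time_to_validate, time_to_containment, time_to_remediation, remediated_count)
# 4.0 12.0 96.0 2
```

Dividing `remediated_count` by the length of the reporting window gives a simple remediation velocity. The same function broken out by asset tier would support the business-critical view.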

90-Day actions

Phase	Focus	Key activities
Days 1–30	Discipline	Establish priority-zero triage; Define exploitability / reachability scoring; Assess patch-harness gaps by layer; Establish asset and SBOM visibility baselines; Map compensating controls; Identify tier-one assets
Days 31–60	Scaffolding	Pilot AI-augmented vulnerability management; Deploy pre- and post-processing layers; Bring reachability scoring into production; Connect validation outputs to remediation workflows; Test compensating control playbooks
Days 61–90	Scale	Embed continuous validation in release pipelines; Stand up closed-loop containment for tier-one assets; Operationalize remediation velocity metrics; Brief the board on exposure reduction, not finding volume; Scale the operating model beyond one team or environment

Defending at the speed of AI

The capability is settled. The response is not.

For three decades, defenders built programs around the assumption that finding vulnerabilities was the hard part. That assumption is broken. The hard part now is everything between discovery and a working fix in production — validation, prioritization, triage, change management, patch management (distribution to deployment) and source rebuilds. The discipline is not new. The pace is.

World Wide Technology is built for this moment. From the Frontier AI Defense Accelerator that diagnoses where your program sits today, to the Advanced Technology Center where new architectures get proven before they ship, to Mythos-Ready service offerings across continuous exposure, secure AI systems, and detect-and-contain — we bring strategy, execution and proving ground all together.