Fifty-five days is a long time in cybersecurity

It has been long enough for Anthropic's Project Glasswing partners to surface vulnerabilities in widely used software. Long enough for OpenAI to announce its vision, and for Microsoft to land MDASH at the top of the CyberGym leaderboard with an 88.45% score on 1,507 real-world vulnerability tasks, along with 16 new Windows CVEs in a single Patch Tuesday. And long enough to make one thing painfully clear: only a small fraction of what these systems have found has actually been patched.

Anthropic states that across more than 1,000 open-source projects it found 1,094 confirmed high- or critical-severity vulnerabilities, but only 75 had been patched (65 with public advisories) at the time of its blog post.

That last statistic is the whole story. 
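To put that ratio in numbers, here is a minimal sketch that uses only the three figures quoted above. The variable names are illustrative.

```python
# Figures as quoted above from Anthropic's report.
confirmed = 1094      # confirmed high/critical vulnerabilities
patched = 75          # patched at the time of the blog post
with_advisory = 65    # patched and published with a public advisory

patched_share = patched / confirmed
advisory_share = with_advisory / confirmed
unpatched = confirmed - patched

print(f"patched: {patched_share:.1%}")
print(f"patched with public advisory: {advisory_share:.1%}")
print(f"still unpatched: {unpatched}")
```

Roughly one confirmed finding in fifteen had a fix at the time of reporting.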

For two months, the security industry has watched frontier AI find vulnerabilities at machine speed. That capability is shifting operating models from vulnerability management to defensive operations, yet the headlines have focused on the capability itself. Finding the bugs is no longer hard. Validating them is only partly solved: accuracy is high, but the volume is overwhelming. Fixing them is where defense actually happens, and that is where defenders are losing ground.

The 2026 Verizon Data Breach Investigations Report confirms: "Exploitation of vulnerabilities is now the most common initial access vector for breaches. It has risen to 31% in this year's reporting dataset, while credential abuse, the previous leader, is down to 13%. Only 26% of critical vulnerabilities (defined as being in the Cybersecurity and Infrastructure Security Agency Known Exploited Vulnerabilities (CISA KEV) catalog) were fully remediated by organizations in 2025, a drop from the previous year's 38%. The median time for full resolution went up to 43 days, almost two weeks more than the previous year's 32 days. In the median case, organizations had 50% more critical vulnerabilities to patch in this year's reporting dataset compared to the previous year."

The harness is the product 

The most important insight from the past 55 days is not about the models. It is about everything around them. 

Microsoft's MDASH did not top the CyberGym leaderboard because it had a new model. It got there because it orchestrates more than 100 specialized AI agents through a structured pipeline: preparing code, scanning, debating findings, validating, deduplicating, and producing proofs. Microsoft's own framing is that the durable advantage lies in the agentic system around the model rather than in any single model. This matters for budgeting. With every model release cycle the underlying capability changes, sometimes dramatically. The orchestration around it does not. Investments in scaffolding, pipelines, context, controls, prioritization, and remediation workflows compound over time. Investments in any single model do not.
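The idea that the harness outlives the model can be sketched in a few lines. What follows is a toy illustration, not Microsoft's implementation: the stage names echo the pipeline described above, but `Finding`, `run_harness`, `toy_model`, and `noisy_model` are all hypothetical, and the debate and proof stages are left out for brevity. The point is only that the model is a swappable argument while the validation and deduplication stages stay fixed.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    kind: str
    validated: bool = False

# In this sketch a "model" is any callable that maps (path, source) to findings.
Model = Callable[[str, str], list]

def scan(files: dict, model: Model) -> list:
    """Run the pluggable model over every file."""
    found = []
    for path, source in files.items():
        found.extend(model(path, source))
    return found

def validate(findings: list, check: Callable[[Finding], bool]) -> list:
    """Keep only findings that pass an independent check, and mark them validated."""
    return [Finding(f.file, f.line, f.kind, True) for f in findings if check(f)]

def deduplicate(findings: list) -> list:
    """Collapse findings that share the same location and kind, keeping order."""
    seen, unique = set(), []
    for f in findings:
        key = (f.file, f.line, f.kind)
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

def run_harness(files: dict, model: Model, check: Callable[[Finding], bool]) -> list:
    return deduplicate(validate(scan(files, model), check))

# Two stand-in "models": one clean, one that reports every hit twice.
def toy_model(path: str, source: str) -> list:
    return [Finding(path, n, "unsafe-strcpy")
            for n, text in enumerate(source.splitlines(), start=1)
            if "strcpy(" in text]

def noisy_model(path: str, source: str) -> list:
    return toy_model(path, source) * 2

files = {"a.c": "int main() {\n  strcpy(dst, src);\n}\n"}
accept_all = lambda f: True  # placeholder for a real validation step

results_a = run_harness(files, toy_model, accept_all)
results_b = run_harness(files, noisy_model, accept_all)
print(results_a == results_b)
```

Swapping `toy_model` for `noisy_model` changes nothing downstream, because the harness absorbs the difference. That is the compounding investment the paragraph above describes.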

Five shifts to make now 

WWT's Global Cyber Security Practice has spent the past 55 days inside hundreds of customer engagements. The pattern is consistent. Five shifts separate the security programs that are absorbing this moment from the ones that are still arguing about it. 

First, security teams need to evaluate harnesses, the operating systems around the models, alongside model selection. Build pipelines, plugins, and validation layers your team controls. The model under them will change every quarter.

Second, move from periodic scans to continuous validation. The Gartner Continuous Threat Exposure Management cycle — scope, discover, prioritize, validate, mobilize — is a daily practice, not a quarterly project. The new question is not 'does this vulnerability exist?' It is 'can it be used against us, right now?' 

Third, treat patching as a fleet hygiene capability, not a scheduled process. Each layer you can stand up buys time for the next. You may not be able to do a clean source rebuild in 72 hours, but when a weakness is identified, apply whatever protections you can to buy time for that rebuild. The goal is to prevent the exploit window from collapsing.

Fourth, move prioritization from CVSS alone to the lens of exploitability combined with reachability. The vast majority of findings are unreachable, unexploitable, or already mitigated by existing controls. Raw CVSS prioritization in this environment is noise, not strategy. Score every finding by reachability, exploitability, compensating controls, and business impact.
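One way to make that scoring concrete is sketched below. Everything in it is an assumption made for illustration: the `ScoredFinding` fields, the 1-to-5 business impact scale, and the multipliers are placeholders, not a standard or a WWT formula. The only idea it carries from the text is that reachability gates the score, and that exploitability, compensating controls, and business impact then adjust it.

```python
from dataclasses import dataclass

@dataclass
class ScoredFinding:
    finding_id: str             # placeholder identifiers, not real CVEs
    cvss: float                 # base score, 0.0 to 10.0
    reachable: bool             # can an attacker actually reach the vulnerable code?
    exploit_available: bool     # is a working exploit known or demonstrated?
    compensating_control: bool  # is an interim mitigation already in place?
    business_impact: int        # assumed scale: 1 (low) to 5 (critical)

def priority(f: ScoredFinding) -> float:
    """Return an illustrative 0..1 priority; the weights are assumptions."""
    if not f.reachable:
        return 0.0                                    # unreachable findings drop out
    score = f.cvss / 10.0
    score *= 1.0 if f.exploit_available else 0.4      # discount theoretical issues
    score *= 0.5 if f.compensating_control else 1.0   # credit existing mitigations
    score *= f.business_impact / 5.0                  # weight by what the business needs
    return round(score, 3)

findings = [
    ScoredFinding("EXAMPLE-1", 9.8, False, True, False, 5),
    ScoredFinding("EXAMPLE-2", 7.5, True, True, False, 5),
    ScoredFinding("EXAMPLE-3", 9.1, True, False, True, 3),
]

ranked = sorted(findings, key=priority, reverse=True)
for f in ranked:
    print(f.finding_id, f.cvss, priority(f))
```

In this toy data the finding with the highest CVSS score ranks last because it is unreachable, while a lower-scored but reachable and exploitable finding on a critical system ranks first.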

Fifth, move human expertise upstream. AI creates more need for expert triage, not less. Hallucinations, drift, and inconsistent refusals all remain persistent issues. The human's job is shifting from running the loop to tuning the harness, setting risk thresholds, and authorizing actions. Humans authorize the consequential calls. Humans determine whether automation or AI is best suited for the task at hand. AI executes and proves.

A 90-day playbook 

The first 30 days are about discipline. Get a defensible priority-zero triage framework into the SOC that replaces hours of guessing. Run a patch-harness gap assessment by layer. Establish asset and SBOM visibility baselines. Use this time to deploy known prevention mechanisms and operate with proof.

The next 30 days are about scaffolding. Pilot AI-augmented vulnerability management with the right pre- and post-processing layers. Bring reachability scoring into production. 

The final 30 days are about scale. Embed continuous validation in release pipelines. Stand up closed-loop containment on tier-one assets. And shift the board conversation entirely: replace finding-volume metrics with remediation-velocity metrics that speak to risk, not activity. 

Concretely, that means reporting on a different set of numbers. Instead of counting how many findings the scanners produced, track how fast and how completely you reduce real exposure: 

  • Time to validate exploitability: How quickly a finding is confirmed as reachable and exploitable, not just present.
  • Time to containment: How fast a confirmed exposure is isolated or blocked.
  • Time to compensating control: How quickly an interim mitigation is in place when a full fix isn't yet possible.
  • Time to full remediation: How long until the underlying issue is resolved.
  • Remediation velocity: The throughput of validated fixes over time, not the size of the backlog.
  • % of critical assets with closed-loop: The share of tier-one assets covered by validation-to-containment-to-remediation automation.
  • Exposure reduction: The net decrease in real, validated risk across the environment.
  • Exposure reduction by business-critical system: The same, broken out by the systems the business depends on.
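Several of these metrics fall out of four timestamps per exposure. The sketch below assumes a hypothetical `Exposure` record with `found`, `validated`, `contained`, and `remediated` fields; real ticketing or exposure-management systems will name and store these differently. The sample dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

@dataclass
class Exposure:
    found: datetime
    validated: Optional[datetime] = None   # confirmed reachable and exploitable
    contained: Optional[datetime] = None   # isolated or blocked
    remediated: Optional[datetime] = None  # underlying issue resolved

def median_hours(records, start: str, end: str) -> Optional[float]:
    """Median elapsed hours between two stages, skipping records missing either."""
    durations = [
        (getattr(r, end) - getattr(r, start)).total_seconds() / 3600
        for r in records
        if getattr(r, start) is not None and getattr(r, end) is not None
    ]
    return median(durations) if durations else None

def remediation_velocity(records, since: datetime, until: datetime) -> float:
    """Validated fixes completed per week within the reporting window."""
    fixed = sum(1 for r in records
                if r.remediated is not None and since <= r.remediated < until)
    weeks = (until - since) / timedelta(weeks=1)
    return fixed / weeks

t0 = datetime(2026, 1, 5)  # invented sample data
records = [
    Exposure(t0, t0 + timedelta(hours=4), t0 + timedelta(hours=10), t0 + timedelta(days=6)),
    Exposure(t0, t0 + timedelta(hours=8), t0 + timedelta(hours=20), t0 + timedelta(days=12)),
    Exposure(t0, t0 + timedelta(hours=6)),  # validated, not yet contained or fixed
]

time_to_validate = median_hours(records, "found", "validated")
time_to_contain = median_hours(records, "validated", "contained")
time_to_remediate = median_hours(records, "found", "remediated")
velocity = remediation_velocity(records, t0, t0 + timedelta(weeks=2))

print(time_to_validate, time_to_contain, time_to_remediate, velocity)
```

Measuring containment from the validation timestamp rather than from discovery is a design choice here; it matches the definition above of a "confirmed exposure" being isolated or blocked.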

90-day actions

Phase, focus, and key activities:

Days 1–30: Discipline
  • Establish priority-zero triage
  • Define exploitability / reachability scoring
  • Assess patch-harness gaps by layer
  • Establish asset and SBOM visibility baselines
  • Map compensating controls
  • Identify tier-one assets

Days 31–60: Scaffolding
  • Pilot AI-augmented vulnerability management
  • Deploy pre- and post-processing layers
  • Bring reachability scoring into production
  • Connect validation outputs to remediation workflows
  • Test compensating control playbooks

Days 61–90: Scale
  • Embed continuous validation in release pipelines
  • Stand up closed-loop containment for tier-one assets
  • Operationalize remediation velocity metrics
  • Brief the board on exposure reduction, not finding volume
  • Scale the operating model beyond one team or environment

Defending at the speed of AI 

The capability is settled. The response is not. 

For three decades, defenders built programs around the assumption that finding vulnerabilities was the hard part. That assumption is broken. The hard part now is everything between discovery and a working fix in production — validation, prioritization, triage, change management, patch management (distribution to deployment) and source rebuilds. The discipline is not new. The pace is. 

World Wide Technology is built for this moment. From the Frontier AI Defense Accelerator that diagnoses where your program sits today, to the Advanced Technology Center where new architectures get proven before they ship, to Mythos-Ready service offerings across continuous exposure, secure AI systems, and detect-and-contain, we bring strategy, execution, and a proving ground together.