ARGUS: A Framework for Autonomous Blue Team Operations
The Speed Problem
There is a lot of noise right now around the agentic cybersecurity threat. Most of what you have read about AI-powered threats over the past six months has been accurate in its alarm and imprecise in its prescription. The threat is real. The response frameworks being offered are mostly not.
Before I get into ARGUS, I want to name the full problem space, because I think a lot of people are only seeing one piece of it. The defensive response to agentic threats has three lanes.
The first is supply chain and software intake hygiene - draconian scanning of every library, dependency, and application before it touches production. That is a global responsibility, not a security team function, and it is outside the scope of this paper (GLASSWING (https://www.anthropic.com/glasswing) was specifically designed for this purpose).
The second is proactive cyber: infrastructure hardening, zero trust architecture, segmentation - the foundational controls - plus continuous exposure discovery, AI-driven attack path mapping, continuous red teaming, and exploitability prioritization running at the speed of attacker reconnaissance (classically known as vulnerability management, then CTEM (thanks Gartner), the next iteration: AICTEM?).
The third is reactive cyber. Incident response. That is what this paper is about.
At WWT, these lanes find structure inside ARMOR. ARGUS is the operational implementation of ARMOR's Secure AI Operations domain. The third lane. It assumes something got through. The name is intentional. In Greek mythology, Argus Panoptes was the hundred-eyed giant whose job was to watch everything and never sleep. That is the operational model. It responds at machine speed. And it does so within a governance framework that makes autonomous action trustworthy rather than reckless.
The agentic threat is not a better phishing email. It is not smarter malware. It is a fundamental compression of the attack timeline. Discovery, exploitation, lateral movement, credential harvesting, data staging - a kill chain that used to take a sophisticated adversary days or weeks now executes in minutes. Not because the techniques changed. Because the orchestration did.
Here is the observation I keep making that almost nobody else seems to be: vulnerable does not mean exploitable. Everyone is panicking about discovery speed, and they are not wrong in doing so. But the real shift is what happens after discovery. Agentic offensive systems do not just find exposures. They triage them, score them against active exploitation data, identify viable attack paths, chain attacks together, and execute without a human analyst in the loop. The gap is not in the finding. The gap is in what happens next, and how fast it happens.
Human-paced incident response is structurally defeated by machine-speed offense. Not weakened. Not disadvantaged. Structurally defeated. A SOC running ticket-based workflows and sequential investigation steps cannot operate at the velocity this threat demands. That is not a criticism of the analysts. It is a physics problem. The answer is not to work faster. It is to build systems that operate at the right speed and know when to hand control to a human.
Why Existing Approaches Don't Solve This
SOAR has been the industry's answer to the speed problem for the better part of a decade. And look, it helped. I have built enough SOAR playbooks to know it moved the needle. But SOAR as it exists in most organizations today is a glorified ticketing system with API calls. It surfaces information to a human. The human decides. The human acts. Against a machine-speed adversary, that human is the bottleneck. Every approval step, every context-gathering exercise burns seconds and minutes an agentic attacker does not need.
Every major security vendor is now selling something they call an autonomous SOC. I have sat through a lot of these demos in the past year, and I will say this: they are impressive. I do not want to take that away from them. But impressive in a demo and autonomous in production are two very different things. What most of these products actually do is AI-assisted investigation for human analysts. The machine does the legwork. The human makes the call. That is a meaningful improvement over where we were three years ago. It is not autonomous defense.
The distinction matters enormously. AI-assisted means a human approved the containment action. Autonomous means the system fired containment and handed a human the outcome. Those are different products with different risk profiles and different governance requirements. The market has not been precise about this, and honestly, I get why - the line is commercially inconvenient to draw. But if you are evaluating platforms right now, that distinction is the question you need answered before anything else.
The Design Philosophy: Jidoka
In 1924, Sakichi Toyoda invented a loom that stopped automatically when a thread broke. Not because stopping was the goal, but because letting a broken thread produce defective fabric was worse than pausing production. Toyota built an entire manufacturing philosophy on that principle. They called it Jidoka. Automation with a human touch. The machine handles detection and containment. Humans handle investigation and remediation. Taiichi Ohno added the Andon cord -- any worker could stop the line at any moment. Not just management. Any worker. Pulling the cord was not a failure. It was the system working as intended. Failing to pull it when something was wrong was the actual problem.
ARGUS is Jidoka applied to enterprise security operations. Autonomous containment at machine speed, with a mandatory human handoff when the threshold is exceeded. The Andon cord is an architectural requirement, not a safety feature you bolt on afterward.
Two historical failure modes define why the precondition architecture matters.
The first is cascade without circuit breakers. On May 6, 2010, the US stock market lost roughly a trillion dollars in value in 36 minutes. Automated trading algorithms reading the same signals started selling. Other automated systems saw the sell-off and sold as well. The feedback loop compressed faster than any human could intervene. The market stabilized only when the Chicago Mercantile Exchange's (CME) Stop Logic Functionality halted E-mini futures trading for five seconds - long enough to break the feedback loop, allow prices to reset, and let humans verify the data was real. Each individual system was doing exactly what it was designed to do. The problem was systemic. ARGUS firing containment actions across six domains simultaneously on a bad detection could look identical. The circuit breaker is not optional.
The second is autonomous action on false telemetry. In 1979, a relief valve at Three Mile Island stuck open. Automated systems reported it had closed. It had not. Every decision the operators made for hours was based on a monitoring layer reporting a false state. When your response system is acting on telemetry that misrepresents actual system state, the sophistication of the response logic is irrelevant. You are building a precise machine on a broken foundation.
Further reading: for the Flash Crash, the SEC/CFTC joint report "Findings Regarding the Market Events of May 6, 2010". For Three Mile Island, the Kemeny Commission Report remains the definitive account.
The Four Precondition Layers
Before a single autonomous action fires, four precondition layers must exist and be verified. These are not implementation details. They are the architectural foundation. Skip them and you have not built ARGUS. You have built a new failure mode.
Layer 1: Honest System State.
Every autonomous action requires independent outcome verification via a separate telemetry source. Not command acknowledgment. Confirmed result. EDR isolation confirmed via heartbeat loss. Token revocation confirmed via auth log. NAC block confirmed via traffic absence. The indicator that shows policy applied is not the same as the indicator that shows threat contained. Your autonomous system has to know the difference.
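A minimal sketch of what Layer 1 means in code. The helper names (`edr_ack_isolation`, `heartbeat_lost`) are hypothetical stand-ins for whatever command channel and independent telemetry source an environment actually has; the point is the three honest states, not the specific APIs.

```python
def edr_ack_isolation(host: str) -> bool:
    """Command acknowledgment: the EDR says it applied the policy.
    Stubbed here; an ack alone proves nothing about actual state."""
    return True

def heartbeat_lost(host: str, window_s: int = 30) -> bool:
    """Independent confirmation via a SEPARATE telemetry source:
    the host stopped beaconing. Stubbed for illustration."""
    return True

def contain_endpoint(host: str) -> str:
    """Isolate a host and report one of three honest states."""
    if not edr_ack_isolation(host):
        return "command_failed"
    # The ack is NOT the result. Check the independent source.
    if heartbeat_lost(host):
        return "contained_verified"
    # Policy applied but traffic still flowing: escalate,
    # do not report success.
    return "ack_without_verification"
```

The state worth noticing is the third one: the system explicitly distinguishes "the tool said yes" from "the threat is verifiably contained."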
Layer 2: Asset Truth.
Security practitioners have been fighting with CMDBs for decades, and I have never seen an asset inventory that comes close to perfect. It is always messy. Look at the CIS Controls - asset inventory has been a top priority since the beginning, and it remains one of the hardest problems to actually solve. ARMOR's Infrastructure Security domain calls it out the same way - device profiling is listed as a foundational control for exactly this reason.
Most enterprise CMDBs are political documents masquerading as technical ones. The people creating assets have no personal consequence for not documenting them correctly. You are not going to fix that with a project. So stop trying to reinvent the wheel and engineer around it instead. Use your EASM tooling, your agent-based telemetry, and the parts of the CMDB that are working and build the criticality mesh from there. Accept conservative defaults for anything unknown. The CMDB problem does not get solved. It gets architected for.
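The criticality mesh logic is simple enough to sketch. This assumes each source is a lookup from asset ID to a criticality tier; the trust ordering and the tier names are illustrative, not prescriptive.

```python
def asset_criticality(asset_id: str, *sources: dict) -> str:
    """Walk sources in trust order (e.g. agent telemetry first,
    then EASM, then the parts of the CMDB that actually work).
    First source that knows the asset wins."""
    for source in sources:
        if asset_id in source:
            return source[asset_id]
    # Conservative default: an unknown asset is treated as critical,
    # which gates autonomous action rather than enabling it.
    return "critical"
```

The key design choice is the last line: unknowns fail toward caution, so gaps in the inventory restrict autonomy instead of silently widening it.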
Layer 3: The Andon Cord.
An independent watchdog tier monitors ARGUS action patterns on a separate telemetry feed and halts the full response chain if the action pattern itself looks anomalous. The watchdog and the action tier cannot share the same data source -- that recreates the Three Mile Island problem at enterprise scale. Any analyst can stop the line. This is not a configurable option. It is a design requirement baked into the architecture from day one. It is also a GRC control, not just an architectural one - autonomous action authority without documented governance and a defined human override mechanism is a liability, not a capability. Your CISO and your legal team will want to see this on paper before anything fires autonomously.
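One way to sketch the watchdog's circuit-breaker behavior, under the assumption that an anomalous action pattern can be approximated by action rate within a sliding window. The thresholds are placeholders; a production watchdog would also consume its own independent telemetry feed, which this sketch does not model.

```python
from collections import deque

class Watchdog:
    """Independent circuit breaker over the autonomous action stream.
    Trips on an anomalous action rate or on a manual Andon pull."""

    def __init__(self, max_actions: int, window_s: float):
        self.max_actions = max_actions
        self.window_s = window_s
        self.actions: deque = deque()
        self.tripped = False

    def pull_andon(self) -> None:
        """Any analyst can stop the line. Not just management."""
        self.tripped = True

    def permit(self, now: float) -> bool:
        """Called before every autonomous action fires."""
        if self.tripped:
            return False
        # Drop actions outside the sliding window.
        while self.actions and now - self.actions[0] > self.window_s:
            self.actions.popleft()
        if len(self.actions) >= self.max_actions:
            # Cascade pattern: halt the whole chain, not one action.
            self.tripped = True
            return False
        self.actions.append(now)
        return True
```

Note that the breaker latches: once tripped, nothing fires until a human resets it. That is the Flash Crash lesson encoded as a default.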
Layer 4: Deception Infrastructure.
This is the ideal state precondition and the one most organizations have not designed for - and honestly, the one I am most excited about in this context. Honeytokens, deceptive credentials, honeypot endpoints placed deliberately across the environment serve two functions in ARGUS. First, they degrade the decision quality of agentic attackers operating at machine speed. An offensive system acting on poisoned reconnaissance does not just slow down. It makes bad decisions confidently. That is qualitatively different from delay - you are not buying time; you are corrupting the attacker's intelligence picture. Second, deception interaction is one of the cleanest high-confidence signals that exists. Nothing legitimate ever touches a honeytoken. A triggered deception asset maps directly into ARGUS confidence scoring and can justify elevated autonomous action thresholds on its own.
The goal is economic infeasibility. Make operating in your environment expensive enough that the cost-benefit calculus shifts. Deception is typically a Phase 2 or Phase 3 capability in sequencing, but it needs to be designed for - from day one. Retrofitting it is significantly harder than building toward it. Within ARMOR's Secure AI Operations domain, deception infrastructure remains one of the most underutilized capabilities available to defenders. That needs to change.
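How a deception hit feeds confidence scoring can be stated in a few lines. The 0.95 floor is an illustrative number, not a recommendation; the structural point is that a honeytoken trigger can raise confidence past the autonomous threshold on its own rather than being averaged in with noisier signals.

```python
def adjusted_confidence(base: float, deception_hit: bool) -> float:
    """Honeytoken interaction is a near-zero-false-positive signal:
    nothing legitimate ever touches one. A hit therefore sets a
    confidence floor instead of nudging the score incrementally."""
    return max(base, 0.95) if deception_hit else base
```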
Operational Architecture
ARGUS operates on a four-step Jidoka model. Steps one and two are automated. Steps three and four are human.
Detect. Behavioral tripwire fires. Cross-domain signal correlation runs in parallel. Confidence score calculated against asset criticality, blast radius profile, and deception signal inputs. When detection fires, an async parallel enrichment engine fans out simultaneous API queries to every relevant source: EDR for endpoint context, identity provider for authentication history, CASB for recent file activity, network telemetry for connection patterns, threat intelligence for known-bad associations. Five seconds is a reasonable time budget. Take whatever has returned. Reason over incomplete information explicitly flagged as incomplete -- if network telemetry timed out, the confidence score adjusts downward and the action tier changes accordingly. Incomplete telemetry is an input to the decision, not a failure state to suppress.
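The fan-out-with-a-budget pattern above can be sketched with `asyncio`. The source names and delays are invented for illustration; the behavior that matters is taking whatever returned inside the budget and explicitly flagging the rest as missing rather than waiting or failing.

```python
import asyncio

async def query(source: str, delay: float) -> dict:
    """Stand-in for one enrichment API call (EDR, IdP, CASB, ...)."""
    await asyncio.sleep(delay)
    return {"source": source, "data": "..."}

async def enrich(budget_s: float = 5.0) -> tuple[dict, list[str]]:
    """Fan out every query simultaneously, harvest whatever has
    returned when the time budget expires."""
    sources = {"edr": 0.1, "idp": 0.2, "casb": 0.1, "netflow": 10.0}
    tasks = {name: asyncio.create_task(query(name, d))
             for name, d in sources.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    for t in pending:
        t.cancel()  # do not wait for stragglers
    results = {r["source"]: r for r in (t.result() for t in done)}
    # Missing sources are an input to the confidence score,
    # not a suppressed failure.
    missing = [name for name in sources if name not in results]
    return results, missing
```

Downstream, `missing` drives the confidence adjustment: a timed-out network feed lowers the score, which in turn lowers the autonomous action tier.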
Contain. Tiered autonomous action fires across six control plane domains. Scalpel before sledgehammer. Evidence captured before any destructive action. Outcome verified independently.
- Endpoint: process kill, file quarantine, memory dump, network segmentation, full isolation -- in that order. Full isolation is the sledgehammer. Exhaust the scalpels first.
- Network: DNS sinkhole for C2 cutoff, NAC block, VLAN shunt, BGP null route, forensic capture. Fastest lever, most reversible.
- Identity: step-up MFA, session kill, token revocation, account lockout, privileged access revocation. Most surgical lever. Most underutilized. Killing a session is less disruptive than isolating an endpoint and often equally effective. I cannot stress this one enough.
- Cloud: API key suspension, IAM policy quarantine, instance isolation, snapshot before any destructive action, key rotation. The snapshot requirement is non-negotiable. Autonomous cloud containment without evidence preservation is forensically destructive.
- Application: session invalidation, WAF rule injection, feature flag kill, rate limit enforcement. Least mature domain. Application layer actions frequently have no reliable confirmation signal, which means independent verification is hardest to satisfy here. This one keeps me up at night a little.
- Threat Intelligence and Feedback Loop: IOC propagation, detection threshold updates, playbook revision, pattern learning. Every engagement makes the system smarter. Autonomous action without a feedback loop is a static playbook with extra steps.
Investigate. Human handoff package delivered. Not just an alert. Confidence score, actions taken, outcomes confirmed, blast radius status, recommended next steps. Full context for a human who was not watching the event unfold.
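The handoff package is concrete enough to type out. The field names here are illustrative, but the structure follows the list above: everything a human who was not watching needs to pick the incident up cold.

```python
from dataclasses import dataclass

@dataclass
class HandoffPackage:
    """Delivered at the Investigate step. Not an alert - full context."""
    incident_id: str
    confidence: float          # final score at time of handoff
    actions_taken: list        # levers fired, in order
    outcomes_verified: dict    # lever -> independently confirmed?
    blast_radius: str          # e.g. "contained" / "spreading" / "unknown"
    next_steps: list           # recommended, not prescriptive
```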
Remediate. Human validates, eradicates, releases the environment. Threat intelligence feedback loop updates detection thresholds and playbooks.
Confidence Thresholds
Autonomous action is not binary. The single biggest mistake I see organizations make when thinking about this architecture is treating detection as a simple trigger -- threat detected equals action fires. That model produces self-inflicted outages. I have seen it. Blast radius and asset criticality must govern what fires autonomously and what requires human approval. A matrix of confidence band against asset criticality is the starting point. Every organization will tune those thresholds based on its environment and its risk tolerance.
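An illustrative starting-point matrix, sketched as a lookup. Every band name, tier name, and mode in this table is a placeholder to be tuned per environment; the structural point is that unknown combinations fall through to the conservative path.

```python
# Illustrative policy: (confidence band, asset criticality) -> action mode.
POLICY = {
    ("high",   "low"):      "full_autonomous",
    ("high",   "critical"): "contain_then_page",   # act, human in loop fast
    ("medium", "low"):      "contain_reversible",  # reversible levers only
    ("medium", "critical"): "human_approval",
    ("low",    "low"):      "enrich_and_watch",
    ("low",    "critical"): "human_approval",
}

def action_mode(confidence_band: str, criticality: str) -> str:
    # Any combination not explicitly granted autonomy defaults
    # to requiring a human.
    return POLICY.get((confidence_band, criticality), "human_approval")
```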
Tactical Reality: What This Actually Requires
Buckle in.
The integration problem is brutal. Every action domain requires API access to a different system, owned by a different team, running a different vendor, with a different API maturity level. Some of those APIs are excellent. Some are garbage. And some have rate limits that will absolutely choke you during an actual incident when you need them most -- which I promise you will find out at the worst possible time. The concept is clean. The plumbing is an engineering project that will take over a year before a single autonomous action fires.
The org chart will fight you. ARGUS requires the SOC to have authority to fire containment actions against assets owned by IT, cloud, network, identity, and application teams. In most enterprises, those teams do not report to the CISO. I want to be direct about this: cross-domain autonomous containment authority is a political problem before it is a technical one. You need executive alignment at the CIO and CISO level before this operates at anything beyond lab scale. No amount of good architecture fixes a bad org chart.
The phased path.
Now: Executive sponsorship. Asset criticality mapping even at spreadsheet fidelity. Detection review specifically for speed. Tabletop an agentic attack scenario against your current IR process. Find where it breaks before the attacker does.
Phase 1: Identity response. Most mature API surface, lowest blast radius, fastest time to value. Step-up MFA and session kill on high-confidence behavioral detections. Achievable in months, not years.
Phase 2: Endpoint containment for non-critical assets plus deception infrastructure design. The technology exists. The policy framework is what needs to be built.
Phase 3: Full six-domain control plane. Elevated thresholds. Mandatory human approval for critical asset classes. Independent verification across all domains.
The vendor landscape in three years. CrowdStrike, SentinelOne, Palo Alto, and Splunk will all ship something they call autonomous SOC capability. The marketing will be loud - louder than it already is, which is saying something. The capability will be real but narrow - probably endpoint plus identity in high-confidence scenarios for customers who have done the integration work.
None of them will ship the governance layer. None of them will solve the cross-domain authority problem for you. And none of them will walk into your environment and tell you that their autonomous action tier needs an independent watchdog monitoring it on a separate telemetry feed. That conversation is not in their sales deck.
ARGUS is not a product. It is an architecture decision and a governance framework. The vendors are building the engine. This paper is the roadworthy certification the engine has to pass before you put it on a public highway. That distinction matters. And the window to establish it closes in roughly three years.
Closing Thought
Sakichi Toyoda built a loom that stopped when something went wrong. Not because stopping was the point. Because letting defects propagate downstream was worse than pausing. Every operator could pull the cord. The system got better from every failure.
The agentic threat era needs that same philosophy applied to enterprise security operations. Machines that detect and contain at machine speed. Humans who investigate and fix with full context. A cord that any operator can pull to stop the system. A system that learns from every engagement. And deception infrastructure that makes operating in your environment expensive enough to not be worth it.
That is ARGUS. That is the work.
One last thing. ARGUS does not exist in a vacuum. At WWT, the broader AI security conversation lives inside ARMOR - our AI Readiness Model for Operational Resilience. ARGUS is the tactical implementation of ARMOR's Secure AI Operations domain. If you are running an ARMOR assessment and asking what Secure AI Operations looks like when you actually build it, this is your answer. If you are not familiar with ARMOR yet, you can start here.