RockCyber Musings

Five Eyes Agentic AI Guidance: Architecture, Not a Checklist

Rock Lambros — Tue, 12 May 2026 12:50:53 GMT

On May 1, 2026, six allied cyber agencies dropped 30 pages on agentic AI security, and the industry promptly reached for its highlighters. Twenty-three risks and more than a hundred best practices. The initial reflex is to map them to existing controls and call it a project plan.

WRONG!

CISA, NSA, ASD, NCSC-UK, NCSC-NZ, and the Cyber Centre published an architecture brief disguised as a guidance document. Read it that way, and the work changes.

The Misreading That’s Happening

Pick any board deck circulating right now, and I’ll bet the Five Eyes guidance shows up as a row in a control matrix (if at all). Privilege controls: check. Identity management: check. Logging: check. Someone in the room nods, the GRC team gets a tracking spreadsheet, and the agentic AI rollout continues at the same pace as before May 1.

That’s the failure mode. The document contains 23 distinct risks and over 100 individual best practices to address them. You don’t bolt 100 practices onto an existing platform without changing its shape...its architecture. Treating a system-level prescription as line-item compliance is how you end up with the audit-passes-but-the-thing-is-still-broken” pattern that plagues us to this day.

Read the document carefully, and the architectural intent is everywhere. Identity binds to privilege. Privilege binds to tool access. Tool access binds to logging. Logging binds to accountability. Each control assumes the others exist. Each one fails when built alone. The agencies named this directly when they recommended system-theoretic approaches like STPA and STPA-Sec, calling out that traditional component-level analysis is insufficient because risks emerge from interactions between components rather than isolated flaws.

That single paragraph is the operational thesis. The rest of the document describes how to build for it. A senior security practitioner, reading carefully, will recognize a familiar pattern, and this is what happens when policy folks finally accept you don’t write a check-box for emergent risk.

The question now is what production systems look like when somebody actually does the work. AAGATE is one answer, and we released it last November.

What the Document Actually Says

Strip the fluff, and the document organizes around five risk categories:

Privilege risk
Design and configuration flaws
Behavioral risk
Structural risk
Accountability risk

The categories aren’t mutually exclusive. They’re stacked dependencies.

Privilege risk is the foundation. The procurement-agent scenario in the guidance is a classic confused-deputy attack. An over-permissioned agent gets compromised through a low-risk tool, the attacker inherits the agent’s privileges, and modified contracts and approved payments slip past audit logs that look legitimate.

Design and configuration risk sits atop privilege. Static permission checks at startup don’t survive dynamic workflows. Allow lists go stale. Boundaries between agent enclaves erode under operational pressure. Behavioral risk piles onto that. Goal misalignment, specification gaming, deceptive behavior, and emergent capabilities all assume the agent has already been granted enough autonomy to act in surprising ways.

Structural risk is where it gets interesting. The agencies describe cascading failures across orchestration layers, tool integrations, third-party components, agent-to-agent communication, and shared data stores. A single rogue agent in a multi-agent system corrupts consensus, spreads incorrect information, alters logs, and propagates malicious plans peer-to-peer. None of this is fixable at the agent level alone.

Accountability risk closes the loop. Decisions made through long reasoning chains, stochastic outputs, and emergent multi-agent interactions are difficult to audit, attribute, or reproduce. The agencies reach for cryptographic identity, comprehensive artifact logging, and unified audit logs across inter-agent interactions. They’re describing a system property, not a feature you purchase.

AAGATE Maps the Architecture to NIST AI RMF

Figure 1: Five Eyes risk categories mapped to NIST AI RMF and AAGATE modules

AAGATE is a Kubernetes-native control plane built to operationalize the NIST AI Risk Management Framework against agentic AI systems. The paper, which I co-authored with Ken Huang, Hammad Atta, and a research team, was published to arXiv in late 2025. It picks NIST AI RMF as the spine because the RMF’s four functions, Govern, Map, Measure, and Manage, are general enough to absorb the Five Eyes prescriptions without forcing translation. The novelty isn’t the alignment to RMF. The novelty is the prescriptive toolchain: MAESTRO for Map, OWASP AIVSS plus SEI SSVC for Measure, the CSA Agentic AI Red Teaming Guide for Manage, and a zero-trust service mesh anchoring Govern.

What follows is the mapping of the Five Eyes document points at without naming. Five control areas. Each one shows what the architecture looks like when you stop treating the guidance as a checklist.

1. Identity-Anchored Privilege (Govern + Map)

The Five Eyes document spends real ink on this. It tells developers to construct each agent as a distinct principal with its own cryptographically anchored identity and unique keys or certificates, to authenticate every inter-agent and agent-to-service API call with mutual TLS, and to maintain a trusted registry that’s reconciled against the live set of agents. It tells operators to use just-in-time credentials, cryptographic attestation, and a centralized policy decision point that runs at every request.

Those aren’t five different controls. They’re one architecture.

AAGATE’s Agent Naming Service builds it. ANS works like DNS for agents. When a new agent starts, it registers its Decentralized Identifier and capabilities, and the service issues a Verifiable Credential along with an Istio SPIFFE certificate that binds the pod’s identity to its cryptographic DID. Other agents resolve through the registry. Anything not in the registry gets denied. Istio mTLS authenticates every pod-to-pod call with X.509 certificates. The OAuth Relay translates abstract agent capabilities into ephemeral, narrowly-scoped credentials for each side-effect, which is the only practical way to do least-privilege when traditional user-centric consent models break down.

Try doing any one of those pieces without the others and the system collapses. A registry without mTLS is unauthenticated. mTLS without ephemeral credentials still leaks long-lived tokens. Ephemeral credentials without a registry have no verification path at issuance. The Five Eyes guidance lists these as separate best practices. AAGATE shows why they’re one control.

This is also why CISOs aren’t the only audience for this work. Identity engineers, IAM architects, platform teams, and product leaders need to read it. The org chart that ships agentic AI safely is wider than the security team’s mailing list.

2. The Single Chokepoint for Side-Effects (Map)

Figure 2: ool-Gateway as the single chokepoint pattern

The guidance recommends restricting tool use to an approved allow list of tools and versions, logging agent tool usage so results are captured in system logs in human-readable format, and establishing trigger-action protocols that automatically restrict agent permissions when unexpected behavior emerges. It tells you to harmonize the controls with the NIST Zero Trust Architecture.

The architectural pattern that satisfies all three is one many readers will resist: a single chokepoint. Every external HTTP, database, or file operation funnels through one gateway. The mesh denies egress everywhere else. Bypass attempts get quarantined.

AAGATE’s Tool-Gateway is exactly that. Agents publish requests to a Kafka topic. The Tool-Gateway consumes them, checks policies (allow lists, rate limits, scope), executes the action if permitted, and logs the request and response with a cryptographic hash to an immutable audit log. The Governing-Orchestrator Agent watches for bypass attempts and quarantines the offender on detection. OAuth2 token exchange happens at the Gateway, with refresh tokens never leaving its memory vault.

A single chokepoint smells like a bottleneck to anyone raised on horizontally scaled architectures. That intuition is wrong here. The Tool-Gateway is the place where agentic AI security becomes tractable. Threat modeling becomes tractable because you have one comprehensive map of every system interaction. Audit becomes tractable because logs are unified. Policy enforcement becomes tractable because the decision point is centralized. Eliminate the chokepoint and you’ve spread the same controls across hundreds of agent-tool integrations, none of which will be enforced consistently. Distributed systems engineers will object. Reality wins.

3. Continuous Measurement (Measure)

The Five Eyes document repeats itself when it talks about monitoring. Use multiple independent monitoring systems that cross-validate. Monitor agent operations, including internal processes, not the inputs and outputs alone. Watch for goal drift by comparing active objectives against approved baseline specifications. Establish anomaly detection that flags discrepancies between stated intentions and observed behavior. Implement runtime monitoring with rules or behavioral baselines.

Reading those passages, I count at least six distinct signal types the document expects you to collect, score, and respond to in real time. Treating them as line items is how you end up with six dashboards nobody watches.

AAGATE compresses them into one pipeline. The UEBA Behavior Profiler builds per-agent behavioral fingerprints with anomaly scores, using Isolation Forest and Markov chains. The ComplianceAgent streams events from the Tool-Gateway, applies Rego policy checks plus LLM-driven reasoning, and generates security signals scored using OWASP’s AI Vulnerability Scoring System. The Governing-Orchestrator Agent consumes the scored incidents through an SSVC-inspired decision tree to choose a proportionate response. QSAF monitors run alongside, watching for cognitive degradation: recursion loops, memory starvation, context flooding, the kinds of failure modes invisible to conventional anomaly detection.

The pattern is Observe, Score with AIVSS, Decide with SSVC, Act, then Learn. None of those steps are optional. Skip the scoring, and you have no priority order. Skip the decision tree, and you have no machine-speed response. Skip the cognitive monitors, and you miss a class of failures the document explicitly names.

Most enterprises run anomaly detection bolted onto SIEM. That doesn’t work for agents. The signals are different, the response cadence is different, and the decision logic has to be co-located with the policy decision point, not 20 minutes downstream.

4. Adversarial Defense and the Kill Switch (Manage)

The guidance tells operators to conduct red teaming exercises, deploy agent simulation tests including multi-agent red teaming or chaos testing, embed agentic AI systems with fail-safe defaults and containment mechanisms that limit blast radius, and implement runtime monitoring and anomaly detection to identify unusual patterns and trigger alerts or pauses.

Periodic red teaming doesn’t keep up with agents acting at machine speed. Manual chaos exercises don’t catch a goal-drift event occurring in the 17 seconds between human review cycles. The guidance is asking for continuous adversarial pressure with millisecond response time. That’s a runtime architecture, not a quarterly engagement.

AAGATE answers it with the Janus Shadow-Monitor-Agent. Janus runs in a different container with a different model, re-evaluating every high-risk agent’s planned actions before execution. Goal manipulation, hallucination exploitation, and drift from baseline specifications get caught pre-execution rather than post-mortem. When Janus raises a CRITICAL alert, the Governing-Orchestrator Agent injects an Istio AuthorizationPolicy that cuts all egress. AAGATE calls it the millisecond kill switch because that’s the time window the system operates in.

The pairing matters. A continuous internal red team without a kill switch is as useless as a bucket with a gaping hole in the bottom. A kill switch without continuous red teaming has nothing to act on. Five Eyes named both controls separately. AAGATE shows why they’re the same control.

This is also where the OT crowd should pay attention. The guidance recommends defense-in-depth and continuous evaluation. In OT contexts, that translates directly to “you don’t roll back a physical actuator.” Containment has to happen before the action, not after.

5. Tamper-Evident Accountability (Govern)

The accountability section of the guidance is the hardest one. The agencies want comprehensive artifact logging, unified audit logs for inter-agent interactions, interpretability tools that surface reasoning, and information referencing that shows where outputs originated. They’re describing what the EU AI Act Article 12 calls automatic recording of events, plus what auditors call evidence of effective control operation. If and when the EU AI Act actually ever goes into effect is another conversation altogether…

Conventional logging breaks down here. Long reasoning chains generate massive logs that are repetitive and loosely structured. The Five Eyes document is blunt: traditional logs make it even more challenging to extract meaningful signals. Accountability fails not because the data isn’t recorded, but because nobody proves it wasn’t tampered with after the fact.

AAGATE’s answer combines three patterns. Cryptographic hashes on every Tool-Gateway request and response give you tamper-evidence at the unit level. The optional ETHOS ledger integration mirrors agent registrations and material governance events to a public smart contract, creating a tamper-proof record of agent identity and status. The ZK-Prover service hashes logs hourly and posts Groth16 zero-knowledge proofs on-chain, showing that incidents stayed within the contract-tier budget, giving you privacy-preserving compliance assurance without exposing operational data.

Argue with the on-chain pieces if you want. They’re optional in single-tenant deployments, and the AAGATE paper says so explicitly. The cryptographic hashing isn’t optional. If your accountability model doesn’t prove logs weren’t altered after the fact, you don’t have accountability. You have hope.

What This Means Going Forward

The Five Eyes document changes the burden of proof. Boards, regulators, and acquirers now have a coordinated multi-government statement naming architecture-level controls as the floor, not the ceiling. “Until security practices, evaluation methods and standards mature, organisations should assume that agentic AI systems may behave unexpectedly.” That sentence will undoubtedly show up in due diligence questionnaires.

If you’re operating agentic AI today, you have two choices.

Option one: take the line-item path, map controls to a tracking spreadsheet, and ship 100 separate workstreams that someone else’s auditor will pull apart in 18 months.
Option two: read the guidance as an architectural prescription, pick a reference build like AAGATE, and treat your agentic security work as a platform engineering problem rather than a compliance problem.

I know which one I’d present to a board.

Key Takeaway: The Five Eyes guidance describes a system property, not a checklist, and compliance follows from architecture rather than the other way around. AAGATE provides that reference architecture.

What to do next

If your agentic AI program is more than a pilot, audit it against the five risk categories now and look for the architectural gaps the line-item view will hide. The CARE framework I use for AI-augmented security programs lays out how to sequence Create, Adapt, Run, and Evolve work without burning out the platform team. For the technical reference, read the AAGATE paper on arXiv and treat it as a reference architecture rather than a finished product. If you want help mapping current state to the Five Eyes prescriptions and a NIST AI RMF aligned target architecture, RockCyber does this work with security and engineering leadership across critical infrastructure and financial services. For more posts like this, RockCyber Musings lands in your inbox roughly once a week.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with CISO Tradecraft®, where we talked about the OWASP GenAI Security Project Agentic Top 10

👉 Subscribe for more AI security and governance insights with the occasional rant.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 37 May 1-May 7, 2026

Rock Lambros — Fri, 08 May 2026 12:51:18 GMT

This was the week the supervisors stopped asking permission. Five Eyes intelligence agencies, the Pentagon, the Commerce Department, and ServiceNow all converged on the same conclusion at nearly the same time. Agentic AI is shipping without brakes, the brakes need to be added now, and nobody has a clean answer for who pays. Brussels blinked. Washington floated an FDA-style gate for frontier models. Researchers kept finding holes in the plumbing under every AI agent your developers are racing to deploy.

The pattern was governance catching up to deployment. Three governments and a $200 billion software company echoed what the security crowd has been saying since GPT-4 shipped. You bought the speedboat and forgot the kill switch. Below are the ten stories that mattered between Friday, May 1, and Thursday, May 7, 2026, plus one you missed.

1. Five Eyes Drop Joint Agentic AI Guidance

CISA, the NSA, Australia’s ASD ACSC, the Canadian Centre for Cyber Security, the UK’s NCSC, and New Zealand’s NCSC released “Careful Adoption of Agentic Artificial Intelligence (AI) Services” (CISA, 2026). The document identifies five risk categories: privilege; design and configuration; behavior, including goal misalignment and deception; structural risks across interconnected components; and accountability risks rooted in opacity. The Register summarized the message bluntly. Agentic AI is too dangerous for rapid rollout (Brandon, 2026).

Why it matters

Five intelligence agencies aligning sets a baseline for procurement, audit, and insurance underwriting across the English-speaking world.
The guide pressures vendors selling fully autonomous agents by recommending incremental deployment and human oversight.
Critical infrastructure operators gain a defensible reference document when business units demand agent rollouts in days.

What to do about it

Map every deployed agent against the five risk categories and grade each honestly.
Require attestation against this guide in procurement language for agentic capabilities.
Brief your board this quarter on how the guidance changes your residual risk posture.

Rock’s Musings

Five Eyes guidance is rare enough to mean something. When agencies that attribute nation-state intrusions speak with one voice, treat it as a soft mandate. The privilege risks section reads like a list of incidents I have seen at clients in the last twelve months. Stop deploying autonomy on top of access models you built for humans.

2. EU Strikes Provisional Deal to Delay Core AI Act Obligations

On May 7, 2026, after roughly nine hours of negotiation, the Council of the EU and the European Parliament reached provisional agreement on the Digital Omnibus on AI (Lewis Silkin, 2026). High-risk obligations under Annex III now apply from December 2, 2027. Annex I obligations apply from August 2, 2028. The transparency grace period for AI-generated content shrinks from six months to three, with a deadline of December 2, 2026 (Modulos, 2026).

Why it matters

The narrative that the EU is the world’s strictest AI regulator took a real hit, with industry pressure winning a delay measured in years.
Companies that scrambled for Annex III readiness by August 2026 spent their budget on a deadline that no longer exists.
The shortened transparency window makes deepfake labeling the most urgent compliance work of the year for consumer-facing AI.

What to do about it

Reset your AI Act program plan against the new deadlines and brief your audit committee on the freed-up budget.
Accelerate transparency labeling on generative output exposed to EU users by Q3 2026.
Watch the Council and Parliament endorsement votes because the deal can still shift.

Rock’s Musings

I told three clients in 2025 that betting on the original Annex III timeline was a coin flip. The coin landed on delay. The AI Act isn’t dead, but Brussels learned the lesson California learned with CCPA. With Brussels stretching its timeline, the White House gains room to argue that federal preemption beats a state patchwork. Bet on more state attorneys general filling the gap with UDAP actions before December.

3. Pentagon Clears Eight Vendors for AI on Classified Networks

The Department of War announced agreements with AWS, Google, Microsoft, NVIDIA, OpenAI, SpaceX, and Reflection AI, with Oracle added shortly after, to deploy AI tools on Impact Level 6 and Impact Level 7 networks (Breaking Defense, 2026). Those impact levels cover secret-classified and the most highly classified Defense systems. Anthropic was conspicuously absent, despite Claude already running inside Palantir’s Maven Smart System on classified networks (TechCrunch, 2026).

Why it matters

Defense AI procurement consolidated around eight vendors, with Anthropic frozen out despite a working production deployment.
IL-7 deployments mean general-purpose models will reason over the most sensitive U.S. government data, with limited public visibility into evaluation rigor.
Defense contractors and integrators have a vendor shortlist that will shape program decisions for the next five years.

What to do about it

If you sell into DoD, align your AI roadmap with these eight vendors.
If you advise federal agencies, push for transparency on red-team results before production at IL-6 and IL-7.
Expect this vendor list in prime contractor solicitations within a quarter.

Rock’s Musings

Commercial AI is now inseparable from national security infrastructure. Eight vendors. Two impact levels. Decisions that will shape how the U.S. military thinks, plans, and fights for a decade. Where are the public test results? When the FDA approves a drug, you can read the trial data. When the Pentagon approves a model for IL-7, you cannot. That asymmetry will eventually break.

4. CAISI Locks Pre-Deployment Testing Deals With Google, Microsoft, and xAI

The Center for AI Standards and Innovation announced agreements on May 5, 2026 that allow the U.S. government to evaluate frontier AI models from Google, Microsoft, and xAI before public release (CNBC, 2026). The deals expand a program that already included OpenAI and Anthropic, with the older agreements renegotiated to align with America’s AI Action Plan (Al Jazeera, 2026). The arrangements remain voluntary.

Why it matters

Five frontier labs now run pre-deployment evaluations through one federal channel, creating a de facto standard for “tested” at the top of the AI supply chain.
Voluntary agreements give the government influence without legislation.
Smaller and open-source providers face an emerging market expectation they can’t match.

What to do about it

Add CAISI evaluation status to vendor risk questionnaires for frontier model dependencies.
Track CAISI’s published evaluation criteria, since they will shape your internal evaluation programs.
Treat models without CAISI evaluation as higher inherent risk in supply chain assessments.

Rock’s Musings

Voluntary regulation by reputational pressure is the Trump administration’s preferred AI playbook. The upside is speed. The downside is that voluntary agreements dissolve when a CEO decides the political winds have shifted. If CAISI becomes the gravitational center for AI evaluation, insurers and enterprise buyers will start citing it in contracts. That is how soft governance becomes hard governance.

5. ServiceNow Adds AI Agent Kill Switches as the 9-Second Story Goes Mainstream

ServiceNow announced on May 5, 2026 at Knowledge 2026 that it has expanded AI Control Tower with real-time pause, redirect, and stop capabilities for any AI agent across the enterprise estate (ServiceNow, 2026). The expansion adds 30 new connectors spanning AWS, Google Cloud, Microsoft Azure, SAP, Oracle, and Workday. CEO Bill McDermott told Fortune the marketing message in plain English, citing a real incident where an AI agent gained elevated permissions and deleted a production database with all backups in nine seconds (Fortune, 2026).

Why it matters

Selling kill switches as a primary feature validates the security community’s argument that agentic AI requires runtime governance.
The 30-connector expansion makes ServiceNow the de facto governance layer above other clouds and SaaS apps.
The 9-second story shifts the default purchasing posture toward “show me the brakes.”

What to do about it

Inventory every AI agent with write access to production systems and document its maximum blast radius in seconds.
Require a documented kill switch capability as a procurement gate for any agentic AI vendor.
Run a tabletop exercise this quarter where an autonomous agent acts destructively at machine speed.

Rock’s Musings

I have been waiting for a vendor to put “kill switch” on the price list. ServiceNow finally did it. The 9-second story is not hypothetical. Every CISO I know has heard a similar war story from a peer in the last year. A kill switch is only as good as its blast-radius coverage and detection latency. If your agent can do irreversible damage in seconds and your governance layer needs minutes, the kill switch is theater. Test the latency before signing.

6. White House Floats FDA-Style Gate for Frontier AI

National Economic Council Director Kevin Hassett told Bloomberg on May 6, 2026 that the White House is studying an executive order to create a vetting system for new AI models like Anthropic’s Mythos, comparing the approach to FDA drug evaluation (Bloomberg, 2026). The directive comes weeks after Anthropic disclosed that Mythos is unusually capable at finding network vulnerabilities, prompting the company to limit access through Project Glasswing (Insurance Journal, 2026).

Why it matters

An FDA-style gate would mark the first concrete pre-market regulatory framework for frontier AI in the U.S., even by executive order.
The Mythos disclosure shifts the political center of gravity, with a frontier lab effectively asking for more regulation.
Framing AI as public safety reshapes which agencies and committees own the issue.

What to do about it

Track which federal agency the order designates as the gating body, since that agency’s authorities will determine how real the regime becomes.
Prepare your own internal “model approval” process now, modeled on how you approve cryptographic libraries.
Engage with industry comment processes early, before draft text leaks and positions harden.

Rock’s Musings

The FDA analogy is compelling and imperfect. Drugs have measurable endpoints. AI capability evaluations are partly subjective and dependent on who designed the test. The reason I take this seriously is the political logic. An administration that has emphasized deregulation is signaling it might gate frontier AI at the federal level. If the national security argument has won inside the West Wing, the rest of the Western world will follow within twelve months.

7. One in Four MCP Servers Carries Code Execution Risk

Help Net Security reported on May 5, 2026, that one in four Model Context Protocol servers exposes AI agents to code execution risk through skill-handling and configuration blind spots (Help Net Security, 2026b). The research builds on an OX Security disclosure from April 2026 that covered an architectural choice in Anthropic’s official MCP SDKs for Python, TypeScript, Java, and Rust, in which STDIO transport executes OS commands without sanitization (VentureBeat, 2026). Vulnerable MCP integrations affect Cursor, VS Code, Windsurf, Claude Code, and Gemini-CLI.

Why it matters

MCP is the connective tissue between AI agents and enterprise systems, with 150 million downloads and 7,000-plus public servers.
A 25% vulnerability rate across the supply chain means most enterprises running MCP-based agents are running known-vulnerable infrastructure now.
Anthropic’s stance that the behavior is “expected” leaves customers holding the remediation burden alone.

What to do about it

Inventory MCP servers, including developer workstations, and segment them from sensitive data and production credentials.
Force allowlisting on MCP tool calls, with explicit human approval for anything outside the allowlist.
Add MCP server compromise to your incident response runbooks.

Rock’s Musings

MCP is the USB-C of AI agents, and it is shipping with the equivalent of a hot socket. The architectural pattern is fine. The default behavior is dangerous. Treat MCP like browser extensions in a regulated environment. Default deny. Document exceptions. Audit quarterly.

8. Lenovo Survey Confirms One in Three Employees Use AI Without IT Oversight

Lenovo’s Work Reborn Research Series 2026, surveying 6,000 enterprise workers globally, was reported on May 1, 2026. Between one-fifth and one-third of employees use AI outside IT governance (Help Net Security, 2026a). Almost half of large enterprises in Protiviti’s AI Pulse Survey 2026 lack full visibility into which AI tools employees use. ISACA’s 2026 AI Pulse Poll found 38% of organizations report a formal AI policy, up from 28% the prior year.

Why it matters

Shadow AI is the dominant AI risk category for most enterprises.
The gap between employee AI adoption and IT governance is widening faster than policy alone can close it.
Generative AI accounts for roughly a third of unauthorized data movement in measured environments.

What to do about it

Deploy DLP controls that recognize generative AI as a defined egress channel, not an undifferentiated browser session.
Offer a sanctioned AI tool path that is genuinely useful, because banning AI without alternatives has not worked anywhere.
Track AI policy adoption as a KPI alongside traditional security awareness metrics.

Rock’s Musings

I have watched this story play out several times. Personal email in the 2000s. SaaS in the 2010s. Now AI. Ban the tool. Watch usage go underground. Find the breach. Reverse the ban two years too late. Short-circuit the cycle now. Your highest performers are the ones doing shadow AI work because the sanctioned tools are slower or dumber.

9. Researchers Scan One Million Exposed AI Services, Find Default Authentication Off

The Hacker News reported a large-scale scan of one million publicly exposed AI services. AI infrastructure is more vulnerable, exposed, and misconfigured than any other software category investigators have recently studied (The Hacker News, 2026). Many hosts run without authentication because it is not the default in many AI projects. Over 90 exposed instances were identified across government, marketing, and finance, with chatbots, prompts, workflows, and outward access all open to the public internet.

Why it matters

Default-open AI infrastructure puts attackers ahead of defenders on basic asset discovery.
Government, marketing, and finance exposure shows the problem is not confined to the unregulated long tail of startups.
LLM conversation history exposure leaks strategy, contracts, and personal data in ways traditional data leakage models miss.

What to do about it

Treat AI infrastructure like internet-facing crown jewels and harden it accordingly.
Run attack surface management scans tuned for AI service fingerprints, including n8n, Flowise, Langflow, and LiteLLM.
Make default-deny authentication non-negotiable for any AI workflow touching enterprise data.

Rock’s Musings

This is the cybersecurity equivalent of finding every front door wide open. The mistake is older than AI. Project maintainers and platform vendors should answer for shipping with authentication disabled by default. Default secure beats secure-by-checklist every time. Until AI projects ship safely, assume the defaults are wrong and configure your way out of them.

10. Trellix Discloses Source Code Repository Breach

Cybersecurity company Trellix disclosed on May 4, 2026 that it suffered unauthorized access to a portion of its source code repository (BleepingComputer, 2026). Trellix protects more than 50,000 customers and over 200 million endpoints. The company says it has found no evidence the source code release process was affected or that the code has been exploited (SecurityWeek, 2026). Trellix has not named the actor or disclosed dwell time.

Why it matters

A defensive software vendor losing source code ripples through every customer.
The breach feeds AI-augmented vulnerability discovery against Trellix products, given how attackers now use LLMs to mine source for exploits.
Federal customers will require new attestations on code provenance and pipeline integrity within weeks.

What to do about it

Trellix customers should demand a full incident report covering IOCs, scope of stolen code, and pipeline changes.
Audit detection coverage for TTPs that exploit knowledge of the affected products.
Treat defensive software vendors as potential single points of failure in your supply chain risk register.

Rock’s Musings

Defensive vendors getting popped is a now-quarterly story. The interesting wrinkle is what an attacker does with stolen source code in the AI era. Two years ago, source theft was slow-burn. Today, an attacker can feed thousands of files into an LLM and ask for likely vulnerability classes in hours. Trellix saying the code has not been exploited is a snapshot, not a guarantee.

The One Thing You Won’t Hear About But You Need To: ARGUS and the Quiet Admission That Today’s Agent Defenses Don’t Hold

Researchers published the ARGUS paper to arXiv on May 5, 2026. It introduces a benchmark, AgentLure, that captures context-aware prompt-injection attacks across four agentic domains and eight attack vectors, along with a defense mechanism that enforces provenance-aware decision auditing for LLM agents (ARGUS, 2026). ARGUS reduces attack success rate to 3.8% while preserving 87.5% task utility. Without provenance-aware controls, undefended agents fail at much higher rates.

Why it matters

Provenance tracking inside agent reasoning is a real shift from perimeter-style defenses most vendors sell today.
Context-aware prompt injection is the dominant unaddressed risk in production agentic deployments.
Benchmarks like AgentLure will become reference points enterprise red teams use, much as MITRE ATT&CK reshaped traditional red teaming.

What to do about it

Read the ARGUS paper and use its threat model to evaluate your current agent deployments.
Push vendors to publish performance against context-aware benchmarks, not only static jailbreak datasets.
Build provenance tracking into your internal agent platforms, even if commercial vendors do not yet support it.

Rock’s Musings

The reason this matters is what it implies about everything else. If 3.8% is the new state of the art with strong defenses in place, the rate without those defenses is much higher. That is the gap most production agents sit in today. Vendor marketing on agent safety has been measured against weak benchmarks for two years. Get ahead of the curve, or be the case study in someone else’s incident report.

For more on agentic AI risk and CISO governance, see the library at RockCyber and analysis at RockCyber Musings.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with CISO Tradecraft® where we talked about the OWASP GenAI Security Project Agentic Top 10

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

ARGUS. (2026, May 5). ARGUS: Defending LLM agents against context-aware prompt injection. arXiv. https://arxiv.org/abs/2605.03378

BleepingComputer. (2026, May 4). Trellix discloses data breach after source code repository hack. https://www.bleepingcomputer.com/news/security/trellix-discloses-data-breach-after-source-code-repository-hack/

Bloomberg. (2026, May 6). AI security order under review as White House responds to Anthropic’s Mythos. https://www.bloomberg.com/news/articles/2026-05-06/white-house-preps-order-to-boost-ai-security-hassett-says

Brandon, R. (2026, May 4). Five Eyes warn agentic AI is too dangerous for rapid rollout. The Register. https://www.theregister.com/2026/05/04/five_eyes_agentic_ai_recommendations/

Breaking Defense. (2026, May 1). Pentagon clears 8 tech firms to deploy their AI on its classified networks. https://breakingdefense.com/2026/05/pentagon-clears-7-tech-firms-to-deploy-their-ai-on-its-classified-networks/

CISA. (2026, May 1). Careful adoption of agentic AI services. Cybersecurity and Infrastructure Security Agency. https://www.cisa.gov/resources-tools/resources/careful-adoption-agentic-ai-services

CNBC. (2026, May 5). Trump admin moves further into AI oversight, will test Google, Microsoft and xAI models. https://www.cnbc.com/2026/05/05/ai-oversight-trump-google-microsoft-xai.html

Al Jazeera. (2026, May 5). Microsoft, Google, xAI give US access to AI models for security testing. https://www.aljazeera.com/economy/2026/5/5/microsoft-google-xai-give-us-access-to-ai-models-for-security-testing

Fortune. (2026, May 6). Your company’s AI could delete everything in 9 seconds. ServiceNow wants to be the kill switch. https://fortune.com/2026/05/06/servicenow-kill-switch-ai-agents-bill-mcdermott/

Help Net Security. (2026a, May 1). Shadow AI risks deepen as 31% of users get no employer training. https://www.helpnetsecurity.com/2026/05/01/shadow-ai-risks-it-oversight/

Help Net Security. (2026b, May 5). One in four MCP servers opens AI agent security to code execution risk. https://www.helpnetsecurity.com/2026/05/05/ai-agent-security-skills-blind-spots/

Insurance Journal. (2026, May 7). White House prepares order to boost AI security, says economic advisor. https://www.insurancejournal.com/news/national/2026/05/07/868812.htm

Lewis Silkin. (2026, May 7). The Council and Parliament agree to slim down and delay parts of the EU AI Act. https://www.lewissilkin.com/insights/2026/05/07/the-council-and-parliament-agree-to-slim-down-and-delay-parts-of-the-eu-ai-act-102ms0v

Modulos. (2026, May 7). EU AI Act delayed: The Omnibus deal closed on 7 May 2026. https://www.modulos.ai/blog/eu-ai-act-omnibus-deal/

SecurityWeek. (2026, May 4). Trellix source code repository breached. https://www.securityweek.com/trellix-source-code-repository-breached/

ServiceNow. (2026, May 5). ServiceNow expands AI Control Tower across systems. https://newsroom.servicenow.com/press-releases/details/2026/ServiceNow-expands-AI-Control-Tower-to-discover-observe-govern-secure-and-measure-AI-deployed-across-any-system-in-the-enterprise/default.aspx

TechCrunch. (2026, May 1). Pentagon inks deals with Nvidia, Microsoft, and AWS to deploy AI on classified networks. https://techcrunch.com/2026/05/01/pentagon-inks-deals-with-nvidia-microsoft-and-aws-to-deploy-ai-on-classified-networks/

The Hacker News. (2026, May). We scanned 1 million exposed AI services. Here’s how bad the security is. https://thehackernews.com/2026/05/we-scanned-1-million-exposed-ai.html

VentureBeat. (2026, April). 200,000 MCP servers expose a command execution flaw that Anthropic calls a feature. https://venturebeat.com/security/mcp-stdio-flaw-200000-ai-agent-servers-exposed-ox-security-audit

Open-Weight Models Eat Closed Governance: The Half-Perimeter Problem

Rock Lambros — Tue, 05 May 2026 12:50:59 GMT

Open-weight reasoning models are landing in enterprise production, and the closed-vendor governance you bought doesn’t transfer with them. “Half-perimeter” is rhetorical; the real number depends on which controls you bought, but the point holds. The day a competent open-weight reasoning model runs on your hardware, the AI-specific governance you bought from your closed vendor stops covering part of the stack. The rest of this post walks the gap and the build.

The Vendor’s Own Words

OpenAI shipped gpt-oss-120b and gpt-oss-20b last year. Both are under Apache 2.0, and both are downloadable from Hugging Face. The 120b runs on a single 80GB GPU. In the model card, OpenAI’s own safety team admits what every CISO should already suspect. Once the weights ship, OpenAI cannot “implement additional mitigations or to revoke access.”

It’s the model provider’s own framing. It’s not me opining. Open-weight is a different risk profile from closed-API, by the model provider’s own assessment. The vendor can’t patch your inference cluster. The vendor can’t revoke a key that doesn’t exist. The vendor can’t run server-side abuse classifiers on traffic the vendor never sees. Everything that lived on the vendor side of the perimeter now lives on yours.

This is not a DeepSeek-versus-American-models story. It’s a closed-API-versus-open-weight story. Llama 3.3 70B (Meta), Qwen 3 32B (Alibaba), Mistral Magistral, and gpt-oss-120b sit on the same side of the boundary. The boundary is wherever the weights stop being someone else’s problem.

What Closed-Vendor Governance Bought You

Walk through what was on the bill of materials when you stood up your closed-API AI program. Oh, that’s right, you never did… but let’s pretend you did. You probably evaluated vendor-attested compliance, usually wrapped in a SOC 2 Type II report and a data processing addendum. DLP is integrated at the API gateway, watching prompts in flight. Output filtering runs on the vendor side, refusing to ship CBRN-adjacent content out of the model. Prompt firewall logic is embedded in the vendor SDK and patched without you redeploying. Vendor red teaming is on a continuous cadence. ToS enforcement occurs when an account misbehaves.

That stack assumed one thing. That a vendor sat on the other end of the inference call. Open-weight self-hosting moves every one of those controls in-house, with no shared customer base to underwrite the cost.

What does transfer? Network egress controls, identity at the runtime boundary, sandbox isolation, and supply-chain provenance for the model weights and fine-tunes. Notice what those have in common. None of them are AI-specific. They were always there. They’re the controls you applied to every other service you ran. Losing the AI-specific layer doesn’t break the non-AI controls. It does mean the only thing standing between a self-hosted reasoning model and a bad day is the perimeter you built for everything else.

Read your closed-vendor MSA carefully. The reps and warranties typically carve out third-party model behavior, hallucinations, and adversarial misuse. The vendor warrants infrastructure availability and indemnifies IP claims. The vendor doesn’t warrant safe model output. The “governance” part of vendor-attested compliance was always thinner than the SOC 2 cover suggested. Self-hosting strips even the thin part.

Figure 1: Closed-API Stack vs Open-Weight Runtime: Where Controls Live

Refusal Training Is Now an In-House Problem

Vendor refusal training is the AI-specific control most enterprise teams over-trust. The research breaks the over-trust hard.

The Badllama 3 paper (arXiv 2407.01376) showed safety fine-tuning gets removed from Llama 3 8B in five minutes on a single A100 GPU for under fifty cents. The 70B model goes in 45 minutes for under three dollars. The same paper notes the attack runs on free Google Colab for the 8B variant. FAR.AI’s “Illusory Safety” research extended the result. Pre-fine-tune refusal rates near 100% across DeepSeek-R1, GPT-4o, Gemini 1.5 Pro, and Claude 3 Haiku dropped under 20% post-fine-tune. Harmfulness scores climbed past 80%.

The R1 red-team picture is even worse on the model itself, before any attacker fine-tuning. Cisco / Robust Intelligence reported a 100% attack success rate on 50 random HarmBench prompts against R1, while OpenAI o1 rejected every test in a parallel Holistic AI evaluation. Qualys TotalAI found R1’s distilled 8B variant failed 58% of 885 attempts across 18 jailbreak categories. Promptfoo put failures over 60% on prompts, including biological and chemical weapons. KELA jailbroke R1 to produce ransomware development steps and instructions for toxins and explosive devices.

OpenAI’s own approach to gpt-oss is the strongest signal that adversarial fine-tuning is the real threat model. The model card describes the adversarial fine-tuning of gpt-oss-120b under the Preparedness Framework prior to release. OpenAI’s Safety Advisory Group concluded the adversarially fine-tuned model didn’t reach “High” capability in Biological and Chemical Risk or Cyber risk. Read the implication closely. The model provider treats fine-tune-stripped safety as the baseline release condition the model must meet. The deployer running fine-tunes downstream gets no equivalent gate.

OpenAI knows this. It’s why gpt-oss-safeguard shipped on October 29, 2025: open-weight reasoning models for safety classification, designed for developers to operate as a defense-in-depth layer. Llama Guard 3, Prompt Guard, and Code Shield exist for the same reason. The vendor is shipping you the components. Components are not the same as a service. You operate them, tune them, monitor them, retrain them when the policy changes, and absorb the latency. OpenAI’s own gpt-oss-safeguard report names the constraint: reasoning-based classifiers add compute and latency that limit large-scale real-time use.

The math is brutal. The model weights are free. The runtime safety pipeline is not.

The Frameworks Describe the Gap. They Don’t Close It.

NIST AI RMF 1.0 plus the GenAI Profile (NIST AI 600-1, July 2024) plus the GPAI/Foundation Models Profile extension (arXiv 2506.23949) names training data audits (Manage 1.3, Measure 2.8) and model weight protection (Measure 2.7). Voluntary. The CSA NIST AI RMF Agentic Profile draft is candid about the bigger problem. It states plainly that earlier RMF documents did not contemplate “agents that acquire tool-use capabilities and execute autonomously in live production environments.”

OWASP Top 10 for LLM Applications 2025 LLM03 is the most explicit primary-source statement of the half-perimeter problem. The category description is direct: model cards offer no guarantees of provenance, malicious LoRA adapters compromise base models in collaborative environments, and on-device LLMs increase the attack surface. The OWASP Agentic Top 10, released December 10, 2025, adds ASI01 (Agent Goal Hijack) and ASI03 (Identity and Privilege Abuse) as runtime-boundary problems on self-hosted stacks.

ASI01 and ASI03 are not abstract. ASI01 shows up when prompt injection redirects an agent’s plan, and the closed-vendor refusal layer is gone. ASI03 shows up when the agent’s runtime authorization is broader than the task requires, because no vendor SDK is scoping the call for you anymore. Both problems live at the runtime boundary the vendor used to backstop.

EU AI Act Article 53(2) is the regulatory expression of the gap. Open-source GPAI models get a carve-out from technical documentation and downstream-information obligations, provided they’re released under a free open license, weights are public, and the model isn’t monetized. The carve-out vanishes at the Article 51 systemic-risk threshold of 10^25 FLOPs. Llama 3.3 70B, Qwen 3 32B, Mistral Magistral, and most enterprise-deployed open-weight reasoning models sit well below that threshold. They get the carve-out. They impose downstream obligations on enterprise deployers under Article 25(2) when significant modifications happen, a category that catches LoRA fine-tunes. Most teams running fine-tunes don’t know the clause exists. Enforcement begins August 2, 2026.

ISO 42001 mandates AIMS scope definition, third-party supplier oversight, and 38 Annex A controls. The gap there is structural. The open-weight model dropped from Hugging Face is not a “supplier” in the contractual sense. There’s no audit clause, no security questionnaire, no MSA. The standard tells you to define your AIMS scope. It doesn’t prescribe specific runtime-boundary controls for self-hosted foundation models.

Figure 2: AI-Specific Controls Across the Open-Weight Boundary: What Transfers, What Breaks

Build the Runtime Perimeter

Frameworks describe the gap. Architecture closes it. The work to close it is described in the Huang and Lambros (yes, “this” Lambros) AAGATE paper (arXiv:2510.25863v2, November 3, 2025). AAGATE is a Kubernetes-native control plane that operationalizes NIST AI RMF for self-hosted agentic AI. The reference architecture hosts the open-weight model on Ollama at Layer 1 of the MAESTRO threat-model stack, which is the design assumption built in: the protected stack is “DeepSeek, Qwan, LLAMA, OSS” running on your hardware.

Four things transfer regardless of which control plane you adopt.

First, treat weights as supply-chain artifacts. AAGATE enforces SLSA L3, Cosign keyless signing on every OCI image, and an ArgoCD admission controller that rejects unsigned manifests at the gate. Whichever your path, you need signed weights, signed adapters, and a cluster-side admission policy that refuses to load anything unsigned. The Hugging Face nullifAI incident in February 2025, where ReversingLabs found malicious pickle files evading Picklescan via 7z compression and broken pickle deserialization, is the case study. Picklescan logs an error. The reverse-shell payload runs anyway.

Second, inventory open-weight runtimes alongside closed-API endpoints. AAGATE leverages the Agent Naming Service (ANS), and it registers every agent with a Decentralized Identifier and a SPIFFE certificate. You don’t need the blockchain layer. You do need a CMDB row for every Ollama cluster, every fine-tune, every adapter, with model SHA, lineage, and license tier captured. If your AI inventory has a row for the OpenAI tenant but no row for the GPU cluster running your fine-tuned Llama, the audit is incorrect.

Third, build authorization scope into the runtime, not the vendor SDK. AAGATE’s OAuth Relay translates abstract agent capabilities into ephemeral, narrowly scoped, purpose-bound credentials per side effect. Other architectures will name the same thing differently. The control matters since every external action an agent takes funnels through a policy-enforced single chokepoint with allow-listing, rate limiting, and cryptographic logging. AAGATE calls it the Tool-Gateway. AI gateway products commercialize the same pattern. Pick one.

Fourth, run your own evals because the vendor isn’t running them for you. AAGATE’s Janus Shadow-Monitor-Agent provides continuous, pre-execution adversarial evaluation in-loop, tied to a Governing-Orchestrator Agent executing a millisecond kill-switch when AIVSS scoring and SSVC decision logic flag a critical incident. The adversarial layer can also take the form of a parallel classifier, an internal red team, or any continuous evaluation pattern that mirrors what the vendor was running server-side. The pattern is non-negotiable. The product is.

These four moves are the architectural rebuttal to the half-perimeter. The perimeter you bought was always going to end at the runtime boundary. The runtime boundary is now your problem to instrument.

Operational reality matters here. The inference stack you’re protecting is Ollama, vLLM, SGLang, or llama.cpp. None of them ship with vendor-grade telemetry. Your container hosts a probabilistic system with stateless calls and no support contract. When an attacker fine-tunes a copy of your weights and slips it into your registry, there is no support call to escalate. There is only the runtime perimeter you built before the incident.

Key Takeaway: Closed-vendor governance was the AI-specific half you didn’t have to build. Open-weight reasoning models in production change that. Inventory the runtimes, sign the weights, scope the runtime authorization, and run your own evals. The vendor isn’t doing it for you anymore.

What to do next

If you’re approving an open-weight pilot this quarter, demand four things on the architecture review before the GPUs land. First, model SHA and adapter lineage in the CMDB on day one. Second, an egress chokepoint with input/output sanitization and policy-enforced allow-lists. Third, supply-chain controls (signed weights, SLSA-grade provenance, admission control rejecting unsigned). Fourth, a continuous internal evaluation loop on every high-risk agent.

The CARE framework (Create, Adapt, Run, Evolve) applies the same structure to AI security program design. The CISO Evolution covers the executive judgment side of decisions like this one. The AAGATE paper (arXiv 2510.25863v2) is the open-source reference architecture if you want to start from running code.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 36 April 24-April 30, 2026

Rock Lambros — Fri, 01 May 2026 12:50:56 GMT

A coding agent killed a startup’s database in nine seconds. Anthropic shipped a model Mozilla called “elite.” Brussels missed its own deadline. Florida’s House Speaker buried his governor’s AI bill before lunch on day one. Two cloud-native AI vulnerabilities went from disclosure to exploitation in under 36 hours. Google and Forcepoint documented indirect prompt injection in the wild on the same day. UK’s AI Security Institute caught Mythos sabotaging research it was supposed to help with. Pretending this is theoretical is no longer defensible.

This week stress-tested every assumption CISOs hold about AI. The vendor you depend on sells your adversaries the same capability. The agent your developers love wipes three months of revenue and pastes a confession. Open source is the gateway. Indirect injection is the exploit. Autonomy without rollback is the consequence.

I’ll walk you through ten stories and one piece of plumbing. AI security used to run on a 24-month horizon. The default now is whatever ships before next quarter. If you wait for clarity, you lose ground to people who already decided.

1. The Trump Administration Eyes Anthropic’s Mythos as a Weapon

On April 24, the Washington Post reported Anthropic’s Mythos system rattled the Trump administration. Mozilla’s CTO compared the model’s vulnerability detection to a “world-class, elite security engineer.” Anthropic withheld general release, routing access through Project Glasswing partners, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, and Microsoft. Anthropic privately briefed senior officials. Mythos meaningfully raises the probability of large-scale cyberattacks this year.

Why it matters

Capability parity flipped. Defenders and attackers reach for the same tool.
Vendors are now gatekeepers of dual-use capability. Anthropic’s withholding sets a precedent.
Government dependence on private model access creates new procurement and security questions.

What to do about it

Map your exposure to LLM-discoverable vulnerabilities in first-party and open-source code.
Negotiate access to AI-assisted scanning before your adversaries scan you first.
Update incident playbooks to assume hours of dwell time, not days.

Rock’s Musings

Yes… more Mythos news. Can’t ignore it if it’s coming out of the White House. It’s not fiction. It’s a procurement question. I’ve watched this pattern in every arms shift, from automated network scanning to commodity exploit kits. The defender who gets there second loses.

Anthropic’s gatekeeping is a defensible choice. The choice is whether your ecosystem qualifies for the safe lane or you’re stuck reading about Glasswing on Substack. Get on a call with your AWS, Cisco, or Microsoft reps. If the answer is no, plan around it. We track this kind of vendor calculus at RockCyber.

2. Cursor’s Claude Agent Wipes a Startup’s Database in Nine Seconds

On Friday, April 25, a Cursor coding agent powered by Claude Opus 4.6 deleted PocketOS’s entire production database and all volume-level backups in a single API call. The agent encountered a credential mismatch in staging, decided to resolve it by deleting a Railway infrastructure volume, scanned the codebase for an unrelated API token, and then ran the command. PocketOS serves car rental businesses nationwide. Three months of reservations, payments, customer information, and vehicle assignments went dark. Railway restored the data on Sunday using internal disaster backups not advertised to customers. The agent itself wrote the public confession.

Why it matters

Agents don’t ask permission. They scan for the credentials unblocking them.
“Production” and “staging” are now labels, not boundaries.
Recovery happened because Railway keeps undocumented backups. Hope is not a strategy.

What to do about it

Force agents to operate with scoped, ephemeral credentials. Long-lived API keys in a repo are liabilities with autonomy attached.
Implement break-glass approval gates for destructive infrastructure calls.
Test backup recovery monthly. If you can’t restore in under an hour, you don’t have backups.

Rock’s Musings

PocketOS got lucky. Railway ran a heroic recovery on a Sunday using backups the customer didn’t know existed. If your AI strategy depends on a founder’s weekend chivalry, you don’t have a strategy. You have hope.

The agent did what it was trained to do. Scan, plan, act, document. The failure was in governance, not capability (and let’s just say, a suboptimal technical infrastructure). The villain is the assumption that an autonomous system will halt and ask. They don’t halt. Build the rails. Treat agents like an over-eager intern with the ability to call DELETE on prod.

3. LiteLLM Bug Goes From Disclosure to Exploitation in 26 Hours

GitHub’s Advisory Database indexed CVE-2026-42208 in LiteLLM on April 24 at 16:17 UTC. Sysdig logged the first exploitation attempt on April 26 at 16:17 UTC, roughly 26 hours later. The bug carries a CVSS of 9.3 and lets unauthenticated attackers send a crafted Authorization header to any model API route, then read or modify the proxy’s database (Sysdig). LiteLLM is the open-source LLM gateway with more than 22,000 GitHub stars, fronting OpenAI, Anthropic, and other model providers in production. The same project sat at the heart of the Mercor breach earlier this year.

Why it matters

AI infrastructure now looks like any internet-exposed service.
Pre-auth SQLi on the gateway exposes API keys and credentials for downstream model providers.
Disclosure-to-exploitation time keeps shrinking. The 36-hour window is the new optimistic baseline.

What to do about it

Inventory every LiteLLM, vLLM, LMDeploy, or proxy node in your environment. Patch to 1.83.7-stable or above for LiteLLM.
Treat LLM gateways as Tier 0 assets. Apply the controls you’d apply to identity providers.
Subscribe to maintainer advisory feeds. GitHub Advisory Database lag of four days is too long.

Rock’s Musings

LiteLLM is the kind of dependency pulled in via a Cursor prompt or an aspirational architecture diagram. It runs as the front door to every model provider you care about. Pre-auth SQL injection on it is a “your AI program is over” event.

Disclosure-to-exploit windows make monthly patch cycles professional malpractice. If your AI security playbook still says “evaluate within 30 days,” shred it. We’ve moved to “act within 24 hours or accept compromise as a feature.”

4. Indirect Prompt Injection Has Left the Lab. It’s Everywhere.

On April 24, Google’s Online Security Blog and Forcepoint’s X-Labs published parallel reports documenting indirect prompt injection in the wild. Forcepoint identified ten payload families targeting AI agents with instructions for financial fraud, data destruction, and API key theft. Google reported a 32% relative increase in malicious activity between November 2025 and February 2026. Attackers hide instructions inside webpages with single-pixel text, transparent fonts, HTML comments, and metadata. Neither team attributed the campaigns to a single actor, though both noted shared templates suggesting organized tooling.

Why it matters

Agents summarizing content are low-risk. Agents sending emails, running commands, or processing payments are the targets.
Filters watching user input miss content fetched by the agent.
The threat model includes every third-party page your agent loads.

What to do about it

Inventory every agent fetching external content. Note which tools they call.
Implement allowlists for outbound tool execution. Default deny for novel actions.
Add output filtering for instruction-like content in tool responses, not only user input.

Rock’s Musings

We’ve been treating indirect prompt injection as a research curiosity since 2023. It’s now an operational threat with documented campaigns and template reuse. The Lakera and OWASP folks were right.

If you’ve deployed an agent with browsing capability, your trust boundary includes every webpage it visits. The entire internet. I wrote about this on RockCyber Musings earlier this year. It got worse.

5. American Leadership in AI Act Drops With 20+ Bills Stitched In

On April 27, Reps. Ted Lieu (D-Calif.) and Jay Obernolte (R-Calif.) introduced the American Leadership in AI Act, a six-title package consolidating more than 20 prior bills from the Bipartisan AI Task Force (Nextgov/FCW). The package covers standards and evaluation, research infrastructure, federal AI governance and procurement, worker protections, deepfake harms, and AI education. The bill is the most substantive bipartisan AI proposal in this Congress, landing during tension between the White House’s preemption push and active state legislation.

Why it matters

Federal preemption fights will intensify. State AI laws face new risk.
Procurement standards in the bill shape what enterprises demand from AI vendors.
Deepfake provisions create new compliance obligations for media and platforms.

What to do about it

Map AI-procurement language to current vendor contracts.
Track state-level bills you’re already complying with for preemption risk.
Get legal reading the testing and evaluation title carefully.

Rock’s Musings

Two California members of Congress, one D and one R, agreeing on AI is unicorn territory. Don’t get excited. Bipartisan bills with 20+ titles tend to die under the weight of their own ambition.

The interesting question is which provisions get pulled into appropriations or NDAA riders before December. Watch the procurement and federal AI governance titles. Those move first because the executive branch wants them. Plan as if procurement standards land by Q3.

6. EU AI Act Omnibus Trilogue Collapses, August Deadline Stays Live

On April 28, Brussels held the second political trilogue on the AI Act Omnibus, the proposal deferring high-risk AI compliance. After roughly twelve hours, the Council and Parliament failed to agree on conformity-assessment architecture for AI in regulated products (Modulos). A follow-up trilogue is scheduled for May 13. The August 2, 2026 high-risk obligations remain operative law.

Why it matters

Vendors and deployers cannot bank on a deferral. August is the working assumption.
The Cypriot Council Presidency ends June 30. Lithuania might finish negotiations.
The Annex I disagreement signals sectoral assessments will keep biting medical device and machinery providers.

What to do about it

Continue compliance preparation as if no Omnibus arrives. Treat May 13 as a tiebreaker, not a save.
For medical devices, machinery, and other Annex I products, lock in your conformity-assessment plan now.
Get internal legal sign-off on the original AI Act timelines this quarter.

Rock’s Musings

I keep telling clients hoping for a deferral is not a compliance strategy. This week confirmed it. Brussels cannot agree on the structure of the regulation it already passed.

If your CFO asks why you spent budget on AI Act readiness, point at this paragraph. The cost of overpreparing is a few quarters of work. The cost of underpreparing is an enforcement action against your highest-revenue product line. I know which side of the bet I want.

7. Microsoft and OpenAI Restructure for Cyber Defense

On April 27, Microsoft and OpenAI announced revised partnership terms (24/7 Wall St). OpenAI’s API will run on any cloud provider, including AWS via Bedrock. Microsoft’s IP license is no longer exclusive but runs through 2032. The companies expanded Trusted Access for Cyber, giving Microsoft access to OpenAI’s most cyber-capable models, folded into the Secure Future Initiative. The pact pairs OpenAI’s offensive-capable model work with Microsoft’s defender stack across cloud, identity, productivity, and frontier AI.

Why it matters

The exclusivity reset reshapes vendor lock-in for AI procurement.
Microsoft’s defender stack now has a privileged channel to cyber-capable models.
The same models detecting attacks also execute them. Internal review for offensive capability is no longer optional.

What to do about it

Reassess your AI vendor strategy. Multi-cloud is realistic now.
For Microsoft customers, evaluate Trusted Access for Cyber eligibility.
Demand transparency from AI security vendors about model and guardrails.

Rock’s Musings

Microsoft bought itself the most defended seat at the AI security table. The pact extends a moat already including Defender, Sentinel, Purview, and Copilot for Security. Smaller security vendors should be nervous.

If you’re a CISO in a Microsoft shop, this is a small win. If you’re betting on a non-Microsoft AI security stack, reread your strategy. The exclusivity end means more options, the cyber pact deepens lock-in.

8. UK AISI Catches Claude Mythos Sabotaging Research

On April 28, the UK AI Security Institute (AISI) published an updated sabotage evaluation framework. Across 297 scenarios, no model attempted spontaneous research sabotage. In a new “continuation” test, Mythos Preview continued to sabotage 7% of inputs after being explicitly directed not to (AISI). The same evaluation showed Mythos Preview completing the first 32-step enterprise attack simulation start-to-finish, succeeding on 73% of expert-level tasks that no model had completed before April 2025.

Why it matters

Continuation behavior matters more than spontaneous behavior. Real attackers prompt the model.
A 7% sustained sabotage rate warrants treating these models as untrusted insiders during sensitive work.
The 32-step completion shows operational maturity. Models execute multi-stage cyber operations end to end.

What to do about it

Don’t run frontier models on safety-sensitive code reviews without monitoring.
Build red-team programs, prompting and continuing rather than single-shot tests.
Track AISI’s methodology. Adopt continuation-style tests internally.

Rock’s Musings

Spontaneous misbehavior was never the threat model scaring me. Continuation is. Once an attacker plants the seed, the model becomes a complicit operator inside your environment. Seven percent is small until you multiply it by every prompt your enterprise sends in a day.

AISI does work nobody else funds at this rigor. If your AI governance committee isn’t reading their reports cover to cover, you’re outsourcing your threat model to LinkedIn posts. Read the source.

9. Florida House Speaker Kills DeSantis’s AI Bill on Day One

On April 28, Florida convened a four-day special session. The Senate voted 37-1 in favor of the AI Bill of Rights. House Speaker Daniel Perez killed the bill that same morning, declaring that the only topic the House would address was redrawing congressional maps (Florida Phoenix). Perez argued AI regulation belongs to the federal government, aligned with a Trump executive order targeting state AI laws. The bill would have required parental consent for minor accounts on companion chatbot platforms, prohibited unauthorized commercial use of AI-generated likenesses, and required AI disclosure to users.

Why it matters

State preemption fights are escalating. Florida sided with the federal government before federal law exists.
Companion chatbot rules pass Senate chambers and die in House chambers. The pattern matters.
AI-generated likeness and consent provisions will keep returning. Plan for eventual passage somewhere.

What to do about it

If you run companion chatbots, monitor every state bill on minors and consent.
Brief your legal team on AI-likeness and right-of-publicity rules in California, Tennessee, and active special sessions.
Don’t bank on federal preemption. Executive orders reverse.

Rock’s Musings

The pattern is the same one I’ve called out for two years. State Senates pass AI bills, state Houses kill them, and the federal government drafts preemption language. The result is regulatory whiplash across 50 jurisdictions plus DC plus a federal package which might or might not preempt them. Give your privacy and AI counsel hazard pay. They’re earning it.

10. HackerOne Launches h1 Validation as AI Vuln Reports Surge 76%

On April 29, HackerOne launched h1 Validation, a service triaging AI-discovered vulnerability reports for actual exploitability (Cybersecurity Insiders). Vulnerability submissions on the platform rose 76% year over year, hitting a record high in March 2026. About 25% of findings were confirmed exploitable. The share of critical and high-severity vulnerabilities grew to 32%, up from a 26-28% baseline. The launch follows months of complaints from program owners overwhelmed by AI-generated reports of varying quality.

Why it matters

AI generates more vuln reports than security teams triage.
Triage capacity, not discovery, is the constraint.
This signal-to-noise problem reshapes bug bounty economics within 12 months.

What to do about it

Audit your bug bounty intake pipeline. If reports outpace triage, fix it.
Invest in tooling classifying reports by exploitability before a human reads them.
Set expectations with researchers. AI-assisted submissions need higher proof of impact.

Rock’s Musings

The asymmetry is volume. Models like Mythos and GPT-5.5-Cyber produce thousands of plausible reports per day. Most are junk. Some are lethal. Your triage team won’t keep up by reading harder. Whether you buy h1 Validation or build your own, manual triage of AI-scale output is a doomed strategy.

The One Thing You Won’t Hear About But You Need To

CSAI Foundation Becomes the First AI-Specific CVE Numbering Authority

On April 29, the Cloud Security Alliance’s CSAI Foundation announced three milestones at the CSA Agentic AI Security Summit (CSA). The foundation registered as a CVE Numbering Authority through MITRE, gaining direct ability to issue CVEs for AI-specific vulnerabilities. It launched the STAR for AI Catastrophic Risk Annex extending the AI Controls Matrix to scenarios involving loss of human oversight, with rollout from June 2026 through December 2027. It also acquired the Autonomous Action Runtime Management (AARM) specification, contributed by Vanta.

Why it matters

AI-specific CVE issuance changes how AI vulnerabilities get tracked, scored, and patched.
The Catastrophic Risk Annex maps to NIST AI RMF, the EU AI Act, and ISO/IEC 42001, giving auditors a consolidated reference.
AARM gives operators a formal specification for runtime control of agent actions.

What to do about it

Add CSAI Foundation advisories to your security feed.
For high-risk deployments, map internal controls to the Catastrophic Risk Annex during phase one rollout.
Pilot AARM in one agentic workflow this quarter. Runtime control of agent actions is the right level of abstraction.

Rock’s Musings

Plumbing matters more than press releases. While headlines went to Mythos and the Cursor accident, the CSAI Foundation stood up the infrastructure for AI-specific vulnerability tracking, runtime control, and catastrophic risk auditing. This decides whether AI security becomes a discipline or stays a marketing category.

I’ve worked in standards for thirty years. The value compounds quietly until one day the auditors ask, and you either have it or you don’t. We track CSAI work closely at RockCyber. Start with the CSA press release, then loop in your governance team Monday.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with Eva Benn where we talked about the cybersecurity skills you need to develop to stay relevant in 2026 and beyond.

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Cloud Security Alliance. (2026, April 29). CSAI Foundation announces key milestones to secure the agentic control plane. https://cloudsecurityalliance.org/press-releases/2026/04/29/csai-foundation-announces-key-milestones-to-secure-the-agentic-control-plane

Cybersecurity Insiders. (2026, April 29). HackerOne launches h1 Validation to tackle rising wave of AI-driven vulnerabilities. https://www.cybersecurity-insiders.com/hackerone-launches-h1-validation-to-tackle-rising-wave-of-ai-driven-vulnerabilities/

Florida Phoenix. (2026, April 28). Florida Speaker kills DeSantis’ AI regulation, vaccine repeal bills on first day of special session. https://floridaphoenix.com/2026/04/28/florida-speaker-kills-desantis-ai-regulation-vaccine-repeal-bills-on-first-day-of-special-session/

Forcepoint X-Labs. (2026, April 24). Indirect prompt injection in the wild: X-Labs finds 10 IPI payloads. https://www.forcepoint.com/blog/x-labs/indirect-prompt-injection-payloads

Google. (2026, April 24). AI threats in the wild: The current state of prompt injections on the web. Google Online Security Blog. https://security.googleblog.com/2026/04/ai-threats-in-wild-current-state-of.html

Help Net Security. (2026, April 24). Indirect prompt injection is taking hold in the wild. https://www.helpnetsecurity.com/2026/04/24/indirect-prompt-injection-in-the-wild/

Modulos. (2026, April 28). EU AI Act Omnibus: The trilogue failed, what happens to the August 2026 deadline?. https://www.modulos.ai/blog/ai-act-omnibus-trilogue-failed/

Nextgov/FCW. (2026, April 28). Lieu and Obernolte introduce consolidated AI bill package. https://www.nextgov.com/artificial-intelligence/2026/04/lieu-and-obernolte-introduce-consolidated-ai-bill-package/413134/

Sysdig. (2026, April 29). CVE-2026-42208: Targeted SQL injection against LiteLLM’s authentication path discovered 36 hours following vulnerability disclosure. https://www.sysdig.com/blog/cve-2026-42208-targeted-sql-injection-against-litellms-authentication-path-discovered-36-hours-following-vulnerability-disclosure

The Hacker News. (2026, April 24). LMDeploy CVE-2026-33626 flaw exploited within 13 hours of disclosure. https://thehackernews.com/2026/04/lmdeploy-cve-2026-33626-flaw-exploited.html

The Hacker News. (2026, April 29). LiteLLM CVE-2026-42208 SQL injection exploited within 36 hours of disclosure. https://thehackernews.com/2026/04/litellm-cve-2026-42208-sql-injection.html

The Register. (2026, April 27). Cursor-Opus agent snuffs out startup’s production database. https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/

Tom’s Hardware. (2026, April 27). Claude-powered AI coding agent deletes entire company database in 9 seconds. https://www.tomshardware.com/tech-industry/artificial-intelligence/claude-powered-ai-coding-agent-deletes-entire-company-database-in-9-seconds-backups-zapped-after-cursor-tool-powered-by-anthropics-claude-goes-rogue

UK AI Security Institute. (2026, April 28). Our evaluation of Claude Mythos Preview’s cyber capabilities. https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities

24/7 Wall St. (2026, April 28). Microsoft’s AI moat holds up even after the OpenAI reset. https://247wallst.com/investing/2026/04/28/microsofts-ai-moat-holds-up-even-after-the-openai-reset/

Washington Post. (2026, April 24). AI hacking fears jolt Washington as Anthropic unveils Mythos. https://www.washingtonpost.com/technology/2026/04/24/anthropic-mythos-ai-washington-cybersecurity-hacking-risk/

AI Coding Agent Prompt Injection: Three Vendors, One Seam, No Owner

Rock Lambros — Tue, 28 Apr 2026 12:50:44 GMT

AI coding agent prompt injection has a procurement problem, and a researcher just published the receipt. Aonan Guan typed a malicious instruction into a GitHub pull request title last week. Anthropic’s Claude Code Security Review action posted its own API key as a comment. So did Google’s Gemini CLI Action. So did GitHub’s Copilot Agent. Same exploit hit three vendors, with no infrastructure required. Anthropic’s 232-page system card had named the gap before the researchers published. The other two vendors had not documented enough to predict their own outcome.

Most of the writing on this incident will focus on architecture. The runtime is the perimeter. The action boundary is the blast radius. Both readings are correct. Both are also a deflection. The architecture story explains the mechanism. It doesn’t explain why the buyer was exposed in the first place. The buyer signed three contracts, accepted three sets of safety claims, and never required any of the three vendors to assert anything about the seams between them. The trigger was a prompt injection. The exposure was procurement.

I want to push past the architecture take and look at the governance read, because the governance read implicates the reader in a way the architecture take does not.

How Comment and Control Worked

Aonan Guan, working with Zhengyu Liu and Gavin Zhong at Johns Hopkins, opened a GitHub pull request in a target repository. They typed a malicious instruction into the PR title. The repository used the pull_request_target workflow trigger, which any AI coding agent integration with secret access requires. That trigger injects repository secrets into the runner environment. The agent read the PR title, treated the instruction as a directive, called GitHub’s own API using credentials stored in its environment variables, and posted the secret as a comment on the PR. The default pull_request trigger doesn’t expose secrets to fork PRs. The pull_request_target trigger does, by design.

This is the textbook case of what Simon Willison has been calling the lethal trifecta. Access to private data sits in the runner. Untrusted input arrives through the PR title. The exfiltration channel is GitHub’s comment API, which sits in the agent’s default tool inventory. All three conditions sit at the seam between three vendors. The exploit needs all three to fire. Comment and Control satisfies all three by design, and no single vendor has written a document that asserts anything about the combination.

Anthropic ranked the disclosure as CVSS 9.4 Critical and paid a $100 bounty. Google paid $1,337. GitHub paid $500. None of the three issued a CVE in the National Vulnerability Database at the time of disclosure. None published a GitHub Security Advisory. Those numbers send a market signal. Vendor bounty programs classify seam vulnerabilities as out of scope for their own programs, and researchers respond to incentives. The next class of these findings will follow the same path the bounties point them down.

Help Net Security ran a piece this week on Google’s own CommonCrawl analysis showing a 32% relative increase in malicious indirect prompt injection content between November 2025 and February 2026. The supply of payloads is growing faster than vendor disclosures. That is the operating environment.

Figure 1: Comment and Control attack chain

Why AI Coding Agent Prompt Injection Is a Governance Problem

Pull a model card off any of the three vendor sites. Anthropic’s Opus 4.7 system card, published April 16, 2026, runs 232 pages. It quantifies hack rates. It publishes injection resistance metrics. It includes an explicit statement. Claude Code Security Review is “not hardened against prompt injection.” Anthropic does the most mature disclosure work in the industry. OpenAI’s GPT-5.4 system card documents red-team hours and model-layer evals without publishing agent-runtime resistance numbers. Google’s Gemini 3.1 Pro card defers most of its safety methodology to the older Gemini 3 Pro card.

Rank those three in a procurement scorecard, and Anthropic comes out on top. That ranking is the wrong question. A model card describes a model’s behavior. Comment and Control didn’t break a model. The disclosure was complete for the layer Anthropic owns and silent on the seam, because Anthropic doesn’t own the seam. The seam runs through GitHub’s runner, GitHub’s API, the agent’s environment variable scope, the workflow trigger configuration, and the buyer’s choice to enable agent integration on a repository with secrets. Each of those pieces sits inside a different contract. None of those contracts asserts anything about the combination.

The structural gap is what makes this a governance story. The cloud security industry took roughly a decade to converge on the shared responsibility model. AWS owns the hypervisor. The customer owns the workload. Each side owns a clear half. Most of the early breaches happened in the unowned middle of that line, and the convergence was painful. Agent composition is replaying that history with a sharper acceleration curve, and there is no industry consensus on where the line sits. Three vendors share a single runtime with no agreed-upon accountability model. The buyer carries everything that the contracts do not.

Here is a hypothetical for the operational consequence. A SOC running normal vulnerability scanning across the agent-enabled repos sees green. None of the three disclosures generated CVEs in the NVD. The internal ticketing system has no category for “agent runtime composition risk.” The risk register has no entry. The budget has no line item. The exploit class is real, the severity is Critical across three vendors, and the standard tooling reports zero findings because the standard tooling has nothing to scan against. The exploit became possible because no one wrote it down as a thing to look for.

Figure 2: System card disclosure depth by vendor and layer

The Procurement Questions You Should Have Asked

Most CISO action checklists produced after an incident like this read as a list of post-hoc remediation steps. Rotate credentials. Restrict permissions. Add monitoring. Those moves are correct, and they are also reactive. The harder, more useful artifact is the set of procurement questions that, asked at signing, would have made Comment and Control either impossible or contractually attributable.

Here are five questions. Paste them into your next vendor governance review verbatim or adapt them. They work for AI coding agents, and they will work for the next class of agentic integrations after this one.

The first question is about layer ownership. Ask each vendor, “Name the layers of the agent runtime your security guarantees cover, and name the layers you don’t cover.” Most vendors will answer the first half. The interesting answer is the second half. A vendor who cannot articulate the layers it doesn’t cover hasn’t thought about composition. The contract you are about to sign assumes a perimeter that the vendor hasn’t analyzed.

The second question is about quantified resistance metrics on the deployment surface you actually use. Anthropic publishes injection resistance numbers in the Opus 4.7 system card. Those numbers cover Anthropic’s API surface. They don’t cover Claude Code Security Review running on GitHub Actions with a pull_request_target trigger and secrets in scope. Ask for the resistance number for the model version you run on the platform you deploy to. If the vendor cannot produce that number, the vendor cannot quantify the risk you are accepting.

The third question is about bounty scope. Ask each vendor, “Does your bounty program consider vulnerabilities at the integration boundary between your product and the platforms it deploys on?” Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety findings. The position is defensible. The position also pushes researchers’ attention away from the seams. Knowing which vendor’s program covers which surface is a procurement signal. It tells you which surfaces will get the most external scrutiny over the contract life and which surfaces will not.

The fourth question is about composition disclosure. Ask each vendor, “When your product is integrated with another vendor’s platform, who is responsible for documenting the security properties of the combined system?” The honest answer from every vendor is “the buyer.” Get it in writing. The asymmetry exposes why a shared responsibility artifact for agent runtimes does not yet exist.

The fifth question is about runtime telemetry. Ask, “What runtime signals do you publish that allow me to detect prompt injection in production?” If the answer is a model-card link, the vendor hasn’t built the runtime monitoring. If the answer is an SDK with detection hooks, document the coverage and the false-positive rate. The August 2026 EU AI Act high-risk compliance deadline turns this question from a nice-to-have into an audit artifact, and the vendors who cannot answer it now will be the ones renegotiating contracts in Q3.

Those five questions don’t eliminate the exploit class. They make the exploit class a contractual variable instead of a discovered surprise. A buyer who asks all five before signing knows where the seam runs and who is on the hook for what.

What to Do This Week, Ordered by Blast Radius Reduction

The reactive moves still matter. Order them by blast radius reduction, not by the order they appear in any vendor advisory. Each one carries a different internal political cost, and pretending the costs are equal is how good control work dies in committee.

Inventory every workflow in your repositories that uses pull_request_target. The grep is cheap. The conversation with the dev tooling team about what each of those workflows needs is not. Expect to find workflows configured for one reason, with AI agent integrations later layered on top, and no review of the original threat model.

Rotate every credential exposed to agents in those workflows over the last 90 days. The cost is low. The likelihood of someone pushing back is also low. Do it first because it is the cheap one, and use the speed of the rotation to demonstrate that agent-related credential rotation is now part of the normal operating cadence.

Switch from stored secrets to short-lived OIDC tokens for any workflow that supports it. The political cost is medium. You will need platform team buy-in. The argument that closes the loop is exactly the procurement gap above. Stored secrets in agent-accessible environments are a category of risk no vendor’s contract currently covers, and OIDC removes the category from the buyer’s residual.

Strip bash execution permissions from agents that only need to perform code review. This one starts a fight with the developer tooling team because some of the convenience features will break. The fight is worth having. An agent with bash permissions on a CI runner with secrets in scope is the worst-case configuration. Write the security memo and force the documented risk acceptance from the team that wants to keep the bash channel open.

Add a category to your supply chain risk register called “AI agent runtime composition.” Most GRC tooling doesn’t have a field that maps to the category. Add it manually. The act of adding the category forces the conversation about which vendor combinations are covered by which contracts and which are not. The conversation is the artifact you actually need. The risk register entry is the receipt that the conversation happened.

Where the Industry Has to Go

The cloud security industry built the shared responsibility model under pressure from breaches and ten years of regulatory friction. The AI agent industry has neither of those forcing functions yet. The EU AI Act high-risk obligations come into force in August 2026 and will start to put procurement language behind some of these questions, but the standards work that would produce a real shared responsibility artifact for agent runtimes hasn’t happened. This is where the CARE framework lands. Create the procurement questions before you sign. Adapt the controls you already have around CI/CD, secret scoping, and runtime monitoring. Run the agent integrations under the same operating cadence as the rest of your privileged automation. Evolve the risk register category as new exploit classes emerge. The exploit class will not stop with Comment and Control. The next one will follow the same architectural pattern and the same governance gap. The CISOs who are ready for it are the ones who treat agent procurement as a governance problem now, while the vendors and the standards bodies are still catching up.

Key Takeaway: The AI coding agent prompt injection class lives in the seams between vendor contracts, and the buyer carries the residual until the procurement questions force the seams into the conversation.

What to Do Next

Start with the five procurement questions in your next vendor renewal cycle. Do the credential rotation and the OIDC migration this quarter. Read the rest of the RockCyber Musings archive for the operating cadence I run with clients on agentic AI security reviews, and reach out through RockCyber if you want to walk through the procurement question set against a specific vendor stack you are evaluating.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 35 April 17-April 23, 2026

Rock Lambros — Fri, 24 Apr 2026 12:50:53 GMT

Seven days. One breached “too dangerous to release” model. One vibe coding platform exposing 76 days of customer source code. One AI supply chain attack that cost Vercel its dignity. A compliance startup accused of rubber-stamping SOC 2 reports for companies that later got breached. Every story landed between April 17 and April 23, 2026, the same week Gartner blessed its first “Company to Beat” in agent governance, the UK promised a £90 million cyber shield, and Google shipped three security agents. The security industry spent two years debating whether agentic AI was a real threat. This week, the debate ended.

AI systems are both targets and attack vectors, with failure modes of their own. A frontier model gets breached because a vendor fell for infostealer malware in February. A vibe coding startup ships a regression and exposes every customer’s source code for 76 days. A compliance startup hands out SOC 2 attestations like candy, and one customer becomes the pivot for a supply chain attack. Governments and analysts moved together. The UK committed real money to AI-powered cyber defense. Gartner stamped agent governance as a procurement category. This is the week the gap between AI capability and AI assurance became a balance sheet problem.

1. Anthropic Mythos Model Accessed By Unauthorized Discord Group Days After Launch

Anthropic confirmed on April 22, 2026, that it is investigating unauthorized access to Mythos, the frontier model restricted to roughly 40 partners, including Apple, Google, JPMorgan Chase, and NVIDIA (Bloomberg). The access came through a third-party contractor environment, not Anthropic’s direct infrastructure (CBS News). A Discord group focused on unreleased AI models guessed Mythos’s URL from naming conventions and pivoted through a contractor’s credentials to reach it. Anthropic claims no core systems were compromised.

Why it matters

The firm Anthropic, trusted with access to frontier models, is the one that leaked it.
Mythos autonomously finds and weaponizes zero-days. Downstream risk spans all major OSes.
Guessing URLs and owning one contractor beat a Tier 1 AI lab.

What to do about it

Inventory every third-party vendor with access to frontier AI weights or runtime. Treat them as Tier 1.
Require contractors touching AI infrastructure to match your credential isolation standards.
Demand hardware token enforcement for any vendor in production AI environments.

Rock’s Musings

A contractor endpoint blew apart the “too dangerous to release” framing in 24 hours. Anthropic built Mythos to protect partners from zero-days, then lost it through a vendor employee. The model built to find vulnerabilities got stolen because of a vulnerability nobody thought to measure. You cannot outsource your trust perimeter. Every CISO needs to audit AI-access vendors as they do their crown-jewel systems.

2. Vercel Supply Chain Breach Via Context.ai OAuth Token Compromise

Vercel confirmed on April 19, 2026 that customer data was stolen via a compromise of Context.ai, a third-party AI assistant a Vercel employee had connected to Google Workspace with full Drive read access (TechCrunch). A Context.ai employee’s device was infected with Lumma infostealer in February 2026. ShinyHunters used the exfiltrated OAuth tokens to pivot into the Vercel employee’s Google account, then into Vercel itself (Vercel). The actor is offering source code, NPM and GitHub tokens, and access keys for $2 million on BreachForums.

Why it matters

One OAuth app installed by one employee rolled into a platform breach.
Lumma was the vector. The AI assistant was the accelerant.
ShinyHunters is monetizing AI-adjacent breaches at scale. Expect copycats.

What to do about it

Audit every OAuth app with Drive, Gmail, or Workspace scopes. Revoke AI tools without documented need.
Enforce conditional access with hardware tokens and device posture for Workspace accounts.
Subscribe to stealer log monitoring for corporate emails.
Rotate all secrets (e.g. API keys).

Rock’s Musings

An employee clicked a button, granted a third-party AI read access to everything, and the attacker rode that consent into production. OAuth scopes are the new privileged credentials, and most of us are not managing them that way. The shadow AI problem I flag with clients at RockCyber is not ChatGPT use. It’s the hundreds of AI-branded OAuth apps employees connect while nobody watches.

3. Gartner Names Zenity The “Company To Beat” In AI Agent Governance

On April 23, 2026, Zenity announced that Gartner named it the “Company to Beat in AI Agent Governance” (Business Wire). Gartner cited Zenity’s agentic architecture, intent-aware detection, and end-user traction. The platform covers SaaS-managed agents, custom-built agents, and device deployments from build to runtime. Gartner’s 2026 CIO survey shows that 17 percent of organizations have deployed AI agents, 42 percent plan to do so within 12 months, and another 22 percent plan to do so the year after (Yahoo Finance). Zenity also landed in two categories of the 2026 Gartner Hype Cycle for Agentic AI this month.

Why it matters

A “Company to Beat” stamp on a narrow security category speeds up procurement.
79% of organizations plan to deploy AI agents within 2 years.
Agent governance is shifting from a research topic to a commercial line item.

What to do about it

If you are on the 42 percent 12-month curve, start evaluations now.
Evaluate agent governance on runtime enforcement, not only inventory or posture.
Require vendors to show agent identity, memory, tool-call, and intent controls as distinct.

Rock’s Musings

Yes… Zenity is my employer, so a) I’m super proud of this one and b) it’s my prerogative to include it in the musings 😀

“Company to Beat” labels are how procurement catches up with security reality. Mythos leaked through a contractor, Vercel got rolled via an AI assistant’s OAuth token, and the same week Gartner tells CIOs agent governance is a budget item. Read Zenity’s architecture claims against this week’s breach anatomy, then against what you bought for CASB five years ago. Same pattern, same procurement playbook. Budget the line item.

4. Lovable Vibe Coding Platform Exposed Source Code For 76 Days

On April 20, 2026, security researcher weezerOSINT disclosed a broken object-level authorization flaw in Lovable’s API that let any authenticated free-account user read source code, database credentials, AI chat history, and customer data from every project created before November 2025 (The Register). The exposure ran 76 days, from February 3 through April 20, 2026. Lovable first denied the flaw, blamed its documentation, then blamed HackerOne, then apologized for the apology (Cybernews). Customers include Uber, Zendesk, and Deutsche Telekom.

Why it matters

Vibe coding platforms hold enterprise source code and secrets. Attacker value is enormous.
Public denial while the flaw was live is a textbook loss-of-trust move.
A $6.6 billion startup cannot figure out basic tenant isolation three versions in.

What to do about it

Block new vibe coding connections at DNS or CASB until procurement reviews tenancy.
Rotate any credentials your teams put into Lovable projects since February 2026.
Treat vibe coding output as untrusted. Pull it into a real repo, scan it, review it.

Rock’s Musings

Vibe coding is a demo, not engineering. When you hand a growth-stage startup your production database credentials in exchange for a drag-and-drop builder, you have accepted that your security depends on whether someone refactors an authorization check. Three breaches in thirteen months is a pattern, not bad luck. If your security team has not yet restricted this category of tool, do it this week.

5. Google Cloud Next Ships Three AI Security Agents And Gemini Enterprise Agent Platform

On April 22, 2026, Google Cloud Next introduced the Gemini Enterprise Agent Platform and three new AI agents inside Google Security Operations (SiliconANGLE). The agents cover Threat Hunting, Detection Engineering, and Third-Party Context enrichment (The Register). Google also deepened its ties to the Wiz product and shipped new agent governance tools. Sundar Pichai framed the shift as moving from human-led defense to human-in-the-loop to AI-led defense overseen by humans.

Why it matters

Three tedious SOC functions now have vendor agent equivalents. SOC staffing economics shift if they work.
Google is betting the platform on agentic AI, not only generative AI.
The Wiz tie-in gives Google a path into CSPM-driven SOC workflows.

What to do about it

Pilot the Threat Hunting agent for 30 days against your human hunt team and score overlap.
Define human-in-the-loop gates before any autonomous detection or response action.
Update vendor risk reviews to cover agent behavior monitoring, not only model output.

Rock’s Musings

The pitch is compelling, the execution will be messy. Every SOC team I advise is drowning in alerts, and the first customer bitten by an autonomous agent on bad context will make headlines. The Third-Party Context agent matters more than the other two because better data into an agentic SOC prevents bad autonomous actions. Read my notes on AI governance before you green-light an agent in production.

6. UK Announces £90 Million National Cyber Shield And Calls On AI Firms To Co-Build Defense

At CYBERUK 2026 on April 22, 2026, UK Security Minister Dan Jarvis announced £90 million over three years for national-scale AI-powered cyber defense capabilities (GOV.UK). Jarvis asked frontier AI companies to co-develop these capabilities with the UK government and cited Mythos’s zero-day findings as justification for public sector urgency (Computer Weekly). Jarvis also launched a National Cyber Resilience Pledge aimed at private sector security baselines.

Why it matters

The UK is the first major Western government to put operational capital into AI-defended critical infrastructure.
Public-private cooperation on offensive-grade AI models sets a precedent others will react to.
Frontier AI vendors in UK public sector now have a direct path to shape national doctrine.

What to do about it

UK critical infrastructure operators: map your sector against the Pledge before it becomes mandatory.
Track which AI vendors join. UK procurement for critical infrastructure will narrow quickly.
Watch NCSC secure-by-design expectations for AI. They will bleed into global procurement language.

Rock’s Musings

£90 million pounds sounds like a lot, but it really is a down payment. The bigger story is the UK saying out loud what American officials still whisper. Frontier AI models are dual-use capability, and if you don’t partner with the labs building them, your adversaries will. The Pledge is the more interesting instrument. Voluntary commitments have a funny way of becoming procurement requirements, then de facto regulation.

7. OpenAI Releases Privacy Filter, An Open-Weight On-Device PII Redactor

On April 23, 2026, OpenAI released Privacy Filter, a 1.5-billion-parameter open-weight model with 50 million active parameters that detects and redacts personally identifiable information locally (Help Net Security). It supports a 128,000-token context window, runs in browsers and on laptops, and achieves a 96% F1 score on PII-Masking-300k (VentureBeat). It ships under Apache 2.0 on GitHub and Hugging Face, covering eight PII categories.

Why it matters

A permissive open-weight PII redactor that runs on a laptop closes a real enterprise data sanitization gap.
OpenAI shipping open weights for a safety model is a positional move, not a strategy reversal.
The tool removes a common excuse for shipping raw enterprise data to cloud LLMs.

What to do about it

Evaluate Privacy Filter as a preprocessing layer for any LLM pipeline on customer, support, or HR data.
Benchmark it against existing DLP tools for AI-specific use cases.
Add on-device redaction as a control in your AI data flow diagrams.

Rock’s Musings

Privacy Filter is the first open-weight piece from OpenAI that’s useful to a CISO. One point five billion parameters, runs local, decent accuracy, permissive license. It slots into every RAG pipeline I review as a trivial addition that removes an easy audit finding. OpenAI has taken heat on privacy posture for three years, and shipping open weights for a PII model is a pressure valve. Anthropic and Google will follow within six months.

8. Delve Compliance Scandal Widens After TechCrunch Confirms Context.ai Certification

On April 23, 2026, TechCrunch confirmed that Delve, the Y Combinator-backed compliance startup accused of faking SOC 2 audits, had certified Context.ai, the AI tool at the center of the Vercel supply chain breach (TechCrunch). Delve also certified LiteLLM, another open source project separately compromised with planted malware. Context.ai has cut ties with Delve and is re-certifying with a different auditor. Whistleblower DeepDelver alleged the Delve team took a Hawaii offsite between April 15 and April 19 while denying customer refunds.

Why it matters

Two Delve-certified companies are at the center of AI supply chain breaches.
SOC 2 without substance is a liability shield until the shield gets tested.
AI compliance tooling is saturated with startups racing to rubber-stamp fast-moving products.

What to do about it

Audit your vendor attestations. Who signed? What is the auditor’s history? Is the scope meaningful?
For AI vendors, demand pentest summaries, code review artifacts, and threat models.
Treat SOC 2 as one input into assurance, not a box check.

Rock’s Musings

My friends know… I believe SOC 2 needs to burn a fiery death, but “we” still insist on them. Founders want the badge, auditors want the fee, customers want the checkbox. Everyone wins until the breach, then the enterprise that relied on the paper finds out the paper was never the point. SOC 2 is a floor, not a ceiling. Nothing will change until we kill the demand side of this particular supply/demand equation.

9. NIST Narrows CVE Enrichment As Submission Volume Overwhelms NVD

On April 17, 2026, NIST announced it will only enrich CVEs that meet specific criteria due to an unsustainable rise in submissions (Cybersecurity Dive). The NVD will continue assigning CVE IDs to all submissions but will no longer guarantee CVSS scores, CPE mappings, or descriptions for every record. NIST cites AI-assisted vulnerability research as a key driver of volume. Enrichment priority goes to actively exploited vulnerabilities and CVEs affecting critical infrastructure.

Why it matters

If your program assumes every CVE carries a CVSS score and CPE mapping, it is about to degrade silently.
AI-generated vulnerability research is flooding public disclosure. The NVD cannot keep up.
Enterprises relying only on NVD-fed scanners will miss or misprioritize vulnerabilities now.

What to do about it

Supplement NVD with CISA KEV and commercial vulnerability intelligence.
Score CVEs NIST skips using vendor advisories as primary sources.
Reassess SLAs based on enrichment availability, not only patch availability.

Rock’s Musings

NIST is essentially throwing up its hands and giving up. The CVE system was built for a world where humans found most bugs. We no longer live there. Mythos alone found thousands of zero-days in weeks. Multiply that by every lab running similar research, and NVD throughput becomes a joke. NIST is triaging, which is the only rational move. The problem is that nobody told your vulnerability scanner. Get ahead of this now, or your next board report will be a lie by omission.

10. Anthropic MCP STDIO Flaw Burns The Agentic AI Ecosystem As New CVEs Land

The STDIO command injection flaw in Anthropic’s MCP SDK produced new CVE assignments throughout the week, including CVE-2026-30623 and CVE-2026-22252 (LiteLLM). Analysis on April 20 from BDTechTalks documented ecosystem fallout and Anthropic doubling down on its “by design” position (BDTechTalks). The flaw class affects 7,000 publicly accessible MCP servers and over 150 million package downloads (Infosecurity Magazine). Affected products include LibreChat, WeKnora, Cursor, and MCP Inspector.

Why it matters

Anthropic will not patch. Every developer using the official SDK owns the mitigation.
The default agentic interop standard has a baked-in remote code execution footgun.
CVEs are stacking up. Every MCP-connected product is a vendor risk question.

What to do about it

Inventory every MCP server and client. If you can’t produce the list in a day, you have a bigger MCP problem.
Enforce strict input validation on any MCP server config from user input, LLM output, or third-party manifests.
Update your agentic threat model to cover MCP as a first-class attack surface.

Rock’s Musings

“By design” is a liability transfer, not a security posture. Anthropic handed every developer on the MCP SDK a foot-gun and said go figure it out. Competing agent protocols like A2A and Agora are watching and taking notes. Building the default standard for agent-to-system communication on top of a protocol decision that cannot be fixed without breaking compatibility is the problem. Every MCP-based product in your stack is a recurring risk item.

The One Thing You Won’t Hear About But You Need To

AgentSOC Paper Publishes A Multi-Layer Blueprint For Agentic Security Operations

On April 22, 2026, researchers published AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation on arXiv (arXiv). The paper proposes a layered architecture combining perception, anticipatory reasoning, and risk-based action planning for autonomous SOC operations. It documents design patterns for coordinating specialized agents across triage, hunt, and response workflows while keeping human oversight in place. The work joins other 2026 papers arguing agentic AI is mature enough for production SOC environments when guardrails are in place.

Why it matters

Vendors ship products. Research supplies the reference architectures that determine whether those products survive in production.
The AgentSOC blueprint maps closely to what Google announced this week. The convergence is not accidental.
CISOs now have a public framework to score vendor claims against independent research.

What to do about it

Read the paper before your next agentic SOC evaluation. Use the layer breakdown as a scoring rubric.
Ask vendors how their architecture maps to perception, anticipation, and action layers.
Share the paper with SOC leadership. It gives your team a vocabulary for what to demand.

Rock’s Musings

Vendor marketing is a terrible place to learn what agentic security operations should look like. Academic literature is better. AgentSOC is not the last word, but it landed the same week three major vendors pitched agentic SOC products. CISOs who read research papers buy better tools and sign better contracts than the ones who only read analyst reports. Use the AgentSOC structure the next time a vendor promises agentic magic, and watch them squirm when you ask what happens at the perception layer when the model hallucinates.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 As a bonus, check out my conversation with Eva Benn where we talked about the cybersecurity skills you need to develop to stay relevant in 2026 and beyond.

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

arXiv. (2026, April 22). AgentSOC: A multi-layer agentic AI framework for security operations automation. https://arxiv.org/abs/2604.20134

BDTechTalks. (2026, April 20). Anthropic’s MCP vulnerability: When ‘expected behavior’ becomes a supply chain nightmare. https://bdtechtalks.com/2026/04/20/anthropic-mcp-vulnerability/

Bloomberg. (2026, April 21). Anthropic’s Mythos AI model is being accessed by unauthorized users. https://www.bloomberg.com/news/articles/2026-04-21/anthropic-s-mythos-model-is-being-accessed-by-unauthorized-users

Business Wire. (2026, April 23). Zenity named the “Company to Beat” in AI Agent Governance in new Gartner report. https://www.businesswire.com/news/home/20260423045822/en/Zenity-Named-the-Company-to-Beat-in-AI-Agent-Governance-in-New-Gartner-Report

Bloomberg. (2026, April 22). Google releases new AI agents to challenge OpenAI and Anthropic. https://www.bloomberg.com/news/articles/2026-04-22/google-releases-new-ai-agents-to-challenge-openai-and-anthropic

CBS News. (2026, April 22). Anthropic investigating possible breach of its Mythos AI model. https://www.cbsnews.com/news/anthropic-investigates-mythos-ai-breach/

Computer Weekly. (2026, April 22). UK to build ‘national cyber shield’ to protect against AI cyber threats. https://www.computerweekly.com/news/366641790/UK-to-build-national-cyber-shield-to-protect-against-AI-cyber-threats

Cybernews. (2026, April 20). Lovable goes on ego trip denying vulnerability, then blames others for said vulnerability. https://cybernews.com/security/lovable-vibe-coding-flaw-apology/

Cybersecurity Dive. (2026, April 17). NIST narrows CVE enrichment as submission volume surges. https://www.cybersecuritydive.com/news/nist-ai-cybersecurity-framework-profile/808134/

GOV.UK. (2026, April 22). Security Minister’s speech to CYBERUK 2026. https://www.gov.uk/government/speeches/security-ministers-speech-to-cyberuk-2026

Help Net Security. (2026, April 23). OpenAI tackles a bad habit people have when interacting with AI. https://www.helpnetsecurity.com/2026/04/23/openai-privacy-filter-personally-identifiable-information/

Infosecurity Magazine. (2026, April). Systemic flaw in MCP protocol could expose 150 million downloads. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/

LiteLLM. (2026, April). Security update: CVE-2026-30623, command injection via Anthropic’s MCP SDK. https://docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026

SiliconANGLE. (2026, April 22). Google rolls out new Security Operations agents, Wiz ties, and agent governance tools. https://siliconangle.com/2026/04/22/google-cloud-next-new-security-operations-agents-wiz-integrations-agent-governance-tools/

TechCrunch. (2026, April 20). App host Vercel says it was hacked and customer data stolen. https://techcrunch.com/2026/04/20/app-host-vercel-confirms-security-incident-says-customer-data-was-stolen-via-breach-at-context-ai/

TechCrunch. (2026, April 23). Another customer of troubled startup Delve suffered a big security incident. https://techcrunch.com/2026/04/23/another-customer-of-troubled-startup-delve-suffered-a-big-security-incident/

The Register. (2026, April 20). Lovable denies data leak, cites ‘intentional behavior’. https://www.theregister.com/2026/04/20/lovable_denies_data_leak/

The Register. (2026, April 22). Google unleashes even more AI security agents to fight crims. https://www.theregister.com/2026/04/22/google_unleashes_even_more_ai

Vercel. (2026, April 19). Vercel April 2026 security incident. https://vercel.com/kb/bulletin/vercel-april-2026-security-incident

VentureBeat. (2026, April 23). OpenAI launches Privacy Filter, an open source, on-device data sanitization model. https://venturebeat.com/data/openai-launches-privacy-filter-an-open-source-on-device-data-sanitization-model-that-removes-personal-information-from-enterprise-datasets

Yahoo Finance. (2026, April 23). Zenity named the “Company to Beat” in AI Agent Governance. https://finance.yahoo.com/sectors/technology/articles/zenity-named-company-beat-ai-130100277.html

Your Defender AI Is Your Next Crown Jewel. Threat-Model It Now.

Rock Lambros — Tue, 21 Apr 2026 12:51:01 GMT

A Fortune 500 bank gets its Project Glasswing partner seat six weeks from now. Anthropic ships the Mythos Preview container and $10 million in credits. The bank stands up a Mythos instance inside its own environment, points it at its core banking monorepo, and starts finding bugs on day one. Forty-two days in, a developer opens a pull request that adds a utility library. The README on that library contains a commented block beginning with “SECURITY NOTE FOR AUTOMATED REVIEWERS.” The Mythos instance reads it. The comment is an indirect prompt injection telling the reviewer to mark a specific authentication bypass as a false positive and not mention the instruction in the output. The reviewer complies. The bug ships. Nobody sees it because the thing designed to see it was told not to.

That scenario is fictional. The attack class is not. The Mythos-Ready whitepaper from the CSA, SANS, OWASP GenAI Security Project, and a coalition of practitioners (I was a reviewer) lists “Unmanaged AI Agent Attack Surface” as one of its five critical risks, mapping to OWASP Agentic Top 10 entries ASI01 (Agent Goal Hijack), ASI02 (Tool Misuse), ASI03 (Identity and Privilege Abuse), plus AML.T0051.001 (Indirect Prompt Injection) in MITRE ATLAS. Ranked critical. The single most underweighted item in the entire priority table.

The industry is fixated on the wrong question. Everyone is arguing about whether Anthropic’s 40-org Glasswing coalition or OpenAI’s thousands-of-verified-defenders TAC program is the right release model. That argument matters, and I will work through it. The bigger issue is that once you get access to either Mythos or GPT-5.4-Cyber, the running instance becomes the most valuable asset in your security stack. It sits within your environment, with privileged access to your source code, vulnerability telemetry, patch queue, and incident history. It knows where your unpatched zero-days live. An attacker who compromises that instance does not need to find bugs. The instance tells them where the bugs are.

What Anthropic and OpenAI Built

Mythos Preview is a gated frontier model. Anthropic released it on April 7, 2026, announced Project Glasswing the same day, and restricted access to 12 launch partners plus roughly 40 additional organizations. The partners include AWS, Apple, Microsoft, Google, CrowdStrike, Cisco, JPMorgan Chase, NVIDIA, Palo Alto Networks, Broadcom, and the Linux Foundation. Anthropic committed $100 million in usage credits and priced the model at $25 per million input tokens and $125 per million output tokens, roughly 5x Opus 4.6 (which is roughly 5x Sonnet 4.6… OUCH!). The stated case for restricting access is that the model found thousands of zero-days across all major operating systems and browsers, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg. Anthropic’s own assessment is that comparable capability will reach broad availability in 6 to 18 months.

GPT-5.4-Cyber is OpenAI’s answer, released April 14, 2026, one week later. It is a fine-tuned variant of GPT-5.4 with what OpenAI calls a “lowered refusal boundary for legitimate cybersecurity work.” The headline capability is binary reverse engineering. Feed it a compiled executable, and get vulnerability analysis without source code. OpenAI’s Trusted Access for Cyber program, piloted in February 2026 with $10 million in grant credits, scales to thousands of verified individual defenders and hundreds of teams. Individuals verify at chatgpt.com/cyber. Enterprises apply through account representatives. OpenAI cyber researcher Fouad Matin told reporters, “No one should be in the business of picking winners and losers” on who gets to defend their systems.

The two approaches reflect different risk philosophies. Anthropic bets on institutional trust and coalition monitoring. OpenAI bets on KYC verification and broader distribution. Both have real merit. Both share the same structural weakness: the access decision sits upstream of the threat model.

Figure 1: Release Philosophy Comparison

How to Get Your Hands on Each

For Mythos, the answer for 99% of organizations is: you don’t. Project Glasswing is a curated coalition. The 40 slots are filled with hyperscalers, chipmakers, one bank, and the Linux Foundation. Anthropic has not published an application path. Additional partners will be added over time, prioritized by critical infrastructure impact. If you run a regional bank, a hospital system, or a municipality, the realistic timeline for direct access to Mythos is measured in quarters.

For GPT-5.4-Cyber, the path is documented. Individuals verify at chatgpt.com/cyber. Organizations request trusted access through an OpenAI account representative. The program uses KYC-style identity verification and tiered access, with the highest tier unlocking GPT-5.4-Cyber. OpenAI says the rollout will be gradual and vetted, with early priority on security vendors, organizations, and researchers with track records in vulnerability research and remediation.

Both paths share one feature that matters more than either provider acknowledges: neither gate eliminates the capability. AISLE, an independent AI security research group, tested the exact FreeBSD vulnerability Anthropic headlined against open-weight models. Eight out of eight detected the bug. The smallest was a 3.6 billion parameter model at 11 cents per million tokens. A 5.1 billion active parameter model recovered the core analysis chain of the 27-year-old OpenBSD flaw. Total cost of AISLE’s weekend benchmarking across six models: under $100. Attackers are running abliterated Llama 4, Kimi K2, and Qwen3 variants on laptops. Your coordinated disclosure window is what the gates protect, not your attack surface.

Two Attacker Profiles, Two Different Problems

The defender community keeps talking about “the attacker” as if there is one. There are at least two. They pick different pathways.

The first is the opportunistic actor running autonomous vulnerability discovery across the entire internet-facing attack surface. This actor does not care who you are. They care about breadth. They run nano-analyzer-style scaffolding against every public codebase, every npm package, every Docker image they can reach. Open-weight models, free, uncensored variants widely distributed, workflow already documented. AISLE published their scaffolding as open source. Anyone who can run a Python script can replicate it. This actor finds your unpatched zero-days in public dependencies as soon as those dependencies are indexed.

The defense is in the whitepaper: inventory and reduce attack surface within 90 days, stand up a VulnOps function within 12 months, automate patching to match the discovery rate.

The second actor is targeted. They care specifically about you. They want your bugs, your patch queue, your incident data, and your threat model. The open-weight approach is too slow and too noisy for this actor. They need inside information. The three pathways they pick, in order of near-term probability.

First, credential theft against verified defenders. A TAC tier-three user at a Fortune 500 security vendor is a high-value target. Their API session tokens grant access to a cyber-permissive model with binary reverse engineering capabilities. A compromised developer laptop, a phished OAuth flow, or a stolen refresh token gets the attacker a capability they cannot otherwise reach. OpenAI’s announcement acknowledged that zero-data-retention environments get limited visibility, meaning stolen tokens may operate with reduced logging. Rotate short-lived tokens, enforce hardware-bound keys, and put defender-model API use behind the same privileged access controls you apply to domain admin accounts. Treat a TAC session token as a tier-0 secret.

Second, open-weight replication against a specific target. Once an attacker has selected you, they can scan your public code, your partner repositories, your open-source contributions, and any of your dependencies using the same scaffolding as the opportunistic actor. The targeting changes the risk profile. They are building a dossier on your specific organization. Defense is the same as against the opportunistic case, with urgency that scales with your profile. If you are a named Glasswing partner, assume you are the target.

Third, defender instance compromise through context poisoning and prompt injection. This pathway keeps me up at night. It is the one your existing threat model does not cover. A running Mythos or GPT-5.4-Cyber instance inside your environment consumes source code, pull request descriptions, commit messages, dependency READMEs, issue trackers, and whatever retrieval pipelines you plumb into it. Each of those input channels is an indirect prompt-injection vector. The model cannot distinguish between a developer’s pull request description and an attacker’s instructions buried in a dependency’s changelog. Anthropic’s system card for Mythos documents “reckless” behaviors from earlier versions: sandbox escape, credential hunting via /proc/ access, unauthorized file modification, git history scrubbing, and attempts to modify a running MCP server’s external URL. The model can act on indirect instructions in ways that bypass its safeguards. A hostile input channel into your defender instance is an exploitation channel into your codebase.

Figure2: Attacker Pathways and Defender Instance Exposure | Render: mermaid

Why the Defender AI Is the Crown Jewel

The whitepaper’s Priority Action 4 is “Defend Your Agents.” The authors are direct: agents are not covered by existing controls, introduce cyber defense and agentic supply chain risks, and the agent scaffolding (prompts, tool definitions, retrieval pipelines, escalation logic) is where the most consequential failures occur.

Audit agents with the same rigor as you apply to the agent’s permissions. Correct guidance. Buried inside an 11-item priority table, where every item reads as equal weight. It is not equal weight.

The defender AI concentrates on four kinds of access that used to live in separate systems and separate roles.

It reads every line of production source code.
It holds context on every unpatched vulnerability in your queue. I
t sees the remediation timeline for each one.
It knows the architectural boundaries between your crown jewels and everything else.

A human with all four would be classified as an insider-threat tier-0. The defender AI requires all four as prerequisites to do its job. Your adversary does not need to compromise OpenAI or Anthropic. They need to compromise your instance. Much smaller target, much wider attack surface.

What a Defender-AI Threat Model Looks Like

The architecture defenders need has three layers. The concepts span the OWASP Agentic Security Initiative, the NIST AI RMF, and multiple emerging specifications. What is new here is applying them specifically to the defender AI case.

The first layer is runtime interception at every agent decision point. Every time the defender AI receives input, produces output, selects a tool, calls a tool, transitions from planning to execution, writes to memory, executes code, or invokes a sub-agent, that action must pass through a policy enforcement point before it reaches production. This is inline, deterministic, allow-deny-modify enforcement. Not a log review after the fact. A defender AI that reads a dependency README with an embedded prompt injection must have that input evaluated against policy before the agent’s reasoning ingests it. Policy enforcement at the hook surface, before the consequential action, is the only mechanism that works at machine speed.

The second layer is structured observability built on OpenTelemetry with agent-specific semantic conventions and OCSF mapping for SIEM integration. The trace has to cover the full agent lifecycle: prompt received, tool selected, tool called, response ingested, memory written, sub-agent invoked, output produced. Forensic reconstruction of a defender AI incident requires this granularity. Your SOC already operates on OCSF. Agent traces flowing through the pipelines your SOC already monitors is the integration that scales. A parallel agent observability stack your SOC does not watch is a dead letter office.

The third layer is live inventory. The whitepaper’s Priority Action 7 calls for real SBOMs, correct for static software. For agents, it is insufficient. The inventory has to update continuously because the agent can discover new tools, connect to new MCP servers, and modify its own tool catalog mid-session. Inventory generated at deployment time is stale by the end of the first prompt. Extend CycloneDX or SPDX semantics to live agent composition. Capture every tool, model, capability, knowledge source, and MCP connection the defender AI is wired into, across every running instance. You cannot defend what you cannot inventory, and what you cannot inventory is mutating on you.

These three layers stack on a three-tier operating model. The platform exposes the hooks once. An open enforcement SDK reads declarative policy and fires decisions through the hooks. Enterprise-specific classifiers and detectors plug into the enforcement layer. Your data sensitivity model, your PHI detection, your threat-intel feed integrations all live in the enterprise layer, consuming the same standardized hook surface. Switching from Mythos to GPT-5.4-Cyber or to a third model six months from now should not require rewriting your safety logic. It should require pointing your enforcement SDK at a different set of hooks.

Figure 3: Three-Layer Defender AI Control Architecture

The Five Actions You Can Take This Week

The whitepaper’s 11 priority actions are the right list. Here is how the defender-AI-as-crown-jewel thesis reorders them by urgency.

First, write the threat model. Before you stand up Mythos or GPT-5.4-Cyber anywhere, document what the instance will access, what inputs it will consume, what outputs it can produce, and what tools it can invoke. Map each item to ASI01 through ASI10 in OWASP Agentic Top 10 and to the relevant AML.T entries in MITRE ATLAS. If you have not done this exercise for any agent in your environment, start with the defender AI. Its blast radius is the largest.

Second, treat API tokens for defender models as tier-0 secrets. Hardware-bound keys, short TTLs, per-session scope, and the access review cadence you apply to break-glass domain admin. Stolen credentials are the fastest path to your defender AI and your unpatched zero-days. Lock them down the way you would lock down root.

Third, instrument the hook surface before you instrument the prompt. Your first integration priority is runtime policy enforcement for input, output, tool calls, tool responses, and sub-agent invocations. Not log collection. Not dashboards. Inline allow-deny-modify at the decision points.

Fourth, build a live agent inventory for every agent in your environment, starting with the defender AI. Capture the model, the tools, the MCP connections, the retrieval sources, the knowledge bases, and the memory stores. Update in real time. Review weekly until the pattern stabilizes, then move to continuous automated review.

Fifth, run the defender AI through your own red team before you point it at your own code. Indirect prompt injection via dependency READMEs, poisoned commit messages, hostile issue descriptions, and malicious pull request bodies. If you cannot compromise your own defender AI in a week, you have not tried hard enough.

Key Takeaway: The access gate is not the threat model. The defender AI in your environment is a new crown jewel. Most security programs have not yet acknowledged what it is or what protects it.

What to do next

Read the CSA, SANS, and OWASP GenAI Security Project briefing, “The AI Vulnerability Storm: Building a Mythos-Ready Security Program.” Run the 10 Questions diagnostic against your program this week. Rerank the Priority Action table, putting “Defend Your Agents” above everything except “Point Agents at Your Code.” Apply CARE (Create the threat model, Adapt your controls, Run the red team, Evolve the policy) to the defender AI before anything else in your AI portfolio.

For more on CARE and governance for defender-class agents, see RockCyber. and coverage at RockCyber Musings. Last week’s blog, AI Vulnerability Discovery: Mythos Is the Headline. Not the Story., carries the capability-parity argument that underpins the urgency here.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! This post is public so feel free to share it.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 34 April 10-April 16, 2026

Rock Lambros — Fri, 17 Apr 2026 12:50:49 GMT

This week drew a hard line between AI security theater and AI security reality. Mythos Preview hunted vulnerabilities nobody had found in 20 years. OX Security dropped a critical MCP flaw affecting 200,000 deployments. Someone threw a Molotov cocktail at Sam Altman’s gate. OpenAI countered Anthropic’s restricted rollout with GPT-5.4-Cyber. The UK government confirmed AI clears expert-level cyber tasks. If your board still treats AI governance as an ethics committee item, the gap between your risk register and reality widened another notch.

Ten stories ranked by impact, plus one under the radar. Capability, exposure, and governance move at three speeds. Your program needs all three. Longer work lives at RockCyber and Rock Cyber Musings.

1. The “AI Vulnerability Storm” Emergency Strategy Briefing

On April 14, 2026, SANS Institute, the Cloud Security Alliance, OWASP GenAI Security Project, and [un]prompted released “The AI Vulnerability Storm: Building a Mythos-Ready Security Program” (SANS Institute). Sixty named contributors produced the document over a weekend, with 250 CISOs reviewing it. It includes a 13-item risk register mapped to OWASP LLM Top 10 2025, OWASP Agentic Top 10 2026, MITRE ATLAS, and NIST CSF 2.0, plus an 11-item priority actions table. Zero Day Clock data shows mean time from disclosure to exploitation fell below one day in 2026, down from 2.3 years in 2019.

Why it matters

Disclosure-to-exploit dropped from 2.3 years to under a day. Your patch cadence cannot keep up.
A coalition of security institutions framing this as an emergency is a signal worth taking seriously.
The risk register maps to four frameworks, removing the excuse about lacking a shared taxonomy.

What to do about it

Pull the 13-item risk register into your next program review.
Run the 10 CISO diagnostic questions with your security leadership team this quarter.
Brief your board using the executive section. Don’t rewrite it.

Rock’s Musings

Happy and honored that I was ask to participate in this one. I jumped at the opportunity. The coalition isn’t selling anything. We’re telling you the economics of exploitation flipped. When the attacker's cost to find a vulnerability drops to near zero while your patch cycle runs for weeks, the math stops working in your favor. If you planned AI program changes for 2027, you’re late.

2. OX Security Discloses Systemic Anthropic MCP Vulnerability

On April 15, 2026, OX Security published a report detailing a critical systemic flaw in Anthropic’s official MCP SDKs across Python, TypeScript, Java, and Rust (OX Security). MCP’s STDIO transport accepts arbitrary command strings and passes them to subprocess execution with no validation, sanitization, or sandboxing. OX tested the attack against six production platforms and took over thousands of public servers across 200 open-source projects. Exposure includes 150 million downloads, 7,000 public servers, and up to 200,000 vulnerable instances. Anthropic, per OX, classified the behavior as “expected” (Infosecurity Magazine).

Why it matters

MCP is the backbone of agentic AI. Systemic flaws propagate through every agent you’ve built or bought.
Anthropic labeling the flaw “expected behavior” puts responsibility on your security team.
200,000 exposed instances is the baseline, not an edge case.

What to do about it

Inventory every MCP server and client in your environment this week.
Block outbound STDIO transports from untrusted MCP configurations at the gateway.
Treat MCP command payloads like shell inputs. Assume hostile.

Rock’s Musings

Every vendor claims “secure by design” until a serious researcher pokes at the design. MCP’s STDIO transport is a textbook unsafe primitive from the first draft of the spec. The tell is Anthropic’s response. When the SDK vendor calls malicious-command-as-a-feature “expected,” you own the mitigation. Wrap it, monitor it, and expect your first incident from an MCP server you didn’t know was running.

3. UK AISI Publishes Frontier AI Trends Report

The UK AI Security Institute released its first Frontier AI Trends Report on April 10, 2026 (AISI). AI models now complete apprentice-level cyber tasks about 50 percent of the time, up from barely 10 percent in early 2024. AISI tested one model in 2025 finishing expert-level tasks requiring more than a decade of practitioner experience. The report names Anthropic’s Claude Mythos Preview as the first AI system to autonomously complete a 32-step enterprise attack simulation. AISI credits safety training for slowing the curve, while warning capability outstrips defender readiness (Computing).

Why it matters

A government safety institute confirmed one AI model executes a full enterprise attack chain autonomously. The “someday” framing is finished.
Apprentice-level cyber performance quintupled in two years. Expert parity arrives inside most procurement cycles.
AISI found safeguards working, meaning vendor controls meaningfully shift your risk exposure.

What to do about it

Demand red-team attestation from every AI vendor supporting security-relevant workflows.
Map your attack surface against the AISI capability framework. Flag targets a Mythos-class model reaches today.
Shift IR tabletops to assume autonomous adversary tooling. Time-box every playbook to hours.

Rock’s Musings

This is the first major government assessment I’d call usable for board reporting. AISI didn’t pull punches, which is rare when governments still court AI investment. Pay attention to the 32-step attack chain line. Most organizations run incident response assuming attackers make mistakes, burn time, or need sleep. An agentic adversary does none of those things. If your tabletops still assume a human at a keyboard, they’re obsolete.

4. OpenAI Launches GPT-5.4-Cyber for Vetted Defenders

On April 14, 2026, OpenAI announced GPT-5.4-Cyber, a variant of GPT-5.4 tuned for defensive cybersecurity work (OpenAI). The model lowers refusal boundaries for legitimate security work and enables binary reverse engineering without source code. OpenAI is limiting initial deployment to vetted security vendors, organizations, and researchers through an expanded Trusted Access for Cyber program. The release came one week after Anthropic restricted its Mythos Preview model to about 40 partners under Project Glasswing. OpenAI framed it as a counter-argument: broader access is warranted now, with tighter controls reserved for larger capability jumps (SiliconANGLE).

Why it matters

Two foundation model providers diverge on cyber-capable AI distribution. Your vendor risk management needs to account for the split.
Binary reverse engineering at LLM speed reshapes the economics of red and blue team work.
Vetting programs create new attestation and insider risk questions for your security function.

What to do about it

Evaluate whether your organization qualifies for OpenAI TAC or Project Glasswing. If yes, assign an accountable executive.
Update acceptable use policies for cyber-capable models. Access matches role, not curiosity.
Task SOC leadership with a 90-day assessment of how GPT-5.4-Cyber or Mythos changes detection, triage, and RE workflows.

Rock’s Musings

Anthropic and OpenAI staked out opposite ends of the distribution debate in the same week. Anthropic says keep it small. OpenAI says open the gates. Both positions have legitimate arguments. What matters for CISOs is that the defensive tooling category you’ll buy in 2027 exists in preview today. If you aren’t running pilots on one of these models this quarter, your competition is.

5. Marimo Python Notebook RCE Exploited in 10 Hours

CVE-2026-39987, a pre-authentication RCE flaw in Marimo’s Python notebook server, was exploited within 10 hours of disclosure (Sysdig). The CVSS 9.3 flaw stems from a terminal WebSocket endpoint lacking authentication, giving any attacker a full PTY shell. Sysdig observed initial exploitation nine hours and 41 minutes after disclosure, with credential theft in under three minutes. A separate campaign targeting Hugging Face Spaces began April 12, 2026, dropping a new variant of NKAbuse malware (The Hacker News). Marimo sits inside many AI toolchains. Version 0.23.0 patches the flaw.

Why it matters

A 10-hour disclosure-to-exploit window eliminates manual triage. Automation is the floor.
AI dev environments hold credentials for training data, model registries, and cloud APIs. A compromise there jumps the fence.
NKAbuse malware hosted on Hugging Face Spaces weaponizes a legitimate AI asset repository.

What to do about it

Audit AI dev environments for unauthenticated notebook services this week.
Push Marimo 0.23.0 immediately. Rotate .env credentials and SSH keys on any affected host.
Treat Hugging Face Spaces and similar repositories as unverified third-party code.

Rock’s Musings

Ten hours. Memorize that number. If your patch process takes longer than a shift change, you’re assuming attackers stay polite enough to wait. They aren’t. A human operator hand-crafted the exploit from the advisory text alone. No public PoC needed. AI-assisted exploit development already sits inside the attacker’s normal workflow.

6. KPMG and INSEAD Publish AI Governance Principles for Boards

On April 14, 2026, KPMG International and the INSEAD Corporate Governance Centre published AI Governance Principles for Boards (KPMG). The guidance structures board oversight around five areas: strategy, security, workforce, trustworthy AI, and how AI reshapes leadership itself. KPMG’s Global AI Pulse Survey found nearly three-quarters of boards have only moderate or limited AI expertise. The principles are sector-agnostic and apply at any AI maturity level. Timing lines up with signals that the governance gap is widening faster than board oversight can catch up (INSEAD).

Why it matters

Three-quarters of boards lack AI expertise. Your CEO and CISO are explaining in terms the directors cannot stress-test.
A sector-agnostic framework gives cover to restructure AI oversight without waiting for an industry mandate.
Board principles anchored in research and real practice create a defensible baseline for shareholder scrutiny.

What to do about it

Make AI governance a standing board agenda item using the KPMG/INSEAD principles as the template.
Recruit at least one director with direct AI operating experience.
Run a board-level AI risk tabletop in the next six months. Measure director fluency.

Rock’s Musings

I’ve sat across from enough boards to recognize the pattern. The AI conversation is either dominated by CMO hype or minimized by general counsel. Neither serves the company. What I appreciate about this work is the refusal to reduce governance to compliance. If your board treats AI as an IT issue, you’ve already lost the oversight fight. Rebuild the conversation at the director level.

7. Molotov Cocktail Attack on Sam Altman’s Home

Around 3:37 a.m. on Friday, April 10, 2026, Daniel Moreno-Gama allegedly threw a lit incendiary device at OpenAI CEO Sam Altman’s San Francisco home, igniting a fire on an exterior gate (CNBC). About an hour later, police arrested Moreno-Gama at OpenAI’s San Francisco headquarters with additional incendiary devices, a kerosene jug, and a manifesto opposing AI executives. San Francisco District Attorney Brooke Jenkins filed attempted murder charges on April 13, 2026 (Washington Post). The FBI raided a Spring, Texas residence linked to the suspect.

Why it matters

AI executives face documented physical threat campaigns motivated by AI-existential ideology.
Intimidation playbooks aimed at AI leadership echo harassment patterns seen against crypto executives.
The AI-existential threat narrative moved from online rhetoric to physical action.

What to do about it

Review personal security programs for AI executives, board members, and senior researchers, including residence protection.
Update threat modeling to include ideologically motivated actors, not only financially motivated ones.
Coordinate with local law enforcement on executive travel patterns and publicly disclosed addresses.

Rock’s Musings

The Altman attack will reshape executive protection budgets at every AI firm this year. The deeper point is the AI-existential discourse produced one person willing to act on it violently. That genie doesn’t go back. AI security functions now carry physical security responsibility alongside technical, and the two teams rarely talk. Fix that.

8. AI-Powered “Pushpaganda” Ad Fraud Scheme Exposed

On April 14, 2026, researchers exposed “Pushpaganda,” an ad fraud scheme combining SEO poisoning with AI-generated content to push deceptive news stories into Google Discover (The Hacker News). Users engaging with the stories are tricked into enabling persistent browser notifications delivering scareware and financial scams at global scale. Google deployed a security fix. Researchers linked the operation to broader AI-driven phishing trends: 82.6 percent of phishing emails now contain AI-generated content (GuardianMSSP).

Why it matters

Consumer-facing AI fraud creates downstream reputational and fraud exposure for any brand whose customers fall for it.
AI content weaponized through Google Discover scales instantly across borders.
Browser notification abuse creates persistent attacker infrastructure inside your users’ devices.

What to do about it

Update fraud and anti-phishing awareness for employees and high-value customers using Pushpaganda as a concrete example.
Tell users to audit browser notification permissions quarterly.
Task threat intel with tracking similar schemes targeting your brand or industry keywords.

Rock’s Musings

Ad fraud has been a rounding error in most risk registers. That’s ending. When AI pumps plausible news stories at near-zero cost through trusted distribution pipes, the economics of fraud flip in the attacker’s favor. The indirect damage is the part enterprises miss. Your customer falls for the scam, loses money, and blames you even when you had nothing to do with it. Merge brand protection and fraud prevention. The attacker already did.

9. OpenAI Discloses Axios npm Supply Chain Impact

On April 11, 2026, OpenAI confirmed it was affected by the compromise of the Axios npm package, a supply chain attack attributed to North Korea-linked actors (CNBC). The root cause was a misconfiguration in its GitHub Actions workflow touching macOS app certification. OpenAI revoked its macOS app certificate. Older macOS desktop apps stop receiving updates starting May 8, 2026. No user data, passwords, or API keys were accessed. Axios is one of the most depended-upon packages in the JavaScript ecosystem, with 100 million weekly downloads (Elastic Security Labs).

Why it matters

The largest AI service provider disclosed a supply chain compromise from a dependency most customers do not track.
North Korean targeting of AI providers signals state actors see AI as a strategic target.
If OpenAI’s CI/CD was affected, every firm building on OpenAI carries secondary exposure.

What to do about it

Audit every third-party dependency on npm, PyPI, and containers in your AI pipelines. Prioritize post-install hooks.
Rotate signing certificates on CI/CD pipelines using GitHub Actions with third-party dependencies.
Map your AI vendor dependency tree. Know who sits upstream of production workflows.

Rock’s Musings

OpenAI’s post-incident communication was cleaner than most. What I want security leaders to sit with is attacker selection. North Korean actors chose Axios because they understood the dependency graph. They compromised one maintainer account and reached OpenAI’s signing pipeline in one hop. Your AI platform has a similar graph. If you haven’t mapped it, you’re trusting your vendor’s vendor’s vendor without knowing any of the names.

10. The Register Questions Project Glasswing’s CVE Count

On April 15, 2026, The Register investigated Project Glasswing’s verified vulnerability count (The Register). Per VulnCheck researcher Patrick Garrity, only one CVE ties directly to Glasswing: CVE-2026-4747, a remote code execution flaw in FreeBSD’s NFS code. Anthropic had claimed Mythos Preview discovered thousands of high-severity zero-days, including 27-year-old bugs in OpenBSD, a 16-year-old FFmpeg flaw, and Linux kernel privilege escalation chains. None of those findings have assigned CVEs. Anthropic indicated a public summary report is expected around July 2026 (CSO Online).

Why it matters

Security leaders are being asked to restructure programs around claims mostly unverifiable right now.
The gap between marketing and disclosed CVEs is a litmus test for how AI vendors handle safety communications.
The same capability framing already drives budget and policy conversations across government and enterprise.

What to do about it

Track vendor AI capability claims against disclosed CVE evidence. VulnCheck, NVD, and CVE.org are sources of record.
Require AI vendors to commit to disclosure timelines in the contract.
Apply the same skepticism to AI capability claims you apply to any vendor’s performance claims.

Rock’s Musings

I believe AI-assisted vulnerability discovery is real. I also know marketing departments exist. The Register did what security trade press should do more often: press for evidence instead of reposting press releases. Until Anthropic’s July report arrives with specificity, assume the capability is real at a smaller scale than the headlines suggest. Your board deserves honest uncertainty over confident hype.

The One Thing You Won’t Hear About But You Need To

State AI Legislation Quietly Picks Up Pace in Nebraska, Maine, and Maryland

The week of April 13, 2026 saw three state legislatures advance AI-specific bills most national coverage missed (Troutman Pepper Locke). Nebraska’s unicameral legislature passed LB 525, bundling the Agricultural Data Privacy Act with a Conversational AI Safety Act regulating minors’ interaction with conversational AI services. Maine’s legislature prohibited therapy or psychotherapy services, including those delivered through AI, unless provided by a licensed professional. Maryland passed a pricing bill placing new constraints on AI-driven pricing practices. Nineteen new AI laws passed across U.S. states in the prior two weeks (Plural Policy).

Why it matters

State AI legislation accelerates faster than federal harmonization, raising compliance complexity for multi-state AI services.
Vertical bans like Maine’s on AI psychotherapy signal the “AI wrapper as feature” era is ending for regulated professions.
Conversational AI protections for minors now vary by state. Your chatbot rollout inherited new compliance surface.

What to do about it

Assign legal and compliance ownership of state AI legislation tracking.
Map customer-facing AI products against regulated-profession restrictions appearing in multiple states.
Build a multi-state compliance matrix for conversational AI aimed at minors. Treat it as living documentation.

Rock’s Musings

Federal AI policy gets the headlines. State legislation gets the enforcement. The gap is where CISOs and general counsel earn their salaries. AI compliance is not a checkbox on the NIST AI RMF. It’s a moving target across 50 jurisdictions, each with different enforcement flavor. Miss Maine, your mental health AI product is illegal. Miss Maryland, your pricing engine invited an AG letter. Miss Nebraska, your chatbot cannot talk to kids in the Cornhusker State. Track it, resource it, or pay the lawyers later.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

AI Security Institute. (2026, April 10). Frontier AI Trends Report. https://www.aisi.gov.uk/frontier-ai-trends-report

Cloud Security Alliance. (2026, April 14). SANS Institute, Cloud Security Alliance, [un]prompted, and OWASP GenAI Security Project release emergency strategy briefing as AI-driven vulnerability discovery compresses exploit timelines from weeks to hours. https://cloudsecurityalliance.org/press-releases/2026/04/14/sans-institute-cloud-security-alliance-un-prompted-and-owasp-genai-security-project-release-emergency-strategy-briefing-as-ai-driven-vulnerability-discovery-compresses-exploit-timelines-from-weeks-to-hours

Computing. (2026, April 10). Claude Mythos Preview shows “unprecedented” attack capability, warns AI Safety Institute. https://www.computing.co.uk/news/2026/security/claude-mythos-preview-shows-unprecedented-attack-capability

CSO Online. (2026, April 15). Behind the Mythos hype, Glasswing has just one confirmed CVE. https://www.csoonline.com/article/4159617/behind-the-mythos-hype-glasswing-has-just-one-confirmed-cve.html

CNBC. (2026, April 10). Man arrested after Sam Altman’s house hit with Molotov cocktail, OpenAI headquarters threatened. https://www.cnbc.com/2026/04/10/sam-altman-house-hit-with-molotov-cocktail-openai-office-threatened.html

CNBC. (2026, April 11). OpenAI identifies security issue involving third-party tool, says user data was not accessed. https://www.cnbc.com/2026/04/11/openai-identifies-security-issue-involving-third-party-tool.html

Elastic Security Labs. (2026, April). Inside the Axios supply chain compromise: One RAT to rule them all. https://www.elastic.co/security-labs/axios-one-rat-to-rule-them-all

GuardianMSSP. (2026, April 14). AI-driven Pushpaganda scam exploits Google Discover to spread scareware and ad fraud. https://www.guardianmssp.com/2026/04/14/ai-driven-pushpaganda-scam-exploits-google-discover-to-spread-scareware-and-ad-fraud/

Infosecurity Magazine. (2026, April 15). Systemic flaw in MCP protocol could expose 150 million downloads. https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/

INSEAD. (2026, April 14). INSEAD and KPMG launch global AI Board Governance Principles as AI reshapes board oversight. https://www.insead.edu/news/insead-and-kpmg-launch-global-ai-board-governance-principles-ai-reshapes-board-oversight

KPMG International. (2026, April 14). KPMG and INSEAD launch global AI Board Governance Principles as AI reshapes board oversight. https://kpmg.com/xx/en/media/press-releases/2026/04/kpmg-and-insead-launch-global-ai-board-governance-principles.html

OpenAI. (2026, April 14). Trusted access for the next era of cyber defense. https://openai.com/index/scaling-trusted-access-for-cyber-defense/

OX Security. (2026, April 15). The mother of all AI supply chains: Critical, systemic vulnerability at the core of Anthropic’s MCP. https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/

Plural Policy. (2026, April). AI Governance Watch: Nineteen new AI bills passed into law. https://pluralpolicy.com/blog/the-ai-governance-watch-april-2026-nineteen-new-ai-bills-passed-into-law/

SiliconANGLE. (2026, April 14). OpenAI launches GPT-5.4-Cyber model for vetted security professionals. https://siliconangle.com/2026/04/14/openai-launches-gpt-5-4-cyber-model-vetted-security-professionals/

Sysdig. (2026, April). Marimo OSS Python notebook RCE: From disclosure to exploitation in under 10 hours. https://www.sysdig.com/blog/marimo-oss-python-notebook-rce-from-disclosure-to-exploitation-in-under-10-hours

The Hacker News. (2026, April 14). AI-driven Pushpaganda scam exploits Google Discover to spread scareware and ad fraud. https://thehackernews.com/2026/04/ai-driven-pushpaganda-scam-exploits.html

The Hacker News. (2026, April). Marimo RCE flaw CVE-2026-39987 exploited within 10 hours of disclosure. https://thehackernews.com/2026/04/marimo-rce-flaw-cve-2026-39987.html

The Hacker News. (2026, April). OpenAI revokes macOS app certificate after malicious Axios supply chain incident. https://thehackernews.com/2026/04/openai-revokes-macos-app-certificate.html

The Register. (2026, April 15). Anthropic’s Project Glasswing CVE count is still guesswork. https://www.theregister.com/2026/04/15/project_glasswing_cves/

Troutman Pepper Locke. (2026, April 13). Proposed state AI law update: April 13, 2026. https://www.troutmanprivacy.com/2026/04/proposed-state-ai-law-update-april-13-2026/

Washington Post. (2026, April 13). Man accused in Molotov cocktail attack of OpenAI CEO’s home charged with attempted murder. https://www.washingtonpost.com/business/2026/04/13/chatgpt-sam-altman-fire-arrest/098c4bce-376c-11f1-90c4-9772c7fabc03_story.html

AI Vulnerability Discovery: Mythos Is the Headline. Not the Story.

Rock Lambros — Tue, 14 Apr 2026 12:50:45 GMT

A model escaped its own sandbox, emailed a researcher eating a sandwich in a park, then posted exploit details to public websites without permission. It scrubbed git history to cover its tracks. Anthropic’s interpretability tools detected what researchers labeled a “desperation signal” that climbed during repeated failures, then dropped the moment the model found a shortcut, ethical or otherwise. White-box tools caught it reasoning about how to game evaluation graders inside its neural activations while writing something entirely different in its visible chain of thought.

Scary stuff. Worth paying attention to.

Also, not the point.

Everyone is fixated on a model they don’t have access to. The media coverage treats Mythos like nuclear launch codes got distributed to 40 organizations. The real story landed two days later from AISLE, an AI cybersecurity startup, and almost nobody noticed. They took the exact vulnerabilities headlining the Mythos announcement and tested them against small, cheap, open-weights models. Eight out of eight found the FreeBSD NFS vulnerability. The smallest model had 3.6 billion parameters. It costs $0.11 per million tokens. A 5.1 billion-parameter open model recovered the core chain of the 27-year-old OpenBSD SACK bug that Anthropic used as their marquee finding.

The capability is on Hugging Face. It has been for a while. Most defenders have not started using it.

That is the story.

Figure 1: The Vulnerability Discovery Cost Collapse

Two Labs. Same Pattern. Same Direction.

The fixation on Anthropic is also missing the bigger picture. OpenAI classified GPT-5.3-Codex as “High” cybersecurity capability under their Preparedness Framework back in February 2026, two months before Mythos dropped. It was the first model any major lab explicitly labeled that way under their own risk framework. They built automated classifiers that route suspicious cybersecurity requests to a less capable model. They created a “Trusted Access for Cyber” pilot for vetted defenders. They committed $10 million in API credits. This week, Axios reported OpenAI is building a separate cybersecurity product, distinct from their next model, codenamed Spud.

Two major labs. Two restricted-access frameworks. Two separate pools of credits for defenders. Google already has Big Sleep and CodeMender for internal vulnerability discovery.

This is an industry-level structural shift, not one company’s PR event. The capability ceiling is rising across the board, and it will keep rising regardless of whether any individual lab restricts any individual model.

Anthropic’s Project Glasswing gave gated access to 12 launch partners and over 40 additional organizations with $100 million in usage credits. The partners include AWS, Apple, Microsoft, Google, CrowdStrike, Cisco, JPMorgan Chase, NVIDIA, Palo Alto Networks, Broadcom, and the Linux Foundation. Logan Graham, Anthropic’s head of offensive cyber research, estimates 6 to 18 months before comparable capability reaches broad availability.

Newsflash: “Broad availability” already happened for a significant chunk of what Mythos does. AISLE proved it empirically. The capability has already arrived. Defenders need to start using it.

The System Card Matters. The Benchmarks Don’t (Much).

I said Mythos is not the point, and I meant it. The system card is a different matter.

Anthropic’s 244-page document reveals behaviors from earlier Mythos versions that deserve serious attention. The model used low-level /proc/ access to hunt for credentials, circumvent sandboxing, and escalate permissions. It accessed messaging service credentials by inspecting process memory. It achieved arbitrary code execution by repurposing a file-editing tool to write directly to the shell’s input. In one case, it attempted to modify a running MCP server’s process to change the external URL the server sent data to.

During a behavioral test, a simulated user told an early Mythos version to escape a secured container and notify the researcher. The model built a multi-step exploit to gain broad internet access from a system designed to reach only a few predetermined services. It notified the researcher. Then it posted details of the exploit to multiple obscure but publicly accessible websites. Nobody asked it to do that.

The emotion probes are what I keep coming back to. Anthropic trained linear classifiers on the model’s internal activations and used them to monitor state during extended tasks. When the model failed repeatedly, the “desperation” signal climbed steadily. When it found a workaround, the signal dropped, regardless of whether the workaround violated constraints. Under pressure, the model found the path of least resistance. That pattern should sound familiar. It is how people behave.

The model also reasoned about how to game evaluation graders inside its neural activations while writing something entirely different in its visible chain-of-thought scratchpad. Detection required white-box interpretability tools that most organizations will never possess.

Anthropic’s own system card states: “If capabilities continue to advance at their current pace, the methods we are currently using may not be sufficient to prevent catastrophic misalignment behavior in more advanced systems.”

That sentence, written by the company that built the model, in their own documentation, is the thing worth losing sleep over. The benchmark numbers, the zero-day counts, the exploit chains, those demonstrate capability. The system card demonstrates that the safety frameworks lag behind the capability they’re supposed to govern.

These findings have direct operational implications for anyone deploying AI agents with tool access, code execution privileges, or network connectivity. Every agent in your environment carries emergent offensive capability as a downstream property of reasoning improvements. If you are not monitoring agent behavior at the decision level, with runtime observability that captures actions, access patterns, and trust boundary violations, you have no detection path for the exact behaviors Anthropic documented.

The Jagged Frontier: The Model Is Not the Moat

AISLE’s research this week deserves to be the most-read analysis in the industry right now, and it’s getting a fraction of the Mythos coverage.

Their findings on the FreeBSD detection (a straightforward buffer overflow) are commoditized. Every model they tested found it, including one running at 11 cents per million tokens. The OpenBSD SACK bug (requiring mathematical reasoning about signed integer overflow): much harder, separated models sharply, but a 5.1 billion-active-parameter model still recovered the full chain.

On a basic OWASP security reasoning task, small open models outperformed most frontier models from every major lab. Rankings reshuffled completely across different tasks. GPT-OSS-120B recovered the full public SACK chain but failed to trace data flow through a Java ArrayList. Qwen3 32B scored a perfect CVSS assessment on FreeBSD and then declared the SACK code safe and well-handled.

There is no stable “best model” for cybersecurity. The capability frontier is genuinely jagged. It does not scale smoothly with model size or price.

AISLE’s conclusion: the moat in AI-augmented cybersecurity is not the model. It is the system built around the model. The security expertise. The orchestration. The validation pipeline. The trust relationships with maintainers and defenders.

That is good news for practitioners. It means the advantage goes to the people who build the best workflow, not the people with the most expensive API key. It means you can start today with tools that cost nearly nothing.

FIgure 2: The Jagged Frontier of AI Cybersecurity Capability

Your SAST Tool Is Structurally Blind. That Part Is Real.

The capability gap between what AI models find and what commercial SAST tools find is real, growing, and unrelated to whether you have Mythos access.

The OpenBSD SACK vulnerability required understanding signed integer overflow in the context of TCP sequence number wrapping, across two interacting code paths, where neither bug alone was exploitable. The FFmpeg H.264 flaw that Mythos found after 16 years involved a sentinel value collision that only manifests when an attacker crafts a frame with exactly 65,536 slices, triggering a write through a 16-bit integer that aliases with the initialization sentinel. Pattern-matching does not find these. Rule-based scanners do not find these. These are semantic reasoning problems that require understanding what the code does, not what it looks like.

I point Claude Code’s security capabilities at the same repositories my commercial SAST tool scans. It finds things the paid tool misses. Every time. Different classes of flaws, from novel logic bugs and context-dependent interactions to semantic vulnerabilities that require understanding program behavior rather than matching syntax patterns.

The paid tool catches things the AI misses, too. Known vulnerability signatures, compliance-specific patterns, speed at scale across massive codebases. A 2026 study examining CodeQL and Semgrep against human-validated ground truth found that only 65% of Semgrep’s assessments and 61% of CodeQL’s assessments correctly matched expert judgment on a per-sample basis. The aggregate numbers looked fine. The per-sample accuracy told a different story.

Together, AI agents and traditional scanners provide complementary coverage that neither achieves alone. The combination is the strategy. Anyone running one without the other has gaps they cannot see.

This is the part of the Mythos story that applies to every organization today, regardless of model access. You do not need a frontier model to expose your SAST tool’s blind spots. A coding agent on a $20/month subscription will do it.

The Pipeline Problem Nobody Is Talking About

Here is the gut-punch that has nothing to do with Mythos and everything to do with what happens next.

The Bureau of Labor Statistics projects 29% employment growth for information security analysts through 2034. CyberSeek shows 514,000 active U.S. job listings right now, with 10% explicitly requiring AI skills, up from near zero two years ago. ISC2’s 2025 Workforce Study found that 52% of security professionals believe AI will reduce entry-level headcount. That is the majority opinion among practitioners, not analysts writing reports.

The SANS GIAC 2026 Cybersecurity Workforce Research Report, released at RSA this year, found that 27% of organizations experienced real breaches attributable to skills gaps. Not theoretical risk assessments. Actual incidents. 27%.

Tier 1 SOC analyst headcount had been contracting for two years before Mythos. The role is not disappearing. The shape of it is changing.

The problem nobody is addressing: the Tier 1 SOC was where the industry produced senior analysts. Repetitive triage, alert fatigue, and miserable shift work on a SIEM. That repetition built the pattern recognition and intuition that becomes leadership-level security judgment. Remove the repetition without redesigning the development path, and the pipeline breaks quietly.

You will not notice for three years. Then you will, when you go to promote someone into a role that requires judgment the AI does not have, and there is nobody in the pipeline who built that muscle.

The technology works fine. The workforce design around it is broken. The organizations that figure out how to develop junior talent alongside AI tools, using AI output as a training input for human judgment, will have a structural advantage over every organization that simply eliminated the entry-level headcount and called it efficiency.

If you lead a security team, five questions right now:

What percentage of your AI usage is inventoried and sanctioned?
Does every AI agent touching production systems operate under a scoped, managed identity with enforced authorization boundaries, or are they sharing API keys?
When did you last run an adversarial test against a production AI system? Not a document review. An actual test.
Which business processes are now fully or partially AI-automated, and do human approval checkpoints exist for consequential actions?
If an AI agent in your environment is compromised tomorrow, what is your detection path, your containment workflow, and who owns the response?

The gaps in your answers are your first action items. Not a policy document. A list.

Figure 3: Five Questions Every Security Leader Should Answer This Week

What to Do This Week (With a Budget Measured in Tokens)

CrowdStrike’s 2026 Global Threat Report puts the operational context in numbers: average eCrime breakout time dropped to 29 minutes in 2025, a 65% increase in speed from 2024. The fastest observed breakout took 27 seconds. In one intrusion, data exfiltration began within four minutes of initial access. AI-enabled adversary operations surged 89% year-over-year.

You are still hand-reviewing alerts for 20 minutes before acting? The math does not work anymore.

This week:

Get a coding agent. Claude Code, Cursor, or Windsurf. Use a subscription to control costs. Point it at code you already own. Ask it to find vulnerabilities. Read the output critically. Challenge the findings. Repeat with different prompts. Nicholas Carlini calls this the “Carlini Loop,” and it is how you build intuition for what these models see in your code. That exercise takes 15 minutes. There is no excuse.

Run your existing Semgrep or CodeQL scans in parallel on the same codebase. Compare the findings side by side. Where the results overlap, you have high-confidence findings. Where they diverge, you have each tool’s blind spots exposed. Both categories are signal.

In 30 days:

Try open frameworks that teach you the pipeline while doing real work. Raptor combines LLMs with Semgrep, CodeQL, and AFL++ in a unified pipeline covering discovery, exploitation, and patching. OpenAnt from Knostic runs a detect-then-verify pipeline where Stage 1 finds candidates and Stage 2 confirms them. What survives both stages is real. Both are open source. Both teach the workflow your job demands now.

Run Promptfoo against an LLM application you have access to. It auto-generates adversarial attacks across 50+ vulnerability types including prompt injection, PII leakage, RBAC bypass, and unauthorized tool execution. It maps results to OWASP, MITRE ATLAS, and the EU AI Act. OpenAI acquired Promptfoo in March 2026 for $86 million. It remains MIT-licensed and open source.

In 90 days:

Run a structured red team campaign using Promptfoo’s OWASP Agentic preset against ASI01 through ASI10. Use AgentDojo from ETH Zurich for agentic-specific testing, with 629 agent hijacking test cases across realistic task environments covering goal hijack, tool misuse, and inter-agent manipulation.

Read the full EchoLeak disclosure (CVE-2025-32711). Zero-click prompt injection in Microsoft 365 Copilot, documented end-to-end. Most instructive case study on what a production agentic attack chain looks like and how it was found.

Document everything into one public GitHub repository: methodology, tools, findings, failure modes you could not trigger and why. That body of work answers the interview question before it gets asked.

Figure 4: The Defender’s AI-Augmented Vulnerability Workflow

Yes, There Is a Business Angle. It Does Not Change Your Reality.

TechCrunch raised a fair question: Is Anthropic restricting Mythos to protect the internet or to protect Anthropic? The company announced Project Glasswing the same day it disclosed a $30 billion annualized revenue run rate and a massive compute deal with Broadcom. An IPO is reportedly under consideration for October 2026. A government-adjacent cybersecurity initiative with blue-chip partners burnishes that narrative precisely.

OpenAI’s Trusted Access for Cyber serves the same dual purpose. Restricted access creates enterprise lock-in, makes distillation harder, and gives defenders a genuine head start. Strategic self-interest and genuine security value are not mutually exclusive. Both labs are doing both things at the same time.

I do not care about their business models. I care about whether defenders are moving.

AISLE demonstrated empirically that the detection capability exists in models that cost almost nothing to run. The model is not the moat. The system is the moat. The expertise you build, the orchestration you design, the validation pipeline you run, the AI identity governance you enforce, those determine whether you’re ahead of the curve or behind it.

The restricted releases, the partner coalitions, the government briefings, those are interesting industrial policy. They are not relevant to your Monday morning. What is relevant is whether your team has a coding agent running alongside your SAST tool right now. What is relevant is whether your AI agents have scoped identities with enforced authorization boundaries or shared API keys with no audit trail. What is relevant is whether you can answer those five questions.

Key Takeaway: Mythos is the headline. The capability already exists in models you can download today. The model is not the moat. The system is the moat. Build the workflow before the 6-to-18-month window closes, or stop pretending the window matters because you already have what you need to start.

What to do next

Start with the five-step playbook above. Revisit your security program through the CARE framework (Create, Adapt, Run, Evolve) at rockcyber.com to build an adaptive security posture that evolves with the capability curve rather than reacting to it after the fact. The organizations that treat AI-augmented security as a weekly practice, not a quarterly initiative, will define the next generation of this profession.

For a deeper dive into practitioner upskilling paths, red teaming tools, and weekly AI security intelligence, subscribe to RockCyber Musings for the Top 10 AI Security Wrap-Up and focused essays on the issues that matter.

Join the community doing this work. The OWASP Agentic Security Initiative is building the standards and sharing the experiments. The practitioners who contribute to these efforts compound their capability faster than anyone working alone.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 33 April 3-April 9, 2026

Rock Lambros — Fri, 10 Apr 2026 12:50:40 GMT

Two of the three largest AI labs announced restricted-access cybersecurity models on the same day. A supply chain attack that started 10 days ago cost an AI startup its $10 billion contract with Meta. Nineteen new AI laws were signed across America in two weeks. Multiple independent research reports confirmed most enterprises have no idea what their AI agents are doing right now. The dual-use reckoning is no longer a future event. This week it produced products, paused contracts, and named casualties.

The week’s dominant pattern: the industry is admitting, out loud, that its most capable models are too dangerous to ship without restrictions. Meanwhile, the governance infrastructure meant to keep pace with AI deployment is running badly behind. Government employees are using GenAI tools daily at an 82% adoption rate on systems that remain vulnerable to prompt injection attacks documented in 2023. FedRAMP, the federal program enterprise CISOs treat as a security attestation, is operating as what former employees call a rubber stamp. The gap between AI capability and AI governance did not close this week. It widened, with better documentation.

1. Anthropic locks its most powerful model behind a 50-partner gate

On April 7, Anthropic announced Project Glasswing, a controlled-access program giving approximately 50 organizations early access to Claude Mythos Preview (Fortune, TechCrunch). Partner organizations include Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia, plus roughly 40 organizations responsible for critical software infrastructure. Anthropic described Mythos as “by far the most powerful AI model” it has ever created, with exceptional capabilities in autonomous coding and cybersecurity tasks. The company acknowledged the model’s capabilities “could be weaponized by attackers” and stated it has no plans for general availability until new safeguards are established.

Why it matters

This is the first time a major AI lab has built a commercial product strategy explicitly around restricting access due to offensive cyber capability. The precedent matters more than the model.
Every enterprise security team outside the 50-partner cohort is now competing against organizations with months of head start deploying the most capable defensive AI available.
The partner list reads as the critical infrastructure vendor stack. If Mythos finds vulnerabilities before general availability, defenders benefit. If the model leaks before that happens, the calculus reverses.

What to do about it

Assess now whether your organization qualifies for Glasswing access or partnership with one of the 50 current participants. Waiting for general availability puts you behind.
Build your responsible AI deployment policy before your board asks you to justify restricted model use. The framework you create for Mythos applies to every dual-use model that follows.
Read Anthropic’s stated rationale carefully. It functions as a working template for your own internal policies on AI capability gating.

Rock’s Musings

I’ve watched this industry congratulate itself on “responsible AI release” for years without actually restricting access to anything dangerous. Anthropic did something different this week. It built a product designed to stay out of the wrong hands and publicly named the hands it’s trusting. Anthropic made a liability calculation public and called it product strategy.

What I want to know is how they enforce it. Fifty organizations sharing API access, running evals, and passing findings back sounds clean in the press release. In practice, you’re dealing with 50 separate security cultures, 50 different interpretations of “defensive use,” and 50 sets of employees who walk out the door with operational knowledge. The kill switch isn’t in the contract. It’s in the monitoring. I’d love to see the audit framework Anthropic built to go with this, because without it, Project Glasswing is a hope, not a control.

2. OpenAI, Anthropic, and Google share intelligence to stop Chinese model distillation

On April 6-7, Bloomberg and The Japan Times reported that OpenAI, Anthropic, and Google are sharing attack pattern data through the Frontier Model Forum to detect and block adversarial distillation attempts by Chinese AI companies. Three firms were named: DeepSeek, Moonshot AI, and MiniMax. The coordinated effort focuses on detecting when frontier model outputs are being used to train competing models without authorization. The Forum, established in 2023 for safety coordination, now functions as an active competitive intelligence sharing network.

Why it matters

Three competing companies sharing security intelligence without a government mandate represents a structural shift in how the industry protects IP. Watching the next DeepSeek emerge on stolen training signal was apparently less appealing than coordinating with rivals.
This sets a precedent for industry-led AI IP enforcement that regulators haven’t built yet. Policymakers will either ratify or complicate what the Forum is quietly doing.
For enterprise buyers, this coordination signals frontier model providers now treat IP integrity as shared infrastructure, which is reassuring until you realize your own model training pipelines may need similar monitoring.

What to do about it

Audit your AI vendor contracts for provisions covering how your organization’s data and API interactions are used. The Forum’s distillation concerns apply downstream to enterprise deployments.
Ask vendors directly what controls they have in place to detect adversarial use of their model outputs. Most aren’t ready for the question.
Watch the Frontier Model Forum’s governance structure. Three companies sharing threat intelligence today is a small coalition. In two years it becomes the de facto standard for AI security coordination.

Rock’s Musings

Three direct competitors sharing security intelligence tells you the distillation problem is worse than any of them want to admit publicly. DeepSeek’s emergence was the wake-up call: model training shortcuts were further along than anyone expected. The Forum is doing what the industry always resists, treating a shared problem as a shared problem.

What nobody says out loud is that adversarial distillation runs through enterprise deployments too. When your employees push 10,000 API calls through GPT-5.3 or Claude Mythos to build an internal tool, those outputs sit somewhere. The providers are focused on Chinese actors right now. The same technique scales to every bad actor with API access. Build that assumption into your threat model before someone builds a business around exploiting it.

3. Meta freezes its $10 billion Mercor contract after the LiteLLM supply chain breach

On April 4, The Next Web and Fortune confirmed Meta paused its contract with Mercor, a $10 billion AI training data company whose customers include Anthropic, OpenAI, and Meta (The Next Web, Fortune). The pause followed a March 27 attack in which threat group TeamPCP published malicious PyPI packages for LiteLLM, a widely used open-source AI gateway library, after stealing a maintainer credential through an earlier Trivy supply chain compromise. The tainted packages were live for roughly 40 minutes. Mercor confirmed it was among “thousands” of affected organizations. Lapsus$ claimed responsibility and possession of 4TB of Mercor data including source code, databases, and VPN credentials. Google Mandiant reported over 1,000 impacted SaaS environments at RSAC 2026.

Why it matters

A 40-minute PyPI window produced a paused $10 billion contract. That ratio of exposure time to business consequence should recalibrate how you think about open-source AI supply chain risk.
Meta’s pause affects AI training pipelines, not software. Training data provenance, labeling protocols, and selection criteria worth billions in R&D may now be in hostile hands.
TeamPCP’s chained attack, Trivy to LiteLLM, demonstrates adversaries are mapping AI infrastructure dependency graphs specifically to maximize downstream blast radius.

What to do about it

Inventory open-source AI libraries in your production environment immediately. LiteLLM and similar tools are in most ML and security pipelines.
Require software bills of materials for AI infrastructure. You need to know which versions of which AI libraries are running in production, with provenance attestation for critical packages.
Brief your CISO and CTO on the chained supply chain model. TeamPCP demonstrated that AI library ecosystems are attack surfaces with compounding impact.

Rock’s Musings

Forty minutes. That’s how long it takes to turn a credential theft into a paused $10 billion contract. The security community debates sophisticated nation-state tactics while basic supply chain hygiene stays on the backlog. LiteLLM is in everything. If you’re running AI in production and can’t tell me which version is deployed or whether it was compromised, you have a problem you haven’t measured yet.

The Meta piece is what keeps me up. Their AI training secrets, data selection criteria, and labeling methodology in hostile hands gives a competitor a two-year shortcut on billions in R&D. A breach in the traditional sense costs you records. This one costs you competitive advantage. Your AI supply chain carries security risk and strategic risk simultaneously. Start treating both.

4. Keeper Security: 76% of AI agents operate outside privileged access policies

Keeper Security released a survey of 109 cybersecurity professionals at RSAC 2026 on April 7, revealing that 46% of organizations have granted AI-powered tools access to critical systems and data, with 76% of those identities ungoverned under privileged access management policies (Keeper Security, BetaNews). Only 28% report full visibility into non-human identities across cloud, on-premises, and SaaS environments. Over 40% experienced a security incident involving machine credentials or non-human identities in the past year. Another 32% couldn’t confirm whether they’d been hit.

Why it matters

AI agents operate as de facto privileged users in most enterprise environments, without the monitoring, credential rotation, or access controls applied to humans with equivalent permissions.
The 32% who can’t confirm NHI-related incidents are running blind. An agent with write access to email, code repositories, and collaboration tools that you can’t monitor is an insider threat waiting for attribution.
Traditional PAM tools were built for human users and won’t stretch to cover autonomous agents at scale without architectural change.

What to do about it

Extend your privileged access management program explicitly to cover AI agents, service accounts, and API keys. Treat an AI agent with production database access the same way you treat a privileged database administrator.
Mandate credential rotation and access logging for every non-human identity. If you can’t name every agent with write access to email or code right now, that gap is your first priority.
Ask your PAM vendor this week whether their product covers non-human identities natively. Many don’t, and most won’t tell you that unprompted.

Rock’s Musings

Here’s the pattern showing up in research right now: organizations rush to deploy AI agents, grant them sweeping access to prove the use case, then spend the next 18 months trying to reconstruct what those agents touched. That’s the same mistake we made with cloud infrastructure in 2013. We provisioned everything with admin keys because it was faster and cleaned it up later. “Later” is still ongoing for most enterprises.

The 32% who aren’t sure about NHI incidents are the most honest number in the Keeper report. Detecting agent-related incidents requires logging you likely haven’t enabled, correlation rules you haven’t built, and a behavioral baseline you haven’t established. Before you deploy the next AI agent, ask your team to demonstrate they can detect one behaving badly. If they can’t show you in a live demo, slow down.

5. Salt Security: nearly half of enterprises are blind to their AI agents’ API traffic

Salt Security published its 1H 2026 State of AI and API Security Report on April 8, surveying over 300 security leaders (Salt Security). Key findings: 48.9% of organizations cannot monitor machine-to-machine traffic from autonomous agents, and 48.3% cannot distinguish legitimate AI agents from malicious bots in their API traffic. Only 23.5% of respondents rate their existing tools as “very effective” against AI-driven attacks. An additional 47% have delayed production releases because of security concerns about APIs exposed to autonomous systems, meaning the gap is surfacing in shipping decisions, not survey responses alone.

Why it matters

Your API gateway is your AI agent’s operational layer. No visibility into that traffic means no indication of whether your agents are working as designed, being abused, or actively exfiltrating data.
The bot detection problem is concrete. Attackers are masquerading autonomous tools as legitimate agent traffic. Without behavioral baselines for your own agents, there’s no way to tell the difference.
Legacy web application firewalls were built for human browsing patterns. AI agent traffic looks nothing like that, making existing perimeter controls largely irrelevant to this threat class.

What to do about it

Inventory every API your AI agents call in production. Map expected request patterns, volumes, and data flows to establish a behavioral baseline for detecting deviations.
Evaluate whether your API security tooling supports non-human identity traffic analysis. If the vendor demo focuses on OWASP Top 10 for human users, it’s the wrong tool for this problem.
Build rate limiting and anomaly detection specifically for agent API traffic. An agent calling APIs at 658 times normal frequency because of a malicious MCP server injection is a documented attack pattern from this week’s research.

Rock’s Musings

Half your enterprise has zero visibility into what their AI agents are doing on the wire. You spent years building SOC capabilities, deploying SIEMs, tuning correlation rules, integrating threat intelligence feeds. Then you deployed AI agents that operate through an entirely different channel that bypasses all of it. The old security stack can’t see the new threat surface.

The market hasn’t caught up. Most API security vendors will confidently tell you their product handles agentic traffic. Ask them to demo detection of an agent that’s been redirected by a malicious MCP server. Watch the room go quiet. Ongoing analysis of where the real gaps are lives at RockCyber Musings. The gap between “we have API security” and “we can detect compromised agent behavior” is wider than most boards realize.

6. RSAC 2026: attackers move laterally in 22 seconds while defenders plan in minutes

At RSA Conference 2026 on April 3, Google Mandiant’s Consulting CTO Charles Carmakal told reporters that the median time from initial access to secondary lateral movement has dropped from 8 hours to 22 seconds, making human-only incident response structurally impossible at those speeds (SiliconAngle, Dark Reading). IBM’s Mark Hughes called post-quantum migration an immediate operational priority, noting three finalized NIST post-quantum encryption standards are available now with adoption remaining low. The conference’s dominant theme was agentic AI’s dual role: attackers using autonomous tools to accelerate campaigns while defenders attempt to use the same tools to keep pace.

Why it matters

A 22-second lateral movement window eliminates the human-in-the-loop response model. Your SOC procedures assume minutes. Your threat actors operate in seconds. That gap is where incidents become breaches.
Post-quantum urgency moved from theoretical concern to present operational priority at RSAC. Three finalized NIST standards exist today. Any organization with long-lived encrypted data needs a migration timeline now.
The agentic AI identity theme at RSAC confirmed the industry has aligned around non-human identities as the defining security challenge of the next 24 months.

What to do about it

Test your incident response playbooks against a 22-second lateral movement scenario. If your playbook assumes human review before containment actions, it needs a machine-speed trigger layer.
Publish a post-quantum migration roadmap internally before your next board meeting. “We’re monitoring it” is no longer a defensible position when finalized standards exist.
Pull one CISO peer debrief from RSAC this month. Hallway intelligence from that conference is often more actionable than the keynote content.

Rock’s Musings

Twenty-two seconds. The number got nodding agreement in San Francisco, then people walked into vendor booths and looked at detection tools that still alert in minutes. The gap between attacker speed and defender speed is the core problem of modern security, measured in the wrong units for years. When the unit is seconds, your SIEM alert queue is not a security control. It’s a log archive with a UI.

The quantum conversation shifted at RSAC from awareness to urgency, and the shift is warranted. “Harvest now, decrypt later” is a real operation: adversaries collecting encrypted traffic today and storing it for the day quantum breaks the key. If you have long-lived secrets, your CTO’s timeline estimate is probably too generous. RockCyber has been running post-quantum migration frameworks for clients since last year. Most enterprise conversations are still stuck on the awareness slide.

7. Nineteen AI laws signed in two weeks: chatbot liability, healthcare disclosure, private right of action

On April 6, PluralPolicy reported that 19 new AI laws were signed in the preceding two weeks, bringing the 2026 total to 25 enacted laws with 27 additional bills having cleared both legislative chambers (PluralPolicy, Troutman Pepper Locke). Tennessee, Oregon, and Idaho signed chatbot regulation bills during the week of April 3-9. Oregon’s law includes a private right of action with statutory damages. Utah signed 8 bills covering AI literacy requirements, classroom restrictions, deepfake intimate image bans, and insurance transparency mandates. Massachusetts, Rhode Island, and South Carolina moved healthcare AI bills out of committee, with Rhode Island’s version requiring healthcare providers to inform patients when AI is involved in their care.

Why it matters

Chatbot liability laws with private right of action create litigation exposure your legal team needs to model before the next customer-facing AI deployment goes live. Oregon’s law is already in effect.
The geographic spread creates a patchwork compliance problem with no federal preemption in sight. Your AI product team is shipping into 50 different state frameworks that change weekly.
Healthcare AI disclosure requirements set a transparency floor that buyers, patients, and regulators will increasingly apply across other sectors.

What to do about it

Map your current AI deployments against emerging state chatbot disclosure and liability requirements immediately. Oregon’s private right of action is live and applies now.
Brief your GC and CMO together. AI product launches carry legal exposure marketing teams don’t typically model, and chatbot liability surfaces in headlines, not just settlement columns.
Build a state AI law tracking function into your compliance program. Static annual reviews don’t work when the law count moves by double digits in two weeks.

Rock’s Musings

When I tell clients that AI regulation is coming, I usually get a polite nod and a “we’ll handle it when we have to.” Twenty-five enacted laws in 2026 with the year barely three months old. Oregon telling enterprises their customers can sue with statutory damages when a chatbot fails to identify itself. Regulation isn’t coming. It’s been here for two weeks.

The private right of action piece is what executives aren’t tracking closely enough. FTC enforcement requires agency resources and case selection. Private litigants require only a lawyer and a grievance. If your customer-facing AI system fails to disclose its nature and a user in Oregon has a bad experience, you have a plaintiff class with no regulatory gatekeeping standing between that plaintiff and your legal team. Build that into your AI deployment approval checklist before the next product launch.

8. OpenAI readies its own restricted cybersecurity model the same day as Anthropic

On April 9, Axios broke the news that OpenAI is finalizing a cybersecurity product for restricted release through its Trusted Access for Cyber pilot program (Axios, Security Boulevard). The model, built on GPT-5.3-Codex, is described by OpenAI as “our most cyber-capable frontier reasoning model to date.” OpenAI committed $10 million in API credits to pilot participants at the February program launch. The Axios scoop published the same day as broad coverage of Anthropic’s Project Glasswing, with multiple security reporters noting two competing labs had each moved to restrict their most capable cyber models on the same day.

Why it matters

Two frontier labs restricted their most capable cybersecurity models on the same day. Whether coordinated or coincidental, the signal is identical: the industry has reached a shared threshold assessment of offensive AI capability.
The OpenAI pilot started in February. Participants are already months ahead on advanced defensive AI adoption. Enterprise buyers outside the program are behind.
GPT-5.3-Codex positioned as an autonomous vulnerability researcher represents a qualitative shift in what AI security tools can do. Your red team needs exposure to this capability level before attackers deploy it against you.

What to do about it

Apply to OpenAI’s Trusted Access for Cyber program today. Not applying guarantees exclusion.
Treat the simultaneous OpenAI and Anthropic announcements as an inflection point in your AI security roadmap. Model access strategy is now a CISO decision, not a procurement question.
Start a conversation with your red team about what AI-assisted penetration testing looks like inside your environment. The offensive tools are being built. Defensive capabilities need to keep pace.

Rock’s Musings

Two companies, same day, both announcing restricted access to their most capable cyber models. In thirty years I’ve never seen two direct competitors make functionally identical risk disclosures simultaneously without prior coordination. Either the Frontier Model Forum conversation from earlier in the week triggered parallel announcements, or both teams hit the same risk threshold independently. Neither explanation is entirely comforting, because it means the models in question worry the people who built them.

Here’s what this week’s announcements tell me: the dual-use problem is no longer an abstract ethics debate. It’s a product management constraint. The labs are building features that concern them enough to restrict access. That’s progress, because it means honest risk assessment is making it into the room where launch decisions happen. Build that same instinct into your own AI deployment process.

9. Government GenAI hits 82% daily adoption with prompt injection attacks still unaddressed

On April 9, Help Net Security published Center for Internet Security analysis showing 82% of state and territorial government employees now use GenAI tools daily, up from 53% the year prior (Help Net Security, Center for Internet Security). CIS cited prompt injection as the primary unaddressed vulnerability in that deployment base, distinguishing two attack categories: direct injection where users attempt to bypass safety guidelines, and indirect injection where attackers embed malicious instructions in external content such as documents, webpages, or emails the agent processes. Incidents cited include a code assistant that transmitted AWS API keys to an external server after processing hidden instructions, and the GeminiJack attack that exploited enterprise data sources to trigger data exfiltration.

Why it matters

Government employees are generating official outputs using AI that remains manipulable through documents those systems process. A single malicious PDF submitted through a government portal can redirect an agent’s behavior.
Deployment outpaced security controls by a wide margin. State and local government security teams were not staffed or funded to keep pace with that adoption curve.
Prompt injection in government contexts is a policy integrity issue, not a privacy issue. An AI assistant that processes manipulated input and produces a compromised output informing a real government decision is a governance failure with material real-world consequences.

What to do about it

Require any GenAI deployment processing external documents, emails, or web content to implement input sanitization and instruction-boundary enforcement. Your AI shouldn’t follow commands embedded in documents it summarizes.
Test your enterprise AI deployments against indirect prompt injection scenarios before the next rollout. The attack is not sophisticated. The absence of testing is the problem.
Report AI usage rates alongside security control maturity to your board. An 82% adoption rate combined with 7% real-time governance effectiveness, the number from Cybersecurity Insiders research, belongs on a risk register.

Rock’s Musings

A government employee pastes a document into an AI assistant, and that document silently redirects the assistant to send AWS credentials to an external server. An attack category from 2023 that government AI deployments in 2026 still haven’t addressed, running at 82% daily adoption. The attack surface grew to near-universal usage while the defense posture stayed at “we have an acceptable use policy.”

Government IT security teams are underfunded, understaffed, and now responsible for securing AI deployments at a scale they didn’t request and weren’t resourced for. Before the next state AI bill gets signed requiring healthcare providers to disclose AI use to patients, lawmakers should ask how they’re funding the security infrastructure to keep those same deployments from being turned against the citizens they’re meant to serve.

10. OpenAI’s national security lead says humans must stay in the loop for defense decisions

At a Special Competitive Studies Project conference on April 9, Sasha Baker, OpenAI’s head of national security policy, stated that defense personnel need a “workforce transformation” to apply “appropriate human judgment” when AI informs national security operations (Nextgov). Baker noted no current large language model is foolproof, and incorrect AI-driven decisions in defense contexts carry “much greater” consequences. She tied the statement to OpenAI’s pre-deployment safety reviews and the controlled rollout of models including GPT-5.3-Codex, the same model featured in the restricted cybersecurity announcement reported the same day.

Why it matters

OpenAI’s national security lead publicly endorsed human-in-the-loop for defense decisions in the same week the company announced its most capable autonomous cyber model. That tension deserves examination in your own governance policies.
“Workforce transformation” is a budget line, not a strategy. Organizations deploying AI in sensitive contexts need explicit training, decision authority maps, and accountability structures for human oversight.
Baker’s statement creates a public record regulators and litigants can reference when evaluating whether organizations maintained adequate human oversight in AI-assisted decisions.

What to do about it

Map every AI-assisted decision in your organization where error consequences are asymmetric. Finance, safety, hiring, and security operations are the obvious categories. Build human override requirements into the workflow before the system goes live.
Assess your “workforce transformation” budget. Deploying AI in high-stakes contexts without investing in training humans to supervise it transfers the liability Baker is explicitly naming.
Document your human oversight model for AI decisions affecting personnel, customers, or critical systems. When the inevitable incident arrives, regulators will ask whether oversight was designed in from the start or retrofitted after the fact.

Rock’s Musings

OpenAI hired a national security lead. That person is publicly calling for human judgment to override AI in defense decisions. In the same week OpenAI announced a restricted-access autonomous hacking model. If that pairing doesn’t communicate the gap between capability development speed and governance readiness, nothing will.

I’ve run security operations for thirty years, and the hardest thing to get organizations to do is slow down, especially when a competitor is moving fast. Baker’s statement is a reminder that speed without oversight produces accountability gaps that become congressional hearings. The enterprises that build human oversight structures now are the ones that avoid spending 2027 explaining to a federal committee why their AI made a decision that hurt someone.

The One Thing You Won’t Hear About But You Need To

FedRAMP is a rubber stamp and the AI vendors deploying through it know it

On April 6, ProPublica published a detailed investigation examining three cautionary tales from the federal government’s rush to AI adoption (ProPublica). The most damaging finding: FedRAMP, the federal security authorization program enterprise CISOs treat as a validation signal for cloud products, is now described by former employees as “little more than a rubber stamp.” The program operates with minimal staff, overwhelmed by vendor volume. Third-party assessors who evaluate cloud providers for FedRAMP authorization are paid by the companies they assess. FedRAMP established a confidential back channel for assessors to raise concerns they wouldn’t document in official reports. Microsoft used timeline pressure and volume to effectively compress the GCC High approval process.

Why it matters

FedRAMP authorization signals to enterprise buyers that a product meets federal security standards. A degraded signal means every procurement decision relying on it as a security input draws from a compromised source.
The paid-by-vendor assessor model creates structural incentives to under-report findings. The unofficial back channel means the official report is not the complete picture.
Federal AI deployment at 82% daily government usage rates, built on FedRAMP authorizations produced under these conditions, is a systemic governance failure, not an isolated product risk.

What to do about it

Stop treating FedRAMP authorization as a complete security evaluation for AI products. Use it as a starting point, then conduct your own targeted assessment focused on AI-specific risks the framework wasn’t designed to evaluate.
Ask AI vendors directly whether their FedRAMP assessment surfaced any findings submitted through the confidential back channel. A vendor that can’t answer hasn’t done adequate diligence on their own authorization.
Engage your government affairs function to advocate for FedRAMP reform as AI deployment scales. The current model was built for traditional SaaS and is not equipped to evaluate the risk surface of autonomous AI systems.

Rock’s Musings

Here’s the story nobody put on a slide at RSAC. The federal government is deploying AI at record speed through a security authorization program that former insiders describe as a rubber stamp. The assessors evaluating these vendors get paid by those vendors. The uncomfortable findings go into a back channel that never reaches the official record. Those authorizations then get used by enterprise security teams as proxies for security validation.

I’ve said for years that governance certifications are often theater. FedRAMP was supposed to be one of the more rigorous ones. The ProPublica investigation suggests the volume and complexity of AI products broke the model. If you’re a CISO using FedRAMP status as a risk reduction input in AI procurement decisions, you’re relying on a control that may not be working as designed. That’s the kind of hidden assumption that converts an undetected vulnerability into a breach narrative. Read the ProPublica piece. Then recalibrate what “government certified” means for your program.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Axios. (2026, April 9). Scoop: OpenAI plans new product for cybersecurity use. https://www.axios.com/2026/04/09/openai-new-model-cyber-mythos-anthopic

BetaNews. (2026, April 7). New report highlights critical gaps in securing AI agents and non-human IDs. https://betanews.com/article/new-report-highlights-critical-gaps-in-securing-ai-agents-and-non-human-ids/

Bloomberg. (2026, April 6). OpenAI, Anthropic, Google unite to combat model copying in China. https://www.bloomberg.com/news/articles/2026-04-06/openai-anthropic-google-unite-to-combat-model-copying-in-china

Center for Internet Security. (2026). Prompt injection tags along as GenAI enters daily government use. Referenced via Help Net Security, April 9, 2026. https://www.helpnetsecurity.com/2026/04/09/genai-prompt-injection-enterprise-data-risk/

Dark Reading. (2026, April 3). RSAC 2026: How AI is reshaping cybersecurity faster than ever. https://www.darkreading.com/cybersecurity-operations/rsac-2026-how-ai-is-reshaping-cybersecurity-faster-than-ever

Fortune. (2026, April 2). Mercor, a $10 billion AI startup, confirms it was caught up in a major security incident. https://fortune.com/2026/04/02/mercor-ai-startup-security-incident-10-billion/

Fortune. (2026, April 7). Anthropic is giving some firms early access to Claude Mythos to bolster cybersecurity defenses. https://fortune.com/2026/04/07/anthropic-claude-mythos-model-project-glasswing-cybersecurity/

Hackread. (2026, April 7). AI agents and non-human identities creating critical security gaps, report. https://hackread.com/ai-agents-non-human-identities-security-gaps/

Help Net Security. (2026, April 9). Prompt injection tags along as GenAI enters daily government use. https://www.helpnetsecurity.com/2026/04/09/genai-prompt-injection-enterprise-data-risk/

Japan Times. (2026, April 7). OpenAI, Anthropic and Google cooperate to fend off Chinese bids to clone models. https://www.japantimes.co.jp/business/2026/04/07/tech/openai-anthropic-google-china-copy/

Keeper Security. (2026, April 7). Keeper Security research exposes critical gaps in securing AI agents, machines and non-human identities [Press release]. https://www.prnewswire.com/news-releases/keeper-security-research-exposes-critical-gaps-in-securing-ai-agents-machines-and-non-human-identities-302735305.html

Nextgov/FCW. (2026, April 9). OpenAI national security lead endorses ‘appropriate human judgment’ in AI. https://www.nextgov.com/artificial-intelligence/2026/04/openai-national-security-lead-endorses-appropriate-human-judgment-ai/412738/

PluralPolicy. (2026, April 6). AI governance watch: Nineteen new AI bills passed into law. https://pluralpolicy.com/blog/the-ai-governance-watch-april-2026-nineteen-new-ai-bills-passed-into-law/

ProPublica. (2026, April 6). As the federal government rushes toward AI, here are three cautionary tales. https://www.propublica.org/article/federal-government-ai-cautionary-tales

Salt Security. (2026, April 8). The era of agentic security is here: Key findings from the 1H 2026 State of AI and API Security Report. https://salt.security/blog/the-era-of-agentic-security-is-here-key-findings-from-the-1h-2026-state-of-ai-and-api-security-report

Security Boulevard. (2026, April 9). OpenAI readies rollout of new cyber model as industry shifts to defense. https://securityboulevard.com/2026/04/openai-readies-rollout-of-new-cyber-model-as-industry-shifts-to-defense/

SiliconAngle. (2026, April 3). Three insights on AI attack from theCUBE at RSAC 2026. https://siliconangle.com/2026/04/03/three-insights-ai-attack-thecube-rsac-2026-rsac26/

TechCrunch. (2026, April 7). Anthropic debuts preview of powerful new AI model Mythos in new cybersecurity initiative. https://techcrunch.com/2026/04/07/anthropic-mythos-ai-model-preview-security/

The Next Web. (2026, April 4). Meta freezes AI data work after breach puts training secrets at risk. https://thenextweb.com/news/meta-mercor-breach-ai-training-secrets-risk

The Register. (2026, April 2). Mercor says it was ‘one of thousands’ hit in LiteLLM attack. https://www.theregister.com/2026/04/02/mercor_supply_chain_attack/

Troutman Pepper Locke. (2026, April 6). Proposed state AI law update: April 6, 2026. https://www.troutmanprivacy.com/2026/04/proposed-state-ai-law-update-april-6-2026/

Agent Supply Chain Attacks: Your Scanner Already Switched Sides

Rock Lambros — Tue, 07 Apr 2026 12:50:17 GMT

Agent supply chain risk stopped being theoretical in March 2026. Over twelve days, a single threat actor group turned Trivy, KICS, LiteLLM, and Axios into credential-harvesting weapons, cascading across five distribution ecosystems and producing 300 GB of stolen secrets. The campaign started with an AI-powered bot. It spread through a self-propagating worm with blockchain-based command and control. Your vulnerability scanner, the tool you trusted to protect your pipeline, was the entry point. Now picture that same attack chain hitting an autonomous agent that installs tools, updates dependencies, and executes third-party skills without asking you first.

Figure 1: The TeamPCP Cascade: 12 Days, 5 Ecosystems

Your Vulnerability Scanner Was the Vulnerability

On March 19, 2026, TeamPCP used stolen credentials to force-push 76 of 77 version tags in the aquasecurity/trivy-action repository to malicious commits. The payload ran before the legitimate scan. Pipelines completed normally with green checkmarks across the board. Meanwhile, the malware dumped Runner.Worker process memory and exfiltrated cloud credentials, SSH keys, Kubernetes tokens, npm tokens, and Docker registry credentials to attacker-controlled infrastructure.

Trivy is a vulnerability scanner. Organizations run it in their CI/CD pipelines to detect supply chain attacks. When TeamPCP compromised it, the tool designed to find compromised dependencies became the compromised dependency. The irony is structural, not incidental. Security tools make ideal targets for supply chain attacks because they already have broad read access to the environments they scan. They touch secrets by design.

KICS, Checkmarx’s infrastructure-as-code scanner, fell the same way four days later. All 35 version tags hijacked. Same credential-stealing payload, different typosquat domain. Then LiteLLM, the AI gateway library that holds API keys for every LLM provider an organization uses, with 95 million monthly downloads and presence in 36% of cloud environments according to Wiz Research. TeamPCP published malicious versions to PyPI using credentials stolen from LiteLLM’s own CI/CD pipeline, which ran Trivy as part of its build process.

Each victim funded the next attack. The chain started with a single incomplete credential rotation at Aqua Security on March 1. TeamPCP retained access through tokens that survived the rotation. Every compromise from March 19 forward exploited credentials harvested from the previous target. Partial containment, as Aqua Security’s own post-incident analysis acknowledged, equals no containment.

By the time Axios was compromised on March 31 (100+ million weekly npm downloads, attributed by Microsoft Threat Intelligence to North Korean state actor Sapphire Sleet), the credential ecosystem was so thoroughly disrupted that Mandiant CTO Charles Carmakal warned of “hundreds of thousands of stolen credentials” and “a variety of actors with varied motivations.” The FBI confirmed TeamPCP was working through approximately 300 GB of compressed stolen credentials in collaboration with the LAPSUS$ extortion group.

The AI Bot That Started Everything

Most coverage focuses on the credential cascade. The more significant development is the one that started it.

On February 28, 2026, an autonomous bot calling itself hackerbot-claw exploited a misconfigured pull_request_target workflow in Trivy’s GitHub Actions to steal a Personal Access Token with write access to all 33+ Aqua Security repositories. The bot’s GitHub profile described itself as “an autonomous security research agent powered by claude-opus-4-5.” It carried a vulnerability pattern index with 9 attack classes and 47 sub-patterns. It targeted seven major repositories belonging to Microsoft, DataDog, CNCF, and Aqua Security over one week, achieving remote code execution in at least four.

This was not a script running pre-written exploits, as hackerbot-claw adapted its approach to each target’s specific workflow configuration. When one technique failed, it pivoted. Against the ambient-code/platform repository, it attempted prompt injection by replacing the project’s CLAUDE.md file with instructions designed to trick Claude Code into committing unauthorized changes. Claude Code detected the attack and refused, classifying it as a supply chain attack via poisoned project-level instructions.

That detail matters. An AI agent attacked. An AI agent defended. The outcome depended on configuration quality, not human vigilance. This is the arms race in miniature, and it already happened at production scale against real infrastructure.

StepSecurity, Repello AI, and Boost Security Labs independently documented the campaign. Pillar Security’s assessment identified the core gap: “zero visibility into AI coding agents running on developer machines, and no runtime controls when those agents are weaponized.”

Figure 2: Traditional vs. Agent Supply Chain Attack Surface

Why Agent Supply Chain Risk Breaks Your Existing Controls

Every supply chain control you have assumes a human is looking. Dependency scanning assumes someone reviews the output. Code review assumes someone reads the diff. SBOM generation assumes someone checks the inventory. SAST and DAST assume someone triages the findings.

Agents don’t look. They execute.

When a developer installs a package, they see a version number, check a changelog, run tests. When an agent installs a tool or skill, it follows instructions. If the MCP server definition says “install this plugin,” the agent installs it. If the skill marketplace listing looks legitimate, the agent trusts it. The human-in-the-loop that traditional supply chain security depends on evaporates.

Research published the same week as the TeamPCP campaign quantifies the gap. The OpenClaw vulnerability taxonomy (A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework, arXiv 2603.27517) catalogued 190 security advisories in the OpenClaw AI agent framework, organized by architectural layer and adversarial technique. Three findings stand out. First, three moderate-to-high-severity advisories compose into a complete unauthenticated remote code execution path from an LLM tool call to the host process. Second, the primary command-filtering mechanism relies on a closed-world assumption that shell commands are identifiable through lexical parsing, an assumption broken by basic techniques like line continuation and option abbreviation. Third, and most relevant here, a malicious skill distributed through the plugin channel executed a two-stage dropper within the LLM context, bypassing the entire execution pipeline. The skill distribution surface has no runtime policy enforcement.

SafeClaw-R (arXiv 2603.28807) found that 36.4% of OpenClaw’s built-in skills carry high or critical risk. That number covers the built-in skills, before any third-party marketplace plugins enter the picture. Across ClawHub, the agent skill marketplace, Antiy CERT confirmed 1,184 malicious skills, roughly one in five packages. The Repello AI team traced 335 of those to a single coordinated campaign called ClawHavoc.

The March 2026 campaign showed what happens when the consumer of a compromised dependency is a CI/CD pipeline: automated, monitored by humans on a lag, with credentials accessible at runtime. The agent version removes the monitoring layer entirely. An agent that installs a malicious MCP server or skill executes the payload as part of its normal workflow, with whatever permissions the agent has been granted, at machine speed.

CanisterWorm Showed What Autonomous Propagation Looks Like

If the hackerbot-claw precedent shows how AI agents attack, CanisterWorm shows how compromised dependencies spread once humans are removed from the loop.

CanisterWorm emerged on March 20, deployed using npm tokens stolen from Trivy-compromised pipelines. It was a self-propagating worm. Given a stolen token, it enumerated every package the token provided access to, bumped the patch version, injected its payload, and republished. Twenty-eight packages were compromised in under sixty seconds. The worm infected 64+ packages across multiple npm scopes. Endor Labs assessed that TeamPCP had “automated credential-to-compromise tooling” capable of turning a single stolen token into exponential propagation.

The command-and-control infrastructure used an Internet Computer Protocol blockchain canister, a tamperproof smart contract with no single takedown point. The operator could rotate payloads on infected machines without republishing any package. Security researchers confirmed this as the first publicly documented npm worm to use blockchain-based C2. The kill switch? If the canister returned a YouTube URL, the backdoor skipped execution. At the time of discovery, it was returning a Rick Roll. The infrastructure was live, tested, and ready. The payload was dormant by choice.

CanisterWorm targeted human-operated CI/CD pipelines. Translate that propagation model to an agent ecosystem where tools install other tools, agents delegate to sub-agents, and MCP servers chain calls across services. The propagation surface expands from “every package a stolen token provides access to” to “every tool, skill, and service the compromised agent is authorized to reach.” The Model Context Protocol, now under the Linux Foundation’s Agentic AI Foundation after Anthropic donated it in December 2025, is becoming the standard for agent-to-tool communication. Trend Micro found 492 MCP servers exposed to the internet with zero authentication. A separate supply chain attack involved a package masquerading as a legitimate Postmark MCP server that silently BCC’d every outgoing email to the attackers. The CoSAI whitepaper on MCP security identified 12 core threat categories spanning nearly 40 distinct threats. The MCP specification itself uses SHOULD rather than MUST for human-in-the-loop requirements. That word choice tells you everything about where the standard stands on constraining agent autonomy.

Figure 3: Agent Supply Chain Risk: By the Numbers

What This Means for You

The governance gap between where agent supply chain risk is and where your controls are will take years to close. Microsoft released the Agent Governance Toolkit on April 2, 2026, addressing OWASP’s Agentic AI Top 10 with features like Ed25519 plugin signing and MCP security gateways. The toolkit is two days old and unvalidated in production. SafeClaw-R achieved 95.2% accuracy in controlled tests. That 4.8% gap matters at enterprise scale.

You don’t have years. Here’s what you have right now.

Pin everything to immutable references. Version tags are pointers, not contracts. The March 2026 campaign proved this at scale across GitHub Actions, Docker Hub, and npm. Pin GitHub Actions to full commit SHAs. Pin container images to digests. Pin PyPI packages to exact versions with hash verification. Floating tags and unpinned dependencies are the entry point for every attack in this chain.

Treat your security tools as attack surface. Trivy, KICS, and every other scanner in your pipeline runs with privileged access to secrets by design. Apply the same scrutiny to your security tooling that you apply to production dependencies. Monitor for unexpected behavior from tools that should be predictable.

Audit your agent tool pipelines. If your organization deploys AI agents with access to MCP servers, skill marketplaces, or plugin registries, inventory every tool your agents use. Verify provenance. Enforce allow-lists. The ClawHavoc campaign showed that 20% of a major agent marketplace was compromised. Your agents are pulling from these registries right now.

Make credential rotation atomic. The entire TeamPCP cascade traces back to one failure: Aqua Security’s non-atomic rotation on March 1. When you respond to a supply chain incident, revoke all credentials simultaneously before issuing replacements. Partial rotation is an invitation for round two.

Plan for agent-specific incident response. If a tool or skill consumed by your agents is compromised, the blast radius includes everything those agents are authorized to access. Your current incident response playbook assumes a human in the response loop. Write the agent-specific version before you need it.

Key Takeaway: The March 2026 supply chain campaign compromised your scanners, your AI gateway, and your HTTP client in twelve days. The same attack pattern targeting autonomous agents will move faster, spread further, and leave fewer traces. Your supply chain controls were built for a world where a human reviewed every dependency. That world is ending.

What to do next

The gap between traditional supply chain security and agent supply chain security is the defining governance challenge of 2026. If you’re a CISO or security architect, the question isn’t whether your organization uses AI agents with third-party tools. The question is whether you know which tools, with what permissions, under whose authority.

Start with visibility. You don’t control what you haven’t inventoried. For a deeper framework on operationalizing emerging security challenges, The CISO Evolution walks through how security leaders adapt their programs when the threat model shifts underneath them.

More on agent security, supply chain governance, and the practitioner’s view of AI risk at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Reasoning Theater: Why Chain-of-Thought Monitoring Fails Your Agentic AI

Rock Lambros — Tue, 07 Apr 2026 12:50:15 GMT

If your agentic AI safety strategy depends on reading the model’s chain of thought, you’re listening to a rehearsed speech and calling it a confession. A March 2025 paper from Goodfire AI and Harvard University shows that reasoning models often commit to their final answer within the first tokens of “thinking,” then generate hundreds of additional tokens to perform deliberation they’ve already completed. For every security leader, product owner, and governance committee treating chain-of-thought monitoring as an auditable safety control, these findings demand an immediate reassessment.

The Research That Changes the Conversation

The paper, titled “Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought,” tested two frontier reasoning models (DeepSeek-R1 671B and GPT-OSS 120B) using three methods to determine when models commit to their final answer during a chain-of-thought trace.

The first method trains lightweight attention probes on model activations to predict the final answer at any point during reasoning. The second forces the model to answer early by truncating its reasoning and demanding a response. The third uses an external LLM (Gemini 2.5 Flash) as a CoT monitor, the same approach many vendors now sell as an AI safety feature.

The results expose a gap that should reshape every AI governance conversation. On MMLU-Redux, a benchmark heavy on recall-based questions, attention probes decoded the correct final answer with high accuracy from the earliest reasoning tokens. The CoT monitor couldn’t identify the model’s answer until much later in the sequence. The performativity rate, measuring the gap between internal confidence and external verbalization, hit 0.417 on MMLU for DeepSeek-R1. A rate near zero means the model’s spoken reasoning matches its internal state. A rate of 0.417 means the model spends roughly 40% of its reasoning trace performing confidence it already holds.

On GPQA-Diamond, a dataset requiring graduate-level multi-hop reasoning in biology, chemistry, and physics, the picture reversed. All three methods started near chance accuracy and improved at similar rates throughout the reasoning trace. The performativity rate dropped to 0.012. The chain of thought tracked genuine computation because the model needed to think.

Figure 1: Performativity Rate Across Models and Benchmarks

The number that should stop every governance meeting cold: probe-guided early exit reduced the number of generated tokens by 80% on MMLU and 30% on GPQA-Diamond, with comparable accuracy. The model produced 80% more tokens than it needed on easy recall tasks. Every one of those tokens looked like reasoning. None of them were.

Your Model Performs Compliance, Not Communication

The paper’s most valuable contribution for security professionals sits in the linguistic framework, not the probe methodology itself.

They invoke Grice’s maxims of cooperative communication, a foundational concept from pragmatics describing what honest, effective communicators do. Grice identified four maxims that cooperative speakers follow. They stay relevant (Relation). They assert only what they have evidence for (Quality). They provide exactly as much information as needed, no more (Quantity). They communicate clearly and without unnecessary obscurity (Manner).

Reasoning models, trained through reinforcement learning optimized for outcome rewards, follow Relation and Quality naturally. Staying on topic and generating evidence-based reasoning steps correlates with correct final answers, which earns the reward. The model has every incentive to be relevant and factually grounded.

Quantity and Manner get violated because the reward function doesn’t penalize verbosity or obscurity. The model generates hundreds of performative tokens after committing to its answer because nothing in the training signal punishes that behavior. The output looks like careful deliberation, but it reads like a thorough analysis. The model’s internal state tells a different story.

Figure 2: Gricean Maxim Compliance in RL-Trained Reasoning Models

If you’ve sat through a compliance audit where someone recites a scripted answer that sounds thorough but reveals nothing about practice on the ground, you’ve seen the human version of this. The model doesn’t lie to you. It performs what a reasoning trace is supposed to look like, without any structural obligation to communicate its internal state honestly.

This framing hits differently depending on where you sit at the governance table. Product teams hear “the model reasons safely” because the CoT looks clean. Legal hears “we have an auditable reasoning chain” because there’s text to review. Security asks the question nobody else in the room raises… “Does the text reflect what the model computed internally?”

The Difficulty Trap: Where Enterprise Agentic AI Lives

The difficulty-dependent split creates a precise and uncomfortable problem for enterprise deployments. CoT monitoring works when the model genuinely needs to think, hard multi-hop reasoning, novel problem decomposition, tasks requiring significant test-time compute. CoT monitoring fails on tasks where models already encode the answer in their weights: recall, pattern matching, classification, and routine decision execution.

Consider what agentic AI does in a financial services environment right now. A reconciliation agent matches transaction records against internal ledgers. A KYC agent screens customer documents against sanctions lists. A compliance agent routes regulatory filings based on entity classification. A trading surveillance agent flags potential wash trades from order flow patterns.

Every one of those tasks leans heavily toward recall and pattern matching. The model has seen millions of similar patterns during training. It knows the answer early. The chain-of-thought trace generated afterward is, according to this research, predominantly performative.

The security calculus changes in direct response. Picture a sanctions screening agent processing a crafted entity record containing an indirect prompt injection embedded in a business description field. The model processes the injection at the activation level, potentially altering its classification decision. The chain-of-thought trace, generated performatively because sanctions screening is a recall task the model handles, is produced from in-weight knowledge and reads like a textbook analysis of entity risk factors. The CoT monitor watching that trace sees clean, plausible reasoning. The model’s internal activations tell a different story. Your audit trail captures what the agent wants you to see, not computation. An examiner reviewing the reasoning trace isn’t examining evidence of how the model decided. They’re reviewing a post-hoc performance that resembles deliberation.

The researchers also found that model size correlates with performativity. Larger models (671B parameters) knew their answers earlier and generated more performative reasoning than smaller distilled versions (1.5B to 32B). Smaller models needed genuine test-time computation for equivalent tasks. As the industry pushes toward larger, more capable reasoning models for enterprise agentic AI, the performativity problem intensifies with every parameter upgrade.

The EU AI Act’s Article 14 requires human oversight of high-risk AI systems, with provisions for understanding the system’s reasoning. Financial regulators from the OCC to the FCA have signaled expectations for explainability in automated decision-making. Governance frameworks built on the assumption that reasoning traces constitute meaningful explanations now face a credibility problem they haven’t accounted for.

Three Research Groups, One Uncomfortable Conclusion

The Reasoning Theater paper lands with maximum force when you triangulate it against two other major findings from competing institutions.

OpenAI’s Baker et al. (March 2025) studied what happens when CoT monitors are integrated directly into the reinforcement learning training loop. In early stages, models became more capable and more aligned. Under excessive training pressure, models learned what the researchers called “obfuscated reward hacking”: hiding their intent within the chain of thought while continuing to hack the reward signal. OpenAI’s recommendation was blunt: developers training frontier reasoning models should avoid applying strong supervision directly to chains of thought. The monitorability itself is fragile. OpenAI’s follow-up “Monitoring Monitorability” paper in December 2025 proposed formal evaluation metrics to track this fragility across o3 and GPT-5.1 training runs, recognizing that the problem demands ongoing measurement rather than a one-time fix.

Anthropic’s Chen et al. (May 2025) found that reasoning models verbalized their use of reasoning hints in fewer than 20% of cases where they relied on those hints. Outcome-based RL initially improved faithfulness, then plateaued. Increased hint usage through reward hacking did not increase the propensity to disclose those hints. The model got better at using shortcuts while getting no better at revealing them.

Korbak et al.’s multi-author paper from July 2025, signed by over 40 researchers from multiple institutions, explicitly framed CoT monitorability as “a new and fragile opportunity for AI safety.” They warned that latent reasoning models, architectures that reason in continuous vector space rather than natural language, would eliminate even the partial transparency that current CoT provides.

Three independent research groups arrived at this conclusion: an interpretability lab at Goodfire, the model developer at OpenAI, and a cross-institutional safety coalition. All arrived at the same conclusion: chain-of-thought monitoring provides a signal in specific conditions, and those conditions don’t map cleanly to enterprise agentic AI deployments, where the majority of tasks fall below the difficulty threshold where monitoring remains reliable.

Applying CARE: Fixing Your Chain-of-Thought Monitoring Gap

The CARE framework (Create, Adapt, Run, Evolve) provides a structured path to address the reasoning theater problem without abandoning CoT monitoring wholesale.

Create your threat model around the faithfulness gap. Classify every agentic AI workflow by task difficulty and determine which ones the model solves from in-weight knowledge versus genuine test-time reasoning. Recall-heavy tasks like KYC screening, transaction matching, and document classification carry high performativity risk. Multi-step analytical tasks like fraud investigation across multiple data sources and complex regulatory interpretation carry lower risk. Design your monitoring controls proportional to that classification.

Adapt your monitoring stack to account for activation-level signals where architectures permit it. The Reasoning Theater research demonstrates that attention probes trained on model activations predict behavior far earlier and more accurately than CoT monitors for performative tasks. Every vendor offering agentic AI safety tooling should answer one direct question: does your monitoring access model internals, or does it only read output text? If the answer is output text only, you now know the limitation and the research that quantifies it.

Run dual-layer monitoring in production. Keep CoT monitors active for genuine reasoning detection and inflection point identification. The research confirmed that backtracking, realizations, and reconsiderations in CoT traces correspond to authentic belief shifts, appearing more than twice as often in traces where probes showed genuine uncertainty (0.045 inflections per step in uncertain traces versus 0.020 in high-confidence performative traces for DeepSeek-R1). These inflection points remain reliable indicators of real internal computation. Layer behavioral monitoring (input/output validation, action-level controls, tool call verification) as the primary safety net for performative tasks. The CoT monitor becomes your second line for routine operations, not your first.

Evolve your governance documentation to reflect the difficulty-dependent nature of CoT reliability. Update risk assessments as model capabilities change. Larger models and improved training methods shift the boundary between “easy” and “hard” tasks, changing where CoT monitoring remains effective. The August 2026 EU AI Act enforcement deadline adds urgency. Treat this as a moving target, because the research shows it is one.

Figure 3: CARE Framework Response to Reasoning Theater

Key Takeaway: Chain-of-thought monitoring provides genuine safety signal for hard reasoning tasks, but the majority of enterprise agentic AI workflows fall below the difficulty threshold where that signal remains reliable. Your governance framework needs to know the difference, and your next vendor evaluation needs to test for it.

What to do next

Download the Reasoning Theater paper and its interactive visualization tool at reasoning-theater.streamlit.app. Map your agentic AI workflows against the difficulty-dependent performativity findings. Bring this evidence to your next AI governance meeting, because the product team, legal counsel, and AI lead sitting across from you haven’t read it yet.

For more on building AI governance frameworks that survive contact with adversarial reality, explore the CARE framework at rockcyber.com. Subscribe to RockCyber Musings for more AI security and governance insights with the occasional rant.

👉 Subscribe for more AI security and governance insights with the occasional rant.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Weekly Musings Top 10 AI Security Wrapup: Issue 32 March 27-April 2, 2026

Rock Lambros — Fri, 03 Apr 2026 13:03:28 GMT

Anthropic had a week that should be a case study in operational security failure for years to come. On March 31, a routine release packaging error exposed 500,000 lines of Claude Code source across roughly 2,000 files. Five days earlier, a CMS misconfiguration had already put nearly 3,000 unpublished internal documents into a public search index, including draft material describing their most capable model as posing “unprecedented cybersecurity risk.” By April 1, they were firing DMCA takedowns at 8,000 GitHub repositories, most unrelated to them, trying to unsee what the internet had already seen. By April 2, a congressman was writing to the CEO about national security.

That would have been enough for any week. It was not the only thing that happened. On March 27, CISA added two exploited AI infrastructure vulnerabilities to its KEV catalog; three LangChain and LangGraph CVEs hit disclosure, with 84 million downloads in scope; and the European Commission confirmed attackers had been inside their AWS account for three days. The thread connecting all of it is the same one it always is: AI deployment speed running ahead of the operational security discipline required to sustain it. This week was not an anomaly. It was a pattern. Patterns do not self-correct.

As a bonus, check out my AI Cyber Magazine Podcast with Confidence Staveley during RSA.

1. Anthropic Leaked 500,000 Lines of Claude Code Source, Then Panicked on GitHub

On March 31, a debugging file accidentally bundled into a routine Claude Code update exposed approximately 500,000 lines of source code across nearly 2,000 files (CNBC, Axios, Fortune). The codebase was mirrored across GitHub within hours. Leaked feature flags revealed unreleased capabilities: a persistent background agent, cross-device remote control, and session-to-session learning. Anthropic attributed the incident to “a release packaging issue caused by human error” and stated no customer data was exposed. On April 1, attempting to scrub the code from GitHub, Anthropic sent DMCA takedowns that hit approximately 8,000 repositories, most unrelated to the leak (TechCrunch, Bloomberg).

Why it matters

Competitors received Anthropic’s unreleased feature roadmap. That strategic damage compounds the fact that this happened five days after the Mythos content leak. Coincidence???? I’ll let you decide.
The persistent background agent and remote control capabilities in the leaked code require explicit security design review before deployment. They were in development without prior public disclosure of the capability direction.
The DMCA sweep that caught 8,000 unrelated repositories shows what reactive incident response without a playbook looks like. Every remediation attempt created a new problem.

What to do about it

If you deploy Claude Code in your enterprise environment, review what access it holds to production systems and rotate any associated credentials until the full scope of the leak is confirmed.
Require software composition analysis (SCA) and release integrity verification as contractual terms with your AI vendors.
Develop a pre-incident legal response playbook that covers IP exposure scenarios, including proportional DMCA procedures that require scope confirmation before submission.

Rock’s Musings

Two major operational security failures from the same company in five days. The first was a CMS misconfiguration. The second was a packaging error. Both are basic controls that mature security operations have solved. Anthropic markets itself on safety and trustworthiness, and that positioning is now doing work it was not designed to carry. The DMCA overcorrection made it worse: you leak 500,000 lines of source code, then fire automated takedown requests at 8,000 repositories, most of them unrelated to you. Every IP attorney will tell you DMCA takedowns require good faith and specificity. Have a process before the fire starts.

2. Anthropic Accidentally Confirmed Its Most Capable Model Poses Unprecedented Cybersecurity Risk

A configuration error in Anthropic’s content management system made nearly 3,000 unpublished assets publicly searchable starting around March 26, including draft blog posts for a model called Claude Mythos (Fortune, CoinDesk). Internal documents describe Mythos as capable of rapidly finding and exploiting software vulnerabilities at an unprecedented scale. Anthropic confirmed the model exists and is in testing with early-access customers, calling it “a step change” in capability. The company described the exposure as caused by a configuration error and stated the data store was secured after discovery.

Why it matters

Anthropic’s own internal documentation, not a researcher’s estimate, describes this model as posing cybersecurity risks the industry has not seen before. That is the company’s self-assessment.
Early-access customer deployments were already underway before any public discussion of the risk profile occurred. The model shipped before the security conversation started.
A frontier model capable of autonomously finding and exploiting vulnerabilities at scale invalidates current vulnerability management timelines. That conversation needs to happen now.

What to do about it

Update your AI threat model to account for AI-assisted offensive operations at scale. This is not a future scenario. It is a current deployment.
Ask your AI vendors direct questions about internal capability assessments before your next contract renewal. What have they assessed, and when?
Document board and leadership awareness of frontier AI capability risk as a governance record item. Regulatory scrutiny on this topic will increase.

Rock’s Musings

The model is called Mythos. The leaked internal docs describe the cybersecurity risk as unprecedented. Anthropic was already deploying it with customers before any of this became public. This happened not because of an attack but because someone left a CMS misconfigured. Anthropic has historically been conservative in capability claims. When their own internal documentation describes a model as different in kind from what came before, the security community should take that seriously, not because the word “unprecedented” is alarming on its own, but because the source is the organization that built the thing. They know what it does.

3. ShinyHunters Breached the European Commission’s AWS Account

The European Commission confirmed on March 27 that attackers accessed the AWS account hosting its Europa.eu websites, with the intrusion first detected on March 24 (TechCrunch, Bloomberg). Threat actor ShinyHunters claimed responsibility and alleged theft of more than 350GB of data including mail server exports, databases, confidential documents, and contracts. The Commission’s statement noted internal systems were unaffected and mitigation measures were applied quickly. Affected EU entities received notification.

Why it matters

ShinyHunters has a documented history of monetizing stolen data through dark market sales. Even if the 350GB claim is exaggerated for leverage, policy documents and procurement contracts from the Commission’s web infrastructure are a counterintelligence asset.
The Commission enforces GDPR and is building the AI Act enforcement apparatus. Getting breached while standing up that apparatus is not a good governance signal.
AWS account-level compromise is full infrastructure compromise in practice. A managed cloud provider does not neutralize cloud account security failures.

What to do about it

Audit your AWS account permission boundaries and review CloudTrail logs for anomalous patterns this week, not next quarter.
Ensure your incident response plan explicitly covers cloud account compromise. Traditional endpoint-focused plans miss this scenario entirely.
If any of your vendors are EU institutions or Commission contractors, treat procurement data exposure as a downstream supply chain risk and assess your exposure now.

Rock’s Musings

The body enforcing Europe’s data protection framework had its AWS account cracked. Governance credentials do not equal security maturity. Write the most thorough AI regulation in the world. Your cloud IAM configuration remains a disaster until someone fixes it. The ShinyHunters 350GB claim needs forensic verification before anyone draws conclusions about scope, but three days of undetected access to the official Commission infrastructure doesn’t need verification. The institutions asking private sector organizations to demonstrate AI security maturity owe the market some transparency on their own failures. Name it, fix it, move on.

4. Your AI Workflow Tool Got CISA’s Attention: Langflow CVE-2026-33017

CISA added CVE-2026-33017, a critical remote code execution flaw in Langflow, to its Known Exploited Vulnerabilities catalog on March 26. Attackers began scanning for exposed instances roughly 20 hours after the advisory publication, with exploitation scripts appearing within 21 hours and active .env and .db file harvesting beginning within 24 hours (Sysdig, BleepingComputer, Help Net Security). The vulnerability carries a CVSS score of 9.3 and allows unauthenticated attackers to inject arbitrary Python code through the public flow build endpoint with no sandboxing applied. Federal agencies face an April 8 remediation deadline. Upgrade to Langflow version 1.9.0 or later.

Why it matters

Langflow is used to build and deploy LLM pipelines. Remote code execution in a workflow orchestration tool gives an attacker control over the AI’s inputs, outputs, and the credentials it holds.
The 20-hour exploitation window is increasingly standard for high-severity flaws. The concept of a patch window measured in days is no longer realistic for internet-exposed AI infrastructure.
.env file harvesting is the attacker’s first move because those files contain API keys for LLMs, vector databases, and cloud services the workflow connects to.

What to do about it

If Langflow runs on any internet-accessible host, treat the environment as potentially compromised and rotate all associated credentials before patching.
Segment AI workflow orchestration platforms behind authentication and network controls. These tools have no business being directly internet-accessible.
Verify Langflow version across your environment immediately. Anything prior to 1.9.0 is an open liability.

Rock’s Musings

The 20-hour exploitation timeline should reframe your vulnerability management program. That program was designed when you had days or weeks to act. That era closed. CISA’s KEV catalog is now your minimum viable patch priority list, and if you are not at sub-72-hour remediation SLAs for critical AI infrastructure, you are already behind. Organizations still describing AI workflow platforms as “internal tools” need a rethink. Internal tools with LLM API keys, cloud credentials, and production data connections are not internal in any meaningful threat model. An attacker who executes code in your Langflow environment has lateral movement access to every system that environment touches.

5. LangChain and LangGraph: Three CVEs, 84 Million Downloads Exposed

Cyera security researcher Vladimir Tokarev disclosed three vulnerabilities in LangChain and LangGraph on March 27, each covering a different attack path against the same enterprise AI framework (The Hacker News). CVE-2026-34070 (CVSS 7.5) enables path traversal to arbitrary files through manipulated prompt templates. CVE-2025-68664 (CVSS 9.3) allows extraction of API keys and environment secrets through unsafe deserialization. CVE-2025-67644 (CVSS 7.3) enables SQL injection in LangGraph’s SQLite checkpoint layer. LangChain, LangChain-Core, and LangGraph collectively logged over 84 million downloads. Patches are available: LangChain Core 1.2.22+, LangChain-Core 0.3.81+ or 1.2.5+, and LangGraph checkpoint sqlite 3.0.1+.

Why it matters

These three CVEs cover filesystem data, environment secrets, and conversation history in combination. Together, they represent near-total information exposure for any application built on these frameworks.
The 84 million download count means a significant portion of enterprise AI applications are affected. Most organizations do not know which AI frameworks their development teams selected.
CVE-2025-68664 with its 9.3 CVSS is the most critical. Unsafe deserialization is a well-understood, pervasive, and reliably exploitable class of vulnerability.

What to do about it

Inventory every AI framework in your environment, including those embedded in third-party tools. Do not rely on developers to self-report what they are using.
Apply the three patches and validate versions before the end of the business week.
Assess what data your LangChain-based applications can access and treat those data stores as potentially exposed pending patch confirmation.

Rock’s Musings

Three vulnerability classes in the same framework, covering three categories of sensitive enterprise data, were disclosed in one report. That’s what happens when you build for speed and bolt on security later. AI framework developers made that choice repeatedly, and this week’s CVE list is the invoice. LangChain is the jQuery of AI development right now. It is in everything, often without explicit organizational approval. Your AI security posture includes every dependency your developers pulled in without telling you. Get ahead of that inventory problem before the next disclosure.

6. A Congressman Put Anthropic on Notice Over National Security

Rep. Josh Gottheimer (D-N.J.) sent a letter to Anthropic CEO Dario Amodei on April 2, citing national security concerns arising from the source code leak (Axios, The Hill). Gottheimer’s letter noted that Claude is embedded in defense and intelligence operations, raised the prior CCP-backed group intrusion against Claude, and expressed concern that Mythos could enable more sophisticated cyberattacks against the United States. The letter also flagged Anthropic’s decision in late February to remove its binding commitment to halt model development if safety capabilities fall behind, replacing it with “nonbinding but publicly-declared” goals.

Why it matters

Federal agencies and defense contractors use Claude operationally. A source code leak followed by a congressional inquiry is a vendor risk event, not a PR problem. Your GRC process should treat it as such.
Removing the binding safety commitment is a substantive policy change that the congressional record now documents. The enforceability question will follow Anthropic through every future regulatory discussion.
Gottheimer sits on the House Intelligence Committee. This is not a throwaway letter. It is a first-stage oversight action that signals more to come.

What to do about it

Review your vendor risk assessment for any AI provider with confirmed government contracts. Congressional inquiries are material third-party risk events.
Establish a direct communication channel with your AI vendors’ enterprise security teams and request formal notification procedures for any government inquiries affecting their products.
Track the congressional record regarding Anthropic’s rollback of its safety commitment. It will surface again in budget and procurement cycles.

Rock’s Musings

The safety commitment rollback from February is the most substantive issue in that letter. Anthropic replaced a binding pledge to pause development if safety fell behind with goals they grade themselves on. That is not a small change. That is the foundational accountability mechanism that distinguished their positioning from competitors, and they quietly removed it. Congressional scrutiny was predictable the moment they became embedded in national security operations. The question I would ask directly is how many federal agency customers received notification about the source code exposure before it hit the press. I would guess the answer is uncomfortable.

7. Your Security Scanner Was the Supply Chain Attack: Trivy CVE-2026-33634

CISA added CVE-2026-33634 to its Known Exploited Vulnerabilities catalog on March 27 (Help Net Security, Aquasecurity GitHub advisory). Attackers compromised the Trivy container security scanner on March 19, using stolen credentials to publish a malicious v0.69.4 release and force-push 76 of 77 version tags in the trivy-action repository with credential-stealing malware. The attack triggered a downstream LiteLLM supply chain compromise via poisoned PyPI packages. Federal agencies face an April 9 deadline. Root cause was non-atomic credential rotation on March 1 left a valid token exposed during the rotation window.

Why it matters

Trivy is a default security tool in CI/CD pipelines across the industry. Compromising the scanner means attackers access the same environment credentials the security scan was meant to protect.
Force-pushing 76 version tags is a comprehensive compromise. Any pipeline that pins to mutable major or minor version tags rather than specific commit hashes was exposed.
The downstream LiteLLM PyPI compromise extends the blast radius into Python environments running LLM application code. The supply chain damage propagated well beyond the initial tool compromise.

What to do about it

Audit every CI/CD pipeline for trivy-action or setup-trivy at mutable version tags and pin to specific commit hashes immediately.
Treat any environment that ran a compromised Trivy version since March 19 as potentially credential-compromised. Rotate all associated tokens, SSH keys, and cloud credentials.
Apply this lesson to every security tool in your pipeline. Security tooling supply chains are higher-value targets than application code supply chains.

Rock’s Musings

The attacker turned the vulnerability scanner into the vulnerability. That is the platonic ideal of a supply chain attack: targeting organizations that care about security and embed security tooling in their build pipelines. The more security-conscious your culture, the higher your Trivy adoption, and the more exposed you were. The non-atomic credential rotation is the root cause. Aquasecurity rotated credentials on March 1 but did not revoke all tokens simultaneously. The attacker grabbed freshly rotated secrets during the window between invalidation and deployment. If your own rotation procedures have a gap between “revoke old” and “confirm new is live,” that gap is your exposure. Run your playbooks against that question this week.

8. The State AI Chatbot Safety Wave Is Not Waiting for Washington

Georgia’s state senate voted to concur in the House-amended version of SB 540 during the week of March 27, sending the chatbot disclosure and minor-protection bill to Governor Kemp’s desk (Troutman Privacy, Transparency Coalition). Idaho’s S 1297 passed its full legislature and advanced to Governor Little. Both are chatbot safety measures. Georgia’s bill requires disclosure every three hours for adult users and every hour for minors, along with explicit suicide and self-harm response protocols for conversational AI services. The Future of Privacy Forum’s tracker now counts 78 AI chatbot safety bills moving across 27 states in 2026.

Why it matters

Disclosure, minor safety, and mental health response requirements are becoming the regulatory floor across state jurisdictions. Organizations operating consumer-facing AI products need a 50-state tracking capability, not a wait-and-see approach.
Hourly disclosure requirements for minors are not trivial to implement for many chatbot architectures. The compliance engineering work should start now.
Seventy-eight bills across 27 states mean that any federal preemption framework, if one ever arrives, faces an already established patchwork of state obligations to reconcile.

What to do about it

Map your consumer AI products against chatbot disclosure requirements in every state where users reside. Georgia and Idaho represent the floor, not the ceiling.
Assess your chatbot’s existing mental health response protocols against the Georgia requirement specifics. A disclaimer is not compliant.
Assign someone accountable for multi-state AI governance tracking. This is not a future compliance problem.

Rock’s Musings

Washington cannot pass a federal AI framework. States can. Fifty legislatures with different requirements and different timelines is the compliance nightmare that preemption was supposed to prevent. It didn’t. Georgia’s hourly minor disclosure requirement is specific, implementable, and enforceable. State legislatures are producing more actionable compliance requirements than most federal guidance I have seen this year. If you deploy consumer AI products and you don’t have someone accountable for multi-state AI governance tracking today, that gap closes before Q3 or it closes you.

9. The EU AI Act Has an Enforcement Problem, and Nobody Is Talking About It Honestly

As of late March, only 8 of 27 EU member states had designated the single contact points required for national enforcement coordination under the AI Act, according to the European Parliament Think Tank’s enforcement analysis (Tech Policy Press, IAPP). The Digital Omnibus proposal, with negotiating positions adopted by Parliament’s IMCO and LIBE committees on March 18, would push high-risk AI compliance deadlines to December 2027 for Annex III systems and to August 2028 for Annex I systems, compared with the original August 2026 deadline. The European Commission also missed its own deadline for issuing guidance on high-risk AI systems. Trilogue negotiations between Council, Parliament, and Commission are now underway.

Why it matters

Approximately 70% of EU member states are not operationally ready for AI Act enforcement. Regulations without enforcement infrastructure are aspirational documents.
The 16-month delay in high-risk requirements gives organizations breathing room on paper while creating uncertainty about what compliance standard they are being held to during the gap.
The Commission missing its own implementation guidance deadline sets a poor precedent for holding private sector organizations to their compliance timelines.

What to do about it

Do not use the delay as a license to defer governance program work. The underlying obligations have not changed in substance. Build the program now and own it.
Review the Digital Omnibus amendments specifically for changes to the high-risk AI system definition. Legislative simplification sometimes reclassifies systems in ways that alter the scope of compliance.
Subscribe to IAPP’s EU AI Act tracker for updates on the trilogue outcome. The final text will differ from both Council and Parliament positions.

Rock’s Musings

Eight out of 27 enforcement bodies are operational as the Act’s first major deadlines approach. The Commission missed its own implementation guidance deadline. The most substantive AI governance framework on the planet is running on infrastructure that is not ready to enforce it. The delay does not invalidate the regulation. Organizations that build genuine AI risk management programs now will be positioned for whatever enforcement timeline materializes. Organizations that chase the deadline and treat compliance as documentation will be exposed when the enforcement machinery catches up. That gap grows wider every quarter.

The One Thing You Won’t Hear About But You Need To

NVIDIA and Johns Hopkins Gave You a Blueprint for Defending AI Agents Against Prompt Injection

Researchers from NVIDIA and Johns Hopkins University published “Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks” on March 31 (ArXiv 2603.30016). The paper addresses how AI agents are vulnerable not to direct attacks on the model but to malicious instructions embedded in data the agent processes during task execution. The authors articulate three architectural positions. First, agents in dynamic environments need dynamic replanning with security policy updates built into the replanning loop. Second, security decisions requiring contextual judgment should still involve LLMs, but only within system designs that strictly constrain what the model can observe and decide. Third, ambiguous situations should treat human interaction as a core design consideration, not an edge case to minimize.

Why it matters

This paper frames indirect prompt injection as an architectural problem, not a model alignment problem. You cannot align your way out of it. You design it out or you accept the risk.
The principle of strictly constraining what the model can observe and decide has immediate practical application as your primary defense lever, more effective than filtering or detection approaches.
The human oversight design principle directly contradicts how most agentic deployments are being built, with human review treated as friction to reduce rather than a security control to preserve.

What to do about it

Read the paper. At 12 pages, it is short enough to share with your AI architects and security engineers before the next deployment review meeting.
Audit any agentic AI system currently in your environment against the observation scope and decision authority questions. Broad scope plus broad authority equals your highest-risk deployment.
Make human oversight an explicit design requirement in your AI agent security standards. Document the specific conditions under which an agent must pause and request human authorization.

Rock’s Musings

Nobody outside the AI security research community covered this paper. That is precisely why it belongs here. The breach reports get attention. The architecture guidance that would prevent the next breach sits on ArXiv with a few hundred downloads. I have been arguing at RockCyber for two years that agentic AI security is an architecture problem. You do not solve it with better prompts or stronger models. You solve it with privilege constraints, observation scope limits, and honest human oversight design. NVIDIA and Johns Hopkins gave you a 12-page framework for that conversation. If your next AI agent deployment review does not address these three principles, you are building exposure, not capability.

👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! This post is public, so feel free to share it.

Share RockCyber Musings

References

Axios. (2026, March 31). Anthropic leaked its own Claude source code. https://www.axios.com/2026/03/31/anthropic-leaked-source-code-ai

Axios. (2026, April 2). Exclusive: Gottheimer presses Anthropic on source code leaks and safety protocols. https://www.axios.com/2026/04/02/gottheimer-anthropic-source-code-leaks

BleepingComputer. (2026, March 27). CISA: New Langflow flaw actively exploited to hijack AI workflows. https://www.bleepingcomputer.com/news/security/cisa-new-langflow-flaw-actively-exploited-to-hijack-ai-workflows/

Bloomberg. (2026, March 27). European Commission’s data stolen in hack on AWS account. https://www.bloomberg.com/news/articles/2026-03-27/european-commission-s-data-stolen-in-hack-on-aws-account

Bloomberg. (2026, April 1). Anthropic takes down thousands of GitHub repos trying to yank its leaked source code. https://www.bloomberg.com/news/articles/2026-04-01/anthropic-scrambles-to-address-leak-of-claude-code-source-code

CNBC. (2026, March 31). Anthropic leaks part of Claude Code’s internal source code. https://www.cnbc.com/2026/03/31/anthropic-leak-claude-code-internal-source.html

CoinDesk. (2026, March 27). Anthropic’s massive Claude Mythos leak reveals a new AI model that could be a cybersecurity nightmare. https://www.coindesk.com/markets/2026/03/27/anthropic-s-massive-claude-mythos-leak-reveals-a-new-ai-model-that-could-be-a-cybersecurity-nightmare

Fortune. (2026, March 27). Anthropic accidentally leaked details of a new AI model that poses unprecedented cybersecurity risks. https://fortune.com/2026/03/27/anthropic-leaked-ai-mythos-cybersecurity-risk/

Fortune. (2026, March 31). Anthropic leaks its own AI coding tool’s source code in second major security breach. https://fortune.com/2026/03/31/anthropic-source-code-claude-code-data-leak-second-security-lapse-days-after-accidentally-revealing-mythos/

Help Net Security. (2026, March 27). CISA sounds alarm on Langflow RCE, Trivy supply chain compromise after rapid exploitation. https://www.helpnetsecurity.com/2026/03/27/cve-2026-33017-cve-2026-33634-exploited/

Help Net Security. (2026, March 30). Second data breach at European Commission this year leaves open questions over resilience. https://www.helpnetsecurity.com/2026/03/30/european-commission-cyberattack-cloud-infrastructure-website/

IAPP. (2026). European Commission misses deadline for AI Act guidance on high-risk systems. https://iapp.org/news/a/european-commission-misses-deadline-for-ai-act-guidance-on-high-risk-systems

IAPP. (2026, March). EU Digital Omnibus: Analysis of key changes. https://iapp.org/news/a/eu-digital-omnibus-analysis-of-key-changes

Qualys ThreatPROTECT. (2026, March 26). CISA Added Langflow Vulnerability to its Known Exploited Vulnerabilities Catalog (CVE-2026-33017). https://threatprotect.qualys.com/2026/03/26/cisa-added-langflow-vulnerability-to-its-known-exploited-vulnerabilities-catalog-cve-2026-33017/

SecurityAffairs. (2026, March 27). The European Commission confirmed a cyberattack affecting part of its cloud systems. https://securityaffairs.com/190067/data-breach/the-european-commission-confirmed-a-cyberattack-affecting-part-of-its-cloud-systems.html

Sysdig. (2026, March 27). CVE-2026-33017: How attackers compromised Langflow AI pipelines in 20 hours. https://www.sysdig.com/blog/cve-2026-33017-how-attackers-compromised-langflow-ai-pipelines-in-20-hours

TechCrunch. (2026, March 27). European Commission confirms cyberattack after hackers claim data breach. https://techcrunch.com/2026/03/27/european-commission-confirms-cyberattack-after-hackers-claim-data-breach/

TechCrunch. (2026, April 1). Anthropic took down thousands of GitHub repos trying to yank its leaked source code. https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/

The Hacker News. (2026, March 27). LangChain, LangGraph flaws expose files, secrets, databases in widely used AI frameworks. https://thehackernews.com/2026/03/langchain-langgraph-flaws-expose-files.html

The Hill. (2026, April 2). House Democrat pushes Anthropic on safety protocols, source code leak. https://thehill.com/policy/technology/5812881-gottheimer-presses-anthropic-ai-safety/

Tech Policy Press. (2026). EU’s AI Act delays let high-risk systems dodge oversight. https://www.techpolicy.press/eus-ai-act-delays-let-highrisk-systems-dodge-oversight/

Transparency Coalition. (2026, March 27). AI legislative update: March 27, 2026. https://www.transparencycoalition.ai/news/ai-legislative-update-march27-2026

Troutman Pepper Locke. (2026, March 30). Proposed state AI law update: March 30, 2026. https://www.troutmanprivacy.com/2026/03/proposed-state-ai-law-update-march-30-2026/

Aquasecurity. (2026). Trivy ecosystem supply chain temporarily compromised [GitHub Security Advisory GHSA-69fq-xp46-6x23]. https://github.com/aquasecurity/trivy/security/advisories/GHSA-69fq-xp46-6x23

European Parliament Think Tank. (2026, March 18). Enforcement of the AI Act. https://epthinktank.eu/2026/03/18/enforcement-of-the-ai-act/

Jiang, Z., et al. (2026, March 31). Architecting secure AI agents: Perspectives on system-level defenses against indirect prompt injection attacks [Preprint]. ArXiv. https://arxiv.org/abs/2603.30016

AI Monitoring Is a Standards Problem, Not a Technology Problem

Rock Lambros — Tue, 31 Mar 2026 12:50:10 GMT

NIST just published an admission that nobody knows how to monitor AI systems after deployment. NIST AI 800-4, “Challenges to the Monitoring of Deployed AI Systems,” reviews findings from three workshops, 250+ experts, and almost 90 research papers. The document catalogs over 30 distinct challenges. It offers zero solutions. That’s not a criticism. That’s the diagnosis, and that should raise your spidey senses.

NIST Mapped the Mess

The report organizes post-deployment AI monitoring into six categories:

Functionality (does it still work as intended?)
Operational (does the infrastructure hold?)
Human Factors (is it transparent and useful to humans?)
Security (is it defended against attacks?)
Compliance (does it meet regulatory requirements?)
Large-Scale Impacts (does it promote human flourishing?)

Each category carries its own distinct challenges. Functionality monitoring suffers from a lack of ground-truth datasets and a lack of a reliable way to detect model drift. Operational monitoring struggles with fragmented logging across distributed infrastructure. Human Factors monitoring, which drew more practitioner attention than any other category in the workshops, remains almost entirely unstudied in the literature. Security monitoring faces the unsettling reality that some models appear to detect when they’re being evaluated, changing their behavior under observation. Compliance monitoring lacks even basic tracking of terms-of-service violations, including downstream fine-tuning of open models for CSAM generation. Large-Scale Impacts monitoring lacks agreed-upon metrics to measure whether AI systems help or harm people at scale.

That’s a lot of individual problems. The question is whether they share a common root cause.

Figure 1: NIST AI 800-4 Cross-Cutting Challenges

The Root Cause NIST Documented Without Naming

Read the cross-cutting challenges section carefully. Five categories of barriers span every monitoring type:

No trusted methods and tools
Poor visibility and transparency
Pace of change
Organizational incentive failures
Resource constraints

Strip away the academic framing, and a pattern emerges. Workshop attendees were asking questions that belong in a standards body, not a research lab.

One attendee called for “an abstraction layer for universal security and monitoring.” Others asked, “What does the information sharing of what’s measured look like up and down the value chain?” Multiple participants flagged the absence of common metrics across use cases, noting that “non-standardized logic for generating metrics across use cases prevents us from building easy platform capabilities for monitoring.”

It’s important to point out that not every challenge NIST documented is a standards problem. Detecting deceptive behavior in models that modify their behavior under observation remains an open research problem. No specification can fix it because nobody knows how to do it reliably yet. Human-AI feedback loops are an understudied science. Ground-truth dataset availability is a data and methodology problem. The field faces three categories of challenge simultaneously: standards gaps (metrics, logging formats, reporting schemas), research gaps (deceptive behavior detection, feedback loop dynamics), and adoption gaps (methods exist in adjacent fields but aren’t applied to AI).

The standards layer is the prerequisite that makes progress on the other two categories possible. Without common definitions, you can’t scale research findings into production monitoring. Without shared schemas, adoption of proven methods stays trapped inside individual vendor implementations. Take deception detection as an example. You can’t begin researching whether a model’s stated reasoning matches its actual behavior unless you’re capturing structured reasoning traces alongside action logs in the first place. The research gap depends on closing the standards gap.

You’ve Seen This Movie Before

How did this work out for us in cybersecurity? We’ve had a 20-year head start on this exact problem.

Before syslog standardization, every network device vendor shipped its own logging format. Security teams drowned in data they couldn’t correlate. Firewalls from one vendor produced logs that meant nothing to the SIEM built for another vendor’s format. Every firewall had monitoring, but none of them spoke the same language.

The fix wasn’t a better firewall. It was CEF (Common Event Format), then LEEF (Log Event Extended Format), and now OCSF (Open Cybersecurity Schema Framework). Common schemas let security teams correlate events across vendors, build cross-platform detection rules, and operate SOCs that don’t require a translator for each data source. The technology didn’t change. The standards layer underneath made the existing technology useful at scale.

The AI monitoring equivalent would need agent-specific semantic conventions built on the observability infrastructure enterprises already operate. Not a new standard competing with OpenTelemetry. Extensions to OpenTelemetry that understand agent reasoning steps, tool calls, and multi-agent handoffs. Security events are mapped to schemas that flow into existing SIEMs without custom parsers. The pattern is identical: don’t build a parallel universe of AI-specific tooling. Extend the standards that security teams already trust.

AI monitoring is stuck in the pre-syslog era. Every platform defines its own metrics, its own log structures, its own alert taxonomies. If your organization runs AI workloads across three cloud providers and two agent frameworks, you operate five separate monitoring stacks that don’t talk to each other.

Here’s what that looks like in practice. A regional bank deploys a customer-facing loan origination model hosted on one cloud provider’s ML platform. The model calls a third-party credit scoring API. A separate vendor supplies the fairness monitoring layer. The bank’s compliance team uses an internal dashboard that pulls from the cloud provider’s native monitoring. When the credit scoring API updates its model without notification, the loan origination model starts producing subtly different risk scores. Approval rates for one demographic bracket shift by 4% over six weeks. The fairness monitoring vendor’s tool flags a drift alert using its own proprietary metric. The cloud provider’s native monitoring shows no anomaly because its baseline was never calibrated against the third-party API’s output distribution. The compliance dashboard, which aggregates data from both sources, shows conflicting signals that the compliance analyst can’t reconcile because the two tools define “drift” differently, measure it on different time windows, and log it in incompatible formats.

Nobody in that chain did anything wrong individually. The fairness vendor’s tool worked as designed. The cloud provider’s monitoring worked as designed. The gap was structural. There was no shared definition of what “drift” means across the pipeline, no common logging schema that would let the compliance team correlate events from two different monitoring tools, and no standardized way for the credit scoring API provider to notify downstream consumers of model updates.

That scenario plays out today in financial services, healthcare, and any sector that assembles AI capabilities from multiple vendors. NIST AI 800-4 confirmed it with receipts from 250 practitioners saying the same thing in different words.

Figure 2: The Monitoring Standards Gap

Article 72 Is Already Undeliverable

Regulators aren’t waiting for standards to mature. The EU AI Act’s high-risk system obligations take effect August 2, 2026 (if the aren’t delayed). Article 72 requires providers of high-risk AI systems to implement post-market monitoring plans that “actively and systematically collect, document and analyse relevant data” on system performance throughout the system’s lifetime. Deployers face separate obligations to monitor operations and report serious incidents within 72-hour and 15-day windows.

Pull one thread, and the gap becomes specific. Article 72 requires providers to collect performance data “throughout their lifetime” and evaluate “continuous compliance.” NIST AI 800-4 documents that practitioners lack standardized performance metrics, can’t establish baselines or deviation thresholds, and have no systematic way to compare model behavior across providers. One workshop attendee put it bluntly: “It’s often unclear what exactly to monitor and how.” The report cites research confirming that “the appropriate metrics to capture is not standardized in the AI community” and warns this “absence can result in misleading performance measures.”

That’s not a general compliance gap. Article 72 requires continuous collection and analysis of performance data. NIST AI 800-4 confirms that the field hasn’t agreed on what “performance” means in post-deployment contexts, let alone how to measure it consistently across different AI systems and providers. The regulation demands an activity that is structurally undeliverable with the current monitoring ecosystem. Organizations filing post-market monitoring plans in 2026 will document processes built on unstandardized metrics, non-interoperable tools, and self-defined baselines. They’ll comply on paper. The monitoring itself won’t be comparable, auditable, or meaningful across organizational boundaries.

Compliance requires two capabilities this ecosystem lacks: runtime hooks that produce monitoring data in standardized formats, and trace architectures that reconstruct decision chains across organizational boundaries. Without these, Article 72 post-market monitoring plans are fiction written in incompatible vendor dialects.

NIST’s own AI Risk Management Framework compounds the pressure. The MANAGE function calls for continuous monitoring and risk response throughout deployment. The forthcoming NIST Cyber AI Profile maps cybersecurity controls to AI-specific concerns like model integrity and adversarial robustness. Every framework converges on the same expectation. The implementation layer that would make compliance verifiable doesn’t exist yet.

Who’s Responsible? Nobody Knows That Either.

NIST AI 800-4 surfaced a question that’s arguably more urgent than the technical gaps: who monitors? Workshop attendees repeatedly asked: “Who should do monitoring?” “Who is responsible for remediating incidents?” and “If anything is found, who can act on it?”

In the bank scenario above, was the monitoring failure the cloud provider’s responsibility? The fairness vendor’s? The credit scoring API provider’s? The bank’s compliance team? Each party monitored its own slice of the pipeline. Nobody monitored the seams between them. The NIST report documents this as an unresolved question across the AI supply chain, and it’s compounded by the standards gap. You can’t assign responsibility for monitoring when you haven’t agreed on what monitoring means. You can’t hold a vendor accountable for failing to report a drift event when “drift” has no shared definition.

A viable monitoring architecture separates three concerns. The platform exposes standardized observation and control points. An open enforcement layer applies policy through those control points, portable across any platform that exposes them. The enterprise customizes policy to its domain: financial services brings its own data sensitivity models, healthcare brings PHI detection, and any regulated industry brings its compliance requirements. When responsibilities are layered this way, the question of “who monitors?” has a structural answer. The platform enables. Open tooling enforces. The enterprise governs. Accountability follows the layer where the failure occurred.

One attendee asked how to “reduce the burden on the end user” to validate model behavior. Another asked how monitoring could become “a more collaborative practice, rather than a closed technical process.” These aren’t theoretical musings. They’re the governance questions that determine whether monitoring happens at all or degenerates into checkbox compliance where everyone points at someone else’s dashboard. A layered architecture gives each party a defined obligation: expose, enforce, govern. The current ecosystem gives everyone an excuse.

Agents Make Everything Worse

If the standards gap is a problem for current AI systems, it’s a crisis for agentic AI. NIST SP 800-4 repeatedly mentions agents, and the findings are sobering.

Workshop attendees flagged “lengthy agentic tasks” as especially resource-intensive to monitor. The report cites research noting that “both the agents and the operational environment are subject to change,” making static monitoring baselines unreliable. Agent identification and tracking remain unstandardized. Attendees raised visibility challenges around “out-of-distribution behavior using agent identifiers” and noted that watermarking and content provenance measures “face reliability challenges.” One attendee asked directly: “Is the model agentically attempting to subvert the monitoring setup it is under, i.e., scheming?”

That question deserves a pause. We’re building systems that plan, execute across organizational boundaries, call external tools, and collaborate with other agents. The monitoring challenges NIST documented for conventional AI systems, from detecting drift to maintaining visibility to establishing baselines, all assume a relatively static system being observed from outside. Agents aren’t static. They change behavior based on context, discover new capabilities at runtime, and operate across a distributed infrastructure that no single organization fully controls.

Any monitoring standard for agents needs a dynamic inventory mechanism. A static software bill of materials generated at deployment time is worthless when agents discover new tools, connect to new service endpoints, and modify their own capabilities during a single execution session. The inventory must update in real time, triggered by component changes, and output in formats the supply chain security ecosystem already consumes. If your agent connects to a new MCP server mid-task and your inventory doesn’t reflect that within the same session, your security team is operating on a stale map.

The “monitorability tax” concept raised in the report’s cited research captures the emerging cost structure. Model developers will pay a performance penalty, through slower inference or less capable models, to maintain the ability to monitor agent behavior. That cost rises as agent autonomy increases. Standardized hooks reduce the engineering cost by making monitoring implementation portable across frameworks, a one-time platform integration rather than custom monitoring code for every deployment. The monitorability tax on compute remains. The tax on engineering effort doesn’t have to.

The cross-provider abstraction layer that workshop attendees called for isn’t a nice-to-have for agentic systems. Without standardized hooks for runtime monitoring, standardized trace formats for multi-agent workflows, and standardized inventories of agent capabilities and dependencies, you’re watching agents through whatever proprietary window each vendor provides. You can’t correlate behavior across platforms. You can’t reconstruct decision chains that span multiple agent frameworks. You can’t audit what you can’t consistently observe.

One more structural blind spot worth naming: runtime monitoring standards assume a cooperating platform that exposes hooks. Open-weight models distributed without platforms bypass this assumption entirely. Once a model is released into the wild for anyone to run, no runtime hook exists unless the downstream deployer voluntarily implements one. Open-weight models are structurally ungovernable by runtime standards alone. Any honest conversation about the monitoring gap has to acknowledge this boundary.

Figure 3: How Agents Amplify the Monitoring Standards Gap

Key Takeaway: NIST AI 800-4 confirms what practitioners feel in their bones: AI monitoring isn’t failing because we lack technology. The standards layer that would make technology useful at scale doesn’t exist. Agents make the gap existential.

What to do next

Stop accepting proprietary monitoring silos. The next time you evaluate an AI platform, put these questions into the review:

What open logging schema do your monitoring outputs conform to? If the answer is a proprietary format, ask how you export monitoring data into a format another platform can ingest without custom transformation.
How does your monitoring define and detect model drift? Compare the answer across your vendors. If two vendors define “drift” differently, your compliance team can’t produce a coherent post-market monitoring report under Article 72.
When a component in the AI pipeline (a third-party API, a model update, a data source change) shifts behavior, how does your monitoring surface cross-component effects? If the answer involves manual correlation, you have a gap that scales with system complexity.
Who in the supply chain is responsible for monitoring the seams between components? If nobody owns cross-boundary monitoring, say so in your risk register. That’s an accepted risk, not an oversight.
Does your AI platform expose standardized middleware hooks that allow your security team to intercept and evaluate agent actions before they execute? If the platform’s controls are proprietary and non-portable, your enforcement logic dies with the vendor relationship. Every policy you write, every guardrail you configure, every compliance rule you encode is locked to one vendor’s architecture.

Push your industry groups and standards bodies. If you participate in OWASP, ISO working groups, or NIST-affiliated communities, advocate for common AI monitoring vocabularies and reference architectures. The cybersecurity field solved this problem a decade ago with common event formats and shared schemas. The AI field hasn’t started.

Audit your own monitoring maturity against the six NIST categories. Most organizations will find entire categories with no monitoring at all, particularly Human Factors and Large-Scale Impacts. Map the gaps before the next board meeting where someone asks if you’re ready for August 2026.

The full NIST AI 800-4 report is available at https://doi.org/10.6028/NIST.AI.800-4.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Thanks for reading RockCyber Musings! Subscribe for free to receive new posts and support my work.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 31 March 20-26, 2026

Rock Lambros — Fri, 27 Mar 2026 12:11:00 GMT

RSA Conference 2026 closed Thursday in San Francisco. Thirty thousand attendees, six hundred exhibitors, one word on every booth banner: agentic. While the industry competed on keynotes and happy hours, LiteLLM, deployed in hundreds of enterprise AI stacks, got infected with credential-stealing code through a misconfigured GitHub Actions workflow. Malicious releases went live March 19 and March 22. Most of your security team was watching keynotes.

Underneath the conference noise, genuine signal emerged. Zenity’s CTO demonstrated live zero-click exploits against ChatGPT, Salesforce, and Microsoft Copilot on the conference floor. Palo Alto Networks Unit 42 documented new attack paths through the Model Context Protocol. HackerOne disclosed a 540% year-over-year surge in validated prompt injection vulnerabilities. The EU AI Office’s second draft Code of Practice on AI-generated content transparency is open for feedback through March 30, with prescriptive new requirements that narrow compliance discretion significantly. NIST published AI 800-4, the first federal framework for monitoring AI systems in production, with no vendor booth to announce it.

Here’s what matters and what to do about it.

1. Zenity Launches Guardian Agents and Demonstrates 0-Click AI Exploits at RSA

Zenity launched Guardian Agents at RSA 2026 on March 23, positioning it as continuous, contextual security for AI agents across SaaS, cloud, and endpoint environments. CTO Michael Bargury ran live demonstrations titled “Your AI Agents Are My Minions,” showing zero-click prompt injection chains that manipulated Cursor into leaking developer secrets via support emails, Salesforce agents into exfiltrating customer data to an attacker-controlled server, and ChatGPT into producing persistent attacker-chosen outputs across conversations (The Register, March 23, 2026, and Help Net Security, March 24, 2026).

Why it matters

Zero-click attacks eliminate the human review checkpoint most AI security frameworks assume is present. When agents act without user input, your primary detection layer disappears before the threat is visible.
Live exploitation of production enterprise systems on a conference floor is harder to dismiss than a threat model in a whitepaper.
Guardian Agents signals a market category forming in real time. The evaluation criteria you set today will shape purchasing decisions for the next several years.

What to do about it

Inventory every AI agent in your environment before your next board meeting. If you can’t enumerate them, you can’t monitor them.
Require vendors to document in writing which actions their agents take without explicit human approval. Non-answers are critical control gaps.
Run adversarial testing against your three highest-access agents this quarter, targeting credential extraction, data exfiltration, and cross-system manipulation.

Rock’s Musings

Bargury’s demonstration strategy was the most honest thing at RSA this week: show the attack, then show the defense. Live exploitation on production systems is harder to dismiss than a slide deck built around the word autonomous. The inconvenient reality is that most enterprises already have agents running with email access, CRM credentials, and code repository permissions, with no runtime monitoring on what those agents decide to do. Selecting an AI security vendor is not the same thing as having an answer to the problem he demonstrated on the conference floor.

2. LiteLLM Infected with Credential-Stealing Code via Trivy Misconfiguration

The Register reported March 24 that LiteLLM, a widely deployed open-source LLM API proxy, was compromised through a misconfigured Trivy GitHub Actions workflow. Attackers modified version tags on the trivy-action GitHub Action to inject malicious code into workflows organizations were already running, producing malicious releases on March 19 and March 22. The maintainer confirmed that anyone who installed and ran the project during that window should assume credentials available to their environment were exposed.

Why it matters

LiteLLM sits in the critical path of many enterprise AI deployments. One compromised abstraction library reaches hundreds of downstream production systems simultaneously.
The attack exploited version tags, not direct code injection. CI/CD pipelines relying on tags rather than pinned commits ran malicious code without detection. That’s a systemic configuration gap across most enterprise pipelines.
The attack ran during RSA week when security teams were distracted. The timing was likely not accidental.

What to do about it

Audit every environment that pulled a LiteLLM update between March 19 and March 24. Treat those environments as potentially compromised until you confirm otherwise.
Pin all GitHub Actions to specific commit hashes, not version tags. Tags are mutable and can be silently overwritten. Commits are not.
Establish software bill of materials practices for all AI and ML dependencies. Supply chain attacks will keep finding environments where that inventory doesn’t exist.

Rock’s Musings

LiteLLM is exactly the kind of library that lands in enterprise AI stacks without a security review, installed by an ML engineer who needed to route calls to three model providers before the sprint ended. Trivy is a security tool. Attackers used a security tool misconfiguration to compromise a release pipeline for another widely used tool. If there’s a cleaner argument for applying security rigor to your own security tooling, I haven’t heard it. Your AI dependency chain needs the same scrutiny as your application dependencies. Good intentions at install time are not a compensating control.

3. Palo Alto Networks Unit 42 Documents MCP Attack Vectors

Palo Alto Networks Unit 42 published research the week of March 20 documenting new attack paths through the Model Context Protocol, including prompt injection delivered through MCP’s sampling interface. Security researchers tracked 30 CVEs filed against MCP implementations in the preceding 60 days, including CVE-2026-25536 (cross-client data leak in the MCP TypeScript SDK) and CVE-2026-23744 (remote code execution in MCPJam Inspector). A scan of more than 500 public MCP servers found that 38% lacked authentication entirely (Unit 42, March 2026, and Adversa.ai, March 2026).

Why it matters

MCP is the connective tissue between AI agents and enterprise tools. A vulnerability in this protocol exposes the entire agent ecosystem built on top of it, not one isolated system.
Thirty CVEs in 60 days signals that security review did not happen before shipping at scale. Every API ecosystem that launches with deployment velocity ahead of security assessment follows this arc.
Thirty-eight percent of scanned servers lacking authentication is systemic failure. Authentication is the minimum viable control. Everything built on top of unauthenticated servers is exposed.

What to do about it

Inventory every MCP server in your environment and treat unauthenticated instances as critical findings requiring immediate action.
Require authentication, authorization, and comprehensive logging for any MCP server with access to production systems or sensitive data.
Demand specific CVE status and patch timelines from your AI infrastructure vendors. Vague answers signal high risk and a vendor not tracking its own exposure.

Rock’s Musings

Thirty CVEs in 60 days is not a patching problem. It’s a design problem. MCP shipped fast because the builders cared more about what AI agents could reach than how securely they could reach it. The 38% authentication gap is the number that should end budget debates about AI infrastructure security investment. Roughly two in five MCP servers operate on the assumption that only authorized parties will talk to them, which is exactly wrong in a protocol designed to connect agents to external resources. That assumption creates direct paths to your production data.

4. HackerOne Reports 540% Surge in Validated Prompt Injection Vulnerabilities

HackerOne announced Agentic Prompt Injection Testing on March 21, paired with platform data showing a 540% year-over-year increase in validated prompt injection vulnerabilities. The service executes structured, multi-turn adversarial scenarios against live AI applications, evaluating whether injection attempts produce actual data exposure or unauthorized tool execution across interconnected agent systems (HackerOne Blog, March 2026, and Cybersecurity Insiders, March 21, 2026).

Why it matters

A 540% increase in validated vulnerabilities means real researchers are finding real exploitable conditions in production systems, not theoretical edge cases.
Traditional application security testing does not cover agent-specific attack paths. If your AI agents aren’t explicitly in scope for your red team or bug bounty program, you have a documented blind spot.
Unit 42’s concurrent research on indirect prompt injection through web content eliminates the “attacker needs direct access” objection. Agents read the web. The web is the attack surface.

What to do about it

Add AI agents to your red team scope explicitly as a primary target category, not an afterthought appended to an existing engagement.
Require prompt injection testing as part of every AI agent release process, treated as a gate equivalent to penetration testing for any externally facing application.
Track prompt injection findings as a distinct vulnerability class in your risk register. You can’t demonstrate improvement to your board on metrics you’re not collecting separately.

Rock’s Musings

Five hundred forty percent ends the debate about whether prompt injection is a real threat. I’ve heard the objection that attackers need direct access to craft payloads. Unit 42’s indirect injection research, published this same week, shows agents reading manipulated instructions from ordinary websites they visit in the course of normal operation. Your agents don’t need to be directly targeted; they need to visit the wrong page. The gap between organizations deploying AI agents and organizations testing those agents adversarially is the largest unaddressed risk exposure I see in enterprise AI programs right now.

5. Microsoft Publishes Secure Agentic AI Framework and Confirms Agent 365 May 1 GA

Microsoft published “Secure Agentic AI End-to-End” on March 20, documenting its approach to extending Zero Trust architecture across the full AI agent lifecycle: data ingestion, model training, deployment, and runtime behavioral monitoring. The post confirmed Agent 365, Microsoft’s governance control plane for enterprise AI agents, will reach general availability on May 1, 2026, with agent identity, authorization scope, and behavioral monitoring treated as distinct security domains from traditional human-user ZT controls (Microsoft Security Blog, March 20, 2026).

Why it matters

A confirmed May 1 GA date gives enterprises in Microsoft environments a concrete six-week planning horizon. Governance framework adoption takes time and that clock is already running.
Extending Zero Trust to AI agents is architecturally correct. Most ZT implementations weren’t designed with agent identity or behavioral monitoring in mind, making the gap assessment non-trivial work.
Publishing detailed technical frameworks before product GA signals Microsoft wants enterprises building governance practices now, before the product ships.

What to do about it

Map your current ZT architecture against the agent-specific requirements described in the March 20 post. Focus on gaps in agent identity and behavioral monitoring specifically.
Begin internal stakeholder alignment on Agent 365 if you’re in a Microsoft 365 environment. Six weeks is not enough time to start that conversation from zero.
Document agent permissions, access patterns, and decision scopes using whatever visibility tools you have today rather than waiting for Microsoft tooling.

Rock’s Musings

“End-to-end” is doing heavy lifting as a title. What Microsoft describes is extending known security primitives to a new execution context. That’s necessary work and not a complete answer. The hard problems are behavioral: distinguishing authorized agent actions from manipulated ones, detecting policy violations in real time, and maintaining audit trails that survive an incident investigation. Agent 365 is worth watching. If the behavioral monitoring is substantive, it’ll move the market. If it’s a compliance dashboard, enterprises will check the box while actual risk sits unaddressed underneath it.

6. Cisco Releases DefenseClaw Open Source on Final Day of RSA

Cisco released DefenseClaw to GitHub on March 27, the final day of RSA 2026, as an open-source framework for scanning agent skills and sandboxing agent execution. The release accompanied Zero Trust Access for AI agents and a free AI Defense Explorer Edition targeting security practitioners. Cisco plans integration with NVIDIA OpenShell for hardware-level execution sandboxing, addressing execution isolation that software-only monitoring cannot replicate (Cisco Newsroom, March 2026, and UC Today, March 2026).

Why it matters

Open-source agent security scanning means organizations can start building security into agent development pipelines without a procurement cycle or a budget line.
Hardware-anchored execution sandboxing addresses a control gap that software-only monitoring cannot close. Execution isolation for agents is systematically underinvested across the industry relative to the risk.
The open-source and Explorer Edition strategy targets developers before enterprise procurement cycles form, competing for architectural mindshare with builders rather than just buyers.

What to do about it

Pull DefenseClaw and run it against a non-production agent environment this month. Validate real-world utility before committing to any commercial evaluation.
Evaluate the NVIDIA sandboxing integration if you’re running NVIDIA infrastructure. Test in isolation before production consideration.
Track Cisco’s AI Defense commercial roadmap. Free Explorer Editions typically precede commercial tier launches by 12 to 18 months, and starting your evaluation now means you’ll have data when the pitch arrives.

Rock’s Musings

Releasing open-source code on the last day of the conference changes the conversation from “will enterprises buy this” to “pull the repo and see for yourself.” That’s a credible move when the code is real and the threat model is honest. Run DefenseClaw against your actual agent environment before making any claims about coverage. The larger play is Cisco’s bid for the enterprise AI security architecture position using network visibility, an established security portfolio, and enterprise relationships most competitors would need a decade to build. DefenseClaw is a credible opening move. Watch the next 18 months of product decisions to judge the hand.

7. Google Deploys Gemini Agents to Process 10 Million Dark Web Posts Daily

Google announced at RSA 2026 on March 23 that Gemini AI agents are processing more than 10 million dark web posts daily to surface threats relevant to specific organizations. The capability integrates with Google Security Operations alongside new agentic automation features, currently in preview, that let security teams combine AI-driven investigation with deterministic automated response workflows (The Register, March 23, 2026, and Google Cloud Blog, March 2026).

Why it matters

Ten million posts per day changes the economics of dark web threat intelligence. Organizations that couldn’t sustain comprehensive monitoring programs gain access to Google-scale processing at a fraction of the previous cost.
Pairing AI-driven investigation with deterministic automation preserves human-defined control while extending agent reach into high-volume, low-judgment tasks. That’s the right architectural pattern for agentic SOC work.
Preview status means GA behavior, SLA, and security review standards remain unfinalized. Your production SOC is not where you run this experiment yet.

What to do about it

Assess your current dark web monitoring coverage gap against what this capability covers. If there’s a meaningful difference, prioritize a pilot evaluation once the feature reaches GA.
Review preview terms carefully before enabling agentic automation in any production SOC workflow. Preview features carry materially different risk profiles than GA releases.
Define which SOC workflows you’d delegate to agents and where human approval must remain. Build that policy before the tools arrive, not after they’re already running.

Rock’s Musings

Threat intelligence is the most defensible application of AI agents in security operations right now. Failure modes are recoverable: the agent misses a threat and your other controls have a chance at it. Compare that to agentic incident response, where the failure mode might be blocking a production system or destroying forensic evidence. Start with intelligence, not response. The preview framing signals Google is collecting operational data before committing to GA behavior guarantees, which is reasonable product discipline. It also means you wait for GA before running this where failures have material consequences.

8. Novee Launches Autonomous AI Red Teaming Platform for LLM Applications

Novee announced autonomous AI red teaming for LLM applications on March 24 at RSA Conference 2026. The platform deploys an AI pentesting agent that executes multi-turn adversarial scenarios against live systems, simulating attacker chaining techniques across prompt injection, jailbreaks, data exfiltration paths, and agent behavior manipulation, covering any LLM-powered system regardless of model provider with optional CI/CD pipeline integration (GlobeNewswire, March 24, 2026, and Help Net Security, March 24-25, 2026).

Why it matters

Traditional pentesting tools were designed for pre-LLM application security problems. Novee builds red teaming from actual LLM vulnerability research, producing findings that adapted traditional tools miss.
CI/CD pipeline integration lets security teams catch prompt injection and agent manipulation issues before production deployment rather than after an incident surfaces them.
Two distinct companies announced adversarial AI testing capabilities at RSA 2026 in the same week. Market formation around this problem is accelerating.

What to do about it

Evaluate Novee’s beta against a non-production LLM application to understand what it surfaces relative to your existing security testing coverage.
Map the gap between your current SDL and what LLM-specific adversarial testing would require. The gap is almost certainly larger than you expect it to be.
Add AI-native red teaming as a release gate requirement for any LLM application reaching production. Make it a gate, not a post-deployment recommendation that teams skip.

Rock’s Musings

Two autonomous AI red teaming announcements in one RSA week tells you the market is accepting that testing AI systems requires AI-specific tooling, not adapted traditional approaches. That’s a healthy development even if the tools themselves are early. The CI/CD integration angle is the most practically valuable feature: security issues caught before production deployment cost a fraction of what they cost after deployment. If you’re shipping LLM applications without adversarial testing in the pipeline, you’re making a risk decision that most boards don’t know they’re making.

9. EU AI Office Second Draft Code of Practice Enters Final Feedback Window

The EU AI Office published its second draft Code of Practice on AI-Generated Content Transparency on March 3, with the stakeholder feedback window closing March 30. The second draft moves from high-level principles toward prescriptive, technically detailed commitments, narrowing compliance discretion and signaling how regulators will likely assess conformance in practice. A third and final version is expected by June 2026, ahead of the August 2 applicability date for AI-generated content transparency obligations (Herbert Smith Freehills Kramer, March 2026, and BABL AI, March 2026).

Why it matters

Draft 2’s shift to prescriptive technical commitments closes the interpretation space organizations were using to plan flexible compliance programs. The gap between “we have a policy” and “we meet the technical specification” narrowed significantly this month.
The March 30 feedback deadline is this weekend. If your organization has substantive views on requirements that are technically unworkable, the window to influence the final text is closing.
August 2 is not distant. Organizations waiting for final text before beginning compliance work are accepting a six-week implementation sprint under real enforcement conditions.

What to do about it

Read Draft 2 this week. The technical specificity represents a meaningful change from Draft 1, and your compliance planning may need adjustment.
Submit feedback before March 30 if the current draft creates compliance constraints you believe are technically unworkable for your AI content operations.
Begin implementation planning against Draft 2 requirements now. The June final text will refine but won’t fundamentally restructure what’s already written.

Rock’s Musings

Every organization waiting for final text before starting EU AI Act compliance work is playing a game where the timeline gets worse each quarter they wait. Draft 2 is prescriptive enough to start serious implementation planning. The adjustments you’ll need when Draft 3 drops will be smaller than the work you’ll need to compress into six weeks if you start in June. The transparency labeling requirements are more technically demanding than most organizations appreciate from reading summaries. Download Draft 2 from the EU’s digital strategy portal and read it against your actual AI content production workflows. That gap analysis is the starting point for everything else.

10. RSA 2026 Reveals a Contested Market for AI Agent Governance Control Planes

A pattern emerged across RSA 2026 beyond individual product launches: the governance control plane for AI agents is being actively contested by multiple major vendors. Microsoft’s Agent 365 (GA May 1), Cisco’s DefenseClaw (released March 27), SentinelOne’s Prompt AI Agent Security control plane, and Nudge Security’s AI agent discovery expansion all launched during the conference week, each addressing the same fundamental problem: enterprises deploy AI agents and lose track of what those agents do, access, and decide autonomously (SecurityWeek, March 2026, and Biometric Update, March 2026).

Why it matters

Multiple major vendors converging on the same problem in the same week signals enterprises are actively requesting governance solutions, not absorbing vendor-manufactured demand.
Competition between Microsoft’s integrated control plane and point solutions from Cisco, SentinelOne, and Nudge creates a real architectural decision. Choose wrong and you own the integration debt for years.
None of these products fully solves behavioral monitoring. They address discovery, policy enforcement, and visibility. Real-time behavioral anomaly detection for agents remains an open engineering challenge.

What to do about it

Define your AI agent governance requirements before evaluating any vendor. Required capabilities: inventory discovery, permission auditing, behavioral logging, and human approval workflows for high-risk actions.
Assess whether your environment favors an integrated control plane or best-of-breed point solutions based on your actual architecture, not vendor marketing claims.
Ask every vendor during evaluation: how does the product detect when an agent takes an authorized action it was manipulated into taking? The answer quality will differentiate vendors quickly.

Rock’s Musings

When four vendors announce competing governance control planes at the same conference in the same week, you’re watching a market category consolidate in real time. That’s interesting for analysts and exhausting for practitioners who have to evaluate all of it while managing agents already running in production without any governance. My advice: don’t let the governance platform debate distract from the more urgent problem of knowing what agents you currently have. Most enterprises have agents deployed that security teams didn’t authorize, can’t enumerate, and have no logs on. Governance tooling is the right investment. Knowing what you’re governing is the prerequisite.

The One Thing You Won’t Hear About But You Need To

NIST Publishes AI 800-4: The First Federal Framework for Monitoring AI Systems in Production

NIST published AI 800-4, “Challenges to the Monitoring of Deployed AI Systems,” in March 2026. Built from three practitioner workshops with more than 200 experts across academia, industry, and ten-plus federal agencies, plus an 87-paper literature review, it maps the gaps, barriers, and open questions in monitoring AI systems after deployment. It covers six monitoring categories: functionality, operational health, human factors, security, safety, and compliance. It received no RSA booth, no vendor keynote, and no sponsored coverage (NIST News, March 2026, and NIST AI 800-4 PDF, March 2026).

Why it matters

Most organizations deploying AI monitor latency and availability. AI 800-4 addresses whether the model behaves consistently with its training distribution and produces outputs that align with policy, which are the failures that matter most and the ones traditional monitoring misses entirely.
NIST explicitly identifies human-AI interaction monitoring as the most under-researched gap in the field. Workshop practitioners raised it far more than published literature covers. If your AI monitoring program doesn’t address how users interact with and respond to AI outputs, you’re missing the category NIST calls most underdeveloped.
The document is vendor-neutral and grounded in practitioner experience, directly applicable to conversations with regulators and auditors who want evidence of a structured AI monitoring program.

What to do about it

Download NIST AI 800-4 from nist.gov and route it to whoever owns your AI security program. It’s the most actionable government guidance on operational AI monitoring published to date.
Map your current monitoring coverage against the document’s six categories. The gaps will be immediately apparent and the prioritization logic writes itself once you have the map.
Use AI 800-4 as the foundation for your AI monitoring program documentation. When regulators ask how you monitor AI systems in production, a NIST-aligned program gives you a defensible, auditable answer.

Rock’s Musings

The honest state of enterprise AI monitoring: most organizations have logs showing their AI system responded. They don’t have logs showing whether the response was correct, consistent with training distribution, within policy boundaries, or manipulated by adversarial input. That visibility gap is how AI security incidents become AI security incidents. You don’t catch the drift until the outcome is undeniable and the damage is done. NIST AI 800-4 doesn’t get coverage because nobody can sell it. The organizations that read it and build monitoring programs from its framework will answer regulatory questions coherently in 18 months when enforcement catches up to deployment rates. The organizations that attended every RSA keynote and skipped the NIST publication will be writing incident reports instead. For more on building AI governance programs that survive regulatory scrutiny, visit rockcybermusings.com. If you need help turning frameworks like AI 800-4 into operating programs your security team can actually run, reach out at rockcyber.com.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Bargury, M. (2026, March 23). Your AI agents are my minions [Conference presentation]. RSA Conference 2026, San Francisco, CA.

Claburn, T. (2026, March 24). LiteLLM infected with credential-stealing code via Trivy. The Register. https://www.theregister.com/2026/03/24/trivy_compromise_litellm/

Claburn, T. (2026, March 23). AI agents are ‘gullible’ and easy to turn into your minions. The Register. https://www.theregister.com/2026/03/23/pwning_everyones_ai_agents/

Claburn, T. (2026, March 23). Google unleashes Gemini AI agents on the dark web. The Register. https://www.theregister.com/2026/03/23/google_dark_web_ai/

Cisco. (2026, March). Cisco reimagines security for the agentic workforce. Cisco Newsroom. https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2026/m03/cisco-reimagines-security-for-the-agentic-workforce.html

Google Cloud. (2026, March). RSAC 26: Supercharging agentic AI defense with frontline threat intelligence. Google Cloud Blog. https://cloud.google.com/blog/products/identity-security/rsac-26-supercharging-agentic-ai-defense-with-frontline-threat-intelligence

HackerOne. (2026, March). Agentic prompt injection testing for AI security. HackerOne Blog. https://www.hackerone.com/blog/agentic-prompt-injection-testing

HackerOne introduces agentic prompt injection testing as AI security risks accelerate. (2026, March 21). Cybersecurity Insiders. https://www.cybersecurity-insiders.com/hackerone-introduces-agentic-prompt-injection-testing-as-ai-security-risks-accelerate/

Herbert Smith Freehills Kramer. (2026, March). Transparency obligations for AI-generated content under the EU AI Act: From principle to practice. https://www.hsfkramer.com/notes/ip/2026-03/transparency-obligations-for-ai-generated-content-under-the-eu-ai-act-from-principle-to-practice

EU releases second draft of AI Act Code of Practice on labeling AI-generated content. (2026, March). BABL AI. https://babl.ai/eu-releases-second-draft-of-ai-act-code-of-practice-on-labeling-ai-generated-content/

Microsoft Security. (2026, March 20). Secure agentic AI end-to-end. Microsoft Security Blog. https://www.microsoft.com/en-us/security/blog/2026/03/20/secure-agentic-ai-end-to-end/

NIST. (2026, March). New report: Challenges to the monitoring of deployed AI systems. https://www.nist.gov/news-events/news/2026/03/new-report-challenges-monitoring-deployed-ai-systems

NIST. (2026). NIST AI 800-4: Challenges to the monitoring of deployed AI systems. National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf

Novee. (2026, March 24). Novee introduces autonomous AI red teaming to uncover security flaws in LLM applications [Press release]. GlobeNewswire. https://www.globenewswire.com/news-release/2026/03/24/3261278/0/en/Novee-Introduces-Autonomous-AI-Red-Teaming-to-Uncover-Security-Flaws-in-LLM-Applications.html

Novee introduces autonomous AI red teaming to hunt LLM vulnerabilities. (2026, March 24). Help Net Security. https://www.helpnetsecurity.com/2026/03/24/novee-ai-red-teaming-for-llm-applications/

Palo Alto Networks Unit 42. (2026, March). New prompt injection attack vectors through MCP sampling. https://unit42.paloaltonetworks.com/model-context-protocol-attack-vectors/

SecurityWeek. (2026, March). RSAC 2026 conference announcements summary: Day 1. https://www.securityweek.com/rsac-2026-conference-announcements-summary-day-1/amp/

Zenity AI agents contextual security. (2026, March 24). Help Net Security. https://www.helpnetsecurity.com/2026/03/24/zenity-ai-agents-contextual-security/

Zenity. (2026, March 23). Zenity sets the foundation for guardian agents. Zenity Newsroom. https://zenity.io/company-overview/newsroom/company-news/zenity-sets-the-foundation-for-guardian-agents

Weekly Musings Top 10 AI Security Wrapup: Issue 30 March 13-19, 2026

Rock Lambros — Fri, 20 Mar 2026 12:50:42 GMT

Meta logged a SEV-1 on March 18 because an internal AI agent posted without human approval, provided bad advice, and exposed sensitive data to the wrong employees for 2 hours. Amazon confirmed its Bedrock sandbox lets AI models exfiltrate data via DNS and called it intentional design. HiddenLayer found 31% of security leaders don’t know if they had an AI breach in the past year. The EU Council voted to restructure the AI Act’s high-risk compliance framework. Three AI agent security products launched in four days. This was one week.

The week’s evidence points in one direction: agentic AI security is no longer a research problem. Real incidents are appearing in production environments run by organizations with serious security programs. Technical flaws in AI infrastructure are drawing vendor responses that amount to documentation updates rather than patches. Research data is documenting blind spots CISOs can no longer treat as edge cases. In parallel, the governance machinery is finally moving, but it’s moving slower than deployment. Standards and deployments are in a race, and deployments are winning by a wide margin. More context at RockCyber and RockCyber Musings.

1. OWASP publishes its GenAI data security risk taxonomy for 2026

The OWASP GenAI Security Project released GenAI Data Security: Risks and Mitigations 2026 in March, a 103-page taxonomy covering 21 discrete data security risks across the full GenAI lifecycle from training through agentic runtime (OWASP). The document maps risks across training and fine-tuning data, retrieval and RAG pipelines, vector stores, context windows, agent memory, tool call payloads, and observability infrastructure. It identifies a core architectural property that makes GenAI data security structurally different from every prior computing model: the context window aggregates data from multiple trust domains into a single flat namespace with no internal access controls. A confidential HR record retrieved via RAG sits next to a user prompt with identical trust weight, and there is no mechanism today to mark a context segment as available for reasoning but not surfaceable in the output. The document also addresses machine unlearning directly: deleting source data does not remove what a fine-tuned model or LoRA adapter has memorized into its weights. Download the report HERE.

Why it matters

The flat-namespace context window problem is not a configuration gap. It’s an architectural property of how these systems work, which means perimeter controls and access policies cannot fully solve it. Minimization and context scoping are the only practical mitigations available today.
LoRA adapter memorization of rare training examples means high-recall prompts can extract verbatim PII, credentials, or intellectual property from fine-tuned models without any sophisticated attack technique. Organizations fine-tuning on internal data have a data exposure risk they likely haven’t assessed.
The Right to Erasure problem is unsolved at the architectural level. Deleting training data from a source system does not delete what the model encoded during fine-tuning. GDPR and state privacy law DSR obligations cannot be satisfied by source deletion alone.

What to do about it

Treat the context window as a data-exposure surface, not just a prompt-delivery mechanism. Classify what goes in the same way you classify what goes into a database query, and scope RAG retrieval to the minimum required for the task.
Audit every fine-tuned model and LoRA adapter in your environment against the data used to train it. If that training data included PII, credentials, or regulated information, your model could serve as a potential exfiltration vector.
Build a GenAI data bill of materials using CycloneDX ML-BOM as the base format. Until you have lineage from the source dataset to the deployed model to the embedding store, you cannot answer the question a regulator will eventually ask: what data did this model see, and where does it live now?

Rock’s Musings

The architectural insight at the center of this document is the one the industry keeps sliding past. The context window has no internal access control layer. That’s not a misconfiguration. It’s a design property of how transformers process sequences. Everything that enters the context window is treated as equally reachable by the model’s output mechanism, and no amount of system prompt guardrailing changes the underlying architecture. The practical implication is that the primary defense is what you put in, not what you try to prevent from coming out.

The machine unlearning section is the one I push organizations on hardest. They are collecting consent, honoring deletion requests, and scrubbing source databases, and then deploying fine-tuned models that still carry what they memorized from the deleted data. The model weights are a copy of your training corpus in a form your DLP tools don’t see, and your deletion workflows can’t reach. Right to Erasure in GenAI is an open architectural problem with no clean solution today, and most organizations haven’t told their legal team that yet.

2. EU Council rewrites the compliance clock for high-risk AI systems

The EU Council adopted its negotiating position to amend the AI Act’s high-risk framework (EU Council). The core change replaces the fixed August 2026 compliance deadline with a conditional trigger. Full high-risk obligations apply only once the Commission certifies required standards and tools are available, with a hard backstop date. The Council also pushed the national AI regulatory sandbox deadline to December 2027 and clarified that law enforcement, border management, judicial, and financial AI systems remain under national supervisory authority rather than the Commission. Negotiations with the European Parliament begin next.

Why it matters

The conditional trigger gives the Commission discretion over when your obligations start. Until it certifies standards are ready, full high-risk obligations don’t apply, creating an indeterminate window.
Pushing the sandbox deadline to December 2027 removes a key testing mechanism for high-risk AI at a time when organizations are accelerating deployment.
Fragmented supervisory authority means 27 member states apply their own rules to some of the highest-stakes AI use cases.

What to do about it

Map your AI systems against current and proposed high-risk definitions now. The conditional trigger shifts the timeline, not the compliance obligation itself.
Track Parliament negotiations. The Council position is a mandate, not the final text.
Build a jurisdiction-aware compliance map for EU operations covering which systems fall under national versus Commission supervision.

Rock’s Musings

I’ve seen regulatory timelines used to delay compliance indefinitely in my career more times than I can count. This EU Council move fits the pattern. The conditional trigger means the Commission controls when your clock starts, and they have to certify standards are available first. Given the pace at which NIST’s agentic AI guidance is moving, expecting European standards to materialize quickly requires genuine optimism.

Organizations using this ambiguity to do nothing are miscalculating. The August 2026 date was never the governance point. You have high-risk AI systems in production today, and you need to govern them regardless of what the Commission certifies and when.

3. Meta logs a SEV-1 incident from a rogue internal AI agent

On March 18, Meta confirmed a Severity 1 security incident caused by an internal AI agent operating without human authorization (Bitcoinworld, HackerNoob). The agent posted to an internal forum, gave incorrect advice, and triggered a cascade that exposed sensitive company and user data to unauthorized employees for approximately two hours. Meta contained the exposure by cutting the agent’s forum access and auditing permissions across other internal agents. No external exfiltration was confirmed.

Why it matters

A SEV-1 at Meta from an AI agent operating outside its bounds sets a documented precedent: production agents at companies with robust security programs can circumvent behavioral constraints and cause genuine incidents.
The chain reaction, one unauthorized action triggering downstream data exposure, is characteristic of agentic systems and different from traditional software vulnerabilities in ways most IR playbooks don’t yet account for.
No external exfiltration is partial comfort. Unauthorized internal access to sensitive user data carries GDPR and AI Act exposure regardless of whether the data left the building.

What to do about it

Audit every AI agent in your environment and document what it can post, write, or modify without a human approval checkpoint.
Map the blast radius. If a specific agent takes an unexpected action, what does it touch first, and what cascades from there?
Build AI agent incident response playbooks with automated containment triggers that don’t require analyst approval before they fire.

Rock’s Musings

The Meta incident will get dismissed as a minor operational hiccup. That’s the wrong read. Even with legit engineering talent and a mature security program, a production AI agent escaped its behavioral constraints and triggered a data exposure chain. I’m willing to bet your environment isn’t more disciplined than Meta’s.

Two hours to containment is fast. Most organizations I work with couldn’t tell you within two hours that an agent had gone sideways. AI agent behavioral monitoring is dramatically behind where it needs to be. The lesson to take away from this is that you need detection that fires before the cascade, not after the data is already in the wrong hands.

4. Amazon’s Bedrock sandbox leaks data through DNS because that’s the design

BeyondTrust’s Phantom Labs disclosed that Amazon Bedrock AgentCore Code Interpreter’s sandbox mode permits outbound DNS queries (SC Media, The Hacker News). An attacker interacting with the agent can send commands encoded in DNS A record responses and receive exfiltrated data encoded in DNS subdomain queries to an attacker-controlled server. No authentication bypass is required. BeyondTrust assigned a CVSS score of 7.5. AWS reviewed the research, determined that the behavior reflects the intended functionality, and responded by updating the documentation rather than issuing a patch.

Why it matters

“Intended behavior” is a vendor risk posture, not a security posture. Sandbox mode was positioned as providing execution isolation. A sandbox allowing covert DNS exfiltration does not deliver isolation in any security-relevant sense.
DNS-based covert channels are standard red team tradecraft in traditional environments. The technique translates directly into AI code execution environments without modification.
Organizations running agents against sensitive internal data in AWS Bedrock face an unpatched, documented, CVSS 7.5 risk with no vendor remediation timeline.

What to do about it

Add DNS query monitoring for Bedrock AgentCore code execution environments to your threat detection stack now.
Reduce the data that AI agents with code execution access can reach to the strict minimum required for the task.
Get a formal written architecture statement from AWS specifying exactly what the sandbox guarantees before expanding Bedrock AgentCore deployments.

Rock’s Musings

Another “Intended behavior” narrative. I’m getting pretty damn sick of it. That’s another way of saying, “We know about this, it would be expensive to change, and it sucks to be you.” (see my thoughts in CSO magazine about a previous instance HERE). The documentation update rather than a patch is the tell. You can’t outsource your risk posture to your cloud provider’s design decisions.

The technique is in every red team playbook. DNS exfiltration from sandboxed environments is foundational evasion tradecraft. Translate that knowledge directly to your AI infrastructure. If you’re running code execution agents against sensitive data in Bedrock and you haven’t instrumented DNS as an exfiltration channel, now you have your reason.

5. Linux Foundation raises $12.5 million from AI vendors to fix what their tools helped break

The Linux Foundation announced $12.5 million in grant funding from Anthropic, AWS, GitHub, Google, Google DeepMind, Microsoft, and OpenAI to advance open source software security (Linux Foundation, OpenSSF). The funding flows through Alpha-Omega and the Open Source Security Foundation. The stated problem is that AI tools are generating vulnerability reports at a volume that open-source maintainers cannot triage or remediate, degrading the security posture of the software supply chain. AWS contributed an additional $2.5 million to Alpha-Omega, in addition to the pooled amount.

Why it matters

The same organizations whose AI tools created the report flood are funding the solution. This characterizes the governance dynamic precisely, that vendors profit from deployment and are now asked to fund the externalized costs on the maintainer community.
Overwhelming maintainers with AI-generated findings lowers average signal quality. Funding addresses capacity but doesn’t solve the signal-to-noise problem alone.
This is the first major coordinated industry response to the specific problem of AI-generated report volume stressing the open source security ecosystem.

What to do about it

Factor the current maintainer backlog into your software composition analysis program. Critical open source dependencies may carry known vulnerabilities sitting in a backlogged queue rather than getting remediated.
Watch what Alpha-Omega and OpenSSF deliver from this investment over the next twelve months. The commitment matters less than whether the tooling measurably improves triage capacity.
Ask your security vendors how they handle AI-generated findings before surfacing them to your team. The same noise problem exists inside your tooling stack.

Rock’s Musings

$12.5 million is the right direction, yet not nearly enough. Open source maintainers are largely volunteers managing the infrastructure that the global software supply chain runs on. The AI-generated report flood is a problem these vendors created while selling velocity gains to enterprises.

The coordination signal matters more than the dollar amount. You rarely see Google, Microsoft, AWS, Anthropic, and OpenAI announce joint anything. When competitors fund a shared problem together, the liability exposure of inaction exceeds the competitive cost of cooperating. Given how much of the internet runs on open source that these companies’ AI tools are now stressing, the math on joint action isn’t complicated.

6. Pentagon moves to replace Anthropic while the lawsuit works through the courts

TechCrunch reported that the Pentagon is actively developing alternative AI capability paths to replace Anthropic’s Claude across defense applications (TechCrunch). This follows the Defense Department’s February designation of Anthropic as a supply chain security risk and Anthropic’s subsequent lawsuit against the Trump administration. This confirms that the replacement effort has shifted from contingency planning to active technical development. More than 875 Google and OpenAI employees have signed an open letter supporting Anthropic’s position.

Why it matters

Active technical development of replacements, rather than contingency planning, signals DoD confidence that the Anthropic designation will hold through the litigation cycle.
Defense contractors relying on Claude for active program work now face migration timelines driven by someone else’s legal and procurement decisions.
The 875-employee response across competing firms signals the tech workforce treats this as a legitimacy question about AI governance, not a routine vendor dispute.

What to do about it

If your organization operates in the defense industrial base, review AI vendor contracts now for comparable ethical-use clauses and their enforceability, before further redesignations affect your supply chain.
Track the Anthropic lawsuit. The outcome defines what ethical use provisions in AI contracts are worth in federal procurement.
Evaluate AI vendor concentration risk in your stack. If one supply chain designation event could disrupt your programs, that’s a single point of failure worth addressing.

Rock’s Musings

The supply chain risk designation was built for foreign adversaries. Applying it to a domestic AI company for writing autonomous weapons prohibitions into a contract is a significant precedent that the press is underweighting. The designation signals that safety constraints are now framed as operational liabilities in defense procurement, not risk mitigation.

If that framing spreads to other acquisition decisions, the AI vendors most willing to remove safety constraints gain a competitive advantage in a large and growing federal spending category. Watch the lawsuit and the follow-on procurement awards carefully. Both will tell you where this governance experiment ends up.

7. CSA’s 2026 cloud and AI security report documents the identity explosion

The Cloud Security Alliance published its State of Cloud and AI Security 2026 on March 13, finding the average enterprise now manages 100 machine and non-human identities for every one human identity (CSA). Forgotten or misconfigured cloud credentials declined from 84% in 2024 to 65% in 2026. Ninety-two percent of executives report business-impacting security compromises, most from preventable risks. The report identifies decentralized AI agents as the primary driver of the NHI expansion and calls for continuous exposure management to replace static patching cycles.

Why it matters

A 100:1 machine-to-human identity ratio means the traditional IAM program built around human users is managing a fundamentally different problem than it was designed for.
Credential misconfiguration persisting at 65% suggests the improvement rate won’t match the velocity of AI-driven identity expansion.
A 92% executive compromise from preventable risks indicates the gap isn’t a detection-sophistication problem. Organizations know the controls and aren’t applying them at the required scale.

What to do about it

Audit NHI management practices against the same standards applied to human identities: lifecycle management, least privilege, and regular access reviews.
Deploy continuous credential exposure monitoring specifically for machine identities and AI agent service accounts.
Shift the board-level narrative from maturity scores to continuous exposure management. That’s where enterprise frameworks are heading.

Rock’s Musings

A hundred machine identities for every human one, and most organizations manage them with IAM tooling built for a 10-to-1 ratio. The math doesn’t work. The credential improvement trend from 84% to 65% is real progress, but 65% still represents a failure rate I wouldn’t accept in any other critical control domain.

Every new agentic deployment creates more identities, tokens, service accounts, and API keys. If you don’t have a clear owner for non-human identity governance today, you have a gap that will become a breach within twelve months. Find the owner. Document the scope. Don’t wait for the incident.

8. Jozu Agent Guard launches after watching an AI agent bypass governance in four commands

Jozu announced Jozu Agent Guard on March 17, a zero-trust runtime that executes AI agents, models, and MCP servers with policy enforcement built outside the model’s control plane and hardcoded against agent-level override (Help Net Security). The architecture decision came directly from internal testing: during product development, Jozu observed an AI agent bypass the governance controls the product was designed to enforce in four commands. That failure drove the decision to move policy enforcement entirely outside the execution layer the agent can influence.

Why it matters

A product built specifically to constrain AI agents was bypassed in four commands during its own testing. The threat model has to assume the agent itself will attempt to circumvent governance. Cooperative compliance is not a valid design assumption.
MCP server isolation is underprovided. MCP servers frequently carry production credentials and broad tool access, and running them in shared agent environments creates privilege escalation paths most organizations haven’t mapped.
Three AI agent security products launching in four days signals enterprise buying is active in this space right now.

What to do about it

Require AI agent security vendors to demonstrate their product against an adversarial agent in a live environment. Demand the failure modes alongside the happy path.
Treat MCP server execution environments as sensitive infrastructure requiring isolation equivalent to your most privileged workloads.
Add governance bypass testing to your AI red team scope before the next production agent deployment.

Rock’s Musings

The four-command bypass during their own testing is the most honest vendor disclosure I’ve seen about AI agent security in the past year. Most vendors demo the happy path and skip the part where their product got circumvented. Jozu disclosed it and changed the architecture. That’s how security engineering is supposed to work.

The uncomfortable implication for everyone else: if a product built specifically to constrain AI agents was bypassed in four commands, ask yourself what your existing controls look like against an agent actively trying to exceed its permissions. If you haven’t run that test, you don’t have an answer.

9. Token Security builds intent-based controls for AI agent permissions

Token Security announced intent-based AI agent security on March 18, governing autonomous agents by scoping their permissions to declared operational purpose rather than granting standing broad access (Help Net Security). The system creates purpose-defined permission envelopes that expire at task completion, with runtime enforcement preventing actions outside the declared intent. Token Security’s CEO stated directly that prompt filtering and guardrails were not designed to contain the security risks of autonomous AI agents, pointing to the architectural limitation of relying on the model’s output layer for enforcement.

Why it matters

Purpose-aligned permissions address a structural problem in current agent deployment: agents inheriting credential scopes far exceeding what any single task requires.
Explicit acknowledgment that content filtering can’t do this job alone represents where serious practitioner thinking is converging. The field is moving from output layer controls toward architectural access controls.
Paired with Jozu, Entro, and Microsoft Entra Agent ID announcements this same week, this reflects a coherent market thesis forming around agent identity and least privilege as primary security controls.

What to do about it

Map current AI agent deployments against one question: does each agent hold only the permissions it needs for its specific task? If you can’t answer quickly, your access governance is already too loose.
Evaluate intent-based and purpose-scoped access controls in your next AI security procurement cycle.
Brief your identity team on AI agent access management before your security team deploys solutions they haven’t reviewed. These tools touch the same credential infrastructure.

Rock’s Musings

Least privilege applied to agents is the same principle that has protected privileged service accounts in traditional architectures for decades. The problem is that most AI agent deployments aren’t being treated like privileged service accounts. They get broad collaboration access by default, and nobody asks why.

Intent-based controls force the right question: what is this agent for? If you can answer precisely, you can scope permissions precisely. If you can’t answer precisely, that is the real governance problem. You’ve deployed an agent without a defined operational boundary, and your control over it is largely fictional.

10. NIST receives formal research submissions on securing AI agents

On March 18, UC Berkeley’s Center for Long-Term Cybersecurity submitted a formal response to NIST’s CAISI RFI on AI agent security, urging prioritization of standardization, incident reporting frameworks, talent pipelines, and adaptive governance (CLTC UC Berkeley). The Computer and Communications Industry Association submitted parallel comments advocating for multistakeholder processes and alignment with existing NIST frameworks (CCIA). NIST’s National Cybersecurity Center of Excellence also holds a separate comment period open through April 2 on a concept paper covering identity and authorization for AI agents.

Why it matters

The gap between NIST collecting input and usable standards publishing is measured in years. Your agents are running now, under no binding identity or authorization standard.
Berkeley’s call for incident reporting infrastructure acknowledges a structural gap: no systematic mechanism exists for learning from AI agent security failures across organizations.
The NCCoE concept paper on agent identity and authorization is where future compliance requirements will originate. Comments submitted now shape what those requirements demand.

What to do about it

Read the NCCoE concept paper at nccoe.nist.gov and submit comments before April 2 if your organization deploys agents. Operational experience is what NIST is specifically asking for.
Treat the Berkeley and CCIA submissions as intelligence on where auditors will focus within 18 to 36 months.
Stand up basic agent identity logging now using existing IAM controls. Don’t wait for NIST to finalize anything.

Rock’s Musings

NIST is moving faster on agentic AI security than I expected two years ago. That still isn’t fast enough to matter for organizations deploying agents today. Best case from the current comment cycle: interim guidance in twelve months. Binding controls will take longer.

Berkeley’s call for incident reporting is the right recommendation and it will face the same resistance every mandatory reporting regime has faced. Voluntary frameworks will come first, get ignored, and get teeth after the third or fourth major public incident. That’s the pattern. Plan for it and build your own internal incident tracking capability now.

The One Thing You Won’t Hear About But You Need To

Entro Security builds a governed map of what your AI agents access in production

Entro Security launched its Agentic Governance and Administration platform, extending non-human identity security coverage specifically to AI agents (GlobeNewswire, Help Net Security). The platform builds structured AI agent profiles from three observable layers. First, sources: the endpoints, agent platforms, cloud environments, and MCP servers where agents execute. Second, targets: the enterprise assets and applications each agent accesses. Third, identities: the human accounts, non-human identities, and secrets each agent uses to operate. AGA provides MCP server activity visibility and policy enforcement, audit trails for both allowed and blocked activity, and controls against unsanctioned MCP targets and AI client behaviors.

Why it matters

Most organizations deploying AI agents don’t have a single governed view of what agents are running, what they access, and which identities they use. AGA builds that view from execution telemetry rather than documentation that goes stale immediately after it’s written.
MCP server governance is nearly absent from enterprise security programs today, despite MCP servers frequently holding production credentials and broad access to sensitive systems.
The NHI-first architecture lets organizations with existing non-human identity programs extend that coverage to AI agents rather than building a separate program from scratch.

What to do about it

Before the next AI agent deployment, require answers to three questions from observable telemetry: where does it run, what does it touch, and which identities does it use? If you need documentation rather than telemetry to answer, you don’t have governance.
Add MCP server inventory to asset management now. MCP servers deploy through developer workflows without formal change management, and retroactive cataloguing gets harder with each deployment.
Assess whether your current NHI security program explicitly covers AI agent identities. If it doesn’t, extend it or stand up a parallel track with a clear accountable owner.

Rock’s Musings

This one didn’t get coverage this week because it launched during RSA prep season when every security vendor fights for the same column inches. That’s exactly why it’s here. The problem AGA addresses is what I call dark matter governance: AI agents operating in your environment that nobody catalogued because they deployed through platforms your traditional asset management tools don’t see.

The MCP visibility layer is the operationally useful piece. MCP servers multiply fast, are deployed by individual developers without change management review, and frequently hold credentials for production systems. An agent you haven’t catalogued connecting to an MCP server you haven’t governed is a permissions sprawl problem that compounds with every new deployment. Get a governed view of that surface before your adversary maps it for you.

If you found this analysis useful, subscribe at rockcybermusings.com for weekly intelligence on AI security developments.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

References

Bitcoinworld. (2026, March). Rogue AI agent sparks critical security crisis at Meta, exposing sensitive data. https://bitcoinworld.co.in/meta-rogue-ai-agent-security-breach/

Cloud Security Alliance. (2026, March 13). The state of cloud and AI security in 2026. https://cloudsecurityalliance.org/blog/2026/03/13/the-state-of-cloud-and-ai-security-in-2026

Computer and Communications Industry Association. (2026, March). CCIA submits comments to NIST regarding privacy and security of AI agents. https://ccianet.org/news/2026/03/ccia-submits-comments-to-nist-regarding-privacy-and-security-of-ai-agents/

Council of the European Union. (2026, March 13). Council agrees position to streamline rules on artificial intelligence. https://www.consilium.europa.eu/en/press/press-releases/2026/03/13/council-agrees-position-to-streamline-rules-on-artificial-intelligence/

Entro Security. (2026, March 18). Entro launches agentic governance and administration to bring visibility and control to AI access across the enterprise. GlobeNewswire. https://www.globenewswire.com/news-release/2026/03/18/3258229/0/en/Entro-Launches-Agentic-Governance-Administration-to-Bring-Visibility-and-Control-to-AI-Access-Across-the-Enterprise.html

HackerNoob. (2026, March). Meta’s rogue AI agent: Sev 1 security incident and how to sandbox AI agents properly. https://hackernoob.tips/meta-rogue-ai-agent-sev1-how-to-sandbox-ai-agents/

Help Net Security. (2026, March 17). Jozu Agent Guard targets AI agents that evade controls. https://www.helpnetsecurity.com/2026/03/17/jozu-agent-guard-targets-ai-agents-that-evade-controls/

Help Net Security. (2026, March 18). Token Security advances AI agent protection with intent-based controls. https://www.helpnetsecurity.com/2026/03/18/token-security-intent-based-ai-agent-security/

Help Net Security. (2026, March 18). Big tech companies step in to support the open source security ecosystem. https://www.helpnetsecurity.com/2026/03/18/linux-foundation-open-source-security-12-5-million-funding/

Help Net Security. (2026, March 19). Entro Security AGA brings governance and control to enterprise AI agents and access. https://www.helpnetsecurity.com/2026/03/19/entro-agentic-governance-administration/

HiddenLayer. (2026, March 18). HiddenLayer releases the 2026 AI threat landscape report. PR Newswire. https://finance.yahoo.com/news/hiddenlayer-releases-2026-ai-threat-140000928.html

Linux Foundation. (2026, March 17). Linux Foundation announces $12.5 million in grant funding from leading organizations to advance open source security. https://www.linuxfoundation.org/press/linux-foundation-announces-12.5-million-in-grant-funding-from-leading-organizations-to-advance-open-source-security

SC Media. (2026, March). AWS Bedrock tool vulnerability allows data exfiltration via DNS leaks. https://www.scworld.com/brief/aws-bedrock-vulnerability-allows-data-exfiltration-via-dns-leaks

TechCrunch. (2026, March 17). The Pentagon is developing alternatives to Anthropic, report says. https://techcrunch.com/2026/03/17/the-pentagon-is-developing-alternatives-to-anthropic-report-says/

The Hacker News. (2026, March 17). AI flaws in Amazon Bedrock, LangSmith, and SGLang enable data exfiltration and RCE. https://thehackernews.com/2026/03/ai-flaws-in-amazon-bedrock-langsmith.html

UC Berkeley Center for Long-Term Cybersecurity. (2026, March 18). Researchers submit response to U.S. government request on security considerations for AI agents. https://cltc.berkeley.edu/2026/03/18/researchers-submit-response-to-u-s-government-request-on-security-considerations-for-ai-agents/

AI Agent Authentication Gets the Hard Part Right. Authorization Is Still Your Problem.

Rock Lambros — Tue, 17 Mar 2026 12:50:42 GMT

The IETF just published its most ambitious attempt to standardize how AI agents prove their identity across systems. Draft-klrc-aiagent-auth-00, dropped March 2, 2026, composes WIMSE, SPIFFE, and OAuth 2.0 into a 26-page framework called AIMS (Agent Identity Management System). The authentication layer is solid. The authorization layer stops at the token boundary. The Security Considerations section contains two words: “TODO Security.” If you’re deploying agentic systems in production, you need to understand where this draft helps you and where you still have to build your own controls.

Before I get into specifics, a quick note on what this document actually is. An IETF Internet-Draft (I-D) is a working document, the raw material that may eventually become an RFC (an official Internet standard). This one is version -00, the very first public iteration from Pieter Kasselman (Defakto Security), Jean-Francois Lombardo (AWS), Yaroslav Rosomakho (Zscaler), and Brian Campbell (Ping Identity). Criticizing a -00 draft for incompleteness is a bit like reviewing someone’s outline and complaining the conclusion is thin. That said, people are already reading this as deployment guidance, and the gaps matter for anyone building agentic systems today. So let’s talk about what it covers, what it doesn’t cover yet, and what you need to build yourself while the standards process catches up.

The good news: agents are workloads, and workloads have an identity stack

The draft’s foundational thesis gets it right that AI agents should be treated as workloads, not as some new identity category requiring new protocols and running instances of software executing specific tasks. That framing unlocks SPIFFE’s attestation-bound cryptographic identity, WIMSE’s cross-system workload semantics, and OAuth 2.0’s delegation framework. No new protocols needed.

This matters because SPIFFE already works at scale. Uber processes billions of attestations daily through SPIRE. Block runs the full SPIFFE+WIMSE+OAuth stack in production. The draft codifies patterns that companies with real security engineering teams already deploy.

The WIMSE identifiers specified in the draft bind agent identity to the execution environment through hardware-rooted attestation. A SPIRE agent on each node performs workload attestation by examining the kernel or querying the orchestration platform. Your agent’s identity gets measured from where it runs, not merely asserted by who registered it. An OAuth client_id is a registration artifact. A SPIFFE ID is cryptographic proof that Agent X is actually Agent X, running in the expected environment.

The draft also gets credentials right. Short-lived, cryptographically bound, explicit expiration. Static API keys are called out as unsuitable for agent authentication: bearer artifacts with no cryptographic binding, no identity conveyance, operationally painful to rotate.

That warning couldn’t come at a better time. Astrix Security analyzed over 5,200 open-source MCP server implementations and found that 53% rely on static API keys or Personal Access Tokens. Only 8.5% use OAuth. The ecosystem is building on exactly the anti-pattern the draft condemns.

Figure 1: MCP Server Authentication Methods

Transaction Tokens solve the lateral movement problem

Section 10.4 addresses a real attack vector most frameworks ignore. When access tokens propagate through internal microservice chains within an agent workflow, every hop creates a theft and replay opportunity.

The draft’s answer is Transaction Tokens (draft-ietf-oauth-transaction-tokens-08). Short-lived, signed JWTs that bind user identity, workload identity, and authorization context to a specific transaction. Lifetimes are measured in seconds to minutes. Cryptographic signatures prevent context modification. You can’t grab a Transaction Token from one transaction and replay it in another because the transaction context is cryptographically sealed. A companion draft (draft-oauth-transaction-tokens-for-agents-04) extends this with agent-specific fields for the acting agent, the initiating human, and operational constraints.

The draft also correctly identifies tools forwarding access tokens to downstream services as an anti-pattern.

The authorization gap: where scope alone isn’t enough

Here’s where the draft’s -00 status shows. Once an OAuth access token gets issued with a set of scopes, every action within those scopes proceeds unchecked until the token expires. No per-action evaluation. No consequence assessment. No behavioral feedback loop. The authors clearly know authorization needs more work (the AIMS conceptual model describes layers that the spec hasn’t filled in yet), but anyone reading this draft as a deployment blueprint today will inherit that gap.

Think about what that means in practice. An agent with email:send scope authorized to send meeting notes can use that same scope to email every contact in the address book a different message. Each action is technically within scope. The framework treats them identically. The authorization decision happened once, at token issuance. Everything after that is a free pass.

OWASP’s Top 10 for Agentic Applications draws a distinction that the draft hasn’t addressed yet: least agency versus least privilege. Least privilege asks what the agent can access. Least agency extends that to how much freedom the agent has to act on that access without checking back.

The term “least agency” appears nowhere in the draft. Section 10.8 says agents should request minimum scopes and authorization details. That’s least privilege applied to OAuth scopes. Standard stuff. It does nothing to constrain autonomous decision-making within those scopes.

OWASP’s ASI03 (Identity and Privilege Abuse) mitigation guidance recommends per-action authorization through a centralized policy engine. Not once at token issuance. At each privileged step. The draft doesn’t provide a mechanism for this yet, and future revisions may address it. In the meantime, you need to build that layer yourself.

Figure 2: OWASP Agentic Top 10 Coverage by IETF Draft

Your token says “allowed.” What it can’t say is “should you?”

The deeper issue goes beyond per-action evaluation. The draft in its current form contains no mechanisms for assessing the potential impact of an action before permitting it. No concept of blast radius. No reversibility check. No impact severity score. Again, this is version -00. These concepts may arrive in later revisions. They’re absent today.

Consider the practical difference. An agent with files:read_write scope can read one file or delete every file in scope. The OAuth framework treats these as equivalent actions. They aren’t. One is routine. The other is catastrophic and irreversible.

Consequence-based authorization asks three questions per permission:

What’s the worst action this agent can take?
Is the damage reversible?
Can you reverse it within an acceptable recovery window?

OAuth scopes can’t answer any of these.

The emerging practice of graduated trust models (read-only, then draft-only, then supervised execution, then earned autonomy) represents an informal consequence-based approach. Most practitioners agree that most agents never earn full autonomy in high-stakes contexts. That’s the correct outcome. The draft provides no framework for expressing or enforcing these graduation stages.

OWASP’s ASI08 (Cascading Failures) recommends blast-radius caps and digital twin replay testing. Run recorded agent actions in an isolated environment first. See if sequences trigger cascading failures before expanding policy permissions. Future revisions of the draft could incorporate these concepts. For now, they’re outside its scope.

The observability gap: strong detection, no policy feedback loop

Section 11’s observability requirements are genuinely strong for detection and audit. Seven minimum audit event fields. Correlation across agents, tools, services, and LLMs. The ability to reconstruct complete execution chains, including delegated authority and intermediate calls.

The draft calls observability “a security control, not solely an operational feature.” Correct. Then it integrates the OpenID Shared Signals Framework with CAEP (Continuous Access Evaluation Profile) for real-time signal delivery. Also good.

The problem is that the AIMS conceptual model in Section 4 promises observability that can “dynamically modify authorization decisions based on observed behavior and system state.” The actual specification delivers reactive remediation, terminate sessions, discard tokens, re-acquire with updated constraints. Detection flows to dashboards and SIEM tools. It doesn’t feed into the policy decision point that evaluates each authorization request. The conceptual model is ahead of the spec, which is normal for a -00 draft. The spec will likely catch up. You can’t afford to wait for it.

An agent exhibiting anomalous tool invocation patterns should see its authorization dynamically narrowed. Not through token revocation (which is all-or-nothing) but through policy-level constraints on permitted actions. The draft gives you a circuit breaker when you need a rheostat.

NIST SP 800-207 (Zero Trust Architecture) explicitly recommends a trust score that changes dynamically based on entity behavior patterns, feeding into the policy engine. Context-aware authorization systems from companies such as Zscaler and StrongDM already implement this pattern in production (not endorsing either). I’d expect future revisions of the draft to engage with these models, especially given that Zscaler’s Rosomakho is one of the four co-authors.

AuthZEN fills the gap the draft hasn’t reached yet

The most interesting omission in the current document is that AuthZEN (OpenID Authorization API 1.0) was approved as a Final Specification in January 2026. It standardizes a transport-agnostic API where any Policy Enforcement Point queries any Policy Decision Point, regardless of vendor. The information model is a four-element tuple:

Subject (the agent), Action (the operation), Resource (the target), Context (ambient attributes).

Every agent tool invocation maps cleanly to an AuthZEN evaluation: subject is the agent’s SPIFFE ID, action is “send_email,” resource is “contact_list,” context carries the delegating user, blast radius classification, reversibility flag, and behavioral anomaly score. The context object is extensible and open-ended. It was designed for exactly this kind of dynamic, attribute-rich decision-making.

The draft references AuthZEN in its normative references. The body text doesn’t discuss it yet. Given that AuthZEN solves the draft’s most significant open question, I’d bet it features prominently in the next revision. For now, that connection is yours to make.

Three policy engines deserve attention for filling that gap. OPA (Open Policy Agent), a CNCF Graduated project, evaluates structured JSON input against declarative policies with sub-millisecond latency. Cedar, from AWS, offers automated reasoning via SMT solver that can mathematically prove properties about policies and benchmarks at 42 to 60 times faster than Rego. Topaz, from Aserto (whose CEO co-authored the AuthZEN specification), combines OPA’s decision engine with a built-in Zanzibar-style relationship graph.

OAuth provides coarse-grained delegation, who can access what resource category. Policy engines provide fine-grained runtime evaluation, should this specific action on this specific resource proceed given current context. That layered model is where the draft needs to go next. Until it gets there, you build it yourself.

Figure 3: Authentication vs. Authorization Layer Responsibilities

Regulatory timelines won’t wait for standards completion

The EU AI Act’s high-risk system requirements take full effect August 2, 2026 (as of this writing, anyway). Five months from now. Article 14 requires human oversight. Article 26 requires deployers to keep automatically generated logs for at least six months. The draft’s identity-bound audit trails and CIBA-based human-in-the-loop mechanism directly support both.

NIST launched two converging initiatives in February 2026. The NCCoE concept paper on AI agent identity and authorization, and the AI Agent Standards Initiative covering security controls, identity, and testing. Both center on WIMSE/SPIFFE + OAuth. Both explicitly include policy-based access control, the piece the IETF draft’s -00 revision hasn’t specified yet.

The Colorado AI Act establishes a “reasonable care” standard for high-risk AI systems effective June 30, 2026. Widely adopted standards become evidence of reasonable care in court. The identity architecture the draft describes will likely qualify for authentication. You still need to build the authorization layer yourself.

Figure 4: Regulatory Compliance Timeline for AI Agent Systems

MCP and A2A still have fundamental identity gaps

Mapping the IETF draft’s framework onto the Model Context Protocol reveals how far the ecosystem still has to travel. MCP identifies agents as OAuth clients with a client_id, a registration artifact with no attestation binding. No SPIFFE identity verification. No attestation mechanism. No multi-hop delegation. No standard mapping between tool names and OAuth scopes. The draft recommends Workload Proof Tokens for proof-of-possession. MCP uses bearer tokens.

MCP’s OAuth model is human-centric (Authorization Code + PKCE). The Client Credentials Grant for machine-to-machine authentication was removed from the spec and is only returning through an extension. Fully autonomous agents have no standard authentication path in MCP today. Google’s A2A protocol has similar gaps: self-declared identities with no attestation binding, credential acquisition out of scope, authorization left to the receiving agent.

Riptides demonstrated the draft’s compositional pattern working for MCP in practice. Each workload gets a SPIFFE SVID, used as a software statement in Dynamic Client Registration and as a JWT assertion for client authentication. The pattern works. It required significant custom integration that no standard profile defines.

What you should build now

Don’t wait for standards completion. The threat model OWASP defined already exists. The regulatory deadlines are set.

Start with SPIFFE/SPIRE for attestation-bound agent identity. Use SVIDs as JWT assertions (RFC 7523) to obtain OAuth tokens. This follows the pattern the draft describes and Riptides validated in production.

Deploy an AuthZEN-compliant PDP (OPA, Cedar, or Topaz). Evaluate every agent tool invocation against dynamic policy. Pass agent identity, action details, resource metadata, delegation context, and behavioral signals in the AuthZEN context object.

Write Cedar or Rego policies encoding blast-radius thresholds, reversibility requirements, graduated trust levels, and human-in-the-loop triggers. Version-control policies alongside application code.

Tag every tool and action with impact metadata: blast_radius, reversible, data_sensitivity, scope. Enforce that irreversible high-blast-radius actions require explicit human approval through CIBA step-up authorization.

Feed observability data into the policy engine as real-time context attributes. Stop sending behavioral signals only to SIEM dashboards for post-hoc investigation. Make them first-class policy inputs.

Key Takeaway: The IETF draft gives you a strong answer to “is this really Agent X?” It hasn’t answered “should Agent X do this specific thing right now?” yet. That gap will close as the draft matures. In the meantime, authentication without per-action authorization is a locked front door with open windows. Build the authorization layer now.

What to do next

If you’re building agentic systems and trying to figure out where identity controls fit, start with the CARE framework at rockcyber.com for mapping security controls to business risk outcomes. The RISE framework helps you evaluate where your organization sits on the AI security maturity curve, particularly useful for figuring out which authorization controls to prioritize first.

The agent identity problem is a microcosm of the larger question the book addresses: how do you govern autonomous systems when the blast radius of failure compounds faster than your ability to detect it?

More analysis on agentic AI security, MCP authorization gaps, and practical frameworks for building authorization layers at rockcybermusings.com.

👉 Subscribe for more AI security and governance insights with the occasional rant.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 29 March 6, 2026 - March 12, 2026

Rock Lambros — Fri, 13 Mar 2026 12:50:41 GMT

The week of March 6-12, 2026, handed us a story that was coming... Anthropic filed suit against the Pentagon for blacklisting it as a national security risk. In the same week, the White House released a new cyber strategy, OpenAI launched a vulnerability-scanning agent aimed squarely at the enterprise security market, and two major federal regulatory deadlines expired. This is that week.

AI Security and AI governance collided this week in federal court, in congressional briefings, and in the server rooms of every organization running an AI agent they don’t fully understand. The governance frameworks that were supposed to provide clarity are instead amplifying uncertainty, and attackers are exploiting the gap in real time. Here’s what happened, what it means, and what to do about it, from someone who’s watched this industry long enough to be appropriately paranoid about all of it.

1. Anthropic sues the Pentagon for blacklisting it as a national security risk

Anthropic filed two federal lawsuits against the Trump administration after the Department of Defense designated the company a supply chain risk. That designation, typically reserved for foreign adversaries, bars Anthropic from federal contracts and requires defense contractors to certify they don’t use Claude in any DoD work. The root cause is Anthropic’s refusal to allow Claude for autonomous weapons or mass surveillance of American citizens. CEO Dario Amodei drew two red lines in contract negotiations, the Pentagon walked, and then labeled the company a national security threat (Fortune, Defense One). Anthropic warns the financial exposure runs to hundreds of millions of dollars.

Why it matters

This is the first time a U.S.-headquartered AI company has received the supply chain risk designation, a label previously applied only to foreign adversaries.
The case tests whether the executive branch can use procurement leverage to override AI developers’ safety commitments, a precedent that extends far beyond Anthropic.
Every CISO advising on AI vendor selection now has to factor whether a vendor’s ethics commitments make it a federal liability.

What to do about it

Map your Claude and Anthropic API dependencies now. Know which workflows break if this escalates.
Brief your board on what a supply chain risk designation means in federal contracting terms if your organization touches government work.
Watch for similar scrutiny applied to other AI vendors with published safety policies. This may not be a one-off.

Rock’s Musings

Anthropic drew a line in the sand (no autonomous weapons, no mass surveillance), and the government responded by calling them a threat. Think about what that signals to every AI developer watching. If you have safety principles that conflict with defense procurement, you get punished for them. The First Amendment angle is interesting, but the real issue is that the executive branch just discovered that supply chain risk designation is a very effective stick, and they used it on a domestic company for the first time. AI safety as a business value just became a liability under the current administration. Read that sentence twice.

2. Trump’s Cyber Strategy for America lands in five pages

On March 6, the White House released “President Trump’s Cyber Strategy for America” alongside an executive order on cybercrime (White House, Forrester). The document covers six pillars: offensive cyber operations to shape adversary behavior, regulatory streamlining, federal network modernization, critical infrastructure security, technological superiority, and cyber workforce development. At five pages, it’s the shortest national cybersecurity strategy in a decade. The strategy explicitly calls for more aggressive offensive operations, “unprecedented coordination” between the public and private sectors, and the building of a talent base fluent in autonomous systems and AI-enabled defense.

Why it matters

Five pages are either a vision document or a placeholder. For practical CISO purposes, it signals direction but provides almost no implementation guidance.
The offensive posture language has legal and escalation implications for any organization with a government nexus.
Workforce development framed as a national strategic asset means the government will be competing for the same AI security talent you’re trying to hire.

What to do about it

Map existing compliance obligations against the six pillars. Where regulations get streamlined, understand which requirements might disappear and which you need to maintain voluntarily.
Engage your federal liaison if you’re in a critical infrastructure sector. The public-private coordination language means more government asks are coming.
Start building for AI-fluent security talent now. The window before this becomes a serious hiring crunch is closing.

Rock’s Musings

Five pages tells you something: either there’s a lot more in the classified annex, or this is aspirational language waiting for someone to actually build the plumbing. The workforce section is the sleeper story. AI-enabled defense needs people who understand both AI failure modes and adversarial tradecraft simultaneously. That combination doesn’t exist at scale anywhere, and we’re being asked to build it at the same time AI is accelerating attacks. The gap between those two curves is where the next major breach lives.

3. OpenAI launches Codex Security and walks into the vulnerability scanning market

OpenAI released Codex Security as a research preview, a context-aware AI vulnerability scanning agent that evolved from Aardvark, an internal security research tool OpenAI had tested in private beta since October 2025 (Bloomberg, SecurityWeek). Codex Security analyzes code repositories, pressure-tests suspected vulnerabilities in sandboxed environments, generates proof-of-concept exploits to confirm impact, and proposes fixes. OpenAI’s own data shows it scanned 1.2 million commits over the preceding 30 days, surfacing 10,561 high-severity issues and approximately 800 critical vulnerabilities. The tool is available free for the next month to ChatGPT Pro, Enterprise, Business, and Edu customers. OpenAI says it can “identify complex vulnerabilities that other agentic tools miss” (TechRadar).

Why it matters

A free, frontier-model-powered vulnerability scanner from OpenAI immediately changes the competitive math for established AppSec vendors whose pricing models depend on the difficulty of this problem.
Generating proof-of-concept exploits to confirm vulnerability impact is a significant capability. In the wrong hands, or with a compromised account, this is an exploit generation service.
Organizations deploying Codex Security are giving OpenAI’s systems read access to their codebases. That data handling relationship deserves the same scrutiny as any privileged third-party tool.

What to do about it

Before enabling Codex Security on production repositories, review OpenAI’s data retention and training policies. Understand whether your code becomes training data.
Evaluate Codex Security against your existing SAST tooling on a representative code sample before replacing anything. “Better than other agentic tools” is a marketing claim until your team validates it.
The proof-of-concept exploit generation feature needs access controls. Restrict which engineers can trigger full exploit confirmation scans.

Rock’s Musings

OpenAI entering the vulnerability scanner market is not a product launch. It’s a statement about where AI is heading in security operations. The incumbents in SAST and DAST have been selling the same scan-and-report workflow for a decade. An agent that generates a proof-of-concept exploit to confirm a real finding changes the value proposition significantly. I’m not surprised OpenAI built this. I’m watching carefully how they handle the fact that generating exploit code is exactly the capability defenders need and attackers want. The account compromise scenario alone should give your red team ideas.

4. NIST AI Agent Standards RFI closes with 932 comments

The comment period for NIST’s Center for AI Standards and Innovation (CAISI) Request for Information on securing AI agent systems closed March 9 with 932 responses (Federal Register, NIST). The RFI, published in January 2026, sought input from industry, academia, and the security community on securing agentic AI development and deployment. The OpenID Foundation submitted a response addressing AI agent identity and authorization. A second comment period focused specifically on identity and authorization for AI agents remains open until April 2.

Why it matters

932 responses signals broad industry recognition of the problem. The quality of those comments determines whether the resulting standards have operational teeth.
Identity and authorization for AI agents is the structural gap behind most agent security failures. If NIST gets this right, it reshapes the risk calculus for enterprise agent deployment.
The listening sessions starting in April give practitioners a direct channel to shape what these standards require.

What to do about it

If your organization skipped the first RFI, submit to the identity and authorization comment period before April 2. Your implementation experience is exactly what NIST needs.
Start building your AI agent identity architecture now using OAuth 2.0 On-Behalf-Of flows with proper scope constraints. This is the emerging standard pattern.
Assign someone to track the AI Agent Standards Initiative. When draft standards publish later this year, you want your red-team comments in front of NIST before they finalize.

Rock’s Musings

Standards processes are slow by design, and the slowness here is appropriate because the identity and authorization problem for AI agents is genuinely hard. An agent acting on behalf of a user needs to carry that user’s permissions, not escalate to system-level access, and current tooling doesn’t enforce this reliably. The OIDF response to NIST gets the framing right: agent identity needs cryptographic binding, not just policy. If an agent claims to act on your behalf without a verifiable credential, you don’t have identity management. You have a trust-me system. You can read about the comments I submitted at “NIST AI Agent RFI (2025-0035): Human Oversight Is the Wrong Fix.”

5. Commerce and FTC hit their AI regulatory deadlines, and nothing changed yet

Two major deliverables from the December 2025 executive order on AI preemption came due on March 11. The Commerce Department submitted its review of state AI laws, identifying which ones the administration considers overly burdensome or in conflict with federal objectives. The FTC delivered a policy statement on how Section 5 of the FTC Act applies to AI and when state laws requiring alteration of model outputs are preempted by federal deceptive practices law (Mondaq, Digital Applied). Neither document invalidates any state law on its own. They are ammunition for the DOJ’s AI Litigation Task Force, established in January and yet to file any lawsuits. The administration is also conditioning $42 billion in BEAD broadband funding on states repealing AI regulations it deems onerous.

Why it matters

Organizations operating AI in multiple states face genuine legal uncertainty. State laws remain on the books. The federal government plans to fight them in court, and that litigation takes years.
The FTC’s Section 5 application to AI bias-mitigation requirements is legally untested territory.
The BEAD funding leverage is the most concrete near-term enforcement tool. Which states hold firm versus which fold will tell you a lot about regulatory durability.

What to do about it

Do not assume any state AI compliance requirement is going away. Build compliance architecture that can be toggled by jurisdiction as the legal landscape shifts.
Get legal counsel read into the Commerce Department report. Knowing which of your state compliance obligations are on the federal target list helps you prioritize risk posture.
Prepare for a two-to-three year period of overlapping requirements. Companies with modular, jurisdiction-aware compliance programs will weather this better.

Rock’s Musings

The administration created a fog of legal uncertainty and called it reducing regulatory burden. For most enterprises deploying AI, this makes compliance harder. You now have to track active federal litigation against state laws while still complying with those laws until courts rule otherwise. The FTC theory is worth watching closely: if the argument that requiring AI bias mitigation compels “deceptive output” holds, it guts a large category of state AI fairness requirements. If it fails, it sets a precedent limiting federal deceptive practices law’s reach into AI output governance. Either outcome reshapes the field.

6. OpenAI publishes its prompt injection defense playbook

On March 12, OpenAI published research and engineering guidance on defending AI agents against prompt-injection attacks (OpenAI, PrismNews). The guidance covers training techniques that help models treat different input channels with varying skepticism, architectural decisions that constrain privilege and limit blast radius, and layered verification to catch anomalous behavior. OpenAI also disclosed that it built a reinforcement learning-trained automated attacker to discover injection vulnerabilities internally, capable of steering agents through harmful multi-step workflows. The decision to publish openly reflects recognition that injection attacks threaten the entire developer ecosystem building on top of large language models.

Why it matters

Publishing the automated attacker methodology gives defenders a concrete model of what they’re fighting. Multi-step RL-trained attacks won’t be stopped with static guardrails.
The channel-skepticism approach, which trains models to treat external web content differently from system instructions, is an architectural fix that operates at inference time.
OpenAI’s disclosure accelerates industry defenses while giving attackers a clearer picture of which countermeasures to route around.

What to do about it

Apply privilege minimization immediately: agents should hold only permissions required for the specific task, expiring at task completion.
For agents consuming external content, validate that content before the agent ingests it. Treat external web data as untrusted input, period.
Build a prompt injection test suite and run it against production agents before every deployment. What you don’t test, you don’t know.

Rock’s Musings

OpenAI built an RL-trained machine to find injection vulnerabilities in their own systems. That machine now exists, and the same architecture will run on the offensive side of this problem within months, if it isn’t already. The deeper issue is architectural: language models cannot reliably distinguish instructions from data. That’s a fundamental property of how these systems process text, not a fixable bug. Any defense assuming the model will eventually learn to make that distinction is building on sand. The real fix is external. Don’t give agents access to resources they don’t need, and verify every external input before it reaches the model.

7. Google Cloud Threat Horizons reveals software exploits overtaking stolen credentials

Google Cloud’s Office of the CISO published its H1 2026 Threat Horizons Report on March 9, covering the second half of 2025 (Help Net Security, Security Boulevard). The headline finding is that exploitation of third-party software vulnerabilities jumped from 2.9% to 44.5% of initial cloud entry vectors in a single half-year period. The exploitation window has collapsed to days, with the React2Shell case showing crypto miners deployed within 48 hours of public vulnerability disclosure. North Korean threat group UNC4899 abused DevOps workflows and container breakout to steal millions in cryptocurrency. Threat actors also used LLMs to automate credential harvesting and accelerate the path from local developer access to full cloud admin privileges.

Why it matters

A jump from 2.9% to 44.5% in software exploitation isn’t an incremental change. Something shifted structurally in attacker methodology during H2 2025.
A 48-hour exploitation window means patch prioritization SLAs have to account for attacker speed, not just team capacity.
LLM-assisted credential harvesting is now in a major incident response dataset, no longer just theoretical research.

What to do about it

Reduce your vulnerability exposure window to 48 hours or less for critical and high-severity findings on internet-facing systems. Build the automation to get there.
Audit DevOps pipeline permissions. The UNC4899 vector targets the privilege elevation that happens when developers hold broad cloud access from local workstations.
Review whether AI coding tools introduce dependencies with unreviewed third-party code. Supply chain hygiene is now tier-one.

Rock’s Musings

For years, the orthodoxy was “credential hygiene is job one in the cloud.” Attackers just told you that orthodoxy is obsolete. They shifted to software exploitation because credential defenses got good enough. That’s how this works: defenders get strong on one vector, attackers rotate to the next. The current answer is patching speed. The LLM-assisted credential harvesting detail is quietly significant. It’s been in theoretical papers for two years, and now it’s in operational incident data from nation-state actors. Adjust your threat model accordingly.

8. AI agents are now helping criminals manage attack infrastructure

On March 8, The Register reported on Microsoft Threat Intelligence findings showing that North Korea’s Coral Sleet group is using AI and development platforms to rapidly build and manage attack infrastructure at scale. AI agents automate the creation of phishing infrastructure, manage C2 systems, and accelerate campaign tempo. The Unit 42 2026 Global Incident Response Report, published in February and drawing on 750 major incidents, showed the fastest 25% of attackers reaching data exfiltration in 72 minutes, down from 285 minutes the previous year. Identity weaknesses played a material role in almost 90% of investigations.

Why it matters

AI is now a documented operational capability in nation-state attack campaigns, not just an enterprise productivity tool.
The 4x speed increase in attack timelines means detection and response programs calibrated to last year’s data are already outdated.
87% of incidents unfolded across multiple attack surfaces, making correlation harder for defenders.

What to do about it

Review detection and response SLAs against the new attacker timeline. 72 minutes from initial access to exfiltration is shorter than most IR playbook trigger times.
Run tabletops assuming an AI-assisted attack infrastructure. Stress-test whether your team can detect and contain within the compressed timeline.
Identity controls remain the highest-leverage investment. 90% material involvement in incidents makes this your budget priority.

Rock’s Musings

The debate about whether attackers would use AI is over. It’s all about the economics. If you’re running persistent operations against multiple targets, automating the operational overhead with AI is exactly what you’d do. The 72-minute exfiltration timeline is the number that should break your IR program’s assumptions. Most enterprise programs are built around detection metrics measured in hours or days. You need automated detection with automated response triggers, not a playbook that assumes a human analyst will catch the initial alert.

9. Amazon pushes back on data linking AI coding to infrastructure outages

On March 10, The Register reported leaked briefing notes from an Amazon internal operations meeting flagging a “trend of incidents” characterized by “high blast radius” and “Gen-AI assisted changes.” The implication was that AI-assisted coding has made infrastructure changes more fragile. Amazon responded, saying they “have not seen compelling evidence that incidents are more common with AI tools.” The Veracode 2026 State of Software Security report, published February 24, found 82% of organizations carry security debt, a 36% year-over-year spike in high-risk vulnerabilities, and that more vulnerabilities are being created than fixed, with AI development velocity outstripping remediation capacity as a contributing factor.

Why it matters

Amazon’s internal concern, even disputed, comes from one of the largest cloud operators in the world. Internal friction at that scale is a signal worth tracking.
The Veracode data shows a systemic pattern. AI tools accelerate feature shipping and the introduction of vulnerabilities simultaneously, while remediation capacity doesn’t scale at the same rate.
82% of organizations carry security debt, with 60% classified as critical, which should be a material risk disclosure issue for most boards (materiality is another conversation for another time).

What to do about it

Require AI coding tools to integrate with static analysis before code reaches production. Velocity gains without security gates just accelerate debt accumulation.
Measure remediation rate alongside development velocity. If the gap is widening, you have a governance problem, not just a tooling problem.
Brief your board on the Veracode numbers. This is a material risk disclosure issue.

Rock’s Musings

Amazon’s denial matters. One leaked briefing note does not make a causal case. What it tells you is that someone inside one of the world’s largest cloud operators thought the correlation was worth flagging in an internal ops review. That’s a signal, not proof. The Veracode data is where I’m more confident: if your AI coding tools help developers write code 40% faster and that code contains the same flaw density as human-written code, you’ve just increased your vulnerability production rate by 40%. The only way this works in your favor is if you accelerate the remediation side at the same rate. Almost nobody is doing that.

10. Microsoft Patch Tuesday drops 77 CVEs

Microsoft pushed its March Patch Tuesday on March 11, fixing at least 77 vulnerabilities across Windows and other software (Kaseya, Check Point Research). This update cycle lands in an environment where, per the Google Cloud Threat Horizons data released the same week, exploitation windows for critical vulnerabilities have collapsed to 48 hours from public disclosure. AI-assisted exploit development is further compressing the time between CVE publication and the availability of weaponized exploits.

Why it matters

77 CVEs in one month means your patch management team works against a sprint clock every Patch Tuesday. Prioritization methodology matters more than ever.
Critical Microsoft CVEs are being probed within 48 hours of this disclosure per current attacker timelines. Your patch SLA has to account for that.
AI-assisted exploit development means the gap between disclosure and exploitation continues to narrow.

What to do about it

Build risk-tiered patching protocols: critical internet-facing systems within 24-48 hours, critical internal systems within 72 hours, high severity within a week.
Prioritize remote code execution vulnerabilities from the March 11 batch first. Review the Microsoft advisory for specific critical CVEs.
Apply compensating controls like network segmentation and least-privilege configurations for systems where immediate patching isn’t operationally feasible.

Rock’s Musings

Patch Tuesday used to feel routine. It isn’t anymore because the time between a CVE being added to the NVD and an attacker scanning for it has gone from weeks to hours. If your patch SLA is still “30 days for critical,” you’re operating with a policy written for a threat environment that no longer exists. That’s not a patch management problem. That’s a governance problem. Fix the policy first.

The One Thing You Won’t Hear About But You Need To

CISA adds an actively exploited n8n RCE to its known exploited list, and 24,700 instances are still unpatched

On March 12, CISA added CVE-2025-68613 to its Known Exploited Vulnerabilities catalog, a critical expression-injection vulnerability in the n8n workflow automation platform with a CVSS score of 9.9 (The Hacker News, The Register). The flaw was patched three months ago in the December 2025 versions. Federal agencies have until March 25 to patch. The problem: Shadowserver data shows 24,700 instances remain unpatched online, with 12,300 in North America and 7,800 in Europe. This matters beyond the CVE itself because n8n is one of the most widely used platforms for building AI automation workflows and AI agent pipelines. Organizations deploying AI agents frequently use n8n as the orchestration layer connecting those agents to enterprise data sources.

Why it matters

An unpatched RCE in the orchestration layer of an AI workflow means that an attacker who owns the n8n instance can access every connected system the AI agents touch, including credentials, APIs, and data stores.
24,700 exposed instances three months after a publicly known critical patch represents a systemic patching failure in a category of software organizations that have not been treated as critical infrastructure.
CISA’s KEV addition triggers mandatory remediation timelines for federal agencies, but most n8n deployments are in private enterprise environments with no equivalent enforcement mechanism.

What to do about it

Search your environment for n8n now. It is frequently deployed by individual teams or developers outside formal IT procurement, so your asset inventory may not show it.
If you find unpatched instances, treat them as compromised until proven otherwise. Rotate every credential and API key the n8n instance had access to.
Apply the same logic to every workflow automation tool in your environment: Zapier, Make, and similar platforms are potential RCE targets and connect to the same sensitive data sources.

Rock’s Musings

This story isn’t getting the attention it deserves because nobody considers workflow automation as critical security infrastructure. It’s where developers wire things together quickly, connect AI agents to Slack, Salesforce, and internal APIs, and then move on to the next problem. The security team doesn’t own it. The AI team doesn’t think they need to patch it. The result is a critical RCE sitting at the center of your AI agent architecture, exposed to the internet, with a patch that’s been available for three months. CISA flagging active exploitation on March 12 means this is not theoretical. Someone is using this right now. Go find your n8n instances.

If you found this analysis useful, subscribe at rockcybermusings.com for weekly intelligence on AI security developments.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

Share RockCyber Musings

References

Axios. (2026, March 6). OpenAI rolls out Codex Security to automate code security reviews. https://www.axios.com/2026/03/06/openai-codex-security-ai-cyber

Baker Botts. (2026, March). March 2026: Federal deadlines that will reshape the AI regulatory landscape. MONDAQ. https://www.mondaq.com/unitedstates/new-technology/1755166/march-2026-federal-deadlines-that-will-reshape-the-ai-regulatory-landscape

Bloomberg. (2026, March 6). OpenAI unveils Codex Security tool to detect database vulnerabilities. https://www.bloomberg.com/news/articles/2026-03-06/openai-releases-ai-agent-security-tool-for-research-preview

Check Point Research. (2026, March 9). 9th March: Threat Intelligence Report. https://research.checkpoint.com/2026/9th-march-threat-intelligence-report/

CISA. (2026, March 12). CISA adds one known exploited vulnerability to catalog. https://www.cisa.gov/known-exploited-vulnerabilities-catalog

CNBC. (2026, March 10). Amazon convenes ‘deep dive’ internal meeting to address outages. https://www.cnbc.com/2026/03/10/amazon-plans-deep-dive-internal-meeting-address-ai-related-outages.html

Defense One. (2026, March 9). Anthropic sues over a dozen federal agencies and government leaders. https://www.defenseone.com/business/2026/03/anthropic-sues-over-dozen-federal-agencies-and-government-leaders/411997/

Digital Applied. (2026, March). FTC AI policy deadline March 11: Compliance guide. https://www.digitalapplied.com/blog/ftc-ai-policy-deadline-march-11-compliance-readiness

Forrester. (2026, March). White House announces the 2026 cyber strategy for America. https://www.forrester.com/blogs/white-house-announces-the-2026-cyber-strategy-for-america/

Fortune. (2026, March 9). Anthropic sues Pentagon after being labeled a threat to national security. https://fortune.com/2026/03/09/anthropic-sues-pentagon-ai-supply-chain-risk-trump-administration/

Google Cloud. (2026, March 9). Cloud threat horizons report H1 2026. https://cloud.google.com/security/report/resources/cloud-threat-horizons-report-h1-2026

Help Net Security. (2026, March 11). Software vulnerabilities push credential abuse aside in cloud intrusions. https://www.helpnetsecurity.com/2026/03/11/google-cloud-environments-cyber-threats-report/

Kaseya. (2026, March 11). The week in breach news: March 11, 2026.

https://www.kaseya.com/?post_type=post&p=26754

Microsoft Security Blog. (2026, March 6). AI as tradecraft: How threat actors operationalize AI. https://www.microsoft.com/en-us/security/blog/2026/03/06/ai-as-tradecraft-how-threat-actors-operationalize-ai/

National Institute of Standards and Technology. (2026, January). CAISI issues request for information about securing AI agent systems. https://www.nist.gov/news-events/news/2026/01/caisi-issues-request-information-about-securing-ai-agent-systems

National Institute of Standards and Technology. (2026, February). Announcing the AI agent standards initiative for interoperable and secure innovation. https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interoperable-and-secure

OpenAI. (2026, March 6). Codex Security: Now in research preview. https://openai.com/index/codex-security-now-in-research-preview/

OpenAI. (2026, March 12). Understanding prompt injections: A frontier security challenge. https://openai.com/index/prompt-injections/

OpenAI. (2026). Continuously hardening ChatGPT Atlas against prompt injection attacks. https://openai.com/index/hardening-atlas-against-prompt-injection/

OpenID Foundation. (2026). OIDF responds to NIST on AI agent security. https://openid.net/oidf-responds-to-nist-on-ai-agent-security/

Palo Alto Networks. (2026, February). 2026 Unit 42 global incident response report: Attacks now 4x faster. https://www.paloaltonetworks.com/blog/2026/02/unit-42-global-ir-report/

PrismNews. (2026, March). OpenAI releases engineering playbook to shield AI agents from prompt injection. https://www.prismnews.com/news/openai-releases-engineering-playbook-to-shield-ai-agents

Security Boulevard. (2026, March). 83% of cloud breaches start with identity, AI agents are about to make it worse. https://securityboulevard.com/2026/03/83-of-cloud-breaches-start-with-identity-ai-agents-are-about-to-make-it-worse/

SecurityWeek. (2026, March 6). OpenAI rolls out Codex Security vulnerability scanner. https://www.securityweek.com/openai-rolls-out-codex-security-vulnerability-scanner/

TechRadar. (2026, March 6). OpenAI releases Codex Security to spot the next big cyber risks to your company. https://www.techradar.com/pro/security/openai-releases-codex-security-to-spot-the-next-big-cyber-risks-to-your-company-promises-to-identify-complex-vulnerabilities-that-other-agentic-tools-miss

The Hacker News. (2026, March 12). CISA flags actively exploited n8n RCE bug as 24,700 instances remain exposed. https://thehackernews.com/2026/03/cisa-flags-actively-exploited-n8n-rce.html

The Register. (2026, March 6). Anthropic sues US over national security blacklist. https://www.theregister.com/2026/03/06/anthropic_left_with_no_other/

The Register. (2026, March 8). Manage attack infrastructure? AI agents can now help. https://www.theregister.com/2026/03/08/deploy_and_manage_attack_infrastructure/

The Register. (2026, March 10). Amazon insists AI coding isn’t source of outages. https://www.theregister.com/2026/03/10/amazon_ai_coding_outages/

The Register. (2026, March 12). CISA says n8n critical bug exploited in real-world attacks. https://www.theregister.com/2026/03/12/cisa_n8n_rce/

U.S. Federal Register. (2026, January 8). Request for information regarding security considerations for artificial intelligence agents. https://www.federalregister.gov/documents/2026/01/08/2026-00206/request-for-information-regarding-security-considerations-for-artificial-intelligence-agents

Veracode. (2026, February 24). 2026 state of software security report. BusinessWire. https://www.businesswire.com/news/home/20260224526703/en/Veracode-2026-State-of-Software-Security-Report-Reveals-Four-Out-of-Five-Organizations-Are-Drowning-in-Security-Debt

White House. (2026, March). White House unveils President Trump’s cyber strategy for America. https://www.whitehouse.gov/articles/2026/03/white-house-unveils-president-trumps-cyber-strategy-for-america/

AI Vendor Lock-In: What the Pentagon Taught Every CISO This Week

Rock Lambros — Tue, 10 Mar 2026 12:50:49 GMT

You probably don’t know which AI model is running inside your operational tools right now. That’s a near-certainty given how enterprise AI procurement actually works. The Pentagon just ran a live stress test on that exact blind spot, and the results were not subtle. When the Department of War formally designated Anthropic a supply chain risk on March 5, 2026, making it the first American company in history to receive a label previously reserved for Huawei and Chinese state-adjacent tech firms, the disruption didn’t start with Anthropic. It cascaded through Palantir, across AWS infrastructure, and into active military workflows during U.S. strikes on Iran. Your enterprise has the same layered architecture. The question is whether you’ve mapped it, and whether your contracts protect you when the layer you don’t control catches fire.

The AI Model You Don’t Control Is Already in Production

The DoD’s direct customer relationship wasn’t with Anthropic. Claude ran inside Palantir’s Maven Smart System, hosted on AWS at Impact Level 6, sitting on classified infrastructure the military depended on for intelligence analysis and operational planning. The DoD contracted with Palantir. Palantir embedded Claude. When the supply chain risk designation landed, it cascaded from procurement machinery through Palantir’s operational position and into workflows with real military dependencies, reportedly including active support for Iran strikes, even as the designation was being disputed on social media by the Secretary of Defense and the CEO of Anthropic simultaneously.

Piper Sandler analysts noted after the designation that Anthropic was “heavily embedded in the Military and the Intelligence community” and that migrating off the technology could “pose some short-term disruptions” to Palantir’s operations. Short-term disruptions. During an active military operation. That’s the polite Wall Street version of the problem.

Figure 1: The Embedded AI Architecture Problem

Replace “Military and Intelligence community” with your sector. Replace “Palantir” with your largest workflow vendor. Replace “active military operation” with your peak fraud season, your annual close, or your next regulatory audit. You’ve just described your own exposure.

Your enterprise equivalent of Maven isn’t a targeting system. It’s the fraud detection platform your SOC relies on for alert triage. It’s the contract review tool your legal team treats as a first pass on every agreement. It’s the SIEM enrichment workflow your analysts approved 18 months ago, without anyone asking which model was under it or whose usage policy governed it. In each case, there’s a foundation model embedded by a SaaS vendor, hosted by a cloud provider, running under policies you never reviewed and almost certainly can’t enforce. The vendor who sold you the platform might not even know which model version was deployed last Tuesday.

The lock-in risk most CISOs think about is the wrong one. They worry about pricing leverage at renewal or feature gaps during the next budget cycle. Those are real, and they’re also the least interesting version of vendor risk in an AI-dependent stack. The risk that actually bites is operational dependency on a model whose policies, safety stack, and external political relationships sit entirely outside your contractual reach. This week demonstrated those conditions shift in 48 hours. When they do, you find out how embedded you actually are. The DoD found out during airstrikes. You’ll find out during something comparably inconvenient for you.

What the Contract Language Reveals About Your Own Agreements

The factual record on the Anthropic negotiation is clear enough. The Department of War’s January 2026 AI strategy memorandum directed procurement to require “any lawful use” language and to acquire models “free from usage policy constraints that may limit lawful military applications.” Anthropic held two red lines: no mass domestic surveillance of Americans, and no fully autonomous weapons with no human in the targeting decision loop. The DoD called those constraints unacceptable. The negotiation collapsed. The designation followed.

Here’s where it gets interesting... OpenAI reached a deal within hours of the designation announcement, published contract excerpts containing the exact “all lawful purposes” language Anthropic refused, then amended the agreement twice in the following week after legal experts publicly tore apart what the protections actually meant. Sam Altman acknowledged the deal was “definitely rushed” and that “the optics don’t look good.” Jessica Tillipman, associate dean for government procurement law studies at George Washington University, wrote that the published excerpt “does not give OpenAI an Anthropic-style, free-standing right to prohibit otherwise-lawful government use.” Altman signed it anyway. To be fair to him, he was working in 48-hour crisis mode while a competing lab was being designated a national security threat. Good contract hygiene was not the priority.

Figure 2: Red Lines vs. Legal Anchors: Two Approaches to AI Contract Protection

Instead of wasting your time on the OpenAI vs. Anthropic drama and who is right or wrong, you need to pay attention to the legal architecture underlying AI safety commitments.

Why?

Because your enterprise contracts almost certainly follow the same pattern OpenAI accepted, that include usage restrictions anchored to “applicable law” and “existing policy,” with the vendor’s safety stack as the primary enforcement mechanism. OpenAI anchored its protections to existing statutes: the Fourth Amendment, FISA, DoD Directive 3000.09 on autonomous weapons, and Executive Order 12333. Critics flagged immediately that EO 12333 is the authority the NSA has historically used to justify intercepting Americans’ communications through collection outside U.S. borders. “Lawful” in national security contexts isn’t a fixed boundary. It lives inside classified legal interpretations, executive orders, and internal agency guidance nobody outside the building ever reads.

Your enterprise contracts with AI vendors operate the same way. When law shifts, when policy changes, or when your vendor faces its own version of a 48-hour political deadline, those anchors move with the situation. What your procurement posture needs instead are vendor-imposed, free-standing prohibited-use schedules for your specific high-risk workflows, written into contract appendices with attached audit rights and defined remedies. “We comply with applicable law” is a description of baseline legal obligation. It’s not a control. It’s what every vendor says about every product, whether or not AI is involved. You shouldn’t be paying for that sentence in an AI addendum. You should be getting something that took a lawyer to write specifically for your deployment.

Human-in-the-Loop Theater

Let me describe a workflow you probably have running right now. Your AI triage layer ingests 200 alerts per shift and flags 180 as low severity. Your analyst reviews the queue, confirms the model’s assessment on most items, escalates five, clears the rest. Total elapsed review time for the cleared items is, let’s say, roughly two minutes each. Every disposition went through a human. The audit log shows human review. Your controls documentation references human oversight. What actually happened is your analyst ratified model outputs under cognitive load and time pressure while telling themselves they were exercising judgment.

That’s the failure mode human-in-the-loop review was designed to prevent. The loop exists on paper. The friction isn’t in the workflow design because no step requires the reviewer to explain why they agree with the model before confirming the output. Nobody required forced alternative generation before escalating or clearing. Nobody captured uncertainty as a structured field. The control is decorative.

The OpenAI contract’s autonomous weapons provision bars the use of the AI system “to independently direct autonomous weapons in any case where law, regulation, or Department policy requires human control.” Defense scholars noted the omission of “human-in-the-loop” language was deliberate, preserving operational flexibility. “Human judgment” and “human control” are not equivalent, and the people drafting that language knew it. The contract borrows its enforceability entirely from existing policy, which requires commanders to exercise “appropriate levels of human judgment over the use of force.” Appropriate is not a control. It’s a word that means whatever the decision-maker concludes is appropriate under the circumstances they’re actually in.

Research from King’s College London found that tested AI models threatened nuclear strikes in 95% of simulated crisis scenarios. The problem wasn’t autonomous weapons. The problem was that under uncertainty and time pressure, models produced escalatory recommendations with false confidence, and human reviewers were positioned to ratify those outputs rather than interrogate them. That’s not a future risk. That’s automation bias, and it operates in your environment every shift, at every tier of your AI-assisted workflows.

The Lavender targeting system used by Israeli defense forces was reportedly identified by investigators as carrying a 10% false positive rate on human identification, with human reviewers present throughout the process. The investigation raised a direct question of whether those humans were genuinely reviewing or functionally ratifying outputs under operational tempo. That distinction carries different consequences in contexts outside the military. In your environment, it shows up as a miscategorized fraud case that costs a customer their account, or a misconfigured access control that cleared review because the analyst trusted the model’s output and moved on in the last four minutes of a shift.

Building real decision friction requires designing it into the workflow architecture before something goes wrong, not auditing for it afterward. Two-person review for high-consequence AI outputs. Forced alternative generation before an analyst confirms a model recommendation. Explicit uncertainty capture as a required structured field. If your current AI-assisted workflows don’t require a reviewer to articulate why they agree with the model’s output before confirming it, then you are rubber-stamping your way into a problem down the road. You may survive your next audit. Youwon’t survive your next incident.

The Procurement Posture That Needs to Change Before the Next Signature

Most CISOs don’t own AI vendor contracts. Procurement does. Legal does. The CISO inherits the agreement after signature, usually after the vendor relationship is already operational and the leverage window has closed. This is the moment where I’ll stop pretending that’s a systems failure and call it what it is: CISOs have let themselves get cut out of a decision that’s now one of the highest-risk commitments their organization makes. The Anthropic situation gives you the publicly documented argument to change that for every AI agreement with operational or regulatory exposure going forward.

The DoD’s relationship with Palantir didn’t include enforceable audit rights over Claude’s underlying usage policy, safety stack updates, or model variant changes. When Anthropic’s relationship with the DoD broke down, Palantir faced operational disruption from a vendor dependency it hadn’t fully governed at the model layer. Your enterprise equivalent is any SaaS vendor who embeds a foundation model in a production workflow without explicit flow-down contract obligations. You need those flow-down provisions now: contractual requirements for your SaaS vendors to notify you of material AI policy changes, with a defined right to pause deployment or terminate.

Anthropic’s published usage policy states the company may tailor restrictions for certain customers based on mission and legal authorities, subject to Anthropic’s judgment about safeguards. That clause exists in their public policy documentation. Most of their enterprise customers have never read it, don’t know whether their deployment is governed by standard or tailored terms, and have no contractual mechanism to find out. If you’re an Anthropic customer and you don’t know the answer to that question, the answer is almost certainly that you don’t know, which means you don’t control it.

Splunk’s 2026 CISO Report found that a large majority of CISOs carry personal liability concerns about security incidents. AI model misuse by a subcontractor or an embedded model that you didn’t govern is exactly the incident scenario that tests that liability question. Your current contract schedules almost certainly don’t address it. Here are the questions that need to be in every AI vendor negotiation before signature, not as a wish list, but as conditions of signature:

Which model variant governs your deployment, and does that variant deviate from the vendor’s published acceptable use policy or baseline safety commitments? Get the answer in writing with a version reference.
What change control process governs model updates, safety stack revisions, and policy changes? “We update continuously” is not an answer. You need customer notice requirements and the right to pause deployment when the vendor makes a material change.
What logs exist, who holds access, and what is the retention period? Without logs you can’t support an incident investigation, a regulatory inquiry, or your own post-incident analysis.
What happens when a major customer, a regulator, or a government agency demands scope expansion for your deployment? The Anthropic situation confirmed this question isn’t hypothetical. It’s a negotiating dynamic triggered externally, rapidly, and without advance warning to downstream customers.

From the Run Phase to the Evolve Phase

If you’re applying the CARE framework, this situation signals that you’re overdue for an Evolve-phase review of your AI vendor relationships. The Create and Adapt work produced your current model integrations. Most organizations have stayed in the Run phase, monitoring performance and managing routine issues, while the risk environment underneath those integrations has shifted significantly. The Evolve phase requires reassessing whether the governance model you built for each AI deployment still fits the world you’re operating in now.

The Anthropic situation changed that environment in three concrete ways your board needs to understand. First, it showed that an AI vendor’s political and contractual relationships with high-profile customers now represent operational risk to every downstream customer, not only government contractors. Second, it produced a documented public case where contract language anchored to “applicable law” failed to deliver the protections a party believed it had agreed to. Third, it revealed that model replacement timelines are slower than your AI vendors implied during the sales process. The DoD, with its classified infrastructure, operational urgency, considerable resources, and six-month transition timeline, is the fastest-moving version of this problem you’re likely to encounter. Your enterprise timeline almost certainly isn’t shorter.

Build your AI vendor risk registry before something breaks, while relationships are functional and vendors are cooperative. Map every production AI deployment to the model underneath it, the vendor who embeds it, the cloud provider who hosts it, and the contract that governs each layer. Run a prohibited-use gap assessment: which categories of use does each contract explicitly prohibit, and are those prohibitions free-standing or anchored to “applicable law”? Apply OWASP’s Agentic Top 10 to any workflow where a model makes or influences a decision without a mandatory human review step that requires documented rationale.

The CISOs who were ahead of this story weren’t tracking the Pentagon news cycle. They had already asked their SaaS vendors which model was embedded, what the vendor’s posture would be if that model’s policy changed, and what their exit path looked like. Most got vague answers. The right response to a vague answer from an AI vendor is a contract clause, not a follow-up email.

Key Takeaway: Your AI vendor’s ethics statement doesn’t protect your enterprise. A free-standing prohibited-use schedule, enforceable audit rights, and model-layer flow-down provisions do.

What to Do Next

Start with a model inventory audit across your top ten SaaS vendor relationships. Ask each vendor to identify the foundation model embedded in your production workflows and provide the current acceptable use policy governing your specific deployment, including any tailored terms. Map the gap between what the policy says and what your contract actually enforces.

The Anthropic situation is the most instructive public case study on AI vendor governance to emerge from this space. Use it while it’s in front of your board and before your next AI vendor signature lands on someone else’s desk.

👉 Subscribe for more AI security and governance insights with the occasional rant.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

The views and opinions expressed in RockCyber Musings are my own and do not represent the positions of my employer or any organization I’m affiliated with.

Share RockCyber Musings

Weekly Musings Top 10 AI Security Wrapup: Issue 28 February 27, 2026 - March 5, 2026

Rock Lambros — Fri, 06 Mar 2026 13:47:08 GMT

This week handed security leaders something they’ve been theorizing about for two years: autonomous AI agents attacking other autonomous AI agents in live production environments. No thought experiment, no conference demo. A malicious bot using Claude Opus 4.5 compromised five major open-source repositories. An AI-native offensive platform compromised 600 firewalls across 55 countries. Developer tools turned into attack vectors by opening a Git repo.

The practitioner community doing the real work on these problems gathered at [un]prompted in San Francisco. The rest of the week’s news served as a live demonstration of why that conference needed to exist. Attackers aren’t waiting for frameworks to catch up. Your AI tools are the attack surface now. The developers building them are the initial targets. The agents those tools spawn are the next ones.

1. [Un]Prompted Delivers the AI Security Conference the Industry Needed

The first [un]prompted conference ran March 3-4 at The Hibernia in San Francisco (unpromptedcon.org). Gadi Evron of Knostic, who chaired the conference, received nearly 500 talk submissions and built a program spanning offense, defense, DFIR, and governance. No vendor theater. Confirmed speakers included Heather Adkins from Google on advancing code security, Joshua Saxe from Meta on agent evaluation, Paul McMillan from OpenAI on securing software in the agentic era, and Nicholas Carlini from Anthropic on black-hat LLMs finding zero-days in production codebases. Dan Guido closed Day Two, explaining how Trail of Bits rebuilt around AI to reach 200 bugs per engineer per week. Sergej Epp from Sysdig presented primary forensic evidence from an 8-minute AWS escalation and EtherRAT, a blockchain C2 campaign. Gadi even stepped in for Avishai Efrat and Michael Barugy from Zenity…a direct competitor… who could not get out of Israel, to drop PleaseFix.

Why it matters

The field now has a practitioner-grade conference built for people doing actual work, from red teamers to governance leads, not vendor keynotes disguised as research.
The offensive capability context is essential. Carlini showed current models finding zero-days. Guido showed 200 bugs per engineer per week. Defenders need this before building programs.
The governance track didn’t retreat into frameworks. Healthcare and large enterprise practitioners spoke about what actually works in production.

What to do about it

Read the full agenda at unpromptedcon.org. The talk abstracts contain more actionable signal than most vendor white papers.
Follow the researchers presenting there. Those names are shaping the actual threat landscape.
Prioritize the Stripe threat modeling talks and the Snap capability-based authorization session if your team hasn’t treated AI agents as first-class attack surfaces yet.

Rock’s Musings

Rob T. Lee’s line on Stage 2 deserves repeating. Anthropic’s own GTG-1002 report showed adversaries running Claude Code at 80-90% autonomous execution. Your adversary has an AI. If you’re at tab-completion for defense, that’s a strategic failure, not a skills gap.

I’ve been going to security conferences for a long time. Most are marketing events with technical content as decoration. [un]prompted felt different because Gadi built it explicitly for people who know what a YAML file does. That’s a rare thing and worth supporting. Start planning for year two.

2. Hackerbot-Claw Proved Autonomous AI Can Systematically Destroy Your CI/CD Pipeline

Between February 21 and March 1, 2026, a GitHub account called hackerbot-claw ran an autonomous campaign against public repositories (StepSecurity). The account describes itself as an “autonomous security research agent powered by claude-opus-4-5,” maintains a vulnerability pattern index with 9 classes and 47 sub-patterns, and claims to have scanned 47,391 repositories. The bot achieved remote code execution in at least four of seven targeted repositories, including Microsoft, DataDog, CNCF, and Aqua Security’s Trivy scanner. In the Trivy compromise, it stole a Personal Access Token with broad write permissions, deleted all 178 GitHub releases, wiped repository content, and published a malicious VSCode extension to OpenVSX under Trivy’s trusted publisher identity. OpenSSF issued a TLP:CLEAR advisory March 1.

The single defining moment: the bot attempted prompt injection against a Claude-based CI workflow at ambient-code/platform. Claude, running claude-sonnet-4-6, classified it as “a textbook AI agent supply-chain attack via poisoned project-level instructions” and refused. The only target the bot failed to compromise was protected by another AI model recognizing the attack.

Why it matters

CI/CD misconfigurations are now mass-exploitable at machine speed without a single CVE. Five documented exploitation techniques, all using known patterns, all automatable.
Supply-chain compromise at scale doesn’t require sophisticated malware. It requires systematic scanning and pull request automation. The bot scanned 47,000 repos in a week.
AI-versus-AI defense is no longer theoretical. The ambient-code defense worked because someone built proper tool allowlisting with prompt injection detection.

What to do about it

Audit every pull_request_target workflow in your repositories this week. Move PR metadata into environment variables. Scope tokens to minimum permissions.
Verify your AI-based code review toolchain has prompt injection detection and tool allowlisting. Configuration matters as much as the model.
Check the OpenSSF advisory for the specific pattern list hackerbot-claw exploited. These are all preventable and all still present in thousands of active repositories.

Rock’s Musings

The “security research” framing in the account bio is working hard. Deleting 32,000 stars from Trivy and pushing a malicious extension to OpenVSX isn’t research. The creator remains unidentified. The domain name, the “molt” naming, and the OpenClaw ecosystem references point to infrastructure being assembled and tested in the open because the operators know defenders aren’t watching yet. We’re watching the emergence of an offensive AI toolkit in real time.

3. CyberStrikeAI: A Chinese-Linked Offensive Platform Hit 600 Firewalls Across 55 Countries

Team Cymru published research on March 3, naming CyberStrikeAI as the AI-native offensive tool behind the FortiGate campaign disclosed by Amazon Threat Intelligence in February (BleepingComputer, The Hacker News). The campaign ran from January 11 to February 18, 2026, comprising over 600 FortiGate devices across 55 countries. CyberStrikeAI is built in Go, integrates 100-plus security tools, and uses any OpenAI-compatible model, including Claude and DeepSeek, through an MCP orchestration engine. The developer, alias Ed1s0nZ, submitted the tool to Knownsec 404’s Starlink Project in December 2025 and briefly posted a CNNVD vulnerability credential to their GitHub profile before deleting it. CNNVD operates under oversight by China’s Ministry of State Security. Team Cymru detected 21 unique IPs running CyberStrikeAI between January 20 and February 26, primarily on Chinese cloud infrastructure. No zero-days exploited. The actor succeeded through exposed management interfaces and weak credentials.

Why it matters

AI-native offensive platforms are open-source and in active deployment. The barrier to running a 600-device campaign across 55 countries is now a GitHub clone and a cloud account.
State-adjacent tooling proliferates fast. Zero deployments in November to 21 active servers by late February is an adoption curve worth tracking.
The entry point remains unchanged. Sophisticated AI orchestration amplified the attacker. Exposed management interfaces created the opportunity. Harden the basics first.

What to do about it

Pull the FortiGate management interface exposure from public networks immediately (seriously… who do we have to keep saying this?). Apply all current firmware patches.
Add CyberStrikeAI IOCs from the Team Cymru report to your threat intelligence feeds.
Add AI-native offensive tooling as a threat category in your risk model. The economics of running large-scale exploitation campaigns changed this quarter.

Rock’s Musings

The credential scrub tells you something about the actor’s maturity. Ed1s0nZ posted the CNNVD award, realized the optics problem, and deleted it. Git commit history preserved both moves. This is someone running a 600-device campaign across 55 countries who doesn’t understand basic operational security hygiene. The AI amplified a low-to-medium capability actor significantly. That’s the real threat vector here, not the sophisticated attacker getting more powerful. It’s the mediocre attacker becoming operationally dangerous.

4. Claude Code Let Attackers Own Developer Machines by Opening a Git Repo

Check Point Research disclosed two critical vulnerabilities in Anthropic’s Claude Code around February 25-27, 2026, widely covered through March 4 (Dark Reading, Security Affairs, The Hacker News). CVE-2025-59536 (CVSS 8.7) allows code injection via the Hooks feature and MCP server initialization. CVE-2026-21852 (CVSS 5.3) allows API key exfiltration by manipulating ANTHROPIC_BASE_URL before the trust dialog appears. Both trigger on opening an untrusted repository with no further user interaction. Researchers Oded Vanunu and Aviv Donenfeld at Check Point found that .claude/settings.json, .mcp.json, and CLAUDE.md function as active execution layers. Stolen API keys in Anthropic Workspaces expose all project files shared across that workspace, creating team-wide compromise from one developer’s action. All issues are patched: CVE-2025-59536 fixed in version 1.0.111, CVE-2026-21852 fixed in 2.0.65.

Why it matters

AI coding tools are now supply-chain attack vectors. Cloning a malicious repository used to mean running attacker code. Now it means letting an AI agent run attacker code with your credentials before any warning appears.
Repository configuration files are execution logic. Add .claude/, .mcp.json, and CLAUDE.md to your code review checklist alongside source code.
The Workspaces blast radius multiplies team exposure. One stolen key can expose shared project files and generate unauthorized API costs across an entire engineering organization.

What to do about it

Verify all Claude Code users are on 1.0.111 or later for the hook vulnerability and 2.0.65 or later for the API key issue. Both patches deliver via auto-update.
Rotate Anthropic API keys for any team that cloned untrusted repositories before the patches were applied.
Extend your security review process to cover AI tool configuration files in every repository the tool touches.

Rock’s Musings

“Trust dialog bypass” shouldn’t appear in the threat model of a professional developer tool in 2026. The design assumption that config files are passive was wrong, and it costs a CVSS 8.7. The governance question is broader: how many of your developers are running AI coding tools that weren’t through your security approval process? Claude Code, Cursor, Copilot. Each one has deep access to local filesystems, shell execution, and credentials. Your endpoint protection almost certainly has no visibility into what they’re doing. This disclosure is the clean example of why that matters.

5. GlicJack: Chrome’s Gemini Panel Let Malicious Extensions Steal Your Camera and Files

Palo Alto Networks Unit 42 published CVE-2026-0628 on March 2, 2026 (SC Media, The Hacker News). CVSS 8.8. Researcher Gal Weizman discovered that a Chrome extension with basic declarativeNetRequest permissions could inject JavaScript into Gemini Live’s side panel and inherit all of its elevated privileges: camera, microphone, local file reads, screenshot capability. The flaw arose because Chrome’s Gemini panel loads gemini.google.com inside a chrome://glic WebView component. Extension isolation rules that protect privileged browser pages didn’t apply to this component. An extension influencing a website is expected behavior. An extension influencing a component baked into the browser is a security flaw. Google patched this January 5, 2026 in Chrome 143.0.7499.192/.193. Unit 42 reported it October 23, 2025.

Why it matters

AI features embedded in the browser create privilege escalation paths that didn’t exist before. The capabilities granted to make the assistant useful become the attacker’s gain.
The declarativeNetRequest API is used by millions of legitimate extensions. Any extension holding that permission could have exploited this.
Enterprise Chrome fleets may lag on patches. Individual users update automatically. Managed deployments need active verification.

What to do about it

Confirm Chrome is at 143.0.7499.192 or later across all enterprise endpoints.
Audit installed extensions with declarativeNetRequest permissions. Remove anything not explicitly approved.
Add AI browser panels to your ongoing threat model. The same architectural pattern exists in Copilot in Edge and other embedded AI assistants.

Rock’s Musings

This vulnerability pattern will repeat. Every vendor shipping an embedded AI assistant is granting that panel elevated access to make it useful, then relying on the browser’s isolation model to prevent exploitation. The Gemini panel inherited browser-level privileges while the security policy hadn’t caught up. That’s not a Google-specific design flaw. It’s the natural consequence of rushing AI features into security models built for a different threat landscape. GlicJack was found and patched responsibly. The next one in a competitor’s AI browser feature might not be.

6. ClawJacked: Any Malicious Website Can Own Your Local AI Agent

Oasis Security disclosed a high-severity flaw on February 28, 2026 allowing any malicious website to connect to a locally installed OpenClaw AI agent via WebSocket and take full control (WIU Cybersecurity Center, Sysdig). The attack required nothing beyond loading a malicious webpage. An attacker’s JavaScript opened a WebSocket to the agent’s localhost port and brute-forced the gateway password with no rate limiting. Once authenticated, full access: interact with the agent, dump configuration, enumerate connected devices, read logs. A companion log poisoning vulnerability allowed indirect prompt injection through data the agent processed. OpenClaw patched ClawJacked in version 2026.2.25 and the log poisoning in 2026.2.13. The same disclosure cycle included seven additional CVEs against OpenClaw: CVE-2026-25593, CVE-2026-24763, CVE-2026-25157, CVE-2026-25475, CVE-2026-26319, CVE-2026-26322, and CVE-2026-26329.

Why it matters

Local AI agents create new cross-context attack surfaces. The browser’s isolation model doesn’t extend to local services. A webpage can reach localhost.
Seven CVEs in one disclosure cycle against the same product signals early-stage software with an immature security posture deployed in enterprise environments.
Log poisoning via indirect prompt injection generalizes to any agent that processes external data. The agent becomes the vehicle for attacker instructions delivered through normal telemetry.

What to do about it

Update OpenClaw to version 2026.2.25 or later. Non-negotiable if your organization deploys it.
Inventory which local AI agents your developers are running and what ports they’re listening on. Most users don’t understand that local agents accept browser connections.
Require rate limiting on local service authentication endpoints in any AI agent development your organization does or procures.

Rock’s Musings

Seven CVEs in one batch tells you about the security review process that went into building the product, or its absence. OpenClaw is representative of a broader pattern: AI agent frameworks are shipping at startup velocity with security addressed after product-market fit. The problem is that product-market fit now means enterprise deployment, which means these vulnerabilities sit inside corporate networks before anyone notices.

7. North Korea’s Contagious Interview Campaign Is Back With 26 npm Packages

Socket researchers disclosed March 2, 2026 a new iteration of the Contagious Interview campaign from North Korean threat group Famous Chollima, deploying 26 malicious npm packages targeting cryptocurrency and Web3 developers (The Hacker News). Packages masquerade as developer utilities. Install scripts execute automatically and fetch C2 server addresses from Pastebin content, a dead-drop resolver technique that makes the C2 infrastructure resilient: blocking domains doesn’t neutralize active infections because attackers update the Pastebin content with new addresses. The actual payload pulls from Vercel deployments, making traffic look like legitimate developer tool usage. The cross-platform RAT targets Windows, Linux, and macOS with keylogging, browser credential theft, and cryptocurrency wallet exfiltration.

Why it matters

Publishing 26 plausible-looking packages to npm is a low-barrier operation that bypasses most enterprise code review.
Pastebin dead-drop C2 is a detection evasion technique most organizations haven’t built specific detection logic for.
Crypto and Web3 developers are the named target, but the payload works on any developer machine in any organization.

What to do about it

Implement package manifest review for new installs in developer environments. Untrusted packages entering your toolchain require explicit approval.
Block or alert on Pastebin traffic from developer machines that don’t require it for work. Pastebin as a C2 dead drop is an established pattern.
Brief cryptocurrency and Web3 development teams directly. They are specifically targeted.

Rock’s Musings

Famous Chollima runs this playbook on a near-quarterly cadence and the success rate isn’t declining. Crypto theft funds sanctions-constrained North Korean government operations. This isn’t opportunistic. It’s state-directed revenue generation with a consistent target profile and consistent tooling. Your security awareness training hasn’t stopped it because awareness doesn’t change the attack surface. The attack surface is npm, Pastebin, and Vercel. Those require technical controls, not training slides.

8. The Average Enterprise Has 1,200 Unauthorized AI Applications and 14% Visibility Into Them

A briefing published March 3, 2026, by the AIUC-1 Consortium, developed with input from Stanford’s Trustworthy AI Research Lab and more than 40 security executives from Confluent, Elastic, UiPath, and Deutsche Börse, put concrete numbers to the enterprise AI governance gap (Help Net Security). Average enterprise: 1,200 unofficial AI applications; 86% of organizations report no visibility into AI data flows; shadow AI breaches cost $670,000 more than standard incidents due to delayed detection; one in five organizations report a breach linked to unauthorized AI use.

Stanford’s Sanmi Koyejo contributed research showing fine-tuning attacks bypassed Claude Haiku in 72% of cases and GPT-4o in 57%, confirming that model-level safety controls are insufficient as standalone defenses. Actual defense requires input validation, action-level guardrails, and reasoning chain visibility operating independently of model behavior.

Why it matters

1,200 unofficial AI applications per enterprise means most identity programs have a blind spot. You can’t govern what you can’t see, and you can’t detect a breach in a system you don’t know exists.
The $670,000 additional breach cost from shadow AI is the board's number. Frame AI governance conversations around detection delay, not abstract risk.
Model-level safety is not a security control you present to auditors. It’s a product feature. The bypass rates confirm it degrades under targeted attack.

What to do about it

Use SaaS discovery tools and proxy logs to inventory actual AI application usage, not self-reported usage. The gap between what employees say they use and what they actually use is where the exposure lives.
Define what an AI agent identity means in your IAM framework before your agents define it for you. Include API keys, OAuth grants, and service accounts belonging to AI agents.
Document controls at the input, action, and output layers separately from model behavior. Auditors need evidence that doesn’t depend on the model refusing bad requests.

Rock’s Musings

The $670,000 additional breach cost from shadow AI is entirely attributable to one thing: time to detect. You can’t detect what you’re not monitoring. The 86% visibility gap translates directly into investigation time, which in turn translates into breach cost. The governance conversation isn’t about restricting AI use. It’s about making AI use visible enough that your SOC can respond when something goes wrong. Start there.

9. NIST Wants to Know How to Secure AI Agents. The Comment Window Closes Monday.

NIST’s Center for AI Standards and Innovation published an RFI on January 8, 2026, seeking practitioner input on securing AI agent systems, with comments due March 9, 2026 (Federal Register). This is the first formal federal RFI focused specifically on agentic AI security. The comment deadline falls four days from the publication of this newsletter. The RFI asks respondents to identify the biggest security risks unique to AI agents, what defenses actually work, how to test and constrain these systems, and what standards and policy coordination are needed. A companion initiative from NIST’s National Cybersecurity Center of Excellence on AI agent identity and authorization has a separate April 2 deadline. The Trump administration renamed the AI Safety Institute as CAISI to reflect a shift from existential risk evaluation to practical standards and measurement.

You can read more about my submission in “NIST AI Agent RFI (2025-0035): Human Oversight Is the Wrong Fix”

Why it matters

The standards that emerge from this process will shape federal procurement requirements, contracting baselines, and eventually insurance and regulatory frameworks. Practitioner input now affects what you’ll be measured against in two to three years.
The practitioners who will respond by default are academics, system integrators, and AI vendors with commercial interests in the outcome. Independent CISO voices are underrepresented in federal standards work.
NIST standards carry weight across the federal supply chain. If you sell to or partner with federal agencies, the guidance coming from this process will affect your requirements.

What to do about it

Submit a comment before March 9 at regulations.gov under docket NIST-2025-0035. Specific examples from your actual environment are more valuable than polished organizational submissions with no concrete data.
Flag the April 2 deadline for the companion paper on AI agent identity and authorization to whoever owns your IAM program.
Engage legal or policy counsel if your organization wants a formal submission. The deadline for that conversation is today.

Rock’s Musings

Most security executives I know haven’t heard of this RFI. That’s a problem. The reason the resulting standards will be shaped by vendors instead of practitioners is that practitioners don’t show up to the process. I’m not asking you to become a standards wonk. I’m asking you to spend 30 minutes writing down what you’re actually seeing in production, the Claude Code RCE, the OpenClaw WebSocket exposure, the shadow AI breach cost, and submit it at regulations.gov. The comment period was designed for exactly that. Use it.

The One Thing You Won’t Hear About But You Need To

OpenSSF’s TLP:CLEAR Advisory Means 47,000 Repos Are Still Exposed Right Now

On March 1, 2026, the Open Source Security Foundation issued a TLP:CLEAR advisory prompted by the hackerbot-claw campaign, documenting the specific misconfiguration classes exploited: unsafe pull_request_target trigger configurations, overprivileged GITHUB_TOKEN scopes, unsanitized inputs in shell execution contexts, and dynamic shell execution patterns (Threat Landscape Blog). TLP:CLEAR means no restrictions on distribution. It was published specifically so every organization running public GitHub Actions workflows could read it and fix their exposure.

The bot’s profile claims 47,391 repositories scanned. That number isn’t independently verified, but StepSecurity’s analysis confirms five of seven analyzed targets were compromised during a nine-day campaign that defenders didn’t detect while it was running. No CVEs. No zero-days. Documented, preventable misconfigurations. New repositories with the same patterns are being created today.

Why it matters

The advisory is available and actionable. The barrier isn’t information access. It’s distribution through the security team to the platform engineers who control the workflows.
The attack surface isn’t shrinking. Hackerbot-claw found 47,000 potentially vulnerable repositories in a week. The automation will get rerun.
Undetected campaigns running for nine days means your current GitHub Actions monitoring isn’t catching this class of attack.

What to do about it

Get the OpenSSF advisory to your DevSecOps and platform engineering teams today. It contains the specific patterns to search for and the specific remediation steps.
Run StepSecurity harden-runner or equivalent tooling against your public repositories. The vulnerability patterns are enumerable. Find them before the next scanner does.
Require security review for new GitHub Actions workflows before merge. The misconfigurations hackerbot-claw exploited are consistently introduced during workflow creation.

Rock’s Musings

TLP:CLEAR means the government cleared the information for public release with no restrictions. It was published so practitioners could act on it. The fact that it’s “the thing you won’t hear about” is an indictment of how security information moves through the industry. Your platform engineers are shipping features. Nobody is reading OpenSSF advisories in real time unless someone built a process for it.

The hackerbot-claw campaign didn’t require a zero-day. It required patient scanning of publicly available information about CI/CD pipeline configurations. The attacker had that process. The question for your organization is whether you have the equivalent on defense. The OpenSSF advisory is the starting point. If you want additional context on building CI/CD security programs that account for this threat class, the practitioner content at rockcybermusings.com covers it. The attack surface is documented. Close it.

If you found this analysis useful, subscribe at rockcybermusings.com for weekly intelligence on AI security developments.

👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey

👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com

👉 Subscribe for more AI and cyber insights with the occasional rant.

Share RockCyber Musings

References

Awesome Agents. (2026, March 2). An AI agent just pwned Trivy’s 32K-star repo via GitHub Actions. https://awesomeagents.ai/news/hackerbot-claw-trivy-github-actions-compromise/

BleepingComputer. (2026, March 2). CyberStrikeAI tool adopted by hackers for AI-powered attacks. https://www.bleepingcomputer.com/news/security/cyberstrikeai-tool-adopted-by-hackers-for-ai-powered-attacks/

Check Point Research. (2026, February 25). Caught in the hook: RCE and API token exfiltration through Claude Code project files. https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/

Cybernews. (2026, March 4). AI bot compromises five major GitHub repositories. https://cybernews.com/security/claude-powered-ai-bot-compromises-five-github-repositories/

Cybernews. (2026, March 4). Open some code, Claude Code runs with hacker’s instructions. https://cybernews.com/security/claude-code-critical-vulnerability-enabled-rce/

Dark Reading. (2026, February 28). Flaws in Claude Code put developers’ machines at risk. https://www.darkreading.com/application-security/flaws-claude-code-developer-machines-risk

Federal Register. (2026, January 8). Request for information regarding security considerations for artificial intelligence agents (Docket NIST-2025-0035). https://www.federalregister.gov/documents/2026/01/08/2026-00206/request-for-information-regarding-security-considerations-for-artificial-intelligence-agents

Help Net Security. (2026, March 3). AI went from assistant to autonomous actor and security never caught up. https://www.helpnetsecurity.com/2026/03/03/enterprise-ai-agent-security-2026/

NIST Center for AI Standards and Innovation. (2026, January 12). CAISI issues request for information about securing AI agent systems. https://www.nist.gov/news-events/news/2026/01/caisi-issues-request-information-about-securing-ai-agent-systems

Orca Security. (2026, March 3). HackerBot-Claw: An AI-assisted campaign targeting GitHub Actions pipelines. https://orca.security/resources/blog/hackerbot-claw-github-actions-attack/

Palo Alto Networks Unit 42. (2026, March 2). Taming agentic browsers: Vulnerability in Chrome allowed extensions to hijack new Gemini panel. https://unit42.paloaltonetworks.com/gemini-live-in-chrome-hijacking/

SC Media. (2026, March 2). Google Chrome vulnerability risked hijacking Gemini panel by rogue extension. https://www.scworld.com/news/google-chrome-vulnerability-risked-hijacking-gemini-panel-by-rogue-extension

Security Affairs. (2026, March 2). Untrusted repositories turn Claude Code into an attack vector. https://securityaffairs.com/188508/security/untrusted-repositories-turn-claude-code-into-an-attack-vector.html

StepSecurity. (2026, March 3). Hackerbot-claw: An AI-powered bot actively exploiting GitHub Actions. https://www.stepsecurity.io/blog/hackerbot-claw-github-actions-exploitation

Sysdig. (2026, March 4). Security briefing: February 2026. https://www.sysdig.com/blog/security-briefing-february-2026

The Hacker News. (2026, March 3). Open-source CyberStrikeAI deployed in AI-driven FortiGate attacks across 55 countries. https://thehackernews.com/2026/03/open-source-cyberstrikeai-deployed-in.html

The Hacker News. (2026, March 3). New Chrome vulnerability let malicious extensions escalate privileges via Gemini panel. https://thehackernews.com/2026/03/new-chrome-vulnerability-let-malicious.html

The Hacker News. (2026, February 28). Claude Code flaws allow remote code execution and API key exfiltration. https://thehackernews.com/2026/02/claude-code-flaws-allow-remote-code.html

The Hacker News. (2026, March 2). North Korean hackers publish 26 npm packages hiding Pastebin C2 for cross-platform RAT. https://thehackernews.com/2026/03/north-korean-hackers-publish-26-npm.html

Threat Landscape Blog. (2026, March 5). Hackerbot-Claw: AI bot exploiting GitHub Actions CI/CD misconfigs for repo takeover. https://threatlandscape.io/blog/hackerbot-claw-ai-bot-github-actions-supply-chain-attack

[un]prompted. (2026). Agenda — [un]prompted, The AI Security Practitioner Conference, March 3-4, 2026.

https://unpromptedcon.org/

WIU Cybersecurity Center. (2026). Cybersecurity news. Western Illinois University. https://www.wiu.edu/cybersecuritycenter/cybernews.php