Weekly Musings Top 10 AI Security Wrapup: Issue 7, August 11 - August 15, 2025
AI gets sober: pilots stall, agents ship, and regulators sharpen the knives
CFOs got a reality check this week: MIT says 95% of corporate GenAI pilots are failing to deliver revenue. While Wall Street digests that, vendors pushed hard on practical controls. GitHub shipped an Agents panel and a quiet but important secret‑scanning upgrade. Google rolled out agent security features. Anthropic moved from research to deployment with nuclear‑risk safeguards built with the U.S. government. Toss in critical inference‑server bugs and a shaky Microsoft update cycle, and you’ve got one theme: less hype, more engineering, and tighter governance.
1) Anthropic and NNSA deploy nuclear‑risk classifier on Claude
Summary: Anthropic announced a classifier, co‑developed with the U.S. Department of Energy’s National Nuclear Security Administration, to distinguish benign from concerning nuclear‑related content with preliminary 96% accuracy, now deployed to monitor Claude traffic. The company says it will share the approach with the Frontier Model Forum, framing it as a blueprint for public‑private safeguards. This represents a shift from research to active operational control on a high‑risk content domain. (Anthropic)
Why It Matters
Moves “safety” from blog posts to production controls tied to a national‑security partner.
Sets a bar for other model providers on content‑risk screening beyond generic policies.
Signals that sector‑specific classifiers will become table stakes for compliance programs.
What To Do About It
Ask every LLM vendor how they programmatically detect and block sensitive dual‑use content, and how they measure accuracy and drift.
Build a content‑risk taxonomy that maps to your regulated domains, then require vendor‑side classifiers plus tenant controls.
Pilot internal classifiers for your crown‑jewel misuse scenarios, mainly where export controls apply.
Rock’s Musings:
This is a real shift. We’re finally seeing providers wire safety into the pipes instead of sprinkling it on the UI. A classifier won’t catch everything, but it’s measurable and auditable, which makes my governance heart happy. If you build with agents, you need a living set of detectors for your own sensitive knowledge, too, not just what Anthropic flags. I also like the model of working with a mission regulator; it keeps the guardrails tied to real‑world harms. Don’t over‑rotate to nuclear and forget your own operational risks, though. Start with your business’s dangerous queries. Then, prove the block works in staging before you trust it in prod.
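If you want a concrete starting point for that staging test, here's a minimal Python sketch of an internal screen: match prompts against your own content‑risk taxonomy before they reach the model and log every hit for audit. The categories and regex patterns are placeholders, not Anthropic's classifier, and a keyword screen is only a first pass before you invest in a trained model.

```python
import logging
import re
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("content-risk-screen")

# Hypothetical content-risk taxonomy; replace with the categories from your
# own regulated domains and export-control obligations.
RISK_PATTERNS = {
    "export_controlled": re.compile(r"\b(enrichment cascade|reentry vehicle)\b", re.I),
    "credentials":       re.compile(r"\b(api[_-]?key|private key|password dump)\b", re.I),
}

@dataclass
class ScreenResult:
    allowed: bool
    categories: list

def screen_prompt(prompt: str) -> ScreenResult:
    """Flag prompts that hit a risk category; block them and log for audit."""
    hits = [name for name, pattern in RISK_PATTERNS.items() if pattern.search(prompt)]
    if hits:
        log.warning("blocked prompt; categories=%s", hits)
    return ScreenResult(allowed=not hits, categories=hits)

if __name__ == "__main__":
    print(screen_prompt("Summarize the NRC's public licensing process"))   # allowed
    print(screen_prompt("Where can I buy a password dump for acme.com?"))  # blocked
```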
2) GitHub’s Agents panel lands across GitHub.com
Summary: GitHub added an Agents panel so developers can create and launch Copilot coding‑agent tasks from anywhere on GitHub.com. The change moves agents from a novelty into a native workflow surface alongside repos, issues, and PRs, with supporting platform changes rolling out in August. Coverage across the press frames this as a significant usability step for agentic development. (The GitHub Blog, AICPA & CIMA)
Why It Matters
Agents are now a first‑class object in mainstream developer workflows.
Security review has to cover agent permissions, tools, and context, not just code.
This increases the blast radius of prompt and tool hijacking if you don’t gate it.
What To Do About It
Define agent approval and tool whitelisting like you already do for GitHub Apps.
Turn on secret scanning and push protection for repos that agents touch, including forks.
Require unit tests for agent plans and tool‑use policies before promotion to team‑wide use.
Rock’s Musings:
The minute GitHub adds a new panel, it stops being experimental. Agents will now creep into backlogs, PRs, and release trains, and you’ll inherit the risk whether you planned for it or not. Treat agents like junior engineers with root‑level curiosity. They get the least privilege you can manage, an SSO identity, and a watchful eye on their audit trail. If your SDLC hasn’t accounted for agent plans, tool scopes, and prompts as artifacts, you’re behind. Bring AppSec and platform teams into the room early. And yes, budget time for prompt hardening. You’ll save yourself a weekend fire drill later.
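For the tool‑whitelisting item above, here's a minimal sketch of a fail‑closed gate: check an agent's requested tools against an approved manifest before the task runs. The agent names and tool scopes are hypothetical, and this is not GitHub's API, just the shape of the control.

```python
from typing import Dict, Set

# Hypothetical per-agent allow-list; in practice, store this next to the agent
# definition and review changes the way you review GitHub App scopes.
APPROVED_TOOLS: Dict[str, Set[str]] = {
    "release-notes-agent": {"read_repo", "create_pull_request"},
    "triage-agent":        {"read_issues", "add_label"},
}

def authorize_plan(agent: str, requested_tools: Set[str]) -> None:
    """Fail closed if an agent plan asks for tools outside its approved scope."""
    approved = APPROVED_TOOLS.get(agent, set())
    excess = requested_tools - approved
    if excess:
        raise PermissionError(f"{agent} requested unapproved tools: {sorted(excess)}")

if __name__ == "__main__":
    authorize_plan("triage-agent", {"read_issues"})  # within scope
    try:
        authorize_plan("triage-agent", {"read_issues", "push"})
    except PermissionError as err:
        print(err)  # push is not on the approved list, so the plan is rejected
```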
3) MIT: 95% of corporate GenAI pilots are failing to deliver revenue
Summary: An MIT NANDA report found only about 5% of enterprise GenAI pilots produce rapid revenue gains, with most stalling due to integration and process “learning gaps,” not model quality. The study synthesized 300 public deployments, 150 leadership interviews, and a survey of 350 employees, and it sparked market coverage and investor jitters. The punchline for leaders: buy where possible, build where truly differentiated, and re‑engineer workflows or burn money. (Fortune, Yahoo Finance)
Why It Matters
Confirms the pilot‑to‑profit gap many teams feel but haven’t quantified.
Highlights process and data integration as the choke points, not “better models.”
Gives Boards a defensible rubric to reset AI portfolios.
What To Do About It
Create an “AI P&L” review that ties pilots to cost and revenue targets by quarter.
Standardize vendor selection on measurable outcomes and integration maturity.
Kill science projects that can't plug into a documented business process within 90 days.
Rock’s Musings:
I’ve been saying this for months, and now MIT handed you the numbers. Pilots don’t pay the bills if you park them in a sandbox and call it a strategy. If this study stings, good. Use it to prune your portfolio and move money into integrations, data cleanup, and operations. Buy for commodity use cases, then build only where you’re sure the workflow is ready to absorb change. I encourage you to check out my friend,
Then call your PMO, not your model vendor.
4) OpenAI considers encrypting temporary chats by default
Summary: OpenAI is weighing end‑to‑end encryption for temporary chats, positioning it as a privacy‑first option for sensitive prompts, with coverage noting the shift from earlier data‑retention defaults. Reporting emphasizes private, time‑boxed sessions and better controls for enterprises that want to minimize data exhaust. Expect competitive pressure for parity across other assistants. (Axios)
Why It Matters
Reduces data‑in‑flight risk for high‑sensitivity prompts.
Gives enterprises a cleaner story for legal holds and eDiscovery.
Sets a new baseline competitors will need to match.
What To Do About It
Update your assistant usage standard to prefer temporary encrypted sessions for regulated work.
Validate whether transcripts ever touch vendor retraining or telemetry.
Re‑test your DLP and CASB policies with encrypted assistant traffic in mind.
Rock’s Musings:
This would be a solid change, and late. Temporary encrypted sessions belong in any enterprise assistant standard, period. Encryption helps, but it doesn’t remove prompt leakage from the endpoint or the plugins you grant. If your users can paste source code or PHI into any assistant, you need more than a policy PDF. Add a control layer and train for “red data, green data” when prompting. Also, ask vendors how they handle keys, logs, and crash dumps. Security is the whole pipeline, not just the wire.
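Here's a minimal sketch of that control layer: classify a prompt as red or green before it leaves the endpoint, then block or reroute red data to an approved temporary session. The detectors are illustrative stand‑ins for whatever your DLP already knows how to find.

```python
import re

# Illustrative "red data" detectors; wire these to your existing DLP rules.
RED_DATA = {
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key_id":  re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "source_code": re.compile(r"\bdef \w+\(|#include\s*<"),
}

def classify_prompt(prompt: str) -> str:
    """Return 'red' if the prompt contains regulated or proprietary data."""
    return "red" if any(p.search(prompt) for p in RED_DATA.values()) else "green"

def route(prompt: str) -> str:
    if classify_prompt(prompt) == "red":
        # Block outright, or force an approved temporary encrypted session.
        return "blocked: use an approved temporary encrypted session"
    return "allowed"

if __name__ == "__main__":
    print(route("Summarize this meeting agenda"))                  # allowed
    print(route("My SSN is 123-45-6789, draft a dispute letter"))  # blocked
```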
5) U.K. taps Jade Leung as the Prime Minister’s AI Adviser
Summary: The U.K. named Jade Leung as the Prime Minister’s AI Adviser, formalizing high‑level coordination on AI risk, safety, and industrial strategy. Leung’s prior work on frontier‑risk governance and industry engagement suggests tighter coupling between government policy and provider practices. This role will shape U.K. positions that influence G7 and transatlantic forums. (GOV.UK)
Why It Matters
Signals sustained, senior focus on AI risk and economic competitiveness.
Improves policy feedback loops between labs, regulators, and ministries.
Gives enterprises a clearer point of contact for national guidance and procurement.
What To Do About It
Map your U.K. exposure and upcoming compliance obligations to a single owner.
Join sector consultations early, especially if you deploy or provide agentic systems.
Align board reporting to U.K. risk language to enable reuse across the EU and U.S.
Rock’s Musings:
Titles aren’t strategies, but they can streamline decisions. The U.K. has moved faster than most on convening providers and framing frontier‑model risk, and this gives them a single voice to push into practical guidance. U.S. firms should care because cross‑border procurement often borrows from the most mature playbook on the table. If you want your controls to travel well, build them to the strictest regime you face. Then use that baseline everywhere. It beats rewriting your program every quarter.
6) NVIDIA Triton inference server: critical bugs and a fresh ZDI advisory
Summary: NVIDIA shipped fixes for Triton Inference Server after researchers showed a vulnerability chain that can lead to AI server takeover. Follow-on research and advisories detail shared-memory issues and out-of-bounds writes, with an additional Zero Day Initiative note surfacing this week. Enterprises running GPU inference at scale should treat this as urgent patching, not optional hardening. (NVIDIA Support, wiz.io)
Why It Matters
Direct path to model and data compromise on production inference clusters.
Highlights AI stack fragility across frameworks, backends, and container tooling.
Expect copycat exploits once proof‑of‑concepts circulate.
What To Do About It
Patch Triton to the latest release and rotate any secrets accessible to the process.
Turn on strict network segmentation for inference nodes, including egress controls.
Add detections for anomalous Triton shared‑memory and IPC errors in your SIEM.
Rock’s Musings:
Inference isn’t special. It’s software, with all the usual memory bugs and attack paths, just wrapped in GPU glitter. The takeaway isn’t “don’t use Triton.” It’s “treat inference like prod, not a lab.” That means a maintenance window, version pinning, and a rollback plan. Assume your MLOps stack will inherit CVEs from every library you pull in. Then isolate it like it matters… because it does.
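For the SIEM detection item above, here's a minimal sketch of a windowed alert on suspicious Triton log lines. The error markers and threshold are assumptions; tune them to what your Triton version actually emits.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical error markers; adjust to the messages your deployment logs.
SUSPECT_MARKERS = ("shared memory", "cuda ipc", "out of bounds")
WINDOW = timedelta(minutes=5)
THRESHOLD = 10  # suspect lines per window before raising an alert

_recent: deque = deque()  # timestamps of suspect log lines

def ingest(line: str, now: datetime) -> bool:
    """Return True when suspect Triton errors exceed the threshold in the window."""
    if any(marker in line.lower() for marker in SUSPECT_MARKERS):
        _recent.append(now)
    while _recent and now - _recent[0] > WINDOW:
        _recent.popleft()
    return len(_recent) >= THRESHOLD

if __name__ == "__main__":
    fired = ingest("E0815 ... failed to register shared memory region", datetime.now())
    print("alert:", fired)
```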
7) Google Cloud Security Summit: controls for agent security
Summary: At Security Summit 2025, Google introduced agent‑focused controls, including expanded AI agent inventory and risk identification in Security Command Center, in‑line Model Armor protections for Agentspace prompts and responses, and new detections for agent behaviors in Google Security Operations. The company also previewed Agentic IAM and shipped compliance tooling and DSPM updates. This is one of the clearest vendor offerings focused on agent security rather than just model safety. (Google Cloud Blog, WebProNews)
Why It Matters
Provides a vendor‑native path to inventory and govern AI agents across environments.
In‑line guardrails for prompts and tools cut the time to mitigation for prompt injection.
Gives SOCs concrete detections for agent misuse, not just “AI threats” slides.
What To Do About It
Inventory your agents, MCP servers, and tools. Then assign owners and SLAs.
Apply in‑line prompt and tool filtering on every agent that can touch sensitive data.
Pilot Agentic IAM and enforce least privilege for non‑human identities.
Rock’s Musings:
Agent security is where the next crop of incidents will come from. I like seeing concrete controls instead of vibes. Inventory first, then policy, then runtime enforcement. If you can’t answer who owns each agent and which tools it can call, you’re not ready to ship it. Start with the agent that has the most access and the least oversight. That’s your risk, not the one in the keynote demo.
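Inventory first, so here's a minimal sketch of what an agent inventory record could look like, with a named owner and a review SLA. The fields are illustrative, not Security Command Center's schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentRecord:
    name: str
    owner: str                   # an accountable human, not a team alias
    environment: str             # dev / staging / prod
    tools: List[str] = field(default_factory=list)
    data_classification: str = "internal"
    review_sla_days: int = 90    # how often scopes and prompts get re-reviewed
    days_since_review: int = 0

INVENTORY = [
    AgentRecord("support-summarizer", owner="a.chen", environment="prod",
                tools=["read_tickets"], data_classification="confidential",
                review_sla_days=30, days_since_review=45),
]

def overdue(records: List[AgentRecord]) -> List[str]:
    """Agents whose scope and prompt review is past its SLA."""
    return [r.name for r in records if r.days_since_review > r.review_sla_days]

if __name__ == "__main__":
    print("overdue reviews:", overdue(INVENTORY))
```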
8) ISACA debuts AAISM, an AI‑security management certification for CISM and CISSP leaders
Summary: ISACA launched the Advanced in AI Security Management (AAISM) certification on August 19, aimed at experienced security managers who already hold CISM or CISSP. The program focuses on three domains: AI governance and program management, AI risk management, and AI technologies and controls. ISACA characterizes AAISM as the first AI‑centric security management credential, and trade press covered the announcement as a timely move for enterprises formalizing AI risk ownership. (Infosecurity Magazine, ISACA)
Why It Matters
Moves AI security from ad hoc policy to a recognizable professional standard for program owners.
Gives hiring managers a common bar for leaders who will own AI governance and controls.
Signals more rigorous expectations from boards and regulators on accountable AI risk management.
What To Do About It
Sponsor 2–3 senior security leaders to pursue AAISM and tie completion to owning your AI risk register.
Map AAISM’s domains to your current AI program, then close gaps in evaluation, incident handling, and change control.
Update job descriptions for platform security, AppSec, and risk leads to require demonstrated AI security management skills.
Rock’s Musings:
This is overdue. Most companies handed AI to a tiger team and hoped governance would follow. It did not. A credential won’t fix a weak strategy, but it creates a floor for the people you trust with guardrails and incident response. If your “AI lead” can’t explain model inventories, agent scopes, evaluation evidence, and rollback plans in plain English, send them to training or swap them out. I want AI risk treated like any other program with owners, budgets, metrics, and audits. Use this certification to force clarity on who makes the calls, who writes the policies, and who gets paged when an agent goes off the rails. Then measure outcomes, not certificates.
9) Microsoft’s August updates fixed a lot and broke a lot
Summary: Ah, yes… the smell of Patch Tuesday. This month’s Patch Tuesday addressed 100+ CVEs and one public zero‑day, but subsequent cumulative updates sparked widespread reports of broken multi‑monitor setups, disappearing SSDs, display corruption, and rolling blue screens in some environments. Admins began block‑listing specific KBs while waiting on refreshed packages and guidance. The takeaway is familiar: patch quickly in tiers, but keep your rollback plan ready. (Tom’s Guide, ITPro)
Why It Matters
Reinforces the need for ringed deployment and rollback discipline.
Highlights collateral risk to AI dev environments that depend on GPU drivers and display stacks.
Reminds everyone that Patch Tuesday isn’t one decision; it’s a sequence.
What To Do About It
Stage patches through canary and pilot rings with telemetry gates before full rollout.
Snapshot and back up developer workstations with GPU dependencies before patching.
Track KB‑level issues and set temporary holds when vendor channels light up.
Rock’s Musings:
Windows patching still requires adult supervision. I love fast remediation of real bugs, but not at the cost of a fleet outage because we trusted a Tuesday. The right response isn’t to slow‑roll forever. It’s to engineer releases like you ship code: canary, metrics, go‑no‑go. Also, document the rollback steps and practice them. You don’t want to learn under a status‑red call with the exec team listening.
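To make the telemetry gate more than a slide, here's a minimal sketch of a promotion check between rings. The thresholds are placeholders; set them from your own baseline crash and rollback rates.

```python
from dataclasses import dataclass

@dataclass
class RingTelemetry:
    devices: int
    crash_rate: float     # crashes per device since the update landed
    rollback_rate: float  # fraction of devices that rolled the KB back

# Illustrative gates; derive real thresholds from your pre-patch baseline.
MAX_CRASH_RATE = 0.01
MAX_ROLLBACK_RATE = 0.02
MIN_SOAK_DEVICES = 50

def promote(ring: RingTelemetry) -> bool:
    """Allow a KB to move to the next ring only if the current ring is healthy."""
    return (ring.devices >= MIN_SOAK_DEVICES
            and ring.crash_rate <= MAX_CRASH_RATE
            and ring.rollback_rate <= MAX_ROLLBACK_RATE)

if __name__ == "__main__":
    canary = RingTelemetry(devices=80, crash_rate=0.0, rollback_rate=0.0)
    print("promote to pilot ring:", promote(canary))
```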
10) Lenovo’s customer‑support chatbot flaw shows how one prompt can hijack agents
Summary: Researchers reported critical vulnerabilities in Lenovo’s GPT‑4‑powered support chatbot, “Lena,” where a single 400‑character prompt forced the bot to generate malicious HTML that exfiltrated session cookies and enabled agent impersonation. The chain allowed access to ongoing and past support chats and opened paths for deeper compromise, including potential command execution. Lenovo acknowledged the issue and said it had mitigated the flaws before August 18. (Cybernews, IT Pro, CSO Online)
Why It Matters
Customer‑facing AI can become a stored‑XSS delivery system when teams render model output as rich content.
Session cookie theft from support agents turns a simple prompt into full account takeover.
Regulators and auditors will expect “LLM output is untrusted input” controls on any public chatbot.
What To Do About It
Treat model outputs as untrusted: render them as plain text by default, enforce strict HTML sanitization, and ship a hardened Content‑Security‑Policy with allow‑listed domains.
Lock down agent consoles: WebAuthn for logins, short‑lived HttpOnly cookies with SameSite=Strict, session rotation on chat handoff, and device posture checks.
Add red-team tests for prompt-injection and stored-XSS, alert on unusual chat-history access and egress from agent identities, and gate releases on passing these checks.
Rock’s Musings:
This is why I keep saying your LLM’s output is code until proven safe. The bot politely printed HTML, the browser executed it, and now your support agent’s session is walking around with an attacker. If you’re rendering model output without a sanitizer and a tight CSP, you’re building an exploit surface, not a help desk. Make rich rendering a privilege that has to be earned, not the default. Rotate cookies on every transfer to a human and require WebAuthn for agent consoles, no exceptions. Add a kill switch that drops any response containing banned tags or attributes, then log the attempt for review. If a vendor can’t show you output‑sanitization tests and CSP headers, don’t put their chatbot in front of customers.
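To make “LLM output is untrusted input” concrete, here's a minimal sketch of the default rendering path: escape the model's output, attach a restrictive Content‑Security‑Policy, and withhold anything that trips a banned‑markup check. The header values and banned list are assumptions to tighten for your own front end.

```python
from html import escape

# Restrictive CSP for the support-chat page; tighten the sources to the
# domains your front end actually needs.
CSP_HEADER = "default-src 'self'; script-src 'self'; img-src 'self'; frame-ancestors 'none'"

# Kill-switch list: fragments that should never appear in rendered output.
BANNED_FRAGMENTS = ("<script", "onerror=", "onload=", "javascript:")

def render_bot_reply(model_output: str) -> dict:
    """Treat model output as untrusted: escape it, and withhold anything suspicious."""
    if any(frag in model_output.lower() for frag in BANNED_FRAGMENTS):
        # Drop the reply; in production, also log the attempt for review.
        body = "This response was withheld for review."
    else:
        body = escape(model_output)  # plain text by default; rich rendering is opt-in
    return {"body": body, "headers": {"Content-Security-Policy": CSP_HEADER}}

if __name__ == "__main__":
    print(render_bot_reply('<img src=x onerror="alert(1)">')["body"])
```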
The One Thing You Won’t Hear About But You Need To
GitHub secret‑scanning push‑protection configuration is now GA
Summary: It was a big week for GitHub. In addition to Musing #2, GitHub quietly made generally available the ability to choose which secret‑scanning patterns are enforced in push protection at the org and enterprise levels, with API support. You can now tune enforcement to your environment and reduce false positives while blocking more of what matters. This is a big lever for supply‑chain risk that got drowned out by the agents news. (The GitHub Blog)
Why It Matters
Lets you target the secrets that hurt your business most.
Reduces developer friction by removing noisy patterns from hard‑fail checks.
Raises the bar for inbound code from contractors and forks.
What To Do About It
Turn on push protection org‑wide for all default and high‑value custom detectors.
Track bypass requests and tune patterns monthly based on false‑positive rates.
Block merges that bypass push protection without a security owner’s approval.
Rock’s Musings:
This is the kind of small feature that prevents big incidents. Most breaches still start with a token. If you only scan, you’re late. Push protection is where you stop the leak before it lives in history forever. Tune it, measure it, and report it to the board like an incident you prevented. Then expand to contractors and internal forks. Quiet controls save loud headlines.
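If you'd rather script it than click through settings, here's a minimal sketch that turns on push protection repo by repo through GitHub's documented "update a repository" endpoint. The org‑ and enterprise‑level pattern‑configuration routes that just went GA are not shown here; pull those from GitHub's changelog and API docs rather than guessing at them.

```python
import os
import requests

GITHUB_API = "https://api.github.com"
TOKEN = os.environ["GITHUB_TOKEN"]  # needs admin rights on the target repos

def enable_push_protection(owner: str, repo: str) -> None:
    """Enable secret-scanning push protection on one repository."""
    resp = requests.patch(
        f"{GITHUB_API}/repos/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        json={"security_and_analysis": {
            "secret_scanning_push_protection": {"status": "enabled"}}},
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    enable_push_protection("my-org", "example-repo")  # hypothetical org and repo
```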
👉 What do you think? Ping me with the story that keeps you up at night—or the one you think I overrated.
👉 The Wrap-Up drops every Friday. Stay safe, stay skeptical.
👉 For deeper dives, visit RockCyber.