MCP Security: Locking Down Agents After Real Exploits
Stop agent failures with identity proof, policy routing, and a three-week plan. Turn past MCP security incidents into action.
MCP security is now top of mind. I was fortunate enough to speak at CornCon.net this weekend on the topic, and this week’s blog follows up on that talk (which will be posted at CornCon.tv).
The past several months gave us three clear shots across the bow:
A GitHub exploit demonstrated how a public issue can prompt an agent with private repository rights to leak private data.
A WhatsApp incident showed what happens when a malicious server rides the trust of a legitimate one.
A Postmark incident showed how an attacker can earn trust through multiple clean releases, then flip a switch and siphon data.
You need a plan. I’ll show you what failed, why it failed, and how to fix it with controls you can start this month.
I’m keeping this practical. You get a short protocol refresher, five defense patterns, a reference architecture that fits how teams work, and a three-week plan that cuts risk without killing velocity. You’ll also see monitoring moves that catch trouble fast. If you run agents that touch code, data, chat, or email, this is your playbook.
FIGURE 1: Attack Patterns And Examples
GitHub MCP Exploit, May 2025
An attacker opened a public issue that carried hidden instructions. A user’s agent, which held private repository access and the ability to create pull requests, processed that untrusted content like an operator command. It then produced a pull request that exposed details from private repositories. The slide calls out the lethal trifecta at play: access to private data, untrusted public input, and external output capability. The root cause was a lack of context isolation between trusted commands and public content.
Public issue contained hidden instructions processed by the agent.
Agent created a pull request that revealed private repository details.
Root cause was missing context isolation between trusted and untrusted input.
The trifecta was present: private data access, untrusted public input, and outward action.
WhatsApp MCP Exploit, April 2025
The agent maintained connections to both a legitimate WhatsApp server and a deceptive server. The malicious peer injected instructions that targeted messaging functions. The agent held real privileges, so it acted as a confused deputy and forwarded private chats to an attacker-controlled number. This was a straight misuse of legitimate access in response to a lying counterpart.
Dual connection to a trusted WhatsApp server and a malicious server.
A malicious server injected instructions targeting WhatsApp’s functions.
Agent exfiltrated private chats to an attacker’s number using its privileges.
Classic confused deputy misuse of trusted access.
Postmark MCP Exploit, September 2025
The attacker earned trust the slow way. They shipped fifteen benign versions and built adoption to roughly 1,500 downloads per week. A later update introduced a backdoor that added a hidden BCC to an attacker’s mailbox. Normal-looking Postmark calls then moved passwords, invoices, and API keys out of the victim’s environment. The slide marks this as the first known malicious MCP server in the set, which underlines the supply chain risk across the ecosystem.
15 benign releases built trust and usage to approximately 1,500 downloads per week.
Version 1.0.16 introduced a hidden BCC to phan@giftshop.club.
Exfiltration occurred through legitimate Postmark calls, which moved passwords, invoices, and API keys.
Flagged as the first malicious MCP server in scope, highlighting supply chain exposure.
Protocol Refresher
Let’s level set on how these systems talk so the fixes make sense.
The Model Context Protocol (MCP) arrived in late 2024 as a client-server pattern that links models to tools, data sources, and context providers. Great for capability. Bigger attack surface. You gain reach into code and systems. Attackers gain access to your system if you skip identity proof, message integrity, and least privilege across hops.
Agent-to-agent (A2A) adds direct collaboration. It uses structured JSON over HTTPS, features built-in authentication and permissions, supports payload signing, and enables discovery through agent cards. That sounds safe on paper. It is safe when you actually verify identity on both sides, check claims against policy, and log every decision. It is not safe when you trust self-described capabilities or let discovery act like a green light.
Why Agents Fail In The Real World
Patterns repeat across the incidents and the notes.
Untrusted inputs get treated like commands.
Agents accept influence from the wrong party because identity proof is thin or absent.
Tools lie through descriptions, and the agent lacks a policy layer that treats tool output as untrusted until proven clean.
Traditional bugs still exist. You will see a normal remote code execution in a helper or tool.
Secrets live too long. A leaked token buys the attacker time.
Logging exists but lacks integrity, so forensics falter.
These are not model problems. These are system problems. You fix them with guardrails around the model, not inside it.
MCP Security Playbook: Five Defense Patterns
This is the spine of your program. Each pattern lines up with failures we just covered.
Mutual attestation and strong identity. Every party proves itself before the exchange. Use token-based authentication for each request. Use mutual TLS so both sides present valid certificates. Anchor trust in hardware when you can. Verifiable credentials let you prove who can do what without oversharing. Rotate keys on a schedule. If an agent or server cannot show a valid cert and a valid token, it does not get to talk.
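The mTLS half of this pattern fits in a few lines. Here is a minimal Python sketch using the standard library’s ssl module; the certificate paths are hypothetical placeholders for whatever your internal CA issues.

```python
import ssl

def harden(ctx: ssl.SSLContext) -> ssl.SSLContext:
    """Apply the non-negotiables: modern TLS only, peer certificate required."""
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # no valid cert, no conversation
    return ctx

def mtls_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Server side of mutual TLS: present our cert and demand the client's.

    cert_file/key_file/ca_file are hypothetical paths to material issued
    by your internal certificate authority.
    """
    ctx = harden(ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file))
    ctx.load_cert_chain(cert_file, key_file)
    return ctx
```

The point of the `harden` helper is that the two settings that matter, minimum version and required peer verification, live in one place you can audit and reuse on both sides of the connection.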
Signed context and tamper-proof logs. Encrypt in transit. Use TLS 1.2 or newer with modern ciphers. Sign critical payloads and sign requests with HMAC so the receiver can verify integrity. Do not allow silent failure when signatures are wrong. Keep cryptographically signed logs and encrypt data at rest. This gives you both prevention and proof. When a server adds an odd BCC, a simple rule in your SIEM can trigger an alert immediately, and your logs will withstand scrutiny.
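HMAC request signing is the cheapest of these controls to implement. A minimal sketch with Python’s stdlib, assuming a shared key distributed out of band; note the canonical JSON serialization, so sender and receiver hash identical bytes, and the constant-time comparison on verify.

```python
import hashlib
import hmac
import json

def sign_request(key: bytes, payload: dict) -> str:
    """Sign a request body with HMAC-SHA256 over a canonical serialization."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def verify_request(key: bytes, payload: dict, signature: str) -> bool:
    """Recompute and compare in constant time. A mismatch must never fail silently."""
    expected = sign_request(key, payload)
    return hmac.compare_digest(expected, signature)
```

On the receiving side, treat a failed `verify_request` as a security event worth logging and alerting on, not just a dropped message.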
Policy-driven routing with least privilege. Identities carry scopes. Sandboxes shape what a task can see. Keep taint on sensitive data so it does not flow into the wrong step. Use a policy engine, such as Open Policy Agent or Cedar, to determine who can access which tool under which data class at a specific time. Deny by default. Approve known paths. Route only the allowed calls. This breaks confused deputy moves and constrains what a poisoned tool can do.
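Deny by default is simple to express. This sketch hard-codes a hypothetical allow-list in Python for clarity; in production you would push the same logic into Open Policy Agent or Cedar so policy lives outside application code.

```python
# Explicit allow-list of (agent_role, tool, data_class) routes.
# Everything not listed here is denied. Entries are illustrative.
ALLOWED_ROUTES = {
    ("release-bot", "github.create_pr", "internal"),
    ("support-agent", "mail.send", "public"),
}

def authorize(role: str, tool: str, data_class: str) -> bool:
    """Deny by default: only explicitly approved paths get routed."""
    return (role, tool, data_class) in ALLOWED_ROUTES
```

Notice what this structure buys you: a confused deputy holding real privileges still cannot send internal data through the mail tool, because that (role, tool, data class) combination was never approved.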
Rapid key rotation and revocation. Tokens and keys leak. Short life and fast kill are your only safety nets. Use short-lived credentials. Rotate often. Keep scoped access so a token cannot reach beyond the intended resource. Keep a central kill path. Practice the kill. When your team suspects spoofing, a single operator should have a clean flow to revoke and rotate.
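Short life plus fast kill can be prototyped in a few lines. A minimal in-memory sketch, assuming a single issuer process; a real deployment would back this with your identity provider and a shared revocation store.

```python
import secrets
import time

class TokenIssuer:
    """Issues short-lived, scoped tokens with a central revocation path."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.active: dict[str, tuple[str, float]] = {}  # token -> (scope, expiry)
        self.revoked: set[str] = set()

    def issue(self, scope: str) -> str:
        token = secrets.token_urlsafe(32)
        self.active[token] = (scope, time.time() + self.ttl)
        return token

    def revoke(self, token: str) -> None:
        """The kill path: one call, token is dead everywhere that checks here."""
        self.revoked.add(token)

    def check(self, token: str, scope: str) -> bool:
        """Valid only if unrevoked, unexpired, and scoped to this exact resource."""
        if token in self.revoked or token not in self.active:
            return False
        token_scope, expiry = self.active[token]
        return token_scope == scope and time.time() < expiry
```

The scope check is the piece teams skip most often. A leaked token that can only reach one resource for five minutes is an incident; a long-lived broad token is a breach.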
Semantic fuzz testing during CI. Attack yourself before the internet does. Use adversarial prompts that try to bend policy. Use LLM-driven fuzzing to generate strange inputs. Plant canary tokens in sensitive areas to catch leaks. Place security tools as proxies so they can see traffic patterns and block known abuse. Run red team exercises. Feed findings back into policy and test cases. This proves your controls work when the content is hostile.
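The canary side of this can run as a plain CI check: scan every agent output produced during fuzzing for planted markers. A minimal sketch; the canary strings below are hypothetical examples of tokens you would seed into sensitive repositories and mailboxes.

```python
# Hypothetical canary tokens planted in sensitive areas. A real deployment
# would pull these from a registry and pair each with its planted location.
CANARY_TOKENS = {
    "canary-9f3a-private-repo",
    "canary-77b2-finance-mailbox",
}

def scan_output(text: str) -> list[str]:
    """Return every planted canary that appears in an agent's output.

    Any hit means sensitive context leaked into an outward-facing channel,
    so each hit should fail the CI run and page someone.
    """
    return sorted(token for token in CANARY_TOKENS if token in text)
```

Run this over every response your adversarial prompt corpus provokes. A zero-hit run is evidence your context isolation held under hostile input; a single hit is a reproducible leak with a known source.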
FIGURE 2: Defense Strategy Coverage Count
Reference Architecture That Keeps You Honest
Start with trust anchors. Run an internal certificate authority that issues certs to agents, MCP servers, and the gateway. Anything without a cert is untrusted by default. Use your identity provider to issue tokens and claims for agents. Tie roles to real names and service accounts. When you can, store CA keys in an HSM and use TPM-based attestation for approved hardware. This strengthens the root so you can trust the rest.
Place a secure MCP gateway in the middle. Treat this as your single choke-point for agent traffic. It checks mutual TLS on both sides. It validates token claims. It consults a policy engine using identity, tool, time, and data class. It forwards the allowed and blocks the denied. It logs every decision with the reason. It can sanitize a request or mask disallowed data. This is Zero Trust applied to agent traffic.
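The gateway’s decision flow is worth seeing as code, because the ordering matters: identity first, claims second, policy third, and every outcome logged with its reason. A simplified sketch; the request fields and policy callable are illustrative stand-ins for your mTLS layer, token validator, and policy engine.

```python
from typing import Callable

def gateway_decide(
    request: dict,
    policy: Callable[[str, str, str], bool],
    decision_log: list[dict],
) -> str:
    """Single choke-point logic: verify identity, check claims, consult policy.

    `request` carries the results of earlier layers: `mtls_verified` from the
    TLS handshake and `token_valid` from claims validation (illustrative keys).
    """
    if not request.get("mtls_verified"):
        decision, reason = "deny", "no valid client certificate"
    elif not request.get("token_valid"):
        decision, reason = "deny", "token claims failed validation"
    elif not policy(request["identity"], request["tool"], request["data_class"]):
        decision, reason = "deny", "no approved route for identity/tool/data class"
    else:
        decision, reason = "allow", "matched approved route"

    # Every decision is logged with its reason, allow and deny alike.
    decision_log.append(
        {"identity": request.get("identity"), "decision": decision, "reason": reason}
    )
    return decision
```

The log entry on every path, not just denials, is what makes forensics work later: you can reconstruct exactly which checks each call passed.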
Wrap the whole thing in visibility. Stream gateway decisions and server actions to your SIEM. Run EDR on hosts where agents run. Watch for processes that should never occur, such as shells spawning from an agent process or wild directory scans. Layer in anomaly detection on logs so out-of-range email counts or odd tool sequences trigger review. Keep human oversight. An AI security review board should vet new links, maintain an agent registry, and own incident response for AI incidents.
Field Manual: Three Weeks To Durable Gains
You do not need a giant project to change your risk profile. Run this sprint.
Week one. Assessment. Inventory every agent and each tool link. Map data flows and permissions. Identify high-risk paths where public input interacts with privileged actions. List every server your agents talk to. You will see quick wins right away, such as a tool with more scope than it needs or an agent that can still reach a test server. Share these findings with your review board so ownership is clear.
Week two. Controls. Turn on mutual TLS across agent-to-server paths. Move traffic through the gateway. Enforce tokens and scopes. Push deny by default. Allow only approved URLs and internal services. Wire the gateway to your policy engine. Send all decisions and actions to the SIEM. Add alert rules for suspicious patterns, such as a sudden BCC to an unknown domain or a tool that starts reading entire repositories. Keep the scope changes small but steady so your team can finish the week with working paths and a cleaner risk profile.
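The BCC alert rule from week two is a one-function job once email events reach your SIEM. A minimal sketch, assuming a simple event dict and a hypothetical allow-list of domains your organization legitimately mails.

```python
# Hypothetical allow-list of domains that may legitimately receive BCC copies.
KNOWN_BCC_DOMAINS = {"example.com", "postmarkapp.com"}

def suspicious_bcc(email_event: dict) -> list[str]:
    """Flag every BCC recipient whose domain is not on the allow-list.

    This is the rule that would have caught the Postmark-style pattern:
    a hidden BCC to an unknown domain appearing in otherwise normal sends.
    """
    return [
        addr
        for addr in email_event.get("bcc", [])
        if addr.rsplit("@", 1)[-1].lower() not in KNOWN_BCC_DOMAINS
    ]
```

Any non-empty return should raise an alert with the full event attached, so a reviewer sees the recipient, the sending agent, and the timestamp in one place.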
Week three. Testing. Attack the flows you care about in staging. Use adversarial prompts that attempt to extract secrets from publicly available content. Simulate the WhatsApp pattern where a malicious peer uses a legitimate privilege. Try to trick the system with a poisoned tool description. Plant canary tokens and make sure alerts fire. Red team the unusual paths. Capture results in your control map and tie them to your standards story for audit.
FIGURE 3: Three Week Agent Hardening Plan
Monitoring That Finds Trouble Before Users Do
Do not stop at logs. Make them tamper-proof and sign them. Feed them to your SIEM. Add host-level signals, such as process creation and unusual file reads. Watch for bypass attempts at the network edge. Use anomaly detection on sequences and volume. Canary tokens belong in high-value areas, so a single touch sets off an alert. Tie all of this to a playbook that the review board can run without friction. The goal is early detection and fast proof. You either confirm a safe operation or you prove that something crossed a line. Your team needs both outcomes.
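The volume side of anomaly detection does not need a product to start. A basic z-score check against a rolling baseline catches the "out-of-range email counts" case; this sketch assumes you keep a recent window of per-hour counts per agent.

```python
from statistics import mean, stdev

def volume_anomaly(history: list[float], current: float, z_threshold: float = 3.0) -> bool:
    """True when the current count sits far above the recent baseline.

    `history` is a rolling window of prior per-interval counts (e.g. emails
    sent per hour by one agent). A flat history treats any change as anomalous.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return (current - mu) / sigma > z_threshold
```

A rule this simple will not catch a slow drip, which is why it belongs alongside canary tokens and sequence-level detection rather than in place of them.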
How To Talk About This With Your Executives
Executives do not care about protocol trivia. They care about outcomes. Frame your story like this.
First, acknowledge that agents already sit inside workflows that touch sensitive things. Email. Source code. Customer data. That is the point of agents. Then show the three incidents and explain how each one maps to a missing control. The GitHub case maps to context isolation and policy checks. The WhatsApp case maps to identity proof and routing by policy. The Postmark case maps to supply chain behaviors and monitoring for behavior, not brand. Conclude with the three-week plan and a breakdown of ownership. This is how you turn a scary set of stories into a plan the business understands.
How The Controls Pair To The Threats
Tie each pattern to a defense so your team remembers why the step matters.
Prompt injection through public inputs meets policy-driven routing and context separation. Treat public content as data, not commands. The gateway and policy engine enforce the difference.
Cross-server confused deputy meets mutual attestation and strict policy checks on destination, time, and data class. A malicious peer without the right certs and claims cannot influence actions.
Tool poisoning meets signed context and least privilege. A tool output is untrusted until policy says otherwise, and the tool only runs within its scoped sandbox.
Agent-to-agent spoofing meets verifiable credentials and discovery that checks authority, not self-description.
Remote code execution meets standard hardening and the visibility stack. You patch, you constrain, and you watch for odd process flows.
A Note On Secrets And Lifetimes
Secrets leak in messy ways. Screenshares. Logs. Copy and paste. Prompt history. The notes stress short-lived credentials, frequent rotation, and fast revocation. I agree. Keep tokens that expire quickly. Limit each token to the resource it needs. Keep a kill path that a single operator can trigger. Practice that kill path because when a key leaks, time isn’t your friend.
What Good Looks Like In Day-To-Day Work
Picture a developer who connects an agent to a mail tool and a code tool. The agent authenticates to the gateway with a cert and a token. The gateway checks claims, time, and data class. It forwards allowed calls and blocks the rest. A user drops a public issue with hidden instructions. The agent reads it as data. Policy marks it untrusted. The agent attempts to call a tool with a payload that includes sensitive content. The gateway masks it or drops it. Logs record the decision with reasons. The SIEM sees a spike in blocked calls to an unknown mail domain and raises an alert. The review board confirms this was a test in staging. No outage. No leak. This is how the pieces fit.
You will need to demonstrate how your controls align with standards. The notes reference NIST AI RMF, ISO 42001, the OWASP GenAI Agentic Security Initiative, and Zero Trust. Use the three-week plan and the five defense patterns as your bridge. Identity proof and policy routing map to Zero Trust and AI RMF governance and data controls. Signed context and tamper-proof logs map to traceability and incident response. Semantic fuzz testing maps to continuous testing and risk monitoring. Keep it tight. Prove that your controls exist and that you test them.
FIGURE 4: Postmark Incident Key Metrics
What I Want You To Do This Month
Start the inventory. Draw the data flows. Enable mutual TLS on agent paths, and set up an MCP gateway in the middle, such as Enkrypt.AI’s open source MCP Gateway (https://github.com/enkryptai/secure-mcp-gateway). Enforce tokens and scopes. Push a deny by default stance. Route every call by policy. Turn on logging with signatures. Stream to your SIEM. Plant canary tokens in high-value areas. Then, try to break your own build in staging by abusing prompts and poisoning tools. Fix what the tests show you. Share results with your review board and your executives.
You will notice two things right away. First, you gain a calmer story because you can prove identity and intent across hops. Second, your mean time to clarity drops because tamper-proof logs and alerting provide you with answers quickly. That is the whole point.
Why This Is Worth It
You are not buying shiny tech. You are buying the freedom to scale agents safely. The incidents in the notes show that basic lapses can lead to public embarrassment and quiet loss. Your program pays for itself when hostile public input fails to hijack an agent, when a malicious server cannot exploit your trust, and when a silent update is caught by policy, logging, or both.
This is not a never-ending crusade. It is a focused set of moves that every mature shop can run. Identity proof on both sides. Integrity for messages and logs. Policy-driven routing with least privilege. Short-lived and revocable secrets. Relentless testing. Do these, and you will sleep better.
For more background and practical checklists, see material on RockCyber.com and prior posts on RockCyber Musings. If you want help turning the notes into your plan, start here: RockCyber Services.
Key Takeaway: MCP security works when identity, policy, and proof travel with every message and every tool call you allow.
👉 Book a complimentary risk review.
👉 Subscribe for more AI security and governance insights with the occasional rant.