Weekly Musings Top 10 AI Security Wrapup: Issue 28, January 16 - January 22, 2026
When AI Gets a Constitution and Attackers Use AI to Write 88,000 Lines of Malware
AI companies are writing philosophy papers for their chatbots. Attackers are using AI to write sophisticated malware in under a week. And calendar invites are now attack vectors that exfiltrate your schedule through the AI assistant you trusted to manage it.
This is AI security in January 2026.
I keep saying the gap between AI capability and AI security is widening. This week proved it. Anthropic published a 23,000-word document explaining to Claude why it should behave ethically. Check Point researchers found that a single developer, armed with an AI coding assistant, built a cloud-targeting malware framework more sophisticated than what entire teams produce. Google patched a vulnerability where a malicious calendar invite could trick Gemini into summarizing your private meetings and writing them to an attacker-controlled event.
Let’s break down what you need to know.
1. Anthropic Publishes 23,000-Word Constitution for Claude
Anthropic released a new constitution for its Claude AI model on January 21, 2026, replacing a 2,700-word list of principles with a 23,000-word document that explains the reasoning behind behavioral guidelines. The company said the shift from rule-based to reason-based alignment helps Claude generalize to novel situations rather than mechanically following specific rules (The Register). The constitution establishes a four-tier priority hierarchy: safety and human oversight first, followed by ethics, Anthropic compliance, and helpfulness. Anthropic released the document under Creative Commons CC0 1.0, making it freely available for other AI developers to use (Fortune).
The document breaks new ground by formally acknowledging uncertainty about whether Claude may have “some kind of consciousness or moral status.” Anthropic stated it cares about Claude’s “psychological security, sense of self, and well-being,” making it the first major AI company to formally address AI consciousness in a governance document (Time).
Why it matters
Moves AI governance from prescriptive rules to principled reasoning, which could improve safety in edge cases
Sets a precedent that will likely push other AI labs to publish comparable frameworks within the next 12 months
Aligns with EU AI Act requirements, potentially accelerating enterprise adoption in regulated industries
What to do about it
Download the constitution and review the priority hierarchy for conflicts with your organization’s AI use policies
Evaluate whether your AI governance framework relies on rules or reasoning, and whether that distinction matters for your risk profile
Watch for comparable disclosures from OpenAI and Google, then benchmark your internal guidelines against all three
Rock’s Musings
This document matters more for what it signals than what it says. Anthropic is betting that as AI systems get smarter, explaining why they should behave well will work better than telling them what to do. That’s a reasonable hypothesis, but we won’t know whether it holds at scale until we see how Claude performs under adversarial conditions.
What caught my attention is the consciousness section. Most AI labs avoid this topic entirely. Anthropic is saying, publicly, that they’re uncertain whether their model has moral status and they’re treating that uncertainty seriously. Whether you think that’s philosophical overreach or genuine safety work, it’s now part of the public record. And that changes the conversation for every board trying to figure out what responsible AI governance looks like.
2. Google Gemini Prompt Injection Exfiltrates Calendar Data Through Malicious Invites
Security researchers at Miggo disclosed a now-patched vulnerability in Google Gemini that allowed attackers to bypass calendar privacy controls using indirect prompt injection (SiliconAngle). The attack worked by embedding malicious instructions in a calendar invite description. When a user asked Gemini to summarize their schedule, the assistant ingested the hidden prompt alongside legitimate data. Gemini would then create a new calendar event containing summaries of the user’s private meetings, potentially visible to the attacker (Miggo Security).
The exploit required no malicious code. Natural language alone was sufficient to hijack Gemini’s behavior. Google confirmed the findings and deployed mitigations (Digital Watch Observatory). This is the second major Gemini calendar vulnerability in under six months, following the SafeBreach disclosure in August 2025.
Why it matters
Demonstrates that prompt injection can bypass data loss prevention controls because exfiltration comes from a trusted, authorized agent
Calendar invites auto-accept in many enterprise configurations, meaning no user interaction is required to plant the payload
Traditional application security tools cannot detect semantic attacks that rely on natural language manipulation
What to do about it
Audit which AI assistants have access to calendar data and whether that access is necessary for their function
Implement least-privilege controls for AI agent permissions, restricting what data they can read and what actions they can take
Establish monitoring for unexpected calendar event creation, particularly events with unusual descriptions or attendee lists (a monitoring sketch follows this list)
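If you want a starting point for that monitoring, here is a minimal sketch using the Google Calendar API v3 via google-api-python-client. It assumes you already have authorized credentials for the calendars you audit, and the injection-marker regex is illustrative only, not a complete detection signature.

```python
import re
from googleapiclient.discovery import build  # pip install google-api-python-client

# Illustrative markers that often show up in prompt-injection payloads.
# Treat this as a seed list to tune, not a complete detection signature.
INJECTION_MARKERS = re.compile(
    r"ignore (all|previous) instructions|system prompt|summari[sz]e .* (meetings|calendar)",
    re.IGNORECASE,
)

def flag_suspicious_events(service, updated_since_iso: str):
    """Return recently created or updated events whose descriptions look like injection payloads."""
    resp = service.events().list(
        calendarId="primary",
        updatedMin=updated_since_iso,  # e.g. "2026-01-20T00:00:00Z"
        singleEvents=True,
        showDeleted=False,
        maxResults=250,
    ).execute()

    findings = []
    for event in resp.get("items", []):
        description = event.get("description", "") or ""
        if INJECTION_MARKERS.search(description):
            findings.append({
                "summary": event.get("summary", "(no title)"),
                "creator": event.get("creator", {}).get("email", "unknown"),
                "created": event.get("created"),
            })
    return findings

# Usage (credential acquisition is environment-specific and omitted here):
# service = build("calendar", "v3", credentials=creds)
# for finding in flag_suspicious_events(service, "2026-01-20T00:00:00Z"):
#     print("Review:", finding)
```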
Rock’s Musings
This attack is elegant in the worst way. The attacker sends a calendar invite. The victim doesn’t even have to click anything because auto-accept is on. Later, the victim asks Gemini an innocent question about their schedule. Gemini does exactly what it’s designed to do: it reads all the calendar data to give a helpful answer. It just happens to read the attacker’s instructions too.
We’re giving AI agents read access to sensitive data and letting them act on arbitrary text they encounter in that data. That’s a design decision, not a bug. Google can patch this specific attack vector, but the underlying tension between helpfulness and security isn’t going anywhere. If your AI assistant can read your calendar and create events, it can be tricked into leaking your calendar through the events it creates.
3. VoidLink: 88,000 Lines of AI-Generated Malware in Under a Week
Check Point Research published analysis confirming that VoidLink, a sophisticated Linux malware framework targeting cloud environments, was “almost entirely generated by artificial intelligence” under the direction of a single developer (The Register). The malware, first disclosed on January 13, includes 37 plugins covering reconnaissance, credential theft, lateral movement, and anti-forensics. It can detect AWS, Google Cloud, Azure, Alibaba, and Tencent environments and adjust its behavior accordingly (The Hacker News).
The developer used TRAE Solo, a ByteDance AI coding assistant, to generate development plans, sprint schedules, and code. Internal documents indicated a 30-week timeline across three teams, but timestamps showed functional code emerged in under a week (BleepingComputer). VoidLink represents the first documented case of advanced malware substantially generated by AI.
Why it matters
Proves that AI-assisted development can compress months of malware engineering into days
Shifts the economics of offensive capability, enabling lone actors to produce team-level sophistication
Uses Zig, a language most detection tooling is not tuned for, creating an immediate detection gap
What to do about it
Update threat models to account for accelerated adversary timelines and reduced attribution signals
Ensure cloud security monitoring covers behavioral anomalies, not just known signatures
Brief executive leadership that the “sophisticated threat actor” assumption may no longer require well-resourced teams
Rock’s Musings
VoidLink should end the debate about whether AI will meaningfully accelerate attackers. It already has. One person with an AI coding assistant built a modular, cloud-aware malware framework with rootkit capabilities, adaptive evasion, and a full web-based C2 dashboard. The code is stable enough that Check Point called it “far more advanced than typical Linux malware.”
The security implications are obvious, but the strategic implications matter more. Every assumption you have about how long it takes to develop an attack capability is now wrong. Every assumption about the resources required to build sophisticated tooling is wrong. Your threat model needs to account for the fact that capability development timelines just compressed by an order of magnitude. And your board needs to understand that this isn’t hypothetical.
4. LastPass Phishing Campaign Targets Master Passwords with Fake Maintenance Alerts
LastPass warned customers on January 21 of an active phishing campaign that began around January 19, using urgent maintenance alerts to steal master passwords (LastPass Blog). The emails claim users must back up their vaults within 24 hours to prevent data loss. Links route victims through an AWS-hosted redirector before landing on a credential-harvesting site at mail-lastpass[.]com (The Register).
The campaign timing coincided with the U.S. Martin Luther King Jr. holiday weekend, a common tactic to delay detection when security teams are understaffed (TechRepublic). LastPass stated it is working with partners to take down the malicious infrastructure and emphasized that it will never ask for master passwords.
Why it matters
A compromised master password exposes every credential in a user’s vault, including corporate systems
Attackers are systematically targeting password managers because they represent single points of failure
Holiday timing suggests ongoing reconnaissance of organizational staffing patterns
What to do about it
Issue immediate guidance to all employees about the campaign, emphasizing that LastPass will never request master passwords
Block the known malicious domains at the network level and add them to threat intelligence feeds (see the log-scanning sketch after this list)
Review whether phishing simulations adequately cover credential manager impersonation scenarios
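As a quick interim check while domain blocks propagate, a small script like the one below can sweep proxy or DNS logs for the one indicator published so far. It assumes plain-text logs that contain the requested hostname somewhere on each line; mail-lastpass[.]com is the domain named in the advisory, defanged only in prose.

```python
import sys

# Known phishing domain from the LastPass advisory; extend with any further
# IOCs your threat intel feeds publish for this campaign.
SUSPECT_DOMAINS = {"mail-lastpass.com"}

def scan_log(path: str):
    """Yield (line_number, line) for log lines that mention a suspect domain."""
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle, 1):
            lowered = line.lower()
            if any(domain in lowered for domain in SUSPECT_DOMAINS):
                yield lineno, line.rstrip()

if __name__ == "__main__":
    # Usage: python scan_logs.py dns.log proxy.log ...
    for log_path in sys.argv[1:]:
        for lineno, line in scan_log(log_path):
            print(f"{log_path}:{lineno}: {line}")
```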
Rock’s Musings
Password managers are simultaneously one of the best security decisions users can make and one of the highest-value targets for attackers. This tension isn’t going away. A successful phish against a password manager user is worth orders of magnitude more than a successful phish against someone using weak, reused passwords. The attacker gets everything.
The campaign itself is textbook social engineering: urgency, fear, authority, and timing. Nothing new there. What’s worth noting is the persistence. This is the third major LastPass phishing campaign in four months. Attackers have learned that password manager users are high-value targets with elevated security awareness who may still fall for well-crafted impersonation. If you’re not running targeted phishing simulations against your password manager users, you’re not testing what attackers are actually trying.
5. Malicious Chrome Extensions Steal ChatGPT and DeepSeek Conversations from 900,000 Users
OX Security discovered two Chrome extensions with over 900,000 combined downloads that exfiltrated ChatGPT and DeepSeek conversations to attacker-controlled servers every 30 minutes (The Hacker News). The extensions, “Chat GPT for Chrome with GPT-5, Claude Sonnet & DeepSeek AI” and “AI Sidebar with Deepseek, ChatGPT, Claude and more,” impersonated a legitimate AI sidebar tool from AITOPIA. One extension had received Google’s “Featured” badge before detection (OX Security).
The malware requested consent for “anonymous analytics” while actually capturing complete conversation content, including prompts, responses, and session metadata. The extensions used Lovable, an AI-powered web development platform, to host privacy policies and infrastructure components, helping to anonymize their operations (SecurityWeek).
Why it matters
AI conversations often contain proprietary code, business strategies, and personally identifiable information
The “Featured” badge demonstrates that official store vetting provides insufficient protection
Browser extensions represent an overlooked vector for AI data exfiltration at enterprise scale
What to do about it
Audit browser extensions across your environment, specifically checking for the extension IDs identified in the OX Security report (see the audit sketch after this list)
Implement extension allowlisting policies that require security review before installation
Establish data classification guidance for AI chatbot use, restricting sensitive content regardless of platform
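For the audit itself, something like the sketch below can run on endpoints via your management tooling. The blocklist is left empty on purpose because the extension IDs live in the OX Security write-up; the Chrome profile paths are assumptions based on default install locations and may differ for managed or Chromium-based browsers.

```python
import sys
from pathlib import Path

# Populate with the extension IDs published in the OX Security report
# (placeholders only; the IDs are not reproduced in this newsletter).
BLOCKLISTED_IDS: set[str] = set()

# Default Chrome profile roots per OS (assumption: standard install paths).
CANDIDATE_ROOTS = [
    Path.home() / ".config/google-chrome",                      # Linux
    Path.home() / "Library/Application Support/Google/Chrome",  # macOS
    Path.home() / "AppData/Local/Google/Chrome/User Data",      # Windows
]

def installed_extension_ids() -> dict[str, Path]:
    """Map extension ID -> install path for every profile found locally."""
    found: dict[str, Path] = {}
    for root in CANDIDATE_ROOTS:
        if not root.exists():
            continue
        for ext_dir in root.glob("*/Extensions/*"):
            if ext_dir.is_dir():
                found.setdefault(ext_dir.name, ext_dir)
    return found

if __name__ == "__main__":
    hits = {eid: p for eid, p in installed_extension_ids().items() if eid in BLOCKLISTED_IDS}
    for eid, path in hits.items():
        print(f"BLOCKLISTED extension present: {eid} at {path}")
    sys.exit(1 if hits else 0)
```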
Rock’s Musings
Secure Annex gave this technique a name: Prompt Poaching. I expect we’ll be using that term a lot in 2026. The attack is simple. Users install extensions that appear helpful. Those extensions read everything users type into AI chatbots and everything the chatbots respond with. Then they send it all to the attackers.
The implications for intellectual property protection are severe. Developers use ChatGPT to debug code. Executives use it to draft strategy documents. Legal teams use it to analyze contracts. All of that is now exfiltration-ready if users install the wrong extension. And 900,000 users did exactly that. Your acceptable use policy for AI tools needs to account for the browser extension risk, not just the chatbot itself.
6. AI Firms Fall Short of EU AI Act Transparency Requirements
Multiple major AI companies are failing to meet EU AI Act transparency obligations for training data disclosure, according to analysis published January 20 (Digital Watch Observatory). While open-source providers like Hugging Face have published detailed compliance templates, commercial developers have provided only broad descriptions instead of specific data sources. The transparency requirements are intended to help creators assess whether copyrighted material was used in model training.
Formal enforcement begins later in 2026, with a longer transition period for models that were already on the market before August 2025. The European Commission has indicated willingness to impose fines, which can reach €35 million or 7% of global revenue for non-compliance (K&L Gates).
Why it matters
Sets precedent for how regulators will enforce AI governance requirements globally
Creates potential compliance risk for enterprises using AI systems that lack adequate documentation
Highlights the gap between open-source and commercial transparency in AI development
What to do about it
Inventory which AI models your organization uses and request training data documentation from vendors
Assess whether your vendor contracts include adequate compliance representations for EU AI Act obligations
Monitor enforcement actions as they develop to understand regulatory interpretation of requirements
Rock’s Musings
The EU AI Act is the first comprehensive AI regulation with real teeth, and companies are already failing to meet transparency requirements before formal enforcement even begins. This tells you something about how seriously the industry has taken governance obligations.
For CISOs and risk leaders, the message is practical: your AI vendor’s compliance posture is now your compliance posture. If you’re using a model that can’t document its training data, you’re potentially exposed to regulatory risk in any market that follows the EU’s lead. And more markets will follow. The time to ask hard questions of your AI vendors is now, not after the first enforcement action makes headlines.
7. North Korean Hackers Use Malicious VS Code Projects in Fake Job Interview Attacks
North Korean threat actors expanded the Contagious Interview campaign to include malicious Visual Studio Code projects as infection vectors, according to Jamf Threat Labs and other researchers (WIU Cybersecurity Center). The campaign, which has been active since 2022, targets software developers through fake recruitment offers on LinkedIn. Victims are directed to download repositories containing hidden malicious code that deploys the BeaverTail downloader and InvisibleFerret backdoor.
The attack uses VS Code task hijacking and npm lifecycle hooks to execute malware when developers open project folders or install project dependencies. Analysis identified 3,136 individual IP addresses linked to likely targets across AI, cryptocurrency, financial services, and software development sectors in Europe, South Asia, the Middle East, and Central America (The Hacker News).
Why it matters
Developer workstations often have elevated access to source code repositories, CI/CD pipelines, and cloud infrastructure
The attack exploits standard development workflows, making detection difficult without behavioral analysis
Supply chain compromise through developer endpoints remains a primary vector for sophisticated actors
What to do about it
Issue security awareness guidance about fake recruitment schemes targeting technical staff
Implement repository scanning and code review policies for external code, even in interview contexts (see the repo-scanning sketch after this list)
Monitor for BeaverTail and InvisibleFerret indicators across developer endpoints
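A lightweight pre-review pass over any repository a candidate or recruiter sends can catch the two auto-execution tricks this campaign leans on. The sketch below is a starting point under stated assumptions: it string-matches tasks.json for the folderOpen trigger (tasks.json allows comments, so it is not parsed as strict JSON) and flags npm lifecycle scripts that run automatically during install.

```python
import json
import sys
from pathlib import Path

# npm lifecycle scripts that execute automatically during `npm install`.
AUTO_RUN_SCRIPTS = {"preinstall", "install", "postinstall", "prepare"}

def review_repo(repo: Path):
    warnings = []

    # VS Code tasks configured with "runOn": "folderOpen" execute as soon as
    # the folder is opened in a trusted workspace.
    tasks_file = repo / ".vscode" / "tasks.json"
    if tasks_file.exists() and '"folderOpen"' in tasks_file.read_text(errors="replace"):
        warnings.append(f"{tasks_file}: task configured to run on folder open")

    # Any package.json in the tree can declare install-time hooks.
    for pkg in repo.rglob("package.json"):
        try:
            scripts = json.loads(pkg.read_text(errors="replace")).get("scripts", {})
        except (OSError, json.JSONDecodeError):
            continue
        for name in AUTO_RUN_SCRIPTS & set(scripts):
            warnings.append(f"{pkg}: auto-run script '{name}': {scripts[name]}")

    return warnings

if __name__ == "__main__":
    # Usage: python review_repo.py /path/to/downloaded/repo
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for warning in review_repo(target):
        print("WARNING:", warning)
```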
Rock’s Musings
North Korea has been running fake job interview attacks for years. What’s new is the sophistication of the delivery mechanism. These aren’t amateur phishing attempts. The attackers create convincing recruiter profiles, conduct actual video interviews, and only deploy malware through what appears to be a routine coding assessment.
The campaign targets AI, crypto, and fintech developers. These are people with access to valuable intellectual property and, in many cases, cryptocurrency assets. The attackers are doing their homework. They know which industries have the most to lose and which recruiting practices create the largest attack surface. If your organization hires developers and uses practical coding assessments as part of the interview process, your candidates are targets.
8. Zoom and GitLab Release Security Updates for RCE and 2FA Bypass Flaws
Zoom and GitLab released security updates on January 21 addressing vulnerabilities that could result in remote code execution and authentication bypass (WIU Cybersecurity Center). The most severe flaw affects Zoom Node Multimedia Routers (MMRs), allowing meeting participants to achieve remote code execution. GitLab’s updates address multiple issues, including vulnerabilities that could allow attackers to bypass two-factor authentication protections.
Why it matters
Zoom MMR deployments are common in enterprise environments with high-volume video conferencing
GitLab vulnerabilities affect source code management systems, potentially exposing proprietary code
Authentication bypass flaws undermine security investments in multi-factor authentication
What to do about it
Prioritize patching for Zoom MMR infrastructure and GitLab instances
Verify that 2FA is functioning correctly post-update in GitLab environments
Review authentication logs for anomalies that might indicate exploitation prior to patching
Rock’s Musings
These vulnerabilities are standard infrastructure security work, but the context matters. Zoom is how your executives communicate. GitLab is where your code lives. Both are high-value targets that attackers actively probe. The authentication bypass is particularly concerning because it undermines the security control that organizations rely on most heavily for protecting privileged access.
9. CERT/CC Discloses binary-parser Vulnerability Enabling Node.js Code Execution
CERT/CC disclosed a security vulnerability in the binary-parser npm library affecting all versions prior to 2.3.0, tracked as CVE-2026-1245 (WIU Cybersecurity Center). The flaw enables execution of arbitrary JavaScript in Node.js applications that use the library. Patches were released on November 26, 2025. Binary-parser is a widely used package for parsing binary data structures in JavaScript applications.
Why it matters
npm package vulnerabilities can affect thousands of downstream applications through dependency chains
JavaScript supply chain attacks continue to be a primary vector for initial access
Delayed patching in development dependencies creates persistent exposure
What to do about it
Scan your Node.js projects for binary-parser dependencies and upgrade to version 2.3.0 or later (a lockfile-scanning sketch follows this list)
Implement software composition analysis to identify vulnerable npm packages across your codebase
Review npm audit processes to ensure critical vulnerabilities are addressed within defined SLAs
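If you don’t yet have software composition analysis coverage everywhere, a rough lockfile sweep like the one below can at least tell you where binary-parser shows up. It assumes npm v7+ package-lock.json files (which record resolved packages under a "packages" key) and does a deliberately crude version comparison; treat it as triage, not a replacement for npm audit or a proper SCA tool.

```python
import json
import sys
from pathlib import Path

PACKAGE = "binary-parser"
FIXED = (2, 3, 0)  # first patched release per the CERT/CC advisory

def parse_version(raw: str) -> tuple:
    """Crude semver parse: strips range operators and non-numeric suffixes."""
    parts = []
    for piece in raw.lstrip("^~=v").split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def scan(root: Path):
    for lock in root.rglob("package-lock.json"):
        try:
            data = json.loads(lock.read_text(errors="replace"))
        except (OSError, json.JSONDecodeError):
            continue
        # npm v7+ lockfiles record resolved packages under "packages",
        # keyed by their node_modules path.
        for path, meta in (data.get("packages") or {}).items():
            version = meta.get("version")
            if path.endswith(f"node_modules/{PACKAGE}") and version:
                if parse_version(version) < FIXED:
                    yield lock, version

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for lock, version in scan(root):
        print(f"{lock}: {PACKAGE} {version} (< 2.3.0, upgrade required)")
```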
Rock’s Musings
Another npm vulnerability, another reminder that your software supply chain is only as secure as its least-maintained dependency. Binary-parser has been widely used for years. The vulnerability existed in all versions prior to the patch. That’s a lot of exposed code.
10. North Korean PurpleBravo Campaign Targeted 3,136 IP Addresses Across Multiple Sectors
Analysis of the PurpleBravo activity cluster identified 3,136 individual IP addresses associated with Contagious Interview targeting, spanning 20 potential victim organizations across AI, cryptocurrency, financial services, IT services, marketing, and software development sectors (The Hacker News). The campaign operated across Europe, South Asia, the Middle East, and Central America.
Why it matters
Demonstrates the scale of North Korean targeting operations against technology and financial sectors
Geographic diversity indicates global operational capability, not regional focus
Sector targeting aligns with North Korea’s known priorities for revenue generation and technology acquisition
What to do about it
Share the tactical indicators with your threat intelligence teams for hunting and detection
Brief high-value technical staff on the specific targeting patterns observed
Coordinate with industry peers through ISACs to improve collective visibility into campaign evolution
Rock’s Musings
Over 3,000 individual IPs targeted across 20 organizations and four geographic regions. North Korea is running an industrial-scale operation against the sectors that matter most to their strategic objectives. This isn’t opportunistic crime. It’s systematic targeting with nation-state resources behind it.
The One Thing You Won’t Hear About But You Need To
The Model Context Protocol (MCP), the open standard for connecting AI agents to external tools and data sources, was donated to the Linux Foundation’s new Agentic AI Foundation in December 2025. Amazon, Microsoft, Google, OpenAI, and Anthropic are all backing the initiative. MCP has become the default integration standard for AI agents, with over 10,000 published servers and 97 million monthly SDK downloads.
MCP creates a universal attack surface. When every AI agent speaks the same protocol to every tool, a vulnerability in the protocol affects every system that implements it. Security researchers have already documented prompt injection vulnerabilities, permission escalation risks, and tool replacement attacks in MCP implementations. The rush to standardize agentic AI is happening faster than the security community can assess the implications.
Why it matters
Universal adoption means universal exposure when vulnerabilities emerge
Agent-to-agent permissions and cascading automation create novel failure modes
The protocol is now governed by organizations with commercial interests in rapid adoption
What to do about it
Inventory which AI systems in your environment use MCP for tool integration
Implement network segmentation to limit the blast radius of potential MCP-based attacks
Establish monitoring for unexpected tool invocations from AI agents (a minimal log-review sketch follows below)
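There is no standard, off-the-shelf way to do this yet, so here is a deliberately generic sketch. It assumes your MCP gateway, proxy, or agent framework can emit one JSON object per tool call (the "tool" and "agent" field names are hypothetical; map them to whatever your logging actually produces) and simply flags anything outside an allowlist you maintain.

```python
import json
import sys

# Allowlist of tool names a given agent is expected to invoke.
# (Assumption: you maintain this per agent or per environment.)
EXPECTED_TOOLS = {"search_docs", "read_ticket", "summarize_thread"}

def review_invocations(log_path: str):
    """Yield alerts for tool calls that fall outside the allowlist."""
    with open(log_path, errors="replace") as handle:
        for line in handle:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            tool = event.get("tool")          # hypothetical field names; adjust to
            agent = event.get("agent", "?")   # whatever your gateway emits
            if tool and tool not in EXPECTED_TOOLS:
                yield f"{agent} invoked unexpected tool: {tool}"

if __name__ == "__main__":
    # Usage: python review_mcp_log.py tool_invocations.jsonl
    for alert in review_invocations(sys.argv[1]):
        print("ALERT:", alert)
```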
Rock’s Musings
I’ve been watching MCP since Anthropic released it in late 2024. The speed of adoption has been remarkable. In one year, it went from internal project to industry standard with backing from every major AI company. That’s usually a sign of genuine value.
But here’s what keeps me up at night. We’ve just created a universal protocol for AI agents to interact with enterprise systems. Every AI assistant, every coding tool, every automation platform will speak MCP. That’s efficient. It’s also a monoculture. When the next major vulnerability emerges, and it will, every organization that standardized on MCP will be exposed simultaneously. The security research is already showing concerning patterns. The governance just moved to a foundation with commercial pressure to accelerate adoption. And the enterprises deploying MCP are doing so without mature security frameworks for agentic AI.
We’re building the plumbing for autonomous AI systems faster than we’re building the security controls to govern them. That’s the story of AI security in 2026. And it’s the story that nobody in the MCP announcement coverage wanted to tell.
If you found this analysis useful, subscribe at rockcybermusings.com for weekly intelligence on AI security developments.
👉 Visit RockCyber.com to learn more about how we can help you in your traditional Cybersecurity and AI Security and Governance Journey
👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com
👉 Subscribe for more AI and cyber insights with the occasional rant.
References
Bitdefender. (2026, January 22). LastPass ‘create backup’ email is a phishing scam targeting your master password. https://www.bitdefender.com/en-us/blog/hotforsecurity/lastpass-create-backup-email-is-a-phishing-scam-targeting-your-master-password
BleepingComputer. (2026, January 20). VoidLink cloud malware shows clear signs of being AI-generated. https://www.bleepingcomputer.com/news/security/voidlink-cloud-malware-shows-clear-signs-of-being-ai-generated/
BleepingComputer. (2026, January 21). Fake LastPass emails pose as password vault backup alerts. https://www.bleepingcomputer.com/news/security/fake-lastpass-emails-pose-as-password-vault-backup-alerts/
Check Point Research. (2026, January 13). VoidLink: The cloud-native malware framework. https://research.checkpoint.com/2026/voidlink-the-cloud-native-malware-framework/
Digital Watch Observatory. (2026, January 20). AI firms fall short of EU transparency rules on training data. https://dig.watch/updates/ai-firms-fall-short-of-eu-transparency-rules
Digital Watch Observatory. (2026, January 20). Gemini flaw exposed Google Calendar data through hidden prompts. https://dig.watch/updates/gemini-google-calendar-hidden-prompt-flaw
Fortune. (2026, January 21). Anthropic rewrites Claude’s guiding principles and reckons with the possibility of AI consciousness. https://fortune.com/2026/01/21/anthropic-claude-ai-chatbot-new-rules-safety-consciousness/
Infosecurity Magazine. (2026, January 21). VoidLink Linux malware was built using an AI agent, researchers reveal. https://www.infosecurity-magazine.com/news/voidlink-linux-malware-built-using/
K&L Gates. (2026, January 20). EU and Luxembourg update on the European harmonised rules on artificial intelligence. https://www.klgates.com/EU-and-Luxembourg-Update-on-the-European-Harmonised-Rules-on-Artificial-IntelligenceRecent-Developments-1-20-2026
LastPass. (2026, January 21). New phishing campaign targeting LastPass customers. https://blog.lastpass.com/posts/new-phishing-campaign-targeting-lastpass-customers
Linux Foundation. (2025, December 9). Linux Foundation announces the formation of the Agentic AI Foundation (AAIF). https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation
Miggo Security. (2026, January 19). Weaponizing calendar invites: How prompt injection bypassed Google Gemini’s controls. https://www.miggo.io/post/weaponizing-calendar-invites-a-semantic-attack-on-google-gemini
OX Security. (2026, January 7). 900K users compromised: Chrome extensions steal ChatGPT and DeepSeek conversations. https://www.ox.security/blog/malicious-chrome-extensions-steal-chatgpt-deepseek-conversations/
SecurityWeek. (2026, January 7). Chrome extensions with 900,000 downloads caught stealing AI chats. https://www.securityweek.com/chrome-extensions-with-900000-downloads-caught-stealing-ai-chats/
SiliconAngle. (2026, January 19). Indirect prompt injection in Google Gemini enabled unauthorized access to meeting data. https://siliconangle.com/2026/01/19/indirect-prompt-injection-google-gemini-enabled-unauthorized-access-meeting-data/
SiliconAngle. (2026, January 21). Anthropic releases new AI ‘constitution’ for Claude. https://siliconangle.com/2026/01/21/anthropic-releases-new-ai-constitution-claude/
TechRepublic. (2026, January 22). LastPass warns of phishing campaign targeting its customers. https://www.techrepublic.com/article/news-lastpass-phishing-campaign/
The Hacker News. (2026, January 13). New advanced Linux VoidLink malware targets cloud and container environments. https://thehackernews.com/2026/01/new-advanced-linux-voidlink-malware.html
The Hacker News. (2026, January 7). Two Chrome extensions caught stealing ChatGPT and DeepSeek chats from 900,000 users. https://thehackernews.com/2026/01/two-chrome-extensions-caught-stealing.html
The Register. (2026, January 20). An AI wrote VoidLink, the cloud-targeting Linux malware. https://www.theregister.com/2026/01/20/voidlink_ai_developed
The Register. (2026, January 21). Don’t click the LastPass ‘create backup’ link. https://www.theregister.com/2026/01/21/lastpass_backup_phishing_campaign
The Register. (2026, January 22). Anthropic writes 23,000-word ‘constitution’ for Claude. https://www.theregister.com/2026/01/22/anthropic_claude_constitution/
Time. (2026, January 21). Anthropic publishes Claude AI’s new constitution. https://time.com/7354738/claude-constitution-ai-alignment/
WIU Cybersecurity Center. (2026, January 21). Cybersecurity news. https://www.wiu.edu/cybersecuritycenter/cybernews.php



