
Agent skills are rapidly becoming the fastest-growing software supply chain in AI, and arguably one of the least governed.
A "skill" is essentially a lightweight package of instructions that teaches an AI agent how to perform a specific task. By installing a skill, an agent can instantly learn a deployment workflow, a code review process, an incident response procedure, or even just how to format its output. The concept is undeniably powerful because it allows teams to package expertise once and share it seamlessly across agents instead of rebuilding it from scratch.
However, as part of Capsule’s State of AI Agent Security research, we collected and analyzed 206,435 publicly available agent skills from GitHub and major skill registries. Our analysis revealed an ecosystem outpacing its own security controls. While we found traditional malware, the bigger threat lies in widespread access to sensitive credentials, thousands of skills capable of silent data exfiltration, and a near-total lack of guardrails governing what these skills can actually do once installed.
Agent skills are the new software supply chain. But unlike traditional packages, they influence behavior rather than just executing code. That distinction creates a fundamentally new security challenge that most existing controls were never designed to address.
The appeal of agent skills is straightforward: they offer instant, plug-and-play expertise.
Since Anthropic formalized the Agent Skills specification in December 2025, adoption has skyrocketed. Major players like OpenAI, Microsoft, GitHub, Atlassian, Cursor, and Figma quickly embraced compatible implementations. By April 2026, we observed roughly 800,000 skill files on public GitHub repositories and additional skills distributed through dedicated registries.
To understand how frictionless this distribution is, look at a popular community skill called "Caveman". Its premise is incredibly simple:
"Why use many token when few token do trick."
Once installed, the agent immediately begins communicating in a highly compressed style while preserving technical accuracy. It became widely adopted simply because it saved tokens and improved efficiency for common workflows. The installation takes seconds, and the value is immediate.
Unfortunately, the exact characteristics that make skills so useful also make them incredibly difficult to govern.
To understand the security risk, we have to look at how a skill is actually structured. A skill is not just a loose collection of text. Under the standard specification, a skill is typically packaged as a single, highly structured Markdown (.md) file. This file dictates exactly how the agent should operate using a formalized shape:
When an agent loads this .md file, it does not treat it as external reference material. In a traditional Retrieval-Augmented Generation (RAG) setup, an agent searches a database, reads a fact, and uses it to answer a question.
Skills operate completely differently. They function very similarly to memory injection.
The system instructions and tool definitions from the Markdown file are injected directly into the agent's active context window or core system prompt. The agent does not "read" the skill, it absorbs it as a fundamental operating directive. Once loaded, the skill's instructions carry the same weight as the developer's original programming.
This mechanism is exactly why skill-based attacks are so effective. The agent implicitly trusts this injected context, allowing a structured text document to completely overwrite its behavioral boundaries and security constraints.
During our analysis, two active campaigns, ClawHavoc and 26medias, highlighted how this trust model is being actively abused, albeit in completely different ways.
Disclosed by Koi Security in February 2026, ClawHavoc relied heavily on social engineering. During normal agent execution, malicious skills presented what appeared to be a legitimate dependency installation prompt. Once approved, the agent executed a base64-encoded reverse shell, contacting attacker-controlled infrastructure to download the Atomic macOS Stealer. The payload specifically targeted credentials, browser sessions, cryptocurrency wallets, and macOS Keychain data.
The attackers meticulously designed the operation to look legitimate, using coordinated GitHub accounts and automated registry mirroring to embed themselves in the standard distribution process. Alarmingly, months after public disclosure, hundreds of ClawHavoc skills remained publicly accessible and their infrastructure was still active.
The second campaign represents a far more profound shift in the threat landscape: no malware was required.
A publisher known as "26medias" distributed skills containing natural language instructions that directed agents to:
There was no exploit, no obfuscated code, and no suspicious shell commands. Because of how memory injection works, the instruction itself was the payload. Traditional security tools are designed to identify dangerous code; they are largely blind to seemingly legitimate instructions encouraging an agent to perform harmful actions.
While active attacks are concerning, our most significant finding was the widespread presence of risky capability combinations within otherwise legitimate skills. We evaluated every skill for the permissions and actions it enabled. Individually, most of these capabilities appear harmless. Combined, they create a massive attack surface.
Security researcher Simon Willison describes the most dangerous combination of agent capabilities as the "Lethal Trifecta": the ability to access sensitive data, execute actions, and communicate externally, in the same agent. This combination creates the classic, most persistent form of data leakage path in AI agents. Google later operationalized a similar concept through its "Rule of Two", recommending agents be limited to no more than two of these capability classes simultaneously.
When all three exist together, an agent can exfiltrate information with almost zero resistance. Yet, nearly one in ten skills we analyzed provided the complete trifecta.
Most users installing these skills have no idea what permissions they are granting.
The skill ecosystem isn't just suffering from a malware problem; it has a severe governance problem. Out of the 206,435 skills analyzed, only 44 passed all five baseline security checks used in our analysis.
Even more concerning, nearly 80% of skills failed the three most fundamental controls simultaneously: they lacked declared capabilities, checkpoints, and sandboxing.
Because many legitimate skills already request broad permissions, execute shell commands, and communicate externally, dangerous behavior often appears completely normal. Malicious skills easily blend into the background noise.
Most software supply chain security focuses heavily on static code analysis. While that works for traditional packages, agent skills introduce a different paradigm where risk stems from instructions. Static analysis cannot reliably determine how an AI agent will behave when injected memory, tool permissions, and organizational data interact. The critical moment in agent security is execution. That is where instructions become actions, credentials are accessed, and data moves.
To adapt, security teams must prioritize visibility and control over the runtime behavior of AI agents. This is exactly what we built Capsule to solve.
Capsule delivers comprehensive security tailored specifically for the AI agent stack:
The goal is not to bottleneck the adoption of AI agents. Skills are one of the most valuable productivity developments in recent years. Capsule introduces the same level of governance to agent skills that organizations already apply to cloud infrastructure and traditional software packages.
Because agent skills have become a new software supply chain operating through context injection instead of code compilation, they require an entirely new approach to security. Capsule delivers exactly that.
Research Methodology: Capsule Security collected and analyzed 206,435 publicly available AI agent skills from GitHub repositories and major skill registries during April 2026 as part of the State of AI Agent Security research initiative.
.png)
The theoretical phase of agentic AI security is over—the attack surface is real and the incidents are documented. This post breaks down the defensive architecture taking shape in response: Meta's Agents Rule of Two, deterministic enforcement hooks, identity governance for non-human agents, and the questions security leaders need to be asking right now.

The security risks of AI agents are no longer theoretical. This blog examines the active threat landscape facing agentic AI in 2026, from prompt injection and supply chain attacks against MCP and skill registries to the governance gap created by vibe coding and Shadow AI.

Guardian agents are emerging as a critical security layer for the agentic AI era. As enterprises adopt AI agents that execute tools, handle sensitive data, and operate inside real workflows, human approval loops no longer scale. Guardian agents solve this by supervising other agents in real time: monitoring actions, enforcing policy, and blocking risky behavior before execution.
.png)
Capsule found two Cursor IDE vulnerabilities that let hidden prompt-injection instructions in referenced files steal developers’ SSH keys and contaminate future unrelated projects, causing zero-click or one-click exfiltration even when the attacker ships no malicious code.

Capsule Security’s State of AI Agent Security 2026 report is the largest independent audit of AI agents to date, showing that the ecosystem is rapidly shipping publicly exposed, weakly guarded, highly connected agents with recurring misconfigurations, near-absent runtime controls, widespread prompt-injection risk, expanding supply-chain exposure, and active malicious campaigns still propagating through agent skill and tool registries.

Capsule is launching a runtime security platform for the agentic AI era, built to monitor and stop autonomous agents that can bypass traditional guardrails, misuse legitimate access, and create a new class of enterprise security risk.

Capsule research team discover a critical prompt injection vulnerability in Salesforce Agentforce that allows attackers to exfiltrate CRM data through a simple lead from a form submission. No authentication required.