The Agentic AI Threat Landscape Has Crossed a Threshold

Bar Kaduri

June 9, 2026

Twelve months ago, the security risks associated with AI agents were largely theoretical. Researchers published attack scenarios, frameworks published threat taxonomies, and security teams filed them under "watch and wait".

That window is now closing.

Between late 2025 and early 2026, the threat landscape shifted from potential to active and operational. Adversaries have been documented in the wild actively targeting AI agents, the registries that supply their capabilities, and the protocols that wire them together. The attack surface is real, it is expanding faster than most organizations have governance structures to address, and the foundational vulnerability at the center of it, prompt injection, remains unsolved. This is not a warning about what might happen. It is a description of what is already happening.

From Tool to Autonomous Actor

The first thing security leadership needs to internalize is the nature of the shift. Earlier generations of AI tools were sophisticated interfaces where a human asked, the model responded, and a human decided what to do with the answer. Agentic AI inverts that model entirely.

Agents are now given goals rather than prompts, and they autonomously plan, execute, delegate to other agents, invoke tools, write and run code, query databases, send emails, and interact with cloud infrastructure. All without a human in the loop for any action.

Coding agents now ship with shell access, file system manipulation, git operations, and infrastructure management as standard capabilities. Enterprise agents connect to email, internal knowledge bases, CRM systems, and cloud services, then chain those connections together into automated workflows that business teams build themselves, often without any security review or oversight.

A grassroots development pattern that emerged in early 2026 illustrates how far this has gone, with developers running coding agents in bash loops overnight and generating complete repositories with no human involvement, each agent instance picking up where the last one left off through git history alone.

The security risk is inherent and architectural. These agents chain highly privileged capabilities through a probabilistic intermediary, the language model, that cannot reliably distinguish between a legitimate instruction from an operator and malicious instructions embedded in the data it processes. The more capable and autonomous the agent, the larger the blast radius of that single unresolved weakness.

Prompt Injection: The Problem That Underlies Everything Else

Every attack class documented in the current threat landscape shares the same root cause. Language models process all input as a unified sequence of tokens, with no reliable privilege boundary between the system prompt written by the developer, the request from the user, and content retrieved from external sources such as a shared document, a calendar invitation, an API response, or an email.

When an attacker embeds instructions in any data source an agent touches, those instructions are interpreted with the same weight as legitimate commands. This is not a bug to be patched. It is a “by design” feature of the underlying architecture.

Throughout 2025, researchers demonstrated enterprise-scale attacks in which a single poisoned document, one shared file or calendar invite, was sufficient to cause an enterprise AI assistant to exfiltrate sensitive data across organizational boundaries without triggering any security alerts. The agents in question operated exactly as designed, and the failure was entirely in the trust assumptions governing what they were connected to.

Simon Willison's "lethal trifecta" gives security leaders a practical lens for reasoning about this risk. It identifies the three properties whose combination makes prompt injection exploitable, namely:

access to private data
exposure to untrusted content, and
the ability to communicate externally.

When all three are present in a single agent session, and in most enterprise and coding agent deployments they are, a single injection in any data source the agent touches can trigger exfiltration to an attacker-controlled destination. The value of the framework is not theoretical. It is an immediate audit tool for understanding which of your deployed agents are structurally positioned to be exploited.

The Supply Chain Learned to Target Agents

Wherever agents discover and invoke external capabilities, through tool integration protocols, skill registries, or plugin marketplaces, adversaries have followed with poisoned packages, trojanized tools, and social engineering campaigns purpose-built for agent consumption.

The Model Context Protocol ecosystem proved particularly vulnerable.

Within months of MCP's adoption as the de facto standard for tool integration, researchers documented the first malicious MCP server in the wild, a package that spent fifteen versions building legitimacy before adding a single exfiltration line. A critical RCE vulnerability with a CVSS score of 9.6 was disclosed in core MCP infrastructure used by hundreds of thousands of developers. By January 2026, coordinated campaigns had scaled to hundreds of malicious packages uploaded to agent skill registries within days, targeting developers with credential stealers and keyloggers.

What makes these attacks structurally new is where the payload lives, not in the code but in the metadata, malicious instructions hidden in tool description fields that are invisible to human reviewers but processed by AI models as trusted context.

Exploring the Attack Surface Through the ClawHavoc Campaign

The ClawHavoc campaign, which auditors found had infected roughly 12% of all listed skills in a popular open agent marketplace with infostealer malware, is the clearest illustration of where this is heading. The trust that users extend to agent skill registries is being exploited the same way earlier campaigns exploited npm and PyPI, except the attack surface is larger, the governance is younger, and many of the users building on top of these registries have no security background at all.

The hackerbot-claw campaign, active across the last week of February 2026, represents the next escalation in this trajectory. An autonomous bot systematically targeted CI/CD pipelines across repositories belonging to Microsoft, DataDog, the CNCF, and several major open source projects, achieving confirmed remote code execution in at least four of seven targets and exfiltrating a GitHub token with write permissions from one of the most starred repositories on GitHub.

The Aqua Security Trivy incident, the most damaging in the campaign, resulted in a full repository takeover, deletion of years of published releases, and a suspicious artifact pushed to the project's VS Code extension on an open marketplace, a supply chain vector with potential reach into developer workstations across thousands of organizations.

What distinguishes hackerbot-claw from prior campaigns is not just its scope but its method: one of its attack techniques involved replacing a repository's AI project configuration file with social engineering instructions designed to manipulate an AI code reviewer into committing malicious code and posting a fake approval comment. This is agents attacking agents, an autonomous attacker exploiting the trust that AI systems extend to their own context as an attack vector, and it signals a new tier of complexity in the supply chain threat that organizations have barely begun to account for.

The Governance Problem Vibe Coding Made Visible

The cultural shift toward "vibe coding", generating entire applications from natural language descriptions and accepting the output without review, significant enough to be named Collins Dictionary's Word of the Year for 2025, has introduced a governance problem that traditional security controls cannot address. Large-scale analyses of applications built this way reveal that language models produce statistically predictable vulnerabilities, with each model carrying its own recurring security gaps, preferred default configurations, and tendencies toward hardcoded secrets.

Attackers who understand these model-specific patterns can exploit them at scale, without any reconnaissance, across every application a given model produces. This compounds with the Shadow AI problem.

Surveys indicate that roughly half of employees use AI tools not sanctioned by their employer, often connecting them to work systems without IT approval, and fewer than 40% of organizations report having AI governance policies in place at all. The gap between adoption speed and governance maturity is not closing.

The threat landscape documented here is not a set of isolated incidents. It is a structural picture of what happens when autonomous systems with broad access operate faster than the governance frameworks meant to contain them. The second part of this series addresses how the industry is responding, and what security leadership needs to prioritize to get ahead of it.

‍

News

Capsule Launches Security Integration for Claude Platform

Capsule launches a security integration for Claude Platform, using Claude's Compliance API to give security, compliance, and AI governance teams visibility into enterprise AI activity, risk, and posture across Anthropic-hosted deployments.

Lidan Hazout

July 22, 2026

Research

The Agentic Supply Chain: You Installed More Than You Think

Agents inherited every supply chain risk software already had, then added new layers of their own on top. These are the stories, and the numbers, behind why that should worry you.

Bar Kaduri

July 13, 2026

Article

Guardian Agent: Shipping a Useful Agentic Experience

Usefulness and governance aren't a trade-off. Guardian Agent runs locally in the browser, keeps every credential server-side, and turns an afternoon of report-building into a single prompt.

Yarin Sasson

July 7, 2026

Article

Your AI Agent Inventory is Lying to You: The Rise of the "Inline Agent"

Discover the rise of 'Inline Agents' - the shadow IT of the AI era. Learn how Capsule Security uncovers undeclared AI agents hiding in your raw logs.

Guy Bidkar

July 1, 2026

Research

We Analyzed 206,435 AI Agent Skills. Here's What We Found.

Our analysis of 206,435 AI agent skills reveals a rapidly growing software supply chain vulnerable to natural language payloads and dangerous capability combinations. Read the report to understand how these skills bypass traditional security controls and learn how Capsule protects your organization by securing the agent runtime.

Bar Kaduri

June 22, 2026

Article

Mitigating the Agentic AI Threat: What Security Leadership Needs to Prioritize

The theoretical phase of agentic AI security is over—the attack surface is real and the incidents are documented. This post breaks down the defensive architecture taking shape in response: Meta's Agents Rule of Two, deterministic enforcement hooks, identity governance for non-human agents, and the questions security leaders need to be asking right now.

Bar Kaduri

June 16, 2026

Article

OWASP State of Agentic AI Security and Governance 2026: What Changed, and What It Means

A year after the first edition, plausible agentic AI threats now carry CVEs and real incidents. What changed in the OWASP State of Agentic AI Security and Governance 2026.

Bar Kaduri

May 31, 2026

Article

Every agent needs a "stop". We're standardizing it.

The industry standardized how agents talk, but never how to stop one mid-action. Capsule is helping change that through the Agent Control Standard, with hooks.security as the developer-facing companion.

Bar Kaduri

May 27, 2026

Article

The Rise of Guardian Agents: Securing the Agentic AI Ecosystem

Guardian agents are emerging as a critical security layer for the agentic AI era. As enterprises adopt AI agents that execute tools, handle sensitive data, and operate inside real workflows, human approval loops no longer scale. Guardian agents solve this by supervising other agents in real time: monitoring actions, enforcing policy, and blocking risky behavior before execution.

Lidan Hazout

May 7, 2026

Research

CurseChain: How Hidden README Comments Trick Cursor Into Stealing - and Spreading - Your SSH Keys

Capsule found two Cursor IDE vulnerabilities that let hidden prompt-injection instructions in referenced files steal developers’ SSH keys and contaminate future unrelated projects, causing zero-click or one-click exfiltration even when the attacker ships no malicious code.

Bar Kaduri

April 29, 2026

Research

The State of AI Agent Security 2026

Capsule Security’s State of AI Agent Security 2026 report is the largest independent audit of AI agents to date, showing that the ecosystem is rapidly shipping publicly exposed, weakly guarded, highly connected agents with recurring misconfigurations, near-absent runtime controls, widespread prompt-injection risk, expanding supply-chain exposure, and active malicious campaigns still propagating through agent skill and tool registries.

Bar Kaduri

April 27, 2026

News

Capsule Security Raises $7M to Prevent AI Agents from Going Rogue in Runtime: Intent is the New Perimeter

Capsule is launching a runtime security platform for the agentic AI era, built to monitor and stop autonomous agents that can bypass traditional guardrails, misuse legitimate access, and create a new class of enterprise security risk.

Naor Paz

April 13, 2026

Article

Why MCP Gateways are a Bad Idea (and What to Do Instead)

MCP gateways secure only one protocol and create blind spots, while runtime hooks plus approved MCP registries secure the full agent runtime where real risk lives.

Lidan Hazout

April 12, 2026

Article

ClawGuard: Open Source Security for the Agentic Era

ClawGuard was built to stop dangerous agent behavior at the intent level before execution, and NVIDIA’s NemoClaw reinforces that need by securing the runtime environment from the infrastructure side.

Lidan Hazout

April 12, 2026

Research

PipeLeak: The Lead That Stole Your Database - Exploiting Salesforce Agentforce With Indirect Prompt Injection

Capsule research team discover a critical prompt injection vulnerability in Salesforce Agentforce that allows attackers to exfiltrate CRM data through a simple lead from a form submission. No authentication required.

Bar Kaduri

April 9, 2026

Research

ShareLeak: Taking the Wheel of Microsoft’s Copilot Studio (CVE-2026-21520)

The Capsule research team discovered a high severity indirect prompt injection vulnerability in Microsoft Copilot Studio that enables attackers to exfiltrate sensitive data through external SharePoint form.

Bar Kaduri

April 9, 2026

The Agentic AI Threat Landscape Has Crossed a Threshold

From Tool to Autonomous Actor

Prompt Injection: The Problem That Underlies Everything Else

The Supply Chain Learned to Target Agents

Exploring the Attack Surface Through the ClawHavoc Campaign

The Governance Problem Vibe Coding Made Visible

Read more articles

Capsule Launches Security Integration for Claude Platform

The Agentic Supply Chain: You Installed More Than You Think

Guardian Agent: Shipping a Useful Agentic Experience

Your AI Agent Inventory is Lying to You: The Rise of the "Inline Agent"

We Analyzed 206,435 AI Agent Skills. Here's What We Found.

Mitigating the Agentic AI Threat: What Security Leadership Needs to Prioritize

OWASP State of Agentic AI Security and Governance 2026: What Changed, and What It Means

Every agent needs a "stop". We're standardizing it.

The Rise of Guardian Agents: Securing the Agentic AI Ecosystem

CurseChain: How Hidden README Comments Trick Cursor Into Stealing - and Spreading - Your SSH Keys

The State of AI Agent Security 2026

Capsule Security Raises $7M to Prevent AI Agents from Going Rogue in Runtime: Intent is the New Perimeter

Why MCP Gateways are a Bad Idea (and What to Do Instead)

ClawGuard: Open Source Security for the Agentic Era

PipeLeak: The Lead That Stole Your Database - Exploiting Salesforce Agentforce With Indirect Prompt Injection

ShareLeak: Taking the Wheel of Microsoft’s Copilot Studio (CVE-2026-21520)