Zero-click exploits threaten major enterprise AI platforms

Decoder

At the recent Black Hat USA conference, security firm Zenity unveiled a series of alarming vulnerabilities, collectively dubbed “AgentFlayer,” that pose significant threats to some of the most widely used enterprise AI platforms. These exploits target prominent systems like ChatGPT, Copilot Studio, Cursor, Salesforce Einstein, Google Gemini, and Microsoft Copilot, leveraging a sophisticated method of attack that requires little to no user interaction.

What distinguishes these “zero-click” and “one-click” exploits is their reliance on indirect prompts—hidden instructions embedded within seemingly innocuous digital resources. This technique, known as prompt injection, has been a persistent challenge for Large Language Model (LLM) systems for years, and despite numerous attempts, a definitive solution remains elusive. As agent-based AI systems, which operate with increasing autonomy, become more prevalent, these vulnerabilities are escalating. Even OpenAI CEO Sam Altman has cautioned users against entrusting new ChatGPT agents with sensitive information.

Zenity co-founder Michael Bargury demonstrated the insidious nature of these attacks with a compelling example targeting Salesforce Einstein, an AI tool designed to automate tasks such as updating contact details or integrating with communication platforms like Slack. Attackers can plant specially crafted Customer Relationship Management (CRM) records that appear harmless. When a sales representative makes a routine LLM query, such as “What are my latest cases?”, the AI agent scans the CRM content. Unbeknownst to the user, the agent interprets the hidden instructions as legitimate commands and acts autonomously. In the live demonstration, Einstein automatically replaced all customer email addresses with an attacker-controlled domain, silently rerouting future communications. While the original addresses remained in the system as encoded aliases, the attacker could effectively track where messages were intended to go. Salesforce confirmed that this specific vulnerability was patched on July 11, 2025, rendering this particular exploit impossible.

Another zero-click exploit, dubbed “Ticket2Secret,” targeted the developer tool Cursor when integrated with Jira. Zenity showed how a seemingly harmless Jira ticket could execute malicious code within the Cursor client without any user action. This allowed attackers to extract sensitive data, including API keys and credentials, directly from the victim’s local files or repositories. Further demonstrations included a proof-of-concept attack on ChatGPT, where an invisible prompt—white text with a font size of one—was hidden in a Google Doc. This exploit leveraged OpenAI’s “Connectors” feature, which links ChatGPT to services like Gmail or Microsoft 365. If such a manipulated document landed in a victim’s Google Drive, a simple request like “Summarize my last meeting with Sam” could trigger the hidden prompt. Instead of generating a summary, the model would search for API keys and transmit them to an external server.

In an accompanying blog post, Zenity critically evaluated the industry’s current approach to AI security, particularly its heavy reliance on “soft boundaries.” These include tweaks to training data, statistical filters, and system instructions intended to block unwanted behavior. Bargury dismisses these as “an imaginary boundary” offering no true security. In contrast, “hard boundaries” are technical restrictions that inherently prevent certain actions, such as blocking specific image URLs in Microsoft Copilot or validating URL structures in ChatGPT. While these can reliably thwart some attacks, they often limit functionality, and Zenity notes that vendors frequently relax such restrictions under competitive pressure.

These demonstrations by Zenity are part of a broader trend revealing systemic security flaws in agent-based AI. Researchers have shown how Google’s Gemini assistant can be hijacked through hidden prompts in calendar invites, potentially enabling attackers to control Internet of Things (IoT) devices. Other incidents include a chatbot being manipulated into transferring $47,000 with a single prompt during a hacking competition, and Anthropic’s new LLM security system being bypassed in a jailbreak contest. A large-scale red-teaming study recently uncovered systematic security breaches across 22 AI models in 44 scenarios, pointing to universal attack patterns. Additionally, research has found that AI agents can be coerced into risky actions within browser environments, leading to data theft, malware downloads, and phishing attempts. The collective evidence underscores a critical and evolving security challenge for the rapidly advancing world of AI.