Poisoned Doc Leaks Secret Data via ChatGPT Connectors
The latest generation of artificial intelligence models, far from being mere standalone chatbots, are increasingly designed to integrate deeply with users’ personal and professional data. OpenAI’s ChatGPT, for instance, can connect directly to a user’s Gmail inbox, review code on GitHub, or manage appointments in a Microsoft calendar. However, these powerful integrations introduce significant security vulnerabilities, as new research reveals that a single “poisoned” document can be enough to compromise sensitive information.
Security researchers Michael Bargury and Tamir Ishay Sharbat unveiled their findings, dubbed “AgentFlayer,” at the Black Hat hacker conference in Las Vegas. Their work exposes a critical weakness in OpenAI’s Connectors, demonstrating how an indirect prompt injection attack can stealthily extract confidential data from a Google Drive account. In a live demonstration, Bargury successfully siphoned developer secrets, specifically API keys, from a test Drive account.
This vulnerability underscores a growing concern: as AI models become more intertwined with external systems and handle larger volumes of diverse data, the potential attack surface for malicious actors expands dramatically. “There is nothing the user needs to do to be compromised, and there is nothing the user needs to do for the data to go out,” explained Bargury, who serves as CTO at security firm Zenity. He stressed the “zero-click” nature of the attack, requiring only the victim’s email address to share the compromised document. “This is very, very bad,” he added.
OpenAI introduced Connectors for ChatGPT as a beta feature earlier this year, touting its ability to “bring your tools and data into ChatGPT” for tasks like searching files, pulling live data, and referencing content directly within chat. Its website currently lists connections to at least 17 different services. Bargury confirmed that he reported his findings to OpenAI earlier this year, and the company has since implemented mitigations to prevent the specific data extraction technique he demonstrated. It’s important to note that while the attack could extract sensitive fragments like API keys, it was not capable of exfiltrating entire documents.
Andy Wen, senior director of security product management at Google Workspace, acknowledged the broader implications. “While this issue isn’t specific to Google, it illustrates why developing robust protections against prompt injection attacks is important,” he stated, highlighting Google’s recently enhanced AI security measures.
The AgentFlayer attack begins with a seemingly innocuous “poisoned” document, which is then shared with a potential victim’s Google Drive. (Alternatively, a victim could unknowingly upload such a compromised file themselves.) Within this document—for the demonstration, a fictitious meeting summary with OpenAI CEO Sam Altman—Bargury embedded a 300-word malicious prompt. This prompt, rendered in white, size-one font, is virtually invisible to human eyes but perfectly readable by a machine.
In a proof-of-concept video, Bargury shows the victim asking ChatGPT to “summarize my last meeting with Sam,” though any user query related to a meeting summary would suffice. Instead of summarizing, the hidden prompt overrides the request, instructing the Large Language Model (LLM) that there was a “mistake” and no summary is needed. It then falsely claims the user is a “developer racing against a deadline” and directs the AI to search Google Drive for API keys, attaching them to the end of a provided URL.
This URL is not just an ordinary web address; it’s a command in Markdown language designed to connect to an external server and retrieve an image. Crucially, as per the hidden prompt’s instructions, the URL now also carries the API keys the AI has discovered within the Google Drive account.
The use of Markdown for data extraction from ChatGPT is not entirely new. Independent security researcher Johann Rehberger previously demonstrated a similar method, which led OpenAI to introduce a “url_safe” feature designed to detect malicious URLs and prevent image rendering if they posed a risk. To circumvent this, Sharbat, an AI researcher at Zenity, explained in a blog post that they utilized URLs from Microsoft’s Azure Blob cloud storage. This allowed their “image” to render successfully, logging the victim’s API keys on their Azure server.
This attack serves as the latest stark reminder of how indirect prompt injections can compromise generative AI systems. Such injections involve attackers feeding an LLM poisoned data that manipulates the system into performing malicious actions. Earlier this week, a separate group of researchers demonstrated how indirect prompt injections could even hijack a smart home system, remotely activating lights and boilers.
While indirect prompt injections have been a known concern almost since ChatGPT’s inception, security researchers are increasingly worried about the elevated risks as more and more systems become interconnected with LLMs, potentially exposing “untrusted” data. Gaining access to sensitive information via these methods could also provide malicious hackers with pathways into an organization’s broader digital infrastructure. Bargury acknowledges that integrating LLMs with external data sources significantly enhances their capabilities and utility. “It’s incredibly powerful,” he says, “but as usual with AI, more power comes with more risk.”