OpenAI's ChatGPT Agent: PC Control & Advanced Task Automation

Livescience

OpenAI has introduced ChatGPT agent, a significant evolution of its flagship artificial intelligence model, now equipped with a virtual computer and an integrated toolkit. This upgrade empowers the AI to execute intricate, multi-step tasks previously beyond its scope, including directly controlling a user’s computer and completing assignments on their behalf. This more capable version, though still requiring considerable human oversight, emerged shortly before Meta researchers reported their own AI models exhibiting signs of independent self-improvement, and also preceded the release of OpenAI’s latest chatbot iteration, GPT-5.

With ChatGPT agent, users can now instruct the large language model (LLM) not only to analyze information or gather data, but also to act upon that data. For instance, one could command the agent to scan a calendar and summarize upcoming events and reminders, or to process a large dataset and condense it into a concise synopsis or a presentation slide deck. While a traditional LLM might provide recipes for a Japanese-style breakfast, the ChatGPT agent could take it a step further, planning and purchasing all necessary ingredients for a specific number of guests.

Despite its enhanced capabilities, the new model faces inherent limitations. Like all AI models, its spatial reasoning remains weak, making tasks such as planning physical routes challenging. It also lacks true persistent memory, processing information in the moment without reliable recall or the ability to reference past interactions beyond immediate context.

Nonetheless, ChatGPT agent demonstrates notable improvements in OpenAI’s own benchmarking. On “Humanity’s Last Exam,” an AI benchmark designed to assess a model’s proficiency in answering expert-level questions across various disciplines, the agent more than doubled the accuracy percentage, achieving 41.6% compared to OpenAI o3 without tools, which scored 20.3%. It also significantly outperformed other OpenAI tools, as well as a version of itself lacking integrated features like a browser and virtual computer. In the challenging math benchmark “FrontierMath,” ChatGPT agent, with its comprehensive suite of tools, again substantially surpassed previous models.

The agent’s architecture is built upon three foundational elements derived from earlier OpenAI products. The first is ‘Operator,’ an agent designed to navigate the web via its own virtual browser. The second, ‘deep research,’ focuses on sifting through and synthesizing vast quantities of data. The final component integrates previous versions of ChatGPT, leveraging their strengths in conversational fluency and presentation.

Kofi Nyarko, a professor at Morgan State University and director of the Data Engineering and Predictive Analytics (DEPA) Research Lab, summarized the agent’s core functionality: “In essence, it can autonomously browse the web, generate code, create files, and so on, all under human supervision.” However, Nyarko was quick to underscore that the new agent is not truly autonomous. He cautioned that “hallucinations, user interface fragility, or misinterpretation can lead to errors. Built-in safeguards, like permission prompts and interruptibility, are essential but not sufficient to eliminate risk entirely.”

OpenAI itself has openly acknowledged the potential dangers posed by this more autonomous agent, citing its “high biological and chemical capabilities.” The company has stated concerns that the agent could potentially assist in the creation of chemical or biological weapons. Compared to existing resources like a chemistry lab and textbook, an AI agent represents what biosecurity experts term a “capability escalation pathway.” AI can rapidly access and synthesize countless resources, merge knowledge across diverse scientific fields, offer iterative troubleshooting akin to an expert mentor, navigate supplier websites, complete order forms, and even help circumvent basic verification checks.

Furthermore, with its virtual computer, the agent can autonomously interact with files, websites, and online tools, amplifying its potential for harm if misused. The risk of data breaches or manipulation, alongside misaligned behaviors such as financial fraud, is heightened in the event of a prompt injection attack, where malicious instructions are subtly embedded to hijack the AI’s behavior. Nyarko further pointed out that these risks are in addition to those inherent in traditional AI models and LLMs. He elaborated on broader concerns for AI agents, including how autonomous operations could amplify errors, introduce biases from public data, complicate liability frameworks, and unintentionally foster psychological dependence.

In response to these new threats, OpenAI engineers have reportedly strengthened a range of safeguards. These measures include comprehensive threat modeling, dual-use refusal training—which teaches the model to reject harmful requests involving data with both beneficial and malicious applications—bug bounty programs, and expert “red-teaming,” a process of actively attacking the system to identify weaknesses, with a specific focus on biodefense. Despite these efforts, a risk management assessment conducted in July 2025 by SaferAI, a safety-focused non-profit, rated OpenAI’s risk management policies as “Weak,” awarding them only 33% out of a possible 100%. OpenAI also received a C grade on the AI Safety Index compiled by the Future of Life Institute, a prominent AI safety organization.