OpenAI's GPT-5 Guide: Agentic Workflows & Coding Power


OpenAI has unveiled an extensive prompting guide for its latest large language model, GPT-5, offering detailed insights into leveraging its capabilities for agentic workflows and advanced coding applications. This guide, which incorporates lessons learned from the integration of the Cursor code editor, highlights GPT-5’s foundational training for sophisticated tool use, precise instruction following, and comprehension of extremely long contexts, making it an ideal candidate for building autonomous AI agents.

For agentic applications, where AI models take initiative and perform multi-step tasks, OpenAI recommends the new Responses API. This API is designed to preserve the model's internal reasoning between successive tool calls, significantly enhancing both efficiency and output quality. OpenAI's data indicates a notable improvement: simply switching from traditional Chat Completions to the Responses API and passing previous reasoning via the "previous_response_id" parameter boosted the Tau-bench retail score from 73.9% to 78.2%. Maintaining this reasoning context not only conserves processing tokens but also ensures plans are consistently followed across multiple tool interactions, leading to better performance and reduced latency.
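A minimal sketch of this pattern with OpenAI's Python SDK might look as follows; the lookup_order tool, its schema, and the hard-coded tool result are invented for illustration:

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical function tool; the name and schema are invented for illustration.
tools = [{
    "type": "function",
    "name": "lookup_order",
    "description": "Look up the status of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

# First turn: the model reasons and emits a tool call.
first = client.responses.create(
    model="gpt-5",
    input="What is the status of order 4821?",
    tools=tools,
)

# Execute the tool locally, then return the result. Passing
# previous_response_id lets the model reuse its reasoning state from the
# first turn instead of reconstructing it from scratch.
call = next(item for item in first.output if item.type == "function_call")
follow_up = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,
    input=[{
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": '{"status": "shipped"}',
    }],
    tools=tools,
)
print(follow_up.output_text)
```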

The degree of GPT-5's "agentic eagerness" (its propensity to take initiative) can be finely tuned through prompt engineering and the new "reasoning_effort" parameter. Lowering reasoning effort reduces the model's autonomy; setting clear criteria for context searches and capping the number of tool calls (at two, for instance) provides further control, and the prompt can explicitly allow the model to proceed even when some uncertainty remains. Conversely, to encourage more initiative, the guide suggests raising the reasoning effort and adding explicit persistence instructions that minimize unnecessary clarifying questions, as sketched in the example below.

The guide also advises establishing clear stop conditions, distinguishing between safe and risky actions, and defining thresholds for when a task should be handed back to a human user. A lower threshold for user intervention is recommended in sensitive scenarios, such as purchase or payment flows, than for a simple search, and deleting files in a programming task should require far more caution than a basic text search.

For longer, multi-stage tasks, GPT-5 is trained to outline its plan at the outset and then provide concise progress updates. The frequency, style, and content of these updates are fully customizable via the prompt, ranging from simple goal paraphrasing to structured plans, sequential status messages, and comprehensive final reports. OpenAI further recommends breaking highly complex tasks into smaller, manageable subtasks spread across multiple agent rounds.
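Both ends of this tuning range can be sketched with the Responses API's reasoning parameter; the instruction strings below paraphrase the patterns described above rather than quoting OpenAI's guide:

```python
from openai import OpenAI

client = OpenAI()

# Less eager: low reasoning effort plus a tool-call budget and explicit
# permission to proceed under uncertainty.
cautious = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions=(
        "Gather only the context you need. Use at most 2 tool calls. "
        "If uncertainty remains after that, proceed with your best answer "
        "and state your assumptions."
    ),
    input="Summarize the open issues in this repository.",
)

# More eager: high reasoning effort plus a persistence instruction that
# discourages handing control back for clarification.
persistent = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    instructions=(
        "You are an agent: keep going until the task is fully resolved "
        "before yielding back to the user. Do not ask for confirmation; "
        "make the most reasonable assumption, act on it, and document it "
        "for the user afterwards."
    ),
    input="Migrate the project's test suite from unittest to pytest.",
)
```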

OpenAI positions GPT-5 as a robust assistant for software development, capable of handling large codebases, debugging, processing major code changes, performing multi-file refactoring, implementing significant new features, and even generating entire applications from scratch. For new web application development, OpenAI suggests a specific technology stack including Next.js (TypeScript), React, HTML, Tailwind CSS, shadcn/ui, Radix Themes, popular icon sets, the Motion animation library, and various modern fonts. For new “greenfield” projects, the guide proposes a prompt pattern where the model first establishes an internal set of quality criteria (typically five to seven categories) and then iteratively refines its output until all criteria are fully met. When making incremental changes or refactoring existing code, GPT-5’s modifications are designed to integrate seamlessly. The guide emphasizes the importance of explicitly mirroring the existing technical setup of the codebase, including its guiding principles, directory structure, and UI/UX rules. OpenAI provides example principles like clarity, reuse, consistency, simplicity, and visual quality, along with stack standards and UI/UX guidelines covering typography, colors, spacing, state indicators, and accessibility.
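The rubric-based prompt pattern for greenfield work might be sketched like this; the prompt text is a paraphrase of the pattern described above, not OpenAI's exact wording:

```python
from openai import OpenAI

client = OpenAI()

# Paraphrase of the self-reflection pattern for greenfield projects; the
# exact rubric wording in OpenAI's guide differs.
SELF_REFLECTION = (
    "Before answering, construct an internal rubric for a world-class "
    "result with 5-7 categories (for example: clarity, reuse, consistency, "
    "simplicity, visual quality). Do not show the rubric to the user. "
    "Draft the solution, judge it against every category, and iterate "
    "until it meets the highest standard in all of them."
)

response = client.responses.create(
    model="gpt-5",
    instructions=SELF_REFLECTION,
    input="Build a minimal to-do web app with Next.js, TypeScript, and Tailwind CSS.",
)
print(response.output_text)
```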

Early testing with the Cursor code editor provided valuable real-world insights into GPT-5's behavior. Cursor aimed to strike a balance between the model's autonomy and the conciseness of its status messages during longer tasks. Initially, GPT-5 generated overly detailed status updates while producing overly terse code within tool calls, sometimes using single-letter variable names. Cursor addressed this by setting the global "verbosity" API parameter to low, while simultaneously prompting the model to be more detailed specifically within code tools, instructing it to "Write code for clarity first… Use high verbosity for writing code and code tools." This approach resulted in compact status and summary messages while ensuring highly readable code changes.

The Cursor team also observed that GPT-5 sometimes asked unnecessary follow-up questions. Providing more precise context about undo/reject functions and user preferences helped reduce these interruptions, leading the model to apply changes proactively and submit them for review rather than seeking prior approval. Another key insight was that prompts effective with earlier models sometimes triggered an excessive number of tool calls in GPT-5. By reducing these "extra-thoroughness" instructions, GPT-5 became more adept at discerning when to leverage its internal knowledge versus when to utilize external tools. The use of structured, XML-like specifications further improved instruction following, and user-configurable Cursor rules provided additional layers of control.
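Cursor's fix maps onto the API roughly as follows; the file name and task are invented, and the instruction condenses the prompt quoted above:

```python
from openai import OpenAI

client = OpenAI()

# Global verbosity stays low, keeping status and summary messages compact,
# while the prompt raises verbosity specifically for code output.
response = client.responses.create(
    model="gpt-5",
    text={"verbosity": "low"},
    instructions=(
        "Write code for clarity first. Use high verbosity for writing code "
        "and code tools."
    ),
    input="Refactor the session handling in auth.py for readability.",
)
print(response.output_text)
```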

Beyond "reasoning_effort," GPT-5 introduces a new "verbosity" API parameter, which controls the length of the final answer independently of reasoning depth. A global verbosity value can be set and then overridden in the prompt as needed, allowing concise status messages alongside detailed code output, as demonstrated in the Cursor integration. GPT-5 also supports a "minimal reasoning" mode, designed for maximum speed while retaining the benefits of its underlying reasoning paradigm. For this mode, OpenAI recommends prompts that begin with a brief rationale, include clear status updates before tool calls, give explicit and persistent tool instructions, and encourage the agent to complete tasks fully before handing them back. For users migrating from GPT-4.1, OpenAI points to patterns outlined in its previous prompting guide. It cautions, however, that GPT-5 is extremely literal in its instruction following, and vague or contradictory prompts can disrupt its reasoning processes. To help users avoid these pitfalls, OpenAI provides access to its Prompt Optimizer, a tool designed to flag inconsistencies and unclear instructions.
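A minimal-reasoning call following these recommendations might be sketched like this; the instruction wording and the task are illustrative, not taken from OpenAI's guide:

```python
from openai import OpenAI

client = OpenAI()

# Minimal reasoning trades depth for speed; the prompt compensates along
# the lines OpenAI recommends: a brief upfront rationale, status updates
# before tool calls, explicit tool instructions, and full task completion.
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    instructions=(
        "Start with a one-sentence summary of your approach. Post a short "
        "status update before each tool call, follow tool instructions "
        "exactly, and complete the task fully before replying to the user."
    ),
    input="Check whether the nightly build passed and report the result.",
)
print(response.output_text)
```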