DeepMind Unveils Genie 3: Real-time Interactive World Models for AGI

DeepMind

Google DeepMind has unveiled Genie 3, a groundbreaking general-purpose world model capable of generating an unprecedented variety of interactive environments. Announced on August 5, 2025, by Jack Parker-Holder and Shlomi Fruchter, Genie 3 lets users navigate dynamic virtual worlds in real time at 24 frames per second, maintaining visual consistency for several minutes at 720p resolution, all from a simple text prompt.

For over a decade, Google DeepMind has been at the forefront of research into simulated environments, ranging from training AI agents in real-time strategy games to developing complex settings for open-ended learning and robotics. This foundational work led to the development of world models—AI systems that leverage their understanding of the world to simulate its various aspects. Such models empower AI agents to predict environmental evolution and the impact of their own actions, serving as a crucial stepping stone towards Artificial General Intelligence (AGI) by enabling the training of agents in an unlimited curriculum of rich simulation environments. Building on the foundation laid by Genie 1 and Genie 2, introduced last year, and advancements in video generation with Veo 2 and Veo 3, Genie 3 marks a significant leap, particularly as DeepMind’s first world model to offer real-time interaction while simultaneously enhancing consistency and realism.

Genie 3 showcases a wide array of capabilities in world generation. It can model the physical properties of the world, simulating natural phenomena like water and lighting, and intricate environmental interactions such as navigating volcanic terrains or experiencing hurricane conditions. The model is also adept at simulating the natural world, creating vibrant ecosystems complete with animal behaviors and detailed plant life, from glacial lakes and dense forests to bioluminescent deep-ocean environments and meticulously designed Japanese zen gardens. Beyond realism, Genie 3 can tap into imagination, generating fantastical scenarios and expressive animated characters, including whimsical creatures on rainbow bridges or origami-style lizards. Furthermore, it allows for the exploration of diverse locations and historical settings, transporting users to the ancient palace of Knossos or the canals of Venice.

Achieving this level of real-time interactivity and environmental consistency required significant technical breakthroughs. Genie 3 must account for a growing trajectory of previously generated frames, referencing information from minutes ago to maintain coherence, even when revisiting locations. This complex computation occurs multiple times per second in response to user inputs. While generating environments auto-regressively typically leads to accumulated inaccuracies, Genie 3 largely maintains consistency for several minutes, with its visual memory extending back up to one minute. Unlike methods relying on explicit 3D representations like NeRFs or Gaussian Splatting, Genie 3’s worlds are dynamically created frame by frame based on world descriptions and user actions, allowing for far greater dynamism and richness.
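The loop described above can be sketched in miniature: each frame is generated conditioned on the prompt, the latest user action, and a bounded window of previously generated frames. This is a hypothetical illustration only; the class and method names (`WorldModel`, `generate_frame`) and the fixed-window memory are assumptions, not the Genie 3 architecture or API.

```python
from collections import deque

FPS = 24                 # frames generated per second, as stated above
MEMORY_SECONDS = 60      # visual memory reaches back up to ~1 minute

class WorldModel:
    """Toy autoregressive world model with a bounded visual memory."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        # Only the most recent ~minute of frames conditions the next frame;
        # older frames fall out of the deque automatically.
        self.memory = deque(maxlen=FPS * MEMORY_SECONDS)

    def generate_frame(self, action: str) -> dict:
        # Each new frame is conditioned on the world description, the user's
        # action, and the trajectory of previously generated frames.
        frame = {
            "action": action,
            "context_frames": len(self.memory),  # how much history was used
        }
        self.memory.append(frame)
        return frame

world = WorldModel("a glacial lake at dawn")
for _ in range(FPS * 2):          # two seconds of real-time interaction
    world.generate_frame("move_forward")

print(len(world.memory))  # 48 frames retained so far
```

The bounded `deque` mirrors the trade-off in the text: consistency holds over minutes, but the visual memory that conditions generation only extends back about one minute.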

In addition to navigational controls, Genie 3 introduces “promptable world events,” an expressive form of text-based interaction. This feature enables users to dynamically alter the generated world, for instance, by changing weather conditions or introducing new objects and characters. This capability also expands the scope for counterfactual or “what if” scenarios, proving invaluable for agents learning to handle unexpected situations through experience.
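A promptable world event can be pictured as a second input channel alongside navigation actions: a text instruction that alters the conditioning state mid-generation without restarting the world. The sketch below is an assumption-laden illustration (the `InteractiveWorld` class and `prompt_event` method are invented names, not the Genie 3 interface).

```python
class InteractiveWorld:
    """Toy world that accepts both navigation actions and text events."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        self.events: list[str] = []   # world events injected so far
        self.frames = 0

    def step(self, action: str) -> dict:
        # Every frame is conditioned on the base prompt plus any events
        # injected so far (e.g. changed weather, a new object or character).
        self.frames += 1
        return {"action": action, "world_state": [self.prompt, *self.events]}

    def prompt_event(self, event: str) -> None:
        # A promptable world event changes the world without restarting it,
        # enabling counterfactual "what if" branches for agent training.
        self.events.append(event)

world = InteractiveWorld("a mountain trail in summer")
world.step("move_forward")
world.prompt_event("a sudden snowstorm rolls in")
frame = world.step("move_forward")
print(frame["world_state"])
```

Because the event enters the conditioning state rather than replacing it, the same trajectory can be branched into counterfactual variants, which is what makes the feature useful for training agents on unexpected situations.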

Genie 3 is already fueling embodied-agent research. DeepMind has used it to generate worlds for a recent version of its SIMA agent, a generalist agent designed for 3D virtual settings. In these simulated environments, SIMA pursues distinct goals by sending navigation actions to Genie 3, which, unaware of the agent's specific objective, simulates the future based on those actions. Because Genie 3 maintains consistency over longer horizons, agents can execute more complex action sequences and achieve more intricate goals, a capability DeepMind views as critical as AI agents take on a greater role in the world on the path towards AGI.
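The division of labor in that setup can be sketched as a simple loop: the agent holds a goal the world model never sees, the world model only receives actions and returns observations. All names here (`World`, `Agent`, the integer state) are illustrative stand-ins, not the SIMA or Genie 3 interfaces.

```python
class World:
    """Stands in for the world model: it sees actions, never the goal."""

    def __init__(self):
        self.position = 0

    def simulate(self, action: int) -> int:
        self.position += action     # advance the simulated state
        return self.position        # return the next observation

class Agent:
    """Stands in for a goal-directed agent like SIMA."""

    def __init__(self, goal: int):
        self.goal = goal            # known only to the agent

    def act(self, observation: int) -> int:
        # Choose a navigation action based on the latest observation.
        return 1 if observation < self.goal else 0

world, agent = World(), Agent(goal=5)
obs = 0
for _ in range(10):                 # longer horizons allow harder goals
    obs = world.simulate(agent.act(obs))

print(obs)  # the agent reaches and then holds its goal: 5
```

The point of the sketch is the information asymmetry: the world model's consistency over many steps is what lets a goal this far from the start state be reached at all.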

Despite its advanced capabilities, Genie 3 has acknowledged limitations. These include a currently constrained action space for agents, ongoing challenges in accurately modeling complex interactions between multiple independent agents, and an inability to simulate real-world locations with perfect geographic accuracy. Furthermore, clear and legible text is often generated only when explicitly provided in the input description, and continuous interaction is currently limited to a few minutes rather than extended hours.

DeepMind emphasizes its commitment to responsible development, particularly given the open-ended and real-time nature of Genie 3. The company has collaborated closely with its Responsible Development & Innovation Team to address potential safety and responsibility risks. Genie 3 is being released as a limited research preview, providing early access to a select group of academics and creators. This approach aims to gather crucial feedback and interdisciplinary perspectives to better understand risks and develop appropriate mitigations. DeepMind intends to continue working with the community to ensure the technology is developed responsibly.

Looking ahead, Genie 3 is seen as a significant milestone for world models, poised to impact AI research and generative media broadly. DeepMind is exploring wider availability for additional testers in the future, envisioning applications in education and training, where it could help students learn and experts gain experience. Beyond training autonomous systems and robots, Genie 3 could also facilitate the evaluation of agent performance and the exploration of their weaknesses, all while prioritizing safe and responsible development for the benefit of humanity.