Tencent's AI transforms images into interactive gaming videos


Tencent has unveiled Hunyuan-GameCraft, an AI system that turns static images into interactive gaming videos. Unlike conventional video generators, which produce fixed clips, it gives users real-time camera control: players can move freely through generated scenes with standard keyboard inputs such as WASD or the arrow keys. The system builds on HunyuanVideo, Tencent's open-source text-to-video model, and is tuned specifically for smooth, consistent camera motion.

The framework supports three axes of translation (forward/backward, left/right, up/down) and two axes of rotation for looking around. Camera roll is deliberately left out, a choice Tencent justifies by noting that rolling the camera is rare in most games. Interactivity hinges on an "action encoder" that translates keyboard input into numerical values the video generator can interpret. The encoder also factors in how long a key is held, so movement speed adapts to the press duration.
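
To make the idea concrete, here is a minimal sketch of what such an action encoder could look like. The key mapping, the five-value vector layout (three translation axes, yaw, pitch, no roll), and all names are assumptions for illustration, not taken from the released code.

```python
import numpy as np

# Hypothetical mapping from keys to a 5-DOF motion vector:
# [right, up, forward, yaw, pitch]. A real mapping would also
# add keys for vertical movement; omitted here for brevity.
KEY_TO_MOTION = {
    "w": np.array([0, 0, 1, 0, 0], dtype=np.float32),     # move forward
    "s": np.array([0, 0, -1, 0, 0], dtype=np.float32),    # move backward
    "a": np.array([-1, 0, 0, 0, 0], dtype=np.float32),    # strafe left
    "d": np.array([1, 0, 0, 0, 0], dtype=np.float32),     # strafe right
    "left": np.array([0, 0, 0, -1, 0], dtype=np.float32),  # look left (yaw)
    "right": np.array([0, 0, 0, 1, 0], dtype=np.float32),  # look right (yaw)
    "up": np.array([0, 0, 0, 0, 1], dtype=np.float32),     # look up (pitch)
    "down": np.array([0, 0, 0, 0, -1], dtype=np.float32),  # look down (pitch)
}

def encode_action(pressed: dict[str, float], max_hold: float = 1.0) -> np.ndarray:
    """Turn {key: seconds_held} into one continuous motion vector.

    Longer presses scale the magnitude, mirroring the adaptive
    movement speed the article attributes to the action encoder.
    """
    action = np.zeros(5, dtype=np.float32)
    for key, held in pressed.items():
        if key in KEY_TO_MOTION:
            action += KEY_TO_MOTION[key] * (min(held, max_hold) / max_hold)
    return action

# Example: "W" held for 0.8 s while tapping the right arrow for 0.3 s.
print(encode_action({"w": 0.8, "right": 0.3}))  # -> [0.  0.  0.8 0.3 0. ]
```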

To keep video quality high over long sequences, GameCraft uses a training technique called Hybrid History-Conditioned Training. Rather than generating an entire video at once, the model builds each new segment on top of previously generated ones, with videos divided into chunks of roughly 1.3 seconds. A binary mask tells the model which parts of the sequence already exist and which still need to be generated, balancing consistency against flexibility. According to Tencent, this hybrid approach avoids both the visible quality drops of training-free methods and the sluggish responsiveness of pure history conditioning: videos stay fluid and consistent while still reacting immediately to user input, even in long sessions.
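
A minimal sketch of how chunk-wise generation with such a binary mask could work, assuming a denoising model `model(frames, mask)` that fills in the masked region. The interface, names, and shapes are illustrative, not the released implementation.

```python
import torch

def extend_video(model, history: torch.Tensor, chunk_len: int = 33) -> torch.Tensor:
    """Generate the next chunk conditioned on frames produced so far.

    history: (T, C, H, W) previously generated frames.
    Returns the sequence extended by `chunk_len` new frames.
    """
    T, C, H, W = history.shape
    # New frames start as pure noise, appended after the known history.
    frames = torch.cat([history, torch.randn(chunk_len, C, H, W)], dim=0)
    # Binary mask: 1 = frame already exists, 0 = frame still to be generated.
    mask = torch.cat([torch.ones(T), torch.zeros(chunk_len)]).view(-1, 1, 1, 1)
    denoised = model(frames, mask)  # model fills in the masked-out region
    # Keep the history verbatim; take only the new frames from the model.
    return mask * frames + (1 - mask) * denoised
```

Generating chunk by chunk keeps latency bounded, while the mask prevents the model from rewriting frames the user has already seen.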

Hunyuan-GameCraft's capabilities rest on its training data: more than one million gameplay recordings from over 100 AAA titles, including Assassin's Creed, Red Dead Redemption, and Cyberpunk 2077. Scenes and actions in this collection were segmented, filtered for quality, annotated, and paired with detailed descriptions. The developers supplemented the dataset with a further 3,000 motion sequences derived from digital 3D objects. Training ran in two phases on 192 Nvidia H20 GPUs over 50,000 iterations.
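
As a rough illustration of that curation step, the sketch below filters candidate clips by a quality score and keeps only those with usable action labels. The `Clip` fields, thresholds, and helper names are hypothetical; the article does not describe Tencent's actual pipeline at this level of detail.

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    game: str
    start_s: float
    end_s: float
    quality: float                  # e.g. from a sharpness/aesthetics model
    actions: list[str] = field(default_factory=list)  # per-frame camera actions
    caption: str = ""               # detailed scene description

def curate(clips: list[Clip], min_quality: float = 0.5,
           min_len_s: float = 1.3) -> list[Clip]:
    """Drop clips that are too short, too low-quality, or unlabeled."""
    return [
        c for c in clips
        if c.quality >= min_quality
        and (c.end_s - c.start_s) >= min_len_s
        and c.actions
    ]
```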

In head-to-head evaluations, Hunyuan-GameCraft came out ahead: it cut interaction errors by 55 percent compared with Matrix-Game and delivered better image quality and more precise control than specialized camera-control models such as CameraCtrl, MotionCtrl, and WanX-Cam.

To make the system practical for real-time interaction, Tencent added a Phased Consistency Model (PCM), which speeds up generation by skipping most of the intermediate steps of the usual diffusion process and jumping almost directly to plausible final frames. Tencent reports a 10- to 20-fold increase in inference speed, letting GameCraft render at 6.6 frames per second with input response times under five seconds. Internally, the system works at 25 frames per second and processes video in 33-frame segments at 720p resolution, which works out to the roughly 1.3-second chunks mentioned above (33 / 25 = 1.32 s), a balance between speed and the visual fidelity needed for interactive control.
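
Below is a hedged sketch of the few-step sampling idea behind consistency-style models: rather than walking through dozens of denoising steps, the model jumps to a clean estimate in a handful of calls. `consistency_fn`, the noise schedule, and the shapes are assumptions for illustration; systems like this also typically operate on compressed latents rather than raw 720p frames.

```python
import torch

@torch.no_grad()
def sample_segment(consistency_fn, shape=(33, 3, 720, 1280), steps=4):
    """Generate one 33-frame, 720p segment in `steps` model calls.

    consistency_fn(x, t) -> estimate of the fully denoised segment at
    noise level t in [0, 1]. Replacing a long diffusion chain with a
    few direct jumps is where the reported 10-20x speedup comes from.
    """
    t_schedule = torch.linspace(1.0, 0.0, steps + 1)
    x = torch.randn(shape)  # start from pure noise
    for i in range(steps):
        x0 = consistency_fn(x, t_schedule[i])  # jump to a clean estimate
        next_t = t_schedule[i + 1]
        if next_t > 0:
            # Re-noise to the next (lower) level and refine once more.
            x = x0 + next_t * torch.randn_like(x0)
        else:
            x = x0
    return x
```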

The full code and model weights for Hunyuan-GameCraft are openly available on GitHub, and a web demo is in development. The release puts Tencent in a fast-moving field of interactive AI world models that includes Google DeepMind's Genie 3 and Skywork's open-source Matrix-Game 2.0. It is also a clear step beyond Tencent's earlier Hunyuan World Model 1.0, which could generate 3D scenes but only as static panoramas.