ByteDance's Seed-Prover Achieves SOTA in Automated Math Proving
Large Language Models (LLMs) have made significant advances in mathematical reasoning, often leveraging their natural language capabilities to improve performance on benchmarks like MATH and AIME. However, a key challenge in training these models with reinforcement learning (RL) is verifying the correctness of natural language proofs, which typically requires meticulous manual checking of each step. This bottleneck hinders the broader application of RL to mathematical theorem proving. Formal languages such as Lean offer automatic correctness verification, but current LLM-based formal provers have limitations of their own: step-level provers, for instance, generate code incrementally but often require specific structural support and struggle with high-level reasoning.
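To make the contrast concrete, here is a toy Lean 4 proof (our illustration, not from the paper): once a statement and proof are written, Lean's kernel checks every step mechanically, with no human review required.

```lean
-- Toy example: Lean's kernel verifies this proof automatically.
-- `Nat.add_comm` is a lemma from Lean's standard library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```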
In response, ByteDance's Seed Team has introduced Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover iteratively refines its proofs by incorporating feedback from Lean, reusing previously established lemmas, and employing self-summarization. At test time it combines three specialized inference strategies, enabling both deep and broad reasoning on complex problems, including those from the International Mathematical Olympiad (IMO). Its core innovation is a lemma-style proving approach, which centers the reasoning process on lemmas rather than on traditional step-by-step or monolithic whole-proof generation. The team also introduces Seed-Geometry, a specialized geometry reasoning engine designed to overcome Lean's limited support for geometric problems.
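A hypothetical Lean 4 sketch of what "lemma-style" means in practice (the names and statements below are ours, not Seed-Prover's output): the main theorem is decomposed into independently provable lemmas that the final step composes.

```lean
-- Two small helper lemmas, each provable and checkable on its own.
theorem step_one (n : Nat) : n + 0 = n := Nat.add_zero n
theorem step_two (n : Nat) : 0 + n = n := Nat.zero_add n

-- The main goal is closed by composing the established lemmas,
-- rather than by one monolithic step-by-step derivation.
theorem main_goal (n : Nat) : n + 0 = 0 + n := by
  rw [step_one, step_two]
```

If one lemma fails to check, only that lemma needs to be regenerated from Lean's feedback, which is what makes iterative refinement tractable.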
Seed-Prover is trained with a multi-stage, multi-task reinforcement learning framework based on VAPO, using Lean's verification results as feedback. The training dataset combines publicly available datasets with ByteDance's in-house formal problems. A "proposer" component generates simpler variants of difficult tasks, while overly easy problems (those with proof rates above 25%) are excluded. Seed-Geometry's backend supports large-scale problem generation, having identified over 230 million unique problems within seven days, an eightfold improvement in search efficiency. Although separate policy and value models were trained, extensive testing showed that value models could sometimes degrade performance due to estimation errors, so the team adopted step-by-step generation with beam search in distributed setups.
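The refinement loop and the difficulty filter described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: `generate_proof` and `lean_check` are hypothetical stand-ins for the model and the Lean verifier, and only the 25% threshold comes from the article.

```python
# Minimal sketch of Lean-guided iterative refinement
# (hypothetical interfaces, not Seed-Prover's actual API).

def refine(problem, generate_proof, lean_check, max_rounds=8):
    """Generate a proof, check it with Lean, and feed the error
    back into the next attempt until it verifies or the budget runs out."""
    feedback = None
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)  # model call (stub)
        ok, feedback = lean_check(proof)           # Lean verdict (stub)
        if ok:
            return proof
    return None  # unproved within the budget

def keep_for_training(proof_rate, threshold=0.25):
    """Exclude overly easy problems: per the article, problems with
    proof rates above 25% are dropped from RL training."""
    return proof_rate <= threshold
```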
Seed-Prover has achieved state-of-the-art results across several mathematical benchmarks. On IMO 2025 it solved 5 out of 6 problems: Seed-Geometry instantly solved Problem 2, and Seed-Prover proved four of the remaining problems under various inference settings. On past IMO problems it proved 121 out of 155, a 78.1% success rate across all difficulty levels: 47 of 55 easy problems, 47 of 56 medium problems, and 27 of 44 hard problems. Subject-specific success rates were also strong, with 72 of 85 in algebra, 42 of 55 in number theory, and 7 of 14 in combinatorics.
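The per-difficulty counts quoted above are consistent with the overall 121/155 figure, as a quick arithmetic check shows:

```python
# Per-difficulty results on past IMO problems as (solved, total).
easy, medium, hard = (47, 55), (47, 56), (27, 44)

solved = easy[0] + medium[0] + hard[0]  # 121
total = easy[1] + medium[1] + hard[1]   # 155
rate = solved / total                   # ≈ 0.781, i.e. 78.1%
print(f"{solved}/{total} = {rate:.1%}")
```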
On the MiniF2F benchmark, the researchers report a 99.6% proof rate on both the validation and test sets under medium settings, solving challenging problems such as IMO 1990 P3. On PutnamBench, the number of solved problems jumped from 201 to 331 out of 657 when upgrading from light to medium inference settings, a notable gain over previous systems on undergraduate-level mathematics. Seed-Prover solved 30 of 100 combinatorics problems on CombiBench, outperforming existing methods, though the results also highlight ongoing challenges in combinatorial reasoning. Furthermore, it achieved an 81.8% success rate on MiniCTX-v2, well above the o4-mini baseline's 44.3% at Pass@8, demonstrating strong generalization beyond competition-style problems.
In summary, ByteDance's Seed Team has introduced Seed-Prover and Seed-Geometry, two formal reasoning systems that effectively harness the capabilities of LLMs. Seed-Geometry provides accelerated verification and enhanced search for geometric problems, while Seed-Prover relies on iterative refinement and sophisticated test-time inference strategies for general mathematical proofs. Solving 5 out of 6 problems at IMO 2025 underscores the practical efficacy of these methods on elite mathematical competitions. Formal languages like Lean offer rapid proof verification that is both cheaper than human experts and more reliable than LLM-based judges. Future research will focus on further combining formal systems with LLMs to attack open mathematical conjectures.