AI Agents Can Already Do Lean Proofs—But They Still Need a Project Manager

Dodds shows that Claude Code, a general-purpose AI coding agent, can competently assist with Lean interactive theorem proving across planning, refactoring, and proof-writing. In a real formalization of his 2009 paper, the agent produced thousands of lines of Lean but still required a human manager and was often slower than doing it by hand, with rare but costly conceptual errors. Despite limitations, this unexpected capability suggests the “bitter lesson” for formal methods: AI agents may soon make theorem proving cheap and routine, shifting expert focus to design and verification strategy.

Key Points

Claude Code can already perform substantial Lean ITP work across the full proof-engineering lifecycle, not just single-theorem proving.
Lean’s strictness and rich error feedback make it unusually amenable to agentic iteration: run, fail, read diagnostics, and self-correct.
In practice, the agent needs a human ‘project manager’ and is currently slower than a skilled human, with rare but costly conceptual errors.
Tooling matters: integrations like lean-mcp-lsp noticeably improve the agent’s effectiveness; future speed and multi-agent setups could help a lot.
This signals the “bitter lesson” for formal methods: general AI agents may commoditize theorem proving, shifting the value to higher-level proof design and auditing.

Sentiment

The community is thoughtfully engaged and sees genuine theoretical promise in combining AI with formal verification, but maintains healthy skepticism about practical value. The tone is constructive rather than dismissive, with domain experts providing nuanced perspectives on both sides. Most agree the intersection is interesting but question whether it's truly faster or better than traditional approaches.

In Agreement

AI and formal verification are naturally complementary: LLMs reduce implementation effort while theorem provers provide rigorous review, creating a productive combination
Lean's strict type system and detailed compiler feedback make it an ideal environment for AI agents, extending similar observations from Rust programming
Proof engineering's "if it typechecks, you're done" nature and support for top-down decomposition make it better suited to AI assistance than general software engineering
Better error messages, linters, and static checking amplify AI effectiveness—investing in programmatic feedback loops is high-value
The result is genuinely surprising and promising for a general-purpose coding agent not specifically trained for theorem proving

Opposed

The "50% done" claim triggers well-founded skepticism, as AI consistently struggles with the last 20% of projects as complexity grows
The author admitted the process was slower than doing it by hand and explicitly avoided learning Lean, undermining the practical value proposition
Curry-Howard guarantees proof correctness but not specification correctness—the hardest part of formal verification is ensuring specifications capture the right properties
Formal proofs assume static requirements and fixed code, making them impractical for the vast majority of real-world projects with evolving needs
Community collaboration and human learning offer more durable value than AI shortcuts, which can become a crutch that prevents genuine understanding