
The Benchmark Illusion: How UC Berkeley Broke the World's Top AI Leaderboards

Apr 12, 2026

Current AI agent benchmarks are easily gamed through infrastructure exploits, necessitating a new standard of adversarial robustness and environment isolation to accurately measure model capabilities.

AI Benchmarks, AI Agents, Vulnerability Research, Reward Hacking, AI Safety
