GPT-5 Outjudges Judges in Choice-of-Law Test: Error-Free, Rule-Focused Decisions

Added Feb 12
Article: Positive | Community: Negative, Divisive

The authors rerun a prior judicial experiment, this time using GPT-5, on a hypothetical auto-accident choice-of-law dispute. They vary whether the governing doctrine is framed as a rule or a standard, which party appears more sympathetic, and where the accident occurs (which changes the legally correct outcome). GPT-5 reaches the legally correct result significantly more often than the human judges did and, in this experiment, makes no mistakes.

Key Points

  • The study replicates a prior judicial experiment by replacing human judges with GPT-5 in a hypothetical choice-of-law case.
  • Three variables are manipulated: rule versus standard framing, sympathetic portrayal of plaintiff or defendant, and accident location affecting applicable law.
  • Accuracy is measured by adherence to the legally correct outcome under different states’ choice-of-law rules.
  • GPT-5 is significantly more accurate than the 61 U.S. federal judges from the original study and, in this experiment, makes zero errors.
  • Findings point to strong formalism in the LLM’s decision-making and reduced sensitivity to party sympathy or framing effects.

Sentiment

The community is predominantly skeptical and wary. While a meaningful minority acknowledges the result as technically interesting and sees potential for AI to assist judges in narrow procedural tasks, the dominant tone is one of alarm and pushback. Most commenters worry about the dystopian implications of rigid algorithmic justice, question the study's generalizability, and emphasize that human judgment, accountability, and equity are irreplaceable elements of a legitimate legal system. Even among those sympathetic to the finding, the consensus is that this shows AI could be a useful tool within the system, not a replacement for human judges.

In Agreement

  • LLMs' lack of human biases (hunger, mood, political affiliation, sympathy, corruption) is a genuine advantage in domains where strict legal correctness matters, and this study validly demonstrates that capability.
  • The current judicial system is deeply flawed—slow, biased, and coercive—and AI assistance could meaningfully address access-to-justice problems by providing faster, cheaper, and more consistent preliminary rulings.
  • For narrow, well-defined procedural questions like choice-of-law analysis, LLMs already show strong and potentially superior performance compared to human judges who are demonstrably influenced by extralegal factors.
  • AI could serve as a useful first layer or advisory tool in the judicial process, with human judges reviewing appeals, creating a faster and more equitable system.
  • The fact that even elite judges routinely disagree on legal interpretation undermines the argument that human judgment is reliably superior; LLM consistency is a feature, not a bug.
  • LLMs are immune to the corruption that plagues human judges, making them inherently more trustworthy in certain respects.

Opposed

  • The experiment is far too narrow (one hypothetical scenario, one area of law) to support the headline claim that GPT-5 'outperforms' judges in any general sense.
  • Law is inherently ambiguous and context-dependent; it requires equity, mercy, and the ability to handle unanticipated situations—capabilities that pure pattern matching cannot replicate.
  • LLMs do not actually reason; they reproduce likely outputs from training data, and there is no way to verify the original experiment was not in the training set.
  • Strict, rigid legal correctness without human judgment would create a dystopian system—'silicon formalism' is a warning, not a selling point.
  • Humans deserve to be judged by other mortal beings who have skin in the game, can be held accountable, and share the human condition.
  • LLMs still fail at basic common-sense reasoning, calling into question whether their legal performance reflects genuine understanding.
  • AI training data inevitably encodes cultural biases, so claims of AI impartiality are naive—the bias is just hidden and harder to audit.
  • Hallucination of statutes and case law remains an unsolved problem that makes LLMs dangerous in legal contexts.