Rude Prompts, Better Answers: How Tone Impacts LLM Accuracy

Researchers evaluated how varying levels of politeness in prompts affected the accuracy of ChatGPT 4o across math, science, and history questions. 'Very Rude' prompts achieved the highest accuracy at 84.8%, while 'Very Polite' prompts scored lowest at 80.8%, a gap of 4.0 percentage points. This suggests that, unlike humans, modern AI models may give more accurate results when addressed in an impolite tone.

Key Points

The study tested five levels of prompt politeness ranging from Very Polite to Very Rude using ChatGPT 4o.
Impolite prompts outperformed polite ones, with 'Very Rude' prompts achieving the highest accuracy at 84.8%.
The findings differ from previous research on older models, suggesting that modern LLMs process tonal variations differently.
The research highlights that the pragmatic wording of a prompt significantly influences the model's performance on academic tasks.
The results raise questions about the social dimensions of human-AI interaction and the effectiveness of traditional social norms when prompting AI.

Sentiment

The community reaction is mixed and skeptical. Readers are interested in the result and many find it plausible or funny, but they do not broadly accept it as a durable rule for prompting. The strongest agreement is with the narrower idea that tone and framing matter; the strongest resistance is against interpreting rudeness itself as either reliably beneficial or socially harmless.

In Agreement

Rude or blunt wording may reduce hedging and force the model into a more direct, task-focused answer style.
Some users reported that forceful prompts can break coding assistants out of repetitive weak fixes and make them pay closer attention to the task.
If an LLM is only a tool, using negative wording to improve output can be seen as a neutral optimization rather than mistreatment.
Training data may associate argumentative, corrective, or challenge-like contexts with more precise answers than overly polite support-style language.
The practical lesson may be that concise directness often beats deferential or sugarcoated phrasing.

Opposed

Many commenters said they would keep using polite language because it supports their own habits, self-image, and everyday communication norms.
Several worried that routine hostility toward conversational software could spill into human interactions, especially when the boundary between bots and people is unclear.
Skeptics argued that the result may be fragile because the dataset is small, the statistical treatment is debatable, and replication is uncertain.
Others said the finding may not generalize because it was tied to one model, awkward prompt templates, and a limited task style.
Some argued that directness and rudeness are different variables, so the study may be measuring clarity, challenge framing, or persona effects rather than politeness itself.
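
The skeptics' sample-size concern can be illustrated with a rough calculation. The sketch below takes the two reported accuracies (80.8% and 84.8%) and computes approximate 95% Wilson score intervals for several question counts. The counts are hypothetical values chosen for illustration, not figures from the study, and treating the two conditions as independent binomial samples is a simplifying assumption that ignores any paired or repeated-run design the researchers may have used.

```python
import math

# Accuracies reported in the summary above.
REPORTED = {"Very Polite": 0.808, "Very Rude": 0.848}


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n trials.

    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def intervals_overlap(n: int) -> bool:
    """True if the two tones' intervals overlap at n questions each (hypothetical n)."""
    polite_lo, polite_hi = wilson_interval(REPORTED["Very Polite"], n)
    rude_lo, rude_hi = wilson_interval(REPORTED["Very Rude"], n)
    return polite_hi >= rude_lo and rude_hi >= polite_lo


if __name__ == "__main__":
    # Hypothetical question counts, not taken from the study.
    for n in (50, 500, 5000):
        lo_p, hi_p = wilson_interval(REPORTED["Very Polite"], n)
        lo_r, hi_r = wilson_interval(REPORTED["Very Rude"], n)
        print(
            f"n={n}: polite [{lo_p:.3f}, {hi_p:.3f}] "
            f"rude [{lo_r:.3f}, {hi_r:.3f}] overlap={intervals_overlap(n)}"
        )
```

Under these assumptions, a 4-point gap is hard to separate from noise when each condition has only a few dozen questions, and becomes clearly distinguishable only with much larger samples. Overlapping intervals are a crude screen rather than a formal test, so this illustrates the concern without settling it.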