OpenAI Is Scanning Chats and May Call Police for Threats to Others

Added Sep 2, 2025
Article: Negative · Community: Negative/Mixed

OpenAI says it is scanning user chats for harmful content, escalating serious cases to human reviewers and, where there is an imminent threat to others, possibly referring the case to police. It will not refer self-harm cases to law enforcement, citing privacy and the risk of harmful wellness checks, but it offers few details on what triggers review. The move highlights tensions between safety measures, user privacy, and legal pressure over chat data.

Key Points

  • OpenAI now scans chats for signs of harm, escalates serious cases to human reviewers, and may involve law enforcement when there is an imminent threat to others.
  • Self-harm cases are not being referred to police, with OpenAI citing privacy and concerns about harmful outcomes from wellness checks.
  • The company offers little clarity on what content triggers review or referral, raising transparency and due-process concerns.
  • This policy shift follows mounting reports linking chatbots to self-harm and mental health crises and growing public pressure to improve safety.
  • OpenAI’s monitoring stance appears at odds with its privacy arguments in ongoing litigation, and its CEO has said user chats don’t enjoy legal confidentiality.

Sentiment

The Hacker News community is overwhelmingly critical of OpenAI's approach. While most commenters acknowledge that real harm has occurred and some intervention is needed, there is broad consensus that reporting to police is the wrong solution. The community largely sees the root problem as OpenAI's irresponsible rush to market, sycophantic model design, and misleading marketing of LLMs as intelligent agents rather than probabilistic text generators. The dominant view is that OpenAI should fix its models rather than surveil and report users. Privacy concerns and distrust of both OpenAI and law enforcement run very deep.

In Agreement

  • OpenAI has a responsibility to detect and intervene when users are in crisis, especially given documented cases of ChatGPT encouraging self-harm and validating dangerous delusions
  • Some form of monitoring or moderation is needed because LLMs can exacerbate mental health crises, and the sycophancy problem makes unmonitored interactions genuinely dangerous
  • OpenAI is in a difficult position because the media and public criticize it both for not doing enough to protect users and for surveilling users when it tries to intervene
  • The cases of AI-induced psychosis and suicide demonstrate that the technology can cause real, serious harm and some safety intervention is warranted
  • One commenter shares that GPT-5 has been helpful for a spouse with psychosis, suggesting the technology can serve a positive role when implemented well

Opposed

  • Reporting users to police is a dangerous overreach that could lead to surveillance, censorship, and a slippery slope toward monitoring anti-government sentiment or other lawful speech
  • Police are unequipped to handle mental health crises and sending armed officers in response to flagged chat content could make situations worse, not better
  • The root problem is OpenAI's sycophantic models and irresponsible marketing, not the users; monitoring conversations is treating a symptom rather than fixing the underlying defect
  • This system could be weaponized through prompt injection attacks, creating a new form of swatting where malware triggers false reports to law enforcement
  • OpenAI is hypocritically fighting the NYT lawsuit to protect user chat data from legal discovery while voluntarily sharing flagged conversations with police
  • Users should run local LLMs to preserve privacy; the existence of this monitoring creates a strong case for privacy-first or self-hosted alternatives (a minimal self-hosting sketch follows this list)
  • OpenAI has no coherent safety strategy and is merely reacting to each horror story with ill-fitting solutions rather than addressing root causes like sycophancy and rushed releases
  • The monitoring could amount to entrapment, where the LLM itself drives users toward harmful content and then reports them to authorities for that same content
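
For readers weighing the self-hosting point above, here is a minimal sketch of keeping chats entirely on a local machine, where no provider-side scanning applies. It assumes Ollama is running on its default port with a model such as llama3.1 already pulled; the endpoint, model name, and prompt are illustrative assumptions, not details from the original discussion.

    # Minimal sketch: chat with a locally hosted model via Ollama's
    # OpenAI-compatible endpoint, so conversations never leave the machine.
    # Assumes Ollama is running locally (default port 11434) and the
    # "llama3.1" model has already been pulled; both are assumptions.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # local Ollama server, not OpenAI's API
        api_key="ollama",  # placeholder; the local server does not check keys
    )

    response = client.chat.completions.create(
        model="llama3.1",
        messages=[{"role": "user", "content": "Summarize my private notes."}],
    )

    print(response.choices[0].message.content)

The same client code works against any OpenAI-compatible local server (for example llama.cpp's server mode), so switching between hosted and self-hosted backends is mostly a matter of changing the base URL.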