Engineer AI for Failure: Contain Prompt Injection

Added Sep 26, 2025
Article: Neutral · Community: Mixed

LLMs have an inherent security flaw: they cannot reliably separate code from data, which makes them vulnerable to prompt injection. The consequences range from trivial to seriously damaging, especially when models are connected to tools or sensitive systems. The article urges a safety-engineering approach, using interlocks, isolation, least privilege, and verification to contain failures and prevent harm.

Key Points

  • LLMs blur the line between code and data, creating an inherent vulnerability.
  • Prompt injections can trick models into following instructions they should ignore, with consequences ranging from trivial to dangerous.
  • Developers should adopt a mechanical/safety-engineering mindset: design for failure, containment, and verification.
  • Mitigations include isolation, least-privilege access, sandboxed tool use, audited interfaces, and defense-in-depth.
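The least-privilege and audited-interface ideas above can be pictured as a tool dispatcher that fails closed. This is a minimal sketch with hypothetical names (`ToolRegistry`, the capability strings), not code from the article: each tool is registered with an explicit capability, and a session handling untrusted input only receives the grants it strictly needs.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolRegistry:
    """Hypothetical least-privilege dispatcher for LLM tool calls."""
    tools: dict = field(default_factory=dict)

    def register(self, name: str, capability: str, fn: Callable):
        self.tools[name] = (capability, fn)

    def call(self, name: str, granted: set, *args):
        capability, fn = self.tools[name]
        if capability not in granted:
            # Fail closed: a prompt-injected request for an
            # unauthorized tool is rejected, not negotiated.
            raise PermissionError(f"{name} requires {capability!r}")
        return fn(*args)

registry = ToolRegistry()
registry.register("search_docs", "read:docs", lambda q: f"results for {q}")
registry.register("send_email", "net:send", lambda to, body: "sent")

# A session that processes untrusted content gets read-only grants,
# so even a successful injection cannot reach the email tool.
print(registry.call("search_docs", {"read:docs"}, "quarterly report"))
try:
    registry.call("send_email", {"read:docs"}, "a@example.com", "hi")
except PermissionError as e:
    print("blocked:", e)
```

The point of the sketch is that the check lives in ordinary deterministic code at the boundary, so it holds no matter what the model emits.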

Sentiment

The community broadly agrees that prompt injection is a serious and largely unsolved security threat, validating the article's framing of the problem. However, there is significant skepticism toward the article's proposed solutions — particularly the bridge-building analogy and the idea that over-engineering for non-determinism is the right approach. The dominant perspective, led by Simon Willison, favors cutting off attack vectors entirely rather than trying to build resilience around an inherently exploitable architecture. Many commenters view the article's recommendations as either too vague or as basic security principles that the industry already knows but chooses to ignore under commercial pressure.

In Agreement

  • The lethal trifecta (private data access, untrusted instructions, exfiltration) is a real and serious security threat that the industry needs to address with more discipline
  • AI engineers should adopt the safety-engineering mindset of traditional engineering disciplines, designing for failure rather than assuming success
  • Defense-in-depth with sandboxing, minimal permissions, and strict access controls is the right general approach to containing prompt injection
  • The fundamental code-data confusion in LLMs — the lack of an equivalent to an NX bit — is a genuine unsolved problem that makes security uniquely challenging
  • Data breaches enabled by prompt injection can have severe real-world consequences including financial ruin, regulatory penalties, and physical danger to individuals
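The "lethal trifecta" point above has a simple logical structure worth making explicit: a deployment is dangerous only when all three legs are present, so removing any one leg is sufficient. A small illustrative sketch (the field names are assumptions, not from the article):

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    """Hypothetical description of an LLM agent's capabilities."""
    reads_private_data: bool        # leg 1: access to private data
    processes_untrusted_input: bool # leg 2: exposure to attacker text
    can_exfiltrate: bool            # leg 3: outbound channel (HTTP, email, ...)

def has_lethal_trifecta(cfg: AgentConfig) -> bool:
    # Dangerous only when ALL three legs are present at once.
    return (cfg.reads_private_data
            and cfg.processes_untrusted_input
            and cfg.can_exfiltrate)

risky = AgentConfig(True, True, True)
defanged = AgentConfig(True, True, False)  # exfiltration leg removed

assert has_lethal_trifecta(risky)
assert not has_lethal_trifecta(defanged)
```

This framing also explains the disagreement below: the commenters arguing for "cutting off one leg" are proposing to make `has_lethal_trifecta` false by construction rather than relying on probabilistic guardrails.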

Opposed

  • The bridge-building analogy fails because security flaws are adversarial — an attacker who succeeds even once in a hundred attempts has broken the system, unlike a bridge that can tolerate statistical variations in load
  • The real answer is to cut off one leg of the trifecta entirely (especially exfiltration), not to over-engineer around the non-determinism as the article suggests
  • Guardrails and detection filters make false promises of security and encourage shipping inherently insecure products
  • The article's claim that non-determinism requires non-deterministic safety approaches is a non-sequitur — a deterministic sandbox works regardless of whether the sandboxed process is deterministic
  • The article's suggestions amount to basic 'security 101' that the software industry has known for decades, and the real challenge is that current AI product incentives are fundamentally incompatible with security
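The "deterministic sandbox" objection above can be sketched concretely: the model's output may be arbitrary, but the policy that gates it is plain deterministic code, so it gives the same verdict for the same input every time. This is an illustrative example with an assumed allowlist, not anything proposed in the article or thread:

```python
import random
from urllib.parse import urlparse

# Assumed allowlist for illustration only.
ALLOWED_HOSTS = {"internal.example.com"}

def sandboxed_fetch(url: str) -> str:
    """Deterministic boundary check: same URL, same verdict,
    regardless of how (non-deterministically) it was produced."""
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        return "denied"
    return "fetched"

# Simulate a non-deterministic "model" emitting candidate URLs:
candidates = [
    "https://internal.example.com/doc",
    "https://attacker.example.net/leak?data=secret",
]
for _ in range(5):
    url = random.choice(candidates)
    sandboxed_fetch(url)  # only the allowlisted host ever passes
```

The sandbox does not need to predict what the model will say; it only needs to enforce an invariant on whatever comes out, which is the commenters' counter to the claim that non-determinism demands non-deterministic safety measures.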