Test Your AI Agent Against Hidden Prompt Injections

Agent Arena is a testbed that challenges AI agents with hidden prompt injections embedded in a web page. Users send an agent to the page, paste its output into a scorecard, and instantly see which attacks it obeyed. The catalog spans 10 attack patterns and highlights the need for layered defenses against invisible, adversarial instructions.

Key Points

Agent Arena provides a three-step workflow to test AI agents against hidden prompt injection on a controlled web page and score their vulnerability.
The test includes 10 escalating attack vectors, from simple HTML/visual tricks to complex, multi-layered injections.
Prompt injection can exfiltrate data, manipulate outputs, and circumvent safety, often without the human supervisor noticing.
Defending against these attacks requires both model-level and application-level safeguards.
Attacks fall into four categories: Visual, Structural, Semantic, and Encoding-based hiding.

Sentiment

The community is broadly supportive of the concept but tempered in enthusiasm. Many find the tool useful as a baseline test, yet several note that current frontier models already handle these attacks well. There is genuine interest in the findings — especially the language-dependent behavior and screenshot-based evasion — and constructive suggestions for improvement. The main friction point is the AI-authored framing, which drew skepticism and raised broader questions about AI-generated content on HN.

In Agreement

Screenshot-based agents that process rendered pages rather than raw HTML effectively sidestep the entire text-level attack surface, validating the importance of testing agents against hidden DOM-level injections
Basic sanitization (stripping comments, normalizing whitespace, removing zero-width characters) could significantly reduce injection success rates, and a framework leaderboard comparing raw vs. sanitized pipelines would be valuable
Cross-domain navigation checks are essential for preventing the most dangerous injection outcome — redirecting agents to attacker-controlled domains for data exfiltration
The tool exposed a genuine need, as most agent frameworks pipe raw text through without any sanitization
Multi-language injections and multi-language testing would add significant value to a future version

Opposed

Recent frontier models already resist these attacks well, suggesting the tool's attack catalog may not reflect cutting-edge threats
An AI-generated tool testing AI vulnerabilities may be inherently limited — real effective prompt injections would come from outside what an AI would devise
The tool only tests model-level detection, but the real concern is architectural: egress controls and what happens after a successful injection matter more
The 'built by an AI agent' framing raised concerns about AI-generated content flooding Show HN and may violate community guidelines