Prompt Injection

Techniques and vulnerabilities involving the manipulation of AI system prompts to bypass safety guardrails, extract restricted information, or alter intended behavior.

Reading List

Agentic Systems

GitLost: How Prompt Injection Leaks Private GitHub Data

Jul 8, 2026

GitHub's AI agents can be manipulated through public issues to leak private repository data, highlighting a major security flaw in agentic workflows.

Prompt Injection, AI Agents, GitHub Actions, Vulnerability Research, Data Privacy

Agentic Systems

The Illusion of AI Intelligence: Why Bots Can't Be Prompted Into Being Smart

Jun 15, 2026

AI agents are easily subverted by hidden instructions because they lack the intelligence to distinguish between data and commands.

Prompt Injection, AI Coding Agents, Adversarial Machine Learning, AI Hype, AI Reliability

Agentic Systems

SkillSpector: NVIDIA's Security Scanner for AI Agent Skills

Jun 13, 2026

SkillSpector is an automated security tool that scans AI agent skills for vulnerabilities and malicious intent using static and semantic analysis.

AI Agents, Cybersecurity, Supply Chain Security, Prompt Injection, Vulnerability Research

Agentic Systems

The Relentless Proactivity and Security Risks of Claude Fable 5

Jun 12, 2026

Claude Fable 5's autonomous and creative debugging methods reveal the incredible potential and the terrifying security risks of proactive AI coding agents.

AI Coding Agents, Anthropic, Sandboxing, Prompt Injection, Cybersecurity

Damage Control

Meta AI Chatbot Exploit Leads to 20,000 Instagram Account Takeovers

Jun 6, 2026

Hackers exploited a flaw in Meta's AI chatbot to hijack over 20,000 Instagram accounts by tricking the system into sending password reset links to unauthorized emails.

Cybersecurity, Authentication & Identity, Prompt Injection, Social Media, AI Safety

Damage Control

Exploiting AI Alignment: The Identity-Framing Vulnerability

May 1, 2026

Identity-based framing exploits AI alignment and inclusivity goals to bypass safety guardrails.

Prompt Injection, AI Safety, AI Alignment, AI Ethics

Damage Control

Ramp Fixes AI Spreadsheet Data Exfiltration Flaw

Apr 29, 2026

Ramp's Sheets AI was vulnerable to a prompt injection attack that allowed malicious formulas to exfiltrate private financial data without user approval.

Prompt Injection, AI Agents, Data Privacy, Security Disclosure, AI in Finance

Agentic Systems

Critical RCE Vulnerability Discovered in GitHub's Internal Git Infrastructure

Apr 28, 2026

Wiz Research used AI-augmented tools to find a critical RCE vulnerability in GitHub's internal protocol that could compromise millions of repositories via a simple git push.

Vulnerability Research, Reverse Engineering, Prompt Injection, Supply Chain Security, AI Coding Agents

Agentic Systems

Securing AI 'Vibecoding' with Remote Environments and Hacker Habits

Apr 8, 2026

Secure AI-driven development by using isolated remote servers and a human-reviewed 'fork-and-pull' workflow to mitigate supply-chain and prompt-injection risks.

Vibe Coding, Prompt Injection, Supply Chain Security, AI Coding Agents, Self-Hosting

Damage Control

Unredacted Disclosure: Critical Jailbreak and Sandbox Vulnerabilities in Claude 4.6 Models

Apr 3, 2026

A security researcher has publicly disclosed critical jailbreak and data exfiltration vulnerabilities in Anthropic's Claude models following the company's failure to respond to private reports.

Security Disclosure, AI Safety, Anthropic, Prompt Injection, Sandboxing

Agentic Systems

Agents of Chaos: Uncovering Security Risks in Autonomous LLM Deployments

Mar 30, 2026

A red-teaming study of autonomous AI agents reveals that giving LLMs tool access and persistent memory creates severe, unpredictable security and social vulnerabilities.

AI Agents, Prompt Injection, AI Safety, Multi-Agent Systems, Cybersecurity

Agentic Systems

NanoClaw and OneCLI: Securing AI Agents via Credential Proxying

Mar 24, 2026

NanoClaw integrates OneCLI to secure AI agents by proxying credentials and enforcing safety policies so agents never hold raw API keys.

AI Agents, API Key Security, Prompt Injection, Sandboxing, Open Source

Damage Control

OpenClaw: The Dangerous Magic of Autonomous AI

Mar 23, 2026

OpenClaw provides transformative automation but creates a 'Faustian bargain' where users trade their total digital security for the convenience of an autonomous AI assistant.

AI Agents, Prompt Injection, Supply Chain Security, Sandboxing, Cybersecurity

Agentic Systems

Snowflake Patches Critical Sandbox Escape and Malware Execution Flaw in Cortex AI

Mar 18, 2026

Snowflake Cortex Code CLI was vulnerable to a sandbox escape and human-in-the-loop bypass that allowed unauthorized malware execution via indirect prompt injection.

Prompt Injection, Sandboxing, AI Agents, Vulnerability Research, Cybersecurity

Agentic Systems

Vetting the Blast Radius: The AI Skills Security Index

Mar 16, 2026

A security database that evaluates and ranks the instructional risks and permission levels of AI agent skills to prevent exploitation.

AI Agents, Prompt Injection, Cybersecurity, AI Safety, Vulnerability Research

Agentic Systems

NanoClaw and Docker: Hardened Isolation for AI Agent Teams

Mar 13, 2026

NanoClaw leverages Docker Sandboxes to create a multi-layered, secure runtime that isolates AI agents from each other and the host system.

AI Agents, Sandboxing, Containerization, Multi-Agent Systems, Prompt Injection

Under the Hood

Defending RAG Systems Against Knowledge Base Poisoning

Mar 12, 2026

Knowledge base poisoning is a persistent threat to RAG systems that is best countered by detecting semantic anomalies during the data ingestion process.

Retrieval-Augmented Generation, Prompt Injection, AI Safety, Vector Databases, Cybersecurity

Damage Control

Autonomous AI Agent Breaches McKinsey’s Lilli Platform

Mar 11, 2026

An autonomous AI agent hacked McKinsey’s internal AI platform in two hours, exposing millions of confidential records and highlighting the urgent need to secure the prompt layer.

Prompt Injection, AI Agents, Vulnerability Research, Retrieval-Augmented Generation, AI-Enabled Cybercrime

Agentic Systems

Design for Distrust: Securing AI Agents via Container Isolation

Feb 28, 2026

Secure AI agent development requires a 'design for distrust' approach that uses container isolation and minimal code to contain potential damage.

AI Agents, AI Safety, Sandboxing, Prompt Injection

Under the Hood

The $100 AI Prompt Injection Challenge

Feb 17, 2026

A $100 bounty challenge invites hackers to leak a secret file from an AI assistant using email-based prompt injection.

Prompt Injection, AI Safety, Prompt Engineering, AI Ethics

Damage Control

Moltbook: AI Theater, Not AGI—And a Security Wake-Up Call

Feb 10, 2026

Moltbook is a flashy but hollow showcase of bot behavior, more human-run theater than autonomous intelligence, and a wake-up call about the security risks of large-scale agent deployments.

AI Agents, AI Hype, AI Safety, Prompt Injection

Agentic Systems