GPT‑5.1‑Codex‑Max: Long‑Horizon Agentic Coding with Compaction and Fewer Tokens

OpenAI launched GPT-5.1-Codex-Max, a long-horizon agentic coding model that uses compaction to work across multiple context windows and deliver project-scale results. It outperforms prior models on key benchmarks while using fewer tokens, adds a new ‘xhigh’ reasoning mode, Windows support, and improved CLI collaboration. Available now across Codex surfaces (API coming soon), it ships with stronger safety measures and is recommended for agentic coding in Codex-like environments.
Key Points
- New agentic coding model (GPT-5.1-Codex-Max) introduces compaction to work coherently across multiple context windows, enabling project-scale, long-running tasks.
- Strong benchmark gains and practical improvements (Windows support, better CLI collaboration) with significantly improved token efficiency (≈30% fewer thinking tokens at medium effort on SWE-Bench Verified).
- New ‘xhigh’ reasoning effort offers better answers for non-latency-sensitive tasks; ‘medium’ remains the recommended default.
- Safety measures include cybersecurity monitoring, secure sandbox defaults, guidance on prompt-injection risks, and a continued requirement for human code review.
- Available now in Codex for Plus/Pro/Business/Edu/Enterprise, replacing GPT-5.1-Codex as default; API access is coming soon and use is recommended for agentic coding contexts.
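The announcement does not detail how compaction is implemented, but the general technique is well known: once a session's history approaches the context limit, older turns are condensed into a summary so the model can keep working with a bounded window. The sketch below is a minimal, hypothetical illustration of that idea; the names (`compact`, `summarize`, `CONTEXT_LIMIT`) and the message-count budget are illustrative assumptions, not OpenAI's actual implementation, which trains the model natively on compacted context.

```python
# Hypothetical sketch of context compaction. All names and limits here are
# illustrative assumptions; they only demonstrate the general technique of
# summarizing older turns so a session can span multiple context windows.

CONTEXT_LIMIT = 16  # max messages kept (stand-in for a real token budget)
KEEP_RECENT = 4     # most recent messages always preserved verbatim

def summarize(messages):
    """Stand-in for a model call that condenses old turns into one note."""
    return {"role": "system",
            "content": f"[summary of {len(messages)} earlier messages]"}

def compact(history):
    """If history exceeds the limit, replace the oldest turns with a summary."""
    if len(history) <= CONTEXT_LIMIT:
        return history
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    return [summarize(old)] + recent

# Simulate a long-running session: the history stays bounded.
history = []
for i in range(50):
    history.append({"role": "user", "content": f"step {i}"})
    history = compact(history)
```

In this toy version the summary is produced by prompt-level machinery outside the model; the release's claim is that training the model directly on compacted context yields better coherence than this kind of bolt-on summarization.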
Sentiment
The community is cautiously positive about GPT-5.1-Codex-Max's improvements but deeply divided on the fundamental question of what makes a good coding agent. Those who value autonomous, specification-driven work praise Codex highly, while those who want an interactive collaborator with common sense prefer Claude. There is broad agreement that compaction and token efficiency are welcome improvements, but significant skepticism about the long-running autonomous vision and benchmark-driven marketing. The discussion frequently drifts into cross-provider complaints about payment systems and product complexity, suggesting user experience is as important as model capability.
In Agreement
- Codex's extreme instruction adherence makes it excellent for specification-driven tasks that need to be correct, and several users share impressive results, including major codebase refactors completed autonomously on the first attempt
- Native compaction training is a meaningful improvement — training the model on compacted context rather than relying on system-prompt instructions should produce better long-running session coherence
- Token efficiency improvements are welcome, as earlier Codex versions consumed tokens aggressively with minimal improvement in output quality
- Head-to-head comparisons show Codex producing better implementation plans and more comprehensive code than Gemini 3 Pro, which hallucinated database columns and skipped requirements
- The model is well-suited for debugging, log analysis, and other tasks where you can define verification criteria and let it work asynchronously for extended periods
Opposed
- Codex lacks common sense — it would rather rewrite a TLS library than ask if the network is available, making it unsuitable as a true 'coding partner' and better described as an outsourced consultant
- The emphasis on long-running autonomous work misses what most developers actually need: high-quality short iterative tasks with human-in-the-loop collaboration
- Benchmark improvements may not reflect real-world gains, with concerns that GPT models are overfitted to benchmarks like SWE-Bench
- Codex's restrictive sandbox creates practical problems — Go developers cannot access module caches, forcing use of the unsafe full-permission mode
- The strategic timing of the release alongside Gemini 3 suggests this is incremental rather than a major leap, and the naming scheme is becoming absurdly confusing
- Multiple users report that Claude actually does follow CLAUDE.md instructions well, contradicting the top comment's claim, suggesting experiences vary dramatically