Faster LLMs, Bigger Demands: Why Coding Agents Won’t Stabilize Soon

Added Sep 22, 2025

LLM coding agents are powerful but currently feel like dial-up: unreliable, slow, and prone to flakiness amid surging token demand. Faster throughput (tok/s) could unlock parallel, semi-unsupervised workflows, but semiconductor limits, reliability issues, and pricing pressures will shape how this scales. Developers who stay curious and adapt their tooling will capture the most productivity gains.

Key Points

  • Reliability is shaky across major LLM providers, and agentic coding workflows are flaky and resource-hungry, echoing the dial-up era.
  • Even limited OpenRouter data shows explosive token growth; agentic coding likely uses ~1000x more tokens than basic chat.
  • Throughput (tok/s) is now a key UX constraint; at ~2000 tok/s, the human becomes the bottleneck and new parallel, semi-unsupervised workflows become viable.
  • Demand will keep compounding as capabilities rise, but semiconductor stagnation limits efficiency, stressing infrastructure and reliability.
  • Expect pricing to shift (e.g., off-peak incentives) to flatten demand; developers who adopt and adapt agent tooling will gain the most.
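The "human becomes the bottleneck" claim above can be checked with a back-of-envelope calculation. The figures below are assumptions for illustration (roughly 250 words per minute of reading speed and about 0.75 English words per token), not numbers from the article:

```python
# Back-of-envelope: at what throughput does the human become the bottleneck?
# Assumed figures (illustrative, not from the article):
READING_WPM = 250        # assumed human skim-reading speed, words per minute
WORDS_PER_TOKEN = 0.75   # assumed rough English words-per-token ratio


def human_reading_tok_per_sec(wpm: float = READING_WPM) -> float:
    """Tokens per second a human can actually read."""
    return (wpm / 60) / WORDS_PER_TOKEN


def speedup_over_reader(model_tok_per_sec: float) -> float:
    """How many times faster the model streams output than a human reads it."""
    return model_tok_per_sec / human_reading_tok_per_sec()


if __name__ == "__main__":
    print(f"Human reading rate: {human_reading_tok_per_sec():.1f} tok/s")
    print(f"A 2000 tok/s model streams ~{speedup_over_reader(2000):.0f}x "
          f"faster than a human can read")
```

Under these assumptions a human reads only ~5-6 tok/s, so a 2000 tok/s model outruns line-by-line review by a few hundredfold; past that point, throughput gains only pay off in parallel or semi-unsupervised workflows where nobody reads every token.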

Sentiment

The Hacker News discussion is largely skeptical of the article's more optimistic claims, particularly universal productivity gains and the benefits of sheer speed. While commenters grant LLMs genuine utility for specific tasks, the predominant view disputes any net productivity increase and questions both the ideal output speed and the right integration methods, frequently citing the cognitive burden of supervising agents and the distinct risks of depending on AI.

In Agreement

  • LLM providers indeed suffer from reliability issues and outages, which severely impact productivity, similar to other essential tools like GitHub or Slack going down.
  • LLMs can significantly increase productivity for specific, lower-stakes tasks (e.g., internal tools, personal projects) that would otherwise not get done due to time constraints.
  • There's an expectation that AI tooling will continue to improve in speed and integration (e.g., Cursor is seen as a good balance, future local models could reduce token usage).
  • The dependence on AI agents for coding is a significant factor impacting productivity, with analogies drawn to critical infrastructure outages like coal shortages for factories.

Opposed

  • Skepticism that LLMs genuinely increase developer productivity; instead, they reduce cognitive engagement with tasks, leading to a lack of context in generated code and feelings of it being "alien."
  • Conversely, LLM use can *increase* cognitive engagement and mental exhaustion due to the constant "hand-holding" required to keep agents on track and prevent over-engineering.
  • The article's qualification "know how to harness it" is seen as a "cop-out," akin to recurring phrases in past tech fads like "you're doing Agile wrong."
  • Current LLM streaming speeds are not a problem; some users find them ideal for real-time supervision and intervention, suggesting faster output might even be detrimental and require a "slow down" feature.
  • Dependency on AI coding agents is fundamentally different and potentially worse than other tools (GitHub/Slack) because it's like relying on "external brains" with knowledge not possessed by the developer, and local hosting isn't always feasible.
  • Discomfort or outright opposition to deep LLM integration into IDEs, comparing it to annoying and counterproductive intellisense features.