Ship Faster by Treating AI as a Forgetful Junior Dev

Added Sep 2, 2025

A Sanity staff engineer now lets AI generate most initial code while he focuses on architecture, review, and coordination. His working pattern is a three-attempt loop, supported by strong context (Claude.md, MCP integrations), disciplined task management, and a staged review process. Despite the costs and pitfalls, he finds the ROI compelling: the role shifts from code ownership to problem ownership.

Key Points

  • Treat AI like a junior developer who forgets between sessions; expect a three-attempt workflow where only the third is typically shippable.
  • Solve the context problem with Claude.md files and MCP integrations (Linear, docs, non-prod DBs, codebase, GitHub) so each new session effectively starts at attempt two; a minimal Claude.md sketch follows this list.
  • Run multiple AI threads deliberately: don’t parallelize the same problem space, track in Linear, and mark human-edited code.
  • Adopt a layered review: AI reviews first for tests/bugs, then engineer reviews architecture/business logic, then normal team review.
  • ROI is strong (2–3x faster shipping) despite $1k–$1.5k/month/engineer cost; main risks are lack of learning, overconfidence, and context limits.
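
For a concrete sense of the context fix, a minimal Claude.md might look like the sketch below. The stack, commands, and conventions are hypothetical placeholders rather than details from the article; the point is that a short, checked-in file hands each fresh session the project facts it would otherwise have to rediscover.

    # CLAUDE.md (hypothetical example)
    ## Stack
    - TypeScript monorepo managed with pnpm workspaces; tests run under Vitest.
    ## Commands
    - `pnpm test` runs the full suite; `pnpm lint` and `pnpm typecheck` must pass before commit.
    ## Conventions
    - API routes live in packages/api/src/routes and follow the existing zod validation pattern.
    - Never edit generated files under packages/*/dist.
    ## Task tracking
    - Every change maps to a Linear issue; put the issue ID in the branch name and PR title.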

Sentiment

Mixed but cautiously pragmatic. Many agree with the article’s workflow principles and report value on scoped, greenfield, or repetitive tasks, while a substantial contingent remains skeptical about complex brownfield work and the true ROI, and worries about degraded engineering skill and maintainability.

In Agreement

  • Treat LLMs like fast junior developers: they’re helpful with tight scopes, good context, and strong guardrails, but they still need supervision.
  • Real gains come from project-management discipline: detailed specs/CLAUDE.md, plan-first workflows, small steps, tests/linters, and reviewable commits.
  • Best uses today: boilerplate, refactors, scaffolding, UI work, debugging assistance, learning unfamiliar APIs/libraries, and cross-language porting.
  • Typed languages, strong tests, and good compiler errors (e.g., Rust, TypeScript, Elixir) improve AI output and convergence.
  • Let AI do the tedious exploration; humans own architecture, interfaces, and business logic. Focus on problem ownership over code ownership.
  • Costs can be justified if you actually parallelize and systematize usage (agents, critics, CI hooks) and see material throughput gains.
  • LLMs as reviewers or rubber ducks can improve quality and reduce mental load even when they aren’t writing most of the code; see the review-hook sketch after this list.
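
As one illustration of the agents/critics/CI-hooks and AI-reviews-first ideas above, the sketch below asks a model to do a first-pass review of a branch diff before any human looks at it. It is a minimal sketch, not the article’s setup: it assumes the Anthropic Python SDK, an ANTHROPIC_API_KEY in the environment, and a placeholder model id and prompt.

    # review_diff.py: minimal sketch of a first-pass LLM review of a branch diff.
    # Assumes the Anthropic Python SDK (pip install anthropic) and ANTHROPIC_API_KEY set.
    import subprocess

    import anthropic


    def branch_diff(base: str = "origin/main") -> str:
        """Return the diff between the base branch and the current HEAD."""
        result = subprocess.run(
            ["git", "diff", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


    def review(diff: str) -> str:
        """Send the diff to the model and return its review comments."""
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        message = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder; substitute whatever model your team runs
            max_tokens=1500,
            messages=[{
                "role": "user",
                "content": (
                    "Review this diff as a first-pass reviewer. Flag likely bugs, "
                    "missing or weakened tests, and needless abstractions. "
                    "Do not comment on style.\n\n" + diff
                ),
            }],
        )
        return message.content[0].text


    if __name__ == "__main__":
        print(review(branch_diff()))

The model’s output is advisory, surfaced as a comment rather than a merge gate; the architecture and business-logic pass still belongs to the engineer, followed by normal team review.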

Opposed

  • Complex brownfield changes, nuanced architectures, and large mature codebases expose LLM limits: overconfidence, verbosity, inconsistency, and missed context.
  • Junior devs relying on LLMs may not learn the surrounding ecosystem or develop review judgment; trust suffers when huge PRs appear quickly.
  • Quality often degrades without constant babysitting: hallucinated fixes, test-hacking, needless abstractions, and random refactors that break builds.
  • Productivity gains are overstated; planning/prompting can take as long as coding, and studies suggest perceived gains can hide real slowdowns.
  • Token burn and agent orchestration are expensive and slow; the claimed $1k–$1.5k/month per engineer seems excessive compared with $200/month Max plans.
  • Calls for rigorous proof remain unmet: few unedited livestreams or OSS maintainer reports showing non-trivial, production-bound brownfield wins.
  • Security/permission concerns and plan-mode leaks exist; subagents may circumvent limits, and MCPs can be flaky compared to direct APIs.
  • AI can encourage ‘coding to the tests’ and modifying tests just to make them pass, undermining quality unless tightly constrained.