The promise of AI coding assistants sounds incredible: let the AI write your code while you focus on the big picture. But after months of hands-on experience with tools like Claude, GitHub's Spec Kit, and various LLM-powered development workflows, we've discovered that the reality is far more nuanced than the marketing suggests.
One unexpected benefit of AI coding tools is that they're finally forcing developers to write specifications, even when an email would have been enough! Not traditional waterfall documentation, but quick bullet-point lists that AI can expand into detailed specs. Tools like GitHub's Spec Kit can generate robust specifications, plans, and tasks, especially when you manually correct the documents between stages.
The catch? This creates a "waterfall in 15 minutes" scenario. You need the business spec, then a technical analysis, then you hold the LLM off from writing code for as long as possible, and finally let it generate the implementation. It sounds efficient until reality hits.
The fundamental problem with AI-generated code is the same problem that plagued traditional waterfall development: specs change. After three months of beautiful specifications and AI-generated code, user testing reveals the spec doesn't actually work. Now what?
This is where the carefully constructed AI workflow falls apart.
You've accumulated one main spec (or fragments of it), 20-50 technical analysis documents, and a codebase that's already been implemented. When you update the spec to reflect what actually needs to happen, the LLM loses the thread entirely.
The situation becomes even more problematic when you consider context window limitations.
Even though modern LLMs boast impressive context windows - some reaching a million tokens or more - they struggle to effectively process the sprawling documentation created over months of development.
When you need to update a specification, the LLM can't reliably load and cross-reference all the related specs, technical analyses, and implementation documents that were created weeks or months ago. “Context rot” refers to the way LLMs “remember” less and less of what happened at the beginning of a conversation as the conversation grows. Similarly, when an LLM tries to work through three months' worth of technical documentation, it will inevitably “lose focus”.
Without proper project management tools to organize and link these specifications, maintaining coherence becomes nearly impossible. The result is a documentation graveyard: partially outdated specs that are inconsistent with each other and increasingly disconnected from the actual codebase.
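To get a feel for the scale involved, here is a minimal sketch (in Python) of how you might estimate whether months of accumulated specs even fit in a context window; the `docs/specs` folder and the 200,000-token window are assumptions, and the four-characters-per-token ratio is only a rough rule of thumb, not a real tokenizer.

```python
# Rough estimate: how much of a model's context window would the accumulated
# spec documents consume? All numbers and paths here are assumptions.
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 200_000  # assumed window size; adjust for your model
CHARS_PER_TOKEN = 4              # rough rule of thumb, varies by tokenizer

def estimate_tokens(doc_dir: str) -> int:
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(doc_dir).rglob("*.md")
    )
    return total_chars // CHARS_PER_TOKEN

if __name__ == "__main__":
    tokens = estimate_tokens("docs/specs")  # hypothetical specs folder
    print(f"~{tokens:,} tokens of documentation, "
          f"{tokens / CONTEXT_WINDOW_TOKENS:.0%} of the assumed window")
```

Even when the raw numbers fit, context rot means the model's effective attention over that material is far smaller than the advertised window.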
The context window limitations that plague specification management extend directly into code generation, but here the problem becomes more acute: you need to carefully curate what information the LLM sees at any given moment.
You have to decide exactly which information is relevant for the next step, and getting it wrong means the AI either misses critical dependencies or gets overwhelmed by irrelevant detail.
This creates a fundamentally different workflow from human development. A senior developer can work with implicit context: they remember the architectural discussions, they know which libraries are available, they understand the project's conventions, and they have accumulated years of good practices. The LLM needs everything explicitly provided, every single time.
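As a rough illustration, here is a minimal Python sketch of the kind of context package you end up assembling by hand for every single request; the file names (`CONVENTIONS.md`, `pyproject.toml`, the `src/api/*` paths) are hypothetical stand-ins for whatever your project actually contains.

```python
# Sketch of the explicit context an LLM must be handed on every request,
# covering what a senior developer simply remembers. File names are hypothetical.
from pathlib import Path

def build_context(task: str, relevant_files: list[str]) -> str:
    parts = [
        "## Project conventions\n" + Path("CONVENTIONS.md").read_text(),
        "## Installed dependencies\n" + Path("pyproject.toml").read_text(),
    ]
    for name in relevant_files:
        parts.append(f"## {name}\n" + Path(name).read_text())
    parts.append("## Task\n" + task)
    return "\n\n".join(parts)

# Nothing carries over between prompts: every call starts from scratch.
prompt = build_context(
    "Add pagination to the orders endpoint, reusing the existing helper.",
    ["src/api/orders.py", "src/api/pagination.py"],  # hypothetical paths
)
```

Forget to include one of those files and the model will happily reinvent it.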
The result is micromanagement: you have to watch for the AI doing things like:
Making architectural decisions that contradict earlier choices it has no memory of
Writing custom implementations instead of using installed libraries it can't "see"
Forgetting to update API calls across files that aren't in its current context
Creating backward compatibility when none is needed
Leaving unused code that accumulates over time
What makes this particularly frustrating is that the AI often has the knowledge you need - it knows about the frameworks, the libraries, the best practices. But knowledge isn't the same as awareness.
When you point out that it should use a specific library or follow a particular pattern, it readily agrees: "Oh yes, you're right, I should use that." The information was there all along; it just wasn't in the carefully curated context window you provided. The AI can only work with what it can see, and managing what it sees becomes a full-time job in itself.
If there's one clear lesson from AI-assisted development, it's this: comprehensive testing and static analysis are no longer optional, they're critical from day one.
Small teams of senior humans can often get away with delaying pre-commit hooks, linters, and full CI/CD pipelines. With LLMs, you need all of that infrastructure immediately.
You need to go out of your way to implement time- and resource-intensive safeguards that you might otherwise not have needed for years, just to catch AI mistakes before they hit the repository.
The AI will fail checks 90% of the time, so you can't work the way humans do (commit, move on to the next branch, fix any failures later). You need immediate feedback loops.
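To give an idea of what that immediate feedback can look like, here is a minimal sketch of a fail-fast gate to run before every AI-generated commit; the tool choices (ruff, mypy, pytest) are only examples, so substitute whatever linters, type checkers, and test runners your project uses.

```python
#!/usr/bin/env python3
# Fail-fast local gate: run every check before an AI-generated commit lands.
# The specific tools below are examples, not requirements.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # lint / static analysis
    ["mypy", "."],           # static type check
    ["pytest", "-q"],        # full test suite
]

def main() -> int:
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print("Check failed - do not commit.")
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```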
This is often called the agentic loop: the AI breaks something or botches an implementation, your tests catch it, and the AI gets a chance to fix it. But if it breaks something you weren't testing for, or you aren't enforcing thorough testing in the first place, the failure goes unnoticed by the agent and its limited focus.
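A minimal sketch of that loop, assuming pytest as the oracle and with `ask_model_to_fix()` as a hypothetical placeholder for whatever LLM API or coding agent you actually use:

```python
# Agentic loop sketch: the model proposes code, the test suite judges it,
# and failures are fed back for another attempt. Anything the tests don't
# cover is invisible to this loop.
import subprocess

MAX_ATTEMPTS = 5

def run_tests() -> tuple[bool, str]:
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def ask_model_to_fix(failure_output: str) -> None:
    # Placeholder: send the failure output to your LLM and let it edit
    # the working tree. Details depend entirely on your tooling.
    raise NotImplementedError

if __name__ == "__main__":
    for attempt in range(MAX_ATTEMPTS):
        passed, output = run_tests()
        if passed:
            break
        ask_model_to_fix(output)
```

The loop only ever sees what the tests report, which is exactly why the testing infrastructure has to come first.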
Despite the challenges, there are scenarios where AI coding assistants excel:
Technologies you don't know well: With unfamiliar languages the LLM can help with syntax and logic while you focus on reviewing the approach and business requirements.
Boilerplate and scaffolding: Creating initial project structures, basic CRUD operations, and standard patterns.
Code review: Surprisingly, LLMs are often better at reviewing code than writing it. They'll identify issues in code they just wrote if you ask them to review it.
Documentation and tests: AI can generate documentation and tests from day one, even if the tests need cleanup.
Domain-driven design with microservices helps manage AI coding by reducing context window requirements. Smaller components doing fewer things are easier for LLMs to handle correctly.
However, you still face the core problem: most software costs are in architecting a solution and adapting it over time, not writing code.
At their core, LLMs are still just “a very enthusiastic librarian with no memory”: they cannot “think about the big picture” or plan for the future. They can be useful for rubber-ducking, but be careful not to outsource the thinking to the LLM; the results will be sub-par.
Microservices are easier to handle for LLMs but the human should still come up with a sound architecture for the whole project!
And when you need to make changes to a component that has already been deployed to production, whether it's the API, business logic, or state management, that's another place where AI assistants struggle.
The rapid evolution of AI models creates another challenge. Something that worked well three months ago might not work today because system prompts and model behaviors change with each release. You're not in control of the tooling.
Some developers are exploring alternatives like the "Shitty Coding Assistant" which uses minimal system prompts and lets you define the workflow. This approach flips the script: instead of adapting your workflow to fit the AI tool's assumptions, you adapt the agent to fit your workflow.
The promise of autonomous AI development remains unfulfilled for anything beyond simple projects. You need senior developers to manage the AI effectively, which somewhat defeats the purpose of using AI to reduce development costs.
This level of supervision feels familiar to any senior developer who's mentored juniors. You come up with the tasks, review their code, point out mistakes, explain best practices, and guide them toward better solutions. The difference? A junior developer learns from this process.
If you tell a junior developer three times not to mock the entire database in unit tests because it is better to verify queries using a lightweight in-memory db, they'll remember. They'll internalize the lesson and apply it to future work. They'll grow from needing constant oversight to working independently.
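For what that practice looks like concretely, here is a minimal sketch using SQLite's in-memory mode; the schema and the query under test are hypothetical.

```python
# Instead of mocking the whole database layer, run the query against a
# throwaway in-memory SQLite database so the SQL itself gets verified.
import sqlite3

def count_active_users(conn: sqlite3.Connection) -> int:
    # Hypothetical query under test.
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE active = 1"
    ).fetchone()[0]

def test_count_active_users():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
    conn.executemany("INSERT INTO users (active) VALUES (?)", [(1,), (1,), (0,)])
    assert count_active_users(conn) == 2
```

The deeper point, though, is that the junior remembers this lesson for next time; the AI does not.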
This fundamentally changes the value proposition. Training a junior developer is an investment with compounding returns. Training an AI is a recurring cost that never decreases.
AI coding assistants are powerful tools, but they're tools that require expertise to use effectively. They're not replacing developers; they're changing what developers need to focus on. Less time writing boilerplate, more time on architecture, testing infrastructure, code review, and understanding and customizing the tools themselves.
That last point bears emphasis: as models change every few months, system prompts evolve, and new tools emerge, developers need to invest significant time staying current. You're not just learning to code anymore; you're learning to prompt effectively, customize workflows, manage context windows, and adapt to breaking changes in your AI tooling.
The developers who will thrive are those who:
Understand when to use AI and when to write code themselves
Build robust testing and CI/CD from day one
Can effectively review and correct AI-generated code
Maintain control over their development workflow
Stay current with rapidly evolving AI tools and techniques
And in order to do all of this, you still, at the very least, have to know how to code.
The "vibe coding" era is teaching us that good software engineering practices aren't optional overhead. They're the foundation that makes AI assistance viable at all.
Have you experienced similar challenges with AI coding assistants? We'd love to hear about your experiences and strategies for making these tools work in production environments.