On May 23, the Wall Street Journal ran an interview that read less like a warning and more like a confession. Two of the people behind Pi, the agent harness at the core of the popular OpenClaw framework, turned around and warned the industry about the thing they helped build. Mario Zechner and Armin Ronacher gave us a name for what is now filling our repositories: vibe slop. The name welds two older ideas together: "vibe coding," Andrej Karpathy's term for building software by describing it to a chatbot in plain English, and "AI slop," the low-value machine output that has saturated every content platform you use. Put them together and you get code that demos fine and falls apart the moment real users prod it.
This is not coming from the usual AI skeptics. Zechner has been writing code for thirty years and created libGDX. Ronacher built Flask and Jinja and spent a decade on open-source infrastructure at Sentry. These are the people who shipped the agentic tooling that everyone is now pointing at their own codebases. When the engineers who built the engine start warning about it, it is worth slowing down to listen.
My take: the scary part is not that AI can write a bad function. We have always been able to write bad functions on our own. The scary part is the speed, the confidence, and the fact that nobody on the team can explain what just got merged.
Intro
The warning lands differently because of who is making it. Ronacher says he spoke with more than thirty teams and saw code quality measurably decline where vibe coding had taken over, and not on weekend toy apps. On serious projects. His own essay, "Some Things Just Take Time," makes the argument plainly: a lot of the friction we are racing to automate away was doing real work. Reviews, design, infrastructure decisions, the cooling-off period before you commit something you do not fully understand. We have decided that friction is the enemy. Sometimes the friction is the point.
So this is not really a story about AI writing bad code. AI writes whatever the prompt and the odds produce. It is a story about teams shipping that output straight into production, skipping the steps that used to stand between an idea and a release. The model got faster. The discipline got optional.
Root cause
At the narrowest level, the root cause is obvious: people are shipping code they cannot evaluate. But "people are lazy" is a lazy diagnosis. The deeper cause is that the tooling finally made it possible to skip every quality gate and still get something that compiles, runs, and looks done. Compiling was never the hard part. Knowing whether the thing is correct, safe, and maintainable was the hard part, and that is exactly the part the prompt-and-ship workflow removes.
Here is the uncomfortable bit. AI did not cancel the skills a team needs to operate a complex system. Those skills are still the job: designing the system so it holds together, reviewing changes against intent, writing tests that mean something, keeping someone around who understands the whole thing and not just the diff. None of that got repealed. It got skipped, because the tool let it be skipped and the calendar rewarded the skip. We celebrated autonomous development as if the autonomy were the achievement. The achievement was always whether the result survives contact with reality.
Zechner's sharpest point is about money, and it is the one I would put in front of a CFO. Bad code does not just break more often. It runs worse. It burns more compute, more memory, more bandwidth. He thinks the rising cloud bill is the thing that eventually forces the reckoning, especially for startups that vibe-coded their way to a product and now cannot afford to run it. The industry can keep playing this game "for a couple more months, or maybe even years," in his words. Then it catches up.
Lessons learned
First, treat AI output the way you treat a junior engineer's first pull request. Fast, occasionally brilliant, and not to be trusted unsupervised. The model is a very confident junior who never says "I am not sure." That confidence is the problem, not the help.
Second, speed is a loan. The grey-literature research on this calls it a "speed-quality trade-off paradox": teams feel an instant rush of progress, then most of them describe the result as fast but flawed. The same work identifies a new class of developer, one who can build a product but cannot debug it when it breaks. If you cannot fix it, you do not own a product. You own a liability with a nice UI.
Third, the cost does not disappear, it moves. The academic study "An Endless Stream of AI Slop" put numbers to what reviewers already feel: every AI-generated change that skips real review pushes its cost onto the maintainers, the reviewers, and the next person who has to touch the file. GitHub has already added policies and features to fight slop in its own repositories. When the world's main code host starts building defenses against its own users' output, the problem is not theoretical anymore.
Fourth, this is bigger than software. The same dynamic showed up in research this spring under the name "hypothesis slop." A Forbes piece reported that submissions to a major management journal jumped 42% after ChatGPT arrived, with the AI-heavy manuscripts harder to read and more likely to be rejected. Nature reported that preprint servers and conference organizers are fighting a wave of fully AI-generated papers, some with outright fabrications. Different field, same failure: production without comprehension.
But the tools keep getting better
The honest counterargument deserves space, because I half believe it. Models improve every month. Today's slop is partly tomorrow's solved problem. Vibe coding has genuinely let non-engineers build things that used to need a team, and that is not nothing. Some of the loudest objections really are gatekeeping dressed up as concern.
But notice that the people raising the alarm here are not gatekeepers. They are the builders. Zechner even credits Anthropic for dogfooding Claude Code on its own systems, then turns around and says he does not like the tool. The complaint is not "AI is bad at coding." It is that the workflow removes the human who understands the system, and a better model that removes that human faster just produces more slop faster. Capability is not the bottleneck. Judgment is. Better autocomplete does not create judgment. It just makes the absence of judgment cheaper to act on.
My read: the model getting smarter does not save you if your process is built to ship things nobody read. It just means the things nobody read are longer and more plausible.
What's next?
Here is where I have to be honest about my own seat. I work in professional services. The failed AI project coming back through the door is not a thought experiment for me, it is a line item. We are already getting the calls. A product that flew through its first six months, then started falling over, and now nobody on the original team can say why, because the original team prompted it into existence and moved on. No tests. No docs. No design intent anyone can reconstruct. Just a working demo that stopped working.
The cruel part is the timing. By the time these projects reach a shop like mine, the cheap fixes are gone. Catching slop early means a refactor and some tests. Catching it late means a rewrite, which is slow and expensive, the exact opposite of the speed the whole approach promised. The bill arrives, and it arrives with interest. For context, the accumulated technical debt principal in the US software economy was already estimated at around $1.52 trillion by CISQ back in 2022, before any of this. AI gives us a machine for generating new debt at inference speed.
So if you lead engineering, the next move is not a vague call for "better quality." It is specific:
- Make "can you explain this code" a merge gate, not just "does it pass CI." If the author cannot walk through it, it does not ship.
- Require tests written by a human who understands the requirement, not tests generated alongside the code by the same model that wrote the bug.
- Track cloud cost per feature. A sudden jump is a quality signal, not just a finance problem.
- Keep at least one person who can read the whole system, not only the diff in front of them.
- Use the AI. Just review its work like you would review anyone else's, because that is what it is: someone else's work.
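The cost-per-feature item in the list above is the easiest one to automate. Here is a minimal sketch, assuming you can already export a weekly cost series per feature from your billing data. The function name `flag_cost_jumps`, the 1.5x threshold, the four-week baseline, and the sample numbers are all hypothetical choices for illustration, not a standard.

```python
from statistics import median


def flag_cost_jumps(costs_by_feature, threshold=1.5, baseline_window=4):
    """Return {feature: ratio} where the latest cost exceeds
    threshold x the median of the preceding baseline_window periods."""
    flagged = {}
    for feature, series in costs_by_feature.items():
        if len(series) < 2:
            continue  # not enough history to compare against
        # Baseline: up to baseline_window periods before the latest one.
        history = series[-(baseline_window + 1):-1]
        baseline = median(history)
        latest = series[-1]
        if baseline > 0 and latest > threshold * baseline:
            flagged[feature] = round(latest / baseline, 2)
    return flagged


# Hypothetical weekly cloud cost per feature, in dollars.
weekly_costs = {
    "search": [120.0, 118.0, 125.0, 122.0, 310.0],
    "checkout": [80.0, 82.0, 79.0, 81.0, 84.0],
}

print(flag_cost_jumps(weekly_costs))  # {'search': 2.56}
```

A flag here is a prompt for a human to read the change that shipped that week, not a verdict. Traffic growth can cause the same jump, which is why the signal belongs in review and not only in finance.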
Autonomous development was a great demo. Treating the demo as the destination is the mistake. The skills never went away. We just stopped paying for them, and the invoice is going to land on somebody's desk. From where I sit, it usually lands on ours, and it usually lands late.
References
- "The AI Superstars Who Say a 'Vibe Slop' Crisis Is Coming," The Wall Street Journal, May 23, 2026.
- Armin Ronacher, "Some Things Just Take Time," lucumr.pocoo.org, March 2026.
- Baltes, Cheong, Treude, "An Endless Stream of AI Slop: The Growing Burden of AI-Assisted Software Development," arXiv:2603.27249, March 2026.
- "Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook — a Grey Literature Review," arXiv:2510.00328.