It's now the mid-2020s, 2025 to be specific, and the latest AI fashion is large language models (LLMs) that are now so convoluted – and being fed so much data, so carefully curated, while running on vast arrays of processors that consume vast amounts of electricity and coolant water, enough to cause significant environmental harms – that they can actually do quite a good job of selecting, from among the things others have said, the ones that match a given prompt – even mixing and matching them in ways that sometimes prove genuinely useful, if you are clever enough about constructing a good prompt – and of making up really convincing bullshit (which can be fun, but gets to be a problem because folk can't see the boundary between this and the stuff parroted from others, which might have some chance of not all being bullshit).
In particular, by throwing insanely large amounts of version-control history of how code has changed (and thus, incidentally, of code) at them, it's possible to get them to help folk write code. Which is lovely, and I'm happy to hear it. I'm way too close to retirement (and going off to do something completely different) for it to be worth investing significant amounts of my time in learning how to prompt them, or for me to need to worry about them taking my job – indeed, if one can actually replace me, that'll be great, because my colleagues are already worrying about how they're going to do that in a few years' time, when I do retire.
However, as ever in the capitalist system, there are rich people rubbing their hands with glee at the thought that they can use this to replace programmers – or at least, by claiming they could, bully programmers into putting up with lower pay or worse working conditions (which would just make us less productive, but the idiots can't see that for all the pennies they (probably wrongly) imagine it'd save them). In this they are wildly mistaken, if only because actual programmers are – of necessity – not entirely stupid and can see that, at the very least, they'll still need someone skilled at prompting, and someone to review the resulting slop.
Furthermore – as even one of those proponents of AI (Sam Altman, footnote 2) cited in this lovely article (by Mike Judge) said – the world wants (arguably even needs) a hundred, maybe a thousand, times more software, so the mere factor-of-ten increase in programmer productivity he used to claim – which the article gives compelling reasons to be skeptical about – is still way short of what we need to make up for the shortage of people with the right kind of gently demented mind to actually make good programmers.
Lest anyone suspect I might think otherwise, let me just take a moment to be quite clear that that same mind-set has its down-sides, too, and that there are other mind-sets the world needs plenty of for other purposes – it's not that programmers are in any particular sense better people or anything, just one of the various types of people, one for which the world currently has lots of demand. It takes, as the song says, many kinds of people to make the world go round, and the programmer type is just one of those many.
That article also includes some hard data on the author's own practical experience of how using AI on projects affected his productivity in practice. He reports (and this is a sensible measure) the ratio of how long each job actually took to how long he'd estimated it would take to do himself – estimated before he'd decided, by coin-toss, whether to vibe-code it instead – classified by which way the coin came down when he tossed it. He's done his own statistical analysis of that, and I recommend you read the article for the details, but my own eye-balling of his data says it looks like data from a negative exponential distribution (and post-hoc thinking about it does indeed have me thinking we should expect that), so the usual statistical analysis (which he most likely used, unless he knows statistics better than most; it assumes the normal distribution) isn't necessarily apt. Rather, I'd say the AI increases the scale – the width – of the distribution, and that's bad, because it not only increases the mean but also makes it harder to plan because it's noisier (i.e. it also increases the variance). Doing regular statistical analysis on the logarithms of the ratios would likely be more instructive than doing it on the raw ratios, but the general implications work out much the same.
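For anyone who fancies playing with that idea, here's a minimal sketch (in Python; every number in it is made up for illustration – the real data is in Mike's article) of the sort of analysis I have in mind, assuming the ratios really do follow a negative exponential distribution and that AI use stretches its scale:

    # Toy simulation of the coin-toss experiment described above. The 30
    # tasks per arm and the scales of 1.0 vs 1.2 are illustrative guesses,
    # not Mike Judge's data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n = 30  # tasks per arm of the experiment (arbitrary)

    # Model each task's (actual time) / (estimated time) ratio as negative
    # exponential, with AI use stretching the scale of the distribution.
    by_hand = rng.exponential(scale=1.0, size=n)  # coin said: do it yourself
    with_ai = rng.exponential(scale=1.2, size=n)  # coin said: vibe-code it

    # For an exponential distribution the mean and the standard deviation
    # both equal the scale, so stretching the scale makes the work slower
    # on average *and* noisier - the point made in the text above.
    for label, data in (("by hand", by_hand), ("with AI", with_ai)):
        print(f"{label}: mean={data.mean():.2f} sd={data.std(ddof=1):.2f}")

    # The usual t-test assumes roughly normal data; exponential ratios are
    # strongly skewed, so its p-value here deserves some suspicion ...
    raw = stats.ttest_ind(with_ai, by_hand)
    # ... whereas taking logarithms first tames the skew considerably.
    logged = stats.ttest_ind(np.log(with_ai), np.log(by_hand))
    print(f"raw ratios: p={raw.pvalue:.3f}; log ratios: p={logged.pvalue:.3f}")

Run it with a few different seeds and you'll see how noisy samples of this size are, which is rather the point.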
Here are some of my thoughts provoked by reading the Hacker News discussion of this:
The first 90% of the job takes the first 90% of the time – and the rest takes the other 90% of the time. That more or less reflects the fact that folk's estimates of how much work a job involves are accurate for the easy bulk of the work, but wildly wrong for the tricky and fiddly parts that only come to their attention while they're doing that easy bulk. LLMs have, to borrow a phrase from Spinal Tap, turned that up to eleven: as one commenter said, "it is so fast to get to a working prototype that it feels like you are almost there, but there is a lot of effort to go." The earlier revolution of writing a "minimal viable product" (MVP), that it's possible to test and on which users might even give feedback, and then evolving it from there (always expanding the tests and listening to new feedback), gave those practicing it a genuine improvement in effectiveness (because their goals were better aligned with those of their users) and gave something that felt like it was "most of the way there" in only a fraction of the time that getting it to a good and finished state took. It seems to me that LLMs have just cranked that up to eleven: you feel like you're "most of the way there" even sooner, but may actually have more work to do overall. (In contrast, I'll allow that the MVP approach may well have saved time and effort overall.)
That feeling of being "most of the way there" did lead to one problem the MVP approach hit, summarised by another commenter: "People made tools specifically to make templates look like templates and not a finished product, just because client was ready to push the product just after seeing real looking mockups of interface." Similarly, I remember once working on a product which was shipped in many different configurations (on different platforms to the general public, and to different paying customers) and made (as most modern software does) moderately extensive use of third-party stuff, in so far as its licenses allowed. Our help system included a "thank you" and credits page with the full list of all the third parties, along with their licenses (as some of these required – but we were always grateful, so did also include those whose licenses didn't require it). Which things appeared on that page did, of course, depend on the configuration – and I'd been the one who brought order to the chaos of the code that used to sort that out, so I'm the one someone in legal came to in response to a ticket I'd filed saying we should get legal to review all the wordings. Naturally, he'd just opened the relevant page on his Mac (oblivious to all the licenses not on it, that were on our other versions of that page), imported it into Word, hacked it around into a form he liked and sent it to me, expecting me to (literally) paste it into the software and have the result be used. Fortunately he did have enough intelligence that, once I'd sat him down and walked him through the complexity of the real situation, he wised up and we were able to work productively on making it all better. However, the shock of his initial blasé expectation did wake me up to how the rest of the world lives. Those whose tools offer no other perspective than "what you see is all you've got" can have a hard time understanding just how much complexity lurks beneath the surface of what they simply see. I have no doubt LLMs shall turn out to be amplifying this problem, too.
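To make concrete just how much can lurk beneath one such page, here's a toy sketch (in Python, with component names, licences and configuration names all invented – this is not the real product's code) of the sort of machinery that sat behind it, each configuration pulling in its own subset of third-party pieces and thus getting its own credits page:

    # Hypothetical miniature of a configuration-dependent credits page.
    # All names and licences below are invented for illustration.
    THIRD_PARTY = {
        "libfoo": "BSD-3-Clause",
        "barlib": "MIT",
        "quuxkit": "LGPL-2.1",
    }
    CONFIGURATIONS = {
        "mac-retail": ["libfoo", "barlib"],
        "win-retail": ["libfoo", "quuxkit"],
        "oem-acme": ["libfoo", "barlib", "quuxkit"],
    }

    def credits_page(config: str) -> list[str]:
        """Every third party shipped in this configuration, with its
        licence; we credit them all, whether the licence requires it or not."""
        return [f"{name} ({THIRD_PARTY[name]})"
                for name in sorted(CONFIGURATIONS[config])]

    # Legal's Mac only ever showed him credits_page("mac-retail"); the
    # pages for all the other configurations simply weren't visible there.
    for config in CONFIGURATIONS:
        print(config, "->", credits_page(config))

Multiply that by realistic numbers of components, platforms and paying customers, and the one page he saw on his Mac was a very small window indeed.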
"Engineers, per usual, are pointing out the problem before it becomes one and no one is listening." The engineering mind-set calls for ruthless honesty – because, without it, you get things wrong and there shall be pain later – and can therefore see that the hype is hiding important issues. However, managers are political creatures, for whom truth is malleable: it's all about who believes what, which can be manipulated to advantage. So they interpret the engineers as if they, too, were playing that game, and fail to understand that the engineers are talking about the non-malleable truth of how stuff that doesn't care what lies you tell is going to behave regardless. Which leads to pain for all parties, because the ones with the power don't understand the ones who know the truth.
"If you sat down in front of a 15 yo computer, or tried to solve a technical challenge with the tooling of 15-10 years ago, I don't think you'd get a significantly worse result." I am literally writing this on a computer of that antiquity, using the same tools I've been using for the whole of that time (albeit an updated version of my editor, though that's frankly causing me more pain than it helps: I endured the updates because Linux updates, needed to keep me on a maintained and hence secure version, obliged me to) and, at work, pretty much the same is true, although the machine I mostly work on there is 10 years old, so at the newer end of the commenter's range. The power of the machine or its tools isn't what limits me: my own cognitive capabilities are my rate-limiting step. Although, to be fair, at work I do now make a fair bit of use of git revise, which either wasn't available or I didn't know about until recently, and that has genuinely improved my productivity.
When folk sing the praises of "AI" (by which they mean LLMs), ask them how repetitive, or boring, their work was before they made the switch.
"…benefits." Gotta love The Reg – their ingrained moderate cynicism is such a relief from the rest of the world's hype-or-hate extremism.
"ignores the fact that this mere pattern patching (sic.) text machine is doing things people said were impossible a few years ago" and "(remember when people said LLM's could never do math, or that image models could never get hands or text right?)" Well, there may well have been folk who saw that LLMs couldn't do math, and other stuff, and extrapolated that they never would, or that those tasks would always be impossible for LLMs, but my guess is that those weren't the smart geeks on Hacker News who have reservations about how much we can realistically expect from LLMs. Those opposed to a thing are not one homogeneous mass. There are smart geeks who object, for typically well thought-out reasons, to things that other folk object to on the most comically specious of grounds – grounds that make the geek objectors roll their eyes just as much as proponents do. Much as there are smart geek proponents, for well thought-out reasons, of those same things, and others whose reasons make the geek proponents cry and bang their heads on desks in frustration at how willing folk are to open their mouths and let out (usually poorly-articulated) opinions that reveal an utter failure to think.

Geeks have long known that pattern-matching is mighty powerful – you'd be amazed at what it can do for you – and surely there are ways to make it better whenever it runs up against limitations. And Lo ! LLMs have limitations, some of which shall soon be overcome, others of which shall remain frustratingly stubborn. We're not sure which those are going to be, but the fact is that all they're doing is looking at texts (enormous corpuses of them, pre-curated by humans, often specifically to include annotations that help LLMs dodge the dumb mistakes to which they're susceptible) without first-hand real-world experience of anything but the text (albeit the text tells enough to give a pretty good account of the real world, if you but understand how to read it as such) and spotting patterns in those texts. That does turn out to deliver a lot (as long as you chuck enough text at them, that you've curated enough, and can afford to throw a vast amount of computing power, and enough electrical power to be a major environmental hazard, at it, while using enough water to keep the processors cool, again causing a major environmental hazard), but some of us can't help suspecting that the lack of practical experience of interacting with the material world is going to be a big problem. Personally I do think there's potential for a robot / AI hybrid to get a lot more interesting, but I'll wait until I see the evidence before I believe any computer system is genuinely intelligent, as opposed to merely extremely clever.
The "AI" hype is glossing over just how hard it really is to learn to use its shiny new toys – and he turned out to be actually nowhere near as good as the experience left him feeling he was. His prior enthusiasm for vibe-coding, and his conviction that it was really making him more productive, leave little scope for accusing him of being scared of "AI" stealing his job, or of being biased against it for other reasons. He actually relished the prospect of using it to do his job better, thought it was making him more productive, and is disappointed to see it wasn't, even though it seemed at the time like it did.
And I think that's enough reading of the comments for me. Partly because the arguments are showing up repeatedly, but partly because I just know that, if I keep going long enough, I'll run into side-threads so far off topic and so devoid of any semblance of sanity as to constitute a Lovecraftian horror in their own right. Internet discussion threads, how I love thee not. Besides, it's getting into the evening and I have more important things to do, like reading cartoons and drinking.
A major part of the story that tends to get missed is that most software geeks who are any damn good at their jobs would positively love to see something that deserves the name AI, precisely because it'd thin the herd – the world is so desperate for software talent that some pretty useless idiots do get hired to do technical work, only to disappoint anyone competent they might be working with. Those would be the first to go (at least from any employer worth working for) if there were a drop in demand for programmers, and better tools would enable those remaining to take up the slack in the world's continuing desperate need for significantly more than ten times as many as are presently available.
If anyone's thinking that basic economic theory claims this situation should lead any market even remotely close to free to offer big incentives to programmers: well, yes, the theory does say that, and to some degree the market has done that (which has contributed to talentless fools with delusions of skilz leaping in to claim a slice of the pie and making the situation worse by being bad at it), but according to the theory the market is still out of balance, so why hasn't it corrected itself further ? Which is basically why a lot of competent geeks are profoundly skeptical about contemporary economic theories, or about claims that we live in the free markets those theories presume and (at least some of) our governments claim to provide.
The programmers who would welcome the AI revolution are typically the ones
who are contradicting the current AI
hype about LLMs, because we're
engineers who care about what a thing actually does – not about
how cool it feels to use it or how pretty or shiny it is – and understand
the significance of short-comings others might dismiss as unimportant or just
plain ignore. We know how easy it is to fool yourself, which is why – as
Mike Judge did and reported in his article – we actually test our beliefs
to find out whether we're fooling ourselves, and change our minds
(again, as Mike did) when we discover we have been. This Is Just Science.
We deny that LLMs are as awesome as claimed – while being very clear about what they are good at and for: much of that really is useful and good, many of my peers are using them for it, and we're happy to have it and acknowledge that value, but it's not what the hype claims – and we do so not out of sour grapes, or fear that we'll lose our jobs, but because they're being oversold, and that's causing various kinds of harm, which will only stop when folk wake up, recognise their limitations, use them for what they're actually good for (there's plenty of that) and stop fooling themselves that they're good for more than they are. Wishful thinking leads to disasters.
I'm left wondering whether anyone has done similar experiments for caffeine use. I realise the claims in favour of that are nowhere near as heavily hyped, but it might be interesting to see some hard experimental data.