some-guy
culopatin
Today a friend of mine connected me with his uncle, who wanted to develop an MVP for his company. He claimed he didn’t want to distract his engineers with this project, but that “it shouldn’t take you more than 5hs to do this with vibe coding”. I promptly declined: if it takes 5 hours, why are you reaching out to me? It would take more than 5 hours just to bring me up to speed on what you want versus your own engineers. If vibe coding is that good, you might as well DIY while you’re showering!
noduerme
These kinds of "clients" have always been around. If they want to tell me how long they think it should take, my answer has always been that they should do it themselves. If they say something like, "it shouldn't take too long," I say "send me what you need it to do." Then I look at that and ask "what if A, B or C happens? What if a user does X?" I make a list about 15 bullet points long of edge cases they hadn't thought of, showing the flaws in their business logic. (This, btw, is what I'd do if I were instructing an LLM as well.) Then it's, "well, how long will that take?"
And my answer - like the best of car mechanics who work on custom rides - is: I don't know how long until I actually get in there, but minimum 5x what you think, and my rate is $300/hr. Is it worth it to you to do it right?
Usually the answer is no. When it's yes, I have a free hand. And having a few clients who pay well is worth a lot more than having a few dozen who think they know everything and are too cheap to pay for it anyway.
Wowfunhappy
I have fully vibe coded several apps at this point (none professionally, but stuff I actively use in my life). One thing I don't think everyone understands is that it still takes time. I need to do countless rounds of testing, describing the exact problem I find, rinse and repeat again and again for hours and hours. I happen to enjoy this way of working and I do think it's faster than writing the code myself, but it's not fast.
patapong
Yep agreed... It's a bit like sifting gold, most of what it produces is crap, but if you stay with it long enough and put in enough effort you will eventually have something that works well.
Part of the issue is that it is so fast to get to a working prototype that it feels like you are almost there, but there is a lot of effort to go.
some-guy
Not only that, but the requirements are far more lax when it's your own project. In an enterprise setting on a 14-year-old codebase (where this 80%-reduced-timeframe project lies), vibe coding doesn't work at all! PMs and managers simply do not understand the nuances of these tools.
dangus
I find that the main way it saves time is in situations where yes I know how to write code but I am looking to write something in a domain I’m less familiar with.
And regex.
duxup
If it's 5hs, why not distract his own engineers?
That's pretty much nothing ... to me that line indicates a whole lot of other possible things.
gt0
[flagged]
DaiPlusPlus
I'm sure they justify their strong opinions on the grounds that ChatGPT, no less, had individually and specifically told each and every one of them that they're the single leading expert in immunology in the world.
DrillShopper
Expertise is second to results - without results, expertise doesn't matter. I'm sure there are a lot of homeopathic "healers" who feel they have a lot of expertise, but the results are sorely lacking.
Ship more or STFU
materielle
My opinion is that AI isn’t actually the root of the problem here.
It’s that we are heading towards a big recession.
As in all recessions, people come up with all sorts of reasons why everything is fine until it can’t be denied anymore. This time, AI was a useful narrative to have lying around.
osigurdson
I think a kind of AI complacency has set in. Companies are just in chill mode right now, laying off people here and there while waiting for AI to get good enough to do actual work.
baselessness
Everyone is bracing for a labor supply shock. It will move in the direction opposite what investors expect.
2030 will be 2020 all over again.
aprilthird2021
First thing I thought of was Benioff saying he cut thousands of customer support roles because AI can do it better, then turning around and giving a lackluster earnings report with revised-down guidance, and the stock tanks.
some-guy
I have never, ever seen SVPs, CEOs, and PMs completely misunderstand a technology before. And I agree with you, I think it's more of an excuse to trim fat--actual productivity is unlikely to go up (it hasn't at our Fortune 500 company)
osigurdson
>> productivity is unlikely to go up
I wonder how that would even be measured? I suppose you could do it for roles that do the same type of work every day, i.e. perhaps there is some statistical relevance to the number of calls taken in a call center per day, or something like that. On the software development side, however, productivity metrics are very hard to quantify. Of course, you can make a dashboard look however you want, but it's essentially impossible to tie those metrics to NPV.
jrochkind1
> I have never, ever seen SVPs, CEOs, and PMs completely misunderstand a technology before.
I'm legit not sure if that's sarcasm or not
isodev
> we are heading towards a big recession
Who is we? One country heading into a recession is hardly enough to nudge the trend of "all things code"
viridian
The last US recession that didn't also pull in the rest of the western world was in 1982, over 40 years ago. Western Europe, Aus, NZ, Canada, and the US all largely rise and sink on the same tides, with differences measured in degrees.
danaris
Enough of the tech industry is America-based that a US recession is enough to do much more than nudge the trend of "all things code". Much as I would prefer that it were not so.
colechristensen
America's recessions are global recessions.
dragonwriter
If that “one country” is the US and not, say, Burkina Faso, it is a major impact on financing, and software has an unusually high share of positions dependent on speculative investment for future return rather than directly related to current operations.
rootusrootus
> My manager told me that the time to deliver my latest project was cut to 20% of the original estimate
That's insane. Who the hell pulls a number out of their ass and declares it the new reality? When it doesn't happen, he'll pin the blame on you, but everyone else above will pin the blame on him. He's the one who will get fired.
Laying off unnecessary developers is the answer if LLMs turn out to make us all so much more productive (assuming we don't just increase the amount of software written instead). But that happens after successful implementation of LLMs into the development process, not in advance.
Starting to think I should do the inadvisable and move my investments far far away from the S&P 500 and into something that will survive the hype crash that can't be too far off now.
torginus
The whole 'startup/scaleup' culture (which has produced the industry titans of today) is insane, and has been insane for as long as it has been a thing: the culture of either 'just grow and figure out how to monetize' (social media, food delivery, etc.), or 'we're selling you this technology that doesn't exist yet but is going to be insanely valuable in the future' (AGI, self-driving), or 'we're selling shovels to the first two' (cloud providers, Nvidia).
I'd argue that compared to a decade or 15 years ago, relatively little value has been created. If you sat down in front of a 15 yo computer, or tried to solve a technical challenge with the tooling of 15-10 years ago, I don't think you'd get a significantly worse result.
Yet in this time the US has doubled its GDP, most of it accruing to the top, to the tech professionals and financiers who benefited from this.
And some of this money went into assets with constrained supply, such as housing, driving prices up even adjusted for inflation and making average people that much worse off.
While I do feel society is making progress, it's been a slow and steady march, which in relative terms means slowing. If I gave you $10 every week, by week 2 you'd have doubled your wealth; by the end of the year, you'd barely notice the increase.
Technology accumulation is the same, I'd argue it's even worse, since building on top of existing stuff has a cost proportional to the complexity for a fixed unit of gain (and features get proportionally less valuable as you implement the most important ones first).
Sorry, got distracted from my main point: what happens when people stop believing that these improvements are meaningful, or that the technology that was priced in to produce 100x the value will come at all (and, more importantly, that the company you're invested in will be able to capture it)?
dangus
> If you sat down in front of a 15 yo computer, or tried to solve a technical challenge with the tooling of 15-10 years ago, I don't think you'd get a significantly worse result.
While you have decent points in your comment (essentially, the idea of tech industry growth slowing due to low-hanging fruit being picked), if this statement is going to be your barometer, you're going to end up looking stupendously wrong.
You can sit your Grandma down at her computer and have her type in “please make me a website for my sewing club that includes a sign up form and has a pink background” and AI will just do it and it’ll probably even work the first time you run it.
15 years ago tossing a website on Heroku was a pretty new concept, and you definitely had to write it all on your own.
10 years ago Kubernetes had its initial release.
Google Drive and Slack are not even 15 years old.
TensorFlow is just hitting its 10th birthday.
I think you’re vastly underestimating the last 15 years of technological progress and in turn value creation.
queenkjuul
I actually had a lot of fun building a native .NET web frontend in VB 2005 recently lol. I thought it kind of amazing that i could just bind UI controls directly to state objects and the UI would automatically React to any changes i made. Felt very natural as a modern web dev lol. Found a lightweight .NET JSON library that was compatible all the way back to VB 2005 as well.
In case you also need to control Spotify from Windows 95 :D
cortesoft
> That's insane. Who the hell pulls a number out of their ass and declares it the new reality?
Product and Sales?
Ekaros
Why don't we cut sales commissions by 30% and expect double the sales now? Surely LLMs will make them that much more effective, and they'll still make more.
toomuchtodo
VXUS
Not investing advice; the bottom 490 companies in the S&P500 are nominally flat since 2022 and down against inflation, GPUs and AI hype are holding everything together at the moment.
> In simpler terms, 35% of the US stock market is held up by five or six companies buying GPUs. If NVIDIA's growth story stumbles, it will reverberate through the rest of the Magnificent 7, making them rely on their own AI trade stories.
https://www.wheresyoured.at/the-haters-gui/
> Capex spending for AI contributed more to growth in the U.S. economy in the past two quarters than all of consumer spending, says Neil Dutta, head of economic research at Renaissance Macro Research, citing data from the Bureau of Economic Analysis.
https://www.bloodinthemachine.com/p/the-ai-bubble-is-so-big-...
> Two Nvidia customers made up 39% of Nvidia’s revenue in its July quarter, the company revealed in a financial filing on Wednesday, raising concerns about the concentration of the chipmaker’s clientele.
https://www.cnbc.com/2025/08/28/nvidias-top-two-mystery-cust...
cmckn
> He's the one who will get fired.
I wouldn’t count on it.
theandrewbailey
He will fire you before he gets fired.
rootusrootus
In some cases, I'm sure it would play that way. But I've been on both sides, and most places I've worked have been more reluctant to fire engineers than managers.
InsideOutSanta
Here's a new keyboard. I've cut all your estimations by five percent; surely you can type much faster with this.
y1n0
> That's insane. Who the hell pulls a number out of their ass and declares it the new reality?
Chatgpt.
klodolph
I think ChatGPT isn’t the “who”, it’s just the ass that people are pulling numbers out of. A big ole extra butt you graft onto your body.
kunley
Them managers have always been pulling a number out of their ass.
Seattle3503
> My manager told me that the time to deliver my latest project was cut to 20% of the original estimate because we are "an AI-first company".
If we can delegate incident response to automated LLMs too, sure, why not. Let the CEO have his way and pay the reputational price. When it doesn't work, we can revert our git repos to the day LLMs didn't write all the code.
I'm only being 90% facetious.
bitwize
The CEO doesn't care. He'll fail upwards, spin it as increasing shareholder value by cutting costs, and bounce before the chickens come home to roost.
vorpalhex
I agree with you and I'm being 0% facetious.
I think making stakeholders have to engage with these models is the most critical point for people having deadlines or expectations based on them.
Let Claude run incident response for a few weeks. I'll gladly pause pagerduty for myself.
rglover
> My manager told me that the time to deliver my latest project was cut to 20% of the original estimate because we are "an AI-first company".
Lord, forgive them, they know not what they do.
leoc
I think Chuck Prince's "As long as the music is playing, you've got to get up and dance. We're still dancing." from the GFC https://www.reuters.com/article/markets/funds/ex-citi-ceo-de... is the more relevant famous line here.
coffeemug
I haven’t heard this before, this is incredible. Thanks for sharing. There were a bunch of phenomena that didn’t quite make sense to me before, which make perfect sense now that I read the quote.
crabmusket
That quote was the inspiration for one of my favourite bits in the Lehman Trilogy, "the twist". There's a glimpse of it in the trailer here https://youtu.be/Lo4VC43h7ts?si=ebl9WwK2NIgW0sHD&t=49
"Bobby Lehman is ninety three years old and he dances the twist. He is 100 years old! 120! Maybe 140! He dances like a madman!"
bsder
Do not forgive them. We already have a description for them:
"A bunch of mindless jerks who'll be the first against the wall when the revolution comes."
o11c
Remember, the origin of that quote explicitly specifies "marketing department".
The thing about hype cycles (including AI) is that the marketing department manages to convince the purchasers to do their job for them.
atleastoptimal
I think this hits at the heart of why you and so many people on HN hate AI.
You see yourselves as the disenfranchised proletariats of tech, crusading righteously against AI companies and myopic, trend-chasing managers, resentful of their apparent success at replacing your hard-earned skill with an API call.
It’s an emotional argument, born of tribalism. I’d find it easier to believe many claims on this site that AI is all a big scam and such if it weren’t so obvious that this underlies your very motivated reasoning. It is a big mirage of angst that causes people on here to clamor with perfunctory praise around every blog post claiming that AI companies are unprofitable, AI is useless, etc.
Think about why you believe the things you believe. Are you motivated by reason, or resentment?
RichardCA
Ironically, I would far prefer the Douglas Adams idea of "Genuine People Personalities" over the current status quo.
If the self checkout scanner at the supermarket started bickering with me for entering the wrong produce code, that would wrap up the whole Turing Test thing for me.
herpdyderp
Oh, they for sure know what they're doing.
DrillShopper
Counterpoint: fuck them, they know exactly what they do (try to extract more work for the exact same pay out of their subordinates)
xbmcuser
For me this is the biggest disconnect: the current level of AI is not good enough to replace devs, but it is good enough to automate a lot of office work, i.e. managers, that would have cost too much time and effort to automate before. I think Google seems to understand this a bit, as they have replaced a lot of middle management because of AI and not as many developers.
insane_dreamer
also customer service; I was at my dental office today and there were 3 people handling checkin/checkout. I'm quite confident 80% of their workload could be automated away to where you would just need a single person to handle edge cases. That's where we're going to see a lot of entry-level jobs go away, in many domains.
phatskat
> That's where we're going to see a lot of entry-level jobs go away, in many domains.
And to me this is worse news. Losing higher-paying jobs would hurt the economic fabric more, but by that token the people in them would have more power and influence to ensure a better safety net for the inevitable rise of AI and automation in much of the workforce.
Entry level workers can’t afford to not work, they can’t afford to protest or advocate, they can’t afford the future that AI is bringing closer to their doorsteps. Without that safety net, they’ll be struggling and impoverished. And then will everyone in the higher paying positions help, or will we ignore the problem until AI actually is capable of replacing us, and will it be too late by then?
vkou
I'd like to see those SVPs and PMs, or shit, even a line manager use AI to implement something as simple as a 2-month intern project[1] in a week.
---
[1] We generally budget about half an intern's time for finding the coffee machine, learning how to show up to work on time, going on a fun event with the other interns to play minigolf, discovering that unit tests exist, etc, etc.
elevatortrim
I actually built something in under a week (a time tracking tool that helps developers log their time consistently on Jira and Harvest) that most developers in my company now use.
I have a backend development background, so I was able to review the BE code and fix some bugs. But I did not bother learning the Jira and Harvest API specs at all; AI (Cursor + Sonnet 4) figured it all out.
I would not be able to write the front-end of this. It is JS based and updates the UI based on real-time http requests (forgot the name of this technology, the new ajax that is) and I do not have time to learn it but again, I was able to tweak what AI generated and make it work.
Not only did AI help me do something in much less time than it would otherwise have taken, it enabled me to do something that otherwise would not have been possible.
panarchy
I'd rather see those SVPs, PMs, and line managers be turned into AI.
curvaturearth
This is the way
sotix
At my company, the tech leaders aren't doing it out of mass hysteria. They're very smart individuals. The push is coming from our investors that come from the ring of classic YC-affiliated VCs. My friend who runs a YC-backed company has been told to do it by his investors too. It's a coordinated effort by external investors rather than a mass panic by individual tech leaders. If you read VC investor literature, it's full of incredible claims about how companies who don't use AI will be left behind. The exact type of stuff you'd expect to hear from groups who aim to hit the lottery with a few of their investments.
com2kid
Multiple things can be true at the same time:
1. LLMs do not increase general developer productivity by 10x across the board for general purpose tasks selected at random.
2. LLMs dramatically increase productivity for a limited subset of tasks
3. LLMs can be automated to do busy work and although they may take longer in terms of clock time than a human, the work is effectively done in the background.
LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup. If I need to write a small bit of glue code in a language I do not know, LLMs not only save me time, but they make it so I don't have to learn something that I'll likely never use again.
Fixing up existing large code bases? Productivity is at best a wash.
Setting up a scaffolding for a new website? LLMs are amazing at it.
Writing mocks for classes? LLMs know the details of using mock libraries really well and can get it done far faster than I can, especially since writing complex mocks is something I do a couple times a year and completely forget how to do in-between the rare times I am doing it.
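For what it's worth, the mock boilerplate being described is only a few lines once you remember the API. A minimal sketch using Python's `unittest.mock` (the `PaymentClient` and `process_order` names are made up for illustration, not from the thread):

```python
from unittest.mock import MagicMock

# Hypothetical dependency we don't want to hit in tests.
class PaymentClient:
    def charge(self, amount):
        raise RuntimeError("real network call")

def process_order(client, amount):
    # Code under test: charge the client and wrap the receipt.
    receipt = client.charge(amount)
    return {"ok": True, "receipt": receipt}

# Stand in for the real client with a mock that records calls.
mock_client = MagicMock(spec=PaymentClient)
mock_client.charge.return_value = "rcpt-123"

result = process_order(mock_client, 42)
mock_client.charge.assert_called_once_with(42)
```

Passing `spec=PaymentClient` makes the mock reject attribute names the real class doesn't have, so typos fail the same way they would against the real dependency.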
Navigating a new code base? LLMs are ~70% great at this. If you've ever opened up an over-engineered WTF project, just finding where HTTP routes are defined at can be a problem. "Yo, Claude, where are the route endpoints in this project defined at? Where do the dependency injected functions for auth live?"
Right tool, right job. Stop using a hammer on nails.
heavyset_go
> LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup. If I need to write a small bit of glue code in a language I do not know, LLMs not only save me time, but they make it so I don't have to learn something that I'll likely never use again.
I wax and wane on this one.
I've had the same feelings, but too often I've peeked behind the curtain, read the docs, gotten familiar with the external dependencies, and then realized that whatever the LLM responded with either wasn't following convention, tried to shoehorn my problem to fit code examples found online, used features inappropriately, took a long roundabout path to do something that can be done simply, etc.
It can feel like magic until you look too closely at it, and I worry that it'll make me complacent with the feeling of understanding without my actually coming away with one.
SchemaLoad
Yeah LLMs get me _an_ answer far faster than I could find it myself, but it's often not correct. And then I have to verify it myself which was exactly the work I was trying to skip by using the LLM to start with.
If I have to manually verify every answer, I may as well read the docs myself.
emodendroket
Is it really that different from scrolling through Stack Overflow answers and rejecting the ones that aren't suitable? A lot of times you can tell it what specifically you didn't like about the solution and get another crack anyway (e.g., "let's iterate over the characters to do this rather than using a regex")
Onawa
It doesn't completely solve this problem, but having something like the context7 MCP server running, where Copilot et al. can reach it, dramatically reduces hallucinations for most tools. Additionally, I've used the Continue.dev VSCode extension along with manually specified docs and guides that you can selectively inject into your context. Both of those tactics make a huge difference in answer quality.
nicklaf
Personally, I don't trust LLMs to write code for me, generally speaking. That said, as of late I've been very pleased with the whole "shoehorn your problem to fit code examples found online" thing these LLMs do, in the very special case of massaging unix scripts, and where the "code examples found online" part seems to mostly amount to making fairly canonical reference to features documented in man pages that are plastered all over the web and haven't changed much in decades.
For questions that I know should have a straightforward answer, I think it beats searching Stackoverflow. Sure, I'll typically end up having to rewrite most of the script from scratch; however, if I give it a crude starting point of a half-functional script I've already got going, pairing that with very clear instructions on how I'd like it extended is usually enough to get it to write a proof of concept demonstration that contains enough insightful suggestions for me to spend some time reading about features in man pages I hadn't yet thought to use.
The biggest problem maybe is a propensity for these models to stick in every last fancy feature under the sun. It's fun to read about a GNU extension to awk that makes my script a couple lines shorter, but at best I'll take this as an educational aside than something I'd accept at the expense of portability.
culopatin
Before accepting an answer I’ve started asking “is there a simpler, more straightforward way of achieving that?” And most of the time it changes the whole thing it just wrote lol
mvdtnz
> LLMs can be automated to do busy work and although they may take longer in terms of clock time than a human, the work is effectively done in the background.
What is this supposed busy work that can be done in the background unsupervised?
I think it's about time for the AI pushers to be absolutely clear about the actual specific tasks they are having success with. We're all getting a bit tired of the vagueness and hand waving.
Kiro
No, you've got it backwards. If anything, people are getting tired of comments like yours.
undefined
iainctduncan
HN votes say otherwise, lol
dbalatero
Nope!
thegrim33
"LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup"
Just a random personal anecdote I wanted to throw out. I recently had to build some custom UI with Qt. I hadn't worked with Qt in a decade and barely remembered it. Seems like a perfect use case for AI to get me "up to speed" on the library, right? It's an incredibly well documented library with lots written on it, perfect fodder for an AI to process.
So, I gave it a good description of the widget I was trying to make, what I needed it to look like and how it should behave, and behold, it spit out the specific widget subclass I should use and how I should be overriding certain methods to customize behavior. Wow, it worked exactly like promised.
So I implemented it like it suggested and was seemingly happy with the results. Went on with working on other parts of the project, dealing with Qt more and more here and there, gaining more and more experience with Qt over time.
A month or two later, after gaining more experience, I looked back at what AI had told me was the right approach on that widget and realized it was completely messed up. It had me subclassing the completely wrong type of widget. I didn't need to override methods and write code to force it to behave the way I wanted. I could instead just make use of a completely different widget that literally supported everything I needed already. I could just call a couple methods on it to customize it. My new version removes 80% of the code that AI had me write, and is simpler, more idiomatic, and actually makes more sense now.
So yeah, now any time I see people write about how "well, it's good for learning new libraries or new languages", I'll have that in the back of my mind. If you don't already know the library/language, you have zero idea whether what the AI is teaching you is horrible or not, or whether there's a "right/better" way. You think it's helping you out when really you're likely just writing horrible code.
rurp
Just recently I was having trouble getting something to work with a large well documented framework library. Turned out the solution I needed was to switch to a similar but different API. But that's not what Claude told me. Instead it wanted me to override and rewrite a bunch of core library code. Fortunately I was able to recognize that the suggested solution was almost certainly bad and did some more digging to find the right answer, but I could easily see nightmarish code that solves immediate problems in terrible ways piling up fast in a vibe coded project.
I do find LLMs useful at times when working in unfamiliar areas, but there are a lot of pitfalls and newly created risks that come with it. I mostly work on large existing code bases and LLMs have very much been a mildly useful tool, still nice to have, but hardly the 100x productivity booster a lot of people are claiming.
komali2
This keeps happening to me. I keep coming across big files written during my Cursor hype period from 3 months ago and finding huge non DRY chunks and genuinely useless nonsense. Yes, I should have reviewed better, but it's a lot to wade through and it ostensibly "worked," as in, the UI looked as it should.
Sammi
Today I asked Claude how to ignore Typescript type checking in some vendored js files in my project. It churned on this and ended up turning off type checking on all js files in my project and proudly declaring it a great success because the errors were gone. Hurray. If I knew nothing about my project then I would be none the wiser.
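For reference, the conventional fix keeps project-wide checking on and opts out only the vendored files. A sketch of one common tsconfig setup (paths are illustrative; note that excluded files still enter the program if they're imported, in which case a `// @ts-nocheck` comment at the top of each vendored file is the reliable per-file switch):

```jsonc
{
  "compilerOptions": {
    "allowJs": true,
    "checkJs": true      // keep checking first-party JS
  },
  "exclude": [
    "src/vendor/**"      // skip only the vendored files
  ]
}
```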
ksenzee
> Stop using a hammer on nails.
sorry, what am I supposed to use on nails?
sethammons
I think it was a typo and should have been "Stop using a hammer on screws," suggesting tool / application mismatch.
svieira
This glass bottle
alexchamberlain
> Setting up a scaffolding for a new website? LLMs are amazing at it.
Weren't the code generators before this even better though? They generated consistent results and were dead quick at doing it.
camdenreslink
And they were frequently in public repos that were updated with people filing issues if necessary.
socalgal2
Would it more correct to change this
> LLMs can get me up to speed on new APIs and libraries far faster than I can myself
To this?
> LLMs can get me up to speed on old APIs and old libraries that are new to me far faster than I can myself
My experience is if the library/API/tool is new then the LLM can't help. But maybe I'm using it wrong.
retreatguru
An MCP server called Context7 excels at providing up to date api/library documentation for LLMs.
iLoveOncall
> Setting up a scaffolding for a new website? LLMs are amazing at it.
So amazing that every single stat shown by the author in the article has been flat at best, despite all of them being based on new development rather than work on existing code-bases.
daxfohl
Maybe the world has run out of interesting websites to create. That they are created faster doesn't necessarily imply they'll be created more frequently.
daxfohl
Of course if that's the case (and it well may be), then THAT is the reason for tech layoffs. Not AI. If anything, it means AI came too late.
coffeebeqn
AI still fails to extrapolate. It can interpolate between things it’s trained on but that’s not exactly a new interesting product. If it truly could extrapolate at human-ish levels we would actually maybe have 10x more games and websites and whatnot
3uler
Working with LLMs has fundamentally changed how I approach documentation and development.
Traditional documentation has always been a challenge for me - figuring out where to start, what syntax conventions are being used, how pieces connect together. Good docs are notoriously hard to write, and even harder to navigate. But now, being able to query an LLM about specific tasks and get direct references to the relevant documentation sections has been a game-changer.
This realization led me to flip my approach entirely. I’ve started heavily documenting my own development process in markdown files - not for humans, but specifically for LLMs to consume. The key insight is thinking of LLMs as amnesiac junior engineers: they’re capable, but they need to be taught what to do every single time. Success comes from getting the right context into them.
Learning how to craft that context is becoming the critical skill.
It’s not about prompting tricks - it’s about building systematic ways to feed LLMs the information they need.
I’ve built up a library of commands and agents for my Claude Code installation inspired by AgentOS (https://github.com/buildermethods/agent-os) to help engineer the required context.
The tool is a stochastic parrot, you need to feed it the right context to get the right answer. It is very good at what it does but you need to use it to its strengths in order to get value from it.
I find people complaining about LLMs often expect vibe coding to be this magic tool that will build the app for you without thinking, which it unfortunately has been sold as, but the reality is more of a fancy prompt based IDE.
rglover
Most of it doesn't exist beyond videos of code spraying onto a screen alongside a claim that "juniors are dead."
I think the "why" for this is that the stakes are high. The economy is trembling. Tech jobs are evaporating. There's a high anxiety around AI being a savior, and so, a demi-religion is forming among the crowd that needs AI to be able to replace developers/competency.
That said: I personally have gotten impressive results with AI, but you still need to know what you're doing. Most people don't (beyond the beginner -> intermediate range), and so, it's no surprise that they're flooding social media with exaggerated claims.
If you didn't have a superpower before AI (writing code), then having that superpower as a perceived equalizer is something you will deploy all resources (material, psychological, etc.) to defend, ensuring that everyone else maintains the position that 1) the superpower is good, 2) the superpower cannot go away, and 3) the superpower's fallibility should be ignored.
Like any other hype cycle, these people will flush out, the midpoint will be discovered, and we'll patiently await the next excuse to incinerate billions of dollars.
SchemaLoad
At least in my experience, it excels in blank canvas projects. Where you've got nothing and want something pretty basic. The tools can probably set up a fresh React project faster than me. But at least every time I've tried them on an actual work repo they get reduced to almost useless.
Which is why they generate so much hype. They are perfect for tech demos, then management wonders why they aren't seeing results in the real world.
tomrod
Exactly. It quickly builds a lot of technical debt that must be paid down, especially for people writing code in areas they aren't deep in.
For tight tasks it can be super helpful. For me, an AI/data-science guy, that means things like setting up a basic reverse proxy. But I do so with a ton of scrutiny: pushing back on it, searching on Kagi or in the docs to at least confirm the code, etc. This is helpful precisely because I don't have a mental map of reverse proxies; it can help fill in gaps, but only with a lot of caution.
That type of use really doesn't justify the billion dollar valuations of any companies, IMO.
ethanwillis
What do you mean by you don't have a mental map about a reverse proxy?
caro_kann
Even scaffolding a new project is not easy work, especially with a new stack or new versions of existing tools. For example, I have never been able to get a Vue 3 project with Vite and Tailwind set up correctly. I tried the top SOTA models. Maybe my prompting skills aren't good, but every time it fails to set up the project correctly, and every time it gives me old configurations that aren't relevant anymore.
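FWIW this one may be a moving-target problem rather than a prompting problem: Tailwind v4 replaced the old tailwind.config.js + PostCSS setup with a first-party Vite plugin, so models trained on v3-era docs confidently emit stale config. The current setup is tiny (a sketch, assuming Vue 3 + Tailwind v4):

```typescript
// vite.config.ts
// deps: npm i -D vite @vitejs/plugin-vue tailwindcss @tailwindcss/vite
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import tailwindcss from '@tailwindcss/vite'

export default defineConfig({
  plugins: [vue(), tailwindcss()],
})
```

Plus a single @import "tailwindcss"; line in your main CSS. No tailwind.config.js, no postcss.config.js; if the model emits either, it's working from the v3 docs.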
lbreakjai
LLMs are probably the worst tool for this job. Code generators have been a thing forever. Why use an LLM when you can do "npm create vite@latest my-vue-app -- --template vue"?
Rapzid
Why though? Vite supplies a project scaffolder and lists a one-liner under getting started, i.e. "pnpm create vite".
Tailwind is similarly a one-liner to initialize (it might be a create-vite option by now).
Edit: My bad, you were talking about the LLMs! I'm always surprised that, even after years of great project scaffolding across the Node ecosystem, people still complain about how hard setting up projects is.
rkozik1989
The reason LLMs suck in plenty of brownfield projects is that those codebases likely implemented frameworks in a proprietary way, didn't rely on any public framework at all, or were in general done in an esoteric way, so few (if any) similar codebases exist in the LLM's training data. That's problematic because LLMs aren't capable of reasoning or learning; they're literally just predicting the next most likely token in a chain, similar to how autocomplete works. Without you supplying additional context and explicitly defining guardrails for performing common tasks, the LLM has no frame of reference for working with your codebase.
herpdyderp
I've had great success with GPT5 in existing projects because its agent mode is very good (the best I've seen so far) at analyzing the existing codebase and then writing code that feels like it fits in already (without prompt engineering on my part). I still agree that AI is particularly good on fresh projects though.
SchemaLoad
Could be that there is a huge difference between the products. The last few companies have given me GitHub Copilot, which I find entirely useless: the automatic suggestions are more distracting than useful, and the fix and explain functions never work. But maybe if you burn $1000/day on Claude Code it works a lot better. And then companies see the results from that and wonder why they aren't getting them by spending a couple of dollars on Copilot.
empath75
I actually completely disagree with this, and IMO it works best with projects that are templated with AI development in mind. Lots of documentation and comments, working tests, etc.
You want as much context as possible _right in the code_.
dmonitor
By "knowing what you're doing", do you mean "having enough experience to do it by hand", "having experience with a specific AI tool and its limitations", or a combination?
devjab
You don't need software engineering to build successful software, until you do.
In my experience you don't need to know a whole lot about LLMs to work with them. You need to know that everything they spit out is potentially garbage, and if you can't tell the good from the garbage, then whatever you're using them for is going to be terrible. In software, terrible is fine for quite a lot of systems. One of the first things I built out of university, in the previous millennium, is still in production today, and it's horrible. It's inefficient and horribly outdated, since it hasn't been updated ever. It runs 10 times a day, and at least one of those runs will need to automatically restart itself because it failed. It's done its job without the need for human intervention for decades, though; I know because one of my old colleagues still works there. It could have been improved, but the inefficiency cost over all those years is probably worth about two human hours, and changing it would likely take quite a while. A lot of software is like that, though a lot of it doesn't live so long. LLMs can absolutely blast that sort of thing out. It's when the inefficiency cost isn't less than a few human hours that LLMs become a liability, if you don't know how to do the engineering.
I use LLMs to write a lot of the infrastructure-as-code we use today. I can do that because I know exactly how it should be engineered. What the LLM can do that I can't is spit out the k8s YAML for an ingress point with 200 lines of port settings in a couple of seconds. I've yet to have it fail, probably because those configurations are basically all the same depending on the service. What an LLM can't do, however, is write the entire YAML config on its own.
Similarly, it can build you a virtual network with subnets in Bicep based on a couple of lines of text with address prefixes. At the same time, it couldn't build you a reasonable vnet with subnets if you asked it to do it from scratch. That doesn't mean it can't build you one that works; it's just that you're likely going to claim 65,534 IP addresses for a service that uses three.
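To make that concrete, here's roughly the difference (a hedged sketch; names and API version are illustrative, not from our actual templates). The from-scratch version happily claims a 10.0.0.0/16, while a right-sized one looks more like:

```bicep
// Right-sized vnet: a /26 address space and a /27 subnet
// (27 usable IPs after Azure reserves 5 per subnet) for a
// service that uses three, instead of a whole /16 (65,534 usable).
resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: 'svc-vnet'
  location: resourceGroup().location
  properties: {
    addressSpace: {
      addressPrefixes: ['10.0.0.0/26']
    }
    subnets: [
      {
        name: 'svc-subnet'
        properties: {
          addressPrefix: '10.0.0.0/27'
        }
      }
    ]
  }
}
```

The structure is identical either way, which is exactly why the LLM nails it when you hand it the prefixes and over-provisions when you don't.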
rglover
The first one.
fennecbutt
I mean, the truth should be fairly obvious to people, given that a lot of the talk around AI rings very much like IFLScience/mainstream-media "science" articles, which always make some outrageous "right around the corner" claim based on some small tidbit from a paper they only skimmed the abstract of.
captainkrtek
This tracks with my own experience as well. I've found it useful in some trivial ways (e.g. small refactors, type definitions from a schema, etc.), but on tasks bigger than that it misses things, requires rework, etc. The future may make me eat my words, though.
On the other hand, I’ve lately seen it misused by less experienced engineers trying to implement bigger features who eagerly accept all it churns out as “good” without realizing the code it produced:
- doesn’t follow our existing style guide and patterns.
- implements some logic from scratch where there certainly is more than one suitable library, making this code we now own.
- is some behemoth of a PR trying to do all the things.
nicce
> implements some logic from scratch where there certainly is more than one suitable library, making this code we now own
> is some behemoth of a PR trying to do all the things
Depending on the amount of code, I see this only as positive? Too often people pull huge libraries for 50 lines of code.
captainkrtek
I'm not talking about generating a few lines instead of importing left-pad. In recent PRs I've had:
- Implementing a scheduler from scratch (hundreds of lines), when there are many many libraries for this in Go.
- Implementing some complex configuration store that is safe for concurrent access, using generics, reflection, and a whole host of other stuff (additionally hundreds of lines, plus more for tests).
While I can't say any of the code is bad, it is effectively like importing a library which your team now owns, but worse in that no one really understands it or supports it.
Lastly, I could find libraries that are well supported, documented, and active for each of these use-cases fairly quickly.
davidcelis
Someone vibe coded a PR on my team where there were hundreds of lines doing complex validation of an uploaded CSV file (which we only expected to have two columns) instead of just relying on Ruby's built-in CSV library (i.e. `CSV.parse` would have done everything the AI produced)
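For scale, the whole thing collapses to a few lines with the standard library (a sketch; the two column names here are invented, ours were different):

```ruby
require "csv"

# CSV.parse already handles quoting, escaping, and structurally broken
# input (it raises CSV::MalformedCSVError), so the only "validation"
# left for a two-column upload is checking the fields we care about.
def invalid_rows(data)
  rows = CSV.parse(data, headers: true)
  rows.reject { |row| row["name"] && row["email"] }
end
```

Anything beyond that (delimiters, quoting, line endings) is already the stdlib's job.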
daxfohl
And that may be where the discrepancy comes in. You feel fast because, whoa, I created this whole scheduler in ten seconds! But then you also have to spend an hour code-reviewing that scheduler, which still feels fast for getting a good working scheduler. Without AI, maybe it feels slow to find and integrate some existing scheduling library, but in wall-clock time it's the same.
heavyset_go
Yes, for leftpad-like libraries it's fine, but does your URL or email validation function really handle all valid and invalid cases correctly now and into the future, for example?
nicce
There are good use cases and bad ones. Is the standard regex library with a known-good pattern for email validation better than some third-party library, without benchmarking them yourself? Or you pull in a parser library but parse only a single type in a single way. There isn't a single truth, but I usually see external libraries included too easily.
adelie
i've seen this fairly often with internal libraries as well - a recent AI-assisted PR i reviewed included a complete reimplementation of our metrics collector interface.
suspect this happened because the reimplementation contained a number of standard/expected methods that we didn't have in our existing interface (because we didn't need them), so it was considered 'different' enough. but none of the code actually used those methods (because we didn't need them), so all this PR did was add a few hundred lines of cognitive overhead.
captainkrtek
I’ve seen this as well as PR feedback to authors of AI assisted PRs: “hey we already have a db driver and interface we’re using for this operation, why did you write this?”
mcny
> Too often people pull huge libraries for 50 lines of code.
I used to be one of those people. It just made sense to me when I was (I still am to some extent) more naïve than I am today. But then I also used to think "it makes sense for everyone to eat together at a community kitchen of some sort instead of cooking at home because it saves everyone time and money" but that's another tangent for another day. The reason I bring it up is I used to think if it is shared functionality and it is a small enough domain, there is no need for everyone to spend time to implement the same idea a hundred times. It will save time and effort if we pool it together into one repository of a small library.
Except reality is never that simple. Just like that community kitchen, if everyone decided to eat the same nutritious meal together, we would definitely save time and money but people don't like living in what is basically an open air prison.
codebje
Also there are people occasionally poisoning the community pot, don't forget that bit.
fennecbutt
Granted, _discovery_ of such things is something I'm still trying to solve at my own job and potentially llms can at least be leveraged to analyse and search code(bases) rather than just write it.
It's difficult because you need team members to be able to work quite independently but knowledge of internal libraries can get so siloed.
captainkrtek
I do think the discovery piece is hugely valuable. I’m fairly capable with grep and ag, but asking Claude where something is in my codebase is very handy.
skydhash
I've always gone from the entry point of the code (with a lot of assumptions) and then done a deep dive into one of the modules or branches. After a while you develop an intuition for where code may be (or you just follow the import/include statements).
I've explored code like FreeBSD, Busybox, Laravel, Gnome, Blender, ... and it's quite easy to find your way around.
lumost
The experience in greenfield development is very different. In the early days of a project, the LLM's opinion is about as good as that of the individuals starting the project. Coding standards and other conventions have not yet been established, and even buggy, half-nonsense code still gets the project to a demoable state. Being able to explore five projects to demo status instead of one is a major boost.
jryio
I completely agree with the thesis here. I also have not seen a massive productivity boost with the use of AI.
I think that there will be neurological fatigue occurring whereby if software engineers are not actively practicing problem-solving, discernment, and translation into computer code - those skills will atrophy...
Yes, AI is not the 2x or 10x technology-of-the-future™ it was promised to be. It may be the case that any productivity boost is happening within existing private codebases. Even then, there should be a modest, noticeable uptick in software deployed to the market, which does not appear to be there.
In my consulting practice I see this phenomenon regularly, whereby new founders or stir-crazy CTOs push the use of AI and ultimately find that they're spending more time wrangling a spastic codebase than building shared understanding and working together.
I have recently taken on advisory roles and retainers just to reinstill engineering best practices.
heavyset_go
> I think that there will be neurological fatigue occurring whereby if software engineers are not actively practicing problem-solving, discernment, and translation into computer code - those skills will atrophy...
I've found this to be the case with most (if not all) skills, even riding a bike. Sure, you don't forget how to ride it, but your ability to expertly articulate with the bike in a synergistic and tool-like way atrophies.
If that's the case with engineering, and I believe it to be, it should serve as a real warning.
jryio
Yes and this is the placid version where lazy programmers elect to lighten their cognitive load by farming out to AI.
An insidious version is AGI replacing human cognition.
To replace human thought is to replace a biological ability that progresses on evolutionary timescales, not on a Moore's-law-like curve. The tissue in your skull will quite literally be as useful as a cow's for solving problems. Think about that.
Automating labor in the 20th century disrupted society, and we've seen its consequences. Replacing cognition entirely (driving, writing, decision-making, communication) would yield far worse outcomes than transitioning the population from food production to knowledge work.
If not our bodies and not our minds, then what do we have? (Note: Altman's universal basic income ought to trip every dystopian alarm bell).
Whether adopted passively or foisted on us actively, cognition is what makes us human. Let's not let Claude Code be the nexus for something worse.
card_zero
There's no connection between AI and AGI, apart from hopes. Besides which, if you're talking about AGI, you're talking about artificial people. That means:
• They don't really want to be servants.
• They have biases and preferences.
• Some of them are stupid.
• If you'd like to own an AGI that thinks for you, the AGI would also like one.
• They are people with cognition, even if we stop being.
cmsj
AGI isn't going to come from Transformer LLMs. They are Statistical Turks.
talldrinkofwhat
The author of the article had an interesting solution to this. Flip a coin to see who implements the feature.
Heads you code. Tails you review.
coffeebeqn
Same - I use it at work at a big tech company, and the real-world efficiency gains on net are probably nonexistent. We have multiple large and not-so-large codebases. For a super trivial script, or creating a struct from documentation, it does the thing: great. For unit tests it's about 50-50 whether it's useful or I waste a few hours and delete the change set. In any moderately complex codebase, Claude Sonnet or GPT in agent mode builds unneeded complexity, gets lost in a spiraling number of nonsense steps, and constantly builds things that already exist in the codebase. In the best outcome I have to edit and review so heavily that it's like jumping in on someone else's PR halfway through and having to grok what the heck they misunderstood.
The only actually net positive is the Claude.md that some people maintain - it’s actually a good context dump for new engineers!
wrs
This makes some sense. We have CEOs saying they're not hiring developers because AI makes their existing ones 10X more productive. If that productivity enhancement was real, wouldn't they be trying to hire all the developers? If you're getting 10X the productivity for the same investment, wouldn't you pour cash into that engine like crazy?
Perhaps these graphs show that management is indeed so finely tuned that they've managed to apply the AI revolution to keep productivity exactly flat while reducing expenses.
heavyset_go
As the rate of profit drops, value needs to be squeezed out of somewhere and that will come from the hiring/firing and compensation of labor, hence a strong bias towards that outcome.
99% of the draw of AI is cutting labor costs, and hiring goes against that.
That said, I don't believe AI productivity claims, just pointing out a factor that could theoretically contribute to your hypothetical.
wrs
Maybe if you have a business where the need for software is a constant, so it’s great to get it for 90% off. (It’s not clear what business that is in 2025, maybe a small plumbing contractor?)
But if your business is making software it’s hard to argue you only need a constant amount of software. I’ve certainly never worked at a software company where the to-do list was constant or shrinking!
typewithrhythm
If you expect input cost for something that's mostly labour to go dramatically down, then you also fear the value of your product crashing.
Culonavirus
At the end of the day, the economy is king. Nothing else matters. But an economy is not something you can plan and predict (shout out to my communist friends); it is a chaotic system full of emergent elements, chance-based actors, and third-order effects, so it usually takes years for trends and patterns to emerge. All I'm going to say is that unless AI keeps improving exponentially (and there's definitely an argument to be made that it already isn't), there is going to be hell to pay a few years down the road.
I use Grok, Claude and Gemini every day, these "tools" are very useful to me (in the sense of how google and wikipedia changed the game) and I watch the LLM space closely, but what I'm seeing in terms of relative improvement is far removed from all the promises of the CEOs of these companies... Like, Grok 4 was supposed to be "close to AGI" but compared to Grok 3 it's just a small incremental improvement and the same goes for others...
moduspol
A lot of these C-suite people also expect the remaining ones to be replaced by AI. They subscribe to the hockey-stick "AGI is around the corner" narrative.
I don't, but at least it is somewhat logical. If you truly believe that, you wouldn't necessarily want to hire more developers.
wrs
Or CEOs.
tempodox
If everyone can just generate the software they need, why would they pay anyone else to do it for them? If code generation were that good, software companies would die en masse.
quantumcotton
Today you will learn what diminishing returns are :)
You can only utilize so many people or so much action within a business or idea.
Essentially it's throwing more stupid at a problem.
The reason there are so many layoffs is AI creating efficiency. What people don't realize is that one AI robot or GPU isn't going to replace one human at a one-to-one ratio; it's going to replace the amount of workload one person can do, which in turn gets rid of one human employee. It's not that your job has been taken by AI outright, but it's started. How much human is needed is where the new supply-demand balance lies, and how long the job lasts. There will always be more need for more creative minds. The issue is we are lacking them.
It's incredible how many software engineers I see walking around without jobs. Looking for a job making $100,000 to $200,000 a year. Meanwhile, they have no idea how much money they could save a business. Their creativity was killed by school.
They are relying on somebody to tell them what to do, and when nobody's around to tell them what to do, they all get stuck. What you are seeing isn't a lack of capability; it's a lack of ability to set direction or create an idea worth following.
Nextgrid
I disagree that layoffs are because of AI-mediated productivity improvements.
The layoffs are primarily due to over-hiring during the pandemic and even earlier during the zero-interest-rate period.
AI is used as a convenient excuse to execute layoffs without appearing in a bad position to the eyes of investors. Whether any code is actually generated by AI or not is irrelevant (and since it’s hard to tell either way, nobody will be able to prove anything and the narrative will keep being adjusted as necessary).
heavyset_go
Bootstrapping is a lot easier when you have your family's or someone else's money to start a business and then fall back on if it doesn't pan out.
The reason people take jobs comes down to economics, not "creativity".
mattmanser
The reason there were so many layoffs is because cheap money dried up.
Nothing to do with AI.
Interest rates are still relatively high.
cmsj
It might be more correct to say that interest rates are relatively normal, historically. Post-2008 we had a long period of abnormally low interest rates.
throwaway13337
Great angle to look at the releases of new software. I, too, thought we'd see a huge increase by now.
An alternative theory is that writing code was never the bottleneck of releasing software. The exploration of what it is you're building and getting it on a platform takes time and effort.
On the other hand, yeah, it's really easy to 'hold it wrong' with AI tools. Sometimes I have a great day and think I've figured it out. And then the next day, I realize that I'm still holding it wrong in some other way.
It is philosophically interesting that it is so hard to understand what makes building software products hard. And how to make it more productive. I can build software for 20 years and still feel like I don't really know.
bwfan123
> writing code was never the bottleneck
This is an insightful observation.
When working on anything, I ask: what is the smallest "hard" problem this is solving? In software, value is added by solving hard problems, not easy ones. Another way to put it: hard problems are those that are not "templated", i.e., already solved elsewhere and only needing to be copied.
LLMs are allowing the easy problems to be solved faster. But the real bottleneck is in solving the hard problems - and hard problems could be "hard" due to technical reasons, or business reasons or customer-adoption reasons. Hard problems are where value lies particularly when everyone has access to this tool, and everyone can equally well create or copy something using it.
In my experience, LLMs have not yet made a dent in solving the hard problems, because they don't really have a theory of how things work. On the other hand, they have really boosted productivity for tasks that are templated.
prime_ursid
One of the rebuttals at the end of the post addresses this.
> That’s only true when you’re in a large corporation. When you’re by yourself, when you’re the stakeholder as well as the developer, you’re not in meetings. You're telling me that people aren’t shipping anything solo anymore? That people aren’t shipping new GitHub projects that scratch a personal itch? How does software creation not involve code?
So if you’re saying “LLMs do speed up coding, but that was never the bottleneck,” then the author is saying, “it’s sometimes the bottleneck. E.g., personal projects”
balder1991
Also, when you create a product, you can't speed up the iterative process of seeing how users want it, fixing edge cases you only realized later, etc. These are the things that make a product good, and why there's that article about software taking 10 years to mature: https://www.joelonsoftware.com/2001/07/21/good-software-take...
Nextgrid
This is the answer. Programming was never the bottleneck in delivering software, whether free-range, organic, grass-fed human-generated code or AI-assisted.
AI is just a convenient excuse to lay off many rounds of over-hiring while also keeping the door open for potential investors to throw more money into the incinerator since the company is now “AI-first”.
zahlman
The point was that "programming" is far more than just "writing code".
coffeebeqn
Just like writing Lord of the Rings is actually not just about typing. You have to live a life, go to war, think deeply for years, research languages and cultures and then one day you type all that out
searls
The answer is that we're making it right now. AI didn't speed me up at all until agents got good enough, which was April/May of this year.
Just today I built a shovelware CLI that exports iMessage archives into a standalone website export. Would have taken me weeks. I'll probably have it out as a homebrew formula in a day or two.
I'm working on an iOS app as well that's MUCH further along than it would be if I hand-rolled it, but I'm intentionally taking my time with it.
Anyway, the post's data mostly ends in March/April which is when generative AI started being useful for coding at all (and I've had Copilot enabled since Nov 2022)
sumeno
It's amazing how, whenever criticisms pop up, the response for the last 3 years has been "well, you aren't using <insert latest>; it's finally good!"
shepherdjerred
isn't this likely to be the case when a field is developing quickly and there are a large number of people who have different opinions on the subject?
e.g. I liked GitHub Copilot but didn't find it to be a game changer. I tried Cursor this year and started to see what AI can do today.
medvezhenok
Indeed. The LLMs have been pretty useful for greenfield projects & one off scripts for a while, but GPT-5 was the first time I've found a model to be quite helpful on large-scale legacy code (>1M LOC).
anp
FWIW this closely matches my experience. I’m pretty late to the AI hype train but my opinion changed specifically because of using combinations of models & tools that released right before the cut off date for the data here. My impression from friends is that it’s taken even longer for many companies to decide they’re OK with these tools being used at all, so I would expect a lot of hysteresis on outputs from that kind of adoption.
That said I’ve had similar misgivings about the METR study and I’m eager for there to be more aggregate study of the productivity outcomes.
dash2
Yeah, I released a new version of a little open source project based almost entirely on vibe-coding with Claude/Codex. It was more fun than bashing out my own code, and despite all the problems others have mentioned (ignored instructions, not using libraries, etc.), it was probably faster than if I'd added the new features myself.
philipwhiuk
> was probably faster
That sure doesn't sound like 10x
mvdtnz
> AI didn't speed me up at all until agents got good enough, which was April/May of this year.
That was 5 months ago, which is 6 years in 10x time.
furyofantares
> That was 5 months ago, which is 6 years in 10x time.
That's some pretty bad math.
But yes, it isn't making software get made 10x faster. Feel free to blow that straw man down (or hype influencer, same thing.)
mildweed
Interested in this Homebrew. Share when ready?
noidesto
Agreed. Agentic AI is a completely different tool than “traditional” AI.
I'm curious what the author's data and experiment would look like a year from now.
m-hodges
This article reminds me of two recent observations by Paul Krugman about the internet:
"So, here’s labor productivity growth over the 25 years following each date on the horizontal axis [...] See the great productivity boom that followed the rise of the internet? Neither do I. [...] Maybe the key point is that nobody is arguing that the internet has been useless; surely, it has contributed to economic growth. The argument instead is that its benefits weren’t exceptionally large compared with those of earlier, less glamorous technologies."¹
"On the second, history suggests that large economic effects from A.I. will take longer to materialize than many people currently seem to expect [...] And even while it lasted, productivity growth during the I.T. boom was no higher than it was during the generation-long boom after World War II, which was notable in the fact that it didn’t seem to be driven by any radically new technology [...] That’s not to say that artificial intelligence won’t have huge economic impacts. But history suggests that they won’t come quickly. ChatGPT and whatever follows are probably an economic story for the 2030s, not for the next few years."²
¹ https://www.nytimes.com/2023/04/04/opinion/internet-economy....
² https://www.nytimes.com/2023/03/31/opinion/ai-chatgpt-jobs-e...
abathologist
My theory is that the digital revolution has mostly cancelled out potential productivity gains by introducing productivity sinks: the technology has tended to encourage less rigorous thinking, more distraction, more complexity. Even if you can do task T X times faster, most people are spending X * Y more time being distracted, overwhelmed, or just reflexively pushing buttons.
The ways AI is being used now will make this a lot worse on every front.
stillsut
Got your shovelware right here...with receipts.
Background: I'm building a python package side project which allows you to encode/decode messages into LLM output.
Receipts: the tool I'm using creates a markdown that displays every prompt typed, and every solution generated, along with summaries of the code diffs. You can check it out here: https://github.com/sutt/innocuous/blob/master/docs/dev-summa...
Specific example: I actually used a leetcode-style algorithmic implementation of memoization for branching. This would have taken a couple of days to implement by hand, but it took about 20 minutes to write the spec and 20 minutes to review and merge the generated solution. If you're curious, you can see the diff here: https://github.com/sutt/innocuous/commit/cdabc98
Noumenon72
You should have used the word "steganography" in this description like you did in your readme, makes it 100% more clear what it does.
InCom-0
On one hand, I don't understand what all the fuss is about. LLMs are great at all kinds of things: searching for (good) information, summarizing existing text, conceptual discussions where they point you in the right direction very quickly, etc. They are just not great (some might say harmful) at straight-up non-trivial code generation or the design of complex systems, with the added peculiarity that on the surface the models seem almost capable of doing it, but never quite are. That is sort of their central feature: producing text that looks correct from a statistical perspective, but without actual reasoning.
On the other hand, I do understand that the things LLMs are really great at are not actually all that spectacular to monetize. So as a result we have all these snake oil salesmen on every corner boasting about nonsensical vibecoding achievements, because that's where the real money would be ... if it were really true ... but it is not.
larve
In case the author is reading this: I have the receipts showing a real step function in how much software I build, especially lately. I'm not going to put a number on it because that makes no sense to me, but I certainly push a lot of code that reasonably seems to work.
The reason it doesn't show up online is that I mostly write software for myself and for work, with the primary goal of making things better, not faster. More tooling, better infra, better logging, more prototyping, more experimentation, more exploration.
Here's my opensource work: https://github.com/orgs/go-go-golems/repositories . These are not just one-offs (although there's plenty of those in the vibes/ and go-go-labs/ repositories), but long-lived codebases / frameworks that are building upon each other and have gone through many many iterations.
nerevarthelame
How are you sure it's increasing your productivity if it "makes no sense" to even quantify that? What are the receipts you have?
larve
I have linked my GitHub above. I don't know how that fares in the bigger scheme of things, but I went from zero open source to hundreds of tools, frameworks, and libraries. Putting a number on "productivity" makes no sense to me; I would have no idea what that means.
I generate between 10-100k lines of code per day these days. But is that a measure of productivity? Not really...
sarchertech
>I generate between 10-100k lines of code per day these days.
That’s absolute nonsense.
coffeebeqn
Who’s reviewing 10-100k lines of code per day? This sounds like a slop nightmare
trenchpilgrim
Same. On many days 90% of my code output by lines is Claude generated and things that took me a day now take well under an hour.
Also, a good chunk of my personal OSS projects are AI-assisted. You probably can't tell from looking at them, because I have strict style guides that suppress the "AI style", and I don't really talk about how I use AI in the READMEs. Do you also expect me to mention that I used Intellisense and syntax highlighting?
droidjj
The author’s main point is that there hasn’t been an uptick in total code shipped, as you would expect if people are 10x-ing their productivity. Whether folks admit to using AI in their workflow is irrelevant.
trenchpilgrim
The bottleneck on how much I ship has never been how fast I can write and deploy code :)
larve
Their main point is that "AI coding claims don't add up", as shown by the amount of code shipped. I personally do think some of the more incredible claims about AI coding add up, and am happy to talk about it based on my "evidence", i.e. the software I am building. 99.99% of my code is AI-generated at this point, with the occasional line I fill in myself because it'd be stupid to wait for an LLM to do it.
For example, I've built 5-6 iphone apps, but they're kind of one-offs and I don't know why I would put them up on the app store, since they only scratch my own itches.
Aeolun
I don’t think this is necessarily true. People that didn’t ship before still don’t ship. My ‘unshipped projects’ backlog is still nearly as large. It’s just got three new entries in the past two months instead of one.
warkdarrior
Maybe people are working less and enjoying life more, while shipping the same amount of code as before.
If someone builds a faster car tomorrow, I am not going to go to the office more often.
jplusequalt
>Do you also expect I mention that I used Intellisense and syntax highlighting too?
No, but I expect my software to have been verified for correctness and soundness by a human being with a working mental model of how the code works. But I guess that's not a priority anymore if you're willing to sacrifice $2,400 a year to Anthropic.
trenchpilgrim
$2400? Mate, I have a free GitHub Copilot subscription (Microsoft hands them out to active OSS developers), and work pays for my Claude Code via our cloud provider backend (and it costs less per working day than my morning Monster can). LLM inference is _cheap_ and _getting cheaper every month_.
> No, but I expect my software to have been verified for correctness, and soundness by a human being with a working mental model of how the code works.
This is not exclusive with AI tools:
- Use AI to write dev tools to help you write and verify your handwritten code. Throw the one-off dev tools in the bin when you're done.
- Handwrite your code, generate test data, review the test data like you would a junior engineer's work.
- Handwrite tests, AI generate an implementation, have the agent run tests in a loop to refine itself. Works great for code that follows a strict spec. Again, review the code like you would a junior engineer's work.
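That last workflow can be sketched as a toy example (the function and its spec are invented for illustration, not taken from any real project): the human writes the tests first, the agent iterates on the implementation until they pass, and the result still gets a human review.

```python
# Handwritten spec-as-tests: the human fixes the contract up front,
# so the agent's test-run-refine loop has an unambiguous target.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"

# Agent-generated implementation, reviewed like a junior engineer's PR.
def slugify(title: str) -> str:
    # str.split() with no argument collapses runs of whitespace and
    # drops leading/trailing whitespace, which handles both test cases.
    return "-".join(title.lower().split())

test_slugify()
print("all tests pass")
```

The key property is that the tests are the part the human owns; the implementation is disposable and can be regenerated as long as the spec holds.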
noidesto
Agreed. In the hands of a seasoned dev, not only does productivity improve, but so does the quality of the output.
If I’m working against a deadline, I feel more comfortable spending time on research and design knowing I can spend less time on implementation. In the end it took the same amount of time, though hopefully with an increase in reliability, observability, and extensibility. None of these things show up in the author’s faulty dataset and experiment.
ryanobjc
The author is pointing out that aggregate productivity hasn't really gone up. The graphs are fairly compelling.
There are many reasons for your experience, and I am glad you are having them! That's great!
But the fact remains: overall we aren't seeing an exponential, or even a step-function, increase in how much software is being delivered.
xenobeb
What is even the point in having this argument?
At this point, either you're gaining with each model release or you're not.
Let's see in 2035 who was right and who was wrong. My bet is that the people who aren't gaining right now are not going to like the situation in 2035.
philipwhiuk
I mean it's definitely shovelware, I'll give you that.
https://github.com/go-go-golems/ai-in-action-app/blob/main/c...
larve
Not sure what you mean? This was a demo built in a live session that took about 30 minutes, including UI ideation (see the PNGs). It's a reasonably well-featured app and the code is fairly minimal. I wouldn't be able to write something like that in 30 minutes by hand.
These claims wouldn't matter if the topic weren't so deadly serious. Tech leaders everywhere are buying into the FOMO, convinced their competitors are getting massive gains they're missing out on. This drives them to rebrand as AI-First companies, justify layoffs with newfound productivity narratives, and lowball developer salaries under the assumption that AI has fundamentally changed the value equation.
This is my biggest problem right now. The types of problems I'm trying to solve at work require careful planning and execution, and AI has not been helpful for it in the slightest. My manager told me that the time to deliver my latest project was cut to 20% of the original estimate because we are "an AI-first company". The mass hysteria among SVPs and PMs is absolutely insane right now, I've never seen anything like it.