
u8

The disconnect for AI is that it is a jagged frontier, and it only really shines where one of its peaks happens to line up with one of your valleys.

If you've been writing Perl for 30 years, you might not want to learn JavaScript just to build a fun little idea from your head to show your wife. Vibe code that shit man. Who cares? Your wife does not care about LOC or those internal design decisions you made.

If you're trying to learn something new like an algorithm, protocol, or API, write that shit by hand. You learn by doing, and when you know how the thing works and have that mental context, you will always be faster than an AI. Also, when did we stop liking to learn? Why is it a bad thing to know all the ins and outs of a programming language? To write and make all the decisions yourself? That shit is fun. I don't care if you disagree.

If you're at work and they really care about getting something out the door, do whatever you think is best. If you just wanna ship vibed code and review PRs all day, all the power to you. If you wanna write it by hand, and use AI like a scalpel to write up boilerplate, review code, do PR audits, etc... go for it!

A hammer is a really great tool that has thousands of purpose-designed uses. I still prefer my key to get into my car. It's all tools, you are a person.

A lot of this stuff is coming top-down from people who do not have the experience you do. Wouldn't a smart employee use their expertise to advise the organization? If you work at a company where that would not be okay, maybe it's time to start looking for another firm.

ditchfieldcaleb

I agree with you on everything you said here except:

> when you know how the thing works and have that mental context, you will always be faster than an AI

That's just plain false, honestly. No one can type at the speed AI can code, even factoring in the time you need to spend to properly write out the spec & design rules the AI needs to follow when implementing your app/feature/whatever. And that gap will only increase as LLMs get more intelligent.

notnullorvoid

Some of us do actually have intimate knowledge in certain areas where guiding an AI takes longer than doing it yourself. It's not about typing speed; it's that when you know something really, really well, the solution/code is already known to you, or the very act of thinking about the problem makes the solution known to you in full. When that happens, it's less text to write that solution than it is to write a sufficient description of the solution for an AI (not even counting the back and forth required to review the AI output and correct it).

bulbar

Giving a precise description of what the computer is supposed to do is exactly what programming is.

The more specific your requirements the closer you get to natural language not being useful anymore.

lmeyerov

Maybe a failure to automate?

The volume of people successfully adopting agentic engineering practices suggests this stuff isn't rocket science, but it is a learned skill and takes setup.

A year into heavy AI coding, my experience is that what you're describing should help you run 5+ agents simultaneously on a project, because you know what you're doing, you set it up right, and you know how to tell agents to leverage that properly.

Fr0styMatt88

Yeah it’s when you go off the happy path that it gets difficult. Like there’s a weird behaviour in your vibe-coded app that you don’t quite know how to describe succinctly and you end up in some back-and-forth.

But man AI is phenomenal for getting stuff out of your head and working quick.

larodi

Care to explain which particular intimate knowledge allowed you, in the last 6-9 months, to be faster than AI in a certain area?

Honestly, I'm still faster than AI at cooking scrambled eggs, but definitely not faster than either an AI or a compiler at translating stuff into code.

threethirtytwo

I don't believe this. Either you're lying, or you just haven't caught on with how to use Agentic AI.

Everything I do to interact with my computer is through an agent now.

Turskarama

In my experience AI can write _something_ from scratch, but often edge cases won't be handled until I go through and read the results or test it. Usually when I'm writing by hand I will naturally find the majority of edge cases as I go. By the time I've read through the results and fixed said edge cases, I usually would have been faster just doing it myself.

toponijo

It also loves to add edge case handling where it's not needed and in poorly chosen places

utopiah

> No one can type at the speed AI can code

Don't we already have a weekly post nowadays explaining, again, that typing isn't the bottleneck?

JSR_FDED

It should be “…you will always be faster than someone _without the knowledge_ using an AI”

leostarship

as i understood it, he's referring to the overall time it takes to build a complete, finished piece of software, accounting for the refactoring and bug fixes and all that. because had you not understood the tools you're using, you would be running into roadblocks, and that adds up

gaanbal

if you've never had the experience of handing something off to someone else being more laborious and slower than doing it yourself, due to having to set constraints and define success, then you simply haven't held a senior enough position to comment on this with any authority

andai

Also employees who work slower than you (and spend most of their time not actually working).

erfgh

Where does this certainty that LLMs will get more intelligent stem from?

charcircuit

>No one can type at the speed AI can code

You can definitely be faster than frontier models. The number of tokens per second is not that high and they require a lot of tokens for thinking and navigating things.

Aerolfos

Especially if you use auto-complete AI, ironically. You type a few characters, the line fills out in less than a second, as opposed to a reasoning model that takes maybe a second per 2-3 lines it writes out.

allthetime

AI is just revealing the two types of people in this line of work. Those who don’t actually like software and just do it because it’s lucrative, and the actual nerds who care.

smugglerFlynn

You are probably talking about people who just crunch out some half baked solutions for the sake of getting somewhere.

But there are other nerds who care, just not about the code quality, but about conversion, testing out business ideas quickly, getting to know their customers better.

There are nerds who care about business strategy.

There are nerds who care about accounting principles and clean financial reporting.

There are nerds who care about sales targets and partnerships.

There are many types of nerds out there. Don't limit nerds to engineers, because the "tech" world is not just an engineering world anymore. All of these nerds are people you can team up with to build meaningful things, because they do care.

eli

A much more charitable framing: people who enjoy the process vs people who enjoy the result.

(Though, granted, the results are a lot better if you craft it by hand)

anygivnthursday

> enjoy the process

This means different things to different people. A lot of people enjoy the process of engineering solutions with LLM agents, building out tailored skills and custom approaches that make up their own flavour of "agentic" workflow. There are also people who find joy in JavaScript, which other people cannot understand. And others again love systems languages, or even tinkering with assembly, etc.

What I wanted to say is that LLM use does not automatically mean people just want to get results faster, there are still nerds enjoying the process of working with these new tools.

sdevonoes

But business people always cared only about the result. My PM (who speaks like a salesman) only cares about the results. My "head of", same. My CEO, same. The only ones who ever cared about the process and quality were us, the engineers... if we don't have that care, well, to hell with everything

Daishiman

I am not really sure. I wrote some scripts that aggregated data from several APIs with an LLM and the LLM had the foresight to create a caching layer for the API responses as it properly inferred that I would need the results over and over again as well as using asyncio to accelerate fetch speed. This would have been a v2 or v3 and it one-shotted it perfectly.
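
Roughly the shape of what that looks like; a minimal sketch using aiohttp and a naive disk cache, with hypothetical URLs. This is an illustration of the pattern, not the LLM's actual output:

    import asyncio
    import json
    from pathlib import Path

    import aiohttp

    CACHE_DIR = Path(".api_cache")
    CACHE_DIR.mkdir(exist_ok=True)

    async def fetch_json(session: aiohttp.ClientSession, url: str) -> dict:
        """Fetch a URL, caching the JSON response on disk for repeat runs."""
        cache_file = CACHE_DIR / url.replace("/", "_").replace(":", "")
        if cache_file.exists():                  # cache hit: skip the network
            return json.loads(cache_file.read_text())
        async with session.get(url) as resp:
            data = await resp.json()
        cache_file.write_text(json.dumps(data))  # cache miss: store for next run
        return data

    async def main(urls: list[str]) -> list[dict]:
        async with aiohttp.ClientSession() as session:
            # gather() runs the fetches concurrently instead of one at a time
            return await asyncio.gather(*(fetch_json(session, u) for u in urls))

    # results = asyncio.run(main(["https://api.example.com/a"]))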

okdood64

I take software engineering and production reliability very seriously. But coding is just a small part of my job. It's not really the meat and potatoes. I'll vibe code (responsibly) where I can.

tyyyy3

Can we build a list of the actual nerds who care? Need it for my future recruitment needs lol.

andai

The benchmark is "do they do it for fun", i.e. personal projects.

But the real trick isn't "number of personal projects", but how weird they are. There's no "rational" reason to do them, they don't increase the person's marketability / hireability. They are done purely for intrinsic reasons.

(On reflection, this also seems to be a pretty robust predictor of autism. :)

stephenr

I've posited for a while now that the people who find spicy autocomplete to be exciting are the people who can't really do what it does.

I played with Image Playground last year some time. It was really fun. You know why? I can't draw, and I can't paint, to save my life. It's letting me do something I can't do well/at all on my own.

Using an LLM to do something I can do, with the caveat that it's pretty mediocre at the task, and needs to be constantly monitored to check it isn't doing stupid things? If I wanted that I'd just get an intern and watch them copy crappy examples from StackOverflow all day.

The same logic explains the use of LLMs to write emails and other long-form text.

It makes accessible something that people otherwise cannot do well. Go look at submissions on community writing sites. The people who write because they're good at it are adamant they don't use an LLM.

People use LLMs to do things they're otherwise not able to do. I will die on this hill.

Daishiman

I care a lot about software and I use LLMs extensively. There are some things I deeply understand yet I don't care for doing anymore because I've done them for years and there's nothing to be gained from doing them manually.

enraged_camel

I care about solving problems for and delivering value to my users. The software is simply a means to that end. It needs to work well, but that does not mean every line of code requires an artisanal touch and high attention to detail.

ehnto

I think there's some ambiguity in the discussion around what people mean when they say "good code".

Good code for a business is robust code, that's functionally correct, efficient where it needs to be and does not cost too much.

I believe most developers who care about good code are trying to articulate this, they care about a strong system that delivers well, which comes from good architecture.

LLMs actually deliver pretty well on the more trivial code cleanliness stuff, or can be made to pretty trivially with linters, so I don't think devs working with them should be worried about that aspect.

What is changing fast is that last point I mentioned, "that doesn't cost too much" because if you can get 70% of the requirements for 10% of the perceived up front cost, that calculus has changed. But you are not going to be getting the same level of system architecture for that time/cost ratio. That can bite you later, as it does often enough with human coders too.

techpression

It goes for all professions, really: there are people who do it for work and people who care. Apply it to any profession: plumbers, doctors, carpenters, cleaners, etc. Most of us have experienced both types, and I haven't heard of anyone preferring the "do it for work" ones over the ones who care. And as in those other professions, in software we accept the worse of the two because finding people who care is both time consuming and often much more expensive.

andai

>in software we accept the worse of the two

and the whole world suffers for it.

Finbel

>Also, when did we stop liking to learn? Why is it a bad thing to know all the ins and outs of a programming language?

I do not know the ins and outs of the assembly my high-level code ends up as. It's not because I don't like to learn, it's because I genuinely don't need to. At a certain level of AI performance, how will this be any different?

californical

Because you may not know the specifics of the assembly being generated, but you’ve likely learned a language built on top of assembly. And the compilers do some great tricks behind the scenes to generate efficient assembly, but those tricks are specifically coupled to semantics of the source language.

An LLM is not coupled to anything and can generate output that simply does not relate to the input. This doesn’t happen with compilers, and if it does, then it’s a specific bug to be addressed. An LLM can never guarantee certain output based on the input.

If I write x < 100, I know exactly how the compiler will treat that code every single time, and I know what < means and how it differs from <=

If I tell an LLM that “I want numbers up to 100.” Will that give me < or <= and will it be consistent every single time, even the ten thousandth program that I write?

The language is ambiguous where the code is specific
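
A tiny illustration (Python for concreteness; the prompt wording is hypothetical): the same English sentence maps to two different programs, and nothing forces an LLM to pick the same reading every time.

    # Prompt: "I want numbers up to 100."

    exclusive = list(range(100))   # reading 1: n < 100  -> 0..99
    inclusive = list(range(101))   # reading 2: n <= 100 -> 0..100

    # The compiler treats `n < 100` identically every single time;
    # the English sentence genuinely admits both readings.
    assert 100 not in exclusive
    assert 100 in inclusive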

0xpgm

However, curious programmers who develop in high level languages will dabble with assembly maybe for fun, and will be much better off for it than those who treat parts of the stack like a black box never to be opened.

sdevonoes

One difference is: to use a top-notch compiler/assembler you don't need to pay. They are open source and have a lot of support. To use the latest and greatest models (because no one around likes to use non-SOTA ones) you need to pay a premium price.

Multibillion-dollar companies are now the gateway for every line of code you need to write. That's dystopian. It sucks.

rufasterisco

Let's see if someone can point me towards some resources on the following.

The problem is mixing vibe-coding and agentic engineering, and switching the brain between two different modes (fast-feedback gratification vs deep-focus gratification).

There’s no clear cut rule on what works. Different people, different brains, and especially amongst devs some optimized low-key neurodivergence.

And then there’s waiting mode, those N seconds/minutes that agents take to think and write.

What’s the right mix? Keep a main focused project and … what do you do in the meantime? Vibe code something else? Hn? Social media? Draw lines on a paper sheet? Wood carving? Exercise? Rewatch some old tv series?

I have experimented….

There are side activities that help you go back to the task at hand in the correct mental framework for it. Not just for productivity, but for efficiency and enhancing critical thinking on the main task. Or whatever you choose to optimize for. Can anyone point me towards some people talking about this?

jtr1

I have been building an iOS app that I had kicking around in my head for years but never had time to build. I have been a frontend UX engineer for the better part of a decade and went through a handful of tutorials on Swift. The project definitely sits in this uncanny valley for me. I have test suites for every aspect of the app and have the agent using TDD to avoid cheating - this has gotten me pretty far without having to look too close at the output other than general structure. As I'm reaching a more mature stage of the project though, I'm finding that I want to tweak a lot by hand in the code to get the details right without burning tokens.

throwaway219450

The agents always do the best work IMO if you already know exactly what you want, but are too lazy to implement it. I like having the agent mock up a working solution before reimplementing it.

To split the difference, I now try to hand code as much as I can from the beginning, leave TODO comments for the agent to mop up and I'll ask it to complete the issue with reference to the current diff. It reduces the surface for agents to make stupid assumptions. If I can get it done fast on my own, win for me, if the agent finds issues or there's logic that needs checking, also a win. This way you stay sharp, but you have access to an oracle if you get stuck and it costs you fewer tokens.
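
A sketch of what that handoff can look like; the function and the TODO markers are hypothetical, the point is the pattern:

    import hashlib
    from pathlib import Path

    def store_upload(data: bytes, dest_dir: str) -> str:
        """Save an upload under a content-addressed filename."""
        digest = hashlib.sha256(data).hexdigest()
        # TODO(agent): reject payloads over a MAX_UPLOAD_BYTES limit and
        # return a typed error instead of raising.
        # TODO(agent): write atomically (temp file + rename) so a crash
        # can't leave a partial file behind.
        path = Path(dest_dir) / digest
        path.write_bytes(data)
        return str(path)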

imrozim

100% agreed. I learn coding by building stuff and breaking it; when you let AI do everything, you skip that pain and also skip the understanding.

jesterson

> Why is it a bad thing to know all the ins and outs of a programming language? To write and make all the decisions yourself? That shit is fun.

It's not just fun (i agree it is), but it is also essential for creation.

What we have done with the 'AI' is to create a lot of ignorant morons who think they can create a lot of things without knowledge. This is not gonna end well.

zx8080

> they can create a lot of things without knowledge. This is not gonna end well.

Who said "managers"?

jesterson

Oh managers are not the biggest evil here. At least they know basics.

Now we have an influx of people without a single shred of technical knowledge thinking they can create something.

sdevonoes

Agree except for this part

> If you're at work and they really care about getting something out of the door, do whatever you think is best.

If you don’t mind being jobless, sure do whatever you think is best. Not all of us can simply switch companies easily. Folks need to realise that AI in a company setting works for the benefit of the company, not for the individual.

0xpgm

But do companies really know how to use AI? I think most of it is experimentation - throwing things to the wall and seeing what sticks.

It's the practitioner who eventually figures out what really works. I see this the same way the agile movement emerged. It was initiated by people who were hands-on programmers and showed enough benefit at minimizing software waste before it took a life of its own and started getting peddled by people who didn't really understand the underlying principles.

noduerme

>> The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code.

Yeah. I'm not sure how other people work, but I almost never need to write formal tests because I essentially test locally as I write, one method at a time, and at that moment I have a complete mental map of everything that can potentially go wrong with a piece of code. I write and test constantly in tandem. I can write a test afterwards to prove what I already know, but I already know it. This is time consuming, anal, and obsessive-compulsive, and luckily that kind of work perfectly suits my personality. The end result is perfect before I commit it.

It is a lot of fun asking LLMs to write code around my code. Make 10 charts with chartjs in an html page that show something and put it behind a reverse proxy so the client can see it. Wow. Spot on, would've taken me an hour. I can even rely on Claude to somewhat honestly reason about things in personal projects.

But knowing every implementation decision makes a huge difference when anything real is at stake. "Guilt" wouldn't begin to describe the sense I'd have if my software did something because of a piece of code I hadn't personally reviewed and fully understood, at which point I probably should have just written it myself.

etothet

Vibe Coding (and LLMs) did not create undisciplined engineering organizations or engineers. They exposed and accelerated them.

Plenty of engineers have loose (or no!) standards and practices over how they write code. Similarly, plenty of engineering teams have weak and loose standards over how code gets pushed to production. This concept isn't new; it's just a lot easier for individuals and teams who have never really adhered to any sort of standards in their SDLC to produce a lot more code and flesh out ideas.

datsci_est_2015

Bad engineers continue being bad, good engineers continue being good.

I personally don’t know any colleagues who were good engineers just because they wrote code faster. The best engineers I know were ones who drew on experience and careful consideration and shared critical insights with their team that steered the direction of the system positively.

> Claude, engineer a system for me, but do it good. Thanks!

truncate

>> Bad engineers continue being bad, good engineers continue being good.

I don't know if good engineers can necessarily continue to be good. There is a limit to how much careful consideration one can give if everything is on an accelerated timeline. And good or not, there is a limit on how much influence you have over setting those timelines. The whole playing field is changing.

ori_b

It's deeper. We used to mock architects that stepped back and stopped coding, because they generated trash.

There's a cycle that is needed for good system design. Start with a problem and an approach, and write some code. As you write the code, you reify the design and flesh out the edge cases, learning where you got the details wrong. As you learn the details, you go back to the drawing board and shuffle the puzzle pieces, and try again.

Polished, effective systems don't just fall out of an engineers head. They're learned as you shape them.

Good engineers won't continue to be good when vibe-coding, because the thing that made them good was the learning loop. They may be able to coast for a while, at best.

datsci_est_2015

> if everything is on an accelerated timeline

Good engineers are also capable of managing expectations. They can effectively communicate with stakeholders what compromises must be made in order to meet accelerated timelines, just as they always have.

We've already had conversations with overeager product people about the ramifications of introducing their vibe-coded monstrosities:

  - Have you considered X?
  - Have you considered Y?
Their contributions are quickly shot down by other stakeholders as being too risky compared to the more measured contributions of proper engineers (still accelerated by AI, but not fully vibe-coded).

If that’s not the situation where you work, then unfortunately it’s time to start playing politics or find a new place to work that knows how to properly assess risk.

andai

An old comic I like:

- I've taken a controversial new pill that accelerates my brain.

-- So you're smart now?

- I'm stupid faster!

That being said, being stupid faster can work if validation is cheap (and exists in the first place).

Turns out "eh close enough" for AGI is just stupidity in an "until done" loop. (Technically referred to as Ralphing.)
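
The loop itself is almost embarrassingly small. A sketch, assuming a CLI agent invocable as `claude -p` and a test suite as the cheap validation; both tool choices are assumptions:

    import subprocess

    # "Stupidity in an 'until done' loop": rerun the agent until
    # validation passes. Only sane when validation is cheap and real.
    while subprocess.run(["pytest", "-q"]).returncode != 0:
        subprocess.run(["claude", "-p", "make the failing tests pass"])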

sanderjd

Hmmm, I think I disagree with this.

I estimate that I'm now spending about 10 to 30 hours less time a week in the mechanical parts of writing and refactoring code, researching how to plumb components together, and doing "figure out how to do unfamiliar thing" research.

All of those hours are time that can now be spent doing "careful consideration" (or just being with my family or at the gym or reading a book, which is all cognitively valuable as well).

Now, I suppose I agree that if timelines accelerate ahead of that amount of regained time, then I'm net worse off, but that's not the current situation at the moment, in my experience.

paulddraper

There is no limit.

Or at least, the limit is increasing by the day.

runarberg

When there is all that crap out there, good engineers may simply pack it in, call it good, and leave the industry. Personally, seeing the proliferation of vibe-coded apps has made me hesitant to publish and promote my AI-free apps.

embedding-shape

> I personally don’t know any colleagues who were good engineers just because they wrote code faster

Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower, less panicked when production was down and could reason their way (slowly) through pretty much anything thrown at them.

On the opposite end, I've sat next to developers trying their fastest to restore production and making more mistakes that made it even worse, or developers who rush through the first implementation idea they had for a feature, missing so many considerations, and so on.

ryandrake

> Same, if anything, the opposite seems to be true, the ones that I'd call "good engineers" were slower

Unfortunately, a lot of workplaces are ignoring this, believing their engineers are assembly line workers, and the ones who complete 10 widgets per minute are simply better than the ones who complete 5 widgets per minute.

sanderjd

This is true. But I find AI tools to be a huge help for all of this. Not to do any of it faster, but to remove a bunch of the tedium from the process of testing ideas and iterating on them. Instead of "I wonder if the problem is..." requiring half an hour of research, now I can do an initial check of that theory in less than a minute, and then dig further, or move onto the next one. Or say I estimate it's gonna take me an hour or more to test an idea, I might just decide I don't have time to invest in that. Well now maybe I can get a tentative answer on that by spending a minute laying out the theory and letting an agent spend ten or twenty minutes on it in the background. In this way I can explore space I just would have determined was not worth the effort previously.

To me, none of this feels like "going faster", it feels like "opening up possibilities to try more things, with a lot less tedious work".

notnullorvoid

> Bad engineers continue being bad, good engineers continue being good.

Unfortunately I have seen some really good software engineering peers regress into bad engineers through a increasing reliance on AI.

Conversely some very bad engineers (undeserving of the title) have been producing better outputs than I ever expected possible of them.

LtWorf

Good engineers need to be allowed to be good. If they are told to pump features or lose their job, they might act like bad engineers as well.

sanderjd

Aren't they more likely to leave?

nly

The best paid engineers I know seem to be the super fast hackers who write unfathomable amounts of code in short order.

Unfortunately, thoughtful design and engineering don't get recognised.

bdangubic

in my experience this is because there are very very very very few thoughtful designers and engineers, especially compared to people that are cranking out code.

jkaptur

> I personally don’t know any colleagues who were good engineers just because they wrote code faster.

However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out. It's precisely that speed that enables a process like "let's try X, hmm, how about Y, no... ok, Z is nice; ok team, here are the tradeoffs...". Then they remember their experience with X, Y, and Z, and use it to shape their thinking going forward.

Meanwhile, other engineers have gotten X to finally mostly work and are invested in shipping it because they just want to be done. In my experience, this is how a lot of coding agents seem to act.

It's not obvious to me how to apply the expert loop to agentic coding. Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier...

datsci_est_2015

Or: depth-first search of the solution space vs breadth-first (or balanced) search of the solution space.

> Of course you can ask your agent to try several different things and pick the best, or ask it to recommend architectural improvements that would make a given change easier

The ideal solution increasingly seems to be encoding everything that differentiates a good engineer from a bad engineer into your prompt.

But at that point the LLM isn’t really the model as much as the medium. And I have some doubts that LLMs are the ideal medium for encoding expertise.

sanderjd

I really don't relate to this...

The way you apply the expert loop is to be the expert. "Can we try this...", "have you checked that...", "but what about...".

To some degree you can try to get agents to work like this themselves, but it's also totally fine (good, actually) to be nudging the work actively.

beacon294

As you practice, it will become apparent: you simply keep working on the application architecture yourself.

skydhash

> However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out

The Pragmatic Programmer book has whole chapters about this. Ultimately, you either solve the problem analytically (whiteboard, deep thinking on a sofa), or you get fast at trying out stuff AND keeping the good bits.

Quekid5

> However, the best engineers I know are usually among the quickest to open an editor or debugger and use it fluently to try something out.

That's not my experience... mostly it's about first interrogating the actual problem with the customer and the conditions under which it occurs. Maybe we even have appropriate logging in our production application? We usually do, because you know, we usually need to debug things that have already happened.

(If it's new/unreleased code, sure fine, let's find a debugger.)

jakevoytko

Yeah, a lot of people came of age with a "we'll fix it when it's a problem" mindset. Previously their codebases would start to resist feature development, you'd fix the immediate bottlenecks, and then you could kick the can down the road a bit until you hit the next point of resistance. You kinda refactor as you do features.

The frontier models have pushed the "it's a problem" moment further back. They can kinda work with whatever pile of code you give them... to a point. So it manifests as the LLM introducing extra regressions, or dropping more requirements than it used to, but it's not really manifesting as the job being harder for you. It's just not as smooth as it was from an empty repository.

Then you hit the point where it just breaks too much and you need to fix it. And the whole codebase is just fractal layers of decisions that you didn't make. That's hard to untangle. And you're not editing the code yourself, so you don't have that visceral "adding this specific thing in this specific way has a lot of tension" reaction that allows you to have those refactoring breakthroughs.

meridian-v

This is the sharpest observation in the thread. The "tension" you describe is proprioception for code — you feel where the abstractions leak, where the seams don't align, through the act of writing and refactoring. It's not a visual signal. You can't get it from reading a diff.

The risk isn't that agents write bad code. It's that developers lose the sense that tells them where code is bad. Code review is perception. Writing code is proprioception. They're different senses and one doesn't substitute for the other.

The question for the agent era isn't "is the code good enough to ship" — it's "do I still have enough coupling to the codebase to know when it isn't?"

layoric

This is very true. I've found these tools, which I am highly encouraged to use, very hit and miss, which they are by nature. After using Matt Pocock's skills, I've come around to the idea that an LLM's main utility is to act as the ultimate rubber ducky. The `grill-me` feature is honestly the most useful, not for guiding the follow-up writing of code, but for making me write down and explore the idea I have more quickly. Its guesses at questions to ask are generally pretty good. I don't believe there is any 'understanding', so I feel the rubber ducky analogy works quite well. This isn't anything you couldn't do before with some discipline, but I find it helps me be more consistent.

pydry

The first time I used LLMs, it was to try to refactor behind a solid body of tests I trusted.

I figure if it can't code when it has all of the necessary context available and when obscure failures are easily detected, why would I trust it when building features and fixing bugs?

It never did get good enough at refactoring.

layoric

I agree. The mechanical refactoring of modern IDE tooling, especially with typed languages, is so much faster and safer, it's not even close. These tools can be useful for sure, but I think in general they are being way over-prescribed for different tasks.

tbrownaw

> Vibe Coding (and LLMs) did not create undisciplined engineering organizations or engineers.

Loss of discipline can be a result of panic or greed.

Perhaps believing that your own costs or your competitors' costs are suddenly becoming 10x lower could inspire one of those conditions?

(Also for greenfield projects specifically, it can plausibly be an experiment just to verify what happens. Some orgs are big enough that of course they can put a couple people on a couple-month project that'll quite likely fall flat.)

teeray

Can’t wait for the next stage of escalation when teams start to feel code review is keeping them from vibe coding utopia. It’ll probably be “AI review only, keep your human opinions to yourself” just so they can continue to check the “all changes are reviewed” box on security checklists.

adastra22

LLMs are accelerants. They elevate great engineers to ever more dizzying heights of productivity. They also multiply massively the sloppy output of shit engineers.

bitexploder

Vibe-coded apps with barely any tests, invariants, etc. No wonder it turns into spaghetti. You can always refactor code and force agents to write small, modular pieces and files. Good engineering is good engineering whether an agent or a human wrote the code. Take the time to force agents to refactor and explore choices. Humans must still understand and drive architecture at this point. Agents can help, do recon amazingly well, and provide suggestions.

mleo

I can't understand this. The first thing I do with a new agent-driven project is set up quality checks: linters, test frameworks, static analysis, etc. Whatever I would expect a developer to do, I would expect an agent to do. All implementation has to go through build success and mixed agent reviews before moving on. I might not do this with an initial research/throwaway prototype, but once I know what direction to go and expect code to go to production, it is vital to set guard rails.
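
For illustration, the whole guard rail can be as small as a single gate script the agent must pass before a task counts as done. The tool names here (ruff, mypy, pytest) are assumptions; substitute your own stack:

    import subprocess
    import sys

    # Quality gate: tell the agent a task isn't done until this exits 0.
    CHECKS = [
        ["ruff", "check", "."],   # linting
        ["mypy", "src"],          # static type analysis
        ["pytest", "-q"],         # test suite
    ]

    for cmd in CHECKS:
        print("$", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"gate failed: {' '.join(cmd)}")

    print("all checks passed")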

gck1

> The first thing I do with new agent driven project is set up quality checks. Linters, test frameworks, static analysis, etc

I do this too, but then I sit and observe how the agent gets very creative about going around all of these layers just to get to the finish line faster.

Say, for example, I needlessly pass a mutable reference and the linter screams at me. I know either the linter is wrong in this case, or I should listen to it and change the signature. If I make the lazy choice, I will be dissatisfied with myself; I might even get scolded, or even fired if I keep making lazy choices.

LLM doesn't get these feelings.

LLM will almost always go for silencing it because it prevents it from reaching the 'reward'. If you put guardrails so that LLM isn't allowed to silence anything, then you get things like 'ok, I'll just do foo.accessed = 1 to satisfy the linter'.

Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?
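
A made-up illustration of the pattern (the handler is hypothetical; ARG001 is ruff's unused-argument rule): the lazy move silences the symptom, the honest move changes the signature.

    def process(request: str) -> str:
        return request.upper()

    # Linter: "parameter 'config' is never used" (e.g. ruff's ARG001).

    # The silencing move an agent tends to reach for:
    def handler_lazy(request: str, config: dict) -> str:  # noqa: ARG001
        return process(request)

    # The fix a self-respecting human would make instead:
    def handler(request: str) -> str:
        return process(request)

    print(handler("ping"))  # PING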

Quekid5

Generated tests... I mean... listen to yourself.

I can generate a lot of tests amounting to assert(true). Yeah, LLM generated tests aren't quite that simplistic, but are you checking that all the tests actually make sense and test anything useful? If no, those tests are useless. If yes, I don't actually believe you.

It's the typical 10 line diff getting scrutinized to death, 1000 line diff: Instant LGTM.

Pay attention to YOUR OWN incentives.

jillesvangurp

It's also helping the engineers that do have standards. A lot of what I put in my guard rails (crafted to get better outcomes for my prompts) is not exactly rocket science. Those guard rails just impose some sane engineering processes and stuff I care about.

As models get better, they seem to be biased to doing most of these things without needing to be told. Also, coding tools come with built in skills and system prompts that achieve similar things.

Two years ago I was copy-pasting together a working Python FastAPI server for a client from ChatGPT. This was pre-agentic tooling. It could sort of do small systems and work on a handful of files. I'm not a regular Python user (most of my experience is Kotlin based) but I understand how to structure a simple server product. Simple CRUD stuff. All we're talking about here was some APIs, a DB, and a few other things. I made it use async IO and generate integration tests for all the endpoints. Took me about a day to get it to a working state. Python is simple enough that I can read it and understand what it's doing. But I had never used any of the frameworks it picked.

That's 2 years ago. I could probably condense that in a simple prompt and achieve the same result in 15 minutes or so. And there would be no need for me to read any of that code. I would be able to do it in Rust, Go, Zig, or whatever as well. What used to be a few days of work gets condensed into a few minutes of prompt time. And that's excluding all the BS scrum meetings we'd have to have about this that and the other thing. The bloody meetings take longer than generating the code.

A few weeks ago I did a similar effort around banging together a Go server for processing location data. I've been working against a pretty detailed specification with a pretty large API surface and I wanted an OSS version of that. I have almost no experience with Go. I'd be fairly useless doing a detailed code review on a Go code base. So, how can I know the thing works? Very simple, I spent most of my time prompting for tests for edge cases, benchmarking, and iterating on internal architecture to improve the benchmark. The initial version worked alright but had very underwhelming performance. Once I got it doing things that looked right to me, I started working on that.

To fix performance, I iterated on trying to figure out what was on the critical path and why and asking it for improvements and pointed questions about workers, queues, etc. In short, I was leaning on my experience of having worked on high throughput JVM based systems. I got performance up to processing thousands of locations per second; up from tens/hundreds. This system is intended for processing high frequency UWB data. There probably is some more wiggle room there to get it up further. I'm not done yet. The benchmark I created works with real data and I added generated scripts to replay that data and play it back at an accelerated rate with lots of interpolated position data. As a stress test it works amazingly well.

This is what agentic engineering looks like. I'm not writing or reviewing code. But I still put in about a week plus of time here and I'm leaning on experience. It's not that different from how I would poke at some external component that I bought or sourced to figure out if it works as specified. At some point you stop hitting new problems and confidence levels rise to a point where you can sign off on the thing without ever having seen the code. Having managed teams, it's not that different from tasking others to do stuff. You might glance at their work but ultimately they do the work, not you.

jwpapi

> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good.

I feel like this is just not true. A JSON API endpoint also needs several decisions made:

- How should the endpoint be named

- What options do I offer

- How are the properties named

- How do I verify the response

- How do I handle errors

- What parts are common in the codebase and should be re-used.

- How will it potentially be changed in the future.

- How is the query running, is the query optimized.

If I know the answer to all these questions, wiring it together takes me LESS time than passing it to Claude Code.

If I don't know the answers, the fastest way to find them is to start writing the code.

Additionally, whilst writing it I usually notice additional edge cases, optimizations, better logging, observability, and whatnot.

The author clearly stated the context for this quote is production code.

I don’t see any benefits in passing it to Claude Code. It’s not that I need 1000s of JSON API endpoints.
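
To make those decisions concrete, here is a minimal sketch (FastAPI, with a hypothetical todo_items schema and a stubbed DB layer; every name in it is an invented assumption, which is rather the point):

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI()

    class TodoItem(BaseModel):         # decision: property names and types
        id: int
        title: str
        done: bool = False

    def fetch_row(item_id: int):
        """Stub standing in for the shared DB layer (decision: reuse vs. new)."""
        return (item_id, "example", False)

    # decision: endpoint naming and versioning
    @app.get("/v1/todo-items/{item_id}", response_model=TodoItem)
    def get_todo_item(item_id: int) -> TodoItem:
        row = fetch_row(item_id)       # decision: query shape and indexes
        if row is None:                # decision: error handling
            raise HTTPException(status_code=404, detail="todo item not found")
        return TodoItem(id=row[0], title=row[1], done=row[2])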

eddieroger

> If I know the answer to all these questions, wiring it together takes me LESS time than passing it to Claude Code.

That's just not true, and if it is in your case, then you're not great at writing prompts yet.

> Take the todo_items table in Postgres and build a Micronaut API based around it. The base URL should be /v1/todo_items. You can connect to Postgres with pguser:pgpass@1.2.3.4

That's about all it takes these days. Fewer lines of code than your average controller.

majormajor

Every day I do something where the llm writes it ten times faster than I would with twice the test coverage.

And every day I do something else where the LLM output is off enough that I end up spending the same amount of time on it as if I'd done it by hand. It wrote a nice race condition bug in a race I was trying to fix today, but it was pretty easy for me to spot at least.

And once a week or so I ask for something really ambitious that would save days or even weeks, but 90% of the time it's half-baked or goes in weird directions early and would leave the codebase a mess in a way that would make future changes trickier. These generally suggest that I don't understand the problem well enough yet.

But the interesting things are:

1) many of the things it saves 90% of the time on are saving 5+ hours

2) many of the things I have to rework only cost me 2+ hours

3) even the things that I throw away make it way faster to reach the 'oh, we don't understand this problem well enough to make the right decisions here yet' conclusion than it would be just starting out on that project without assistance

so I'm generally coming out well ahead.

qingcharles

This. There is definitely a ratio. A year ago it was 50/50. It felt better than that, because the hard things it did fast while I sipped coffee outweighed the negatives in my mind.

Now that ratio is swinging way over in the LLMs' favor.

sarchertech

>you’re not great at writing prompts yet

How do you reconcile that with your example prompt, which demonstrates no skill requirement whatsoever? It's the first thing any developer would think of.

vlunkr

It’s simple but contains all the necessary info. You can say “build an endpoint to get user data” and it will absolutely do something, but it might be stupid, and when you compound 1000 stupid prompts like that you get spaghetti.

apsurd

I've drunk the AI koolaid so I'm not a hater, but to say "you're just not prompting right" is such a cop-out. Prompting right takes a metric fuck ton of effort. I'm actually kinda agreeing with you: if you get your dev environment sufficiently harnessed, then you can give it one-liner magic prompts. But getting there, learning to get there, paying that cost, hot mother of god it's a lot of effort.

Communicating, in words, is extremely hard. I don't think this should be as controversial as it's seems in the prompt era.

Versus: someone who has mastered one of the myriad OpenAPI generators, and it's shipped.

phpnode

it does take a little while to get good at this new skill, yes. Just like, say, learning a new programming language and the ecosystem around it takes some effort. After you get over the hump it's really very straightforward and mostly a matter of knowing the kinds of mistakes the LLM is likely to make ahead of time, and then kindly asking it to do something smarter. If you've successfully mentored junior engineers you already have this skill.

xmprt

I'll go in the other direction and say that if you're spending a lot of your time learning to prompt better then you're wasting it because LLMs are only going to get better at understanding your intent regardless of "prompt engineering". The JSON API example to wire up a database can be one-shot pretty easily by the latest models without much context and without setting up any harness. The more time you spend perfecting your harness, the more time you would have wasted when the next model comes out to make it obsolete.

eddieroger

I disagree it's a cop-out, but I agree it's hard to get good at writing prompts and takes a lot of effort. But so is programming. We're trading one skill set for another and getting a bigger return on it.

I started as a skeptic and have similarly drank the kool-aid. The reality is AI can read code faster than I can, including following code paths. It can build and keep more context than I can, and do it faster as well. And it can write code faster than I can type. So the effort to learn how to tell it what to do is worthwhile.

yakbarber

this seems disingenuous. even if your premise is true (which i don't think it is), it only really holds for the first few endpoints. most systems have many, and the models are very good at copying established patterns to the point that you wouldn't normally have to re-explain every detail for every endpoint. so you might be right for the first (you're not), but you're definitely wrong for the next 50.

sdevonoes

I have worked with people like you. Worst colleagues ever.

weird-eye-issue

> If I know the answer to all these questions, wiring it together takes me LESS time than passing it to Claude Code

How so?

jwpapi

Like, writing code to me is not slower than writing text.

When I write code, every character I type has less ambiguity than when I write in human language. I also have the help of LSPs, linters, and auto-completes.

dreambuffer

This assumes:

- that you spend no amount of time looking things up, reorganising, or otherwise getting stuck

- that you have a solution to the problem ready to go at all times

- that your solution is better than the LLM's solution

I highly, highly doubt that all 3 of these are true. I doubt even 1 of them is true, I think you just don't know how to use LLMs in a focused way.

jameson

I have a similar sentiment. Who is making the claim that AI writes code fast matters a lot, because some programmers heavily use LSPs, linters, auto-completes, key bindings, snippets, CLI commands, etc. to speed up writing code.

spoiler

It's not much to go on, but I kinda feel ya. One exception I'd perhaps make is a large mechanical refactor. I find those incredibly daunting, so I'll just ask AI for that. It probably takes me a similar amount of time, but it feels less daunting.

I've been trying to get into agentic coding, and there are non-refactoring cases where I might reach for it (like any time I need to work on something using Tailwind; I'm dyslexic and I'd get actual headaches, not exaggerating, trying to decipher Tailwind gibberish while juggling their docs before AIs came around).

weird-eye-issue

I use voice-to-text, and for me coding is way faster now. You don't need to sit down and type up a perfect spec lol. I give it terrible prompts with poor grammar and typos from incorrect transcriptions and it does an amazing job. Definitely not perfect, I iterate with it a ton, but it's still faster than typing it out by hand

fragmede

You're still typing? I don't know how fast you can type, but I can speak way faster than I can type. Somewhere in the neighborhood of 300 wpm. Speech-to-text is pretty good now, and prompting an AI means I'm not trying to speak curly brace semicolon new line.

cyral

This may have been a problem a year or two ago but any premium model will be exploring the codebase to check similar routes to answer all these questions, if you don't specify them.

rufasterisco

Exactly. As long as the codebase consistently follows some given patterns, LLMs nowadays stick to them.

Understanding that limiting the number of "design patterns" in a codebase made it better (easier to code in and understand) was a good proxy for seniority before LLMs.

Now it's even better: if "unusual code" suddenly shows up in a PR, either the person opening the PR or the one reviewing it has lost touch with the codebase. That's a very important signal, since you don't want that to happen with code you care about.

eric_cc

You can also just talk it out loud to Claude while you’re on a walk getting some sunshine. Done.

nozzlegear

Now you're working when you should be taking a break and enjoying your surroundings. Not good!

jwpapi

Yeah, I can and I've done it, and for a fun project it's fun and cool. But it's like using templates to build your website. You'll be annoyed, and at some point your project goes into the endless graveyard of abandoned projects.

jmilloy

I think most people are finding the opposite. Claude Code is not only reducing how many projects get abandoned, it's also resurrecting projects from the graveyard.

dodu_

I'd rather just be an actual schizophrenic at that point. It seems like less of a mental illness.

Just be outside and present.

slashdave

You forgot the important part: permissions

yieldcrv

I’ve seen the best REST APIs since Claude Code has taken the wheel

Every verb implemented, and implemented correctly according to obscure IETF guidance, in the most compatible way even where the IETF never made it clear

Intuitively named routes; errors and authentication all easily done and swappable for another approach if necessary

I feel like our timeline split if you’re not seeing this

jwpapi

I don't want every verb implemented, and I also don't want an IETF standard. I want as little as possible, so I have to worry about as little as possible in the future.

Use cases differ. You described a complete REST API, which can be as much of a problem as too little.

hnuser123456

I see you haven't encountered an API where a GET request can modify the database.
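
For anyone who hasn't, a sketch of the anti-pattern (hypothetical handler). HTTP defines GET as safe, meaning it must not change server state, which is exactly what this violates:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    users = {1: "ada", 2: "grace"}

    class BadHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            # The anti-pattern: a "safe" GET that mutates state. Link
            # prefetchers, crawlers, and retries now delete users for you.
            if url.path == "/users/delete":
                uid = int(parse_qs(url.query).get("id", ["0"])[0])
                users.pop(uid, None)
            self.send_response(200)
            self.end_headers()
            self.wfile.write(repr(users).encode())

    # HTTPServer(("", 8000), BadHandler).serve_forever()  # please don't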

yieldcrv

Then just tell it to do that

It'll even suggest it

You want a single RPC WebSocket? Go for it

theteapot

The obscure IETF? Which standard is that, exactly? Who cares, I guess; Claude does that stuff.

zarzavat

Perhaps I've missed a few weeks worth of progress, but I don't think that AIs have become more trustworthy, the errors are just more subtle.

If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.

If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's harder to spot but doesn't reduce the review burden whatsoever.

If anything, "truthy" code is more mentally taxing to review than just obviously bad code.

xantronix

I know there are good uses of LLMs out there. I do. But.

The current fever pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. With all the obvious problems being pointed out to people, there are just as many workarounds; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.

At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: The real goal has become obscured at so many firms today, that all that remains is the LLM. Are the people betting the farm and helping implement the visions of those who have done so guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?

Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?

steventhedev

The dirty secret if you work inside BigCorp and look around at the projects they're showcasing:

1. They're low stakes to get wrong.

2. The most common are MCPs or similar AI tooling.

3. Making them look good takes time and effort still. It's a multiplier, not a replacement.

4. Quality and maintainability require investment. I had to restart an agentic project several times because it painted itself into a corner.

Daishiman

There are two sides to the AI mandates.

The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.

There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI, and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be pushed to use it, in the same way that some developers had to be pushed to use Docker or k8s or whatever.

callc

Ah yes, we must force these obstinate engineers to the right path! Only after getting everyone to see the light will they understand and thank us for boundless productivity!! /s

Perhaps these “obstinate” engineers have good reason in their decision. And it should be their decision!

To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.

user34283

In my opinion you are just wrong.

It’s an absolute game changer, and it can now multiply your productivity fivefold if it’s a solo greenfield project.

Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.

Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.

The results are now undeniable. In the past I could not have developed a product of that scope in my free time.

That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last months. Like parallel agents, resolving merge conflicts, separating out functionality from a large branch into proper PRs.

atomicnumber3

"many engineers have not yet tried things that became feasible over the last months"

I have heard this statement every single day for 2 years and yet we still have no companies compressing 10 years into 1 year thus exploding past all the incumbents who don't "get it".

xantronix

The thing is, I don't care any longer. I sincerely believe velocity without direction is not a good strategy for delivering quality in the long term. And that's the thing about it: How sustainable is this velocity, in terms of socioeconomic concerns, product strategy, and mental health?

nananana9

> and it can now multiply your productivity fivefold if it’s a solo greenfield project.

Why do I not see 5x as many interesting greenfield projects as before?

valcron1000

> if it’s a solo greenfield project

That's a big if. I don't have numbers but most professional engineers are not working on such projects

hintymad

> I don't think that AIs have become more trustworthy, the errors are just more subtle.

Honest question: what about the counter-argument that humans make subtle mistakes all the time, so why do we treat AI any differently?

A difference to me is that when we manually write code, we reason about the code carefully and with a purpose. Yes, we do make mistakes, but the mistakes are bounded within a certain range. In contrast, AI-generated code produces errors that do not follow common sense. That said, I don't feel this differentiation is strong enough, and I don't have data to back it up.

chromacity

One answer, as another person pointed out, is that LLM mistakes are just different. They are less explicable, less predictable, and therefore harder to spot. I can easily anticipate how an inexperienced engineer is going to mess up their first pull request for my project. I have no idea what an LLM might do. Worse, I know it might ace the first fifty pull requests and then make an absolutely mind-boggling mistake in the 51st one.

But another answer is that human autonomy is coupled to responsibility. For most line employees, if they mess up badly enough, it's first and foremost their problem. They're getting a bad performance review, getting fired, end up in court or even in prison. Because you bear responsibility for your actions, your boss doesn't have to watch what you're up to 24x7. Their career is typically not on the line unless they're deeply complicit in your misbehavior.

LLMs have no meaningful responsibility, so whoever is operating them is ultimately on the hook for what they do. It's a different dynamic. It's probably why most software engineers are not gonna get replaced by robots - your director or VP doesn't want to be liable for an agent that goes haywire - but it's also why the "oh, I have an army of 50 YOLO agents do the work while I'm browsing Reddit" is probably not a wise strategy for line employees.

wilsonnb3

> I can easily anticipate how an inexperienced engineer is going to mess up their first pull request for my project.

Isn’t this just because you have seen a lot of PRs from inexperienced engineers? People learn LLM behavior over time, too.

sumeno

Humans can't make mistakes at the sheer scale that AI can.

Yes, as an engineer I make mistakes, but I could never make as many mistakes per day as an LLM can

throwuxiytayq

Obviously, the measure isn’t mistakes per day, it’s mistakes per LOC. And that’s not the whole story either - AI self-corrects in addition to being corrected by the operator. If the operator’s committed bugs/LOC rate is as low as the unaugmented programmer’s bugs/LOC, you always choose the AI operator. If it’s higher, it might still be viable to choose them if you care about velocity more than correctness. I’m a slow, methodical programmer myself, but it’s not clear to me that I have a moat.

BoorishBears

This is like having a coworker who's as skilled as you if not more skilled, but also an alien.

Their mental model doesn't map cleanly onto yours, so where with a human you'd have some way to follow their thought patterns and anticipate mistakes, the alien makes mistakes that don't add up.

Like the alien has encyclopedic knowledge of opcodes in some esoteric Soviet MCU but sometimes forgets how to look for a function definition, and says "It looks like the read tool failed, that's ok, I can just make a mock implementation and comment out the test for now."

AndrewKemendo

Some of my favorite peer engineers work exactly like that

People used to celebrate them as legends (even if not everyone liked them).

Notch, Woz, Linus and Geohot come to mind

The Metasploit creator Dean McNamee worked for me, and he was just like that: a total monster at engineering hard tech products.

wilsonnb3

Dealing with the alien coworkers has always been the job; that is what software is to most people.

Software developers get paid big money because they can speak alien; the only thing that is changing is the dialect.

sanderjd

Yeah I relate to this. I think working in smaller chunks helps a lot. (Just like how it is for work done by humans!)

asdfman123

You can direct LLMs to do test-driven development, though. Write several tests, then make sure the code matches them. And also make sure the agent organizes the code correctly.
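
A sketch of what tests-first looks like here; the `slugify` module is invented for illustration:

```python
# tests/test_slug.py -- written by the human before any implementation exists.
from slug import slugify  # module the agent will be asked to create

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("a, b & c!") == "a-b-c"

# Prompt to the agent: "Implement slug.slugify so that `pytest` passes.
# Do not modify the test file."
```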

CharlieDigital

The LLM obliges and writes a lot of useless tests. I have asked devs to delete several tests in the last day alone.

seanw444

"I don't trust this giant statistical model to generate correct code, so to fix it, I'm going to have this giant statistical model generate more code to confirm that the other code it generated is correct."

I swear I'm living through mass hysteria.

asdfman123

Well, yeah, you don't just make it bang out a bunch of useless code without monitoring it.

You instruct it to write the code you want to be written. You still have to know how to develop, it just makes you faster.

christoff12

This has generally been the case, though. As mentioned in the post, "You want solutions that are proven to work before you take a risk on them" remains true and will be the place where the edges are found.

zarzavat

It's about responsibility.

If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.

The same goes for if my AI generated code leads to data loss, or downtime, or if uses too many resources, or it doesn't scale, or it gives out error messages like candy.

The buck stops with me and therefore I have to read the code, line-by-line, carefully.

It's not even a formality. I constantly find issues with AI generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.

You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues where the fix creates new problems. So I still have to go through the AI generated review issue-by-issue and weed out any harmful criticism.

jaggederest

I think there are a couple of levels here:

First of all, building a system that constrains the output of the AI sufficiently, whether that's typing, testing, external validation, or manual human review in extremis. That gets you the best result out of whatever harness or orchestration you're using.

Secondly, there's the level at which you're intervening, somewhere along the hierarchy from "validate only usage from the customer perspective" to "review, edit, and validate every jot and tittle of the codebase and environment". I think for relatively low-importance things, reviewing at the feature level (all code, but not interim diffs) is fine, but if you're writing a network protocol, you'd better at least validate everything carefully with fuzzing and prop testing or something like that.

And then you've got how you structure your feedback to the LLM itself - is it an in-the-loop chat process, an edit-and-retry spec loop, go-nogo on a feature branch, or what? How does the process improve itself, basically?

I agree with you entirely that the responsibility rests on the human, but there are a variety of ways to use these things that can increase or decrease the quality of code to time spent reviewing, and obviously different tasks have different levels of review scrutiny, as well.
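
For the fuzzing/prop-testing end of that hierarchy, the gate can be as small as this sketch (Hypothesis shown; `proto.encode`/`decode` stand in for whatever pair the agent wrote):

```python
from hypothesis import given, strategies as st

from proto import encode, decode  # assumed module under test

@given(st.binary(max_size=1024))
def test_round_trip(payload: bytes):
    # Any payload must survive an encode/decode round trip unchanged;
    # Hypothesis hunts for the weird inputs a hand-written test would miss.
    assert decode(encode(payload)) == payload
```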

user34283

On the other hand, I don’t need to review carefully every line of code in my thumbnail generator and associated UI.

My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.

After the QA testing on my device, a quick scroll-through of the code is enough.

Maybe prompt "are errors during thumbnail generation caught to prevent app crashes?" if we're feeling extra cautious today.

And just like that, it saved a day of work.
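
For what it's worth, the defensive wrapper that prompt should produce is only a few lines; a Pillow sketch, names invented:

```python
from PIL import Image

def make_thumbnail(src: str, dst: str, size=(256, 256)) -> bool:
    """Return False instead of crashing the app on a bad input image."""
    try:
        with Image.open(src) as im:
            im.thumbnail(size)
            im.save(dst)
        return True
    except (OSError, ValueError):
        # Truncated/unreadable images raise OSError; an unknown output
        # format can raise ValueError. Either way: skip the thumbnail.
        return False
```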

devin

> If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

It is so embarrassing that LOC is being used as a metric for engineering output.

ilikebits

LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000.

moregrist

It’s still a bad metric.

I have worked with code where 1000s of lines are very straightforward and linear.

I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.

The skills and effort required to review and understand those situations are quite different.

One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.

jimbokun

The number of bugs tends to be linear in the lines of code written, meaning fewer lines of code for the same functionality will have fewer bugs.

So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.

Furthermore, how do you know the AI-generated lines are the open-highway lines of code and not the mountain-road ones? There might be hallucinations that pattern-match as perfectly reasonable but hide a hard-to-spot flaw.

lelandfe

There’s still a limit on how far one can drive in a day, no matter the road.

jazzypants

That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning.

mcmcmc

Ok? No one is saying that all LOC are equal. Ceteris paribus, 2000 lines is 10x more time-consuming to review than 200.

mrbnprck

It's still possible to run any LLM in a loop and optimize for LoC while preserving the desired outcome.

keeda

LoC is perfectly fine as a metric for engineering output. It is terrible as a standalone measure of engineering productivity, and the problems occur when one tries to use it as such.

It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.

As we know, within the same team working on the same product, a 1000-LoC diff could take less time than a one-line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you can bet everyone would use it, but it's basically infeasible for this very reason.

So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, a team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.

(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)

faizshah

I experimented with vibe coding (not looking at the code myself) and it produced around 10k LOC even after refactors etc.

I rewrote the same program using my own brain, with ChatGPT just as Google and autocomplete (my normal workflow), and produced the same thing in 1500 LOC.

The effort difference was not that significant either, tbh, although my hand-coded approach probably benefited from my having designed the vibe-coded one first, so I had already thought through what I wanted to build.

embedding-shape

Sounds like a great opportunity to understand your own development process and codify it in such detail that the agent can replicate how you work, ending up with less code that does the same thing.

My experience was the same as yours when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant to me and add it to my reusable AGENTS.md, and the code the agent outputs today is fairly close to what I "naturally" write.

8note

Or go with this: use the agent to prototype ideas, and write it yourself once you know what you want.

jwpapi

I deleted 75,000 lines of code from my codebase in the last 2 months, and that was tremendously more useful to my business than the 75,000 the AI had written in the 2 months before...

mcmcmc

Is it? The whole point of the article is that the rate of output for writing code has surpassed the rate at which it can be reviewed by humans. LOC as an input for software review makes a lot of sense, since you literally need to read each line.

adtac

LOC is the worst metric for engineering output, except for all the others - Churchill

deadbabe

The number of times an engineer says "what the fuck" while reading code still seems like a reliable metric for code quality assessment.

dyauspitr

We won’t be doing that for much longer, enjoy it while you can.

AnimalMuppet

Somewhat reliable, yes. Not objective, though, and hard to reproduce.

np1810

I just read somewhere on HN that "code is a liability, not an asset, the idea behind the code/final product is the actual asset." And, I can't agree more...

> It is so embarrassing that LOC is being used as a metric for engineering output.

At one of my previous orgs, LOC added in the previous year was a metric used to distinguish a good engineer from a PIP (bad) engineer. LOC removed was treated as negative by the same logic. I hope they've changed this methodology for the LLM code-spitting era...

root_axis

He's not using LOC as a metric, he's making an observation about the impact of a change in the typical volume of LOC.

dataviz1000

Have you noticed that the coding agents get really close to the solution in one shot and then require tons of work for that last 10% or 5%?

If we shift the paradigm of how we approach a coding problem, the coding agents can close that gap. Ten years ago, every 10 or 15 minutes I would stop coding and start refactoring, testing, and analyzing, making sure everything was perfect before proceeding, because a bug will corrupt any downstream code. The coding agents don't and can't do this. They keep that bug or malformed architecture as they continue.

The instinct is to get the coding agents to stop at these points. However, that is impossible for several reasons. Instead, we should find the first place the agent made a mistake and update the prompt. Don't fix the code: delete all of it (regeneration is very cheap) and run from the top. Continue this iteration until the prompt yields the perfect code.

Ah, but you say, that is a lot of work done by a human! That is the whole point. The humans are still needed. Used like this, the tool yields 10x speed at writing code.
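
A sketch of that loop, with `run_agent` and `first_divergence` as hypothetical stand-ins for your harness and your own from-the-top review:

```python
import shutil
import subprocess
import tempfile

def run_agent(prompt: str, workdir: str) -> None:
    # Assumed CLI invocation; substitute your actual agent harness.
    subprocess.run(["agent", "--prompt", prompt, "--dir", workdir], check=True)

def first_divergence(workdir: str) -> str | None:
    # In practice this is the human, reading the output from the top and
    # describing the earliest place it went wrong. None means it's right.
    raise NotImplementedError

prompt = open("spec.md").read()
while True:
    workdir = tempfile.mkdtemp()        # every attempt starts from scratch
    run_agent(prompt, workdir)
    problem = first_divergence(workdir)
    if problem is None:
        break                           # the prompt now one-shots the feature
    prompt += f"\n\nConstraint: {problem}"  # fix the prompt, not the code
    shutil.rmtree(workdir)              # deleting the attempt is the cheap part
```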

nichochar

This was often true when writing code manually, to be fair.

You could get to "something that works" rather fast but it took a long time to 1) evaluate other options (maybe before, maybe after), 2) refine it, 3) test it and build confidence around it.

I think your point stands, but no one really knows exactly where the line is. The next year or so is going to be everyone trying to figure that out (this is also why we hear a lot of "we need to reinvent GitHub").

SV_BubbleTime

When I hire fresh out of college… I can see them coming in without the slightest comprehension of the difference between the things they did in school to get a grade and never touch again, versus a product that is supposed to exist and work for 10+ years.

tyyyy3

The problem with life in general is that the last 5-10% is always the hardest. And in many cases it makes no economic sense to invest in trying to mechanise that last part.

I believe the LLM providers went with the wrong approach from the off: the focus should've been on complementing labour, not displacing it. And I believe they have learned an expensive lesson along the way.

randyrand

I can go a long session with it producing great code.

But the first time I say “No, it should be …” it’s nearly game over. If you say it 3+ times in a row, you’re basically doomed.

Sure, you can get it to fix the bug, but it comes at the cost of future prompts often barely working.

NickNaraghi

Yes! Anthropic team calls this “regenerate, don’t fix.”

The person who builds an agentic IDE or GitHub alternative that natively does the process you describe will be a multibillionaire.

dataviz1000

> https://github.com/adam-s/agent-tuning

Do you want a demo of what this is capable of?

skybrian

I tend to get something working and refactor my way out, which does work and you can use a coding agent to do it, but it takes time. Maybe starting over would have been better, but I didn’t know what I wanted the architecture to look like at the beginning.

deadbabe

That will not work as cleanly as you described once a lot of code has been committed to the code base. You cannot just blow away an entire working code base and start over just because an LLM is struggling to make a feature work with existing architecture.

gck1

This has happened on every single greenfield project I've started with AI, no matter how rigorous a process I had defined.

And it's not just easier because it's cheap; it's easier because you're not emotionally attached to that code. Just let it produce slop, log what worked and what didn't, nuke the project, and start over.

It just gets incredibly boring.

deadbabe

People will get attached to code that works just right and they don’t want to mess with it too much.

kelnos

Yup, the normalization of deviance here is a real thing. I still review all the code the LLM generates (well, really, I have it generate very little code: I use it more for planning, design, rubber-ducking, and helping track down the causes of bugs), but as time goes on without obvious errors, it gets more and more tempting to assume the code is going to be fine, and not look at it too closely.

But resisting that impulse is just another part of being a professional. If your standards involve a certain level of test coverage, but your tests haven't flagged any issues in a long time, you might be tempted to write fewer tests as you continue to write more code. Being a professional means not giving in to that temptation. Keep to your quality standards.

Sure, standards are ultimately somewhat arbitrary, and experience can and should cause you to re-evaluate your standards sometimes to see if they need tweaking. But that should be done dispassionately, not in the middle of rushing to complete a task.

And hell, maybe someday the agents will get so good that our standards suggest that vibe coding is ok, and should be the norm. But you're still the one who's going to be responsible when something breaks.

peterbell_nyc

For me the distinction is the quality and rigor of your pipeline.

Vibe coding: one-shot or few-shot, smoke-test the output, use it until it breaks (or doesn't). Ideal for lightweight PoCs and low-stakes individual, family, or small-team apps.

Agentic engineering:
- You care about a larger subset of concerns, such as functional correctness, performance, infrastructure, resilience/availability, scalability, and maintainability.
- You have a multi-step pipeline for managing the flow of work. Stages might be project intake, project selection, project specification, epic decomposition, story decomposition, coding, documentation, and deployment.
- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of the proposed project, comprehensiveness of the spec, elegance of the code, rigor and simplicity of the ubiquitous language, etc.).

And it's a slider. Sometimes I just throw a ticket into my system because I don't want to do an interview, burn tokens on three rounds of adversarial reviews, estimate potential value, and then write a detailed specification with more adversarial reviews just to ship a feature.
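
To be concrete, a deterministic gate is nothing fancier than this sketch (commands are examples; bench.py is an assumed script):

```python
import subprocess
import sys

# Work only advances to the next pipeline stage if every command exits zero.
gates = [
    ["pytest", "-q"],                          # functional correctness
    ["ruff", "check", "."],                    # lint/style gate
    ["python", "bench.py", "--max-ms", "50"],  # assumed perf benchmark
]

for gate in gates:
    if subprocess.run(gate).returncode != 0:
        sys.exit(f"gate failed: {' '.join(gate)}")
print("all gates passed; on to adversarial review")
```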

Aurornis

If your slider only goes between vibe coding or agentic engineering you're missing an entire range of engineering where the human is more involved.

I've been using Opus, GPT-5.5, and some lesser models on a daily basis, but not having them handle entire tasks for me. Even when I go to significant effort to define and refine specs, they still do a lot of dumb things that I wouldn't allow through human PR review.

It would be really easy to just let it all slide into the codebase if I trusted their output or had built some big agentic pipeline that gave me a false sense of security.

Maybe 10 years from now the situation will be improved, but at the current point in time I think vibe coding and these agentic engineering pipelines are just variations on the same theme of abdicating entirely to the LLM.

This morning I was working on a single file where I thought I could have Opus on Max handle some changes. It was making mistakes or missing things on almost every turn that I had to correct. The code it was proposing would have mostly worked, but was too complicated and regressed some obvious simplifications that I had already coded by hand. Multiply this across thousands of agentic commits and codebases get really bad.

transcriptase

Next time, give it the context required for the task, e.g. an explanation of why you have those hand-coded simplifications, and be amazed at how proper use of a tool works better than assuming your drill knows what size bit to pick.

bryan0

I agree: vibe coding does not have quality-gate checks at each stage, while agentic engineering does. Dev teams get into trouble when they try to build without a proper process of design, tests, and reviews. This was true before agentic coding, but it's especially true now. The teams that understand how to leverage agents in this process are the ones that will be most successful.

ofrzeta

"I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money.” (Simon Willison herein quotes Matthew Yglesias) - this is such a naive and sloppy take. What do you want? "better software"? not going to happen. "cheaper software"? not going to happen either. "more software"? for sure, but is it really what you want?

If I hire a plumber, it's certainly not cheaper than doing it myself, but when I am paying money I want to make sure the quality is better than my own vibe plumbing.

vmaurin

> The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

No, it was never designed around that. None of the software-dev methodologies focus much on writing the code; they focus on everything else: requirements definition, quality, maintenance, speed of integrating features, scaling the work, ...

Personally, with 20 years of experience, I have never seen a single company where writing the code was the bottleneck.

cultofmetatron

Two days ago, we updated a Stripe library, which broke everything. With AI, I was able to one-shot wrapping all of the calls into a shared service, patch the broken API contract across the entire app, and get our signup and payment flows working again. A solid day and a half of work; this would have taken days of back-and-forth debugging previously. AI is not a panacea for everything, but it's doing valuable work right now.
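
The pattern itself is mundane, which is probably why it one-shots well. Roughly (an illustrative sketch, not the actual code; the env var name is assumed):

```python
# payments.py -- the one module the rest of the app is allowed to import.
import os

import stripe

stripe.api_key = os.environ["STRIPE_API_KEY"]  # assumed env var name

def create_payment_intent(amount_cents: int, currency: str = "usd"):
    # When the upstream contract breaks again, this file is the only place
    # that needs patching; call sites never import stripe directly.
    return stripe.PaymentIntent.create(amount=amount_cents, currency=currency)
```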

bamboozled

What does this have to do with the article?

I'd say if you're a semi-competent developer, as probably many people reading the article and commenting already are, this comment adds nothing new to the discussion and would already be a very vanilla usage example of "AI".

I think the point is that while you can "do things" like extracting the Stripe integration out into its own service in ten minutes, you're now stepping into other problems: how do you handle failures, how do you scale the Stripe service, how do you structure all your other microservices so they can communicate in a coherent way. Basically, you're speed-running yourself into harder decisions when using AI.
