K0balt
I independently converged on something similar. I use two to three specification docs for my C++ work: a firmware manual (describes features and interfaces), an implementation plan (order of implementation, mechanisms where specified; new features go in here), and a product manual (user story, external effects). I start with a user story, build an implementation plan, write the code, write the firmware manual, then check the three documents plus the code for consistency and coherence. Either the code or the documentation changes to reflect a coherent unified truth. (The implementation plan gradually becomes as-built.) I also have the code comprehensively commented so that it is difficult to misinterpret. “Correct, coherent, consistent, commented”
We iterate feature by feature through this process, and occasionally circle back on the original product manual to identify drift.
After the original documentation is drafted, I have the agent write up placeholder files and define all of the interfaces we expect to need (we will end up adding a lot later, but that’s ok) every file should reflect a clear separation of concerns, and can only be reached into through its defined interface, all else is private. I end up with more individual files than I would by hand, but by constraining scope at file granularity, and defining an inviolate interface per file, I avoid the LLM tendency to take shortcuts that create unmaintainable code.
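As a sketch of that file-granularity constraint in C++ (the module and all names here are invented for illustration): anything that is not part of the file's defined interface lives in an anonymous namespace, so no other translation unit can reach into it.

```cpp
#include <cstdint>

namespace {  // private to this file: unreachable from any other translation unit
    int16_t lastReading = 0;
    int16_t scaleRaw(int16_t raw) { return static_cast<int16_t>(raw / 4); }
}

namespace sensor_bus {  // the file's one defined, inviolate interface
    void submitRaw(int16_t raw) { lastReading = scaleRaw(raw); }
    int16_t readTemperature() { return lastReading; }
}
```

Other files can only call `sensor_bus::submitRaw` and `sensor_bus::readTemperature`; the scaling helper and the stored state cannot be reached as a shortcut.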
I also open each new context with an onboarding process that briefly describes the logos and the ethos of the project, why the agent should be deeply invested in the success of the project, as well as learnings.md which the agent writes as it comes across notable gotchas or strong preferences of mine.
Needless to say, I use the one-million-token context, and it’s a token fire… but the results are solid and my productivity is 5-10x.
jacquesm
You have rediscovered the job of Software Analyst, which until the early 90s was a thing. Then that all got upended and we ended up with a mix of product owners, project managers, and developers/devops, but I think that ignores the fact that Analyst is a different set of skills.
ebiester
There is a lot of room to reevaluate the lessons of software development pre-web in the context of the current environment.
Like, if a waterfall project can be done in 2 weeks, is it agile now?
mpyne
> Like, if a waterfall project can be done in 2 weeks, is it agile now?
Sure. The thing is, the waterfall guys would tell you it's impossible to do it in 2 weeks because you need to have written down everything first. "Thousands of pages" was the term they used.
Agile guys would point you to the Agile manifesto which would lead you to "working code over documentation" and "people over process".
A 2 week period to go from initial spec to product in a user's hands to capture feedback and make changes from there is much closer to agile than to waterfall. In fact it's more or less exactly some older versions of Scrum (which didn't permit deviating from the planned sprint user stories midway through the sprint, instead changes influenced the subsequent sprint).
jacquesm
That doesn't do justice to either waterfall or agile.
moffers
I came up as a software requirements analyst before the weird transition from business analyst to product owner to product manager to technical product manager. Living in requirements for 15+ years really gave me a leg up on these “let’s go back to requirements!” efforts.
jacquesm
It always amazes me how bad the software world is at keeping lessons learned as learned, especially when compared to say engineering. It's as if every 20 years or so we throw away the books and reinvent it all from first principles, hopefully this time with fewer mistakes overall but usually we end up finding both new ones and re-do some old ones.
user3939382
Or, referring more to the process of building the specs: requirements engineering. IMHO agile became a way of hand-waving away most of this critical process and responsibility, replacing it with a new, inefficient, and ill-defined process.
Yes, you don’t know the nuances of all specs upfront and revision will be necessary. But turning the ship with arbitrary degrees of freedom, beyond bullet points on a roadmap, is not an efficient way to resolve that for many projects.
Twey
The traditional name for this spec is ‘source code’ — a canonical source of truth for the behaviour of a system that is as human-readable as we know how to make it, that will be processed by automated tools into a less-readable derived artefact for a computer to execute.
Checking the compiled artefact into the codebase without checking in its source code has always been a risky move!
benterix
A specification, whether formal or less formal, is very different from the source code.
arikrahman
I agree with you; I think the replies are misunderstanding the basis for code and specs and making semantic distinctions. Code is specs, just in a different syntax for machines to understand. This is a pillar of the discipline of requirement specifications that Uncle Bob talks about in Clean Agile.
sltr
A spec isn't code. There's a C language specification and many implementations. There are a handful of browsers each implementing the HTML, JS, and CSS specs in their own way.
Twey
And given a C description of a program, a C runtime can implement that program in various different ways — interpreted vs compiled, explicit memory management vs garbage collection, different pointer sizes and memory layouts, parallelism at various points or not. It's turtles all the way down :) It just becomes ‘code’ at the point where a computer can execute it (in one way or another) without further human intervention.
DeathArrow
>The traditional name for this spec is ‘source code’
Specs are the end goal, not how the software looks at a moment in time.
Twey
Specs also evolve over time. There's no ‘end goal’ because requirements are always changing.
Specs are traditionally more forward-looking only because, by removing a lot of the implementation details that are required to write code, the specification can be written to be much broader in scope than code in an equivalent time period. But periodically we invent software that lets us automatically fill in more details of the software that now don't need to be specified by humans, and a level of specification that was previously ‘spec’ turns into ‘code’.
fragmede
Technology evolves and traditions change. What persists is the role, not the filename and its extension. Weddings are still weddings even after things went from painted portraits to film cameras to camcorders to smartphones to livestreams. Same with birthdays. Cards became phone calls, Facebook wall posts, group chats, shared albums, or generated videos (Sora, RIP).
The tradition of a deck of punch cards evolved into assembly, then Pascal, Fortran, C, BASIC. The important part is a human-auditable directive, not an opaque, generated artifact as the thing that matters.
Yokohiii
> The important part is a human-auditable directive, not an opaque, generated artifact as the thing that matters.
Your argument creates a false dichotomy. You look at it from the consumer's perspective, while coding and its artifacts are usually done by suppliers. If you change camcorder to TV advertisement, the requirements shift. The human-auditable directive and the outcome both matter. Coca-Cola probably has very high standards for their IP (the directive) and doesn't care about the outcome (AI slop ads). The result is disgruntled consumers.
If you don't care about the "opaque" generated artifact, then you are Coca Cola.
beshrkayali
I wrote something similar recently about how agent-generated code lacks the institutional memory that human-written code has. There's nobody to ask why a decision was made (1).
“Specsmaxxing” is basically the right response to this. When you can't rely on authorial memory, you have to put the intent somewhere durable. Specs become the source of truth by default if we continue down the road of AI generated code.
1: https://ossature.dev/blog/ai-generated-code-has-no-author/
bizzletk
I've been attaching to my commit messages a Git trailer [1] containing the session UUID from the Claude Code conversation that created that commit.
It allows Claude to look back into the session where a change was made and see the decisions made, tradeoffs discussed, and other history not captured by code or tests.
chickensong
A few questions:
- Does Claude leverage the trailers automatically, or is usage initiated by you?
- How often are you using the trailer lookups?
- Any idea how this relates to token usage? If you're frequently busting cache on old sessions, it might be cheaper to read a local doc.
nicbou
I had a similar experience refactoring a large codebase. The only thing that made it possible was that each commit message had a JIRA ticket number tying it to a requirement or task. I could find the people behind the business logic and ask them about it.
try-working
the recursive-mode workflow has full traceability, including why decisions were made, what the original requirement was, what the previous state was, etc. https://recursive-mode.dev/introduction
rogermarley
This ultimately converges on what source code is though.
The most common form of what you'd call a "spec" is the acceptance criteria on a work ticket, which is an accretive spec, i.e. a description of desired change: "given what already exists, change it as follows". If you somehow layered and summarized and condensed all the tickets that have been made since the product started, you'd have your "spec".
But it's the devs who were doing that condensing via understanding each desired spec addition vs reality of existing codebase.
So the gap between what people are currently calling "specs" and what the code was already doing is not big and will not stay big, except that you're effectively adding another (quasi) compile step underneath - and in this case it's a non-deterministic one.
hansmayer
> will always care what the spec says, and that’s never going to change
Did I miss something or is everyone back in 1970s, working in waterfall processes now?
scandox
All through the agile era I wrote detailed specs for projects and then followed an agile process. The most successful parts of every project were the ones that we were able to spec best even when they diverged significantly from the original spec.
You don't plan to follow the plan. You plan in order to understand the whole problem space. Obviously no plan survives contact with reality.
Yokohiii
> You plan in order to understand the whole problem space.
I like to do spikes to understand problem spaces before planning. The planning is then usually effortless and just to get in sync with stakeholders.
But in that regard AI coding is really backwards. We don't necessarily need a hard separation of planning and coding, but we need a deliberate separation between experimental/explorative coding and the code that is supposed to make it into prod. AI coding does all of that in the same place; I don't even want to know how hard it is to "fix" AI code that started from a completely wrong premise. AIs certainly don't have a good sense of when to refactor something completely messed up.
fsloth
Agree!
Another point of view is that LLMs perform, to an extent, on the same level as outsourcing does. This interface requires a bit more contract mass than doing everything within a single team.
chrisweekly
"Plans are worthless, but planning is essential."
hypendev
We never left waterfall in the end. Having worked with and for dozens of software companies, and collaborated with probably a hundred at different scales, every single one said:
We do agile
Guess what? Every single one of them was doing waterfall.
Their agile included pre-planning and pre-specifying the full spec and each task before the project kicked off. We'd have meetings where we'd drill down into tasks; folks would write them down in such detail that there was no other way than doing exactly that. Agile would be claimed, but the start date, end date, final spec, and number of developers were always concrete.
Sometimes, the end date was too late, so a panic would ensue. Most of the time, the date was too late because developers had "unknowns" which then had to be "drilled down and specced so they wouldn't be unknowns". Sometimes, nearly 50% of the workweek was spent in meetings.
A few times, a project was running late, so to make sure we were _really_ doing it agile, we'd have morning standups, evening standups, weekly plannings, retrospectives, and backlog refinement. It wasted time, and the "unknowns" aka "tickets to refine" were again, as always, dependent upon the PM/PO/CEO's wishes, which wouldn't get crystallized until the _really last minute_.
One customer wanted us to do a 2 year agile plan on building their product. We had gigantic calls with 20+ people in them, out of which at least half had some kind of "Agile SCRUM Level 3 Black belt Jirajitsu" certificates.
To them, Agile was just a thing you say before you plan things. Agile was just an excuse to deal with project being late by pinning it on Agile. Agile was just a cop out of "PM didn't know what to do here so he didnt write anything down". Agile was a "we are modern and cool" sticker for a company.
And unfortunately, to most of them, agile was just a thing you say for the job. Their minds worked in waterfall mode, their obligations worked in waterfall mode, their companies worked in waterfall mode, and if they failed their obligation to the waterfall, their job would be the next thing to go.
So while we were doing the Agile ceremonies, prancing around with our Scrum master hats, using the right words to fit into the Agile™ worldview - we were doing waterfall all along.
And after 15 years, I'm not even sure - did agile really ever exist?
OHfsfuiohsef
Continuous integration and demos to stakeholders (devs, designers, product managers, etc.) every 2 weeks: these practices are now ingrained :-) Corrections after these demos are frequent, and that really helps ensure the product manager is getting what their customers need.
It's easy to forget that waterfall in the 1970s/80s really meant teams working on their own for months and then realizing there was no way to assemble the whole product from the parts. Or that the industry had moved on and the product was obsolete.
Agile as "devs can do what they want" never really existed ;-) Managers always have to plan / T-Shirt size resources (time, devs) to some degree. For stuff that's really hard to break into tasks, the magic word is "the plan is to do a POC first".
Coming from someone who also doesn't like teams being asked to break their unknowns into 30 known tasks: it's a compromise... I agree with all your points on how Agile is abused/misunderstood. Yet I believe the progress from continuous integration and regular demos to stakeholders is a sign we did change something.
alemanek
Specs weren’t the problem with waterfall. The difficulty in changing them to match reality was.
The waterfall process I experienced went like this:
- Product folks created requirements
- Architects produced detailed specs
- Project managers created tickets based on the specs
- Lengthy estimation ensued
- Developers finally proceeded with implementation
- QA tested it
Each step above involved lengthy review with 5-10 people. If the devs found an issue with the spec, or god forbid the requirements, it triggered a massive cascade of work for everyone above. Things needed to be reviewed again, customers might need to be contacted, etc.
I think we can learn from that and optimize for change. Specs as living documents close to the code should be less cumbersome. But, just like anything else, large corporations will probably fumble this like they did with “agile” (SAFe, I am looking at you).
This is a long way to say specs aren’t bad. Specs that are difficult to change are though.
wuiheerfoj
Sort of, but the downside of waterfall was that you built the wrong thing and wasted a shitload of time rewriting it.
When rewriting the entire codebase is very quick and cheap, why bother iterating on small components?
lnrd
> When rewriting the entire codebase is very quick and cheap, why bother iterating on small components?
We are nowhere near this scenario, tbh. Token cost is very high and is currently heavily subsidized by VC money to gain market share. Also, this realistically only applies to small projects and codebases, mostly greenfield ones. There is no way you can rewrite a whole codebase quickly and cheaply in any mid-sized-plus project.
But even assuming token cost plummets, any non-trivial piece of software that is valuable enough to generate income for the company is also big, complex, interconnected enough that cannot be rewritten quickly even by AI, also for business reasons too. If a piece of code works, is stable and is tested, then rewriting it will always bring a high degree of risk and uncertainty that in a lot of business critical applications is just not worth it. A stable system can stay untouched for years besides minor dependencies updates.
0123456789ABCDE
waterfall is not the sole purveyor of written docs
distributed teams do well when proposals, decision, etc, are written down, and can be easily found and referenced
it doesn't mean docs are frozen in time and can't be patched like code
nlnn
I read that as "the business caring about what the spec says will never change" rather than "the spec will never change".
JohnHaugeland
waterfall doesn’t mean writing down decisions
nalpha
What's the difference between this and Jira? Your specs already live somewhere: wherever you defined them. That's why it's nice to put the Jira ticket number in your code/commits, so you can refer back to the spec when something breaks.
mike_hearn
A specification isn't a series of change requests! Using Jira as your source of truth is no different to just recording all your prompts. There's nothing you can easily review to spot contradictions or how things interact with one another.
I've been doing "specmaxxing" for a few months now. Unlike the author, I don't use YAML; I use a mix of Markdown and Gherkin. If you haven't encountered Gherkin before, it's not new and you might know it under the name Cucumber or BDD.
Gherkin is basically a structured form of English that can be fed into a unit testing framework to match against methods.
The nice thing about writing acceptance criteria this way is that they become executable and analyzable. You write some Gherkin and then ask the model to make the tests execute and pass. Now in a good IDE (IntelliJ has good support) you can run the acceptance criteria to ensure they pass, navigate from any specific acceptance criteria to the code which tests it (and from there to the code that implements it), you can generate reports, integrate it into CI and so on.
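For anyone who hasn't seen it, a minimal sketch of what a Gherkin acceptance criterion looks like (the feature and steps here are invented, not from my project):

```gherkin
Feature: Order cancellation

  Scenario: Cancelling an unshipped order refunds the customer
    Given a paid order that has not yet shipped
    When the customer cancels the order
    Then the order status becomes "cancelled"
    And the payment is refunded in full
```

Each `Given`/`When`/`Then` line is matched by the Cucumber-style framework to a step-definition method in your test code, which is what makes the spec executable.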
And when writing out acceptance tests that are quite similar, the IDE will help you with features like auto-complete. But if you need something that isn't implemented in the test-side code yet, no big deal. Just write it anyway and the model will write the mapping code.
There's a variant of Gherkin specifically designed for writing UI tests for web apps that also looks quite interesting. And because it's an old ecosystem there's lots of tooling around it.
Another thing I've found works well is asking the models to review every spec simultaneously and find contradictions. I've built myself a tool that does this and highlights the problems as errors in IntelliJ, like compiler errors. So I can click a button in the toolbar and then navigate between paragraphs that contradict each other. It's like a word processor but for writing specs.
Once you're doing spec driven development, you don't need to write prompts anymore. Every prompt can just be "Update the code and tests to match the changes to the specs."
eterps
I agree, Cucumber works really well with LLMs.
> I use a mix of Markdown and Gherkin
Gherkin also has a Markdown based syntax that is not well known:
https://github.com/cucumber/gherkin/blob/main/MARKDOWN_WITH_...
I prefer that to the 'verbose' original syntax. MDG also renders nicely in code forges.
MoreQARespect
The problem with gherkin is that it was a badly designed language.
The general idea of "readable specification language" was an inspired one but it failed on execution - it has gnarly syntax, no typing and bad abstractions.
This results in poor tests which are hard to maintain and diverge between being either too repetitive to be useful or too vague to be useful.
The ecosystem is big but it's built on crumbling foundations which is why when most people used it most of them got frustrated and gave up on it.
Annoyingly there's a certain amount of gaslighting around it too ("it didnt work for you coz you werent using it correctly") which is eleven different kinds of wrong.
try-working
I solved this five months ago with recursive-mode: recursive-mode.dev/introduction
cowanon77
Jira is only a set of changes though. What happens on a long (10+ year) and complex (10+ developer) project with many changes and revisions? Eventually you need an explicit specification that itself has a "current state" and a change log. Theoretically you could generate this from Jira, but in my experience it eventually became a mess on any larger project that didn't have explicit, maintained, written requirements.
foobarbecue
Jira has current state and a change log. The proposal here is "use yaml instead of jira." Same damn thing, same damn mess.
Diti
What about when you migrate away from Jira, or when there’s a Cloudflare outage?
foobarbecue
1) export 2) backup
gnat
Nice! Your spec-maxxing is very resonant. I've been working with explicit requirements: eliciting them from conversation with me or by introspecting another piece of software; one-shotting from them; and keeping them up to date as I do the "old man shouts at Claude" iterations after whatever the one-shotting came up with.
Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.
It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.
gnat
Forgot to add: I get several benefits from doing this.
1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.
2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.
I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.
try-working
Specifications don't tell you what to do; they say what the end state should be. In between, you need a codebase analysis step and an implementation plan.
My recursive-mode workflow handles all of that and more and gives you full traceability: https://recursive-mode.dev/introduction
energy123
I do (1) the same but (2) differently. In my workflow, (2) are AI generated specs using human written (1) as the input. It's an intermediate stage between (1) and the codebase, allowing for a gradual token expansion from 30k to 250k to the final code which is 2-3M. The benefit I've found with this approach is it gives the AI a way to iterate on the details of whole system in one context window, whereas fitting the whole codebase into one prompt is impossible. The code is then nothing more than a style transfer from (2).
chrisldgk
At this point, why not just write the code yourself? Defining exactly what the product is supposed to do is the hard part, writing code is the easy part. Write your specs as code and you have your product - why let your LLM do the fun part?
energy123
I do this because I'm wagering that LLMs will keep getting better. I'm wagering that specs will maintain value while code will degrade in value (become commoditized).
Code lacks the surrounding theory that situates the code in the world [1]. My specs contain the theory that the code lacks, which makes specs more valuable in the future. Specs are proprietary data. Data holds value in a post-AGI world, not code.
I am defining specs to be more than just an architectural spec, to me it's more like I'm writing a booklet about a subject, and I'm using it to teach the LLM via in-context learning. It might need a different word than "specs".
Stehfyn
Piggybacking here; I'd describe it like a fish ladder. Instead of "teach" I'd say "orient." LLMs are a force whose magnitude is undeniable and increasing, but it's up to us humans to provide the theory that exerts the magnetic forces to naturally encourage them in the right directions.
joshribakoff
> I’m wagering
So isn’t that gambling, not engineering?
Stehfyn
It's a reductive inverse corollary, but highly skilled Blackjack players are known to hesitate hitting on 18
Stehfyn
Just because I am capable of "writing all that code", doesn't make the option preferable to defining a vast majority of spec up front and having an LLM generate an implementation. I am already going to spend the brain power on reviewing the code. I am already going to spend the brain power on pontificating edge cases, external module interactions, and next steps. Why not fast forward to that point and save 80% of the time (and brain power/attention/motivation to boot)?
moregrist
If you can define the spec up front, this is probably true.
For anything large, the spec becomes increasingly more complicated. Look at software schedules in the old waterfall days of the 80s/90s: the spec / planning period was maybe 30-70% of the project.
Unless you’re working on pretty routine stuff, the real problem is that the customer (which might be you) almost never knows what they want. The spec will change the minute a customer gets something to play with.
This was the real value of agile in my mind: letting a customer change their mind as early as possible.
lelanthran
> I am already going to spend the brain power on reviewing the code.
Very few devs are actually reviewing any generated code.
> Why not fast forward to that point and save 80% of the time
If you are saving 80% of time, you aren't actually reviewing the code.
Stehfyn
I use a multimodal approach to defining my spec: different layers of criteria for how the software looks, behaves, what it produces, and under what constraints.
For the literal code:
• A healthy cocktail of /WX + /Wall, plus clang-tidy with very few suppressions
• An extremely opinionated mix of clang-format and LLM-generated bespoke formatting that AST-based tools can't express
• Hungarian notation; all stack locals pre-hoisted, declared in order of appearance, and separated from subsequent assignments
• Enforced dataflow: all memory accesses are bounded independent of branch resolution, with only data-oblivious indexing
• Functions have a single point of return
In a C89 workflow, this pushes agents to produce code where wrong business/domain decisions are unmistakably obvious, while eliminating the vast majority of bug classes before I ever read it.
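As an illustrative sketch of what those constraints produce (the function, prefixes, and table are invented for this example), here is a C89 function with Hungarian notation, hoisted locals, a bounded data-oblivious access, and a single return point:

```c
#include <stddef.h>

/* Look up a table entry with the index bounded no matter what value
   arrives, so no branch outcome can cause an out-of-range access. */
static int iClampedLookup(const int *piTable, size_t cTable, size_t iIndex)
{
    size_t iSafe;   /* locals hoisted, declared in order of appearance */
    int    iResult;

    iSafe   = iIndex % cTable;  /* data-oblivious bound; assumes cTable > 0 */
    iResult = piTable[iSafe];

    return iResult;             /* single point of return */
}
```

The point isn't that any one rule matters; it's that together they leave so little stylistic freedom that a wrong domain decision is the only thing left to notice in review.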
So yeah, I'll reassert 80%, if not more.
surgical_fire
> Very few devs are actually reviewing any generated code.
Just because very few devs are qualified at doing their fucking job, it doesn't make someone trying to use AI properly wrong.
> If you are saving 80% of time, you aren't actually reviewing the code.
The idea is that if you spend time in specification ahead of time, reviewing and validating will be easier and less time consuming later.
I haven't tried it myself, but the idea rings true to me.
codebolt
I'm with you all the way here. I derive zero pleasure from simply typing out the code once the spec is clear. Having a fast forward button to skip that phase is a pure win in my book.
honr
I do get pleasure from typing out the code in some languages (and not in others; hello JavaScript, Java!). Similarly, I love writing text with a calligraphy or fountain pen. However, I can't dedicate too much of the work/business time to whatever is more pleasurable.
So, I "doodle" some text/ideas/planning with a calligraphy pen, and type in some code, occasionally, both mainly for the fun of it. There are side benefits to both, too. Writing some plans slowly and "beautifully" drags them out and I get to think longer on them, so the sporadic "nice looking plans" are often better thought out. And doing the coding all by myself stops my brain from losing the ability. I was initially in the 100% AI-writes-all-code camp for a while and noticed I was getting notably slow in some personal coding skills. It is too early to treat specs as the new code and old languages as assembly (but I admit we might get there some day).
In other words, I think AI doing 90-99% of the coding, depending on the language verbosity and AI accuracy for the code at hand, is quite reasonable.
jaredcwhite
100% the opposite here. I derive all the pleasure from writing code, which is why I'm still writing code.
sigmonsays
A spec can be wrong until you prove it is right.
BoredPositron
"Not a developer" would come to mind.
enraged_camel
>> Defining exactly what the product is supposed to do is the hard part, writing code is the easy part.
There is a massive difference between a spec, which defines what the product should do, and code, which defines exactly how it should do it. Moving from the former to the latter is not "the easy part". Anyone who genuinely believes that either works on easy and straightforward problems, or is some sort of programming god. Because translating specs to code can still be difficult and exhausting.
maxnevermind
> Defining exactly what the product is supposed to do is the hard part, writing code is the easy part.
> There is a massive difference between a spec, which defines what the product should do, and code, which defines exactly how it should do it.
He states: the difficult part is figuring out the details, so an LLM doesn't save much time. You state: if an LLM is able to correctly assume the details, that saves you a lot of time.
Case 1: Part of the spec describes some basic feature based on a popular framework and industry standards; everything is trivial. You are right, he is wrong.
Case 2: Part of the spec describes some niche feature, and/or uses an unpopular framework, and/or requires deviation from industry standards, and/or has cutting-edge performance/latency requirements, and/or uses a bunch of proprietary, non-googlable data. You are wrong, he is right.
The more senior engineer are the less time they spend on case 1, those are easy, they don't spend much time on it, it is the 2nd which is much more time consuming.
jFriedensreich
Where is the part where the author overcomes ai psychosis? Reads like digging in deeper and deeper.
brendanmc6
Fair, I could have made that point clearer. It's a couple things. First is that I finally stopped experimenting with TUIs, harnesses, models, subagents, roles, skills, mcp, md libraries etc. and have mostly settled on this approach, and got back to building other things with it. I'm sure that won't last forever though.
Second is that I'm doing a lot less "seat of my pants prompting" and doing more engineering and ideating, which was a big goal of mine. So I'm feeling less psychotic there too.
And sort of tangentially to that, I think a significant subset of devs actually are willing to just prompt their way to nirvana, day in and day out. I'm not. I think the spec will carry a lot of weight for a long time. Maybe they will get further than I give them credit for? Maybe the whole digital world becomes a single chat box?
dgellow
I don’t understand how that relates to AI psychosis?
brendanmc6
I guess I misappropriated the term then, whoops. AI OCD? AI obsession? Whatever you call the behavior that I saw myself and others falling into. Getting obnoxiously fixated on the tooling and the models to a counterproductive degree.
khalic
Some people seem to give very little thought to semantics and semiotics lately, to the point where people use words vaguely without even looking them up.
aliceryhl
This is not psychosis.
wiseowise
That’s the best part: you don’t. “You would extend the prompt to improve it”. They’ll just ask Claude to write an AI tool to overcome psychosis (the program will spam Anthropic servers with racial slurs which will promptly cause ban of the user, success).
stevefan1999
So... is this just Cucumber cough cough behavior-driven development again, but stored in YAML so that LLMs can read it more easily by loading the AST instead of tokenizing the text?
cube2222
Love the writing style!
> Nothing beats an organic, pasture-raised, hand-written spec.
Hah, I strongly empathize with the wording. I’ve been starting my design docs for fellow humans with “100% hand-written, organic content”, I might steal a part of yours.
Overall, cool idea. I don’t see myself using your SaaS, but the approach of tagging the requirements and constraints to make them easier to find sounds good.
One project you didn’t mention which is also, I think, a cool perspective on this is codespeak.dev, but I haven’t given it a go yet.
All in all, I feel like maintaining specs, and having agents translate spec diffs into code diffs is a promising area for the future. Good thing I enjoy writing!
hintymad
Maybe it's just me, but isn't it exhausting that we have to do all kinds of work like a shaman to combat the probabilistic nature of LLMs, while knowing from the bottom of our hearts that the LLM can still screw up the code in the most creative ways?
akomtu
AI makes us believe that instead of working towards a goal, one can "win" that goal with a lucky prompt. AI replaces thinking with gambling, in other words, and it's very tempting to many.
jeffreygoesto
Old is new again, I guess. This is independent of whether A"I" or a human executes; the point is that you need this whenever specification and execution lie apart, be it in time or space. This is basically the whole point of the V-model and its processes (if used correctly as tools and not pursued as goals), and it was already researched and formalized in the 60s and 70s.
arikrahman
I use OpenSpec for my spec management, and I scrolled down to the comparison. The gripe seems to be with a semantic difference. Specs describing a current system are the basis for AS/IS gap analysis.
Also, I mainly pursue these tools so that I can have AI accelerate this process and broker an agreement after negotiating specs with the agent.
jochem9
I've also been using OpenSpec for a few months now, and it's really good if you invest enough in the specs (in the beginning I skimmed over a lot; now I pay attention to all the details and fix anything that's wrong or where I see a gap).
The one thing I like that OP brings is tying specs and code together. The openspec flow does help a lot in keeping code synced with specs, but when a spec changes, the AI needs to find the relevant code to change. It's pretty easy to miss something in a large codebase (especially when there is lots of legacy stuff).
Being able to search for numbered spec tags to find relevant bits of code makes it much more likely to find what needs to be changed (and probably with less token use too).
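A rough sketch of how such numbered tags could work: spec IDs embedded in code comments, and a small scanner mapping each ID to the lines that reference it. The `[SPEC-nnn]` tag format and the scanner below are hypothetical, not the tool discussed in the article.

```python
import re

# Hypothetical convention: each acceptance criterion has an ID like SPEC-042,
# and code implementing it carries that ID in a nearby comment.
TAG = re.compile(r"\[SPEC-(\d+)\]")

def find_spec_references(source: str) -> dict[str, list[int]]:
    """Map each spec ID to the line numbers that mention it."""
    refs: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TAG.finditer(line):
            refs.setdefault(match.group(1), []).append(lineno)
    return refs

code = """\
def checkout(cart):  # [SPEC-042] totals include tax
    ...
def empty_cart(cart):  # [SPEC-042] [SPEC-107]
    ...
"""
print(find_spec_references(code))  # {'042': [1, 3], '107': [3]}
```

When a spec item changes, grepping for its ID gives a much smaller candidate set than asking an agent to rediscover the relevant code from scratch.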
alasano
I enjoy the OpenSpec format but I think maintaining the main specs is not worth it.
I've stopped doing it entirely and just archive directly after implementation.
When you do the sync process, it just keeps drifting and drifting until you have duplication and contradictions across specs.
I agree that tying the specs and code together helps for that but it still seems like extra overhead, even if the value is better justified here.
energy123
I can see one benefit to a structured yaml for specs like the OP is doing: it gives you more control over what you include in the context window. But coming up with a good schema that doesn't handicap you or add cognitive burden, compared to the freeform flexibility of md/txt, is a challenge.
arikrahman
If the selling point is a new file format for spec management, it would be more interesting to provide an offering with org-mode. The author admits they were unaware of other pre-existing solutions before this project so I am providing context to their critique of OpenSpec.
jwpapi
And once you’ve written all these specs, you realize it became so slow that it’s faster to do it yourself in the editor.
girvo
People don’t actually track wall clock time, I’ve noticed.
oblio
Nope, they never do.
That's how you end up with those cooking recipes that only "take 5 minutes". Sure, if you don't count buying all the ingredients, cleaning and preparing them, cleaning up the pots and pans (and probably the worktop, stove, etc.), a lot of things can take 5 minutes. Even trivial stuff like scrambled eggs doesn't actually take 5 minutes when you take everything into account.
Reminds me of when I automated a manual service deployment that only "took 5 minutes". Sure, copying the binaries only took 5 minutes, but coordinating between various departments to deactivate the relevant monitoring bits, turn off the services, invalidate the caches, etc., actually took half a day with humans involved. Once automated and parallelized, the thing took about 10 minutes for a data center.
wiseowise
But have you thought about the “fun factor”? It’s where you sit like an addict in a casino for weeks and burn tokens in the hope of winning software that you could’ve written yourself. Who doesn’t consider it “fun” to think about work crap all the time, write to your agent, and verify walls of slop?
motoxpro
at which point you realize you never had a plan written down and you are using the code as a spec
jbjbjbjb
Which takes us back to this:
https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...
smokel
Quite a few people here dismiss requirements engineering as something non-agile and ancient.
If you are in this camp, consider educating yourself a bit on the V-model [1] and notice that this is not only used in the waterfall model, but that it is a way to decompose problems and verify that everything works properly.
This may not be required for a small hobby project, but if you start working at something with multiple companies in various technologies, it soon becomes extremely useful.
ffsm8
Why is the vibecoding crowd still holding onto the idea that markdown (or here, YAML) is a better spec than code?
Seriously, it's just not.
Write your code like it's your spec and your software will be more stable, more maintainable, and clearer to read.
Code is not transient; it is your friggin spec itself.
And if your code isn't structured like it's a spec, then your code is garbage from the perspective of LLM-driven development.
brendanmc6
I think you are confusing the spec as "this is how it must be built", as opposed to, "this is what the software must do and must not do to be acceptable".
To me saying "the code is the spec" is like saying "the business wants it this way because that's how the code is written". Which is obviously backwards.
Does the business mandate we use a cache for this hot path? No, but the business set performance targets, and the cache was a sensible way to satisfy them. See the difference?
I believe that the 'musts' and 'must nots' deserve special attention, and need to be recorded well before I decide on the 'how'. Every team does this differently. I find that writing itemized, functional acceptance criteria is a practical way to marry the two domains. I also think the process matters a lot more now, because the temptation to let an agent ship it is increasing and the tedium of maintaining these specs is decreasing.
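One way to picture the must/how split in the cache example: the acceptance criterion can be expressed as a runnable check, while the cache stays an implementation choice. A minimal sketch with an invented `expensive_lookup` function; the criterion and names are hypothetical, not from the article.

```python
from functools import lru_cache

# Hypothetical acceptance criterion: "repeated lookups of the same key must
# not repeat the expensive computation." The spec doesn't mandate a cache;
# an lru_cache is just one sensible way to satisfy it.
calls = 0

@lru_cache(maxsize=None)
def expensive_lookup(key: str) -> str:
    global calls
    calls += 1          # stand-in for a slow backend call
    return key.upper()

expensive_lookup("a")
expensive_lookup("a")   # served from cache; no second backend call
expensive_lookup("b")
assert calls == 2       # criterion holds: "a" was only computed once
```

Swap the cache for memoized storage, precomputation, or a faster backend and the check above still expresses what the business actually asked for.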
kibwen
> Does the business mandate we use a cache for this hot path? No, but the business set performance targets, and the cache was a sensible way to satisfy them. See the difference?
This seems confused. Specs are free to include as much or as little detail as they deem necessary. If a spec only wants to suggest vague performance goals and handwave the details, that's permitted. But if specs want to specify the exact means by which performance will be guaranteed, that's also permitted. And this isn't an anti-pattern, this is often very useful. For example, plenty of APIs in the real world specify algorithmic upper bounds for time and space consumption, which is useful in that they allow downstream consumers to have a greater understanding of what sort of performance their own systems will exhibit despite the API itself being a black box in other respects.
So the answer to the original question definitely isn't "no", it's "maybe, depending on the sort of guarantees we want to provide to our users".
brendanmc6
I think we are actually saying the same thing! We could think of situations where the cache would be verboten (sensitive info), or where it would be mandatory, like in your example, or optional like in my example.
My aim was to voice disagreement with the "code is spec" crowd, who I think are using a different (and in my opinion tautological / useless / counterproductive) definition of spec. Probably because they are mad that I use trigger words like Vibe and Maxxing, and they assume I can't even read the code I'm shipping. I digress.
In your "time complexity is a downstream requirement" example, which is a great one, I think you would prefer to have well-maintained written documentation of that criterion that lives outside of the procedural code itself, would you not? How much attention that doc gets is a matter of process and preference, but I'm advocating it should get more (spec-first).
lelanthran
> I think you are confusing the spec as "this is how it must be built", as opposed to, "this is what the software must do and must not do to be acceptable".
You can't enforce a "do not do this" on an LLM. Just putting it in the context by saying "don't do this" makes it more likely that it will eventually do that.
jychang
Yes, I agree. If you tell humans "do not think of pink elephants", they are more likely to think about pink elephants.
Therefore, you must not use humans for any important work.
mpalmer
> I think you are confusing the spec as "this is how it must be built", as opposed to, "this is what the software must do and must not do to be acceptable".
You are confusing code with application code. The latter thing you describe is a test, which is expressible in code.
locknitpicker
> To me saying "the code is the spec" is like saying "the business wants it this way because that's how the code is written". Which is obviously backwards.
Not only is it backwards, it is a belief that is completely wrong and detached from reality. More often than not, implementations diverge from business requirements, both in terms of bugs and gotchas.
Also, it's laughable how code is depicted as the realization of any spec when the whole software development sector is organized around processes that amount to improvising solutions in short iterations.
brap
Software engineering is different from other engineering disciplines in that the most explicit spec of the thing you’re building is the actual thing itself.
When you want to build a bridge you finalize all the blueprints and then someone goes and actually pours concrete, in software the blueprint is the code, and the code is also the bridge.
However there are different levels of abstraction for writing specs and code is just the most explicit form. With LLMs more of our time can be spent in those higher levels of abstraction and free us from work that is often repetitive and mundane.
I think the (distant) future of software engineering is not code writing but mostly requirements writing, and so it makes sense to build frameworks, “IDEs”, etc. around this new form of “programming”.
I don’t know if ACAI is the right one but the direction is interesting.
oblio
> When you want to build a bridge you finalize all the blueprints and then someone goes and actually pours concrete
Construction has plans "as designed" and "as built".
anonzzzies
Most 'programmers' cannot read or write code very well (or reason about structure or architecture) and so they want to 'program in english'.
musebox35
Not all parts of the code are equal in this respect. Those parts pertaining to the user-visible portion (API of a library, command args of a CLI, UI of a GUI/TUI app, endpoints in a web service, etc.) are closely related to the spec. The rest is more fluid as long as it does not change user-visible behavior. The choices still affect maintenance and debugging costs, so there is some pressure not to YOLO these portions. I think the most difficult design decisions relate to how to separate the two and how to ensure a smooth evolution of both user-facing and programmer-facing design decisions.
What is different now is that maintainability and debugging design decisions were made with human coders or teams in mind in the past, which is not necessarily the case anymore. Should we just specify the API and let agents figure out the rest, or do we still want to control the rest to ensure maintainability and security? A year ago I definitely thought we should. Now it is more murky, as the agents are faster browsers of codebases and can explore runtime effects faster than I can type and parse output. The strongest empirical observations depend on runtime behavior, so they have an edge there.
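The API-versus-internals split described above can be sketched in a few lines. This is a minimal illustration with invented names (`parse_config`, `_pairs`), not code from any project in the thread: the public function is the spec-adjacent surface, and the underscored helper is the fluid part an agent (or human) is free to restructure.

```python
__all__ = ["parse_config"]  # the only spec-relevant, user-visible entry point

def parse_config(text: str) -> dict:
    """Public API: the behavior here is what the spec pins down."""
    return {k.strip(): v.strip() for k, v in _pairs(text)}

def _pairs(text):
    """Private helper: implementation detail, free to change at any time."""
    for line in text.splitlines():
        if "=" in line:
            yield line.split("=", 1)
```

Only `parse_config`'s observable behavior needs spec coverage; `_pairs` can be rewritten wholesale without touching the spec, as long as the public contract still holds.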
smokel
No, code is not the same as a specification. There are a lot of accidental implementation details that the customer may not require. By making the code the specification, you end up with those details becoming requirements, and you can then no longer change that implementation.
Having functional behavior and implementation details separate can be really useful, even though it is typically a pain to keep them in sync.
ffsm8
Your comment reads like you haven't understood my point.
I'm not saying that you can just take any code out there and call it your spec.
I even pointed out in my last paragraph that code not written as a spec is garbage for LLM driven development - precisely because you end up with unintended implementation details becoming your spec.
There are a lot of ways to address this, but ultimately it comes down to how you structure your code, where you place your comments, what you write into the agents.md of each module, and what kind of QA agents you configure to go through your code before you review the code changes/spec adjustments yourself.
A more heavy handed approach could also be domain driven design, where your domain/core package becomes the spec, and the unspecified parts get extracted to less specified modules with less explicit structuring.
rogermarley
Exactly. There's little gap between a spec that's been written to the level of detail needed and just code. There's some, but it's not a big gap after decades of umpteen new frameworks and languages and new forms of abstraction.
The core of the misunderstanding is between new builds and making changes to existing builds (where most software dev work actually happens). Yes, you'll get a great headstart with a detailed spec for a new build. The issue is in the hundreds of changes that'll follow that.
Do people think that the desire to take shortcuts and make minimum-effort changes is going to stop just because you've got a bit-more-natural-language-looking spec? And then, with an AI underneath making probabilistic changes to code that's now basically a compile target, they really think the dev pace isn't going to collapse? It will, just faster and with a big ongoing inference bill.
LLMs do not form mental models. You are not going to get better results from an LLM vibe coding against spec diffs vs a dev prompting it from a position of understanding the codebase and the requested change.
alasano
Because I don't trust LLMs to fully implement what I want on the first try unless I babysit them. And it's the hand holding that burns people out.
I use detailed specs to implement but I don't maintain those specs as the source of truth afterwards, the code is indeed the source of truth.
I've built a library (and products on top) that takes in requirements (programmatic or various spec formats) and forces an externally orchestrated implement -> review -> fix loop that doesn't stop until all requirements are met.
So I'll write a detailed spec then I'll have GPT 5.5 implementing and a mix of opus 4.7 / GPT 5.5 / DeepSeek v4 pro reviewing at every phase until it produces the quality I want.
I can let it run overnight or just during the day while I'm doing stuff that doesn't burn me out and that I actually enjoy.
So tldr spec first for me but not as the source of truth afterwards.
I'll be open sourcing and launching soon https://engine.build
bdcravens
It needn't be an either/or. I find writing pseudocode in a structured, but flexible, manner highly effective.
wesselbindt
I'm still confused as to why folks don't just write executable specs.
eterm
Ambiguity is the grease that keeps everything turning.
mike_hearn
Some of us do! That's called Gherkin.
cenamus
So basically tests?
MoreQARespect
Yes, except a test can be Turing-complete - i.e. code.
An executable spec like Gherkin or hitchstory is config - it has no loops or conditionals. There are a number of rarely recognized benefits to this.
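A toy illustration of that distinction, with hypothetical step names and data: the spec is inert data with no control flow of its own, and only the runner (ordinary, Turing-complete code) loops over it.

```python
# The "spec" is pure config: given/when/then triples with no loops or
# conditionals. The slugify example and step names are invented.
spec = [
    {"given": "Hello World", "when": "slugify", "then": "hello-world"},
    {"given": "  Spec First  ", "when": "slugify", "then": "spec-first"},
]

def slugify(text: str) -> str:
    """Lowercase and join words with hyphens."""
    return "-".join(text.lower().split())

steps = {"slugify": slugify}  # maps spec step names to implementations

for case in spec:  # the runner loops; the spec itself never does
    assert steps[case["when"]](case["given"]) == case["then"]
```

Because the spec is just data, it can be diffed, validated against a schema, and fed to tools (or agents) that never need to execute arbitrary code to understand it.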
carlbarrdahl
Could you expand on this?
booi
code
arikrahman
Literate programming would provide specs and code instead of working backwards from hard coded functions to figure out specs.
fudgeonastick
If you're confused, and have tried Opus for coding, I'm keen to hear what problems or workflows it's not good at.
If you're genuinely confused, and haven't tried Opus for coding, then it's not surprising you're confused!
It is also okay for you to just not like the idea of LLMs for coding (but say that!).
wiseowise
I’m using Opus 4.6 and I’m so confused! Maybe I should try Opus 4.7, which is almost twice as expensive to get some clarity (but not too much, I need to save money for Opus 4.8)?
oytis
That's what the article is about - overcoming problems with AI coding tools using specs in YAML. If we've got that far, it might be better to write specs in a proper programming language instead and skip the AI layer altogether.
adi_kurian
Think the idea is still to get monumental acceleration between fancy YAML specs (bullet points with some indentation that an intelligent technical manager could write) and production-ready code.
Author here, if you don't want to read all that, I'll post one excerpt that I think sums it up nicely:
> My point is, the spec must live somewhere, even if you don’t write it down. The spec is what you want the software to be. It often exists only in your head or in conversations. You and your team and your business will always care what the spec says, and that’s never going to change. So you’re better off writing it down now! And I think that a plain old list of acceptance criteria is a good place to start. (That’s really all that `feature.yaml` is.)
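For what it's worth, here is a guess at what such a plain list of acceptance criteria might look like in YAML. The field names and IDs are invented for illustration and are not the actual `feature.yaml` schema:

```yaml
# Hypothetical sketch of a feature.yaml as a plain list of acceptance
# criteria; field names and IDs are invented, not the author's schema.
feature: password-reset
criteria:
  - id: SPEC-031
    must: "Reset links expire after 30 minutes"
  - id: SPEC-032
    must: "A used reset link cannot be reused"
  - id: SPEC-033
    must_not: "Reveal whether an email address has an account"
```

The point is only the shape: 'must' and 'must not' items that can be checked off one by one, independent of how the code ends up satisfying them.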