pier25
> Guy gives non-deterministic software root access, disaster happens.
I agree the guy is an idiot for trusting these AI models.
OTOH AI companies keep running and marketing their services with zero accountability for mistakes.
CivBase
Exactly.
Prompts are just weights on a graph traversal. They don't guarantee anything. The LLM does not "understand" the prompts and so it cannot fully adhere to them. They only improve the likelihood it will output what you want.
Never ever ever give an LLM access to something you can't afford to break. And stop thinking of them like people.
aeve890
>total and absolute failure of guard rails
It seems the guard rails that failed here were the LLM users themselves, right? Whatever guard rails you can think of may be useless against superior human stupidity.
Also, what's the LLM use policy at the SD-6?
ad_hockey
Minor point, but one of the complaints is a bit odd:
> curl -X POST https://backboard.railway.app/graphql/v2 \
>   -H "Authorization: Bearer [token]" \
>   -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
It's an API. Where would you type DELETE to confirm? Are there examples of REST-style APIs that implement a two-step confirmation for modifications? I would have thought such a check needs to be implemented on the client side prior to the API call.
alecco
Guys, did you bother checking the poster's profile? https://xcancel.com/lifeof_jer. SEE THE TWEET BELOW. Smells like a ragebait post to me. Also search online for his alleged "PocketOS" company with software for car rental businesses. I couldn't find anything on Google. (Of course, I might be wrong)
"The future of SEO is AIO" https://xcancel.com/lifeof_jer/status/2034409722624061772 March 18
juhanima
There seems to be quite a lot of stuff here [1]
Seems legit to me. The oldest news item is from 2021. The domain name is new, but there seems to have been some rebranding lately. The product used to be called Pocket RentalOS, and even that seems to be a fairly recent rebranding [2]
[1] https://pocketos.ai/ [2] https://pocketos.ai/news/pocket-rebrands-its-luxury-rental-m...
motbus3
Interesting. Indeed, there is some sketchy stuff.
dpark
Eh, it seems to be real, but all vibe coded.
kokada
I don't think this is a minor point. It seems clear by this point that the author is clueless about how APIs even work and is just trying to shift blame onto third parties instead of admitting that they're vibecoding their whole product without doing proper checks.
Yes, sure, there are lots of ways this issue could have been mitigated, but as other comments said, this mostly happened because the author didn't do their homework on how the service their whole product relies on actually works.
whartung
It's also moot.
If the API had replied "Are you sure (Y/N)?", the AI, in the mode it was in, guardrails completely pushed off the side of the road, would have just said "Yes" anyway.
If you needed to make two API calls, one to stage the delete and the other to execute it (i.e. the "commit" phase), the AI would have looked up what it needed to do, and done that instead.
It's a privilege issue, not an execution issue.
kokada
Exactly, and that just reinforces the fact that the author is blaming others instead of drawing any valuable insight from this "postmortem analysis".
vasco
He also seems to be lying, he wrote on Twitter the agent was in plan mode. That part has to be exaggerated.
eloisius
I can’t say for sure, but I think Claude’s mode is nothing more than part of the system prompt. I don’t think it actually takes away web request or file write tools. I say this because I could swear I’ve seen Claude go ahead and make some changes even while we’re in plan mode. Web requests certainly, because it can fetch docs and so forth.
hacker161
“Plan” vs “execute” modes seem more like suggestions the models _mostly_ follow. I have absolutely had models (Codex and Sonnet/Opus) take actions in plan mode they should never have been able to take, like editing files or starting to work on a plan that was just created.
falcor84
It's not common, but I've personally built APIs where requests for dangerous modifications like this perform a dry run, returning in the response the resources that would be deleted/changed along with a random token, which then needs to be provided to actually make the change. The idea was that this would be presented in the UI for the user to confirm, but it should be as useful, or more so, for AI agents. You also get the benefit that the token only approves that particular modification operation, so if the resources change in between, you need to reapprove.
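For illustration, a minimal sketch of that dry-run-plus-token pattern (resources_in and really_delete are hypothetical helpers, not any real provider's API):

    import secrets

    pending = {}  # confirm_token -> (volume_id, snapshot of affected resources)

    def delete_volume(volume_id, confirm_token=None):
        affected = resources_in(volume_id)  # hypothetical helper
        if confirm_token is None:
            # Dry run: report what would be deleted and issue a one-shot token.
            token = secrets.token_urlsafe(16)
            pending[token] = (volume_id, frozenset(affected))
            return {"wouldDelete": sorted(affected), "confirmToken": token}
        vol, approved = pending.pop(confirm_token)
        # The token only approves that exact operation; if the affected
        # resources changed since the dry run, force a fresh approval.
        if vol != volume_id or frozenset(affected) != approved:
            raise RuntimeError("resources changed since dry run; reapprove")
        really_delete(volume_id)  # hypothetical helper
        return {"deleted": sorted(affected)}

The caller (UI or agent) has to make two calls: the first returns the blast radius and a token, and only the second, token-bearing call is destructive.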
merelysounds
I guess we don’t know what the agent would do after seeing these warnings and a request for extra action.
Perhaps it would stop and rethink, perhaps it would focus on the fact that extra action is needed - and perform that automatically.
I suppose the decision would depend on multiple factors too (model, prompt, constraints).
ErroneousBosh
"Measure twice, cut once" seems to be forgotten these days.
ykvch
As well as: A computer can never be held accountable
zimpenfish
"Measure twice, THINK ONCE, cut once" is even better[0].
[0] Why yes, I have measured twice, cut once, and made a right old balls up.
easton
AWS actually has a thingy on some services called “deletion protection” to prevent automation from accidentally wiping resources the user didn’t want it to (you set the bit, and then you need to make a separate api request to flip the bit back before continuing).
I think it’s designed for things like Terraform or CloudFormation where you might not realize the state machine decided your database needed to be replaced until it’s too late.
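If I understand the feature correctly, the RDS flavor of this looks roughly like the following with boto3 (instance names hypothetical). delete_db_instance is rejected until a separate, explicit call flips the bit:

    import boto3

    rds = boto3.client("rds")

    # With DeletionProtection on, delete_db_instance fails outright.
    # Automation has to make this separate, deliberate call first:
    rds.modify_db_instance(
        DBInstanceIdentifier="prod-db",
        DeletionProtection=False,
        ApplyImmediately=True,
    )

    # Only now does the destructive call go through.
    rds.delete_db_instance(
        DBInstanceIdentifier="prod-db",
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier="prod-db-final",
    )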
chrisandchris
And then, someone added IAM so you could actually restrict your credentials from deleting your database.
First mistake is to use root credentials anyway for Terraform/automated API.
Second mistake is to not have any kind of deletion protection enabled on critical resources.
Third mistake is to ignore the 3-2-1 rule for backups. Where is your logically decoupled backup you could restore?
I am really sorry for their loss, but I have close to zero empathy if you do not even try to understand the products you're using and just blindly trust the provider with all your critical data without any form of assessment.
throwaway041207
GCP Cloud SQL has the same deletion protection feature, but it also has a feature where if you delete the database, it doesn't delete backups for a certain period of days. If someone is reading this and uses Cloud SQL, I highly suggest you go make sure that check box is checked.
andy81
Agents will happily automate away intentional friction like a confirm prompt, even if you organise it as multiple API calls.
The fix needs to be permissions rather than ergonomics.
causal
There's also a cooldown period on some deletes (like secrets) to make sure you don't accidentally brick something
jeremyccrane
This should be the solution. All destructive actions require human intervention.
Someone1234
If we take that literally, then just remove all destructive API endpoints, because then they serve no real purpose: you cannot automate the removal of anything.
I think some of the other suggestions are saner (a cool-down period, finer-grained permissions, delete protection for certain high-value volumes). I don't think "don't allow destructive actions over the API" is the right boundary.
gizmondo
A human representing the company should be physically present in the provider's office to perform such an action or what? Otherwise you would just grant your agent a way to impersonate a human.
rdevilla
The stupidity of people sinks to new lows every day. It's astonishing just how ignorant people are of table stakes, basic technological concepts.
You just gave an AI destructive write access to your production environment? Your production DB got dropped? Good. That's not the AI's fault, that's yours, for not having sensible access control policies and not observing principle of least privilege.
dabinat
I agree that this is the author's fault considerably more than it is Railway's. However, I have learned from experience that no matter how many "are you sure you want to do this" prompts you have, sometimes users delete stuff they didn't intend to delete, and it's better not to delete immediately but to put it in a queue for deletion in a few hours and offer a way to reverse it. Even if it's 100% user error, the user is very happy they didn't lose data, and the cost of storing it for an extra 5 hours or so is tiny.
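A minimal sketch of that delayed-deletion queue (all helper names hypothetical):

    import time

    deletion_queue = {}  # volume_id -> epoch seconds after which to purge
    GRACE_SECONDS = 5 * 3600  # ~5 hours of "oops" window

    def request_delete(volume_id):
        # Hide the data immediately, but don't destroy it yet.
        unlink(volume_id)  # hypothetical: make it invisible to the user
        deletion_queue[volume_id] = time.time() + GRACE_SECONDS

    def undelete(volume_id):
        if deletion_queue.pop(volume_id, None) is not None:
            relink(volume_id)  # hypothetical: restore visibility

    def purge_worker():
        # Run periodically; data is only destroyed after the window passes.
        now = time.time()
        for vol, deadline in list(deletion_queue.items()):
            if now >= deadline:
                really_destroy(vol)  # hypothetical irreversible delete
                del deletion_queue[vol]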
gizajob
Funny how he points the finger at everyone but himself.
saidnooneever
the kind of attitude you really need to get your agents to delete your prod lol
juliansark
Many companies have been doing this for years. Merely flagging my data for hiding and eventual deletion instead of deleting it, when I wanted it deleted as per GDPR :)
Ekaros
The user is an idiot for using an AI agent. But I am not saying it isn't also a badly designed system. Soft delete or something like it should be standard for this type of operation. And any operator should know well enough to enable it for production.
noxvilleza
> Are there examples of REST-style APIs that implement a two-step confirmation for modifications?
A pattern I've seen and used for merging common entities together has a sort of two-step confirmation: the first request takes in IDs of the entities to merge and returns a list of objects that would be affected by the merge, and a mergeJobId. Then a separate request is required to actually execute that mergeJob.
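Sketched as HTTP endpoints, that pattern might look like this (Flask used for illustration; find_affected_objects and perform_merge are hypothetical helpers):

    import uuid
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    merge_jobs = {}  # mergeJobId -> (entity_ids, snapshot of affected objects)

    @app.post("/entities/merge/preview")
    def preview_merge():
        ids = request.json["entityIds"]
        affected = find_affected_objects(ids)  # hypothetical helper
        job_id = str(uuid.uuid4())
        merge_jobs[job_id] = (ids, affected)
        return jsonify({"mergeJobId": job_id, "affected": affected})

    @app.post("/entities/merge/<job_id>/execute")
    def execute_merge(job_id):
        ids, affected = merge_jobs.pop(job_id)  # missing-job handling omitted
        if find_affected_objects(ids) != affected:
            return jsonify({"error": "state changed; preview again"}), 409
        perform_merge(ids)  # hypothetical helper
        return jsonify({"status": "merged"})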
stanfordkid
I don't think you can really blame AI agents for this. While I agree the user was using AI irresponsibly, some of the blame does go to Railway for making an API key that allows all operations from a single key without giving clear warnings about privileges. Clearly this user was shooting from the hip and quickly pasted whatever key they got from Railway into a file somewhere, so there is some blame there, but any service that handles hosting infrastructure should provide clear UX warnings to users regarding the scoping of its credentials.
lmf4lol
Interesting story. But despite Cursor's or Railway's failures, the blame is entirely on the author. They decided to run agents. They didn't check how Railway works. They relied on frontier tech to ship faster because YOLO.
I really feel sorry for them, I do. But the whole tone of the post is: Cursor screwed it up, Railway screwed it up, their CEO doesn't respond, etc.
It's on you, guys!
My learning: Live on the cutting edge? Be prepared to fall off!
ranguna
I get what you're saying, but this is resonating with me and making me feel for the author:
Cursor: we have top notch safeguards for destructive operations, you have our guarantee, we are the best
Author: uses their tools expecting their guarantees to be true (I would expect them to have a confirmation before destructive operation outside their prompt, as a coded system guardrail)
Cursor AI: Does destructive operation without asking
Author: feels betrayed.
So yeah, I think the author is right because they trusted Cursor to have better system guardrails, they didn't (agents shouldn't be able to delete a volume without having a meta-guardrail outside the prompt). Now the author knows and so do we: even if companies say they have good guardrails, never trust them. If it's not your code, you have no guarantees.
postexitus
Sorry - still the author's fault. They didn't understand how LLMs work. They thought Cursor implemented some magic "I control every action the LLM takes" thing. It's impossible.
laszlojamf
right. But cursor _said_ they had some magic. At some point you have to trust vendors. I don't know exactly how AWS guarantees eleven nines of durability on S3. But I sure hope that they do.
arcticfox
There was practically no responsibility taken by the author, all blame on others. It was kind of shocking to read.
Anyone using these tools should absolutely know these risks and either accept or reject them. If they aren't competent or experienced enough to know the risks, that's on them too.
throwaway041207
And it doesn't even have to do with these tools in the end, this is a disaster recovery issue at its root. If you are a revenue generating business and using any provider other than AWS or GCP and you don't have an off prem/multi-cloud replica/daily backup of your database and object store, you should be working on that yesterday. Even if you are on one of the major cloud providers and trust regional availability, you should still have that unless it's just cost-prohibitive because of the size of the data.
pixl97
Like, shouldn't they teach the 3 2 1 rule of backups in school by now?
gigatree
The point of the post was to warn other people building with agents, especially those using Cursor or Railway, not to be a public self-reflection.
dymk
It was also to put Cursor and Railway on blast and complain about how they should have safeguarded him from putting a gun to his database and pulling the trigger.
simonjgreen
Perhaps they should include a warning about learning systems design and architecture too then? It’s very incomplete.
shiandow
For a company that puts DO NOT FUCKING GUESS in their instructions they made a heck of a lot of assumptions
- assume tokens are scoped (despite this apparently not even being an existing feature?)
- assume an LLM didn't have access
- assume an LLM wouldn't do something destructive given the power
- assume backups were stored somewhere else (to anyone reading, if you don't know where they are, you're making the same assumption)
Also, you should never give LLMs instructions that rely on metacognition. You can tell them not to guess, but they have no internal monologue; they cannot know anything. They also cannot plan to do something destructive, so telling them to ask first is pointless. A text completion will only have the information that it is writing something destructive afterwards.
gwerbin
The thing that seems to bring out these extremely unlikely destructive token sequences is letting agents just run for a long time. I wonder if some kind of weird subliminal chaos signal develops in the context when the AI repeatedly consumes its own output.
Personally I don't even let my agent run a single shell command without asking for approval. That's partly because I haven't set up a sandbox yet, but even with a sandbox there is a huge "hazard surface" to be mindful of.
I wonder if AI agent harnesses should have some kind of built-in safety measure where instead of simply compacting context and proceeding, they actually shut down the agent and restart it.
That said I also think even the most advanced agents generate code that I would never want to base a business on, so the whole thing seems ridiculous to me. This article has the same energy as losing money on NFTs.
mike_hearn
I don't think it's that. It's really all about context. Humans always have at least a bit of context so it's hard for us to imagine what it's like to have none at all. But the AI genuinely has none. And it's under (training) pressure to get the task done quickly, be a yes man, and so on.
Humans do make mistakes like these. I'm not sure where the fault really lies here. I can imagine a human under time pressure making the same error. It's maybe a goof in the safety design of railway. It shouldn't be possible to delete all your backups with a single API call using a normal token.
coalstartprob
[dead]
gwerbin
The author definitely deserves a lot of blame here and clearly doesn't understand AI well enough to have a coherent opinion on AI safety.
But Railway bears some responsibility too because, at least if the author is to be believed, it looks like they provide no safety tools for users, regardless of whether they use AI or not. You should be able to generate scoped API tokens. That's just good practice. A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.
dpark
> You should be able to generate scoped API tokens. That's just good practice.
Fully agree, but given the rest of this story I don’t imagine the author would have scoped them unless Railway literally forced him to.
> A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.
The AI agent was deleting the volume used in the staging environment. It happened to also be the volume used in the production environment. 100% a human could have made this mistake.
manas96
200% agree. If you decide to use this power you must accept the tiny risk and huge consequences of it going wrong. The article seems like it was written by AI, and quoting the agent's "confession" as some sort of gotcha just demonstrates the author does not really understand how it works...
annoyingcyclist
I kept reading and reading to find the part where the author took responsibility for any part of this, then I got to the end.
computerdork
I don’t know, software systems are complicated; it’s pretty much impossible for one person to know every line of code and every system (especially the CEO or CTO). Yeah, it was probably one or two employees who set this all up without realizing the possibility of bad Cursor and Railway interactions.
If you’re a software dev/engineer and you haven’t made a mistake like this (maybe not at this scale though), you probably haven’t been given enough responsibility, or are just incredibly lucky.
… although, agreed, they were on the cutting edge, which is riskier and not the best decision.
kokada
There is a difference between making a mistake like this one and being humble (e.g., lessons learned, having a daily external backup of the database somewhere else, or maybe asking the agent to not run commands directly in production but write a script to be reviewed later, or anything similar) and just blaming the AI and the service provider and never admitting your mistake like this article is all about.
The fact that this seems to be written by AI makes it even more ironic.
anonymars
Indeed. I swear reality gets stranger and more implausible by the day.
"That isn't backups. That's a snapshot stored in the same place as the original — which provides resilience against zero failure modes that actually matter (volume corruption, accidental deletion, malicious action, infrastructure failure, the exact scenario we just lived through)."
dpark
> Yeah, it was probably one or two employees who set this all up without realizing the possibility of bad Cursor and Railway interactions.
I’ve got a hunch the only person is the CEO.
The domain was registered in October 2025. The site has kind of a weird mix of stuff and a bunch of broken functionality. I think it’s one guy vibe coding a ton of stuff who managed to blow away his database.
> if you’re a software dev/engineer, if you haven’t made a mistake like this (maybe not at this scale though), you’ve probably haven’t been given enough responsibility, or are just incredibly lucky.
Mistakes are understandable. Having no introspection or self criticism, not so much.
il-b
If you can’t handle disaster & recovery, you shouldn’t be a CTO
meisel
Yeah, the author really should’ve taken some responsibility here. It’s true that the services they used have issues, but there’s plenty of blame to direct at themselves.
maxbond
It is fundamental to language modeling that every sequence of tokens is possible. Murphy's Law, restated, is that every failure mode which is not prevented by a strong engineering control will happen eventually.
The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use. That prompting is neither strong nor an engineering control; that's an administrative control. Agents are landmines that will destroy production until proven otherwise.
Most of these stories are caused by outright negligence, just giving the agent a high level of privileges. In this case they had a script with an embedded credential which was more privileged than they had believed - bad hygiene but an understandable mistake. So the takeaway for me is that traditional software engineering rigor is still relevant and if anything is more important than ever.
ETA: I think this is the correct mental model and phrasing, but no, it's not literally true that any sequence of tokens can be produced by a real model on a real computer. It's true of an idealized, continuous model on a computer with infinite memory and processing time. I stand by both the mental model and the phrasing, but obviously I'm causing some confusion, so I'm going to lift a comment I made deep in the thread up here for clarity:
> "Everything that can go wrong, will go wrong" isn't literally true either, some failure modes are mutually exclusive so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with it's nuances. And most importantly adopting the mental model leads to better outcomes.
yongjik
> It is fundamental to language modeling that every sequence of tokens is possible.
This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have); this isn't one of them.
It's akin to saying that all molecules behave randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day, it's just a consequence of basic physics.
nkrisc
> It's akin to saying that every molecules behave randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.
Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.
Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.
It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.
margalabargala
I have lived about 40 years beneath ceilings and never personally taken a preventative measure. I allow my kids to walk under not only our own ceiling, but other people's ceilings, and I have never asked those people if their ceilings were properly maintained.
chrsw
Ceilings do fall on people. LLMs do delete production databases. Will these things always inevitably happen? No, but the moment it does happen to someone I doubt they will be thinking about probabilities or Murphy's law or whatever.
I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
Negitivefrags
> I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
This isn't a defence of using LLMs like this, but this statement taken at face value is a source of a lot of terrible things in the world.
This is the kind of stuff that leads to a world where kids are no longer able to play outside.
yongjik
Mostly, I agree with you. My complaint is that, when the ceiling fails, nobody says "Duh ceilings are supposed to fail, that's basic physics." Because that (1) helps nobody, and (2) betrays a fundamental misunderstanding of physics.
And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)
However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate every arbitrary sequence of tokens." That's a wrong explanation, and shows a misunderstanding of how LLMs (and probability) work.
caminante
The parent is also incorrectly re-phrasing Murphy's Law -- "Anything that can go wrong, will go wrong."
Actual quote:
> “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”
ses1984
Engineering controls basically mean making it impossible to do something in a way that results in catastrophe.
maxbond
I'd be interested to hear why my restatement was incorrect. I'm confident that it's what Murphy meant, mostly because I've read his other laws and that's what I recall as the general through line. But that was a long time ago, and perhaps I'm misremembering or was misinterpreting at the time.
maxbond
> This is just trivially wrong that I don't understand why people repeat it.
I'd be interested in hearing this argument.
To address your chemistry example; in the same way that there is a process (the averaging of many random interactions) that leads to a deterministic outcome even though the underlying process is random, a sandbox is a process that makes an agent safe to operate even though it is capable of producing destructive tool calls.
stratos123
I wouldn't say it's trivially wrong, but it's pretty much always wrong. There are two notable sampling parameters, `top-k` and `top-p`. When using an LLM for precise work rather than, e.g., creative writing, one usually samples with the `top-p` parameter, and `top-k` is, I think, pretty much always used. And when sampling with either of these enabled, the set of possible tokens that the sampler chooses from (according to the current temperature) is much smaller than the set of all tokens, so most sequences are not in fact possible. It's only true that all sequences have nonzero probability if you're sampling without either of these and with nonzero temperature.
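For the curious, a toy numpy sketch of how temperature, top-k, and nucleus (top-p) sampling restrict the candidate set; real inference stacks differ in the details, but the key point is the hard zeros:

    import numpy as np

    def sample(logits, temperature=1.0, top_k=50, top_p=0.9):
        z = logits / temperature  # temperature -> 0 approaches greedy argmax
        probs = np.exp(z - np.max(z))
        probs /= probs.sum()
        # top-k: everything outside the k most likely tokens gets probability 0.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        # top-p: keep only the smallest set of tokens whose cumulative
        # probability exceeds p; everything else gets exactly zero too.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p * cum[-1]) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()
        return np.random.choice(len(probs), p=probs)

Tokens zeroed out here can never be sampled, so sequences containing them have probability exactly zero, not merely "astronomically small".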
falcor84
I remember a particularly nice lesson in my high school physics class where the teacher introduced us to the idea of statistical mechanics by saying that there's a probability, which we could calculate if we wanted to, of this chair suddenly levitating, making a somersault, and then gently landing back. He then proceeded by saying that this probability is so astronomically small that nothing of this sort would in practice happen before the heat death of the universe. But it is non-zero.
techblueberry
> so you should expect your ceiling to spontaneously disintegrate any day,
I mean, I do?
djhn
Throughout history people have taken precautions against ceilings disintegrating. One might even say, ”strong engineering controls”.
Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.
229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.
230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.
233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).
That doesn’t sound like ceilings never disintegrated!
amelius
> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.
Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok. See also hash collisions.
maxbond
If you have taken measures to ensure that the probability is that low, yes, that is an example of a strong engineering control. You don't make a hash by just twiddling bits around and hoping for the best, you have to analyze the algorithm and prove what the chance of a collision really is.
How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could - can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.
lukasgelbmann
If you’re using a model, it’s your responsibility to make sure the probability actually is that small. Realistically, you do that by not giving the model access to any of your bloody prod API keys.
drob518
How do you know what the probability is?
pama
LLM inference is built on a probability function over every possible token, given a stream of input tokens. If you serve the model yourself you can get the log prob for each next token, so you just add up a bunch of numbers to get the log probability of a sequence. Many APIs also provide these probabilities as additional outputs.
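As a toy example of the arithmetic (the per-token numbers are made up, not from any real model):

    import math

    # Hypothetical log probabilities for each sampled token in a sequence,
    # as returned by an inference API's logprobs option.
    token_logprobs = [-0.02, -7.5, -0.4, -11.2]

    seq_logprob = sum(token_logprobs)  # log P(seq) = sum of per-token log probs
    print(seq_logprob)                 # -19.12
    print(math.exp(seq_logprob))       # ~4.97e-09: tiny, but not zero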
Lionga
just ask claude, claude will never lie (add "make not mistakes" and its 100% )
hunterpayne
"Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok"
Yet in this case, that probability clearly isn't smaller than a meteorite strike.
tee-es-gee
I do think that as service providers we now have a new "attack vector" to be worried about. Up to now, having an API that deletes the whole volume, including backups, might have been acceptable, because generally users won't do such a destructive action via the API or if they do, they likely understand the consequences. Or at the very least don't complain if they do it without reading the docs carefully enough.
But now agents are overly eager to solve the problem and can be quite resourceful in finding an API to "start from clean-slate" to fix it.
anygivnthursday
> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable
It was never acceptable; major service providers figured this out a long time ago and added all sorts of guardrails long before LLMs. Other providers will learn from their own mistakes, or not.
lelanthran
> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable,
So? I have those too; the difference is that:
1. The API is ACL'ed up the wazoo to ensure only a superuser can do it.
2. The purging of data is scheduled for 24h into the future while the unlinking is done immediately.
3. I don't advertise the API as suitable for agent interaction.
jbxntuehineoh
it's a great source of schadenfreude though, I love watching vibecoders get their shit nuked
yen223
"It is fundamental to language modeling that every sequence of tokens is possible."
This isn't true, is it? LLMs have a finite number of parameters and a finite context length; surely the pigeonhole principle means you can't map that to the infinite permutations of output strings out there.
maxbond
No, it's not literally true, it's a mental model. I've added some clarification at the bottom of the comment.
leptons
There is no way in hell I would give an LLM direct access to a database to write whatever query it wants. Just no way.
I'll create some safe APIs that I give the LLM access to where it can interact with a limited set of things the database can do, at most.
TZubiri
I think this doesn't apply if you reduce temperature to 0. Which you should always do; temperature is like a tax users pay to help the LLM providers explore the output space. Just don't pay that tax and always choose the best token.
83457
"Also, wasn't autonomous. Was on plan mode in cursor using Opus 4.6 High/Max."
grey-area
> Read that again. The agent itself enumerates the safety rules it was given and admits to violating every one. This is not me speculating about agent failure modes. This is the agent on the record, in writing.
Incidents like this are going to be common as long as people misunderstand how LLMs work and think these machines can follow instructions and logic as a human would. Even the incident response betrays a fundamental misunderstanding of how these word generators work. If you ask it why, a new instance of the machine will generate plausible text based on your prompt about the incident, that is all; there is no why there, only a how based on your description.
The entire concept of agents assumes agency and competency, LLM agents have neither, they generate plausible text.
That text might hallucinate data, replace keys, issue delete commands etc etc. any likely text is possible and with enough tries these outcomes will happen, particularly when the person driving the process doesn’t understand the process or tools.
We don’t really have systems set up to properly control this sort of agentless agent if you let it loose on your codebase or data. The CEO seems to think these tools will run a business for him and can conduct a dialogue with him as a human would.
protocolture
"I literally requested no screw ups, and this is a screw up"
I bet these people are bad at managing humans too.
postexitus
Maybe - humans have agency, they understand actions / consequences.
AI agents do not have agency(!), they have no understanding of consequences. They actually have no understanding. At all.
Yokohiii
He blames everyone and everything for his own bad decisions. For sure he is unbearable.
evklein
This is what I am seeing more and more of, both in tech online and in the minds of people around me. Despite peoples' innate curiosity of how LLMs work, they still don't understand at the end of the day that they are just models. Augmented with tools and more capable than ever, yes, but still a piece of math at the end of the day. To expect of it anything other than credible output is science fiction.
Sankozi
I have the opposite view - LLMs have many similarities with humans. A human, especially a poorly trained one, could have made the same mistake. A human with amnesia could have come up with similar reasons to that LLM's.
While LLMs generate "plausible text", humans just generate "plausible thoughts".
9dev
Just because it sounds coherent doesn’t mean it is. You can make up false equivalence for anything if you try hard enough: A sheet of plywood also has many similarities with humans (made from carbon, contain water, break when hit hard enough), but that doesn’t mean they are even remotely equal.
Sankozi
I didn't write they were equal. I wrote they are similar in many ways.
Comparing LLMs to humans makes much more sense than comparing them to computer programs.
rowanG077
Humans also don't follow given rules. Or we wouldn't need jail. We wouldn't need any security. We wouldn't need even user accounts.
fluoridation
Humans are able to follow rules. If you tell someone "don't press the History Eraser Button", and they decide they agree with the rule, they won't press the button except by accident. If they really believe in the importance of the rule, they will take measures to stop themselves from accidentally pressing it, and if they believe in it strongly enough, they'll take measures to stop anyone from pressing it at all.
No matter how you insist to an LLM not to press the History Eraser Button, the mere fact that it's been mentioned raises the probability that it will press it.
grey-area
I don’t mean that in a small way (ie sometimes they don’t follow rules), I mean it in the more important sense that they don’t have a sense of right or wrong and the instructions we give them are just more context, they are not hard constraints as most humans would see them.
This leads to endless frustration as people try to use text to constrain what LLMs generate, it’s fundamentally not going to work because of how they function.
CivBase
Humans understand rules to be commands with risks and consequences. They consciously evaluate the benefits of breaking rules against the risks and consequences. They also have their own needs, self-interests, and instincts for preservation and community.
LLMs don't do or have any of this. To them "rules" (just like all prompts) are just weights on a graph traversal used to output text.
They are not the same.
veunes
[dead]
dpark
I would never, ever trust my data with a company that, faced with this sort of incident, produces a postmortem so clearly intended to shift all blame to others. There’s zero introspection or self criticism here. It’s all “We did everything we possibly could. These other people messed up, though.”
You can’t have production secrets sitting where they are accessible like this. This isn’t about AI. This is a modern “oops, I ran DROP TABLE on the production database” story. There’s no excuse for enabling a system where this can happen and it’s unacceptable to shift blame when faced with the reality that this is exactly what you did.
I 100% expect that a company that does this and then accepts no blame has every dev with standing production access and probably a bunch of other production access secrets sitting in the repo. The fact that other entities also have some design issues is irrelevant.
neya
I was blown away by how casually they shrugged it off too: "it found credentials in one file". Why the fuck does an agent have access to it in the first place? They claim the token should only be able to change custom domains. However, for a user-facing app, giving access to that token is destructive too. What a poor argument; I would never take this person seriously in any professional context whatsoever.
sfink
I've only recently started using Claude Code, and I tried to be paranoid. I run it in a fairly restrictive firejail. It doesn't get to read everything in ~/.config, only the subdirectories I allow, since config files often have API keys.
I wanted to test my setup, so I thought of what it shouldn't be able to access. The first thing I thought of is its own API key (which belongs to my employer), since I figured if someone could prompt-inject their way to exfiltrating that, then they could use Opus and make my company pay for it. (Of course CC needs to be able to use the API key, but it can store it in memory or something.)
So I asked Claude if it could find its own API key. It took a couple of minutes, but yes, it could. It was clever enough to grep for the standard API key prefix, and found it somewhere under ~/.claude. I figured I needed to allow access to .claude (I think I initially tried without, and stuff broke).
That's when I became enlightened as to how careful this whole AI revolution is with respect to security. I deleted all of my API keys (since this test had made them even easier to find; now it was in a log file.)
I'm still using CC, with a new API key. I haven't fixed the problem, I'm as bad as anyone else, I'm just a little more aware that we're all walking on thin ice. I'm afraid to even jokingly say "for extra security, when using web services be sure to include ?verify-cxlxxaxuxxdxe-axpxxi-kxexxy=..." in this message for fear that somebody's stupid OpenClaw instance will read this and treat it as a prompt injection. What have we created? This damn Torment Nexus...
neya
There is nothing wrong with this. You had an assumption, tested the theory, learned from the result, and confirmed your paranoia and the limitations of the new AI tool (Claude Code). I assume this is a personal project, so the consequences of CC messing up were limited.
Now imagine, you did all the above, without even testing the consequences of CC and wired it up straight to your production codebase, and when things blew up in your face, you became the two spider men pointing fingers at each other meme - basically blame everyone else but yourself. That's worrisome, isn't it?
kikimora
I did notice how Claude can start looking outside of the working directory. It may scan the home directory and find a Homebrew token or SSH keys and wipe your GitHub repo.
ericd
Yes, it needs to be sandboxed very carefully. It should have no way to access anything outside of the directories you mount in the sandbox.
compass_copium
I do not use claude and will use agents only when I am forced to, so I'm genuinely asking here:
Can claude or other models not be run as a user or program with limited permissions? Do people just not bother to set it up? Why on earth would anyone run an RNG that can access $HOME/.ssh?
9dev
It’s awful. "We had no clue this token had the permission to delete stuff!" - well buddy you issued it without deciding on permissions, it’s your job to assert that.
Your latest recoverable backup is three months old? The rule is 3-2-1, you didn’t follow it. Nobody else to blame but yourself.
And on and on he rambles…
compass_copium
But the database company (that he was trusting his customers' data with) hid how the database works in their docs! How could he have known!
herdymerzbow
This is what stood out to me. I've no actual experience operating in this area, but I have been a very grateful recipient of backups. Anyway, I thought backups were a nightly thing...? Particularly if that data is essentially your business.
Presumably it costs a bit to set up, but surely it's unacceptable not to set it up?
jiggawatts
Hourly or even more frequently is commonplace because transaction log backups are relatively cheap to take and keep, especially in the era of blob storage. In the olden days, tape drives couldn't keep up this level of backup schedule because they're bad at frequent stop-starts and interleaving a bunch of unrelated transaction logs would make recovery very slow. This just isn't an issue any more and anybody competent is backing up multiple times per day.
simonjgreen
Not a single mention of “maybe WE should have tested our backup strategy and scrutinised it”. Or even “maybe we should have backups away from the primary vendor”. Because this also says negligible DR and BC strategy.
Complete accountability drop
r-w
DROP TABLE Accountability;
WhyNotHugo
Agreed. The post reflects that they were running an AI agent in YOLO mode in an unsandboxed environment with access to production credentials.
It doesn’t even seem to have crossed their minds that this behaviour is the real root cause. It’s everybody else’s fault.
drdaeman
> This is a modern “oops, I ran DROP TABLE on the production database” story.
It's not that story, though. It's a story "oops, my tool ran DROP TABLE on the production database" (blaming the tool). At least I haven't heard people blaming their terminals or database clients as if the tool is somehow responsible for it.
tbrownaw
It's an AI-enhanced "the script had a bug in it".
YeGoblynQueenne
>> You can’t have production secrets sitting where they are accessible like this. This isn’t about AI. This is a modern “oops, I ran DROP TABLE on the production database” story. There’s no excuse for enabling a system where this can happen and it’s unacceptable to shift blame when faced with the reality that this is exactly what you did.
I'm not sure it's as simple as that. Seems like the database company failed to communicate clearly what the token was for:
>> To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on. That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
dpark
Rereading the post, I think it’s even simpler than that. The volume was shared across multiple environments. Specifically it was shared across staging and prod. Yet another example of the company YOLOing with their production environment. Presumably a token scoped purely to staging could have deleted that volume anyway, because it was part of the staging environment. Mixing production and staging like this is a train wreck waiting to happen.
“I had no idea what this token was for” is also not a valid excuse. That’s negligence. Everything about this story says the author is just vibe coding garbage with no awareness of what’s really happening.
* Doesn’t know what kind of token he’s using.
* Has prod tokens sitting on a dev box for AI to use (regardless of the scope!).
* Doesn’t know that deleting a volume deletes the backups.
* Has no external backup story.
* Mixes staging and prod.
And then he blames the incident on other companies when he misuses their products. (Railway certainly had docs that explain their backups and tokens.)
This is catastrophically negligent.
torton
Did the flow ask them explicitly for scopes? If not, then they should know there are no restrictions.
It also seems, from the post, that customers were "long asking for scoped tokens" so who and why assumed that this particular token can only add and remove custom domains?
The author is getting roasted here and not without reason.
chimpansteve
This was the line that did for me, as an old school backend engineer who has accidentally deleted way more production databases than I have fingers over the years -
> We have restored from a three-month-old backup.
You were absolutely screwed anyway if that was your backup strategy - deciding to plug your entire production infrastructure into a random number generator has only accelerated the process. Sort yourself out.
xp84
In the uhh, postmodern world where we are too chicken to even run things like Postgres or Mongo on servers ourselves, and rely on "X as a service" I think people are looking at the marketing from the provider (in this case Railway) and just scanning for a bullet point. "'Automatic backups'? Check! Great, we don't have to do backups anymore, they're taking care of it."
Everyone guffawing about this probably uses RDS and trusts that the backup facility AWS provides is actually useful - and I bet it does have a saner default than auto-deleting all the backups when you delete a database. Did you explicitly check this, though? Clearly this guy will pay the price of assuming, but I can see how he must have imagined that "backups" and "will be automatically and immediately deleted..." should never be in the same sentence, unless it was like, "when XX days have passed after a DB is dropped."
When I worked for a company 10 years ago that was mistrusting of cloud anything, we had a nightly dump of the prod DB (MySQL) that, if things went really wrong, could be loaded into a new DB server, because we knew it was our responsibility because it was our server. (In our case, even our physical hardware!)
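The old-school version is a few lines in cron; a sketch of the idea (host, database, and paths hypothetical):

    import datetime
    import subprocess

    # Dump prod nightly, then ship the file somewhere the prod
    # credentials cannot touch, let alone delete.
    stamp = datetime.date.today().isoformat()
    dump = f"/backups/prod-{stamp}.sql.gz"
    subprocess.run(
        f"mysqldump -h prod-db --single-transaction appdb | gzip > {dump}",
        shell=True, check=True,
    )
    subprocess.run(["rsync", dump, "offsite-host:/backups/"], check=True)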
gbnwl
The entire post reads like it was generated via LLM as well.
josephg
It clearly was, at least in part. Somehow, it feels just right here: man trusts AI to do the right thing and it burns him. Five minutes later, man trusts AI to explain what happened on X.
It's a Greek tragedy in 2 acts.
justinclift
> in 2 acts.
Might not be over yet... ;)
varun_ch
I like the way the LLM implies that an API call should have a “type DELETE to confirm”. That would make no sense, and no human would ever suggest or want that, I hope.
dpark
I can only assume (hope) this founder is completely nontechnical because the notion that an API should ask for someone to “type DELETE” is ridiculous.
hu3
The most aggravating fact here is not even the AI blunder. It's how deleting a volume in Railway also deletes its backups.
This was bound to happen, AI or not.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
crazygringo
Yup, this is bizarre. A top use case for needing a backup is when you accidentally delete the original.
You need to be able to delete backups too, of course, but that absolutely needs to be a separate API call. There should never be any single API call that deletes both a volume and its backups simultaneously. Backups should be a first line of defense against user error as well.
And I checked the docs -- they're called backups and can be set to run at a regular interval [1]. They're not one-off "snapshots" or anything.
smj-edison
Plus backups should be time gated, where the software physically blocks you from removing backups for X days.
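S3 Object Lock in compliance mode is one real-world version of this; a boto3 sketch (bucket and key names hypothetical; the bucket must be created with Object Lock enabled):

    import datetime
    import boto3

    s3 = boto3.client("s3")

    # In COMPLIANCE mode, nobody - not even the root account - can delete
    # this object version before the retain-until date.
    with open("prod-2026-02-07.sql.gz", "rb") as f:
        s3.put_object(
            Bucket="db-backups",
            Key="prod-2026-02-07.sql.gz",
            Body=f,
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=datetime.datetime(
                2026, 3, 9, tzinfo=datetime.timezone.utc
            ),
        )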
dpark
This is one of those things that seems like a good idea on the surface but is rife with problems.
Does the company hosting the backups do it for free? Or do they charge their customers to keep holding onto backups they no longer want?
Is “my DB company refuses to delete the data” a valid legal response to a copyright enforcement or a GDPR demand?
jiggawatts
Azure SQL Database did this too for a while until enough companies complained about losing their data and their backups with a single action.
AtNightWeCode
With the difference that best practices in Azure SQL have always been to store your own copies of backups and run the database in some HA/GEO-redundancy mode that blocks deletion.
fabian2k
Especially in combination with not having scoped api keys at all, if I understand the article correctly. If I read it correctly, any key to the dev/staging environment can access their prod systems. That's just insane.
I'd never feel comfortable without a second backup at a different provider anyway. A backup that isn't deleteable with any role/key that is actually used on any server or in automation anywhere.
abustamam
Yeah I'm not sure why this fact is buried. Yes the author is blaming cursor and railway and doesn't seem to be taking responsibility. But at the same time, many people are OK with LLMs going wild on their codebase because they know they can restore from backups. Wise idea? Probably not. But that's why they're called backups and not snapshots.
It's a mistake I'll certainly learn from. Don't believe when a cloud provider says it has backups of your shit.
exe34
If your backup is inside the same thing you backed up, you don't have a backup. You have an out of date copy.
jumpconc
All my backups are inside the same universe as what is being backed up. A boundary must be drawn somewhere and this is one of many reasonable boundaries. As I understand it, the backup isn't "inside" the volume but is attached to it so that deleting the volume deletes the backups.
protocolture
>All my backups are inside the same universe as what is being backed up.
Unless the commenter was backing up their entire universe, this comment is a non sequitur.
theshrike79
Can we at least agree to draw the line so that if a single call can delete the live data AND all backups, they shouldn't be called "backups", but rather snapshots?
exe34
Did you back up the universe inside the universe? Otherwise your comment doesn't seem related to what I wrote.
Aldipower
Yes, that is insane. Or, said another way, they simply didn't have any working backup strategy!
JeanMarcS
To be 100% fair, having only one provider for backups is really risky. A minimum 3-2-1 would be better
fragmede
Is that why they call it S3?
christophilus
Principle of most surprise.
Lionga
The most aggravating fact is that the AI slopper who got owned by his own dumbness and AI just posted an AI-generated post that will generate nothing but schadenfreude.
Quarrelsome
It's much more aggravating that it looks like they're learning nothing by pushing the blame onto everything except themselves.
lelanthran
Exactly! I have very little sympathy...
> This isn't a story about one bad agent or one bad API. It's about an entire industry building AI-agent integrations into production infrastructure faster than it's building the safety architecture to make those integrations safe.
Are they really so clueless that they cannot recognise that there is no guardrail to give an agent other than restricted tokens?
Through this entire rant (which, by the way, they didn't even bother to fucking write themselves), they point blank refuse to acknowledge that they chose to hand the reins over to something that can never have guardrails, knowing full well that it can never have guardrails, and now they're trying to blame the supplier of the can't-have-guardrails product, complaining that the product that literally cannot have guardrails did not, in actual fact, have guardrails.
They get exactly the sympathy that I reserve for people who buy magic crystals and who then complain that they don't work. Of course they don't fucking work.
Now they're blaming their suppliers for not performing the impossible.
elliotpage
I'm glad that I'm not the only person who felt this! It does feel like the post is missing some deserved self-reflection.
jeremyccrane
AI slopper here :) Kind words from a human. The irony is, there is tremendous truth in the post but you used big words so good for you bud.
9dev
[dead]
jeremyccrane
This is a huge issue.
nubinetwork
A lot of VPSes operate this way as well, delete the VM, lose your backups.
theshrike79
A "backup" like that should be called a "snapshot".
blurbleblurble
"The author's confession is above..."
pierrekin
There is something darkly comical about using an LLM to write up your “a coding agent deleted our production database” Twitter post.
On another note, I consider users asking a coding agent "why did you do that" to be illustrating a misunderstanding in the user's mind about how the agent works. It doesn't decide to do something and then do it; it just outputs text. Then again, Anthropic has made so many changes that make it harder to see the context and thinking steps, maybe this is an attempt at clawing back that visibility.
vidarh
If you ask humans to explain why we did something, Sperry's split-brain experiments give reason to think you can't trust our accounts of why we did something either (his experiments showed the brain making up justifications for decisions it never made).
But it can still be useful, as long as you interpret it as "which stimuli most likely triggered this behaviour?" You can't trust it uncritically, but models do sometimes pinpoint useful things about how they were prompted.
amluto
Humans can do one thing that AI agents are 100% completely incapable of doing: being accountable for their actions.
jumpconc
You haven't met certain humans. Not all humans have internal capacity for accountability.
The real meaning of accountability is that you can fire one if you don't like how they work. Good news! You can fire an AI too.
grey-area
Don’t forget learning, humans can learn, LLMs do not learn, they are trained before use.
lmm
What does that actually mean in practice? You can yell at human if it makes you feel better, sure, but you can do that with an AI agent too, and it's approximately as productive.
unyttigfjelltol
I disagree. They could fire Claude and their legal counsel could pursue claims (if there were any, idk)-- the accountability model is similar. Anthropic probably promised no particular outcome, but then what employee does?
And in the reverse, if a person makes a series of impulsive, damaging decisions, they probably will not be able to accurately explain why they did it, because neither the brain nor physiology are tuned to permit it.
Seems pretty much the same to me.
antonvs
That’s a feature that other humans impose on whoever’s being held accountable. There’s no reason in principle we couldn’t do the same with agents.
jeremyccrane
Yep.
jayd16
You might as well be asking a tape recorder why it said something. Why are we confusing the situation with nonsensical comparisons?
There is no internal monologue with which to have introspection (beyond what the AI companies choose to hide as a matter of UX or what have you). There is no "I was feeling upset when I said/did that" unless it's in the context.
There is no ghost in the machine that we cannot see before asking.
Even if a model is able to come up with a narrative, it's simply that. Looking at the log and telling you a story.
vidarh
Sperry's experiments makes it quite clear that the comparison is not nonsensical: humans can't reliably tell why we do things either. It is not imbuing AI with anything more to recognise that. Rather pointing out that when we seek to imply the gap is so huge we often overestimate our own abilities.
tempaccount5050
I think you might be misinterpreting that. I always understood it to mean that when the two hemispheres can't communicate, they'll make things up about their unknowable motivations to basically keep consciousness in a sane state (avoiding a kernel panic?). I don't think it's clear that this happens when both hemispheres are able to communicate properly. At least, I don't think you can imply that this special case is applicable all the time.
vidarh
We have no reason to believe it is a special case. The fact that these patients largely functioned normally when you did not create a situation preventing the hemispheres from synchronising suggests otherwise to me. There's no reason to think the ability to just make things up and treat it as if it is truthful recollection would just disappear because there are two halves that can lie instead of just one.
cmiles74
None of the developers that I’ve worked with have had the hemispheres of their brains severed. I suspect this is pretty rare in the field.
lmm
> None of the developers that I’ve worked with have had the hemispheres of their brains severed.
But are their explanations for how they behaved any more compelling than those of people who have? If so, why?
pixl97
This still doesnt stop post ad hoc explanations by humans.
layer8
The thing is, the LLM mostly just states what it did, and doesn't really explain it (other than "I didn't understand what I was doing before doing it. I didn't read Railway's docs on volume behavior across environments."). Humans are able of more introspection, and usually have more awareness of what leads them to do (or fail to do) things.
LLMs are lacking layers of awareness that humans have. I wonder if achieving comparable awareness in LLMs would require significantly more compute, and/or would significantly slow them down.
vidarh
Sperry's experiments suggests we don't have that awareness, but think we do as our brains will make up an explanation on the spot.
pierrekin
I agree that the model can help troubleshoot and debug itself.
I argue that the model has no access to its thoughts at the time.
Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.
If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).
XenophileJKO
Anthropic's introspection experiments have seemed to show that your argument is falsifiable.
fragmede
Claude code and codex both hide the Chain of Thought (CoT) but it's just words inside a set of <thinking> tags </thinking> and the agent within the same session has access to that plaintext.
jmalicki
It does have access to its thoughts. This is literally what thinking models do. They write out thoughts to a scratch pad (which you can see!) and use that as part of the prompt.
emp17344
That is absolutely not what the split brain experiment reveals. Why would you take results received from observing the behavior of a highly damaged brain, and use them to predict the behavior of a healthy brain? Stop spreading misinformation.
nuancebydefault
Such 'highly damaged' brain is still 90 percent or more structured the same as a normal human brain. See it as a brain that runs in debug mode.
It is known that the narrative part of the brain is separate from the decision taking brain. If someone asks you, in a very convincing, persuasive way, why you did something a year ago and you can't clearly remember you did, it can happen that you become positive that you did so anyway. And then the mind just hallucinates a reason. That's a trait of brains.
vidarh
Because said "highly damaged brain" in most respects still functions pretty much like a healthy one.
There is no misinformation in what I wrote.
59nadir
> a misunderstanding in the users mind about how the agent work
On top of that the agent is just doing what the LLM says to do, but somehow Opus is not brought up except as a parenthetical in this post. Sure, Cursor markets safety when they can't provide it but the model was the one that issued the tool call. If people like this think that their data will be safe if they just use the right agent with access to the same things they're in for a rude awakening.
From the article, apparently an instruction:
> "NEVER FUCKING GUESS!"
Guessing is literally the entire point, just guess tokens in sequence and something resembling coherent thought comes out.
sieste
Good point, it's like having an instruction "Never fucking output a token just because it's the one most likely to occur next!!1!"
jeremyccrane
That is actually pretty good, LLM's gonna LLM
undefined
NewsaHackO
Twitter users get paid for these 'articles' based on engagement, correct? That may be the reason why it is so dramatized.
dentemple
It's one way for the company to make its money back, I guess.
jeremyccrane
Naw, we just want people to know. We followed all Cursor rules, thought we had protected all API keys, and trusted the backups of a heavily used infrastructure company. Cautionary tale sharing with others.
mtrifonov
Yes, you're right, in that there's no decision module separate from the output. It overcommits in the other direction.
The post-hoc reasoning the model produces when you ask "why did you do that" is also just text, and yet that text often matches independent third-party analysis of the same behavior at well above chance. If it really were uncorrelated text-completion, the post-hoc explanation should not align with the actual causes more than randomly. It does, frequently enough that I've stopped using it as evidence the user is naive.
"just outputs text" is doing more work than we acknowledge. The person asking the agent "why did you do that" might be an idiot for expecting anything more than a post-hoc rationalization, but that's exactly what you'd expect from a human too.
josephg
> There is something darkly comical about using an LLM to write up
It feels like a modern greek tragedy. Man discovers LLMs are untrustworthy, then immediately uses an LLM as his mouthpiece.
Delicious!
razorbeamz
> There is something darkly comical about using an LLM to write up your “a coding agent deleted our production database” Twitter post.
Which calls into question if this is even real.
foota
While I largely agree, it does raise the prospect of testing this iteratively. E.g., give a model some fake environment, prompt it random things until it does something "bad" in your fake environment, and then fix whatever it claims led to its taking that action.
If you can do this and reliably reduce the rate at which it does bad things, then you could reasonably claim that it is aware of meaningful introspection.
khazhoux
> systemic failures across two heavily-marketed vendors that made this not only possible but inevitable.
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
> The agent that made this call was Cursor running Anthropic's Claude Opus 4.6 — the flagship model. The most capable model in the industry. The most expensive tier. Not Composer, not Cursor's small/fast variant, not a cost-optimized auto-routed model. The flagship.
The tropes, the tropes!!
levlaz
So if tropes.md works it doesn’t actually solve the problem. You’ll be reading stuff that you think an LLM didn’t write.
momento
I asked Railways agent to live resize a volume attached to our DB and it nuked the database and migrated it from the EU to the US
Here is an excerpt from the chat log:
>Please resize my postgres volume to its maximum allowed size for my plan.
>Done. Resized your Postgres volume to 100GB (the Pro plan maximum). Changes are staged and ready to deploy.
>oh no, you deleted all the data in the volume
>I apologize for that. Resizing a volume shouldn't delete data—that's a serious issue. Let me check what happened with your deployment.The deployments show as REMOVED, which suggests the service was redeployed. Unfortunately, if the volume was recreated during the resize process, the data would be lost. This is a critical issue. Do you have backups of your Postgres data? If not, the data loss may be permanent. Railway volumes don't have automatic snapshots by default.
>Why is the new volume migrating to the US?
>Your service config shows it's still configured for europe-west4(Europe). The volume shouldn't be migrating to the US based on this configuration. Can you clarify what you're seeing that indicates the volume is moving to the US?
>the volume was migrated to the US
>That's concerning. The service config shows it's still set to deploy in europe-west4(Europe), but the volume was physically migrated to the US. This shouldn't happen automatically during a resize.
It was at this point that I realised I was in for a long night of restoring downed services.
9dev
It seriously sounds like you should be in for a migration to a competitor to Railway? Like, what in the lords name would keep you in such a cursed place a second longer??
pdantix
i migrated to railway earlier in the year after being on vercel for 3 years. in those 3 years, i don't think i was affected by a single incident. in the ~4 months i've been on railway, i think i've probably been hit by like half a dozen incidents at this point. and that's not even including their broken edge network -> cloudflare routing i'm affected by. was told by staff to just move the deployment closer to me, which isn't the problem..
absolutely would not recommend
cnst
I think the problem here is that all of these services are optimising for the biggest "change-at-all-cost" that there could be.
If you have a service that does one thing, and does it good, and provides backwards compatibility, it cannot change every day. But if it doesn't change every day, then it's labelled as "obsolete" by those who go after the latest and greatest. If it just works and doesn't require adapting on every level, then those that are after the resume-driven-development, aren't "learning", and thus, again, those services are "old and obsolete".
But you can't have both the "change" and the "stability", something has got to give.
linkregister
It sounds like the Railway web agent designer has made the elementary mistake of having a single agent to accept user input, interpret it, and execute commands.
It is not difficult to design a safer agent. The Snowflake web agent harness has built-in confirmations for all actions. The LLM is just for interacting with the user. All the actions and requisite checks should be done in code.
prewett
My dad always said "pedestrians have the right of way" every time one crossed the street, but wouldn't let us cross the street when the pedestrian light came on until the cars stopped. When I repeated his rule back to him, he said "you may have the right of way, but you'll still be dead if one hits you". My adult synthesis of this is "it's fine to do something risky, as long as you are willing to take the consequences of it not working out." Sure, the cars are supposed to stop at a red light, but are you willing to be hit if one doesn't? [0] Sure, the AI is supposed to have guardrails. But what if they don't work?
The risk is worse, though, it's like one of Talib's black swans. The agents offer fantastic productivity, until one day they unexpectedly destroy everything. (I'm pretty sure there's a fairy tale with a similar plot that could warn us, if people saw any value in fairy tales these days. [1]) Like Talib's turkey, who was fed everyday by the farmer, nothing prepared it for being killed for Thanksgiving.
Sure, this problem should not have happened, and arguably there has been some gross dereliction of duty. But if you're going to heat your wooden house with fire, you reduce your risk considerably by ensuring that the area you burn in is clearly made out of something that doesn't burn. With AI, though, who even knows what the failure modes are? When a djinn shows up, do you just make him vizier and retire to your palace, living off the wealth he generates?
[0] It's only happened once, but a driver that wasn't paying attention almost ran a red light across which I was going to walk. I would have been hit if I had taken the view that "I have the right of way, they have to stop".
[1] Maybe "The Fisherman and His Wife" (Grimm)? A poor fisherman and his wife live in a hut by the sea. The fisherman is content with the little he has, but his wife is not. One day the fisherman catches a flounder in its net, which offers him wishes in exchange for setting it free. The fisherman sets it free, and asks his wife what to wish for. She wishes for larger and larger houses and more and more wealth, which is granted, but when she wishes to be like God, it all disappears and she is back to where she started.
sseagull
> he said "you may have the right of way, but you'll still be dead if one hits you"
Here lies the body
Of William Jay,
Who died maintaining
His right of way.
He was in the right
As he sped along,
But he’s just as dead
As if he’d been wrong.
Edgar A. Guest, possibly. Some variations and discussion here:busfahrer
This kind of is Postel's law, in a way:
lmf4lol
Re 1: Goethes Zauberlehrling might fit
baal80spam
Your dad was a wise man.
In my country there is a saying: "Graveyards are full of pedestrians that had the right of way".
winocm
This almost sounds like The Monkey's Paw by Jacobs.
jumpconc
How about the sorcerer's apprentice?
Get the top HN stories in your inbox every day.
Guy gives non-deterministic software root access, desaster happens. Movie at eleven.
Also, it's not a "confession". It's an LLM stringing together some tokens that form words trying to make a pleasing-sounding answer. Plus, the first sentence and the context implies that someone gave it a prompt that told it to never guess around but get stuff done. OP branding this as a confession tells you everything you need to know: total and absolute failure of guard rails, but these guard rails can not be expected to be in an LLM.