Codex logging bug may write TBs to local SSDs

b--l

Codex is one of the most infamous examples of slopware. Just having the window unhidden on my mac will cause it to use 100% of the GPU displaying the spinner message.

THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).

The issue is on github and close to 6 months old. Probably since the release of vibe coded junk. I would literally fix it myself but it's closed source for whatever reason.

There are many discussions about which model is better, or if vibe coding is even possible. I point you to the extent of what one of the most well funded, money flush, well staffed model making companies can do with vibe coding.

To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on polymarket expects them to have a leading model any time soon for example.

It's a tragedy. The world needs competition to anthropic.

jofzar

> Codex is one of the most infamous examples of slopware

Woah, let's not forget Claude code is right there

me551ah

Claude is also weird for being the only coding assistant that for some reason doesn't support AGENTS.md. Codex, Amp, and Cursor all support it and read from it, but not Claude, which forces its users to use CLAUDE.md instead.

The issue is the highest-voted issue on their GitHub repo: https://github.com/anthropics/claude-code/issues/6235

ValentineC

My CLAUDE.md is just:

    @AGENTS.md

And Claude processes it just fine.

(I see that it's a common workaround, and there's a comment in the above link saying just this: https://github.com/anthropics/claude-code/issues/6235#issuec...)

It's a hassle having to add it to every repo that I use Claude with though, and I often use other models and harnesses too for the more trivial tasks.

chorkpop

CLAUDE.md has been incredibly successful for them advertising wise. I wouldn’t expect them to admit AGENTS.md exists anytime soon.

bandrami

So there's this amazing thing called a symlink
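
For anyone who wants the symlink approach spelled out, here is a minimal sketch in Python. The helper name `link_claude_md` is made up for illustration; it assumes a POSIX-style filesystem where symlinks are allowed, and it only creates CLAUDE.md when AGENTS.md exists and CLAUDE.md does not.

```python
import tempfile
from pathlib import Path


def link_claude_md(repo: Path) -> bool:
    """Create CLAUDE.md as a relative symlink to AGENTS.md in `repo`.

    Hypothetical helper: returns True if a link was created, False if
    AGENTS.md is missing or CLAUDE.md is already present in any form.
    """
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    # is_symlink() also catches a dangling link that exists() would miss.
    if not agents.is_file() or claude.exists() or claude.is_symlink():
        return False
    # A relative target keeps the link valid if the repo is moved or cloned.
    claude.symlink_to("AGENTS.md")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "AGENTS.md").write_text("# Project rules\n")
        print(link_claude_md(repo))               # link created
        print((repo / "CLAUDE.md").is_symlink())  # it is a symlink
```

Running it over a list of checkouts is a one-line loop, which takes some of the sting out of having to repeat the workaround per repo.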

dgunay

Maybe I'm just not aware enough to notice any quality degradation, but I've been using Claude Code in repos that only have AGENTS.md and it just generally seems to know to read it when getting its bearings.

datsci_est_2015

Literally trying to use file naming to build a moat. “We can’t switch to Cursor, we’d have to rename all of our files from CLAUDE.md to AGENTS.md!”

hexsprite

I created a Claude Code plugin to load AGENTS.md. Uses symlinks but it’s better than no support.

https://github.com/hexsprite/claude-agents-md

anon373839

[flagged]

kokada

Not that Claude Code is much better. I just hit this issue[1]: setting DO_NOT_TRACK=1 seems to be enough to trigger really strange behavior in the newest versions of CC.

[1]: https://github.com/anthropics/claude-code/issues/69238#issue...

Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.

mvATM99

Yeah exactly.

I'm not exactly building TUI's every day, but even i felt pain when i read that "small game engine" post

matheusmoreira

At least game engines manage to render their frames properly. Claude Code sometimes eats entire paragraphs of text output, resulting in things such as numbered lists jumping from 2 to 4 out of nowhere.

I'd just ask Claude to repeat himself at first but it happens so often that I actually made a little tool to dig up the output inside the session history and present it properly in a separate terminal.

TacticalCoder

> I'm not exactly building TUI's every day, but even i felt pain when i read that "small game engine" post

The bigger issue is they were somehow thinking it was "cool" and "advanced" while it's just a kludgy rube-goldbergy monstrous hack.

Which is of course only semi-working: the deal-breaker for me is the model thinking that what you see in the TUI is what it outputs. Of course it doesn't work like that, because in their "game engine" they're apparently converting a headless browser on the fly into approximated characters to display in the terminal. So the model tells you it output ASCII, but people are copy/pasting (because, yes, at times you want to copy/paste) Unicode chars.

Plenty of bug reports and pissed users.

That's the bigger issue.

The biggest issue is those thinking a 10 GB VM required to run a headless Electron browser and then fuxx0ring characters conversion is somehow an achievement.

varjag

Right, just yesterday I found my laptop kinda hot. And what do you think, it was good old Claude deciding to load a few cores with completely idling prompts.

cozzyd

Yeah, somehow claude-cli sitting in a tmux session doing nothing uses my laptop's battery twice as fast. Powertop suggests a ton of wakeups, but why? Is it busy polling the input or something?

sambcui

I don’t know if you can relate, but I feel like the vibe-coded Codex and Claude Code desktop apps are iterating way faster than they should be.

malfist

How are they iterating? I've not noticed anything major changing between the versions of my claude code. Other than that sometimes this version includes /btw and sometimes it's missing.

iLoveOncall

Surprisingly Kiro is fine (I work at Amazon but not at all on the Kiro team). I prefer it to anything else I've tried (except Amazon Q Developer in IntelliJ, but it's now deprecated).

epistasis

Kiro is surprisingly good, if the interface for saving and resuming was slightly more reliable, and there was the hope of remote sessions, I'd probably switch to it full time. I vastly prefer it to having to fight against buggy force-fed features like UltraPlan or whatever.

r_lee

if we are at 10x with AI and near AGI or ASI, then how is it possible that these products (Codex, Claude Code CLI) are still such garbage?

shouldn't this "agentic AI revolution" have long solved this already?

no way they're over there saying "we are on it plz wait" or that "it's too much effort"?

igleria

This is the biggest elephant in the room I have seen in my decade+ career. At the same time, look how bad Apple is in software compared to its hardware... It's not an AI only problem, it's almost like software in general gets a free pass on being very unsafe or low quality because no one wants to face the same "profit reducing red tape" that civil engineers or similar face.

CharlieDigital

Anthropic were the progenitors of the Model Context Protocol. Claude Code does not fully implement the client end of the protocol. A protocol; a literal pre-defined spec that an agent should be able to one-shot. Neither does Codex. Codex does not implement MCP Prompts.

(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).

The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.

thewebguyd

> same "profit reducing red tape" that civil engineers or similar face.

I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.

A good start would be to stop allowing companies to disclaim all warranties of fitness for a particular purpose in their EULAs. The joke of Microsoft Copilot applies here where they have a big disclaimer that "Copilot is for entertainment purposes only" while advertising says otherwise. Not even the Chrome EULA will agree that it's fit for purpose as a web browser. The clause is a get out of jail free card that shifts all liability and risk to the end user.

forshaper

How much of all this is due to hardware improving, and software bloating enough to fill the capacity?

thewebguyd

> shouldn't this "agentic AI revolution" have long solved this already?

Daily reminder that Anthropic took over a year to fix the Claude Code terminal flickering issue despite proclaiming all over the internet that software development is a "solved problem."

Apple forked over $250 Million in a class action over false advertising for Apple Intelligence. When do we start seeing the same for the misleading and outright false claims coming out of the frontier labs about the model capabilities? At this point the marketing is doing more harm than the technology itself because it's warping the perceptions of those at the top that make decisions. The only reason tokenmaxxing was ever a thing was because marketing misled execs and technology decisions were made based on vibes instead of evidence.

mannanj

Why isn't it a thing for people to track the lies public figures tell as they tell them, and tie them to their reputation over time for anyone to find?

mannanj

As long as a majority of the people of the living class are gullible and naive and sick, entrained behavior from the institutions and media they are made to consume, they stop seeing the misleading and false claims. Or at least they myopically see it short enough to complain about it in an ineffective way, then continue to consume the next big lie or slop. Until something happens that channels that accumulated rage finally into a cause they feel makes things right (assuming they have not already died and the next generation has been groomed to fall for the rich man's trap) and those who's family and next generation is to continue the extraction and trickery hides behind an anonymous personality or system.

jeffybefffy519

Because vibe coding is a toy… that's the secret.

You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.

Nullsession

[flagged]

hombre_fatal

Like anything, you have to decide between polish vs switch to any other task in the queue. If you choose too much from the latter, then polish suffers, yet that's a human thing.

Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.

It's kind of like how HNers would claim to your face that you can't actually build anything with Javascript and Node.js (JS just sucks too much), then they'd list off a few footguns that were supposed to demonstrate why. In other words, champing at the bit for JS to lead people to catastrophize issues that were pretty mediocre.

geodel

> yet that's a human thing.

Is this a joke?

Here we are talking about trillion-dollar AI companies who claim AI can fix decade-old bugs and create new compilers, OSs and what not. Are parallel agents working autonomously to fix issues as well as create new features not allowed at these companies?

coldtea

>Like anything, you have to decide between polish vs switch to any other task in the queue

Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?

>Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.

Why shouldn't it? They're not the ones making the extraordinary claims.

mnicky

If the code churn is high, the investment in refactoring etc. is less beneficial than may be obvious. I don't remember the details, but I heard on some podcast that the code base of Claude Code changes so fast that any piece of code won't be there for long.

coldtea

In other words it's an ever moving vibe fest, with random bugs and misbehaviors each time they roll the dice...

tartoran

If they respected their users they’d at least pin some versions that are more stable.

fg137

You are asking too many good questions.

rjh29

Gemini is also buggy as heck and has been buggy for years. For a company of Google's size with "all the power of AI" it's seriously embarrassing.

ValentineC

The "AI revolution" feels like it's creating a bunch of ultra-smart AI models that are scarily good at cracking most of human-created security (Mythos), but also happen to be careless snobs that just leave litter and mess in their wake.

LtWorf

We don't really know how much human intervention there is in mythos… maybe it has a very high rate of false positives that get checked by hand before publishing them.

nicce

Not only Codex, but I can't leave the ChatGPT app on macOS open for a few hours, because it will consume 60 gigabytes of RAM over time and crash all the apps.

Mindboggling. Or can't use Google's AI Studio in browser because it takes 100% CPU.

Need to write own app for everything???

nbaksalyar

It's not just Google AI Studio, it's also Google proper. Just one search result page consumes gigabytes of RAM. How did this happen? I've switched to DDG and never looked back.

veber-alex

ChatGPT works ok for me but Whatsapp consumes 1000% cpu after the mac wakes up after sleep.

I swear a few years ago shit like this didn't happen on macOS.

coldtea

A few years ago vibe-coded crap apps like that didn't exist on macOS.

porridgeraisin

the damn chat.openai.com webapp lags a lot as well on long chats, typing takes so long.

rsfern

In my experience the input field lags on short chats too, sometimes in the middle of writing the second or third prompt. Are they running some kind of prospective evaluation or something?

lelandfe

I believe this is because they don’t lazyload the document.

The entire conversation sits in the DOM.
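
If that guess is right, the usual remedy is windowing (also called list virtualization): only the messages near the viewport are kept in the DOM. The core of it is an index calculation, sketched here in Python. The function name, the fixed row height, and the `overscan` margin are all simplifying assumptions for illustration; real chat messages have variable heights and need a measured offset table instead.

```python
def visible_range(scroll_top: int, viewport_height: int, row_height: int,
                  total_rows: int, overscan: int = 3) -> tuple[int, int]:
    """Return the half-open range [first, last) of rows worth rendering.

    Hypothetical helper assuming every row is `row_height` pixels tall.
    `overscan` adds a few rows on each side so fast scrolling does not
    reveal blank space before the next render.
    """
    first = max(0, scroll_top // row_height - overscan)
    # Ceiling division finds the first row that starts below the viewport.
    bottom = scroll_top + viewport_height
    last = min(total_rows, (bottom + row_height - 1) // row_height + overscan)
    return first, last


if __name__ == "__main__":
    # A 10,000-message chat, scrolled to the middle: only a handful of
    # rows need to exist in the document at once.
    first, last = visible_range(scroll_top=50_000, viewport_height=600,
                                row_height=100, total_rows=10_000)
    print(first, last, last - first)
```

With this approach the cost of typing or scrolling depends on the viewport size, not on how long the conversation has become.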

LtWorf

Well seems osx has a terrible oom killer.

mannanj

Plot twist: the companies are unethically (blaming it on AI slop code) harvesting your GPU and CPU resources to generate tokens for their products consumers in a decentralized fashion. Not unlike those old monero or other crypto spyware miners.

xpct

Well thank you for your service. I thought about trying out Codex after the disaster that is Claude Code. I'll be fine without either one on my machine

jofzar

Imo codex is significantly better than Claude code for me ATM.

christophilus

Codex is much better, which is to say, it’s only pretty bad.

comboy

I mean, Codex CLI is really bad. But Claude's CLI is so much worse.

Welcome to the world of tomorrow!

DrewADesign

> it's closed source for whatever reason.

When working in an organization that defaulted to open sourcing everything, (even side projects,) there was only one reason any of us would keep something closed — embarrassment. Nobody wants to be the public face of some garbage code base. I’m sure that’s triply true when you’re using that code to justify exorbitant pricing.

CryZe

> THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

This seems to be a common Chromium problem across tons of software. GitHub has the same issue with its spinners, VSCode as well.

y1n0

> THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

Is that M5-specific? I’m not seeing it on my M4 and I use codex (desktop and cli) quite a bit.

giancarlostoro

> It's a tragedy. The world needs competition to anthropic.

I agree, though Sam Altman's company is the last option I'd want to replace Claude with. I would sooner exhaust every open model.

woadwarrior01

Someone posted a temporary workaround for this on X[1].

    sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"

Also, I found that running VACUUM on the sqlite file on my laptop shrunk it from 27GB to a mere 73MB[2].

[1]: https://xcancel.com/bdsqlsz/status/2067964486615810369

[2]: https://xcancel.com/jeethu/status/2068087449469780434
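
Both steps from the comment above (a trigger that silently drops inserts, then a VACUUM to reclaim space) can be tried on a throwaway database before touching the real file. The following is a minimal sketch using Python's standard `sqlite3` module. The two-column `logs` schema, the file name, and the row counts are assumptions made up for the demo; they are not the real layout of `~/.codex/logs_2.sqlite`. The demo also deletes its rows first so that there are free pages for VACUUM to reclaim.

```python
import os
import sqlite3
import tempfile


def demo() -> dict:
    path = os.path.join(tempfile.mkdtemp(), "logs_demo.sqlite")
    # isolation_level=None means autocommit, so VACUUM never runs
    # inside an implicit transaction (SQLite refuses that).
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")

    # Fill the table so the file has a noticeable size.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO logs (message) VALUES (?)",
                     (("x" * 1000,) for _ in range(5000)))
    conn.execute("COMMIT")
    rows_before = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

    # Same trigger as in the workaround: RAISE(IGNORE) abandons the
    # INSERT without reporting an error to the writer.
    conn.execute(
        "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
        "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END"
    )
    conn.execute("INSERT INTO logs (message) VALUES ('dropped')")
    rows_after_trigger = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

    # Deleting rows frees pages inside the file but does not shrink it;
    # VACUUM rebuilds the database and returns the space to the OS.
    conn.execute("DELETE FROM logs")
    size_before = os.path.getsize(path)
    conn.execute("VACUUM")
    size_after = os.path.getsize(path)
    conn.close()
    return {"rows_before": rows_before,
            "rows_after_trigger": rows_after_trigger,
            "size_before": size_before,
            "size_after": size_after}


if __name__ == "__main__":
    print(demo())
```

Note that the trigger hides the symptom only: the application still does the work of building each log row, and nothing gets recorded while the trigger is in place. Dropping it again is `DROP TRIGGER block_log_inserts;`.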

sgarland

DB-level rules saving the day once again.

NamlchakKhandro

The real solution is to stop using it and switch to Pi

woadwarrior01

I’ve been using oh-my-pi with GLM-5.2 xhigh as the main model and GPT-5.5 medium as its advisor model. IMO, the combo works better than either of those models alone.

christophilus

Well, everyone's bashing on OpenAI as well they should, but just a reminder, unlike Claude Code, Codex is officially available to customize here: https://github.com/openai/codex

It's fairly easy to patch.

redox99

That's the CLI, not the codex app which is proprietary.

milkshakes

the issue is in the cli and app-server

Lionga

[dead]

i2km

Shocking. Been open a week and AFAICT just silence from OpenAI. I just find it baffling. You'd think that these vendors would be very sensitive to this sort of issue. I mean, surely they have multiple agents hooked up to github monitoring potential issues and proposing fixes, right? ...right?

Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...

drakythe

They're pretty bad about fixing issues it seems. My favorite is #2472 which they demonstrated "fixing" on stage on the release of GPT 5, but the ticket is still open and the "fix" hasn't been merged. The original blog that flagged this fact https://blog.tymscar.com/posts/openaiunmergeddemo/ and the issue: https://github.com/openai/openai-python/issues/2472

lelandfe

Claude meanwhile just auto closes all issues because they simply don’t triage ~anything. There’s been countless instances of this horrifying issue created for over a year now: https://github.com/anthropics/claude-code/issues/16180

> Permission bypass when commands are chained with &&

At one point they fixed their auto stale bot closing bugs but, hey, guess that wasn’t long lived.

cl3misch

There have been Issues on Github about the same problem since April. I'm using Codex a lot and I'm very happy with its performance (UX and output), but it's baffling they haven't fixed this problem.

neuralkoi

Vibe coding takes "move fast and break things" to a whole nother level.

cryo32

Yeah. Here I am sitting on a major incident at our company because someone’s vibe coded shit went seriously wrong.

al_borland

I hope that ends up in the RCA, to show these tools as a real risk, and not swept under the rug, where all blame is shifted elsewhere.

cryo32

It'll go under the rug as it always does because no one wants to explain that our AI first strategy was a stupid one that caused a net negative ROI impact and reputational damage.

Imustaskforhelp

Can you talk more in detail if possible and are allowed to do so?

I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

They hadn't done anything to the database itself but you betcha that there are some horror stories involving database, lack of proper backups and Vibe-coding gone insanely wrong.

cryo32

I can say very little in detail but basically Claude doesn’t have any conceptual idea of order of operations and transactional guarantees which resulted in producing something that failed under normal load. There is an evidence chain to suggest it was asked to do this but did not and that wasn’t verified.

Our engineers are accountable for what they produce regardless of how, so they are cleaning up the extensive mess this made. This will result in a very heated post-mortem meeting between the two factions in the company.

ValentineC

> I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

People like that and their managers should all be put on PIP right away.

It's not like there is a lack of talent on the market.

smoe

> "The code wasn't written by me. It was written by Claude/Chatgpt"

That seems like a good way to justify your own job away.

flir

> "The code wasn't written by me. It was written by Claude/Chatgpt"

Culturally (across all LLM use, not just programming) we need to nip that in the bud. If we don't it's going to be the new "someone hacked my social media password" get out of jail free card for avoiding responsibility.

I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.

latexr

> Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

It boggles the mind someone could think that is a valid justification, because ultimately what they’re saying is “I’m useless, what you get from me is the same thing as prompting the model” which still means they would lose their job.

comboy

We are running out of things to break.

stavros

Make more things to break.

GL26

as long as you don't have technical debt, vibe coding is mostly useful for prototyping. For a real product, true SWE will never be replaced

throwatdem12311

all code is technical debt

Otek

Already got replaced at world top tier tech jobs. „True SWE” will be niche / luxury soon, just like real woodworking vs IKEA

inigyou

Software is freely duplicable unlike wood. IKEA could be mass producing copies of the most beautiful chair in the world just as easily as it produces copies of something a 5-year-old drew in freecad.

slopinthebag

Source that a big tech company replaced all their SWEs with vibe coders?

ewsbr

Looks like this was fixed[0], so it should land in the next release.

[0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...

taspeotis

OpenAI really snatched defeat from the jaws of victory late last year when Claude Code was a laggy mess.

Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.

kasey_junk

Fwiw I have the exact opposite experience.

christophilus

I find Claude Code nearly unusable. I always have to type in neovim if I’m typing anything more than a few words.

aquariusDue

It runs fine for me on an old ThinkPad X220 loaded with 8 GB, an i5 and a barely working SATA SSD. This is on Fedora and Claude Code is installed from Anthropic's dnf repo (the latest channel). Granted I'm on the Pro Plan and I'm not running lots of sub agents but the default terminal app from KDE (Konsole) renders and keeps Claude Code responsive enough.

I must be honestly missing some key piece of workflow otherwise I don't know why it would run so slow for other people on better hardware? Granted I'm taking care to tell Claude to not exhaust CPU cores and make sure to not trigger OOM errors, akin to "make no mistakes pls".

Lionga

[dead]

jofzar

This is actually such a classic blunder (shipping trace/debug logging on for everything), but funnily the impact is not in a normal way.

It's crazy we've hit a point where memory, CPU speed and disk speed aren't getting clapped when a dev ships logging at trace level. That used to make the application catastrophically slow, so it was fixed immediately in the next update.
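
To put a rough number on the blunder described above, here is a small sketch using Python's standard `logging` module. It is not how Codex logs; Python has no built-in TRACE level, so DEBUG stands in for it, and the message sizes and loop counts are made up. The point is only that the same code writes wildly different volumes depending on one level setting.

```python
import io
import logging


def bytes_logged(level: int) -> int:
    """Run a fixed workload and return how many characters were logged."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.level{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)

    for i in range(1000):
        # Chatty per-iteration detail, the kind meant for debugging only.
        logger.debug("payload %s", "x" * 200)
        if i % 100 == 0:
            # Occasional message that should survive in production.
            logger.warning("checkpoint %d", i)

    logger.removeHandler(handler)
    return len(stream.getvalue())


if __name__ == "__main__":
    verbose = bytes_logged(logging.DEBUG)
    quiet = bytes_logged(logging.WARNING)
    print(f"debug level: {verbose} chars, warning level: {quiet} chars")
```

Shipping with the level left at the verbose setting costs nothing visible on a fast machine, which is how this kind of bug can go unnoticed until the disk fills up.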

kuekacang

It helps too that agent work is done server side so you can hog all the local resources for your thin client.

collabs

This is a little off topic but

These guys really need to stop polluting the root folder of repo with Claude dot MD and copilot dot MD. Get in a room together and decide on a well known folder structure like docs/llm/*

fc-oai

Hey everyone, I'm an engineer at OpenAI. Thanks for the discussion here. Just wanted to report that a fix for this issue has been published in an update to the CLI and Codex App.

tdehnke

Looks like the fix doesn't work for all users from the comments on Github, please verify.

bravetraveler

Somebody please donate some tokens to this plucky startup, they need our help.

ramon156

Blegh, I puke every time I see obviously AI generated comments in GH PR's. You cannot assume any of these people have done their research, other than telling Codex to do it for them

b--l

It's because they use gpt-5.5-xhigh (the money making* model) to build it.

(*for them)
