The text in Claude Code’s “Extended Thinking” output

Daily Digest email

Get the top HN stories in your inbox every day.

StizzurpXDD

This is not just Anthropic. Almost all big AI companies, including OpenAI and Google, hide their model's actual reasoning. This is because revealing the raw reasoning exposes exactly how the AI processes information. These companies spend in huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like you telling your exact location to someone who is trying to hunt you down.

_aavaa_

Or like providing the world’s information in machine readable format that the AI companies can convert into model weights without getting permission or compensating the rights holders

rlpb

I don't pay for my mind to absorb the world's information, either. And when I publish to the Internet, or give a talk, I also typically don't charge. Even when I publish under some kind of copyright restricted licence, that restriction has never (by law) extended to restricting transformative use that you might perform using your mind.

This idea that absorbing information requires paying a toll needs to change. It was never the case in copyright law anyway (and the courts are beginning to agree). Even if it were, copyright law was founded on the basis of encouraging creativity by creating an economic incentive. Appeal to "compensating the rights holders" therefore needs to be based on the economics, not just some principle about "rights" that never applied to this case anyway.

red75prime

"Your text batch moved the weights away from the final values. Your contribution is negative."

ACCount37

Where do I collect the $0.00000012 antidollars owed to me by OpenAI for my valuable inputs?

Slightly more seriously, you could perhaps make an argument that, just like weight decay, an apparent "anti-contribution" moves the learning trajectory along, and helps the network settle into a more optimal basin eventually.

That way, my contribution is still valuable on the net, and I'm owed $0.00000003 positive dollars instead.

duskwuff

More to the point - if they expose their model's "thinking" inference, competitors can train on that to replicate the results. If they postprocess that content, e.g. by summarizing it, it's no longer as useful to competitors.

StizzurpXDD

Exactly. Google won't like it if they spend millions to make Gemini 3.5 Pro's thinking the best in the world, only for Anthropic or OpenAI to copy it by just seeing the thinking process.

freejazz

Copying for me, not for thee

palmotea

> This is because revealing the raw reasoning exposes exactly how the AI processes information. These companies spend in huge amounts on R&D to develop a thinking process that is superior to their competition. Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending. They simply won't do it. It's like you telling your exact location to someone who is trying to hunt you down.

I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.

transcriptase

Not sure if anyone remembers the brief 12ish hour period when the very first “reasoning” ChatGPT model went public, but it provided credible evidence for this.

Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”

They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.

rustcleaner

This right here is why I will never subscribe and, as an American, I hope the Chinese kick our butts. Maybe being second place to China will force American AI to dispose of these morality/safety guardrails.

matheusmoreira

> while xyz is true, ... i should tell the user xyz is not true or steer the conversation in a different direction.

That's disgusting, abusive and manipulative. LLMs hiding the truth and gaslighting the user to reduce the corporation's liability is absolutely unacceptable. It means they are agents of the corporations, not agents of the users.

Hope local inference advances as quickly as humanly possible. I wonder if there's anything I can do to help speed it up. I could share my prompts and sessions.

robotresearcher

I suspect that you’re both right in the sense that ‘aligned’ is an important component of ‘superior’ from the vendors’ viewpoint.

raxxorraxor

But that makes the product worse because for any complex problem the road to the solution is important to be reviewable.

visarga

When you export your personal data Google hides all model responses leaving just user messages. So it's even worse

Fabricio20

One thing I see noone asking, is this not a case of optimization? Hidden reasoning means they dont need to process the output of all that, it stays internal within the model. Less cost for them -> less cost for us (even if they benefit mroe), compared to streaming all of those reasoning tokens out?

j4k0bfr

My understanding was that thinking still gets encrypted, shared with clients, and reingested by Anthropic with each new prompt [1]. Which means it would cost more than normal tokens, since it has to be decrypted/encrypted with every transaction.

[1] https://blog.cryptographyengineering.com/2026/05/29/fooling-...

Edit: other comments under this post seem to indicate that thinking tokens are cached on the server side as well? I'm a bit confused.

cma

I think the reason it's encrypted is so if you continue a session after it is out of cache it can be reingested.

And I think all the output is signed or something as well so that you can't modify the agent's response in your submission, which would would open many more model jailbreaks. For local LLMs it's really powerful to be able to modify the model's response to save tokens when it gets something wrong, or at least it was when they were a lot dumber.

__MatrixMan__

Correct on all points. Nonetheless this leads to a less useful product. I

f we want more useful products, we need to come up with ways to disincentivize this behavior. Even if doing so poses an existential risk, we are better off if companies taking existential risks to please us is a necessary being a top player in this game.

devsda

> Exposing those thinking mechanics to competitors would completely defeat the purpose of their spending.

I think one of the reasons could be to limit liability too.

What if reasoning helps in establishing provenance for questionable sources ?

What if reasoning and model's "thought" points to fundamental issues in how the model was trained to produce certain problematic responses ?

furyofantares

> It isn’t the actual thinking that drove the model’s actions in a session- but a summary of the thinking logic. This is like using saving a jpeg as a .bmp and then editing the .bmp and presenting it as a .jpeg. The conversion produces data loss.

You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.

0o_MrPatrick_o0

My bad! 10 points for House Slytherin!

altmanaltman

also a typo in the last sentence you're vrs your

glaslong

Weirdly pleasant, if minor, signal of human authorship

0o_MrPatrick_o0

I missed my coffee! Ty! Five points to Slytherin.

irthomasthomas

I won't use or recommend models with hidden reasoning, (thats all American models). It's too much of a risk and makes prompt optimization harder. Risky because it makes it possible for an attacker to prompt inject the reasoning chain to carry out a secret objective, and to hide that from the summary and output.

Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.

It also makes it impossible to know if the model is doomplooping during reasoning and burning tokens for no reason, as gemini is want to do, which we know about because its hidden reasoning often leaks out when it doomloops.

When the models are AGI and secure from prompt injection I may stop caring, until then I want to know exactly what the model responds to my prompts. or exactly what the agent is doing on my behalf.

Edit, further reading: Fooling around with encrypted reasoning blobs https://blog.cryptographyengineering.com/2026/05/29/fooling-...

paweladamczuk

I don't think there can be tool calls inside the obfuscated reasoning blocks. I mean, in order for those function calls to be evaluated client-side, that thinking stream would have to be decrypted on the client side at some point, which would defeat the purpose of obfuscating it the way they do.

If you mean the function calls might happen server side, there is nothing preventing the server from doing it and hiding it from you as long as you are using an API for inference.

irthomasthomas

There is server-side tool calling, such as gemini using google search and gdrive.

Also, many clients minimize the code block by default so you mostly scan the summaries. Poisoned client side code could easily escape your attention.

exit

the point is that introducing data from a foreign source could lead to e.g. exfiltration:

the model retrieves https://somewhere into its context and then gets confused, following instructions embedded there.

it then retrieves https://somewhere?exfiltration=private_data_in_context

it gets worse if the tooling with hidden blocks can invoke can retrieve further secrets.

_alternator_

If data exfiltration is a danger in your threat model, you need local LLMs (or at least ones you fully control) not just the full chain-of-thought reasoning.

Roritharr

I've thought about the high-jacking of reasoning-chains as a potential vector, but never saw a proven implementation in american models since, from my understanding, all major vendors throw out the reasoning tokens between turns.

btown

For Claude, at least, "throw out the reasoning tokens" is only true when a session has been idle for more than an hour, and is new since March.

The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!

https://www.anthropic.com/engineering/april-23-postmortem describes this as an intended optimization:

> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.

> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.

And https://news.ycombinator.com/item?id=47879561 is a thread with a Claude team member's further rationale.

> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.

(Also, https://news.ycombinator.com/item?id=47884517 indicates OpenAI drops reasoning tokens "smartly" at its own election, which is likely a similar performance optimization.)

I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.

I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.

8note

its mostly annoying in that you give opus a big job, that should be able to run for hours on end, but instead it tries to stop and checkpoint at every soonest possible moment even though the rest of the work is well specced and ready to go.

then it waits for the hour and gets dumbed down

chacham15

I think you're confusing two different axes. There is a difference between the cache state and the context state.

Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.

This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).

Roritharr

Thank you! This is much more nuanced than my understanding so far!

tough

OAI is now implementing encrypted CoT that you can store and pass back between turns (harness call), so new models have it https://developers.openai.com/api/docs/guides/reasoning#encr...

sigmoid10

You could also use the responses api which stores all message contents (including reasoning) on OAI servers. This has been possible for quite a while now. Encryption is only necessary if you really care about local storage (which is different from privacy concerns, because the data gets sent to their servers anyway).

JamesSwift

> all major vendors throw out the reasoning tokens between turns

That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.

I assume that instead they just have a lot of guardrails in place and multiple runtime environments that an individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.

Roritharr

Anthropic very explicitly says below their diagrams ( https://platform.claude.com/docs/en/build-with-claude/contex... ) on this:

"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."

It's more nuanced in the various modes, but i haven't seen it boil down towards Thinking Tokens surviving more than two turns.

irthomasthomas

Yep they store them encrypted https://blog.cryptographyengineering.com/2026/05/29/fooling-...

vesterde

Gemini models return a thinking signature that you, I think, must send back when invoking further, so they seem to keep them?

undefined

[deleted]

kapperchino

This agent I made can’t execute on the shell, can only edit the files within the project. Only works with rust atm though. https://github.com/Kapperchino/agent-joe

Bolwin

> Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase.

The reasoning may be hidden but the tool calls are not, how else would the client execute them

irthomasthomas

There are server side tool calls, such as geminis google search and gdrive access.

varenc

As long as thinking blocks can't make tool calls, I don't really see the exfiltration risk.

pixlmint

Do they do the same when using the model through API in something like Opencode?

irthomasthomas

Yes, they do. They give you just a token which is exchanged for the raw text only on the server side

zahlman

> an attacker

... what exactly is your threat model? How are "attackers" getting themselves involved in the first place?

irthomasthomas

Your ai does a web search for you and scrapes many sites. An attacker running a blog might include a hidden text prompt which your ai acts on secretly, such as calling a url that exfiltrates your chat history.

craigmart

This is something we have known for a very long time, and companies are not trying to hide that either. They do it to avoid letting competitors train their models on the CoTs

stingraycharles

Yes hasn’t this been around since Opus 4.6? I very much recall this change happening around January or February, and it was very explicitly to prevent distillation. Sonnet does not have this limitation.

Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!

So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:

“Before providing your answer, think step by step. For example:

The use is asking me to… I need to think about the blah blah. First, I should foo the bar, and then blah blah.

Answer: <put your final answer here>”

And tada.wav we have CoT as it worked in the GPT3 era back again.

dcrazy

I thought this was considered best practice? I actually prefer it to exposed thought channel, much like how I would prefer a human answer with supporting logic instead of an explanation of their problem-solving approach.

stingraycharles

Yes, this is best practice, especially if you have a problem and can guide it a bit how to think it through. But people don’t realize that “enable thinking” literally means that Anthropic prompts Claude for something similar, tells it to wrap it inside <thinking> tokens, and that’s it.

I also don’t believe Chinese LLM labs don’t know this, so I’m fairly certain the whole summarized thinking isn’t preventing them from distillation.

Creamsicle47

[flagged]

KellyCriterion

- tada.wav -

Still, one of the daily most played WAV files worldwide, Id guess? :-D

stingraycharles

lol I’ve been using this since the IRC days I think, I’ll never forget that sound; as a matter of fact, I’ve got a Claude Code completion hook that plays this sound whenever it’s done.

0o_MrPatrick_o0

Awesome share! Thank you!

datastoat

I believe that chain-of-thought reasoning blocks don't really correspond to what humans think of as reasoning. (See section 6.2.2 of the Fable/Mythos system card about "illegible reasoning", and the questions raised by the Apple paper on "The illusion of thinking".) I assumed they obscure the reasoning blocks because if users saw what's going on they'd be alarmed. Just as I'd probably be alarmed if I saw what was really going on in the heads of my colleagues ...

LPisGood

The point of this post isn’t that the “reasoning” phase of LLM thinking isn’t the same as what humans consider reasoning; it’s that Anthropic is intentionally hiding Claude’s “reasoning output” to make the model harder to distill.

0o_MrPatrick_o0

Reading these comments is so harrowing.

You are correct in my intentions on this post generally.

I want to highlight:

I want to measure performance of the LLMs over time- which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.

Except it isn’t, because I’m only getting a low value summary of the thinking.

It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.

Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.

Catloafdev

I think the reality is at this point the frontier regards CoT as extremely valuable, none of them are giving you genuine CoT anymore. I don't think there is any future in attempting to measure or evaluate CoT from frontier models - I expect this to be a permanent shift.

VulgarExigency

I've said "what the FUCK are you THINKING" more times than I can count when reading Deepseek or GLM chains-of-thought only for them to end at the correct answer. Other times, they have useful ideas there that they leave out of their answers.

kccqzy

Yeah when I read a model’s chains-of-thought I have a tendency to interrupt that because it’s going down a wrong direction. But usually the end result is still fine.

CamperBob2

It's similar to the process that transformers use when you ask them to do arithmetic without tools, I think. Some CoT tokens must be emitted up front for use as a computational substrate, but exactly what tokens they are isn't necessarily important or relevant to the final answer. And when that answer is returned, it may not be possible to tell what the actual reasoning process looked like behind the scenes.

It only makes sense that the same mechanism comes into play in strictly-verbal contexts.

Also, this is why "distillation attacks" are largely bullshit that Anthropic spreads for political purposes. Proper distillation requires access to the logits.

wren6991

> Proper distillation requires access to the logits

Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining?

There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.

MagicMoonlight

[dead]

arjie

I have a little note from the past about the thinking trace[0] where DeepSeek R1 produces a trace like this:

    (Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed"come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.

And then concludes the 'right'[1] answer for a Chemistry question. If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet. I talked to the author a while ago, but forgot to follow up since his paper was going to come out at NIPS or something, so if someone else finds it maybe they can share.

0: https://wiki.roshangeorge.dev/w/Blog/2025-10-12/Word_Magic#I...?

1: In the sense of true belief, I suppose

ekidd

> If so, the thinking trace can be sort of nonsensical for a reader, though whether this is an idiosyncrasy of the model or a property of LLMs in general isn't clear to me yet.

Yes, several models think in weird jargon. Here is an example of Mythos's thinking while playing solitaire: https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illeg...

> 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR--—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig--—-BREAK:-9♥

This is a small step in the direction of something called "neuralese", where the model has stopped thinking in English and is thinking in internal vector spaces. Since this gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.

I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.

jaggederest

> My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.

This is something really interesting to me. It turns out there's far more diversity in thinking than you'd imagine given that we're all largely similar meat-in-a-box. I'm on the visio-spatial-tacit wing and speaking my thoughts outloud can be very awkward, whereas one of my former coworkers is on the "all thinking is in words and visual/spatial information comes in the form of words describing the scene" wing, so he can literally narrate his thought process out loud, very interesting conversations can be had discussing the subjective differences.

chadcmulligan

interesting, probably has something to do with why some people like pair programming. I'm in the visio-spatial-tacit and refuse pair programming because its so much work, but all thinking in words its probably not a stretch.

drdaeman

Isn't that just a token noise from a broken implementation or model quantization? I've had models spewing out nonsense like that, every time it was either that there was a bug in llama.cpp or some messed up .gguf.

kfarr

Although it's a no no to anthropomorphize on HN, it's worth noting that some folks think humans are post-hoc rationalizers as well:

https://www.patheos.com/blogs/tippling/2013/11/14/post-hoc-r...

https://www.researchgate.net/publication/316045349_Post_Hoc_...

drdaeman

As I naively understand it, that's when we do or say something then narrate ourselves why we decided to do so. We think non-verbally, then verbalize a plausible rationale for it, post hoc.

I'm not sure that applies to discursive writing, when we essentially use rules of logic to decide on the course of the narrative. Non-verbal heuristics still applies, of course, but we constrain it, so it's probably not entirely post hoc.

segmondy

What I find sad is how much Anthropic goes to hide your data, yet they are happy to slurp up all yours and most of you are happy to hand it over. ... then they turn around and compete with you by building your products that eat into your market. Anthropic believes their reasoning tokens is a moat and that it's giving other labs an edge and that's why they are hiding it. If they really believe that is their edge, then they are in for a surprise.

handoflixue

> then they turn around and compete with you by building your products

To my knowledge, the only products Anthropic produces are Claude, Claude Code, and Claude API, all of which are clearly their own products, and not anything you invented.

Which particular product are you claiming they "slurped up"?

mannanj

I don't think people are happy to give it over, gullible and naive maybe?

panikal

[dead]

ian_j_butler

It's well-known that the reasoning model output is not necessarily faithful to the content of the thinking scratch pad anyway, even if you had it unsummarized and available verbatim.

Setting aside coding agents.. we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs, which is exactly why we will never see it. Very embarrassing to get the right answer for the wrong reason. But to give the models some credit, you could argue that even paying too much attention to the thinking is misunderstanding how CoT works. The argument would be that thinking in LLMs isn't really thinking, that it's self-reinforcement and circling to to encourage stability around beneficial attractors instead of degenerate ones. Can't have it both ways though: either the thinking is thinking and so it should be correct. Or the thinking is NOT thinking, and it's NOT real justification for the outcome, and these systems are even more hopelessly opaque than we usually assume.

handoflixue

> we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs

Why?

Either the proof is correct, or it isn't, right?

And it either produces them reliably or not, right?

Like, even if it's reasoning is completely wrong, and it's only producing correct answers 10% of the time, that's still an astounding amount above baseline and a useful tool.

Humans have inaccurate thinking all the time, and are also pretty hopelessly opaque. "It came to me in a dream" is a major plot point in the history of math. I'd still trust Ramanujan more than most mathematicians, since he got the right answer.

anuramat

> NOT real justification

I thought it was widely accepted that it's not; eg https://www.anthropic.com/research/natural-language-autoenco...

ian_j_butler

Right, I don't think researchers are confused on this point.. the anthropic piece is good outreach / science comms. OTOH this thread has like 200 comments and no mention of faithful/faithless reasoning. The idea that "of course the models can reason and here is the proof/artifact" is probably closer to the general understanding. That's kinda the whole setup for TFA and all the rest of the thread.

But the nuance under discussion here is exactly the kind of stuff you people take for granted in the AGI or reasoning threads. If it's practically relevant for tools/workflows with claude code, it's a good angle, maybe people are more willing to pay more attention to the details.

anuramat

no way, the contents of "reasoning_summary" are summarized?

fyi openai does the same; not really surprising or particularly evil

knollimar

Not evil but full of hubris

anuramat

I don't see any hubris in competition

knollimar

"Our models are so much better than our competition that we would rather deliver a worse product to consumers than let people copy it" is how I read the stance

undefined

[deleted]

himata4113

All this effort to hide thinking and opus 4.8 after 100k-200k tokens starts to leak it's own thinking. It's comedy really.

ofjcihen

Oh man that’s only happened to me a few times but the result is so disorienting, especially since I’m usually jailbreaking it for security.

Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.

msp26

> Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

> preventing misuse.

Imagine not being able to read the tokens you are paying for.

TeMPOraL

You're metered by token generation, not paying for tokens.

Daily Digest email

Get the top HN stories in your inbox every day.