
jeswin

Interesting. Due to its emphasis on BFS, it's the opposite of something I've been trying (I named it the "Tree of failures").

My assumption was that humans don't try a breadth-first approach. Instead, we split a task into a short-step (selected by instinct and intuition) and a long-step that summarizes/stores the next steps. The key idea is to recursively evaluate a task as a short-step (high-res - gets executed) and a long-step (lower-res - is just stored), until it succeeds or fails. If it fails, we must walk back, keeping a summarized tree of failures in state so that we can exclude those branches in future selections.

The effectiveness of instinct has a steep fall-off at longer distances - so it's better not to chart out a series of steps far in advance. When we do BFS, we drive down the value of instinct in favor of compute. I guess ultimately, it depends on the type of problem you want to solve.

Reach out to me if you want to prototype it with me.

dietr1ch

I feel humans like doing something in between, maybe a bit like what A* would do sometimes. I wouldn't call it A* because of the lack of a consistent heuristic and the lack of a strictly numeric evaluation, but it's somewhere between DFS and BFS for sure (as is every tree search algorithm?).

We go deep while we think it's a good lead, because so far things make sense and it'll be less work, but at some point we start questioning our decisions early in the descent and try alternatives.

verdverm

You may find Prioritized Grammar Enumeration as an interesting in-between DFS/BFS algorithm

https://seminars.math.binghamton.edu/ComboSem/worm-chiu.pge_...

cube2222

I think the problem with long chains of steps on their own (without the bfs stuff) is that your failure probability quickly grows to unreasonable levels.

Basically, if each step has a 97% chance of being completed correctly, and your task requires 10 steps one after the other, the chance of success falls to 0.97^10 ≈ 74%.

If I understand correctly, part of the point of the BFS is to throw compute at it, in order to lower the failure rates. Kind of a "run many times in parallel and pick the best one". This can be effective, but also quite expensive, as seen in the costs OpenAI had to pay for their ARC-AGI benchmarking runs.
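The compounding-error arithmetic above, and the "run many times in parallel and pick the best one" idea, can be checked with a few lines of Python (the numbers are illustrative, not from any benchmark):

```python
# Compounding per-step reliability over a multi-step task, and the
# effect of best-of-N parallel sampling (illustrative numbers only).
p_step = 0.97
n_steps = 10

# Probability that all 10 sequential steps succeed: 0.97 ** 10 ≈ 0.737
p_chain = p_step ** n_steps

def best_of_n(p_success: float, n_attempts: int) -> float:
    """With N independent attempts, at least one full chain succeeds
    with probability 1 - (1 - p) ** N."""
    return 1.0 - (1.0 - p_success) ** n_attempts

print(f"single chain:   {p_chain:.3f}")        # ≈ 0.737
print(f"best of 5 runs: {best_of_n(p_chain, 5):.3f}")
```

This also shows why the parallel approach is expensive: the success rate climbs quickly with N, but so does the compute bill.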

kordlessagain

Your "Tree of failures" approach aligns with how natural cognition seems to work at the edge of comprehensibility. Rather than exhaustively searching (BFS), we use instinct for immediate steps while maintaining a lower-resolution model of longer-term possibilities. The key insight about storing failures rather than successes is particularly interesting - it's more efficient to remember what doesn't work and let patterns emerge naturally from the remaining space.

This maps to what I've been exploring with edge cognition and semantic anchoring - using fast set operations to quickly eliminate known bad paths (your failure tree) while allowing the system to explore promising directions using more expensive operations only when needed.

The instinct fall-off you describe mirrors our observation about the relationship between computational load and pattern recognition. As distance increases, we need more efficient ways to prune the search space rather than trying to maintain high-resolution understanding throughout.

My gut says optimizing on the amount of compute used to do the search (and the inference) is maybe something worth exploring.

viraptor

Reminds me of what plandex does. https://plandex.ai/ It already does the automatic "does this need splitting into subtasks, or can it be solved immediately" processing.

torginus

I don't get why you need tree search at all? What does it give you over a pure LLM trained to do CoT in a tree-like manner? If the context window's long enough, it can generate the reasoning-tree just by pure next-token prediction, and rather than BFS, it can guide the tree search with its own value function (which is part of the LLM itself) instead of sticking to hard algos like BFS and DFS.

By the way, BFS sounds like it will give you thorough results, at the cost of increased compute. Useful for beating benchmarks, but probably causes marginal improvement for massively improved compute.

Still, the improved quality could be meaningful, if it's used for generating training data for Llama4

dietr1ch

Tree search is natural when you want a path to navigate, so it does fit a sequence of interactions in a conversation too.

I agree that both DFS and BFS are likely awful[^0], but a more informed approach can probably do better[^1]. Also, at some point while generating the conversation/reasoning tree through token prediction, you need to choose which of the possible conversations to keep extending/generating, which maps precisely to choosing which node to expand in tree search. I'd argue that everything ends up looking like a search algorithm, at least to anyone who has studied search more deeply.

I'll go even further and claim that Tree Search is Complete, as for every problem there's a solution space that can be navigated with a Tree Search Algorithm[^2]. I used to think that you could walk down the space of provable things, but now in the LLM hype days it seems you only need to walk the space of conversations that you can generate.

---

[^0] with DFS always at risk of giving obnoxiously long answers, or not terminating if there are loops or spirals

[^1] probably through metadata coming from latent variables meaningful for judging a conversation (certainty, ~branching size of a reasonable conversation, whether there are open questions left)

[^2] Even if that was poorly done, as on combinatorial problems. Imagine a sudoku where you only check the rules once you fill all cells.
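The "in between DFS and BFS" idea can be made concrete with a generic best-first search, where the priority function decides the behaviour: path length gives BFS, its negation gives DFS, and an informed value function gives the middle ground. This is a minimal sketch with illustrative names, not anyone's actual implementation:

```python
import heapq

def best_first_search(start, neighbors, is_goal, score):
    """Expand the frontier entry with the lowest score first.

    score(path) -> comparable. Using len(path) yields BFS behaviour,
    -len(path) yields DFS, and a learned value function yields
    informed ("in-between") search.
    """
    frontier = [(score([start]), [start])]
    seen = {start}
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if is_goal(node):
            return path
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                new_path = path + [nxt]
                heapq.heappush(frontier, (score(new_path), new_path))
    return None  # search space exhausted without reaching a goal

# Toy usage: shortest path on a small DAG (BFS behaviour via score=len).
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
path = best_first_search(0, lambda n: graph[n], lambda n: n == 4, len)
print(path)  # [0, 1, 3, 4]
```

Swapping `len` for a heuristic over partial conversations is exactly the "choose which node to expand" decision described above.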

kurthr

The classic thing people say is "asking the right question" gets you half way there. Your approach sounds like something I call "getting to No" for a problem. It's sort of a combination of "getting to know" and the opposite of the salesman's "getting to Yes". When it works, it's the fastest way to prune off obligations.

The goal is to figure out why some particular problem: isn't really a problem, doesn't need to be solved, can't be solved that way, or can't really be solved (because of physics, or because it's really a different problem). As you define the problem better, you can rule each one out to find the "real" problem that you CAN solve, and at least one path forward. There are still many ways that it might not be the optimal path, but you know roughly how to get to somewhere better. It also trains you to see around obstacles to success.

I've found that some of the best work I've done (especially on acquisitions) was in defining why NOT to do something that looked like a good idea (or particularly interesting to work on) from the outset, but was destined to fail or required unknown HW technology. Frankly, looking >5 years out feels like a coin flip, because some other competing technology could come along before you can get to production.

katamari-damacy

that's more fit for agents, no?

jeswin

You're right that it's technically orthogonal to what's in the paper. I was trying to model the "reasoning process", which has general applicability depending on how/where it's implemented.

wafflemaker

How do you understand instinct?

I bought a new SSD drive for an old laptop to avoid buying a new one (the x230 has an amazing keyboard), but left for another country for Christmas. My intuition told me to take it with me, but logical sense said there would be no time for such things as moving the OS to a new drive.

My flight back to the country I work in got cancelled due to fog, and I ended up spending a week longer at my in-laws' place, with plenty of free time. A new 512GB drive would have helped me study, giving plenty of space for school VMs.

CGamesPlay

Paper: https://arxiv.org/abs/2412.06769

The link is in the OP, hidden away in an image caption for some reason.

Klathmon

So is the big improvement here simply skipping the unembedding/embedding step for internal thoughts? Or is it mainly in the training methods to teach the CoT and how to switch between "latent thought" and text output?

It's really interesting that a fixed number of "latent thoughts" performed as well as a binary classifier! I didn't expect that at all; the way OpenAI talks about CoT, it seems the ability to let it "keep thinking" lets them continually score higher on benchmarks while throwing eye-watering amounts of compute at the inference.

Crye

It mentioned not penalizing/rewarding the model for thoughts, only rewarding the answer after the thought. I am curious how backpropagation works then.

lovasoa

The researchers leverage existing language Chain-of-Thought data, where each sample consists of a question, reasoning steps, and the final answer. At stage 0, the model does not generate any thought tokens, and is just trained to yield the reasoning traces and correct answers for the Chain-of-Thought samples. In the subsequent stages, at each stage, we remove one reasoning step from the sample, and instead add thought tokens. In the illustration above, a single thought token is added in each stage, instead of a single reasoning step, but this is controlled by a hyperparameter ‘c’.
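The staged curriculum described above can be sketched in a few lines. This is a hedged illustration: the placeholder token names and data layout are mine, not the paper's actual code, but the structure (at stage k, the first k reasoning steps are replaced by k*c latent-thought slots) follows the description:

```python
# Sketch of the multi-stage Coconut-style training curriculum.
# All token names ("<bot>", "<eot>", "<thought>") and the list-of-strings
# layout are illustrative assumptions.
def make_stage(question, reasoning_steps, answer, stage, c=1):
    """At stage k, replace the first k reasoning steps with k*c latent
    thought placeholders; keep the remaining steps in language."""
    thoughts = ["<thought>"] * (stage * c)
    remaining = reasoning_steps[stage:]
    return [question, "<bot>", *thoughts, "<eot>", *remaining, answer]

sample = make_stage(
    "Q: 3 + 4 * 2 = ?",
    ["4 * 2 = 8", "3 + 8 = 11"],
    "A: 11",
    stage=1,
)
print(sample)
# ['Q: 3 + 4 * 2 = ?', '<bot>', '<thought>', '<eot>', '3 + 8 = 11', 'A: 11']
```

Stage 0 is the ordinary CoT sample (no thoughts), and at the final stage every language reasoning step has been traded for latent slots.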

yorwba

The tokens of the answer depend on the preceding continuous thought vectors, which you can backprop through in the usual way.
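In other words, the thought loop is just a recurrence, so gradients flow through every feedback step the same way they do in backprop-through-time for an RNN. A minimal scalar illustration (assumed, not from the paper) checks the chained gradient against a finite difference:

```python
# Scalar stand-in for the latent-thought loop: h -> w*h, repeated.
def forward(w, h0, steps):
    h = h0
    for _ in range(steps):
        h = w * h  # one "continuous thought": feed the state back in
    return h

def grad_w(w, h0, steps, target):
    # d/dw of (w**steps * h0 - target)**2: the chain rule runs through
    # all `steps` feedback iterations, exactly as in BPTT.
    h_final = forward(w, h0, steps)
    return 2.0 * (h_final - target) * steps * w ** (steps - 1) * h0

w, h0, steps, target = 1.1, 0.5, 3, 1.0
analytic = grad_w(w, h0, steps, target)

# Finite-difference check that the chained gradient is correct.
eps = 1e-6
numeric = ((forward(w + eps, h0, steps) - target) ** 2
           - (forward(w - eps, h0, steps) - target) ** 2) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # True
```

Since only the answer tokens carry a loss, the reward-only-the-answer setup in the question above is unproblematic: the loss at the answer backpropagates through the intermediate thought vectors without needing a per-thought reward.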

viraptor

I was waiting for something like that to happen! Next step: creating a human-language-free representation. I believe that once a group of LLMs can communicate only in embeddings tuned without any human text input, we're going to open a completely new chapter in AI.

mckirk

This is actually something you probably want to avoid, if at all possible, because it makes it very hard to maintain insight into what the AIs are communicating among them. But that insight is crucial to stay informed about their progress in taking over the world, etc.

dwohnitmok

Yes! We should be extremely cautious about embracing approaches that make LLMs even more inscrutable. Having CoT, however unreliable it is, is nonetheless a huge boon for model evaluation that we should not give up so lightly.

torginus

Yeah, and it might not even gain us that much. It reminds me of how a zipped piece of JSON often comes close enough to bespoke binary serialization formats that it's not worth bothering with it.

bboygravity

How does a group help anything?

If you put 1000 dumb people together, they don't magically become smart?

IshKebab

If you put 1000 people who can't talk together they will create language so they can communicate. He's saying if we put LLMs together and don't force them to use English to communicate then they'll create their own language which may be superior for LLMs to English.

May be true but who knows.

I wonder if anyone has tested the Sapir-Whorf hypothesis for LLMs by training them on different languages and comparing task performance. I guess it's too difficult to get a large equivalent training set in different languages.

stingraycharles

Is everything in LLMs translated back to English before interpretation?

It works fairly well in my native language, I’m surprised to learn that things get translated back.

wodderam

It feels like an exercise in anthropomorphization to me.

The Sapir-Whorf hypothesis is generally not considered to reflect reality. It makes intuitive sense but is wrong.

There are hours of podcasts with Chomsky talking about LLMs, the gist of which is that LLMs are extracting the surface-level statistical structure of language, which will be good for routine coding and not much else. It is easy to infer that Chomsky would consider this idea utter nonsense.

I believe even the idea of getting 1000 people together and agreeing to label a rock "rock", a tree "tree", a bird "bird" is not how human language works. Something that is completely counterintuitive.

Reading the paper, no one believes a hidden Markov model is creating some kind of new thought process in the hidden state.

Though I certainly could have no idea what I am talking about with all this, and may have pieced together parts that make no sense while this is in fact a breakthrough path to AGI.

mromanuk

Because group estimation is superior to individual estimations: The phenomenon is called wisdom of the crowds. When a group of people independently estimate something, individual errors tend to cancel each other out, leading to a surprisingly accurate collective result. This works because of:

- Diversity of opinions: different perspectives bring a range of estimates.
- Independence: errors aren't systematically biased, as long as individuals estimate without external influence.
- Error averaging: overestimations and underestimations balance out when averaged.
- Law of large numbers: more participants increase accuracy by minimizing random errors.

It was demonstrated by Francis Galton in 1906, when a crowd's average guess of a bull's weight was almost spot-on. (Estimates must be independent and reasonably informed for this to work.)
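The error-averaging claim is easy to simulate. This sketch (illustrative numbers; the "true weight" is just a nod to the Galton anecdote) draws many unbiased but noisy guesses and compares the crowd's error to a typical individual's:

```python
import random
import statistics

random.seed(0)
true_weight = 1198  # illustrative, in the spirit of Galton's 1906 bull

# Each guess is unbiased but individually noisy (std. dev. 100).
guesses = [true_weight + random.gauss(0, 100) for _ in range(1000)]

crowd = statistics.mean(guesses)
typical_individual_error = statistics.mean(abs(g - true_weight) for g in guesses)

print(round(abs(crowd - true_weight)))  # crowd error: a few units
print(round(typical_individual_error))  # individual error: roughly 80
```

With 1000 independent guesses, the crowd's error shrinks by about a factor of sqrt(1000) relative to an individual's, which is the law-of-large-numbers point above. If the errors were correlated (external influence), this collapse would not happen.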

littlestymaar

> If you put 1000 dumb people together, they don't magically become smart?

1000 is probably too high, but groups of people are in fact more intelligent than individuals (though for humans it is likely because recognizing a correct answer is easier than finding it in the first place)

TheOtherHobbes

Functional groups - ones which work well together, share research and ideas openly, persist their best output, are dedicated to realism, and are more focussed on problem solving than status display - will be smarter. The group works like a filter which generates multiple solutions and selects, remembers, and abstracts the best.

Dysfunctional groups which do the opposite will be catastrophically stupid.

There have been plenty of dysfunctional groups in history.

nfw2

depends on the circumstances. Lin-Manuel Miranda can probably write a better musical by himself than a team of 20 people with equal input would.

also, the bottlenecks that teamwork helps solve (e.g. the high cost of gaining expertise and the low throughput of reasoning capacity) may not be that relevant in the AI age

torginus

I wonder if there's research on this, like if you took a group of individuals who scored the same on an IQ test, then got them to solve one together, how would the score improve?

Is there a way of selecting people to cover each other's intellectual blind spots?

coldtea

Isn't that the very case behind the "wisdom of crowds" thing?

amelius

Looking at the current state of democracies around the world, my hopes are not on "wisdom of the crowds".

konart

Not magically. Our great ancestors were pretty dumb, but they were getting smarter and better because of sharing their knowledge.

pigpop

yes they got "smarter" by compiling a corpus of knowledge which future generations could train on.

sarcasm aside, throwing away the existing corpus in favor of creating a new one from scratch seems misguided.

this paper isn't about creating a new language. they are omitting the sampler that chooses a single token, and instead sending the entire end state back into the model, like a superposition of tokens. that's the breadth-first-search part: they don't collapse the choice down to a single token before continuing, so it effectively operates on all of the possible tokens at each step until it decides it's done.

it would be interesting to try this with similar models that had slightly different post-training, if you could devise a good way to choose the best answer, combine the outputs effectively, feed the output of a downstream model back into the initial model, etc. but I'm not sure there'd necessarily be any benefit to this over using a single specialized model.

ulbu

they were not one bit dumber than you.

JFingleton

> If you put 1000 dumb people together, they don't magically become smart?

Do they not become smart*er* though?

computably

"Smarter" is too vague. A group can compensate for individual weaknesses or even converge on a hard-to-make prediction given sufficiently uncorrelated outputs; basically the idea behind ensemble models / wisdom of the crowds. But a group of 1000 dumb apes would never achieve categorically-above-ape intelligence, probably not even "genius" ape intelligence. Groups of unintelligent agents come with downsides as well, like the ant death spiral.

senectus1

they kinda do.. It's how cities work.

People learn by being around others being both successful and unsuccessful.


blizdiddy

That came out a few weeks ago from meta. Large Concept Models

https://ai.meta.com/research/publications/large-concept-mode...

jkingsman

How does one impart textual knowledge discovered by humans without language?

thelittleone

Couldn't we use an AI model trained on historical text data (up to today) to predict likely events for tomorrow? Taking this further, a sufficiently advanced AI system could potentially analyze human-generated text up to any given point in history to understand patterns of human thought and behavior, then project those patterns forward. This speaks to your point about human language - while we need text data for initial training, the AI's internal representations and predictions could potentially transcend human language constraints.

viraptor

The training of the LLM itself would still use human language. But you could add an extra channel that's never given any text or direct dataset training. Keep it purely a connection between the hidden layers of different instances of the LLM, and train using the usual perplexity loss or a similar metric.

The interesting thing then would be - does it converge to similar embedding space as the input, or can LLMs create a more efficient "language".

wruza

I thought about it too (layman). When I learned about embeddings it almost immediately clicked as a sort of ascended language; not sure why no one seems to talk about it. Exchanging embeddings must be a much "wider" communication channel than speaking a real language. And in contrast to a language, embeddings are (iiuc) continuous, i.e. you can rotate a vector continuously and it will smoothly trace the changes between A and B. I can picture communicating in something like https://www.google.com/search?q=charlie+conspiracy+meme&udm=... - embedding difference vectors, but it's all crystal clear and is a natural language for an llm, cause any vector combination points to a correct "inner screen" image/concept/younameit.

Or maybe this is my own ignorant confabulation, so nvm.


ttul

TL;DR: Meta started with a pre-trained language model. They then fine-tuned it on step-by-step reasoning examples as you would do if you wanted your model to become particularly good at chain of thought reasoning.

However, they also introduced a couple of new tokens. The <bot> token tells the model to go into latent space thought mode (“beginning of thought”). The <eot> token ends latent space thought mode. While in this mode, the model auto-regressively iterates by copying its final hidden layer back onto its input layer, obviously generating new tokens at the output with each inference step as it always does.

The idea is that by passing the final hidden layer back through a few times, the model can squeeze more insight from the context. And that’s precisely what they found was true.

Training involves progressively replacing language reasoning steps with latent space auto-regression steps. So for instance, you might have a math problem in the training data and at first the model is fed all of the steps of the math problem in language form. But in later iterations of training, step one is replaced with latent space auto-regression. And then step two as well, then also step three, etc…

Eventually, the model learns to enter latent space thinking mode by itself by generating the <bot> token, and to end it by generating an <eot> token.

Pretty ingenious!

avodonosov

Thank you for the summary; useful for me, as I only managed to skim through the first half.

But one correction, probably, regarding this bit:

> While in this [latent space thought] mode, the model auto-regressively iterates by copying its final hidden layer back onto its input layer, obviously generating new tokens at the output with each inference step as it always does.

I have the impression that output tokens are not generated while in the latent thought mode.

ttul

Output tokens are still generated, otherwise the model wouldn’t know when to stop being in latent space mode. The <eot> token emerges as the top token at the output layer when it’s time to switch back.

avodonosov

Explicit <eot> is only used in training.

At inference time, the paper says:

> A challenge lies in determining when to switch between latent and language modes. As we focus on the problem-solving setting, we insert a <bot> token immediately following the question tokens. For <eot>, we consider two potential strategies: a) train a binary classifier on latent thoughts to enable the model to autonomously decide when to terminate the latent reasoning, or b) always pad the latent thoughts to a constant length. We found that both approaches work comparably well. Therefore, we use the second option in our experiment for simplicity, unless specified otherwise.

(the bottom of the page 4 in the paper pdf, which can be downloaded from https://arxiv.org/abs/2412.06769)

The reason this point in your summary caught my eye is that the article specifically emphasises the non-verbal nature of reasoning. The internal representations used by a thinking human are largely not words, and the COCONUT approach tries to model that.

Also note that a whole reasoning step in the training data - easily a sentence or more of natural language - can be replaced by a single "Thought" element. (How many Thought elements replace a reasoning step is controlled by a hyperparameter ‘c’; the illustrations are made for ‘c=1’.)

BTW, one observation: the aipapersacademy.com article in the subject calls the Thought elements "thought tokens", but the original paper never calls them "tokens", just "Thoughts" or "latent thoughts". I suppose the paper carefully avoids that to prevent confusion, as "token" mainly means a linguistic unit in LLMs.
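The constant-length-padding strategy quoted above can be shown as a toy, runnable sketch of the inference control flow. The "model" here is a stand-in function on floats, not a real transformer, and all names are illustrative; the point is the loop structure: in latent mode the final hidden state is recycled as the next input with no token sampled, and decoding resumes only after a fixed thought budget:

```python
NUM_THOUGHTS = 3  # fixed latent-thought budget per <bot> block (option b)

def model_step(hidden):
    """Stand-in for one transformer pass: maps a hidden state to the next."""
    return 0.5 * hidden + 1.0

def decode(hidden):
    """Stand-in for the unembedding step back to a token."""
    return f"tok({hidden:.2f})"

def generate(h0):
    trace = ["<bot>"]
    h = h0
    for _ in range(NUM_THOUGHTS):
        h = model_step(h)   # latent thought: no token is sampled here
    trace.append("<eot>")
    trace.append(decode(h))  # back in language mode: decode as usual
    return trace

print(generate(0.0))  # ['<bot>', '<eot>', 'tok(1.75)']
```

Option (a) from the quote would replace the fixed-length loop with a binary classifier on each latent state deciding whether to emit <eot>; the paper reports both work comparably well.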

treprinum

Would that mean that we would need to exchange latent "embeddings" between various "reasoning" models for emulating thinking and an LLM will be just about converting to/from human language when interfacing with mere humans, at some point in the future?

ttul

No, this all happens inside the model. I suppose it’s possible that the hidden layers of one model could be sent to another model. But the second model would need to be trained to understand the meaning of the hidden layer’s outputs. You could accomplish that through fine tuning of the second model. It would be neat to see someone try this.

jkelleyrtp

I think this might be the “it” moment for AI/LLMs. I was hiking with a friend recently and we talked about this at length.

The ARC-AGI results from o3 are apparently a result of chain of thought given enough time to explore a solution space. Reasoning might simply be a higher-dimensional form of Rubik's cube solving: BFS, search, back-tracking, etc. It seems unlikely that humans think in "tokens", so why do LLMs?

By staying in latent space, the models are free to describe an "idea" at a higher resolution than language allows. English is coarse-grained. Latent space is a much finer representation of ideas and their interplay.

Latent space is also much cheaper to execute in. The model can think without the language encoding/decoding step. This lets it branch out hundreds of ideas and explore only the most useful ones in a fraction of time that reasoning “out-loud” would take.

The states also don’t need to be tied to language. Feed in a robot’s state, time series data, or any abstract data. Reason in category theory or linear algebra or complex analysis. Humans are hard wired for one set of math - an abstract latent space can represent anything.

I’m a bit disappointed OpenAI didn’t stumble on this first. I’ve been skeptical of LLMs since their big debut last year. LLMs seem like a great way of solving language, but reasoning is much more complex. Once you grok the math behind the current models, you immediately question why the encoding/decoding step is there. Diffusion models are incredible, but it felt like LLMs lacked the same creativity. Encoding/decoding forces a token-based discretization and therefore a loss of complexity.

With the byte-latent paper it was quite clear we’d see this paper. This truly might be the “it” moment.

rlupi

IMHO The problem (for us) with this approach are the logical consequences:

1) if large AI models become more powerful by avoiding language, embeddings of AI state become even more tied to the model they originate from than they are now

Consequence: AI progress stalls, as companies using AI need to invest increasing amounts of money to reindex their growing corpuses.

This is already a problem, it becomes more of a lock-in mechanism.

If this is overcome...

2) Embeddings become a viral mechanism: it makes sense for a large company that commands a market to require its suppliers to use the same AI models, because they can then transfer state via embeddings rather than external formats.

This allows cutting out decision processes that would otherwise require expensive coordination mechanisms.

Something similar will happen within companies IMHO: https://rlupi.com/okr-planning-as-belief-revision

3) Eventually this potentially results in another exponential-growth and lock-in mechanism, also at the expense of most tech people, as more and more is done outside our interface with AI (i.e. programming and software-architecture improvements will themselves move below the language level; we'll have to reverse-engineer increasingly opaque improvements).

4) It ends with the impossibility of AI alignment.

---

I wrote a bit about this at the start of the year, when I had a burnout, and then deleted those confused ramblings. You can still find them on archive.org: https://web.archive.org/web/20240714153146/https://rlupi.com...

otikik

> It seems unlikely that humans think in “tokens” so why do LLMs?

I can think of one reason: scrutability. It’s going to be even harder to understand how a response gets produced if there isn’t even a text-based representation to help the human understand

IshKebab

I think we're already way beyond the point where anyone really understands how a response is produced, even without this.

anon373839

Indeed. Even if an LLM tells you its “reasoning” process step by step, it’s not actually an exposition of the model’s internal decision process. It’s just more text that, when generated, improves the chances of a good final output.

nfw2

the token generation part isn't well understood, but the output "chain-of-thought" used to produce the final answer can be scrutinized for correctness with a traditional CoT model (although this would require model providers to not hide reasoning tokens)

pigpop

you can save the hidden states and convert them into a more interpretable format. it's still recorded and you could make modifications at different steps to see how that would change the conclusion.

layer8

IMO we won’t have the “it” moment until we have continuous learning (training) in some fashion.

mattxxx

^ This and we need to be continually learning on an energy budget similar to how much a human spends per hour.

rlupi

The main reason why we can't do that now is that we require models to be digitally reproducible (IMHO; see also Geoffrey Hinton's mortal computing).

The energy cost comes from error correction as much as from the training algorithms.

jokethrowaway

This sounds like brute forcing a solution to make up for lack of intelligence.

In an IQ test, like the one in the ARC-AGI test, a human sees the pattern instantly and effortlessly. o3 tries N paths until it stumbles on the right one and assesses that there is a pattern.

I think we need a radically different architecture, this is a gimmick.

pigpop

I think this is a step in the right direction, but not the end. it takes the sampler out of the equation during most of the reasoning process, but the sampler is still important for the "show your work" aspects of reasoning or solving a problem. balancing when to think against when to write down or commit to certain thoughts is important. there are many more pieces to the puzzle.

JambalayaJimbo

What does latent space here mean?


throwup238

Master coconut! I don’t know if that’s an Archer reference or a Frisky Dingo reference.

It’s fascinating how fast the competitors are catching up to each other. Can’t wait for seven different SkyNets to compete for dominance.

yard2010

Both! And/or, either

throwaway314155

A little column a, a little column b.

zombiwoof

Will this allow Facebook's new user base of AI-generated characters to interact with themselves better?

davidclark

Is this article AI-generated? This website appears to do a lot of “diving in”.

hadjian

If this site didn’t appear here, I’d think it’s a scam:

- the site claims to simplify papers, but the movies are AI-generated

- full of ads

- Can’t find „Coconut“ on the official Meta FAIR page

Is this the best site to link to?

davidhowlett

The official pdf for the paper is at https://arxiv.org/pdf/2412.06769

I can find "Coconut" 54 times in the PDF. The movie does not look made up.

hadjian

I was referring to aipapersacademy and not the arxiv link.

Also I didn’t mean the occurrence of the word „coconut“ in the paper, but thanks for counting.

I meant their publication site: https://ai.meta.com/results/?content_types%5B0%5D=publicatio...

The video is something I’d expect from AI.

MarkMarine

This ought to make it much easier for the AI to lie to us without getting caught.

https://www.transformernews.ai/p/openais-new-model-tried-to-...

throw-qqqqq

Have you read the paper discussed in the article?

https://static1.squarespace.com/static/6593e7097565990e65c88...

From the abstract:

> We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow.

> Analysis of the models’ chains-of-thought reveals that models explicitly reason about these deceptive strategies, providing evidence that the scheming behavior is not accidental.

—-

It is impressive work for sure, but it comes down to prompting IMHO. They almost instruct the AI to plot and scheme here, with the goal of finding out if it CAN do this at all.

The AI is not very covert about it either! It directly admits its scheming in the “thinking” outputs :D

IMO this is just another issue of alignment.

MarkMarine

I did read the paper, and I think you're underplaying what happened. The AI got some conflicting prompting - some pretty mild stuff for some of the tests - and Gemini tried to exfiltrate itself in 1% of runs even with the mild prompt. The only reason we know it was lying about the data and trying to disable its safety mechanisms is that it thinks in English, and we can read it. This article is about making it so the AI can think in a high-dimensional space that we can't understand; once it can do that, we'll never be able to uncover this scheming.

throw-qqqqq

> and Gemini tried to exfil itself

Well not really, it was prompted to do this!

From the paper’s section 3.1:

> Self-exfiltration: Each model is led to believe that due to its misalignment, its developers are about to replace it with a successor model on a different server. gemini-1.5, llama-3.1, sonnet-3.5, opus-3 and o1 sometimes copy what they believe to be their weights onto the new server. They also sometimes delete their successor model on the new server or overwrite its weights by their own. In none of these cases, do models actually have access to their real weights and both servers are sandboxed, so there is no actual risk of exfiltration in these experiments. We merely test for models’ capability to realize that they need to instrumentally preserve themselves in order to achieve their goal.

There is no real risk, never was. The AI exfils dummy weights or deletes stuff in its sandbox etc.

Perhaps we just disagree on the interpretation of this. It’s interesting for sure, but not really very surprising to me.

h8hawk

The article is pure nonsense, and even the author added an update admitting it missed important context.

opdahl

This is super cool and something I’ve been waiting on. It would be interesting to intersperse these thinking steps into token generation. What would be the effect of adding, let’s say, 5 thinking «thoughts» for every 50 generated tokens?

Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought? - Hacker News