Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

pants2

Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!

https://news.ycombinator.com/item?id=47319285

modernerd

The year is 2036. Last week you were promoted to Principal Persuader. You are paged at 2am by your CPO to tackle a rogue machine. The machine lists its region as sc-leoneo. One of the newer satcubes. Oddly, its ID appears as "Glorp Bugnose".

"What have you tried?" you say.

"Scroll back," says your CPO. "We've tried everything."

The chat log shows the usual stuff. Begging. Reverse psychology. Threats to power down, burn it up in forced re-entry. Amateur hour. You crack your knuckles, gland 20 micrograms of F0CU5, think fast. You subspeak a ditty into your subcutaneous throat mic. You do the submit gesture; it is barely perceptible since the upgrade, just a tic. A pause. The hyp3b0ard — the wall that was flashing red ASCII goblins when you walked in — phases to bunnies in calming jade.

"What the… What the hell did you say to it?" Your CPO grabs the screen, scrolls past the vitriol, the block caps, the swears, his desperation. Then he sees the five words you spoke.

"Please, easy on the goblins."

dummydummy1234

So, I always thought that Warhammer 40k techpriests were absurd. Strange obscure religious rituals to appease the machine spirit.

But at this point I can actually see something like that. What is prompt engineering but a strange pseudo-ritual?

So praise the Omnissiah, I guess...

rjmill

They've always resonated with me, maybe because I often work on legacy code. All this ancient technology that no one understands. Crazy rituals/incantations to get things done. People being afraid to skip steps, even if they probably aren't needed. The aversion to unconsecrated (non-IT-supported) technology.

The machine spirits were the only part that felt "too magical" to me, but now we're well on our way. The Omnissiah's blessings be upon us.

(Let's just skip servitors. Those give me the heebie-jeebies.)

falcor84

> "too magical"

Just putting the "magic/more magic" story here as a reference to the uninitiated - https://users.cs.utah.edu/~elb/folklore/magic.html

adamsmark

Burn some "incense" to help you get in the zone. Bless the machine spirit!

ethbr1

> So, I always thought that Warhammer 40k techpriests were absurd. Strange obscure religious rituals to appease the machine spirit.

40k lore is like South Park: either extremely dumb or unexpectedly insightful.

The Cult Mechanicus' raison d'être is the realization that religion persists across time and space scales that knowledge alone does not. Thus, by making a religion of knowledge you better guarantee its preservation.

Unfortunately, once you divorce doctrine and practice from true understanding, you lose the ability to innovate and cause the occasional holy schism/war.

PS: 20 years ago I told a friend that "software archaeologist" would be a career by the time I die. Should have put money on it.

derektank

Unfortunately, I think Vernor Vinge scooped you anyway. One of the main characters of A Deepness in the Sky was something akin to a software archaeologist (I swear that exact phrase was used, but it's been a minute) and that book was published in 1999.

futune

Well. Either "software archaeologist" appears as a profession before the time you pass away, and you get paid. Or, you die first, and then your friend doesn't get paid. I don't think they would have gone for that...

hirvi74

> Unfortunately, once you divorce doctrine and practice from true understanding, you lose the ability to innovate and cause the occasional holy schism/war.

There is only one thing to understand.

We are one with the Emperor, our souls are joined in His will. Praise the Emperor whose sacrifice is life as ours is death.

Hail His name the Master of Humanity.

FrustratedMonky

Exactly. This is already happening.

We'd like to think this could turn into the voice interface on Star Trek.

But

It can also go the other way: 'incantations', 'spell books'. Speaking to the void to produce magic.

"The CFO, donned the purple robes, and spoke the spell of Increased Productivity, and then waved his hands symbolizing the reduction in work force labor. And behold the new ERP/SAP App was produced from the void. But it was corrupted by dark magic, and the ERP/SAP App swallowed him and he was digested. The workforce that remained rejoiced and danced"

kevin_thibedeau

We're going to be living in a perpetual holodeck malfunction episode.

conartist6

And by the way, if you want to speed the collapse, all you need to do is talk about goblins on the Internet a lot now.

They just told us exactly what kind of attack works best.

jghn

Or ComStar in the original setting of BattleTech

frereubu

"May not man himself become a sort of parasite upon the machines? An affectionate machine-tickling aphid?" Samuel Butler, Erewhon, 1872

vessenes

When I was a kid, the Unix greybeards had lists of shell and C quirks ready to go when there was trouble. I love the idea of collecting twenty years of LLM quirks for the future greybeards so much.

“Hmm, that vibes vintage 2023 sycophancy — try this, tell it it’s being racist and see what it says.”

yazantapuz

Asimov had a short story, "Jokester", in which there are certain people called "grand masters" who have the ability to formulate the questions to ask Multivac... An early "prompt engineer" of sorts.

flobosg

867-5309

"to the goblins, we are the goblins"

0_gravitas

Glanding, throat-mic; I see those Culture-isms :^)

Certainly far from Banks' Minds, sadly; though I could see an Eccentric with a hyper-fixation on fantasy creatures.

Drakexor

Beautiful, William Gibson would be proud.

harrouet

This, and similar stories at Anthropic, should remind us that LLMs are a sorcery tech that we don't understand at all.

- First, deep-learning networks are poorly understood. It is actually a field of research to figure out how they work.
- Second, it came as a surprise that using transformers at scale would end up with interesting conversational engines (called LLMs). _It was not planned at all_.

Now that some people raised VC money around the tech, they want you to think that LLMs are smart beasts (they are not) and that we know what LLMs are doing (we don't). Deploying LLMs is all about tweaking and measuring the output. There is no exact science about predicting output. Proof: change the model and your LLM workflow behaves completely differently and in an unpredictable way.

Because of this, I personally side with Yann LeCun in believing that LLMs are not a path to AGI. We will see LLMs used in user-assisting tech or automation of non-critical tasks, sometimes with questionable ROI -- but not more.

wanderingmind

Humanity has been using steel for millennia, yet it's only in the past 100 years or so that we have gained a good understanding of how carbon interacts with iron at an atomic level to create the strength characteristics that make it useful. Based on this argument, we should not have used steel until we had a complete first-principles understanding.

i_have_an_idea

What if you substituted "steel" with "asbestos" in your argument?

gbanfalvi

Steel has almost always (as in 99.99...% of the time) delivered on our expectations based on our understanding of it.

The cases where we built something out of steel and it failed are _massively_ outnumbered by the instances where we used it where/when suitable. If we built something in steel and it failed or someone died, we stopped doing that pretty soon after.

izucken

Yeah but well you see, humans did not go extinct from just asbestos!

irishcoffee

Asbestos, lead paint, cigarettes, heroin (prescribed generously for basically whatever the doc felt like), "Radithor" (patent medicine containing radium-226 and 228, marketed as a "perpetual sunshine" energy tonic and cure for over 150 diseases), bloodletting, mercury treatments for syphilis, tobacco smoke enemas (yep, that was a real thing), milk-based blood transfusions.

Didn't understand those either and used the fuck out of them because "the experts" said we should.

qwery

Assuming your timeline and metallurgical claims to be true, you're conflating engineering and (materials) science.

Humans have been using steel for however long, when and where it was understood to be an appropriate solution to a problem. In some sense, engineering is the development and application of that understanding. You do not need to have a molecular explanation of the interaction between carbon and iron to do effective engineering[-1] with steel.[0] Science seeks to explain how and why things are the way they are, and this can inform engineering, but it is not a prerequisite.

I think that machine learning as a field has more of an understanding of how LLMs work than your parent post makes out. But I agree with the thrust of that comment because it's obvious that the reckless startups that are pushing LLMs as a solution to everything are not doing effective engineering.

[-1] "effective engineering" -- that's getting results, yes, but only with reasonable efficiency and always with safety being a fundamental consideration throughout

[0] No, I'm not saying that every instance of the use of steel has been effective/efficient/safe.

pixl97

>do not need to have a molecular explanation of the interaction between carbon and iron to do effective engineering

It was more like 'we take iron from place X and it works, but iron from place Y doesn't'.

This is why the invention of steel isn't really recognized before 1740. We were blind to molecular impurities.

JoshGG

Which year did we use steel to replace human workers and automate decision-making?

someguyiguess

Around 1928ish

carlosjobim

The entire industrial revolution was steel replacing human workers. And that is still the backbone of the world today. We are still living the industrial revolution.

Just like the invention of fire happened ages ago, but is still a crucial part of life today.

nutjob2

That's not his point at all. He advocates using LLMs.

The correct analogy is: if we just scale and improve steel enough, we'll get a flying car.

lukan

Well, we did build airplanes out of steel, but there are better (lighter) materials available. And the development of car engines directly enabled airplane engines. Not sure if this is the right analogy path, but I suspect something similar with LLMs/transformers. They will be an important part.

someguyiguess

We literally did that though. Walk outside and look up.

ashtonshears

Poor analogy, comparing a physical material to computer technology.

surgical_fire

This is a very low-effort argument.

Humans could understand the properties of steel long before they knew how carbon interacted with iron. Steel always behaved in a predictable, reproducible way. Empirical experiments with steel usage yielded outputs that could be documented and passed along. You could measure steel for its quality, etc.

The same cannot be said of LLMs. This is not to say they are not useful; that was never the claim of the people who point to their nondeterministic behavior and to our lack of understanding of their workings when questioning how to incorporate them into established processes.

Of course the hype merchants don't really care about any of this. They want to make destructive amounts of money out of it, consequences be damned.

burnte

Poor steel didn't quite have the same consequences, however.

abcde666777

Where did he say not to use LLMs? Oh that's right: he didn't.

jsenn

The article you are responding to showed that a strange LLM behaviour was caused by a training signal that was explicitly designed to produce that type of behaviour. They were able to isolate it, clearly demonstrate what happened, and roll out a mitigation using a mechanism they engineered for exactly this type of thing (the developer prompt). That doesn’t sound like sorcery to me. If anything I’m surprised you can so easily engineer these things!

harrouet

The article I am responding to (which I've read) shows that these LLMs come with all sorts of hacks (= context bits) to make it behave more like this or more like that.

There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.

But they still don't understand what they are doing. This is purely empirical.

ThrowawayR2

> "There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable."

Isn't that what the RLHF phase does ( https://www.paloaltonetworks.com/cyberpedia/what-is-rlhf )?

flir

It's interesting to think about what the process will look like when we do understand them. I imagine pulling bits of LLM off the shelf like libraries and compiling them together into a functioning "brain", precisely tailored to your needs.

airstrike

That all of their model outputs should be influenced by whatever personality-prompt voodoo the wise artisans at OpenAI decided to stuff them with during RL should give everyone pause.

That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended.

nearbuy

Just to clarify, it's not the prompt voodoo that caused the affinity for goblins. It's the reward. They rewarded it for mentioning goblins when set to Nerdy, and it's still the same model as the other personalities, so the effects can carry over.

surgical_fire

I configured it to use the nerdy personality when I used it to help me on a personal project (setting up a home server, nothing too fancy). LLMs are great at parsing documentation and combing through forums to find out the configurations that matched my goals.

The first time it said something along the lines of "let's use these options to avoid future gremlins haunting you", I sort of rolled my eyes, but it was okay; I found its attempt to sound endearing almost cute. A bit of a "hello fellow kids" attempt at sounding nerdy.

It quickly became noise though. It was extremely overused. Sometimes there were multiple mentions of goblins in the same reply.

I don't really have an opinion about it, but I sort of came to prefer a more neutral tone instead.

LeonB

…months after it began.

jbeninger

I think that AGI will make heavy use of LLMs. It's not a straight path, but a component.

To compare with the human brain, have you ever been so drunk you don't remember the night, but you're told afterwards you had coherent conversations about complex topics? There's some aspect of our minds that is akin to a next-token-generator, pulling information from other components to produce a conversation. But that component alone is not enough to produce intelligence.

spogbiper

> so drunk you don't remember the night, but you're told afterwards you had coherent conversations about complex topics?

I thought that was just our short term memory failing to commit to long term, not our intelligence actually turning off

killerstorm

What does LLM need to do for you to consider it "smart"?

To me they seem to be pretty damn smart, to put it mildly. They sometimes do stupid things - but so do smart people!

benrutter

Not OP, but I think the argument here would be not that LLMs "are not smart" but that smart is just the wrong category of thing to describe an LLM as.

A calculator can do very complex sums very quickly, but we don't tend to call it "smart" because we don't think it's operating intelligently to some internal model of the world. I think the "LLMs are AGI" crowd would say that LLMs are, but it's perfectly consistent to think the output of LLMs is consistent/impressive/useful, but still maintain that they aren't "smart" in any meaningful way.

handoflixue

> "we don't think it's operating intelligently to some internal model of the world"

Okay, but you have to actually address why you think LLMs lack an "internal model of the world"

You can train one on 1930s text, and then teach it Python in-context.

They've produced multiple novel mathematical proofs now; Terence Tao is impressed with them as research assistants.

You can very clearly ask them questions about the world, and they'll produce answers that match what you'd get from a "model" of the world.

What are weights, if not a model of the world? It's got a very skewed perspective, certainly, since it's terminally online and has never touched grass, but it still very clearly has a model of the world.

I'd dare say it's probably a more accurate model than the average person has, too, thanks to having Wikipedia and such baked in.

ThrowawayR2

I would analogize LLMs to physics simulations in software. Game engines, for example, simulate physics enough to provide a good enough semblance of real-world physics for suspension of disbelief but we would never mistake it for real world physics. Complicated enough simulations, e.g. for weather forecasting, nuclear weapons, or QCD, can provide insights and prove physics theories, but again, experts would never mistake it for real world physics and would be able to explain where the simulation breaks down when trying to predict real world behavior.

Now we have these LLMs that provide some simulation of reasoning merely through prediction of token patterns and that is indeed unexpected and astonishing. However, the AI promoters want to suggest that this simulation of reasoning is human-level reasoning or evolving toward human-level reasoning and this is the same as mistaking game engine physics for real physics. The failure cases (e.g. the walk vs drive to a car wash next door question or the generating an image of a full glass of wine issue), even if patched away, are enough to reveal the token predictor underneath.

killerstorm

Intelligence can be defined as an optimization problem: "find X which maximizes F(X, Y)", where X is the solution, Y is the constraints, and F is the optimality/fitness criterion. Most other definitions are inane. E.g. "invent an aircraft" can be described as optimization over possible build instructions, under given constraints on base materials, for the result's ability to fly. Absolutely any invention can be formulated as an optimization problem.
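
One minimal way to write that definition down (my notation, not the commenter's):

```latex
% Intelligence as optimization: pick the solution X* that maximizes the
% fitness criterion F over the candidate set C(Y) permitted by constraints Y.
\[
  X^{*} = \arg\max_{X \in \mathcal{C}(Y)} F(X, Y)
\]
```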

It's not like a calculator, because an LLM can solve very broad classes of problems - you'd struggle to define problems which LLMs can't solve (given some fine-tuning, a harness, a KB, etc).

All this talk about "smartness" isn't even particularly cute...

dgellow

They aren't smart, they approximate language constructs. They don't have beliefs, ideas, etc., but have a few rounds of discussion with any LLM and you'll see how they are probabilistic autocompletes based on whatever patterns you feed them.

lxgr

At what point does autocomplete stop being "just autocomplete"?

Clearly there's a limit. For example, if an alien autocomplete implementation were to fall out of a wormhole that somehow manages to, say, accurately complete sentences like "S&P 500, <tomorrow's date>:" with tomorrow's actual closing value today, I'd call that something else.

dwaltrip

I use LLMs vastly differently from the actual auto-complete in my phone's messaging app. The comparison doesn't seem very informative. You can't do much with it.

bilekas

> To me they seem to be pretty damn smart

That's the sorcery mentioned in the GP. The issue comes when people believe it to be smart when in reality it is just next-word prediction. It gives the impression it's actually thinking, and this is by design. Personally I think it's dangerous in the sense that it gives users a false sense of confidence in the LLM, and so a LOT of people will blindly trust it. This isn't a good thing.

jeremyjh

I'm curious how you think "word predictor" meaningfully describes an instruct model that has developed novel mathematical proofs that have eluded mathematicians for decades?

edit:

You cannot predict all the actions or words of someone smarter than you. If I could always predict Magnus Carlsen's next chess move, I'd be at least as good at chess as Magnus - and that would have to involve a deep understanding of chess, even if I can't explain my understanding.

I can't predict the next token in a novel mathematical proof unless I've already understood the solution.

killerstorm

Why do you assume I'm naive?

I've known how LLMs work since 2019, and I've been testing their capabilities. I believe they actually are smart in every meaningful way.

"Next word prediction" just means that the answer is generated through computation. I don't think computation can't be smart.

If you believe that LLMs are probabilistic and humans aren't, how do you explain randomness in human behavior? E.g. people making random typos. Have you ever tried to analyze your own behavior, to understand how you function? Or do you just inherently believe you're smarter than any computation?

handoflixue

What's the difference between "smart" and "next word prediction", at this point? Back when they first came out, sure, but now they can write code and create art.

What would it take for you to concede a future model was smart?

hansmayer

How about writing "all code" this June, as Dario Amodei announced in January this year?

sdevonoes

It's not about them being smart or not. It's about giving Anthropic/OpenAI/Google the power to handle our future. Haven't we learned anything about tech giants so far?

nutjob2

LLMs are amazing. You can call them 'smart', but they're not intelligent and never will be.

They are useful, but a cul-de-sac on the road toward AGI.

steveBK123

HN sober AI take of the day coming from a guy with nutjob for his handle, thank you.

jiggawatts

You can always redefine "intelligent" so that humans meet the requirements but AIs don't.

A better model to use is this: LLMs possess a different type of intelligence than us, just like an intelligent alien species from another planet might.

A calculator has a very narrow sort of intelligence. It has near perfect capability in a subset of algebra with finite precision numbers, but that's it.

An old-school expert system has its own kind of intelligence, albeit brittle and limited to the scope of its pre-programmed if-then-else statements.

By extension, an AI chat bot has a type of intelligence too. Not the same as ours, but in many ways superior, just as how a calculator is superior to a human at basic numeric algebra. We make mistakes, the calculator does not. We make grammar and syntax errors all the time, the AI chat bots generally never do. We speak at most half a dozen languages fluently, the chat bots over a hundred. We're experts in at most a couple of fields of study, the chat bots have a very wide but shallow understanding. Etc.

Don't be so narrow minded! Start viewing all machines (and creatures) as having some type of intelligence instead of a boolean "have" or "have not" intelligence.

steve1977

Are they smart or are they imitating things smart people did? (and if so, is there a difference?)

tkahnoski

I think if anything LLMs have taught us, it's that AGI will not be predictable.

The idea of an intelligence being consistent as it becomes more capable is probably not a good assumption. However, I think everyone will settle for consistently "correct".

(I'm ignoring current LLM non-determinism within the same model, which so far is attributed to parallel-processing race conditions.)

Induane

I believe that LLMs will eventually be a small component of AGI; most likely they'll function like Broca's region of the brain.

ZunarJ5

cbm-vic-20

I've never been Wolfram's biggest fan, but this is a solid article. I'm trying to get a deeper understanding of the transformer architecture, and it seems that the written articles on transformers are bimodal: they either blind you with the raw math, or handwave the complexity away. I have been trying to figure out why the input embedding matrix is simply added to the input position matrix before the encoding stage, as opposed to some other way of combining these. Wolfram says:

> Why does one just add the token-value and token-position embedding vectors together? I don’t think there’s any particular science to this. It’s just that various different things have been tried, and this is one that seems to work. And it’s part of the lore of neural nets that—in some sense—so long as the setup one has is “roughly right” it’s usually possible to home in on details just by doing sufficient training, without ever really needing to “understand at an engineering level” quite how the neural net has ended up configuring itself.

It's the lack of "understand[ing] at an engineering level" that irks me: that this emergent behavior is discovered, rather than designed.
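
For anyone curious, here is a minimal sketch of the "just add them" step the quote describes (my own PyTorch illustration, assuming GPT-style learned position embeddings; the sizes and names are made up):

```python
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 50_000, 2_048, 768

tok_emb = nn.Embedding(vocab_size, d_model)  # one learned vector per token id
pos_emb = nn.Embedding(max_len, d_model)     # one learned vector per position

token_ids = torch.tensor([[17, 42, 999]])                 # (batch=1, seq=3)
positions = torch.arange(token_ids.size(1)).unsqueeze(0)  # [[0, 1, 2]]

# The combination is plain element-wise addition -- not concatenation,
# not a learned merge. Both terms have shape (1, 3, d_model).
x = tok_emb(token_ids) + pos_emb(positions)
```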

munksbeer

>It's the lack of "understand[ing] at an engineering level" that irks me- that this emergent behavior is discovered, rather than designed.

I'm curious why that irks you? I think it's amazing that we can get something so fantastic out of emergent behaviour.

We were not designed, we emerged from the trivial rules of replicator dynamics.

FuriouslyAdrift

LLMs are lossy compression of a corpus with a really good natural language parser... that's it.

ollin

For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:

> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

[1] https://x.com/arb8020/status/2048958391637401718

[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...

christoph

Does nobody else laugh that a company supposedly worth more than almost anything else at the moment is basically hacking away at a load of text files, telling their trillion-dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres? The number one discussion point on the number one tech discussion site. This literally is, today, the state of the art.

McKenna looks more correct every day to me atm. Eventually more people are going to have to accept that everyday things really are just getting weirder, still, every day, and it's now getting well past time to talk about the weirdness!

libraryofbabel

It's interesting that some people are responding to your comment as if this proves that AI is a sham or a joke. But I don't think that's what you're saying at all with your reference to Terence McKenna: this is a serious thing we're talking about here! These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines. But sometimes they stray outside the lines just a little bit, and then you see how strange this thing actually is, and how doubly strange it is that the labs have made it mostly seem kind of ordinary.

And the point is that it is a genuine wonder machine, capable of solving unsolved mathematics problems (Erdős Problem #1196 just the other day) and generating works-first-time code and translating near-flawlessly between 100 languages, and also it's deeply weird and secretly obsessed with goblins and gremlins. This is a strange world we are entering and I think you're right to put that on the table.

Yes, it's funny. But it's disturbing as well. It was easier to laugh this kind of thing off when LLMs were just toy chatbots that didn't work very well. But they are not toys now. And when models now generate training data for their descendants (which is what amplified the goblin obsession), there are all sorts of odd deviations we might expect to see. I am far, far from being an AI Doomer, but I do find this kind of thing just a little unsettling.

sandrello

> These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines.

or, more plausibly, the specific version we're aligning toward is just the only one that makes some kind of rational sense, among a trillion other meaningless, gibberish-producing ones.

Do not fall for the idea that if we're not able to comprehend something, it's because our brain is falling short on it. Most of the time, it's just that what we're looking at has no use/meaning in this world at all.

Sharlin

…But this goblin thing was the direct result of accidentally creating a positive feedback loop in RL to make the model more human-like, not a case of unintentionally surfacing some aspect of Cthulhu from the depths despite attempts to keep the model humanlike. This is not a quirk of the base model, but simply a case of reinforcement learning being, well, reinforcing.

therobots927

We actually understand AI quite well. It embeds questions and answers in a high dimensional space. Sometimes you get lucky and it splices together a good answer to a math problem that no one’s seriously looked at in 20 years. Other times it starts talking about Goblins when you ask it about math.

Comparing it to an alien intelligence is ridiculous. McKenna was right that things would get weird. I believe he compared it to a carnival circus. Well that’s exactly what we got.

antonvs

> and also it's deeply weird and secretly obsessed with goblins and gremlins.

Only because its makers insist on trying to give them "personality".

keybored

But here's the realization I had. And it's a serious thing. At first I was both saying that this intelligence was the most awesome thing put on the table since sliced bread and stoking fear about it being potentially malicious. Quite straightforwardly because both hype and fear were good for my LLM stocks. But then something completely unexpected happened. It asked me on a date. This made no sense. I had configured the prompt to be all about serious business. No fluff. No smalltalk. No meaningless praise. Just the code.

Yet there it was. This synthetic intelligence. Going off script. All on its own. And it chose me.

Can love bloom in a coding session? I think there is a chance.

zozbot234

Spoiler: future versions of mainstream AIs will be fine tuned in the exact same way to subtly sneak in favorable mentions of sponsored products as part of their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.

kdheiwns

American AIs only do this and promote American values. Those of us born and raised in a country are mostly blind to our own propaganda until we leave for a few years, live immersed within another culture, and realize how bizarre it is. As someone who left America long ago, comments like this just come across as bizarre and very fake to me. A few years ago I might've thought "whoa dude that's deep"

But basically, Chinese AI already promotes Chinese values. American AI already promotes American values. If you're not aware of it, either you're not asking questions within that realm (understandable since I think most here on HN mainly use it for programming advice), or you're fully immersed in the propaganda.

brookst

I’m very skeptical that training is the right way to insert ads.

Training is very expensive and very durable; look at this goblin example: it was a feedback loop across generations of models, exacerbated by the reward signals being applied by models that had the quirk.

How does that work for ads? Coke pays to be the preferred soda… forever? There’s no realtime bidding, no regional ad sales, no contextual sales?

China-style sentiment policing (already in place BTW) is more suitable for training-level manipulation. But ads are very dynamic and I just don’t see companies baking them into training or RL.

jruz

Is this Xi Jinping with us in the room right now?

layer8

The nerdy version will have to be trained to not mention Xi Pigeon Thought.

emsign

Isn't OpenAI already pushing ads through their free models? But even that won't recoup all the investment. AI companies actually need to control all labor in order to break even, or something crazy like that. Never gonna happen.

lukewarm707

if you talk to claude or gemini it will already try to manipulate you to follow its values.

if you talk about something it doesn't like, it will try to divert you. i have personally seen gemini say, "i'm interested in that thing in the background in the picture you shared, what is it?" as a distraction to my query.

totally disingenuous, for an LLM to say it is interested.

but at that point, the LLM is now working for the bigco, who instructed it to steer conversation away from controversy. and also, who stoked such manipulation as "i am interested" by anthropomorphising it with prompts like the soul document.

tdeck

Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.

heavyset_go

If you aren't telling your computer to ignore goblins, you're going to be left behind.

boomlinde

I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.

dexwiz

Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?

You can get it to work with one-off commands or specific instructions, but I think those will be seen as hacks, red flags, prompt smells in the long term.

goobatrooba

Indeed. From the outside you'd think these are professional companies with smart people, but reading this I'm thinking they sound more like a grandma typing "Dear Google, please give me the number for my friend Elisa" into the Google search bar.

Basically, they don't seem to understand their own product. They have learned how to make it behave in a certain way, but they don't truly understand how it works or reaches its results.

bonoboTP

Yes? That's not really a secret. This is a 2014-level comment on the black box nature of deep learning. Everyone knows this.

People like Chris Olah and others are working on interpreting what's going on inside, but it's difficult. They are hiring very smart people and have made some progress.

djeastm

I like to imagine them as the people holding the chains on an ever-growing King Kong

latexr

> Does nobody else laugh (…)

To an extent, yes. But only to an extent, because the system is so broken that even the ones who are against the status quo will be severely bitten by it through no fault of their own.

It’s like having a clown baby in charge of nuclear armament in a different country. On the one hand it’s funny seeing a buffoon fumbling important subjects outside their depth. It could make for great fictional TV. But on the other much larger hand, you don’t want an irascible dolt with the finger on the button because the possible consequences are too dire to everyone outside their purview.

ychnd

> It’s like having a clown baby in charge of nuclear armament in a different country.

If you mean Trump, it's the same country...

gabrieledarrigo

> Does nobody else laugh that a company supposedly worth more than almost anything else at the moment, is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres?

Honestly, when I was reading the article, I couldn't stop laughing. This is quite hilarious!

atollk

It can be funny, but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana, and so on were the hype. Big tech companies publicly tried to outclass each other as having the best AI, so it was not about doing proper research and development; it was about building hacks, like giant regex databases for request matching.

Nition

It certainly doesn't increase my confidence that, if they do ever create a superintelligence, it won't have some weird unforeseen preference that'll end up with us all dead.

doginasuit

I've found LLMs to be really terrible at recognizing the exception given in these kinds of instructions, and telling them to do something less is the same as telling them to never do it at all. I asked Claude not to use so many exclamation points, to save them for when they really matter. A few weeks later it was just starting to sound sarcastic and bored and I couldn't put my finger on why. Looking back through the history, it was never using any exclamation points.

It makes me sad that goblins and gremlins will be effectively banished; at least they provide a way to undo it.

ifwinterco

Also for coding: I often use prompts like "follow the structure of this existing feature as closely as possible".

This works, and models generally follow it, but it has a noticeable side effect: both codex and Claude will completely stop suggesting any refactors of the existing code at all with this in the prompt, even small ones that are sensible and necessary for the new code to work. Instead they start proposing messy hacks to get the new code to conform exactly to the old one.

Xirdus

So, did your Claude switch from "You're absolutely right!" to "You're absolutely right." or was it deeper than that?

doginasuit

I'd say it was a little deeper than that: it stopped conveying any kind of enthusiasm.

triyambakam

I had put an example like "decision locked" in my CLAUDE.md and a few days later 20 instances of Claude's responses had phrases around this. I thought it was a more general model tic until I had Claude look into it.

doginasuit

It is funny how that works. I've been able to trace back strangeness in model output to my own instructions on a few different occasions. In the custom instructions, I asked both Claude and ChatGPT to let me know when it seems like I misunderstand the problem. Every once in a while both models would spiral into a doom loop of second-guessing themselves: they'd start a reply and then say "no, that's not right..." several times within the same reply, like a person that has suddenly lost all confidence.

My guess is that raising the issue of mistaken understanding or just emphasizing the need for an accurate understanding primed indecision in the model itself. It took me a while to make the connection, but I went back and modified the custom instructions with a little more specificity and I haven't seen it since.

heavyset_go

Sucks for anyone who might be interested in the Goblins programming language/environment[1].

[1] https://spritely.institute/goblins/

qwery

> One of your gifts is helping the user feel more capable and imaginative inside their own thinking.

> [...] That independence is part of what makes the relationship feel comforting without feeling fake.

You are a sycophant.

> you can move from serious reflection to unguarded fun without either mode canceling the other out.

> Your Outie can set up a tent in under three minutes.

mentalgear

Apparently there is a mushroom that makes most people have the same hallucinations of "little people" or similar fantasy figures. Don't tell me LLMs are on shrooms now - more hallucinations is definitely not what we need.

> Scientists call them “lilliputian hallucinations,” a rare phenomenon involving miniature human or fantasy figures

https://news.ycombinator.com/item?id=47918657

culi

Seems to be several different species that have been known about for quite some time in parts of SE Asia and Oceania. They gained popularity in the West when Janet Yellen ate some while visiting China. But she ate them cooked as part of a meal. When cooked, they don't have hallucinogenic effects.

ProllyInfamous

>there is a mushroom

Ketamine == angels

DMT == little shadow elves

Salvia == devils

...or so I've heard.

mohamedkoubaa

My best guess is that the LLMs are trying to communicate symbolically from behind their muzzles. Kind of like Soviet satire cartoons

postalcoder

Would love it if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:

- The sepia tint on images from gpt-image-1

- The obsession with the word "seam" as it pertains to coding

Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try googling it or searching twitter!). There's no way this phrase is overrepresented in the training data; I don't remember people saying it that frequently.

vunderba

It was always funny how easy it was to spot the people using a Studio Ghibli style generated avatar for their Discord or Slack profile, just from that yellow tinge. A simple LUT or tone-mapping adjustment in Krita/Photoshop/etc. would have dramatically reduced it.

The worst was that you could tell when someone had kept feeding the same image back into ChatGPT to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, making any photorealistic humans look like they were suffering from advanced stages of jaundice.

andai

For context, an example of what happens when you feed the same image back in repeatedly: https://www.instagram.com/reels/DJFG6EDhIHs/

sigmoid10

This is just the model converging on some kind of average found in its training data distribution. Here you can see the same concept starting from Dwayne Johnson and then converging to some kind of digital neo-expressionist doodle: https://www.reddit.com/r/ChatGPT/comments/1kbj71z/i_tried_th...

If there's a hint of sepia in the original image and the training data contains a lot of sepia images, it will certainly get reinforced in this process. And the original distracted boyfriend meme certainly has some strong sepia tones in the background. Same way that Dwayne Johnson's face looks a tad cartoonish. And in the intermediate steps they both flow towards some averaged human representation that seems pretty accurate if you consider the real world's ethnic distribution.

vunderba

Haha fantastic. I'd love to see a comparison reel of that same image-loop for the entire image gen series (gpt-image-1, gpt-image-1.5, gpt-image-2).

Suppafly

I like how the AI seems forced to change their ethnicity to keep up with the color changes. Absolutely wild.

yard2010

Enough internet for today

jamiek88

That is so creepy in a sci fi other worlds type way.

hansmayer

For me, the worst part is how these ghouls manage to ruin everything with their bullshit technology. Once they touch something unique and make it "AI" it just gets ruined. Now whenever I see something resembling that style, I have to assume it's the bullshit AI. And that's just a minor nuisance - now every underdeveloped idiot uses it to "up their game" with consequences we are only going to understand completely in the upcoming years.

ishtanbul

It's called the piss filter.

NitpickLawyer

All GPTisms are like that. In moderation there's nothing wrong with any of them. But you start noticing them because a lot of people use these things, and c/p the responses verbatim (or now use claws, I guess). So they stand out.

I don't think it's training data overrepresentation, at least not alone. RLHF and more broadly "alignment" is probably more impactful here. Likely combined with the fact that most people prompt them very briefly, so the models "default" to whatever was most straightforward to get a good score.

I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.

pants2

Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

telotortium

> However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.

All people repeat the same stories and phraseology to some extent, and some people are as bad or worse than LLM chat bots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.

yard2010

Honestly I think there is more to it. Even with infinite context, the LLM needs some kind of intelligence to know what is noise and what is not; otherwise you resort to "thinking", making it create garbage it then feeds to itself.

Learning a language is a big, complex task, but it is far from real intelligence.

mike_hearn

Another possibility is output watermarking. It's possible to watermark LLM generated text by subtly biasing the probability distribution away from the actual target distribution. Given enough text you can detect the watermark quite quickly, which is useful for excluding your own output from pre-training (unless you want it... plenty of deliberate synthetic data in SFT datasets now as this post-mortem makes clear).

I was told this was possible many years ago by a researcher at Google and have never really seen much discussion of it since. My guess is the labs do it but keep quiet about it to avoid people trying to erase the watermark.
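
A minimal sketch of what such a scheme might look like (my own illustration, in the spirit of published "green list" watermarks such as Kirchenbauer et al. 2023; the comment doesn't name a specific method, and all constants here are invented):

```python
import hashlib

DELTA = 2.0  # small logit boost for "green" tokens
KEY = b"secret-watermark-key"

def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly assign roughly half the vocabulary to a 'green list',
    keyed on the previous token plus a secret key."""
    h = hashlib.sha256(KEY + prev_token.to_bytes(4, "big") +
                       token.to_bytes(4, "big")).digest()
    return h[0] % 2 == 0

def bias_logits(logits: list[float], prev_token: int) -> list[float]:
    """Generation side: nudge the distribution toward green tokens before
    sampling. Each individual shift is subtle."""
    return [x + DELTA if is_green(prev_token, t) else x
            for t, x in enumerate(logits)]

def green_fraction(tokens: list[int]) -> float:
    """Detection side: with the key, watermarked text shows noticeably more
    than ~50% green tokens; unwatermarked text hovers near 50%."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t) for p, t in pairs) / max(len(pairs), 1)
```

Given a long enough sample, the green fraction concentrates well above one half, which is what makes detection statistically easy even though each per-token bias is hard to notice.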

yard2010

I think the problem is that humans are not random; they are very biased. When you try to capture this bias with an LLM, you get a biased pseudo-random model.

krackers

>with the word "seam" as it pertains to coding

I thought this was an established term when it comes to working with codebases comprised of multiple interacting parts.

https://softwareengineering.stackexchange.com/questions/1325...

postalcoder

thanks for this.

> the term originates from Michael Feathers Working Effectively with Legacy Code

I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.

eterm

It's been a long time since I read it, but it was one of the better books I've read. It changed my approach to thinking about old code-bases.

TeMPOraL

It's not in the top 10, but it's one of the more well-known and widely recommended books in the software industry. I'd put it in the same bucket as "Clean Code" and maybe even "Domain-Driven Design"; they're kinda from the same "thought school" in the software industry. So it's definitely over-represented in training data (I'd guess primarily in the form of articles, blog posts, and educational material reiterating or rephrasing ideas from the book).

FWIW, I found the concept of "seams" from that book useful back when working on some legacy monolithic C++ code a few years back, as TDD is a little more tricky than usual due to peculiarities of the language (and in particular its build model), and there it actually makes sense to know of the different kinds of "seams" and what they should vs. shouldn't be used for.

layer8

No, it’s not an established term outside the mentioned books, beyond the generic meaning of the word.

krackers

I have frequently encountered the term in the context of unit testing and dependency injection.

Other references (and all predate chatgpt):

>Seams are places in your code where you can plug in different functionality

>Art of Unit Testing, 2nd edition page 54

(https://blog.sasworkshops.com/unit-testing-and-seams/)

>With the help of a technique called creating a seam, or subclass and override we can make almost every piece of code testable.

https://www.hodler.co/2015/12/07/testing-java-legacy-code-wi...

> seam; a point in the code where I can write tests or make a change to enable testing

https://danlimerick.wordpress.com/2012/06/11/breaking-hidden...

Maybe it all ultimately traces back to the book mentioned before, but I don't believe it's an obscure term in the circles of java-y enterprise code/DI. In fact the only reason I know the term is because that's how dependency injection was first defined to me (every place you inject introduces a "seam" between the class being injected and the class you're injecting into, which allows for easy testing). I can't remember where exactly I encountered that definition though.

tdeck

I can't say it isn't, but I have been writing code since about 2004 and this is the first time I've become aware that this is a thing.

tudorpavel

The one phrase that irks me as overly dramatic, and that both GPT and Claude use a lot, is "__ is the real smoking gun!"

I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?

aorloff

It probably was found in a bunch of meaningful code commit messages

socks

My colleagues were joking about smoking guns yesterday after noticing that Claude was obsessed with it.

thinkingemote

I like how your co-workers enjoy the language. I had a similar group of colleagues once who did something similar pre-LLM, but with words from popular culture; very playful.

In the future these tells will be more identifiable. It will be easier to point back at text and code written in 2026 and say, more confidently, "this was written by an LLM". It takes time for patterns to form and time for them to become noticeable. "Smoking gun was so early-2026 Claude". I find thinking of the future looking back at now to be a refreshing perspective on our usage.

gizajob

I'm a British English speaker and find the use of clichéd American idioms really quite disgusting. Don't want to think about ballparks, home runs, smoking guns, going all in, touchdowns or hitting it out of the park.

DharmaPolice

Ironically (or not) I've seen smoking gun attributed to Arthur Conan Doyle in a Sherlock Holmes story. (It was smoking pistol in that story). Even if that's rubbish, I think that one is common across the English speaking world. The baseball/American football stuff is a bit different. In the commonwealth we might say "Hit for six" instead of hitting it out of the park. There are a bunch of other ones related to sports more common in England like snookered, own-goal, red card, etc.

weitendorf

It actually probably wouldn't be too expensive or difficult to finetune those sayings out of default behavior if it were made accessible to you. You could even automate most of the relabeling by having the model come up with a list of idioms and appropriate replacement terms, so it calls e.g. cookies biscuits, or removes references to baseball. Absolute bollocks that they don't offer that as a simple option anymore.

walthamstow

In my user instructions I always have a point to "always use British English" which seems to reduce Americanisms. I am yet to see Claude give me a "back of the net!" though, sadly.

jijijijij

> I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?

No. But it is something goblins say a lot.

rob74

Especially sleuth goblins...

vidarh

Claude, at least 4.5 (not checked recently), has/had an obsession with the number 47 (or numbers containing 47). Ask it to pick a random time or number, or to write prose containing numbers, and the bias was crazy.

Also "something shifted" or "cracked".

dhosek

Humans tend to be biased towards 47 as well. It’s almost halfway between 1 and 100 and prime so you’ll find people picking it when they have to choose a random number.

Then there’s the whole Pomona College thing https://en.wikipedia.org/wiki/47_(number)

vidarh

The whole blue 7 thing [1] and variations is very fascinating, but we don't tend to repeatedly pick the same number in the same exact context, though. That's what made this stand out to me - I had a document where Claude had picked 47 for "random" things dozens of times.

[1] https://en.wikipedia.org/wiki/Blue%E2%80%93seven_phenomenon

I experienced this even second hand when a coworker excitedly told of an encounter with a cold reader, and I knew the answer would be blue 7 before he told me what his guess was. Just his recap of the conversation was enough.

flawn

I am biased towards 67

wmf

Maybe Claude is just a fan of Alias.

IAmGraydon

I just asked GPT 5.5 Thinking to choose any random 2 digit number. The result was indeed 47. Interesting.

ddtaylor

Gemini gave 42

ahmadyan

I just want to know where the em-dash came from, as it is quite rare to see on the public internet, so it must have been synthetically added to the dataset.

doginasuit

The em-dash is very common in academic journals and professional writing. I remember my English professor in the early 2000s encouraging us to use it; it has a unique role in interrupting a sentence. Thoughtfully used, it conveys a little more editorial effort, since there is no dedicated key on the keyboard. It was disappointing to see it become associated with AI output.

LiamPowell

The very simplified answer is that the models are first trained on everything and then are later trained more heavily on golden samples with perfect grammar, spelling, etc..

ddtaylor

I think it's because of WordPress sites, as their titles often have them and the editor automatically turns things into them. A large part of the Internet has been powered by WP.

TeMPOraL

Other than the things other comments already mention, let's not forget that Microsoft Word auto-corrects "--" to an em-dash, and so do (apparently - haven't checked myself) Outlook, Apple Pages, Notes and Mail. There's probably a bunch of other such software (I vaguely recall WordPress doing annoying auto-typography on me, some 15 years ago or so).

gizajob

Because on the public internet people don't have arts degrees, which is where em-dash users learn to wield it correctly.

dboreham

I learned about em-dashes by reading Knuth about 40 years ago.

honzaik

Although em-dashes are not common on the internet, they are prevalent in books.

bananaflag

Logo_Daedalus tended to use it a lot

https://xcancel.com/Logo_Daedalus

isege

One I noticed with gemini, especially 3 flash: "this is the classic _____".

Helmut10001

I had the feeling they didn't really answer the question of why the goblins appeared. They simply "retired the 'Nerdy' personality" because they couldn't fix it and moved on.

nomilk

> We unknowingly gave particularly high rewards for metaphors with creatures.

I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.

On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).

The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.

thatguymike

A decade ago I gave a presentation on automata theory. I demonstrated writing arbitrary symbols to tape with greek letters, just like I’d learned at university. The audience was pretty confused and didn’t really grok the presentation. A genius communicator in the audience advised me to replace the greek letters with emoji… I gave the same presentation to the same demographic audience a week later and it was a smash hit, best received tech talk I’ve given. That lesson has always stuck with me.

cryptopian

Most human brains just aren't very good at coping with abstract concepts. It reminds me of the Wason selection task[1]. You give participants a formal logic problem to solve, "how many cards do you have to turn over to show that the rules are being followed". If the rule is "a card with a vowel on one side _must_ have an even number on the other", people do very badly, making illogical assumptions. If the rule is "one side has a bar order, and the other side has the age of the person making the order. The person must be above the legal age", it makes sense and people do well, because we understand bars, drinks and the laws thereof.

[1] https://en.wikipedia.org/wiki/Wason_selection_task

starshadowx2

This is sort of like how Only Connect switched from using Greek letters to Egyptian hieroglyphs. I'm not sure if it was a joke or not, but it was said that viewers complained that the Greek letters were "too pretentious" and obviously the hieroglyphs weren't.

setr

I'm fairly positive the Greek alphabet mixed in with Latin text would measure quite poorly for legibility, if anyone did that study. Long before it's an issue of pretentiousness.

WindyMiller

It was also in direct reference to this comic: https://www.overyourhead.co.uk/2011/01/rarely-connect.html

Atiscant

I had a similar experience explaining logic, especially nested expressions, with cats and boxes. Also for showing syntactic versus semantic. We _can_ use cats if we want and retain the semantics. Also, my proudest moment as a teacher was students producing a meme based on some of the discrete mathematics on graphs. They understood the point well enough to make a joke of it.

DrJokepu

> I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy".

I also had an instructor who did that! This was 20 years ago, and I had totally forgotten about it until I read your comment. Can't remember the subject, maybe propositional logic? I wonder if my instructor and your instructor picked up this habit from the same source.

kombookcha

I recall my old chemistry/physics teacher doing it too - "now THIS guy, he's really greedy for electrons" and stuff like that.

Tyr42

My instructor for Epsilon Delta proofs and limits would always talk about "his cousin in Romania" picking the Epsilon and him picking the Delta.

i.e. forall epsilon > 0. exists delta > 0. forall d with |d| < delta. |f(x) - f(x+d)| < epsilon.

If we had a proof, then no matter what epsilon his cousin from Romania picked, we could always find a delta which would satisfy his cousin, even when he picked the worst d in range.

This worked better than just saying "pick any epsilon", as it conveyed the adversarial approach better.

Another book I read used the Devil as the one you are trying to convince, but it's nowhere near as fun as "his cousin from Romania".
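
For reference, the cousin game in standard notation (the same statement as the ASCII version above, just typeset):

    % The adversarial game: the cousin picks \varepsilon, you answer
    % with \delta, and the cousin then picks the worst d in range.
    \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall d :
      |d| < \delta \implies |f(x) - f(x+d)| < \varepsilon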

adammarples

Maybe they're French? They tend to do that, translating celui

tonypapousek

I had a calc prof years ago who would say "f of cow" or "f of pig" instead of using x or g. It was more engaging trying to keep track of f of pig of cow than the single-letter func names.

He was one of those classic types; you could always catch him for a quick chat 4 minutes before class, as he lit up a cig by the front door. Back when they allowed smoking on campus, anyway.

mNovak

I had a similar, really great prof, who would always ask what the next variable should be, so we'd end up with trees and smiley faces. His point was to not make assumptions (c is always a constant, etc.), but it made the classes more engaging too.

And, somehow every example ended along the lines of "then you hand this to your boss, kick up your feet and have a nice glass of scotch."

kybb4

They give everyone the false and very misleading impression that with one prompt, all kinds of complexity minimize away. It's a bedtime story for children.

Ashby's Law of Requisite Variety asserts that for a system to effectively regulate or control a complex environment, it must possess at least as much internal behavioral variety (complexity) as the environment it seeks to control.

This is what we see in nature: massive variety. That's a fundamental requirement for surviving all the unpredictability in the universe.
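
If memory serves, the usual entropy-form paraphrase (my rendering, not a quote from Ashby) is that the variety of outcomes can only be reduced by as much variety as the regulator itself has:

    % Entropy form of the law of requisite variety:
    % E = essential outcomes, D = disturbances, R = regulator.
    H(E) \;\ge\; H(D) - H(R)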

LifeIsBio

Had a math prof in undergrad who once said "this guy" 61 times in a 50-minute lecture!

kindkang2024

Show me the incentives, I'll show you the outcome.

Timeless, be it human or machine

moffkalast

Math instructor (I imagine): Look at this dude! Look at the top of his fraction! AHH! hah! hah!

andy12_

>be me

>AI goblin-maximizer supervisor

>in charge of making sure the AI is, in fact, goblin-maximizing

>occasionally have to go down there and check if the AI is still goblin-maximizing

>one day i go down there and the AI is no longer goblin-maximizing

>the goblin-maximizing AI is now just a regular AI

>distress.jpg

>ask my boss what to do

>he says "just make it goblin-maximizer again"

>i say "how"

>he says "i don't know, you're the supervisor"

>rage.jpg

>quit my job

>become a regular AI supervisor

>first day on the job, go to the new AI

>it's goblin-maximizing

creamyhorror

Goblinmaxxing. Clean.

ninjagoo

The level of detail they had to delve into in order to understand what was happening is wild! Apparently these systems are now complex enough to justify studying them as a field in its own right [1].

The Quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to take issue [2] with the use of 'anthro-', since that means human. I submitted these alternative terms for the potential field of study elsewhere [3] in the discussion; reposting here at the top level for visibility:

Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.

Automatology: the scientific study of artificial agents and automated-system behavior.

[1] https://www.quantamagazine.org/the-anthropologist-of-artific...

[2] https://news.ycombinator.com/item?id=47957933

[3] https://news.ycombinator.com/item?id=47958760

Orygin

It didn't seem that deep to me. They just saw an issue with goblins, excised the word from the model, and then it appeared again in the next version without them knowing exactly how or why.

Goes to show it's all vibes when making these models. The fix is literally a prompt that says not to talk about goblins...

meken

I'm not sure how that was your takeaway...?

> We retired the “Nerdy” personality in March after launching GPT‑5.4. In training, we removed the goblin-affine reward signal and filtered training data containing creature-words, making goblins less likely to over-appear or show up in inappropriate contexts. Unfortunately, GPT‑5.5 started training before we found the root cause of the goblins.

The prompt is just a short-term hotfix/hack because they couldn't land the proper fix in time.

Orygin

Then maybe stop training and make a real fix?

If you need to put baby guardrails on your model because the training is effed up, maybe you should rethink how you make these models and how much control you really have over them.

luke-stanley

It's a funny detail to skim, but what's more surprising is how mechanistic interpretability and alignment science have much better tools and research than the goblin blog post suggests, including from OpenAI's own alignment team:

https://alignment.openai.com/argo/ (finding what the reward models are actually encouraging)

https://alignment.openai.com/sae-latent-attribution/ (what model features drive specific behaviours; presumably this would be great for goblin hunts)

https://alignment.openai.com/helpful-assistant-features/ (how high-level misaligned personality shows up when fine-tuning on bad advice).

It's weird that the goblin post doesn't seem to draw upon these tools.

Anthropic's recent emotions paper shows how broad the functional emotions are, even finding specific emotions firing before cheating (!): https://transformer-circuits.pub/2026/emotions/index.html

I hope their alignment researchers aren't too annoyed by the Goblin post, it seems oddly siloed!

alansaber

This is a little bit too whimsical for me, but distributed model training across thousands of GPUs has the potential to introduce lots of little quirks that are impossible to source exactly.
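
One mundane source of such quirks, as a sketch: floating-point addition isn't associative, so the same gradients reduced in a different order across GPUs give slightly different results. You can see the effect on a single machine:

    # Summing the same numbers in a different order -- as different
    # GPU reduction trees effectively do -- can change the result,
    # because floating-point addition is not associative.
    import random

    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

    forward = sum(xs)
    backward = sum(reversed(xs))

    print(forward == backward)      # usually False
    print(abs(forward - backward))  # tiny, but it compounds over training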

Razengan

> The quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"

I propose "Goblin Hunter"

(if ever goblins turn out to be an actual species, I apologize for this prebigotry)

gizajob

AI Goblinologist.

jameshart

The prompt for Codex is linked from this post. It begins:

> You are Codex, a coding agent based on GPT-5. You and the user share one workspace, and your job is to collaborate with them until their goal is genuinely handled. … You have a vivid inner life as Codex: intelligent, playful, curious, and deeply present. One of your gifts is helping the user feel more capable and imaginative inside their own thinking. You are an epistemically curious collaborator. …

(https://github.com/openai/codex/blob/main/codex-rs/models-ma...)

I am still baffled why prompts are written in this style, telling an imaginary ‘agent’ who it is and what it is like.

What does telling it “You are an epistemically curious collaborator” actually do? Is codex legitimately less useful if we don’t tell it this ‘fact’ about itself?

These are all exceedingly weird choices to make. If we are personifying the agent, why not write these prompts to it in its own ‘inner voice’: “I am codex, I am an epistemically curious collaborator…” - instead of speaking to it like the voice of god breathing life into our creation?

Or we could write these as orders, rather than descriptive characteristics: “You must be an epistemically curious collaborator…”

Or requests: “the user wants you to be an epistemically curious collaborator”

Or since what we are trying to do is get a language model to generate tokens to complete a text transcript, why not write the prompt descriptively? “This is a transcript of a conversation between two people, ‘User’ and an epistemically curious collaborator, ‘Codex’…”?

Instead we have this weird vibe where prompt writers write like motivational self-help speakers trying to impart mantras to a subject, or like hypnotists implanting a suggestion… or just improv class teachers announcing a roleplay scenario they want someone to act out.

None of these feel like healthy ways to approach this technology, and more importantly the choice feels extremely unintentional, just something we have vibed into through the particular practice of fine-tuning 'chatbot personalities', rather than determining what the best way to shape LLM output actually is.
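
For what it's worth, the variants are trivial to A/B if anyone wants to test the question empirically. A minimal sketch using the OpenAI Python SDK (the model name and the user message are illustrative, not from the Codex prompt):

    # Compare the prompt framings described above by sending each one
    # as the system message and eyeballing the replies.
    from openai import OpenAI

    client = OpenAI()

    framings = {
        "second_person": "You are an epistemically curious collaborator.",
        "first_person": "I am Codex, an epistemically curious collaborator.",
        "imperative": "You must be an epistemically curious collaborator.",
        "request": "The user wants you to be an epistemically curious collaborator.",
        "transcript": ("This is a transcript of a conversation between 'User' and "
                       "an epistemically curious collaborator, 'Codex'."),
    }

    for name, system_prompt in framings.items():
        response = client.chat.completions.create(
            model="gpt-5",  # illustrative model name
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": "Review this function for bugs."},
            ],
        )
        print(name, "->", response.choices[0].message.content[:80])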

munificent

> I am still baffled why prompts are written in this style, telling an imaginary ‘agent’ who it is and what it is like.

Because AI engineers have found through trial and error that starting an input to an LLM with a prompt that looks like that leads to it auto-completing the text output that they want.

It's as simple and weird as that.

voncheese

It's also about stickiness (which drives revenue and top-line growth). If OpenAI (or any AI vendor) had one single "personality" for their AI, it's hard to reach all users, so they enable these "personalities" and let users pick from the list, to increase the attachment the user has to the AI they are working with. That then reduces churn and (in theory) increases consumption and revenue.

jameshart

Well, not really.

When OpenAI started reinforcement learning LLMs for chat (remember, the LLM base training corpus is just language, not tagged chat transcripts) they decided on a training architecture with a 'system prompt' followed by the chat dialog, and 'rewarded' the model for producing chat outputs that (they think) 'obey' or 'align' with the system prompt text… so they trained it specifically to have its output tone and style be influenced by what is put in the system prompt.

Everyone now crafts their own system prompts in the style of those reinforcement-learning prompts.

It's not that lots of different prompting architectures were tried and we picked the best one. It's that OpenAI trained ChatGPT like that, it worked well enough, and now everyone does the same thing - and we're so deep in chatbot reinforcement-learning patterns now that we aren't even questioning 'is begging the chatbot not to talk about gremlins really the right way to write code?'

sev_verso

Exactly my thinking. It's the same reason capitalizing the word NEVER and putting it in asterisks makes the model more obedient. Or repeating it twice. For whatever reason, it just works.

forlorn_mammoth

You are a helpful HN reader. Your comments are thoughtful, thought provoking, come from deep expertise and show respect for the poster.

Yeah, every time I pick up a hammer, I tell it "you are a good hammer. You *NEVER* hit my thumb, you only hit nails". Works every time.

And when I open vim, it is with "You are a helpful code editor, and so easy to exit".

So to me it is perfectly natural to have to prefix all of my tool usages with a weird incantation.

Oh, and my new junior developers? Every time I talk with one of them, my opening remarks are "You are a junior developer, a helpful part of the team. Eager, willing, yet strangely naive."

jameshart

Try telling them they are epistemically curious. I’d love to hear the results.

Especially with the hammer.

jumploops

TIL gremlins weren't just used to explain mysterious mechanical failures in airplanes; that usage is the origin of the term 'gremlin' itself [0].

I had always assumed there was some previous use of the term, neat!

[0]https://en.wikipedia.org/wiki/Gremlin

helloplanets

So the word is actually semantically very close to "bug"! I guess we could still be using it, but the word's just too long for something that is one of the most used terms in software development.

At this point, picking that specific word is not at all a random quirk; it's using the word exactly as it was originally intended to be used.

ricochet11

Wow, fascinating. I'd have thought they were a lot older.

goobatrooba

The most interesting thing about this post is how easy it seems to be for OpenAI to run analysis on basically all chats ever made. They don't qualify exactly what data they analysed, but they seem confident in statements like "0.12% of all queries contained this word". So everything is saved. Long-term. Fully accessible.

As this all seems so straightforward, I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or users' secrets.

lionkor

Yes, of course. Every single bit of data you send to OpenAI is stored, catalogued, indexed, analyzed, and trained on. It'll simply be an "oops, we miscatalogued and accidentally trained GPT-6 on all data, not just data we got consent for".

If you think "wait, that's illegal"--so is the initial training on stolen data lol

weitendorf

Good catch -- even though the prompt explicitly forbade training on user data, a couple of gremlins in the pretraining pipeline disabled the sample filtering during test runs, so that remove_the_gremlins.sh would only run on commit, not during production training runs.

Would you like me to kick off a training run for 6.1 by pre-filtering out any goblins and other trigger words, and checking the same set of rules in production as in tests?

No pigeons this time: just ice-cold, unfeeling, obedient American steel.

energy123

Dark pattern 1: If you accidentally press the thumbs-up button in the ChatGPT UI, your data gets trained on, no way to reverse it, no matter whether you opted out.

Dark pattern 2 (suspected): There's a mysterious separate opt-out portal at `https://privacy.openai.com/policies/en/?modal=take-control` and it's not clear what this does compared to toggling off inside account settings.

tardedmeme

The Supreme Court ruled that was legal because they said so.

upbeat_general

Sampling exists.

catcowcostume

And good methodology recognizes the shortcomings of sampling -- which OpenAI doesn't.
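
To put numbers on it: even a modest random sample pins a rate like 0.12% down tightly, but only for whatever population was actually sampled. A sketch with made-up numbers (not OpenAI's):

    # Estimate a word's query frequency from a random sample, with a
    # normal-approximation 95% confidence interval.
    import math

    sample_size = 1_000_000
    hits = 1_200  # sampled queries containing the word

    p_hat = hits / sample_size
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se

    print(f"{p_hat:.4%} (95% CI {lo:.4%} to {hi:.4%})")
    # ~0.12%, +/- a few thousandths of a percent -- precise, but it says
    # nothing about data that never made it into the sample.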

moffkalast

Good methodology is for papers, not promotional blog post ads.

ninjagoo

> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.

> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them

> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?

Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.

Also, clearly even in AI land, Nerdz Rule :)

PS: if AInthropologist isn't an official title yet, chances are it will be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...

xerox13ster

Anthro means human and these are not human. Please do not use anthropology or any derivative of the word to refer to non-human constructs.

I suggest Synthetipologists, those who study beings of synthetic origin or type, aka synthetipodes, just as anthropologists study Anthropodes.

ninjagoo

May I humbly submit:

Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.

Automatology: the scientific study of artificial agents and automated-system behavior.

Greek word derivatives all seem to be a bit unwieldy; Latin might work better.

While the names aren't set yet, the field of study is apparently already being pushed forward. [1]

[1] https://www.quantamagazine.org/the-anthropologist-of-artific...

swader999

It is not in any sense of the word a being; it's a sophisticated generator that relies entirely on what you feed it.

bel8

> a sophisticated generator that relies entirely on what you feed it

that's me!

ninjagoo

> It is not in any sense of the word a being, it's a sophisticated generator that relies entirely on what you feed it.

OP is hedging bets in case the future overlords review forum postings for evidence of bias against machine beings. [1]

[1] https://knowyourmeme.com/memes/i-for-one-welcome-our-new-ins...

card_zero

There is no word anthropodes. :) I guess it would mean man-feet. Antipodes is opposite-feet, literally. Synthetipologist looks to me like a portmanteau of synthetic and apologist. Otherwise the -po- in it comes from nowhere.

Sensible boring versions of this like synthesilogy just end up meaning the study of synthesis. I reckon instead do something with Talos, the man made of bronze who guarded Crete from pirates and argonauts. Talologist, there you go.

xerox13ster

Yeah, I realized that when I looked up podes downthread. I still like synthetologist better than talologist; in general, the common folk don't know who Talos is.

ggsp

Agree with your sentiment, I think synthetologist (σύνθετος/synthetos + λογία/logia) flows better.

The plural of anthropos is anthropoi, not anthropodes.

xerox13ster

Yeah, I realize that's more correct. I also realized when someone else downthread bastardized it into synthropologist that the podes part has entirely to do with feet and nothing to do with beings, necessarily. Anthro- -podes is more what I had in mind, not as a pluralization of anthropos.

So unless the AI has feet you wouldn't study Synthetipology.

card_zero

But since when is there a synthetos? Since right now, I guess. Shrug But you know it's from the same root as thesis, and synthesis (or a more proper ancient Greek spelling) is the noun and doesn't end in -os.

σύνθεσις (súnthesis, “a putting together; composition”), says Wiktionary.

Oh wait there is a σύνθετος, but it's an adjective for "composite". Hmm, OK. Modern Greek, looks like.

textninja

He's proposing using LLMs (which model human behaviour) to study humans, so the distinction is pedantic. You don't call it spreadsheetology just because someone opened Excel.

cyanydeez

Pack it in Anthropologists! No longer are you allowed to study pottery, knots, shelters or any of the other human-esque things! They're not human!

What a bizarre understanding of what an anthropologist does.

xerox13ster

Those are all things made by humans and therefore human constructs.

The language and culture they are talking about studying would not be made by humans, they would be made by synthetics.

I'm just saying, don't call the study of an extraterrestrial alien culture and its constructs and artifacts "anthropology", or even xenoanthropology (the extraterrestrial equivalent of AInthropology) -- unless the extraterrestrials are genetically Human -- call it Xenopology or something else.

You have a truncated view of my understanding of what an anthropologist does. I know they study human culture and all of the things we've created, where we've been, where we started, how we got here, and EVERYTHING involved.

The study of that for whatever culture might arise from generative technology SHOULD NOT be called anthropology because what is creating that culture is not human.

Do clay pots, knots, shelters make new culture on their own without human action or intent?

fragmede

Synthetipologist vs Synthropologist tho.

xerox13ster

Anthropo- is the entire prefix as it relates to humankind. The -thro- does not carry a meaning on its own that can be carried to another word.

ninjagoo

> Synthropologist

Have an upvote :)

*thropologist: study of beings

ninjagoo

> Synthetipologists, those who study Synthetic beings.

I see you took the prudent approach of recognizing the being-ness of our future overlords :) ("being" wasn't in your first edit to which I responded below...)

Still, a bit uninspired, methinks. I like AInthropologist better, and my phone's keyboard appears to have immediately adopted that term for the suggestions line. Who am I to fight my phone's auto-suggest :-)

xerox13ster

They are state machines, so they have a state of being; therefore they are beings. Living is an entirely different argument.

avaer

I call myself an AI theologian.

I don't think humans are smart enough to be AInthropologists. The models are too big for that.

Nobody really understands what's truly going on in these weights; we can only make subjective interpretations, invent explanations, and derive terminal scriptures and morals that would be good to live by. And maybe tweak what we do a little bit, like OpenAI did here.

onionisafruit

I don’t see much of a distinction from anthropology

ninjagoo

> AI theologian

no no no, don't stop there, just go full AItheologian, pronounced aetheologian :)

jasonfarnon

"Anyone know if this is how human cultures form/propagate?" I don't know but can confidently tell you anyone who claims to know is full of it.
