bensyverson
The article asserts that the quality of human knowledge work was easier to judge based on proxy measures such as typos and errors, and that the lack of such "tells" in AI poses a problem.
I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.
And AI signatures are now easy for people to recognize. In fact, these turns of phrase aren't just recognizable—they're unmistakable. <-- See what I did there?
Having worked with corporate clients for 10 years, I don't view the pre-LLM era as a golden age of high-quality knowledge work. There was a lot of junk that I would also classify as a "working simulacrum of knowledge work."
torben-friis
For me the issue is the lack of human explanation for mistakes. With a person, low quality comes from a source. Sometimes the source is lack of knowledge, sometimes time pressure, sometimes selfish goals.
Most importantly, those sources of errors tend to be consistent. I can trust a certain intern to be careful but ignorant, or my senior colleague with a newborn daughter to be a well of knowledge who sometimes misses obvious things due to lack of sleep.
With AI it's anyone's guess. They implement a paper in code flawlessly and make freshman-level mistakes in the same run. So you have to engage in the non-intuitive task of reviewing under an assumption of total incompetence, for a machine that shows extreme competence. Sometimes.
bambax
It's not that the pre-LLM era was a "golden age of quality", far from it. It's that LLMs have removed yet another telltale of rushed bullshit jobs.
bensyverson
Have they though?
happytoexplain
Absolutely. Our heuristics for judging human output are useless with LLMs. We can either trust it blindly, or tediously pick over every word (guess which one people do). I've watched this cause havoc over and over at my job (I work with many different teams, one at a time).
AI signatures don't mean low quality, they just mean AI. And humans do use them (I have always used the common AI signatures). And yes, humans produce good-looking garbage, but much more commonly they produce bad-looking garbage. This is all tangential to the point.
esafak
For example, science articles written in Word vs. LaTeX helped filter out total cranks.
manquer
It was and still is a negative filter, not a positive one. Meaning it is easy to reject work because there are typos and basic factual errors; the absence of them is not a good measure of quality. Typically such checks are the first pass, not the only criterion.
It is valuable to have this, because if the work passes the first check, it is easier to identify the actual problems. Same reason we have code quality and lint-style issues fixed before reasoning about the actual logic being written.
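A minimal sketch of that negative-filter idea as a first pass before human review (the typo list and threshold here are illustrative, not a real pipeline):

```python
# Illustrative negative filter: failing it justifies rejection,
# but passing it says nothing about actual quality.
COMMON_TYPOS = {"teh", "recieve", "seperate", "occured", "definately"}

def first_pass(text: str) -> list[str]:
    """Return reasons to reject; an empty list only means 'worth a real review'."""
    reasons = []
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    typos = words & COMMON_TYPOS
    if typos:
        reasons.append(f"typos: {sorted(typos)}")
    if len(text.split()) < 50:
        reasons.append("too short to evaluate")
    return reasons

# Flags both problems; a clean result would still prove nothing about quality.
print(first_pass("We recieve teh data and seperate it by cohort."))
```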
mbreese
I’m also not sure I agree with the assertion that LLMs will produce a high-quality (looking) report with correct time frames, a lack of typos, and good-looking figures. I’m just as willing to disregard human or LLM reports with obvious tells. An LLM or a person can produce work that’s shoddy or error-filled. It may be getting harder to differentiate between a good and a bad report, but that just shifts more of the burden onto the evaluator.
This is especially true if we start to see more of a split in usage between LLMs based on cost. High quality frontier models might produce better work at a higher cost, but there is also economic cost pressure from the bottom. And just like with human consultants or employees, you’ll pay more for higher quality work.
I’m not quite sure what I’m trying to argue here. But the idea that an LLM won’t produce a low quality report just seemed silly to me.
yarekt
You’ve missed the point of the original article about the proxy for quality disappearing. LLMs are trained adversarially, if that’s a word: they are trained to not have any “tells”.
Working in a team isn’t adversarial: if I’m reviewing my colleague’s PR, they are not trying to skirt around a feature or cheat on tests.
I can tell when a human PR needs more in depth reviewing because small things may be out of place, a mutex that may not be needed, etc. I can ask them about it and their response will tell me whether they know what they are on about, or whether they need help in this area.
I’ve had LLM PRs defended by their creator until proven to be a pile of bullshit; unfortunately, only deep analysis gets you there.
puttycat
The goal of automation is to automate consistently perfect competence, not human failures.
You wouldn't use a calculator that is as good as a human and makes mistakes as often.
downboots
Yes. I think the main warning here is that it is an added risk. A little glitch here and there until something breaks.
Aurornis
> I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.
Putting a high level of polish on bad ideas is basically the grifter playbook. Throughout the business world you will find workers and entire businesses who get their success by dressing up poor ideas and bad products with all of the polish and trimmings associated with high quality work.
rushabh
A corollary of this could be that people interested in Serious Work will never use LLMs. Could be the new "tell".
vivid242
With AI, we're cargo-culting understanding. We're reproducing the surface of having understood something, but we're robbing ourselves of the time and effort to truly do it.
hellohello2
AI can do things on its own, without you understanding them, yes.
But if you are trying to understand something well, there is no better tool for helping you than AI.
trueno
i've been telling my coworker this; the only use case he can conjure up with AI is simply "i'm going to give claude snowflake cortex, our integration code, all our documentation, and jira tickets, and it's gonna make everything so much better. we'll be able to ask him anything and get the answer." he's just lost the plot, because there wasn't much of a plot. sci-fi's infused him with how great it would be to have something to answer any question he had. he's hung up on this possibility of having his own tony stark jarvis at his disposal; in his head, this is going to be the thing that speeds him up.
i'd say it's been a huge distraction for him, and the obsession over using an LLM for Big Wikiz hasn't yielded anything near what he thought the tech was for. on a few occasions now he's learned the hard way how imperfect the technology is.
between that and everyone's grand visions for agentic workflows, i've mostly just receded into being one of the few who is still regularly delivering stuff. i'm using AI to speed my delivery up quite a bit, i'm just not wasting my time taking it on some big grand adventure. the irony is that a lot of people pushed back on companies who wanted to implement chat bots, and now they spend most of their credits/tokens making their own chat bots by collecting six trillion .md files and adding skill files.
my real takeaway is this: i've come to reason that there is some sort of loss of actual, real institutional knowledge when we attempt to take shortcuts to growing the breadth of our own knowledge. i don't mean "hey claude, give me some examples of how companies typically design x to solve for y" or "golang is new to me, what are the benefits of a compiled language versus something that requires a runtime".
no, i'm talking about these kinds of questions:
"/somePersonalBigWikiProjectInvokedBySkill.md claude review our current tooling and infrastructure, how can we 5x our deployment speed, then search the web for <some SaaS company> and put a proposal together to get it implemented at the organization and include a 5 year cost benefit analysis and ... "
i look around and it feels like everyone is nerfing themselves. that latter question? people are just sending claude proposals left and right. my eyes have completely glazed over. is it really that hard to do some digging yourself? we're already ceding the ability to just go grab an architect or senior engineer and ask him what he thinks about how <some SaaS company> will fit with the broader suite of technologies and visions on the horizon. we're just skipping the pieces where we do a little discovery together and work together on an outcome. we're walking away with surface level understanding of many things.
this clearly has visible impacts on how we engage with each other; there's something there that I'm noticing and don't have the words for. it's mostly that people are less able to explain what it is they're talking about when pressed for deeper details, but also everyone's behavior is now different, because AI sort of... makes them feel like they have definitive answers/strategies, and they're no longer willing to have their ideas challenged. they no longer see that as a learning experience, a chance to learn from someone with wisdom who is already a walking wikipedia on something. it's the perfect technology for people who hate when someone with way more experience than them says "maybe not a good idea, and here's why".
i've met some interesting people who are just... walking encyclopedias on one or many domains. incredibly smart people who have so much knowledge and wisdom and so many years of experience, not just with tech but with people and failures and successes. i don't doubt for a second that the human brain is capable of holding an unbelievable index of information in a natural way that marries well with decision-making processes that come from experience. i'm not sure what gap people are trying to close by building themselves some proverbial great library here, but i would encourage people to just sit back and trust that their brain is still one of the greatest technologies at their disposal.
sendes
This is an already apparent problem in academia, though not for the reasons the article suggests.
It is not so much that the "tells" of poor-quality work are vanishing, but that even careful scrutiny of work done with AI is going to become too costly to be done only by humans. One only has so much time to read while, say, in economics journals, the appendices extend to hundreds of pages.
Would love to hear if other fields' journals are experiencing a similar pressure, not only at the extensive margin (the number of new submissions) but also at the intensive margin (the effort needed to check each work).
Daishiman
To be fair, a lot of academic fields are such that anything at a Master's level or above requires serious competence to judge, and for anyone below that level there's no distinction between what's right and what looks right.
monocasa
I think this is why middle managers seemed to be the first acolytes to the church of llm supremacy.
It's a weird space: in middle management, all of the incentives other than true competency in the role push you to abstract away the knowledge work that you're managing, and that abstraction seems to be well describable in embedding space.
wxw
Ultimately to understand a thing is to do the thing. And to not understand (which is ok!) is to trust others to, proxy measures or not. Agreed that the future of work is in a precarious place: doing less and trusting more only works up to a point.
`simulacrum` is a great word, gotta add that to my vocabulary.
slickytail
[dead]
coppsilgold
> The training doesn't evaluate "is the answer true" or "is the answer useful." It's either "is the answer likely to appear in the training corpus" or "is the RLHF judge happy with the answer." We are optimising LLMs to produce output which looks like high quality output.
It's not quite as dire as this. One of the main reasons why LLMs are getting better over time is that they are themselves used to bootstrap the next generation, by sifting through the training set and doing 'various things' to it.
People often forget that the training corpus contains everything humanity ever produced, and anything new humanity will produce will likely come from it as well. Torturing it with current-generation models is among the most productive things you can do to improve the next generation of systems.
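To make the quoted contrast concrete, here is a toy sketch of the two training signals the article describes (the function names and toy values are mine, not any lab's pipeline); note that "is it true?" appears in neither:

```python
import math

def pretraining_loss(token_probs: list[float]) -> float:
    """Signal 1: cross-entropy against the corpus.
    Rewards 'likely to appear in the training data', not 'true'."""
    return -sum(math.log(p) for p in token_probs)

def rlhf_objective(judge_score: float) -> float:
    """Signal 2: a judge model's approval of the answer.
    Rewards 'looks good to the judge', not 'useful'."""
    return judge_score

# The counterpoint above: current models curate and filter the next
# generation's corpus, so some quality gets laundered back into signal 1.
print(pretraining_loss([0.9, 0.8, 0.95]))  # lower = more corpus-like
```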
NickNaraghi
It's a funny thing to write, like an article in an old newspaper that aged quickly. I suspect that this will be wildly out of date within 2-3 years.
krackers
I think it's already out of date with verifiable-reward-based RL, e.g. in the maths domain. When the "correctness" arguments fall, the argument will probably just shift to whether it's "intelligent brute force".
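For reference, a minimal sketch of what "verifiable reward" means on a maths task; the '#### <answer>' marker is an assumed GSM8K-style convention, not any particular trainer's format:

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull a final numeric answer; the '#### <n>' marker is an assumed convention."""
    m = re.search(r"####\s*(-?\d+)", completion)
    return m.group(1) if m else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """1.0 iff the checkable answer matches; no judge to please, no surface 'tells' to game."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

print(verifiable_reward("3 boxes of 14 gives 42. #### 42", "42"))  # 1.0
```

The catch, as the reply below notes, is how few tasks admit a check like this.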
gipp
The set of tasks for which "correctness" is formally verifiable (in a way that doesn't put Goodhart's Law in hyperdrive) is vanishingly small.
TheOtherHobbes
"stochastic genius"
cyber_kinetist
"The simulacrum is never what hides the truth - it is truth that hides the fact that there is none. The simulacrum is true." - Jean Baudrillard
This aligns with the theory of Bullshit Jobs: LLMs expose the fact that the white-collar work most of us have been doing was actually bullshit. When LLMs "fake" work, it actually hides the reality that there was no meaningful work there in the first place.
firefoxd
Everybody's output is someone else's input. When you generate quantity by using an LLM, the other person uses an LLM to parse it and generate their own output from their input. When the very last consumer of the product complains, no one can figure out which part went wrong.
balamatom
Well the last consumer is holding it wrong of course. Why? The last consumer is present, and everyone else is behind 7 proxies.
tkiolp4
I think this is pretty obvious to many of us in the industry. Unfortunately, there is so much money on the table that the big players will shove whatever they want down our throats.
happytoexplain
"They sound very confident," was a warning a gave a lot on a project a year ago, before I gave up trying to get developers to stop blindly trusting the output and submitting things that were just wrong. The documentation of that team went to absolute shit because the developers thought LLMs magically knew everything.