Hacker News

hyperpape

I'll copy what I wrote on LinkedIn (note: I read roughly 25 pages, which is half the paper, and read it quickly)[0]:

"If I read the paper correctly, they don’t actually show that LLMs prefer resumes they generate.

Their actual method seems to be taking a human-written resume, deleting the executive summary, having an LLM rewrite the executive summary based on the rest of the resume, and then having another LLM rate the executive summary without the rest of the resume.

That’s likely to massively overstate any real impact, if you can even rely on it capturing a real effect.

I really wonder if I read that correctly, because I can’t come up with a justification for that study design."

[0] I couldn't help but mildly copy-edit before pasting here.

Edit: yes, the authors present a reason for their design, and an ideal version of my comment would've said that. I do not consider it much of a justification. See below: https://news.ycombinator.com/item?id=47987256#47987727.

b112

Could be an ad for 'use LLMs more'. A generic ad like this helps all in the market, but if you own 30% of LLM market share, it still helps you 30% of the time.

Now that I think of it, every other industry has an 'advocacy group', whether cheese, oil, or nutmeg. So surely there is now some sort of LLM 'consortium', and such a group funding studies like this just fuels the FOMO. You can be sure such groups exist, and are pummeling every government in the world thusly. But I bet they're also looking here.

After all, it's a circle. Uh-oh! HR is using LLMs, you'd better too, potential employee! Then later? Uh-oh! The best employees you can hire are using LLMs, you'd better too, HR!

They already FOMOed us into basically everything else, why not LLMs too?

delusional

[flagged]

aDyslecticCrow

There is some creativity in the rest of the CV, in what kinds of experiences are included and how they are described. But that would be far harder to generate fairly.

I think choosing the summary is a fair design choice, since it prevents the LLM from just... making up a perfect candidate.

"I'm a fullstack professor of software design with 90 years of experience expecting a junior internship position"

nearbuy

I assume they meant they can't come up with a reasonable justification.

hyperpape

Thank you, that's correct.

To be perfectly clear, I understand their justification for only _editing_ the executive summary, it is arguably reasonable, because editing the work history would risk altering the details in ways that compromise the measurement. This is a hard problem to solve (you might try reviewing the resumes for hallucinations, but I can't think of a precise study design that doesn't risk problems).

What is, imho, impossible to defend is having the LLM evaluate only the executive summary in isolation, and reporting that as it preferring resumes it wrote.

What you've shown is that LLMs prefer executive summaries they wrote. But the overall impact on how they will evaluate your entire resume is not measured by this technique.

Worse, this isn't just "decent paper, bad summary", their abstract misreports their findings.

delusional

I doubt it since they, admittedly, didn't read it. The question he posed, about the paper, is answered in that very same paper. He has structured his whole reply to have the tone of uncovering the hidden caveat in the small print that invalidates the paper, when it's actually a straightforwardly stated assumption in their methodology section.

ekianjo

> They state that unlike the rest of the resume, which is largely factual

largely factual? A resume is usually more than a bunch of dates and titles of positions.

charliebwrites

Anecdata, sample size of one:

When I was looking for my next role after being laid off, I didn’t get much of a response with my human handmade resume despite my experience

Just for kicks, I asked ChatGPT to “Analyze my resume and give it a score for what percentile it was in”, then I asked it to revise it to make it score as high as possible

I still tweaked and fact checked it but after I started sending that out, I got a much higher hit rate than before

But who knows, maybe the market changed, was a better time of year, etc

I still had to pass interviews and prove my worth. But it probably helped me get my foot in the door

leonidasv

Same thing happened to my wife as well. I helped her tailor her LinkedIn profile and resume with a lot of attention to detail: adding metrics, keywords, results, etc. Nevertheless, she never received any outreach from recruiters and got very few application responses. It went like that for months, almost a year.

Then she asked ChatGPT 5.x for help. I was skeptical about the changes it recommended (and skeptical about using AI at all for this, given the homogenization it tends to produce). But somehow it worked: a few days later, a recruiter reached out, then another, then applications started moving forward, etc.

My guess is that, as LLMs are shoveled into every phase of the recruiting process, not having an LLM write your resume for you is now playing on hard mode. The LLMs reviewing resumes are downranking resumes and profiles that are not "speaking" the same language and activating the correct neurons, thus preventing you from moving forward. This contrasts with years ago when we had more humans in the loop and the pasteurised writing of GPT 3.5/4o would make you look less worthy. Again, just a theory, but...

andsoitis

> I helped her tailor her LinkedIn profile and resume with a lot of attention to detail: adding metrics, keywords, results, etc.

FWIW, when I see a resume with metrics and keywords, I immediately filter it out.

schrodinger

Same.

If it's something like "Refactored the apartment list service improving P99 Latency from 2s to 180ms", it definitely boosts the resumé in my mind. A good engineer would be measuring their impact and likely have numbers like that off the top of their head.

But if it's like "Increased revenue by $18.7M by reducing time-to-first-interaction latency from 2.3s to 117ms, increasing conversion by 47% and LTV by 28%," with the same fidelity on each bullet, I'm very skeptical.

--

I don't summarily reject AI-written resumés, to be clear, as honestly it's basically a necessity at this point to be competitive with others; refusing on pure principle would put yourself at a severe disadvantage in a way that has no real positive net effect on society. Even if you disagree with AI resumé screeners, you're only hurting yourself — especially at the time that has the largest impact on your compensation (i.e. negotiating salary at job start is one of the most valuable ways to spend your time since it will pay you back every paycheck).

Though I _do_ tend to question resumés that look like they were written almost entirely by an LLM without the candidate providing significant context and refinement.

roymain

What I have observed, and this is the hard (unpredictable) part for candidates, is that they can't tell which kind of hiring manager they'll get. Some reject resumes with metrics, others use software that won't let the resume through without them. Most (non-subject-matter experts) would just look for whatever they think will get past the filter.

What I am researching right now is: if you (the hiring manager) got two resumes from the same candidate, one hand-written (metaphorically) and the other built by ChatGPT or AI, which one would you call?

mikeyouse

Which is a very “HN” sentiment when the vast majority of recruiters and hiring managers are absolutely not doing the same. Especially for roles outside of tech.

RajT88

Same. I am well aware how the metrics game goes - even inside the company it can be hard to disprove the metrics claimed, and people count on that. Even managers coach you on putting metrics you cannot prove or disprove.

hiAndrewQuinn

What counts as a keyword here? If you're hiring for a frontend developer and you see e.g. "Redux" do you just can it?

reillyse

You must be a pre 5.0 model so.

ai_slop_hater

Gigachad. Just don’t forget to signal somehow that you aren’t like everyone else, so that legitimate candidates can send their real resume instead of an AI-generated one.

j45

Having implemented more than a few applicant tracking systems: too many are so anchored in the past that they would probably try to boil the ocean all at once by letting AI loose on everything, leaving AI-written resumes talking to AI applicant tracking systems.

The key insight here is that humans are responsible for improved articulation to the AI, which in turn will improve the rest, and that can be as detailed, informative, and educational as the human likes.

newsy-combi

Kafkaesque

kafkaesque

The Orwellian nightmare is nigh

Esophagus4

There are services that will do this as well - I’ve used them on both my LinkedIn profile and resume, with decent success.

spike021

I was recently job hunting and did something similar. Had it check my bullets and see if they "read well" and it suggested many many tweaks. I tried a few. I'm not sure how much more it helped the applications though.

fuzzy_biscuit

I've done as you described and then edited it down to sound human again.

amelius

I suppose the HR folks gave you a "+1 knows how to use AI".

grey-area

It seems more likely the HR people depend on LLMs to do the job of screening and LLMs unsurprisingly prefer LLM output and rank it highly.

It’s not lazy incompetence, it’s quietly getting the job done with 1% of the effort (that was a sarcastic pastiche, in case anyone was unsure).

zdragnar

It's not uncommon to get hundreds or thousands of applications per opening for web tech, if the position is advertised on LinkedIn or a similar job board.

They'd need to use some automation, even if it is just picking ten at random.

ben_w

Some will, others openly say on the job ad they will fail you for using AI.

izacus

And then still use a CV scanning service that rejects non-AI resumes.

dawnerd

I know if I got a resume from someone that had obviously used AI to generate it, it would be a pass.

luotuoshangdui

It makes sense. An LLM can definitely help polish your resume.

p_stuart82

That's the loop though. If GPT does the screening, people learn to write for GPT. Once that loop exists, why would the company selling the filter want it gone?

davebren

Probably gonna get downvoted for this, but when you give an anecdote you don't have to preface it with "anecdata, n=1 sample size".

We know it's from your individual experience because it's a story about your individual experience. We've been doing this for all of human history. This is some kind of strange habit of always trying to sound scientific, or it's fear of the "well akshually I'm gonna need to see a random placebo controlled trial", which is equally annoying.

Fezzik

It became necessary because, for years (decades), if you made a comment online that your personal experience informed you in such-and-such a way, the first comment would always be some moronic comment dismissing that personal experience because it is just one person’s experience. So, to avoid that idiocy, people started to preface their anecdotes by acknowledging that they know it is an anecdote. It sets the tone for the conversation.

davebren

Yeah but we can't let the insufferable dictate our way of speaking. In spoken language I hear it mainly by people that don't have a scientific background trying to sound more scientific.

peyton

I’ve been told explicitly to do what GP said, so it’s perhaps becoming word-of-mouth career advice at this point. In my case it told a different career story that is maybe more easily digestible.

benashford

Intuitively this feels obvious. Content generated by the model will be shaped by its training, therefore when reading it back it will resonate with that same training and have a positive view as a result.

Human when preparing a CV: "Make my CV more professional"

LLM many days later presenting a report to HR: "This CV is really professional"

There's probably more to it than that of course.

But it justifies my personal policy of using a different LLM family for code review tasks than for code generation tasks. To avoid the "marking your own homework" problem.
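
That policy can be kept mechanical by injecting the two model families as separate callables. A minimal sketch (the stub "families" below are hypothetical stand-ins, not real client APIs; plug in whatever provider clients you actually use):

```python
# "Don't mark your own homework": generate with one model family,
# review with a different one. Model calls are injected as plain
# callables so any provider client can be swapped in.

def generate_and_review(task, generate, review):
    """Generate code with one family, get a critique from another."""
    code = generate(f"Write code for: {task}")
    feedback = review(f"Critically review this code:\n{code}")
    return code, feedback

# Offline stubs standing in for two different model families.
family_a = lambda prompt: f"[family-a output for: {prompt[:30]}]"
family_b = lambda prompt: f"[family-b critique of: {prompt[:30]}]"

code, feedback = generate_and_review("parse a CSV", family_a, family_b)
```

The point of the indirection is that the reviewer never shares weights (or stylistic priors) with the generator, which is exactly the self-preference failure mode this thread is about.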

gzread

And not in human-interpretable ways. An LLM was told to behave in a certain way and then output random numbers. When the numbers were pasted to another LLM instance, it also behaved that way. I wish I remembered more about that study or had a link to it - it was fascinating.

bendergarcia

We are, without our consent, introducing a party in between people. The models become the arbiters of who does and does not get a job. It feels problematic.

justonceokay

There will be a great arbitrage for people who do not use LLMs.

If your HR department is using ChatGPT to filter resumes, you’ll end up with people who used ChatGPT to generate resumes. I don’t want to make a “slippery slope“ argument, but my gut feeling is that the quality of your organization will deteriorate quickly.

On the other hand, I am a handyman/subcontractor. Almost all of my work comes through phone calls, texts, and one-off emails. I only work with people who are recommended by trusted sources. I haven’t handled a traditional resume (mine or other people’s) in over eight years.

If I started interacting with somebody and they seemed like they were a computer, that would be the fastest way for me to know I should move on to another client. If they can’t take the time to interact with me, how am I supposed to perform hundreds of hours of physical labor for them?

bendergarcia

And I feel the common response is: well, just use the model that’s available. AI is, and will probably always be, resource-constrained and profit-driven. That means we will eventually see a world where poor people have worse resumes than rich people, and there really won’t be any way around it, because the man in the middle has the final say.

adrianN

Not too long ago I bet resumes that were printed from a computer were preferred to resumes typed on a typewriter. What happened was that computers became commodities. It is reasonable to assume that LLMs will become commodified too.

YurgenJurgensen

That would hardly be surprising. Monospaced fonts make natural language a pain to read, so what that would prove is that well-presented resumes are preferred to poorly-presented ones.

This case is different, as the LLM output isn’t measurably better than the human output (unless you have a particular love of bland corpo-speak).

Nuzzerino

This is a terrible way to soften an obvious alignment failure with AI rollout.

undefined

[deleted]

falcor84

The ship sailed as soon as hiring managers stopped reading CVs directly and we got recruiters as a profession.

ekianjo

Before, it used to be HR, so you always had a party in between "actual" people. HR (mostly) never cared about the CV; they just looked at a checklist to see if it matched.

sneak

We already did that when we all created LinkedIn accounts.

sxg

Take a look at how things worked before (and still do): employers decide who gets jobs based on a combination of personal biases, nepotism, and ulterior motives while applicants present distorted versions of themselves and network/pull strings to put the odds in their favor. That seems more problematic.

1attice

You would be surprised at the process in other industries. What you are describing is the tech job market specifically.

Other fields have their own problems, including credentialism and ballooning concomitant student loans, but, by strict convention, they do not hire based on vibes or pulled strings. Often to their partial detriment, as the cure -- i.e., strict oversight of hiring that also forces the hiring manager to ignore important implicit signals -- is alive and well in medicine, law, civil engineering, education, and the trades. Notable exceptions include entertainment, sales, real estate, and software engineering.

By optimizing for vibes, the tech industry gains "Spidey senses" in the hiring loop but pays for it in impartiality.

IMO this precipitated the DEI movement's advent, as it was seen as a way of remediating the drawbacks while preserving the information channel.

Without it, expect homophily and, eventually, a harsh and remedial credentialism.

sxg

I'm a physician and have recently been on both sides of the hiring process for new physicians and residents at a few different institutions. It's absolutely not meritocratic--you'd be shocked at how strong a role connections and pedigree play. The hard requirements are just table stakes, but the selection process from there is completely subjective and susceptible to all kinds of problematic biases. Generally people don't want to rock the boat and discuss this stuff openly, but it's absolutely a problem that needs to be pointed out.

rogermarley

I think resumes will eventually (or have already) become obsolete in tech. The SNR is so low, they offer very thin filtering value.

Even the tiny bits of the resume that are "hard signal", like GPA, certifications, prior roles, etc., don't translate into performance in the initial screening interview.

This is why I think what the industry sorely needs is examination consortia.

Rather than trying to guess capability from the name of the university someone went to, leading tech companies could create standardized tests in various fields, and your test scores would form your "resume", so that developers can just focus on improving their scores rather than wasting time on resume/application/repetitive-screening toil.

indiv0

Eventually even a system like that can be gamed, similarly to how Leetcode-maxxing and the like sprung up in response to typical SV interview questions. Studying for the job becomes studying for the test becomes studying for the pre-test test.

aDyslecticCrow

> standardized tests in various fields

This is itself a massively difficult problem. Standardised tests are a bad indicator of topic understanding (setting aside the massive incentive for blatant cheating).

You're effectively advocating for leetcode being an effective hiring tool, which many would highly criticize.

rogermarley

But I think even if it were purely leetcode-like, devs would actually be quite happy with this, since at least you'd only have to do it once and then it's re-usable for every application.

At the end of the day it doesn't really matter what our opinions of good screening are, but what the salary-payers are. Personally I just rely on live (& conversational) task-based coding tests.

aDyslecticCrow

> our opinions of good screening are

I want competent and skilled coworkers. I care about our hiring process, and the hiring process of where I apply. Many modern screening processes are abysmal, and an abysmal screening process is reflected in the company and culture over time.

My experience of university exams makes it very clear that studying for a test and studying to understand a topic are two different goals that collide or even contradict.

I don't want to hire anyone who studied for the test instead of the topic. Placing higher stakes on the test result encourages the wrong behaviour and filters for the wrong people.

I have a friend who failed physics because they spent all their time writing their own kernel in MIPS assembly. And plenty of classmates who aced the exam by memorising prior years' question examples.

Who would you hire?

qwytw

Maybe just a lottery instead? Would be approximately as useful just way simpler.

Also don't all of the "enterprise" certificates already provide all that, anyway?

cyberax

It's hard to design tests for CS. Leetcode is too simplistic, it just tests the basic algorithmic knowledge that is nearly useless for regular software development.

rogermarley

Its purpose isn't really to test practical skills though, more just to screen for intelligence and conscientiousness (like a tournament of who can take the most mental punishment), which are extremely useful in software development.

AlexB138

This may lead to some interesting gamesmanship. For instance, if I am applying to a company, and I know they use a certain applicant tracking system, and I know that ATS uses a certain model provider for its filter, I should then use that model to write the version of my resume I send to the company.

mft_

Good observation. There are so many versions of the future that just become an LLM arms race.

ivansmf

I suspect the entire industry uses "auto-raters", where an agent instance is used to score the agent's output. The idea is similar in intent to using adversarial networks to train image generation, minus the human labelers. Raising the scores of the auto-rater then becomes the metric teams optimize, and it is no wonder the end result is that the agent scores its own generated content the highest.

danielodievich

So just to test, I loaded qwen/qwen3-v1-30b locally, fed it my 100% human-written resume, and asked it to "Make this resume more professional".

Mucho bullets came out.

My sentence "I specialized in enterprise data modeling and worked on Cost of Goods Sold optimizations across entire customer base." became a bullet sentence "Specialized in enterprise data modeling and performance optimization, driving $5M+ in recurring cost savings across the customer base.".

The $5M+ sure sounds awesome, and clearly the corpus of resumes leans towards metrics, but it's not true and I didn't ask the model to make up numbers.

Oh and it awarded me a "Bachelor of Science in Computer Science from University of California, Berkeley | 1996 – 1998" out of thin air. My resume has an SDE job from 1996 to 1998. Oh man.

voncheese

Oh man is right! The making stuff up is going to make this problem even bigger.

There will be people who correct those hallucinations; in that scenario it’s “only” the applicant’s time that is wasted.

There will be other people who don’t correct those hallucinations; in that scenario the best-case outcome is wasted time for the applicants and interviewers (who find the mistake later). The worst case is that people are hired who aren’t capable of doing the job, and that’s all kinds of messy and inefficient for all.

mcv

Timely topic for me. My CV had grown to 7 pages, and I kept reading everywhere that it should be no more than 2, so I asked Gemini to rewrite it. Took a lot of time, because Gemini loves to exaggerate everything, but I'm quite happy with the result.

The first couple of recruiters I sent it to preferred my old 7 page CV. I guess they're not using enough AI yet.

onlyrealcuzzo

Further, LLMs consistently think LLM-written content is "good".

Ask an LLM to write a design doc for you, wait until you get one that's very bad, then send it to other LLMs for feedback; they will typically have good things to say.

Compare that to a very well written document you have. They will typically have a lot more bad things to say, even if the premise is solid.

Someone should study this.

LLMs clearly have a lot of value. But IMO this is very interesting and points out a weakness that's not entirely clear what the full ramifications of it are.

I suspect LLMs also have a major bias toward code they write.

Take something universally considered to be well written like Redis, feed it to an LLM for feedback. They'll probably find much to pick apart (and a lot of it may be flat out wrong).

Feed the same LLM some clearly garbage LLM-generated repository. Do they have a similar response as they do with design docs? Do they treat natural language differently than code, and are they just susceptible to the way they write regular language, which differs from logical code? Or do they have the same problem?

Has anyone done this?
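
For anyone wanting to run the comparison above, the measurement itself is simple to set up. A sketch, with a mocked rater standing in for real model calls (rate(), the author tags, and the sample docs are all hypothetical; a real run would route each rate() call to an actual model of that family):

```python
# Measure self-preference: for each rater family, compare the mean
# score it gives its own family's documents vs other families'.

def self_preference_gap(docs_by_author, rate):
    """Per rater: mean score on own-family docs minus mean on others'.

    docs_by_author: list of (author_family, document_text) pairs.
    rate: callable (rater_family, document_text) -> numeric score.
    """
    raters = {author for author, _ in docs_by_author}
    gaps = {}
    for rater in raters:
        own = [rate(rater, d) for a, d in docs_by_author if a == rater]
        other = [rate(rater, d) for a, d in docs_by_author if a != rater]
        gaps[rater] = sum(own) / len(own) - sum(other) / len(other)
    return gaps

# Mock rater that favors docs "written" by its own family by one point.
docs = [("A", "design doc by A"), ("A", "second doc by A"),
        ("B", "design doc by B"), ("B", "second doc by B")]
mock_rate = lambda rater, doc: 4.0 if rater in doc else 3.0
print(self_preference_gap(docs, mock_rate))  # positive gap per rater
```

A consistently positive gap across raters would be the "marking your own homework" effect; running it over both prose (design docs) and code would answer the question of whether the bias differs by modality.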

drillsteps5

That's what people on both sides have been doing for at least a couple of years already.

Recruiters scan resumes for the best match with LLMs, candidates use the same LLMs (there's only like 3 of them) to tweak their resume for better match. I don't know what research you need to see why that makes sense.

yagi0x00

This indicates that resumes created by the same model may have an advantage over those created by other models, so I suppose technically you may have a small advantage if an insider tells you the resume parsing tool is powered by Gemini as opposed to the other models.

My broader discomfort is that we are still learning about model biases while human biases are arguably better understood, and I don't like the ethics of rejecting a person based on criteria I don't fully understand.

drillsteps5

I wasn't saying that this is the optimal solution (it clearly is not). I was saying that it makes perfect sense for both sides - HR has its work automated and candidates have a better chance of being noticed - and it therefore became common practice in many places.

The well has already been poisoned; to survive you have to get in on the action.

Don't want to play this game? Make connections, set up the network, and use it to get/stay employed.

aDyslecticCrow

It further makes expecting (or spending the effort on) a hand-written introduction useless, which then undermines the entire purpose of it.

visarga

When classifying resumes it is better to use the LLM as a feature extractor: think of 10-20 features you base your decision on, and extract them with the LLM. The LLM only needs to do the lower-level task of question answering. Then you fit a classical ML model (xgboost, for example) on the extracted features, based on company triage data points. This way you don't rely on the biases in the model; you can decide what criteria to use and how to judge cases without retraining the LLM. The feature extractor is generic, and the actual triage model is a toy you can retrain in seconds on new data points. It is also much more explainable: you can see how features influence decisions.
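
A minimal offline sketch of this two-stage idea (everything here is an assumption for illustration: a real version would put LLM yes/no questions behind extract_features() and fit xgboost on real triage history; here the extractor is mocked with keyword checks and a tiny perceptron stands in for xgboost so the example runs without dependencies):

```python
# Stage 1: "LLM" feature extractor (mocked). Stage 2: small classical
# model fit on past triage decisions, retrainable in seconds.

FEATURE_QUESTIONS = [
    "Does it quantify impact?",
    "Does it show leadership?",
    "Does it mention the required stack?",
]
KEYWORDS = [("reduced", "improved"), ("led", "managed"), ("python", "sql")]

def extract_features(resume_text):
    """Mocked LLM extractor: one 0/1 answer per feature question."""
    text = resume_text.lower()
    return [int(any(k in text for k in ks)) for ks in KEYWORDS]

def train_triage(examples, epochs=20, lr=0.5):
    """Fit a perceptron on (features, past-decision) pairs."""
    weights, bias = [0.0] * len(KEYWORDS), 0.0
    for _ in range(epochs):
        for feats, label in examples:
            pred = int(sum(w * f for w, f in zip(weights, feats)) + bias > 0)
            err = label - pred
            weights = [w + lr * err * f for w, f in zip(weights, feats)]
            bias += lr * err
    return weights, bias

def triage(model, resume_text):
    """Advance a resume iff the fitted model scores its features > 0."""
    weights, bias = model
    feats = extract_features(resume_text)
    return sum(w * f for w, f in zip(weights, feats)) + bias > 0

# Past triage decisions: 1 = advanced to interview, 0 = rejected.
history = [
    (extract_features("Led a team; reduced latency. Python."), 1),
    (extract_features("Responsible for various duties."), 0),
]
model = train_triage(history)
```

The explainability claim falls out of the structure: each feature is a named question, and the fitted weights show directly how much each answer moves the decision.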

aDyslecticCrow

I'd rather my employer just do the classic of shredding a random 80% and looking at the remainder properly.

cyberax

Ah, the good old "we don't need unlucky losers here" strategem.
