Breaking the 4Chan CAPTCHA

Daily Digest email

Get the top HN stories in your inbox every day.

cherryteastain

The part about bad Keras<->Tensorflow.js interop is classic Tensorflow. Using TF always felt like using a bunch of vaguely related tools put under the same umbrella rather than an integrated, streamlined product.

Actually, I'll extend that to saying every open source Google library/tool feels like that.

alecco

related (15 days ago)

https://news.ycombinator.com/item?id=42130881 on Francois Chollet is leaving Google

> "Why did you decide to merge Keras into TensorFlow in 2019": I didn't! The decision was made in 2018 by the TF leads -- I was a L5 IC at the time and that was an L8 decision.

Retr0id

something something Conway's law

Dachande663

Semi-related but I needed a CAPTCHA on my site[0] mainly to block comment form spam and settled on repurposing a fun method I’d seen before. Is definitely not foolproof (or hard at all), but I really liked making it.

[0] https://www.hybridlogic.co.uk/contact

vunderba

Reminds me of the Doom captcha.

https://vivirenremoto.github.io/doomcaptcha/

Dachande663

99% certain this is where I copied the idea from.

winrid

It says I've been blocked when I try to view that. Not on a VPN.

Dachande663

The site runs off of a tiny little server at home so I’ve got some very aggressive firewall rules. Anything from the usual bad countries, certain signatures etc are blocked. Reduced traffic to 1% of previous load.

efilife

What are the bad countries? Russia and china?

winrid

I'm in silicon valley in the USA on Comcast lol

EasyMark

Are you in a safari browser?

winrid

Chrome android

chamomeal

No way, that is a cool fucking captcha!!

tayiorrobinson

Cool, sure, good, probably not. I've never played Halo so I didn't entirely know what I was doing (do I shoot the blue guys too? it's not letting me through so I guess I do), and I don't doubt people couldn't even get what it meant by shoot. And god forbid anyone with disabilities that affects their mouse accuracy, or needs a screen reader tries to use it

Haven't looked at the devconsole but it'd probably be easily bypassed by someone dedicated.

remram

Agree on the first part, but for the second... I think it depends on what your threat model is.

If you want to stop a dedicated attacker ready to spend time to attack your site, it won't work, but nothing will. If you want to stop a generic bot going over the internet and submitting all forms it finds with spam, this will work, and might even work better than wide-spread solution for which the bot has a countermeasure.

It has the advantage of being novel for the user rather than doing the same Google/Cloudflare/... CAPTCHA for the 10th time that day.

account42

Cool as a one-off use on some random blog contact form. Infuriatingly annoying if used somewhere you have to solve it with any frequency.

bawolff

There is a reason why people moved away from distorted text based captcha. We are basically at the point where computers are better at them then humans.

https://www.usenix.org/system/files/conference/woot14/woot14... is a paper on the subject i think is really interesting

However a surprising amount of text based captchas can be solved in a few line shell script of, using imagemagik to convert to greyscale, dilate and undilate, then pass to teserract

However there are also sites like https://2captcha.net , so really captchas are more like putting a small min amount of effort.

noprocrasted

Just because you can technically crack them doesn't mean they're useless.

There's a significant amount of time, skill and effort that went into the solution from this post, and the end result doesn't generalize well (you'd have to start all over for a different kind of captcha).

The vast majority of spammers would not be able to replicate this; those who do would either make money legitimately, or focus their skills on juicier targets (if you have AI/ML skills and want to do nefarious things there are other options that pay much better than spamming).

Such captchas still work well at raising the cost of successful spamming above the expected payoff from said spam.

reaperman

So, I do this type of AI development for solving CAPTCHAs.

I can't get any real jobs that pay me for my more advanced skills. My primary sins were going to a second/third-tier university and some performance concerns in a portion of my previous roles due to divorce and burn-out. I make $80k/year in government IT, and $30-150k/year as the "AI" guy in a small 2-5 person group that offers a CAPTCHA-breaking API.

The spammers aren't the ones replicating this. They just pay B2B rates (combo of SaaS + Consulting, depending on client needs) to help them remove the roadblocks.

jostinian

I am a nafri with a PhD and engineering experience (with europeans), I can't make good living going the traditional way either with with remote jobs being impossible and no luck landing a visa.. I have built custom solutions for big name EU companies to keep an eye on the competition through scraping. captcha solving cloudflare bypass is a great part of that. Getting back at companies making the UX bad with captcha does feel good also.

HeckFeck

Why do you do this?

While I can appreciate the technical achievement, you know most users of forums and imageboards don’t want any AI content at all.

benreesman

If there were a totally 100% aboveboard way to do this in a net transfer of utility from Tessier-Ashopool SA to the typical web surfer I would be a superfan.

blackjackfoe

Is your company hiring? :)

ryandrake

Despite the spamming angle, I think CAPTCHA-breaking is, on the balance, noble and honorable work. These things are user-hostile blights on the web, and any effort towards making them disappear as useless is worthwhile. Sites worried about spam should invest more in automated spam classification/elimination instead of punishing real users with CAPTCHA-solving. Not that I can offer a solution--if I could, I'd be a millionaire.

ape4

$30-150k/year is a big range

TZubiri

Ahh the good ole dilemma of selling your soul, you study what you love only to destroy it for profit. Like an entomologist hired by a pesticide company.

I get it man, gotta make the bucks helping spammers advertise their shitty products, even if they destroy the internet.

fragmede

> there are other options that pay much better than spamming

Are there? Say you've got a felony record and can't get a legit AI/ML job at eg OpenAI/anywhere. What would you do instead? most of the options I can think of involve getting paid for doing things that are basically spam if you zoom out enough.

benreesman

I’ve got no criminal charges of any kind and I’d still want to know about any way to work without getting flagged as a known enemy of the Cartel.

I’m lucky that some people still want chops no matter the thought crime, I’m very grateful such excellent employers exist (love you guys).

But you’re never sure you’ll line up two such in a row, this isn’t the IBM until company casket and company funeral days. Makes life “interesting” even for a risk-taker.

andrewflnr

How many people are there like that, and how much damage are they collectively likely to do? If you're a random spammer, how hard will it be to hire that person? Again, not aiming for impossibility, just reducing the damage.

noprocrasted

There's plenty of mischief potential with "deepfakes".

hamilyon2

Captchas are now useful to distinguish well-intentioned bots (they stop whenever they see captcha) from malicious ones, which solve them, but still behave a lot like bots.

Well-intentional bots are first-class citizens

brookst

Wouldn’t a well-intentioned bot follow robots.txt anyway?

lostlogin

Do you complete the circle and do the good bot bad bot classification with a mod bot?

TZubiri

Interesting, subtle difference but I always thought of captchas as having computational difficulty, but that's clearly not the point as you say. The cost is not compute but developer time.

If you manage crack it at 1mhz per captcha or 1ghz or 1000ghz, it makes no difference, as the bottleneck is the network identifier (ip address/block)

While still a type of PoW, these economics are different than offline mechanisms like password hashing or crypto. Where a 1ghz cost is still significantly different than 1mhz.

atomicnumber3

The watershed of "good enough at programming to just get a real job" vs "can code enough to be really annoying to businesses, but not enough to hack it as a dev" is a lot more on the annoying side than you'd think.

I say this with the chagrin of someone who works on a cool software product that is also coincidentally really well-shaped to make people want to abuse it.

undefined

[deleted]

delfinom

>he vast majority of spammers would not be able to replicate this;

Eh? They just need to buy their software from someone that can. I would say many of the malware and spamware isn't created by every individual deploying it, but instead vendors that got good at it and decide to make revenue by licensing out their software to other bad actors.

brian-armstrong

Makes me wonder what comes next. Could we create a forum where every member must do a 15 minute video interview with a moderator? I know this "doesn't scale" but I think it could make for a funny gimmick.

matchamatcha

When I was a teenager, I stumbled upon a music forum that required phone interviews for signing up. They had other interesting sign up rules, like you could not have silly user names (judged by the admin). I guess it served as an effective filter for their member base..

lobsterthief

The silly username thing goes a bit too far though. It just means the admin will subjectively apply other rules. Doesn’t sound like a lot of fun.

jabroni_salad

private torrent trackers are/were doing that. It was really just to make sure you understood how p2p culture works and what the expectations are, and really easy to pass if you just followed a guide. However, I did see many people fail their interview.

drexlspivey

The famous RED tracker has a full on technical interview asking about:

* Audio Formats

* Transcoding

* Spectral analysis

and more.

This is the interview prep website: https://interviewfor.red/en/index.html

jmb99

Was there ever video interviews? Admittedly I wasn’t really paying attention but back when I was getting into what it was only IRC, and these days it still seems to be IRC anywhere that does interviews (otherwise class-restricted forum invites).

bdjsiqoocwk

[dead]

ggu7hgfk8j

We are increasingly moving to ID checks. Australia law just now. For all its faults it solves spam as side effect.

ranger_danger

There are lots of random ID documents available on dark networks however.

qqqult

It also makes it 100x more likely for you IDs to leak online as KYC companies are valuable targets that get hacked every month

bobsmooth

A small signup fee is much easier.

grishka

But it excludes people who don't have easy access to international banking.

undefined

[deleted]

3abiton

I think captchas are just another lind of defense to make it harder for actors abusing the system. It's not a solution, just a little (getting outdated) fortification.

poincaredisk

Small? From your own link, recaptcha v3 takes 10-15s and costs $1.3 for 1000 captchas. This is actually huge, and cost prohibitively expensive for many things where you would want to use it (like scrapping a large website).

costco

Depends on the website, but you don't get always get a recaptcha, so the cost is a lot lower than that. You usually get it if you're exceeding some rate limit or you're doing a sensitive action like registering.

RobotToaster

> so really captchas are more like putting a small min amount of effort.

At that point a proof of work captcha (mCaptcha.org is one, but there are others), is probably the best option. Especially with how any reasonably effective traditional captcha is an accessibility nightmare.

cubefox

It's completely unclear what a "proof of work" captchas is supposed to be.

jamesnorden

It's CPU intensive JS code that must run to get an output that must match something server-side, the idea is that it makes attacks/spam not economically viable to run.

porridgeraisin

Brave search uses it. From my limited understanding, it sends a time-consuming javascript function and its input to your browser, and has your browser calculate the output and send it back. The server matches your output with the expected output. I assume the server would pre-compute in some way? On the spectrum, it leans more towards being a spam-alleviating thing rather than a human-distinguishing thing.

marcosdumay

Almost always it's some variation of "give me a string with the SHA256 hash starting with 0.a471"

nyclounge

Wow Funcaptcha cost the most and it is open source.

mieko

If you're into this, here's my 2014 breakdown of the Silk Road CAPTCHA: https://github.com/mieko/sr-captcha

mbs159

Intriguing, thanks for sharing!

antirez

Appropriate response by 4Chan to this: simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability.

codetrotter

> simplify the human work given that anyway it's simple to solve via NNs. We are at a point where designing very hard captchas has high probabilities to increase the human annoyance without decreasing the machine solvability

Or disallow free users to post at all, and require everyone to buy the 4chan Pass for $20 USD per year if they want to post.

https://4chan.org/pass

This is already available to not have CAPTCHA. So if CAPTCHA is totally ineffective, it follows that they should do away with CAPTCHA and free users being able to post at all and everyone should buy the 4chan Pass if they want to post.

fullspectrumdev

This kills the board. Users will go elsewhere, fuck all people pay for pass.

jachee

And the spambots will follow them. Which kills the next board. Repeat ad nauseum until the end of the internet.

ranger_danger

Agreed, charging for accounts is the only halfway viable solution I have seen any service use that gives a sizable downtick in the sheer number of bots/spam.

Of course it's not perfect, and it will still happen, but I have yet to hear any better solutions. Please prove me wrong though!

jcpham2

This is known as a Sybil [1] attack and it lays the groundwork for stuff like Adam Backs hashcash [2] protocol and it’s basically why things like proof of work [3] have a monetary value today.

Very chicken and egg this entire field- defending against the spammers while simultaneously operating a “free” system. How to do it without making it prohibitively expensive to join the system…

Any free system will be abused yada yada yada

[1] https://en.wikipedia.org/wiki/Sybil_attack

[2] https://en.wikipedia.org/wiki/Hashcash

[3] https://en.wikipedia.org/wiki/Proof_of_work

poincaredisk

At this point I have to wait 90 seconds before making every post. (maybe because I don't persist cookies). I posted very rarely, but now I just stopped - I get it when someone shows me the door.

matheusmoreira

That would work. It would also kill the site.

efilife

What? So you use 4chan? It would completely kill what makes this website special

OkayBuddy44

[flagged]

codetrotter

As a large language model, I don’t have access to up to date weather information.

Nah. But whatever you’re trying to imply, I think it would make more sense to claim that I am Hiroyuki Nishimura or something.

For now, I’m going to prescribe you 6 months of abstinence from /pol/ and we’ll do another evaluation after that.

emptyfile

[dead]

YeahThisIsMe

We've been stuck at that point for at least 5, if not 10, years.

hackernewds

Just use Worldcoin retina scans next

gosub100

"Drag each symbol to the group that is most likely to be offended by it."

xp84

Ooh I love this, all off-the-shelf AI won’t touch it due to all their “safety” (aka anti-hurt-feelings) protocols

encom

4chan doesn't care about human annoyance. They just started doing a 15 minute post delay, which is infuriating. I had to whitelist 4chan in Cookie AutoDelete.

poincaredisk

Hi fellow cookie autodeleter, I experienced the same thing, but I just decided to stop posting. Whitelisting felt too much like giving in to terrorists. I'm considering just not going there in the future. Maybe after all this time I will finally be free.

Arnavion

Same. In my case I always use a separate incognito mode browser for posting and a regular locked-down browser with JS disabled etc. So I'd have to either give in and leave the incognito mode browser running in the background while I browser on the main browser, or give in and stop blocking as aggressively on the main browser, and I chose to do neither and just stop posting.

Given the schizos that are still present and drowing out the conversation in half the threads I read, there wouldn't be a point to posting anyway.

encom

See you tomorrow, anon.

matheusmoreira

Just stop posting there. The whole point of it is to post anonymously in a high traffic forum. The rate limiting timers have reduced traffic to the point many boards feel dead, and their solution to that problem is to sell accounts.

hsbauauvhabzb

What is NN?

numpad0

"AI" but pre-COVID

marcosdumay

Oh my!

Is the oversimplification from "deep neural network" into "AI" caused by the prevalence of brain-fog due to long COVID?

layer8

https://en.wikipedia.org/wiki/Neural_network_(machine_learni...

brodo

I am totally in favor of increasing the annoyance of 4chan users.

somat

I wonder if it would be better to pretend to have a captcha but really you are analysing the user timing and actions. Honestly I half suspect this is already going on.

If you wanted to go full meta "never go full meta" you would train a AI to figure out if the agent on the other side was human or not. that is, invent the reverse turing test. it's a human if the ai is unable to differentiate it's responses from normal humans responses. as opposed to marketing human responses.

Well now I have to go have a lay down, I feel a little ill from even thinking on the subject.

wraptile

That's kinda what every major captcha distributor does already!

Even before captcha is being served your TLS is first fingerprinted, then your IP, then your HTTP2, then your request, then your javascript environment (including font and image rendering capabilities) and browser itself. These are used to calculate a trust score which determines whether captcha will be served at all. Only then it makes sense to analyze captcha's input but by that time you caught 90% of bots either way.

The amount your browser can tell about you to any server without your awareness is insane to the point where every single one us probably has a more unique digital fingerprint than our very own physical fingerprint!

encom

This is how ClownFlare and its ilk, make life hell on the internet, when you use a "weird" browser on a "weird" OS.

jeroenhd

My experience is that IP reputation does a lot more for Cloudflare than browsers ever did. I tried to see if they'd block me for using Ladybird and Servo, two unfinished browsers (Ladybird used to even have its own TLS stack), but I passed just fine. Public WiFi in restaurants and shared train WiFi often gets me jumping through hoops even in normal Firefox, though.

I can't imagine what the internet must be like if you're still on CG-NAT, sharing an IP address with bots and spammers and people using those "free VPN" extensions donating their bandwidth to botnets.

gosub100

Re: your last paragraph, https://coveryourtracks.eff.org/

EFF have been running this for years. Gives an estimate about how many unique traits your browser has. Even things like screen resolution are measured.

zoltrix303

Would it be possible to serve a fake fingerprint that appears legitimate? Or even better mimic the finger print of real users who've visited a site you own for example?

nullpt_rs

yep, but it can get tricky.

some projects worth checking out: https://github.com/refraction-networking/utls https://github.com/berstend/puppeteer-extra

wraptile

Yes, that's what web scraping services do (full disclaimer I work at scrapfly.io). Collecting fingerprints and patching the web browser against this fingerprinting is quite a bit of work so most people outsource this to web scraping APIs.

barbolo

https://github.com/lwthiker/curl-impersonate

PUSH_AX

In that case why do I ever receive a captcha?

Pikamander2

It adds another layer of analysis. For example:

If the user solves the CAPTCHA in 0.0001 seconds, they're definitely a bot.

If the user keeps solving every CAPTCHA in exactly 2.0000 seconds, each time makes it increasingly likely that they're a bot.

If the user sets the CAPTCHA entry's input.value property directly instead of firing individual key press events with keycodes, they're probably either a bot, copy-pasting the solution, or using some kind of non-standard keyboard (maybe accessibility software?).

Basically, even if the CAPTCHA service already has a decent idea of whether the user is a bot, forcing them to solve a CAPTCHA gives the service more data to work with and increases the barrier of entry for bot makers.

sdk16420

I found several websites switched to 'press here until the timer runs out', probably they are doing the checks while the user is holding their mouse pressed, it would be trivial to bypass the long press by itself with automated mouse clickers.

kccqzy

That's what reCAPTCHA does.

benreesman

In my opinion the granddaddy of all 4chan CAPTCHA busts is still Yannick Kilcher’s GPT-J tune on “Raiders of the Lost Kek” set, and might be the coolest thing an LLM has ever done on video: https://youtu.be/efPrtcLdcdM?si=errY0PrEhnX9ylDw

chiph

Nearly a full minute of disclaimers and warnings about 4chan. That's got to be a record.

ValentinA23

>I released the model, the code and I evaluated the model on a huge set of benchmarks and it turns out this horrible, terrible, model is more truthful-yes more truthful-than any other GPT out there

Pikamander2

> The official TensorFlow-to-TFJS model converter doesn't work on Python 3.12. This doesn't seem to really be documented.

> TensorFlow.js doesn't support Keras 3.

I tried getting into some casual machine learning stuff a few years ago and more or less gave up because of stuff like this. It was staggering how many recent tutorials were already outdated, how many random pitfalls there were, and how many "getting started" guides assumed you were already an expert.

sigmoid10

As someone who has been working in ML for years, I can only recommend to stay away from anything recent. Grab an old bayesian statistics textbook and learn the fundamentals, then progress to learning the major frameworks like Pytorch. Try to write every part of a CNN, RNN and Transformer architecture and training pipeline yourself the first time (including data loaders, but maybe leave out CUDA matrix kernels). Stay the hell away from wrappers for other people's wrappers like Langchain. Their documentation is often not just outdated, but flat out wrong regarding the fundamentals. Huggingface is great if you know the basics and thus how to fix things if their standard wrappers break.

rohansuri

Any book you would recommend?

sigmoid10

You can try Theodoridis if you can find a first or second edition. It is old enough to not be diluted by the recent craze but still recent enough to cover all the necessary fundamentals. There is also a new edition coming out soon, but that seems to have been heavily tainted by the ChatGPT hype.

ChrisMarshallNY

That’s like spending a few hours, learning to take the lid off your septic tank.

blackjackfoe

Little bit, but at least you learned something :)

gherkinnn

Oddly enough, I find most of 4chan less brainrot inducing than Twitter, even pre-Musk.

JasserInicide

It's still brainrot, it's just on the opposite end of the political spectrum.

irusensei

Back when Llama was leaked on /g/ 4chan's /g/lmg/ was the best place to be up to date with local models. It still might be but not so much.

People think 4chan is just /pol/ when in fact more boards exist and their users don't really appreciate when /pol/ leaks into their threads.

tovej

There's no smart algorithm for sorting posts, and there's a limited number of active threads, so it's not rage baiting in quite the same way. Only active threads stay alive though, so it has the exact same issue as twitter and other social media, only engaging content is served to users, and the most engaging things are rage bait, conspiracy theories, and porn. Things that get someone riled up enough to respond.

thrance

I have bad news for you, then...

Cumpiler69

What's the bad news?

meowface

I am a liberal and also genuinely find many 4chan boards less politically awful than current Twitter most of the time.

The chronological sorting at least offers some diversity of opinion. The first 50 replies to a 4chan thread about Trump (in the right board) will usually contain many, maybe even mostly, anti-Trump posts. On Twitter you usually need to scroll through the sea of blue checkmark replies for a while to find even one anti-Trump post.

Some 4chan boards are majority neo-Nazis who want all minorities expelled or murdered. But stumble across a particular Twitter thread and it's the same thing but with even more ideological uniformity within the thread, and with 4000 neo-Nazis in the thread instead of 60.

That said, both sites definitely are not great to use if you aren't very right-wing.

salawat

...Don't underestimate the things to be learned studying a septic system.

morkalork

Following the links to the captcha solving service you can read profiles of the humans doing the work where its pitched as more ethical than them working in hazardous factories!

tumsfestival

I can only imagine how much worse they'll make the captcha after stuff like this picks up speed with the users all the while being ineffective against the bots.

rany_

I really doubt that they're the first to do this.

undefined

[deleted]

OmarShehata

captchas are broken, forever. There is no way to prevent bots without also preventing a bottom tier of human users (visually impaired people, old people, or just impatient people). Like this xkcd [1] comic suggests, we need to just focus on rewarding and punishing specific behavior, regardless of whether the agent is human or not

[1] https://xkcd.com/810/

shortrounddev2

Jokes aside, we don't want any bots at all. Even if they're posting constructive comments, we should interact with humans, not machines

hsbauauvhabzb

That doesn’t mean that webcrawlers have no legitimate value (think: search indexers) or illegitimate value (think: intellectual property theft via data scraping for AI purposes), and bots which communicate while they have no place, aren’t going to go away.

Philpax

In the interest of provoking discussion: why?

If a bot can meaningfully pass and act as a productive member of the community, what does it matter?

echelon

I think a better approach is to make account creation frictionful (eg. charge money, set karma thresholds, require an invite, etc.), score each account, and ban or time out accounts when they break community rules.

But an even better approach would be to go fully P2P and leave the scoring and ranking and filtering at the end nodes, with the possibility of friendly networks of interest group peers assisting with the task. BitTorrent for social media, pgp signed accounts, fully flexible annotation and ingestion. It's also less subject to cabal-based censorship.

webstrand

PoW like hashcash (not a cryptocurrency thing) might be a better solution. Users could even delegate solving the PoW puzzles to a 3rd party for low power devices like phones. But it imposes a cost on spammers that's inescapable.

jeroenhd

That assumes spammers are using their own hardware to post. If they're using a botnet, they don't care about CPU cycles. Botnets would probably become even more profitable in that model.

cchance

I mean at some point ... the average visitor is dumber than the AI and your now just blocking dumb people

OmarShehata

yes, we're creating websites that are gated by IQ tests. This isn't the way

hsbauauvhabzb

I’d like to believe I have at least an average IQ and I can’t pass half the google captchas.

Whether or not a square is part of the motorbike when it’s either the rider or a few pixels of the wheel is subjective and fuzzy. Fuck google for not making these questions clear cut enough that answers aren’t disputable.

undefined

[deleted]

djbusby

*you're

makifoxgirl

This project also solves the 4chan captcha https://github.com/moffatman/chan

Daily Digest email

Get the top HN stories in your inbox every day.