Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

ciroduran

I stopped being concerned about email harvesting years ago, I just simply leave the email on my website. Spam handling is okay enough, I guess.

But I like this review of techniques, even the simplest ones are very effective, that surprised me.

jrmg

I’ve had my email address in a `mailto:` link in plaintext on my then-web-site, now-blog, since the early 2000s, and spam is no real problem. There are a few spam messages in my spam mailbox per day.

Perhaps my provider’s just great at filtering spam - but I kind of doubt it’s better than the major players (for years I’ve used Zoho for email - and it’s ‘okay’ enough that it’s not worth switching).

GeoSys

I agree that email addresses get leaked eventually.

However, LLMs are quite good at generating spam and I think soon will evade most filters.

BorisMelnik

you know what's funny is that LLMs are also good at detecting spam, just as they are at generating it. I've got an automation that scores incoming emails and it's getting better and better each day (also more expensive haha)

SV_BubbleTime

I can’t explain it well, but I think there is an asymmetric issue here… that the ability for an LLM to write a plausible email, and the ability for an LLM to detect that it’s spam are mismatched.

If an LLM can make a plausible email, the best another LLM can do is to rank it as plausible. Blackbox creation and detection have to be on the same level.

Perhaps if you said the detection LLM had all your context and websearch. That it could know that a Penny Pollytree at Coco Co isn’t a real person, but… that just seems like burning a ton of coal to detect fraud where the creation LLM was able to easily come up with the fictitious spam cheaply.

The real story here is this will go beyond email verification. That every system we have is going to need to up its security. Paper birth certificates and social security cards and email addresses and all manner of identity is going to need new systems of auth. The challenge will be to prevent authoritarian centralization.

Gigachad

I doubt it. Most of the signals spam filters use these days are reputation based. You have to build up your domain and IP reputation for a long time first.

embedding-shape

> You have to build up your domain and IP reputation for a long time first.

Or buy/rent domains/IPs that have good reputations, as there are services that specialize in just bringing up the reputation for stuff so they can sell it once "good". The same exists for user accounts for various platforms like reddit and so on.

Cthulhu_

And so the arms race continues.

unilynx

> But I like this review of techniques, even the simplest ones are very effective, that surprised me.

because harvesters don't care until one technique gets massive use. if you come up with a unique but simple enough scheme for your sites and keep a few dozen email addresses out of their reach.. they've still gathered a million addresses. it's not really worth their effort to get the last 0.0001% of extra email addresses

so it's best to just not advertise your solution and make sure it doesn't get any outside traction - if it gets popular the harvesters will defeat it

jmaw

The author of the article mentioned that they are using it as a honeypot to detect when bots (or rather authors of the bots) implement a work-around for the obfuscation technique. Which is pretty smart!

e40

I’m up to more than 1,500 spam emails a month, with my email on the corp website.

janderson215

Is it mostly people trying to give you their mixtapes?

kevincox

I've also been like this. But if, as the article suggests, trivial options like HTML entities or elements with display:none will keep my email out of >90% of harvesters, I'm reconsidering, as they seem to have no downside other than an extra couple of bytes on the wire.
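For reference, a minimal sketch of the two trivial options mentioned here: writing every character as a decimal HTML entity, and splitting the address with a decoy element that CSS hides. The function names, the `hidden` class, and the example address are made up for illustration.

```javascript
// Encode every character of an address as a decimal HTML entity.
// Browsers render the entities as normal text, but the raw source
// contains no literal "@".
function toHtmlEntities(address) {
  return Array.from(address)
    .map((ch) => `&#${ch.codePointAt(0)};`)
    .join("");
}

// Insert a decoy span into the address. The page's stylesheet is assumed
// to contain `.hidden { display: none; }`, so visitors never see the decoy,
// while a scraper that strips tags reads a wrong address.
function withHiddenDecoy(local, host, decoy) {
  return `${local}<span class="hidden">${decoy}</span>@${host}`;
}

console.log(toHtmlEntities("user@example.com"));
console.log(withHiddenDecoy("user", "example.com", "nospam"));
```

Both outputs are meant to be pasted into the page markup; neither needs JavaScript in the visitor's browser.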

Yaggo

Same here, the address will eventually leak some way anyway.

I never got SpamAssassin working very well, but since moving my email hosting to Apple (from my own server), spam has not been a problem.

blitzar

I swear my apple hosted mail spam filter works in reverse. The inbox is full of spam and the legitimate messages (including apple billing notifications) in the spam folder.

kqr

I have a hypothesis that email scrapers don't parse HTML at all. I suspect they search the raw bytestring for @ characters and take whatever's on either side of it. That probably gets them as many addresses as they can realistically use at a fraction of the cost, given how expensive HTML parsing can be.

(Similarly, I'm sure most links can be found by searching the bytestring for "href" and taking what's to the right of it.)

This would explain why HTML entities are so effective.

On the other hand, surely the TLS handshake is far more expensive than HTML parsing? Maybe it's to avoid parser failure modes that consume a lot of resources?
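To make the hypothesis concrete, here is a sketch of what such a harvester might look like: a regular expression run over the raw page source with no HTML parsing at all. This is an assumption about how simple scrapers behave, not a description of any real tool; the pattern and the sample markup are invented.

```javascript
// Hypothetical naive harvester: scan the raw source for
// "something@something.tld" without decoding entities or parsing tags.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvest(rawSource) {
  return rawSource.match(EMAIL_PATTERN) ?? [];
}

const plain = '<a href="mailto:user@example.com">Contact</a>';
// Same address, but with "@" written as the entity &#64;.
const encoded = '<a href="mailto:user&#64;example.com">Contact</a>';

console.log(harvest(plain));   // finds the address
console.log(harvest(encoded)); // finds nothing: no literal "@" in the bytes
```

If scrapers really work this way, a single entity in place of the @ is enough to make the address invisible to them.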

Someone

> This would explain why HTML entities are so effective.

Could also be that they learned that sending spam to obfuscated addresses doesn’t get much response. Such messages might get filtered out more and/or addressees might be less inclined to reply to them.

BorisMelnik

it really varies; you are correct that most modern ones search the byte string for @ characters, but there are probably hundreds of different methods out there in black hat marketing circles to scrape emails.

mcmcmc

Haven’t heard “black hat marketing” before but that’s very fitting for a lot of the “growth hackers” out there

curiousObject

I believe you’re right. But sometimes, you really have to think about how mad your adversary is.

A dog will keep biting long after that is a disastrous plan.

j45

Token based extraction around the @ is definitely one way that can work with a few tweaks.

nabbed

It's odd. My email address is included un-obfuscated in ~90 commits to a popular open source repo on github. I also use this same email address for a mailing list associated with this OSS project. As far as I can tell, I've never received a single spam email in the 8 years I've had this email account.

When I view a commit on the github UI using view source, I can see the commit author's email address just as text with no special handling. It's bracketed by "<" and ">", so maybe that's enough to confuse harvesters.

I just looked at the spam folder of one of my personal accounts (where I sign up for services), and it has got tons of stuff, most recently 2 or 3 with the subject "YOU PERVERT! I RECORDED YOU!".

It seems spammers are doing less harvesting and more purchasing of email lists from service vendors.

Macha

I have a wildcard address at my domain. The most common email addresses for spam are:

- git@mydomain.com

Presumably harvested from GitHub or gitlab

- contact@mydomain.com / admin@mydomain.com

Not actually an email address ever used, presumably people just guessing these exist from convention.

- <first name>@mydomain.com

I mean, if you know my name you can probably guess this but also this has been my primary email address for outbound email and so has ended up in marketing lists etc.

- ap@mydomain.com, finance@mydomain.com

This is a very recent trend but I've been getting emails to made up addresses like these ones quoting forged emails from myself (with various titles like CEO or CFO attached) claiming to authorize payments to other parties, usually backdated, and then asking that I process their invoice ASAP because look how long ago the CEO said it should be paid. I guess my website has ended up in some list of businesses despite being a personal site.

Ironically, the address that was in plain text in my HN profile for like 15 years gets very minimal spam.

bit1993

Good stuff, but I think the title should be Email address obfuscation. Thank you for sharing I guess, but spammers can now learn from this too (:

layer8

Yes, people using “email” for “email address” in contexts where it could also mean “email message”, which “email” more frequently means, is really annoying.

ghywertelling

https://www.gregegan.net/

Contact details: [any mailbox] [at] [the domain name of this web site]. Please don’t ask me to give interviews, sign books, appear on podcasts, attend conferences or conventions, or provide feedback or endorsements for works of fiction, scientific theories, or slabs of text disgorged by chatbots.

I have no idea how to decipher this obfuscation.

0x3f

What's difficult about it? You know the domain, gregegan.net. You know the @ symbol, presumably. Then put literally any valid text before the @.

0xEF

Completely unrelated to the conversation, but our user names are remarkably similar.

ghywertelling

Is that even possible? Shouldn't the recipient email id need to be created first to be addressable?

ProllyInfamous

Really surprised this [very well-written] article didn't suggest the fantastic technique of owning an entire domain (although author's own examples obviously include unique handles@ for each tested practice).

Then you can hand each recipient an absolutely unique email which isn't just ole "name.morewords@" period trick — block those which receive SPAM.

----

OR: the even "easier" lifestyle of just not using email (like me). Obviously this is difficult for modern living, but that's what temp email is best for [i.e. circumventing ubiquitous `REQUIRED` email address fields].

nzealand

I've been doing that for two decades. Most of the spam comes directly to my primary gmail. Because I shared that with friends and family. And at least some of my friends and family shared their entire contact list with the wrong app at least once.

This article however is talking about publishing your email address on a public website. It matches my experience, that simple javascript concatenation stops 100% of spam. Not that I would or ever did trust my primary email address to that.

ProllyInfamous

This is your configuration error (likely just using a simple catch-all)?

When configured correctly each family member can reach you at a custom handle@, even seeing this custom reply address in response emails from you.

----

But yes, you're correct about the purpose of OP's article (website obfuscation). The topic-overlap is so close that it's still worth mentioning, IMHO.

BeetleB

Years ago, I considered your approach. Programmatically create a custom email address for each person I wanted to talk to.

Then I hit upon a simpler solution. Have one email address. Happily share it publicly. And whitelist the senders' email addresses. Emails not in the whitelist go into a quarantine folder that I glance at once in a while.

It's almost equivalent in efficacy, but much simpler to implement.
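A minimal sketch of that routing rule, assuming the mail client or server exposes each message's sender address. The allowlist entries, the folder names, and the `routeMessage` function are hypothetical.

```javascript
// Hypothetical allowlist of known senders, stored lower-cased.
const allowedSenders = new Set(["alice@example.com", "bob@example.org"]);

// Decide which folder a message goes to, based only on its sender address.
function routeMessage(message) {
  const sender = message.from.trim().toLowerCase();
  return allowedSenders.has(sender) ? "inbox" : "quarantine";
}

console.log(routeMessage({ from: "Alice@Example.com" }));    // inbox
console.log(routeMessage({ from: "stranger@example.net" })); // quarantine
```

Sender addresses can be forged, so this sorts mail by convenience rather than by proof of identity; the quarantine folder is what keeps it reviewable.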

ProllyInfamous

I don't have a phone ringer anymore, but when I did, whitelist-only is how I screened incoming calls. Your method for email sorting has the advantage of being reviewable (versus entirely blocking specific handles@), and it is much easier to implement/maintain.

vlucas

I recently noticed an uptick in cold emails and spam after publishing my new website. After a few weeks, I asked Claude/Cursor to obfuscate the email for spam protection in the mailto: link, and they both used JavaScript with data attributes.

Something like:

```
<a
  href="#"
  class="js-mailto ${className}"
  data-email-user="${local}"
  data-email-host="${host}"
  data-email-subject="${sub}"
>
  ${children}
</a>
```

And then some light vanilla JS to stitch it together. Works in the browser, and spam has dropped off a cliff since.
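The stitching step might look something like the sketch below. The comment does not show the actual script, so the `buildMailto` function and the assumption that the script runs after the links exist (for example, loaded with `defer`) are illustrative guesses. Only the class name and the data attribute names come from the markup above.

```javascript
// Build a mailto: URL from the data-* attributes on the anchor.
// `dataset` mirrors element.dataset, where data-email-user becomes emailUser.
function buildMailto(dataset) {
  const address = `${dataset.emailUser}@${dataset.emailHost}`;
  const subject = dataset.emailSubject
    ? `?subject=${encodeURIComponent(dataset.emailSubject)}`
    : "";
  return `mailto:${address}${subject}`;
}

// In the browser, rewrite every .js-mailto link.
// Assumes this runs after the links are in the DOM.
if (typeof document !== "undefined") {
  document.querySelectorAll("a.js-mailto").forEach((link) => {
    link.href = buildMailto(link.dataset);
  });
}

console.log(
  buildMailto({ emailUser: "user", emailHost: "example.com", emailSubject: "Hello there" })
); // mailto:user@example.com?subject=Hello%20there
```

The full address never appears in the served HTML; it only exists once the script has run in a real browser.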

Croak

One trick is having a tarpit email address on your website. It is hidden using CSS so no real visitor sees it, but it is visible in the source. If your mail server receives mail for that address you can just block that IP for 24h.
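A rough sketch of the blocking logic, assuming the mail server can call a hook with the connecting IP and the recipient address. The trap address, the `acceptMessage` function, and the in-memory map are all hypothetical; a real setup would live in the mail server's own filtering configuration.

```javascript
const TRAP_ADDRESS = "trap@example.com"; // hypothetical hidden address
const BLOCK_MS = 24 * 60 * 60 * 1000;    // 24 hours
const blockedUntil = new Map();          // sender IP -> expiry timestamp (ms)

// Returns true if the message should be accepted.
function acceptMessage(senderIp, recipient, now = Date.now()) {
  const expiry = blockedUntil.get(senderIp);
  if (expiry !== undefined) {
    if (now < expiry) return false; // still blocked
    blockedUntil.delete(senderIp);  // block has expired
  }
  if (recipient.toLowerCase() === TRAP_ADDRESS) {
    // Mail to the trap address: reject it and start a 24h block.
    blockedUntil.set(senderIp, now + BLOCK_MS);
    return false;
  }
  return true;
}
```

Passing `now` explicitly keeps the function easy to test; in production the default `Date.now()` would be used.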

MichaelApproved

This sounds like bad advice and would result in blocking google and other major ESPs.

I occasionally get spam from people who took the time to create gmail accounts. Based on this advice, the honey pot email address would get spam from a Gmail account and your script would block Gmail servers.

Croak

There exist lists of email providers. Those you can whitelist, i.e. they can't get on the blacklist. Even then they would only be blocked temporarily. There also exists postmaster@domain.com which should not filter at all. I am aware that you are able to abuse said system but if you monitor logs those issues would only be temporary.

notpushkin

Yeah, I mean, you can personally vet those domains/IPs?

badsectoracula

Some time ago i was wondering if the common "me at foobar dot com" you still see a lot of people do actually helps at all, especially now with LLMs, so i searched for some common "obfuscation" techniques and found this site (not the 2026 update, but the previous - it was a few months ago). Then i wrote a simple LLM query with a bunch of examples from the site[0] (the tool is just a frontend for a commandline program that uses llama.cpp and Mistral Small 3.1 in Q4_K_M quantization since it loads relatively fast and is fine for simple prompts). AFAICT it could reveal anything that wasn't relying on CSS tricks or JavaScript.
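For the plainest form of this obfuscation, an LLM is not even required. The sketch below is not the LLM tool described above; it is a pair of plain regular expressions, shown only to illustrate how little "at"/"dot" spelling protects. The `deobfuscate` function is hypothetical and is meant to be run on short candidate strings, since it would also rewrite an ordinary "at" in running prose.

```javascript
// Reverse the common "name at domain dot tld" spelling with plain regexes.
// Handles bare, bracketed, and parenthesized "at"/"dot"; nothing fancier.
function deobfuscate(text) {
  return text
    .replace(/\s*[\[(]?\s*\bat\b\s*[\])]?\s*/gi, "@")
    .replace(/\s*[\[(]?\s*\bdot\b\s*[\])]?\s*/gi, ".");
}

console.log(deobfuscate("me at foobar dot com"));     // me@foobar.com
console.log(deobfuscate("me [at] foobar [dot] com")); // me@foobar.com
```

Anything that depends on CSS or JavaScript is out of reach for this kind of text rewriting, which matches the observation above.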

Like others mentioned, though, personally i haven't been bothered by email harvesting for years now since spam filters seem to do a decent job. I have my email posted in plaintext here (which i bet is harvested very often) and in various other places and the occasional spam i get is eclipsed by "spam" from services i've actually signed up for (coughlinkedincough).

[0] https://i.imgur.com/ytYkyQW.png

Terr_

IMO a better approach would be individualized addresses.

Imagine someone visiting your blog who wants to e-mail you can burn some CPU cycles to "earn" an address that hasn't been given out to anybody else, e.g. user+TOKEN@example.com, where it is algorithmically-unlikely for them to be able to guess a different TOKEN that will work. Then if abuse occurs, you can just retire that one address. (In a non-interactive context, like a paper ad, you could just generate one yourself.)
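One way the unguessable TOKEN could work is sketched below, assuming Node.js on the server side. The token is a random id plus a truncated HMAC of that id, so the server can verify addresses without storing them and can retire one by id. The secret, the address layout, and all function names are assumptions, and the "burn some CPU cycles" proof-of-work step is left out.

```javascript
const nodeCrypto = require("crypto");

const SECRET = "replace-with-a-long-random-secret"; // assumption: known only to the server
const revoked = new Set();                          // ids of retired addresses

// Tag = first 10 hex characters of HMAC-SHA256(secret, id).
function tagFor(id) {
  return nodeCrypto.createHmac("sha256", SECRET).update(id).digest("hex").slice(0, 10);
}

// Issue an address of the form user+<id>-<tag>@example.com for a random id.
function issueAddress(id = nodeCrypto.randomBytes(4).toString("hex")) {
  return `user+${id}-${tagFor(id)}@example.com`;
}

// Accept mail only if the token is well-formed, authentic, and not retired.
function isDeliverable(address) {
  const match = /^user\+([0-9a-f]+)-([0-9a-f]{10})@example\.com$/.exec(address);
  if (!match) return false;
  const [, id, tag] = match;
  const authentic = nodeCrypto.timingSafeEqual(Buffer.from(tag), Buffer.from(tagFor(id)));
  return authentic && !revoked.has(id);
}

const address = issueAddress();
console.log(address, isDeliverable(address));
```

Retiring an abused address is then a matter of `revoked.add(id)`; every other issued address keeps working.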

Naturally, this would be best with an e-mail client that is aware of the scheme, and with a mail-service that has some API for generating new addresses, such as if you want to cold e-mail somebody and use a new from/return address.

Some years ago I had the fanciful idea of doing it with a phone-app, where it manages creating new addresses as-needed, disabling them, and keeping notes about who you gave them to.

terrabitz

Sounds like a similar approach to this service: https://addy.io/

I use it all the time in conjunction with Bitwarden to generate unique emails per site. You can have notes in each email, and they show up in a small banner in the forwarded email. And each one is individually disable-able, so you can easily cut it off if you see spam from it.

I was really interested in this space and made my own homegrown tool for this. I used it for a while until I discovered Addy and switched over. IIRC there are similar services by Mozilla, Apple, and Proton.

Macha

I would expect that an LLM-based scraper is going to be better at parsing an email address from your instructions than some of the more inattentive people whose emails you might want to receive. So I think some of the dumber mitigation measures that still block the simple regex bots from this topic are probably a better bet now.

xkbear89

The most underrated point here is that data breach lists have made web scraping almost irrelevant as a spam vector. If your email was in the Ticketmaster, LinkedIn, or Adobe breaches, it is already in every serious bulk mailing list regardless of how carefully you obfuscate it on your site. That said, obfuscation still makes sense for addresses that have never been in a breach -- particularly for new projects or personal sites where you have a clean slate. HTML entities plus a simple JS reassembly catches the vast majority of unsophisticated scrapers with basically zero maintenance overhead.

Anamon

Just my anecdata, but several of my addresses have been in two-digit numbers of leaks, and my spam load is relatively low, at least compared to some of the numbers I read in this thread. Per leaked address, I get maybe 5-10 a week. So to me it doesn't seem like leaks are a major source for spammers.

binaryturtle

When I wrote my own brainf*ck interpreter (in C) at the start of the year I was really struggling to find a use for the language. Eventually I had the idea to obfuscate emails on my websites with the language.

Basically each email gets written as a brainf*ck program and stored in a "data-" attribute. The html only includes a more primitively obfuscated statement "Must enable Javascript to see e-mail." by default, which then gets replaced by another brainf*ck interpreter (in JS) with the output of the brainf*ck code. Since we only output ASCII, we can reduce the size of the brainf*ck code by always adding 32 to each value it outputs. The Javascript is loaded from what looks like a 3rd party domain. There we filter based on heuristics and check if the "referer" matches before sending out the actual interpreter code.

Of course all this would not help if a scraper properly runs things through Javascript too.
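A toy version of the idea, to show the add-32 trick in isolation. This is not the commenter's code: the `encode` generator is deliberately naive, the interpreter omits the input command and assumes balanced brackets, and the referer filtering is not modelled.

```javascript
const OFFSET = 32; // the interpreter adds this to every value it outputs

// Encode text as a naive brainf*ck program: one cell, adjusted up or down
// from the previous character, then printed. Printable ASCII only.
function encode(text) {
  let program = "";
  let current = 0;
  for (const ch of text) {
    const target = ch.charCodeAt(0) - OFFSET;
    const delta = target - current;
    program += (delta >= 0 ? "+" : "-").repeat(Math.abs(delta)) + ".";
    current = target;
  }
  return program;
}

// Minimal interpreter; output values are shifted up by OFFSET.
function run(program) {
  const tape = new Uint8Array(30000);
  let ptr = 0;
  let out = "";
  for (let pc = 0; pc < program.length; pc++) {
    switch (program[pc]) {
      case "+": tape[ptr]++; break;
      case "-": tape[ptr]--; break;
      case ">": ptr++; break;
      case "<": ptr--; break;
      case ".": out += String.fromCharCode(tape[ptr] + OFFSET); break;
      case "[": // cell is zero: jump forward to the matching "]"
        if (tape[ptr] === 0) {
          for (let depth = 1; depth > 0; ) {
            pc++;
            if (program[pc] === "[") depth++;
            else if (program[pc] === "]") depth--;
          }
        }
        break;
      case "]": // cell is non-zero: jump back to the matching "["
        if (tape[ptr] !== 0) {
          for (let depth = 1; depth > 0; ) {
            pc--;
            if (program[pc] === "]") depth++;
            else if (program[pc] === "[") depth--;
          }
        }
        break;
    }
  }
  return out;
}

console.log(run(encode("user@example.com"))); // user@example.com
```

With the offset, the first character costs 32 fewer `+` commands than it would in standard brainf*ck, which is where the size reduction comes from.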

Recently I read you soon will be able to run DOOM via CSS, so certainly it should be possible to have a brainf*ck interpreter in CSS? That would be the next step… just to get rid of the Javascript, but then I'm okay with all the downsides of using Javascript just for the e-mail obfuscation.

Anyway… I also regularly (at least once a year) rotate those public contact addresses.

Borealid

How does this approach meaningfully differ from having javascript that XORs the email with a random sequence of bytes stored in that JS?
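For comparison, the XOR variant fits in a few lines. This is a sketch under assumptions: the key bytes, the hex encoding of the result, and the function names are invented, and the code handles ASCII addresses only.

```javascript
// Hypothetical key; in practice it would be generated randomly per page.
const KEY = [0x5a, 0xc3, 0x17, 0x9e];

// XOR each character code with the repeating key and emit two hex digits.
function xorEncode(address, key = KEY) {
  return Array.from(address, (ch, i) =>
    (ch.charCodeAt(0) ^ key[i % key.length]).toString(16).padStart(2, "0")
  ).join("");
}

// Reverse the process: read hex pairs and XOR with the same key.
function xorDecode(hex, key = KEY) {
  return (hex.match(/../g) ?? [])
    .map((pair, i) => String.fromCharCode(parseInt(pair, 16) ^ key[i % key.length]))
    .join("");
}

const hidden = xorEncode("user@example.com");
console.log(hidden);            // hex digits only, no "@"
console.log(xorDecode(hidden)); // user@example.com
```

Like the brainf*ck version, this only stops scrapers that do not execute the page's JavaScript.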

binaryturtle

It's more fun? :)

/edit

And you can combine both approaches: XOR'ing the code first for good measure. :)

robotswantdata

How does that work if the scraper takes a screenshot to feed to a LLM or OCR?

yummypaint

That seems like a very expensive way to crawl the internet

robotswantdata

Scrape normally and collect emails; if no email is seen, take a screenshot and OCR it. OCR is cheap and regex is cheap.

woctordho

It would be interesting to show bf code rather than the actual email on the webpage. A lot of OCR systems struggle with this kind of repeated symbols where the exact count is required.

dandersch

Very interesting. It seems for his own email the author has opted for a combination of the CSS display none technique and a XOR cipher:

  <span class="hidden email"><b>999a8f84898f98</b>aa<b>878b8386c4</b>999a8f84898f988785989e8f84998f84c4898587</span>

lenwood

I noticed that, too. Technically I think this is a version of JS conversion. Interesting that he doesn't specifically mention XOR in the article. He does suggest combining methods though. I suspect this is effective.

Bender

They left off the HTML CGI form: the visitor composes the message on the web page, and the server sends the email after performing some basic sanity checks and anti-spam measures on the form and the web server itself, such as solving some CSS puzzle or winning a game of DOOM.
