Get the top HN stories in your inbox every day.
alkonaut
Pseudonymity isn’t anonymity. The problem of course is if you first had a non-anonymous account (e.g. with your real name) and then for some reason switched to a pseudonym you thought wasn’t linked to your real identity.
The bottom line I guess is: if you did that, your choice is basically to never post at all or to create an account where you practice some kind of opsec to try to disconnect it from your previous account (e.g. run it through a service that rewrites the text).
c22
I always assumed that some actors had this capability and acted accordingly. The existence of a publicly available tool merely democratizes this power. Anyone seeking complete and perpetual anonymity should have been practicing better opsec from the beginning.
bheadmaster
What opsec measures could one take against language analysis?
Everyone has a particular style of writing, and the only way of hiding it I can think of is not writing anything at all.
c22
* Make a small number of posts per account
* Run your posts through machine translation services
* Pay someone on fiverr to write a synopsis of your content, then post that
* Intentionally vary your style, build rich back stories and identities for your nyms, including telltale mannerisms.
paganel
> What opsec measures could one take against language analysis?
Be aware that it's almost impossible to defeat it, and act accordingly. More exactly, if you think that a tech-capable state actor is coming after you then you should be outside the reach of where said state actor holds its monopoly of power. Real text-anonymity is gone for good.
hinata08
use the same words and logics as everyone / some mainstream show like SNL ? Or force yourself to use completely different words every time ? Be open minded ? Comment under different posts to hide your interests ?
Use short sentences ? Use newspeak ? Copy the features of other users ?
I also talk to ppl from various backgrounds. I could prove to you why you need to vote for any candidate, thanks to the logics of coworkers. I used to copy the style of MS documentations to refactor code, so that juniors could get started right away. Before that, I used to learn English. My patterns and word usage were just a medley of recently viewed content.
I'm sure it can be achieved.
You just need to read enough, or be exposed to so much content that you can choose your patterns.
Oxidation
> Use newspeak
Would only work if enough people did it. Otherwise all your accounts will stand out as being "the newspeak person"
theCrowing
GPT3?
adriand
This argument, which the creator of the tool also uses, strikes me as essentially, “I’m going to harm you to show you that there is potential for you to be harmed”.
To make a crude analogy, what if I punched you in the face to demonstrate that you should consider learning kung fu? Or perhaps to indicate that, if a corporation wanted to, they could hire people to assault you and there would be little you could do about it? At the end of the day you’ve got a busted nose either way. Are you really better off?
icegreentea2
Your analogy makes a fundamentally different assumption - which is that you weren't already being physically assaulted or that your risk of being physically assaulted was truly negligible.
OP and creator assume that attempts at creating privacy online are always under active attack. The threat environment already exists. Publication of this tool does not meaningfully change the threat environment.
Some more analogous situations might be:
* Publicizing that police do not quickly respond to 911 calls for assault or break and entry in certain neighborhoods
* Selling lockpicks and publicizing how to pick standard tumbler locks
* Publicizing which bike locks can be cut with hand tools
The real fundamental disagreement is on the assumption of if your privacy is already under attack by competent actors.
tptacek
People have been doing this kind of analysis for decades; all that happened today was that you got a demonstration of it, using a particularly simple approach.
tjrowalway
HN should do more to support privacy. Allow anonymous posts instead of forcing users into the pattern of throwaways that are linkable. Karma right now also makes it hard to use a throwaway per comment. Getting downvoted on a first post might lock your account for some hours, so you are forced to build up karma with lots of posts.
+ give users a way to delete their profile once posted. If not whole comment history, at least remove their username from the db.
c22
If you're only using the account for one comment why do you care if it gets locked?
eddsh1994
* It only works on accounts with 10k characters, alts tend to be for throwaway comments like your account
* Deniability - dang has matches that are close but probably not him, you can just say it isn't you
dang
None of the matches I saw for dang were me - but I think in this case the matcher was mostly picking up accounts that post a lot of links to past HN threads.
f38zf5vdt
I finetuned T5, a text-to-text transformer, with a loss function based on the cosine similarity of any given text with statements made by individual HN users. It's like Dreambooth but for text. I have been posting under 40 different alt accounts that approximate the top users of this site textually and match them within an 80th percentile. AMA.
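For readers unfamiliar with the metric mentioned above, here is a minimal sketch of cosine similarity applied to writing style. It is not the commenter's T5 loss and not how the alt detector necessarily works: the `FUNCTION_WORDS` list, the `style_vector` helper, and the choice of word-frequency features are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
# Real stylometry tools typically use many more features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "i", "it"]


def style_vector(text):
    """Return the relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: compare two comments by their function-word profiles.
score = cosine_similarity(
    style_vector("I think that the tool is a demonstration of the risk"),
    style_vector("It is the kind of thing that I assumed was possible"),
)
```

A score near 1 means the two texts use these words in similar proportions; with so few features it says little about authorship on its own.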
costco
Are you serious? That is way cooler than the alt detector site. Depending on how much extra work you're willing to do I think TAILS is actually looking for a program to do this because they had some problems with Anonymouth.
caprock
> ...a loss function based on the cosine similarity of any given text with statements made by individual HN users.
A. Could you elaborate on the mechanics and particularly the loss function calculation? For example, did you take the top N users by karma, volume, something else? Does the loss get calculated as an average sim against each of the top N, or do you mean pool the top N first, or something else? Or is it one model per user?
B. How do you manage the posting and accounts? Just manually?
jimrandomh
Wait, you're posting AI-generated spam under 40 different accounts? That sounds very unethical.
f38zf5vdt
It's a text to text model. You train it on a sentence from the user phrased differently against their real sentence. Then you can write what you want, and the transformer alters the style to match another identity.
jimrandomh
Oh, I see; you're using it as a writing aid (to change your style) but the content you're posting is basically written by you. I have no problem with that, and that seems pretty cool!
asdff
Please link git repo
xg15
It might just as well happen that some accounts are falsely detected as alts even though they really aren't. If people place too much trust in that tool, that might also have consequences for the account holders with little the account holders can do to prevent that.
Btw, how is having an account system without any option to close or delete the account - and which might contain your real name as username - compatible with the GDPR?
TheHideout
Alternate take - if you submit content you think is interesting, you can now see people who submit similar interesting content and go check out their posts. I tried it and found some stuff I'm genuinely interested in.
cableshaft
I have no alts on this site (I just take the negative karma when I have something nice to say about crypto), but it still provided a list of 20 accounts, some with higher similarity scores than pg had (in the example).
Maybe I should be friends with these people then?
https://stylometry.net/user?username=cableshaft
EDIT: Did a quick cursory glance at the top five on my list, and despite pretty different subjects they talked about I can kind of see the stylistic similarities.
hirundo
My nearest neighbor is more distant than yours at .48. I suppose that makes my word choices more {weird,distinctive}. I'd like to see a list sorted by such a distance measure. Would voices with high distances have more distinct thought processes, or just a lesser grasp of the language? Would it be a measure of conceptual or cultural diversity?
I can't predict whether editing out highly diverse voices would increase or decrease signal to noise. Editing out low diversity voices would turn this into a stranger place.
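A list sorted by nearest-neighbor score, as suggested above, could be sketched roughly as follows. This assumes each user is already represented by a numeric style vector and that the site's scores are cosine similarities; both are assumptions, and the function and variable names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_distinctiveness(vectors):
    """Given {user: style_vector}, return (user, best_similarity) pairs.

    best_similarity is the user's highest similarity to any other user.
    The list is sorted ascending, so the most distinctive voices
    (lowest nearest-neighbor similarity) come first.
    """
    ranked = []
    for user, vec in vectors.items():
        best = max(
            (cosine_similarity(vec, other_vec)
             for other, other_vec in vectors.items() if other != user),
            default=0.0,  # a lone user has no neighbors
        )
        ranked.append((user, best))
    return sorted(ranked, key=lambda pair: pair[1])


# Usage with made-up two-dimensional vectors.
example = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
ranking = rank_by_distinctiveness(example)
```

This compares every pair of users, which is quadratic in the number of accounts; a real site would likely need an approximate nearest-neighbor index instead.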
godot
Hi potential alternate universe self!
I'm #20 on your list and you are #1 on my list. :)
nobody9999
>So for some people who assumed anonymity this could be anything from unimportant to awkward to a real problem.
If that's the case, then that was a bad assumption, IMHO.
I don't assume anonymity. I assume pseudonymity.
Then again, I don't use alt accounts either. And, interestingly, the closest "match" for me is 0.51 on the site you mentioned.
Which, I guess, is both good and bad. Good in the sense that I'm expressing myself as me. And bad in that if folks were to use this dataset to compare to other sites where I also post, my activities on multiple sites could be correlated. Which would probably annoy me, but not for the reasons you may think.
sys_64738
My closest match is 0.528 so I guess I don’t know what that means in terms of the weighting but it seems I’m more alike to others than you! I only have one HN account FWIW.
cinntaile
It seems awfully naive to assume that certain people on this forum wouldn't work with or utilize this kind of tech to try and identify alts. It's hard to tell if it's accurate or not without having knowledge of actual alts.
This isn't the first time someone posted a site like this by the way, nothing bad came of it.
fragmede
If you're really paranoid/can realize the opsec ramifications, you'd note that submitting your username to a 3rd party site reveals a link of some sort between the submitted username and the IP the request came from.
greenpeas
How would it know which of the usernames that I searched are my account(s)?
rich_sasha
One consequence is me realising just how meta and stuck-up-its-own-derriere HN is. I knew it's bad but I wasn't expecting a meta response to a meta post.
Perhaps we should just iterate. What does this post reveal about HN? Discuss.
This post https://news.ycombinator.com/item?id=33755016 shows that many HN alt accounts have been exposed, not through hacking but stylistic analysis.
HN famously does not permit deletion of previous accounts or comments.
So for some people who assumed anonymity this could be anything from unimportant to awkward to a real problem.
Presumably now people will routinely search for alt accounts of any HN commenter and bring what they find into the discussion.
It’s not a hack, but in many ways the implications are similar to those of a hack.
How do you feel about this?
@dang what do you think?