
What Happened to Tagging? (2019)



·May 21, 2022


Besides what the article goes into about auto-curation of social feeds reducing self-curation, the counterintuitive answer is that decentralized tagging requires strong centralization to work.

You need:

- agreement on what should be and what should not be tagged in a given domain

- standardized terminology (no multiple variants of tags)

- consistent grammar and formatting across all tags

- software support for tag editing that makes it easy to adhere to established tagging rules

- mechanisms to explain tagging rules to new users, at scale

- mechanisms to punish malicious/spam tagging (e.g. user history/reputation + bans)

Usually, all of these conditions together are only found in highly niche and specialized forums that care a lot about the quality of their content. While most large social platforms today do have some kind of tagging system (e.g. hash tags on Twitter/Instagram), the usefulness of these systems is generally limited due to the inherent difficulties of co-ordinating so many diverse users who have varying interests.


These are very nearly the exact opposite of the tagging ideas/motives on, an early popular system with tagging. There were lots of people who made similar arguments at the time as well! I thought they were wrong then and nothing has really appeared along the lines of what you're describing to convince me otherwise but it's probably worth taking more whacks at.


Ah, interesting! I never used myself, but from what I understand, it's fairly similar to Instagram (for example) in how tags work, optimizing for ease of use rather than ease of finding specific content. In my opinion, this is almost certainly the right decision for any platform where "absurdly detailed search" is not job #1, and I'm pretty sure I would have argued the same way as you did.

That said, having seen some of the centralized, intricate tagging systems out there, that let you filter down from Earth to one specific ant in the blink of an eye, that's what I think of when I think of "tagging" that's really effective. YMMV, but I would argue that if you can't type in 10 different tags and get 1 result that's exactly what you're looking for, tags aren't really delivering on their promise.


no. tags were a separate field, and you were shown the tags you used as prominently as the things themselves.

note that it was built as a memory aid so that you had a chance of finding something again later. your idea that it needs be exact, perfect, and precise or it won’t work is silly.


We did suggest tags as the intersection of "tags used by others for this url" and "tags you have used in the past" to increase cohesiveness.
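
A rough sketch of that suggestion rule, for illustration only. The function name, the sample tags, and the lowercase folding are all assumptions, not the original implementation.

```python
def suggest_tags(tags_by_others, my_past_tags):
    """Suggest tags that others used for this URL AND that I have used before."""
    # Assumption: tags are compared case-insensitively.
    others = {t.lower() for t in tags_by_others}
    mine = {t.lower() for t in my_past_tags}
    return sorted(others & mine)


suggestions = suggest_tags(
    ["python", "tutorial", "programming", "howto"],  # others' tags for this URL
    ["programming", "Python", "recipes"],            # my own tag history
)
print(suggestions)  # ['programming', 'python']
```

Only tags already in the user's own vocabulary are offered, which is what nudges people toward reuse.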


Hmm yeah, kinda, or at least I remember it slightly differently - that the sort of 'winning point' for it was that it was useful for individual users because it helps you pick out tags you already have or add a new tag you didn't have, plus it tells you something about the url. The purposes overlap, of course and it's been a while.

The thing above is much closer to what some of the librarians were really into.


image boards like danbooru and similar websites are an example of the things mentioned by the parent comment. and from my personal experience they are the best implementation of tags I've seen on the internet. they are not perfect and still have a lot of room to improve on but they are way better than what's used and available elsewhere.

tags have their own descriptions, they get moderated, other people can add tags, they can report them, you can alias tags, see related tags, and get feedback on them.

disclaimer: most of these image boards are NSFW.


Commonly called a folksonomy.

These are easy to establish, but can be quite difficult to maintain and rationalise.


> Commonly called a folksonomy.

Yeah, it's not, really. And thankfully.


> Usually, all of these conditions together are only found in highly niche and specialized forums that care a lot about the quality of their content.

Ooh do any of these still exist? If you know of any I'd love the links to look at how they're doing.

I was an inveterate tagger, debating taxonomies and ontologies late into the night (I have now forgotten the difference between the two!) and tried to run a curated forum. Eventually I gave up for most of the reasons you highlight - but mainly because I realised no one was as OCD about classification as I was.

In another life I would have run and catalogued a university library.


Stack Overflow exhibits (or exhibited) all the points that parent mentioned. If you look at [the discussion of tags on the Meta site][0], and especially what's called ["burnination"][1] you'll see these issues being hashed out over time.

To sustain a tagging system like that it takes dedicated and invested individuals, and the corollary of that is that such people tend to generate a lot of discussion.

[0]: [1]:


The social cataloging site Rate Your Music has a very in-depth genre tagging system. For each album and track, users debate and vote on which primary genres and secondary genres apply. For example Radiohead's OK Computer has Alternative Rock and Art Rock primary genres and a highly controversial Space Rock Revival secondary.

Each genre has a lineage of parent genres so each release tagged with a genre must also be a part of each parent genre. For example: Electronic > Electronic Dance Music > House > Tribal House. Also: Rock > Metal > Thrash Metal > Technical Thrash Metal.
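
The "a genre implies every parent genre" rule can be sketched roughly as below. The `PARENT` table holds only the two example chains above, and the single-parent structure is an assumption; the site's real data model may differ.

```python
# Hypothetical child -> parent table built from the two example lineages.
PARENT = {
    "Tribal House": "House",
    "House": "Electronic Dance Music",
    "Electronic Dance Music": "Electronic",
    "Technical Thrash Metal": "Thrash Metal",
    "Thrash Metal": "Metal",
    "Metal": "Rock",
}


def lineage(genre):
    """Return the genre followed by all of its ancestors, most specific first."""
    chain = [genre]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def implied_genres(tagged):
    """Expand a release's tagged genres to include every parent genre."""
    result = set()
    for genre in tagged:
        result.update(lineage(genre))
    return result


print(lineage("Tribal House"))
# ['Tribal House', 'House', 'Electronic Dance Music', 'Electronic']
```

With this expansion, a search for "Rock" would also match a release tagged only "Technical Thrash Metal".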

There's a queue for submitting proposals for new genres and modifying the definitions of existing ones. There's also a complex chart system for filtering releases by genres, types, and descriptors. I think I last heard there were ~1300 music genres on the site.


Some good examples have been posted prior to my reply here --- I'll reiterate Archive of our Own (fanfiction) and Danbooru (anime porn) as two fairly big sites with well-maintained tagging systems.

Both sites have abundant guides and documentation about their systems and it's very interesting to see how they manage the real-world complexity of their domains.

Here are some good entry points if you're interested:

Archive of Our Own:

Danbooru: (linked pages are text-only, but individual tag pages, as well as the rest of the site, can be highly NSFW)


Building an effective tagging system can be much harder than people realize. I once worked on a tagging system for a collection of math problems. I thought I could code a simple tagging model, and let users tag their own math problems, and it would become much easier to find the problems you're most interested in.

Then I realized that tags like algebra 1, Algebra 1, Algebra I, Alg I, and all other variations should mean the same thing. So I started to develop a closed set of tags. That led to a fascinating rabbit hole about taxonomies that I don't even remember how to speak about clearly at this point.
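
To make the variant problem concrete, here is a minimal sketch of a closed tag set with an alias table. The canonical name `algebra-1`, the alias list, and the `resolve` helper are hypothetical choices for illustration, not the project's actual code.

```python
import re

# Hypothetical alias table: every accepted spelling maps to one canonical tag.
ALIASES = {
    "algebra 1": "algebra-1",
    "algebra i": "algebra-1",
    "alg 1": "algebra-1",
    "alg i": "algebra-1",
}


def resolve(raw):
    """Map user input to a canonical tag, or None if it is not in the closed set."""
    # Fold case and collapse runs of whitespace before the lookup.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return ALIASES.get(key)


for variant in ["algebra 1", "Algebra 1", "Algebra I", "Alg I"]:
    print(variant, "->", resolve(variant))
```

Returning `None` for unknown input is what makes the set closed: a new tag has to be added to the table deliberately rather than by typing it.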

That project is still a work in progress, and it's left me with immense respect for people who build well-structured systems that involve tagging.


Two impressive site-wide systems I've seen are the categories of Wikimedia Commons (multimedia) and tags of Archive of Our Own (fanfiction). The Commons guideline[0] elucidates its system and interesting ontological theory well. Its scope is extremely broad, aiming to simultaneously include any possibly useful categorization scheme,[1] and overall is a fairly freeform (ideally) directed acyclic graph. Variations are handled with redirects and disambiguation pages in a typical wiki manner, with the limitation that individual category uses must have the canonical name. AO3, in contrast, has a schema of sorts, and synonyms are made equivalent during resolution (its tags FAQ[2] is also an interesting read).

I tried to write a more thorough comment but also struggled with being coherent. Thus, some ideas, only briefly:

- At an even higher level, the web itself and the overlapping userbases/communities ('intersectionality', without the discrimination--the original set-theory kind?) of individual sites can also be considered a way of organizing content

- Thus, analogously: Search engines replaced directories and webrings as algorithms did tags. The present SEO meta, though...

- Generalizing from Commons, all Wikimedia wikis (Wikipedia, etc.) have parallel category structures, only less developed due to the greater reliance on links. So do most wikis in general, though Wikimedia also unifies categorization and structured data with Wikidata. From there are knowledge graphs and databases in general, wrapping back around to Google trying to determine the Knowledge Graph item that each query refers to.


[1] all the typical keying on depicted people, things, times, and places, plus the ways that we categorize those. Niches from 'horizontal bicolor blue and white flags' to 'Luxembourgish pronunciation by gender', 'trams on route 709', 'ships with 6 funnels'. There's a tool (now called vCat) to visualize categories, some outputs here:


Edit: specific examples


> tags of Archive of Our Own (fanfiction)

On a similar note, Danbooru-style image boards often have highly developed tagging systems, ranging from tags for specific characters or artists to tags for art styles, poses, or even specific features which happen to appear in the artwork (like "hat bow" or "blue eyes").


Just for fun, here are your examples applied to Commons (and a conjecture that tag systems naturally converge as they become more fine-grained):


There's also a tool to intersect or subtract categories hidden in the dropdown of the 'Good pictures' button at the top right.

[0] (NSFW-ish)


> I tried to write a more thorough comment but also struggled with being coherent.

how fitting.

I guess it's always about neighbourhoods. In your street, in your pew, in your bookshelf, inside your brain, in your zettelkasten.


I just used synonyms and a tag hierarchy (nested sets).

Works pretty well.
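
For readers unfamiliar with nested sets, a minimal sketch: each tag gets a (left, right) pair from a depth-first walk, and a tag's descendants are exactly the tags whose pair falls inside its own. The tag names and helper functions here are hypothetical, not the setup described above.

```python
def assign_bounds(tree, root):
    """tree maps a tag to its list of child tags. Returns {tag: (left, right)}."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter            # number assigned on the way down
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (left, counter)  # number assigned on the way back up

    visit(root)
    return bounds


def descendants(bounds, tag):
    """All tags strictly nested inside the given tag's interval."""
    left, right = bounds[tag]
    return sorted(t for t, (l, r) in bounds.items() if left < l and r < right)


tree = {"math": ["algebra", "geometry"], "algebra": ["algebra-1", "algebra-2"]}
bounds = assign_bounds(tree, "math")
print(bounds["algebra"])               # (2, 7)
print(descendants(bounds, "algebra"))  # ['algebra-1', 'algebra-2']
```

In a database the same check becomes a single range query, which is the usual reason for picking nested sets when reads far outnumber edits to the hierarchy.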


I ended up building out a hierarchy as well. But figuring out the structure of that hierarchy was not trivial at all. How does the name of a repeatable class (Algebra 1) fit with the name of a specific class (Algebra 1 Fall 2020 Section 2)? How does that relate to an area of math like algebra, geometry, number theory? How does that relate to things like context (e.g. problems about Minecraft, Lego, Physics, etc.)?

I developed a closed system of tags, and then gave people the ability to define aliases.


Tagging doesn't work because there is an incentive to falsely tag to sell stuff. Tag sites with ads for cat products as "dog", "bird", "groceries", "boots", "cars" on the off chance that you'll get your ad for cat products in front of some random customer's eyeballs.

It's exactly the same incentive as spam email. It takes zero effort, costs nothing, and if you get 1 hit in a million you still make a profit.

You can see this issue in play easily on soundcloud where people will tag their music with whatever tags they think will get their tracks played. You can also see it on all the porn sites where people re-upload porn with ads inserted or overlaid for their pay-for-porn site and then tag the porn with whatever they think will get click-throughs.

You might claim that with enough tags you'll be able to tell the accurate tags from the inaccurate tags, but I've seen no evidence that that actually works. My guess is it's partly that the only people who have an incentive to tag are the content creators (or content re-uploaders) and they have no incentive to follow any rules.

Further, agreeing on tags is nearly impossible. Consider "man" vs "woman" and all the political discussion around that. There's a conflict between those that want the tag for their identity, and those that want the tag to be useful for filtering. As much as I respect people's identities it's more useful for filtering if I can search for "brunette" and only see brunettes. And, if you find some other tag to use for the filtering then it's only a matter of time before someone demands they be identified with that new tag.


Seems like this would be easily countered by weighting each tag by something like log(n tags) or something.

Basically have just a few tags? They count a lot.

Have a bazillion? They count next to zero
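
One possible reading of that weighting, as a sketch only: the formula 1 / (1 + ln n) and the function names are assumptions chosen so that a lone tag gets full weight and each extra tag dilutes the rest.

```python
import math


def tag_weight(num_tags):
    """Weight of each tag on an item that carries num_tags tags in total."""
    if num_tags < 1:
        raise ValueError("an item needs at least one tag to be weighted")
    # Assumed formula: full weight for one tag, shrinking logarithmically after.
    return 1.0 / (1.0 + math.log(num_tags))


def score(item_tags, query_tag):
    """Relevance of an item for a single-tag query under this weighting."""
    tags = set(item_tags)
    return tag_weight(len(tags)) if query_tag in tags else 0.0


focused = ["cats", "cat-toys"]
stuffed = ["cats"] + [f"spam-{i}" for i in range(500)]
print(score(focused, "cats") > score(stuffed, "cats"))  # True
```

A logarithm decays slowly, so a heavily stuffed item still keeps some weight; a steeper function such as 1/n would punish stuffing harder at the cost of penalizing legitimately well-tagged items.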


Also let users report bad tags, vote up/down on tags, assign a vanguard to protect specific tags, basically do anything about the problem.


What's to keep your competition from flagging your own valid tags?

Or you your competition's?


Users are not given enough power to penalize bad actors. What do you expect?

I suspect if tags were implemented properly, companies would make less money.


> You can see this issue in play easily on soundcloud where people will tag their music with whatever tags they think will get their tracks played.

When a Lo-Fi Chillwave stream also includes Grindcore Death Metal tracks, it’s an especially annoying taxonomy misapplication.

Most tag spam is less obvious to people, but still makes for dirty data.


See the list of issues Cory Doctorow identified back in 2000:

(Posted by @andyback below.)


My time to shine, I guess.

When I invented tagging, it was as a personal curation process: it was designed for people to recall things back to themselves later. It was an organizational schema. It has mostly disappeared.

What we have now is mostly tagging so OTHER people can find it. Which leads to all sorts of bad incentives.

Nobody really built collaborative tagging but it needs a bunch more support than hashtag-this and hashtag-that to really work. For example, we showed people tags that they have used before so they were gently pushed to reuse tags for more organizational cohesiveness.


Hey, and thanks for bringing the concept of tagging to the world!

The idea that tags were meant for the individual makes a lot of sense. That’s how I used delicious. It was like my bookmarks folder, but links could be in multiple folders.

When it comes to collaborative tagging, are there any successful examples that you’ve seen? Or sites you feel are using tags in interesting or surprisingly useful ways?


Oh, I miss - UI / execution / social discovery, all those fantastic finds... :)

I love tagging a lot, but the problem with tagging is that only a handful of software/services use look-up for tag1 AND tag2 (AND tag3...) filtering. It's such a simple concept to filter all used tags based on already selected if I'm making a query using tags. I can not understand how people don't get that without this tagging is more-or-less useless.
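
A minimal sketch of the filtering I mean, with hypothetical note names and tags: select tags with AND semantics, then show only the tags that can still narrow the current result set.

```python
def filter_by_tags(items, selected):
    """Keep items that carry every selected tag (tag1 AND tag2 AND ...)."""
    wanted = set(selected)
    return {name: tags for name, tags in items.items() if wanted <= tags}


def refinement_tags(items, selected):
    """Tags that co-occur with the current selection and are not yet selected."""
    remaining = set()
    for tags in filter_by_tags(items, selected).values():
        remaining |= tags
    return sorted(remaining - set(selected))


notes = {
    "note-a": {"python", "testing", "ci"},
    "note-b": {"python", "testing"},
    "note-c": {"python", "docs"},
    "note-d": {"rust", "testing"},
}

print(sorted(filter_by_tags(notes, ["python", "testing"])))  # ['note-a', 'note-b']
print(refinement_tags(notes, ["python"]))             # ['ci', 'docs', 'testing']
print(refinement_tags(notes, ["python", "testing"]))  # ['ci']
```

Each click shrinks both the result list and the list of tags worth clicking next, so the user never selects a combination with zero results.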

A few months ago I discovered Bibsonomy [0,1], which is open-source and written in Java, but deploying it is far beyond my abilities.

I've been using Obsidian for almost two years for my notes exclusively and it's a life changer, but devs do not seem to be interested in implementing this simple filtering mechanism in Tags Pane when working with tags [2], which defies the purpose of using tags extensively (like allowed).





Thank you for making the concept of tags! Even though it's not perfect, I'm sure you've probably saved anywhere between thousands to perhaps millions of man hours globally (if not more)! That is an achievement very few people can claim.


>Nobody really built collaborative tagging

Why don't you build it now, or at least finance somebody who does? Is it too risky to build because the big social networks will copy it and thus destroy any successful exit?


I was on a call with Science to potentially take over delicious but the momentum was super low back then and IIRC 99% of it was porn. Should have taken that poison pill in hindsight.


I loved It was a core part of my browsing experience. Any useful link I stumbled upon got tagged and saved. The popular links were very useful, too. Then it got sold to people who couldn't figure out how to make money off it without ruining it.

I still miss the functionality of being able to quickly find every interesting webpage I've ever seen (using tags). A way to supply that functionality in the modern world would be a visited pages search feature on Google or Chrome. Or a search feature for the content of pages I've bookmarked.


Pinboard is still a thing, and the developer bought the domain too, I think.

In any case, it has at least replicated the functionality, and then added more, such as archiving page contents. The tags are still there too, and it prompts with other users' tags when you add a bookmark. Oh, and an API, which is very useful for programmatic use of the data once saved.


Pinboard doesn’t have the social features that had—you can’t see the list of others who bookmarked a link, for example.


In my opinion, social sharing usually requires free accounts. If the account costs money, then the social sharing features are probably less important. Pinboard says it is for introverts.

If proper free accounts with all functions are supported, then advertising is a likely revenue model.

To combat this problem, I've added sharing over email, SMS and Slack to my project, which is a personal search engine and document manager. There's no reason to not allow a very basic and free account function to read these shared links, without a lot of auth and registration getting in the way. The service already has the guest's email address, phone number or Slack account from the other user. This can be used to send a token for easy login. Limits would be added so these accounts can only store a few URLs themselves unless they upgrade.

This keeps privacy protected, hopefully.


Yes, you can see people who bookmarked a link (if they haven't made it private).


I'm building a personal search engine/document management system that uses tags similar to how worked. URLs and screenshots can be saved via the browser, or by instructing the system to crawl it (which gets done with Firefox/webdriver). It's like a split-brained version of the Grub crawler. It also supports uploading PDFs and images.

Tags, objects, labels, synthesized commentary, etc. are provided by machine learning models and GPT3. Eventually the pipelines will be customizable, so running a plant identification model will be possible. Full text search and analytics is provided via a customized Solr deployment manager. I've built a unique UI for it based on my original cut of a simple timeseries interface at Loggly. Love using it, but have no idea if others will want to pay for it. I seriously hate ads, trackers and user privacy violations.

  merry-zebra|> !crawl
  merry-zebra|> Please wait while I index
  merry-zebra|> Site has been indexed. An image of the site will be added in ~10 seconds.
  merry-zebra|> ...
  merry-zebra|> updated 2022-05-21T18:55:06Z
  merry-zebra|> ID UmXyyk3tZJdGZW4uv
  merry-zebra|> title What Happened to Tagging? (2019) | Hacker News
  merry-zebra|> description The article discusses the potential reasons why "tagging" (i.e. adding labels to content for organizational purposes) has declined in popularity in recent years, despite its usefulness.
  merry-zebra|> URL
  merry-zebra|> Tags #What, #Happened, #Tagging, #2019, #HackerNews, #News
  merry-zebra|> ...
  merry-zebra|> To search me for the document, click on one of the action links.
  system=> Do you have any comments about this webpage, @merry-zebra?
  merry-zebra|> I find tagging to be extremely useful for organizing content. I think the decline in popularity is likely due to the fact that it can be time consuming to tag everything, and people are often lazy. However, I think it is worth the effort to tag things, as it makes it much easier to find what you're looking for later on.


I've built something similar for my crawler at — It does OpenGraph extraction, finding the body, summarizing it, tag & entity extraction, sentiment analysis, Oembed, stock symbol detection, screenshot & favicon, etc.


Great job on this! It looks fantastic and I think you'll do well. I like how you moved to token use for logins. Passwords are dumb.

I thought about these types of features for (which is NOT done, but operational), but it was too much work. Glad I put it off, because you did a much better job.

I'm adding a !biztoc command to Mitta for search, but it would be cool to be able to add some post parameters like to post as well.


What's the best way to reach you? I am building something with a lot of the same ideas and would love to talk shop. Trying to network with others in the collaborative search/organization/knowledge space.

My email is also in my bio.


Nice work. How do you crawl dynamic websites that barely use links, or those which have scraping countermeasures like Amazon?


Thanks! I use GPT3 to synthesize a title and description from the URL and also use it to generate a description if the site simply lacks one. I use webdriver running Firefox to image the site. Some DOM information can be pulled that typically isn't blocked, but it isn't implemented yet.

My argument for these companies to allow a "scraper" like mine, is that I'm adding their full URL and tags for the user, on the user's behalf. I'm not scraping URLs or doing breadth/depth crawls. I ask for a single page the user gives me, then take an image only that user can see, unless they chose to share it with someone over email or Slack.

For sites that block "crawlers" from certain IP ranges, I've written an extension for Chrome/Firefox which allows the user to image the screen and upload it. This adds the site to the index just as if they had asked the system to crawl it. I gave up on scrolling the window, however. Chrome now has a limit of 0.5 seconds per screen grab.

It also supports image uploads, so if the user wants to just use their own screenshotting method, they can just upload the image. Extraction of text and synthesis of titles and descriptions can be handled by GPT3 (as well as URL synthesis from keywords, command translation and Solr query synthesis).

I'm working on training a model to tell me whether or not it's an image, web page screenshot or a desktop shot.


Is this a project you intend only for yourself? Or is it going to be a product?


It's a hosted service that will be available as well as an on premise deployment for companies.


Completely agree. Delicious was like the perfect bookmark manager. Then it went to complete shit and ever since then I’ve barely bookmarked anything.

Honestly though I don’t think bookmarks serve much of a purpose anymore. Like I’ll just search my history if I need something specific. Or maybe I‘ve just forgotten how useful they are.


Am I missing something because I use Firefox? I bookmark and tag every interesting site I come across. Is tagging not a thing in Chrome?


> A way to supply that functionality in the modern world would be a visited pages search feature on Google or Chrome

I've been wondering about a plugin that does that. Maybe built over this?

I am ABSOLUTELY CERTAIN this does not yet exist.

(Easier to type that last sentence than actually Google for it).


Why couldn't such a thing be a local browser extension or similar?


For some time there were also "machine tags", basically a triple tag invented (I think) at Flickr[0]. It was an interesting concept, you could automate relationships between different contexts, for example between Flickr and[1].

I used it for a while, then I always wondered why nothing similar has ever emerged, maybe because after the first wave of "social sharing" excitement of web 2.0, every walled garden has basically double locked their gates. And this is maybe what happened to tagging in general.
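
For context, Flickr's machine tags took the form namespace:predicate=value. A minimal parser sketch follows; the exact character rules in the regular expression are an assumption, not Flickr's specification.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed syntax: identifier, colon, identifier, equals sign, free-form value.
_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Split a machine tag into its three parts, or return None for a plain tag."""
    match = _PATTERN.match(raw.strip())
    return MachineTag(*match.groups()) if match else None


print(parse_machine_tag("geo:lat=51.5"))
# MachineTag(namespace='geo', predicate='lat', value='51.5')
print(parse_machine_tag("sunset"))  # None
```

Because the namespace and predicate are machine-readable, another service can look for its own namespace in a photo's tags and link back to the matching record.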




The concept of machine tags is the core premise of RDF. RDF is essentially a standardized way of describing relationships in a structured form (commonly serialized as XML). In fact, an early version of RSS was based on RDF (RSS 0.9 stood for "RDF Site Summary").

One of the downsides is that it's pretty hard for "average" folks to produce these feeds. There's a steep learning curve for modeling the relationships. Getting other sites to agree on a format, use it, and maintain it without breaking compatibility was hard.


I’m not sure I’m following, how is the machine tag format setting up automatic relationships and contexts? How is it helpful and how might you see it being effective today?

Edit: only saw the first link. Seems second link breaks it down but can’t review it yet. Got pulled away. Thanks!


Machine tags aren't tags at all. Not all metadata is tagging.


Right around the time the author was celebrating Tagsgiving, I was in Library School, and tagging was a hot topic around those parts. The consensus there was: "this is great and all, but there's a reason we have controlled vocabularies and classification systems. We'll see, we'll see."

I was all in on the possibilities for "folksonomies" and user tagging. However I have to admit that I have not seen many examples of where uncontrolled tagging was all that useful at scale.

To organize information, you need experts, with training, time, and a reason to get it right. Or, you can do it with an arbitrarily sophisticated, mostly theoretical ML system. But neither of these solutions benefits from having user tags.


Add enough tags and then you have a gawdawful mess and you need tags to organize your tags.


I think this is a great use case for some algorithm to help you combine tags (by recognizing synonyms/plurals, text summarization, crowd-sourcing, something else?). Then it could keep you "on the rails" when tagging and periodically ask if you want to combine tags that seem similar.
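
A crude sketch of the "periodically ask" part, using only the standard library. The plural-stripping rule and the 0.85 similarity threshold are arbitrary assumptions, and this only catches near-identical spellings, not true synonyms.

```python
from difflib import SequenceMatcher
from itertools import combinations


def stem(tag):
    """Very naive normalization: fold case and separators, drop a trailing 's'."""
    t = tag.lower().replace("_", "-").replace(" ", "-")
    return t[:-1] if t.endswith("s") and len(t) > 3 else t


def merge_candidates(tags, threshold=0.85):
    """Return pairs of tags that look similar enough to ask the user about."""
    pairs = []
    for a, b in combinations(sorted(set(tags)), 2):
        sa, sb = stem(a), stem(b)
        if sa == sb or SequenceMatcher(None, sa, sb).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


print(merge_candidates(["landscape", "landscapes", "python"]))
# [('landscape', 'landscapes')]
```

The function only proposes pairs; the merge itself stays a user decision, since two similar-looking tags can still mean different things.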


Perhaps, but after even just a few relatively short attempts to start organizing some of my files with tags I don't think this would be sufficient. I found the meaning of tags frequently started to drift. What I cared about and why just wasn't that consistent. Never mind being consistent with hair-splitting judgement calls in categorization.

And the more you tag, the more difficult it is to fix. Either you retag everything to fit the new standard or you accept that trying to retrieve things by tag will return some weird set defined by the intersection of your changing definition over time and the time at which you applied the tag.

I don't doubt a more structured and principled approach would help, but I found it just ended up soaking up tons of time, and thought, without actually providing much back.


I’ve always thought that Gmail’s hierarchical tagging (‘folders’) is a great solution to the organization problem.


Same. I've been making and remaking a bookmarking/notetaking site for my personal use over the years, and this is the solution I landed on. They look like and can be organized like folders, but you can quickly add items to multiple folders. I think it's working well for me so far.


That’s awesome. Just this past week I’ve started looking for a Chrome extension that does the same.


Doesn’t that always happen? Some people will push the tools so far that they lose all their initial usefulness.


I’m not sure tags died, TikTok certainly seems to be built around tags and it has over a billion monthly users. They are also key to Instagram discovery but feel a little less important there, though I don’t care much for that platform and could be wrong.


Curated tags, including canonizing one variation of a tag and making all the others with the same meaning synonyms:

Of course, where a single word has two or more meanings, synonyms don't make sense, so go with Wikipedia-style disambiguation.

Also be aware if your community has specialized jargon, uses multiple human languages, patois, creole, dialects, or pidgin.

Allow multi-word tags, but settle on a single casing/separation and enforce it: camelCase, snake_case, and kebab-case are some choices.

Prefer plurals "landscapes" over singulars "landscape".
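
As an illustration of enforcing one casing, here is a small sketch that folds camelCase, snake_case, and space-separated input into kebab-case. The choice of kebab-case and the function name are assumptions; pluralization is left out because naive rules get it wrong too often.

```python
import re


def to_kebab(tag):
    """Normalize a multi-word tag to lowercase kebab-case."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", tag.strip())  # split camelCase
    s = re.sub(r"[\s_]+", "-", s)                            # spaces/underscores
    s = re.sub(r"-+", "-", s)                                # collapse repeats
    return s.lower().strip("-")


for raw in ["hatBow", "hat_bow", "Hat  Bow", "hat-bow"]:
    print(raw, "->", to_kebab(raw))  # every variant prints hat-bow
```

Running this on input and on stored tags means the four spellings above collapse into one tag without needing a synonym entry for each.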

See also


Tagging requires mental energy and some level of abstraction prowess - and might still be misleading. Social media is geared towards making the user expend as little mental energy as possible - and then organize the information they provide anyway for the advertisers using behavioral patterns or some variant of AI. This is probably considered by the industry to provide more reliable information - it's like the difference between asking people to explain ethical behavior compared to recording what people actually do in reality.

So, we have a "tagging" model driven by advertising needs, that discourages our own need to tag (intellectually categorize) the content we consume. Instead of moving forwards, towards a more accurate tagging system that supports reflection and concept organization, it seems to me (in my pessimistic moments) that we are moving backwards into an online world where the only ones that know what we are doing are the machines.


Tagging never died.

“Tags” became known as “Labels”.

Labels are core functionality of Gmail, GitHub Issues, and more today.