These are very nearly the exact opposite of the tagging ideas/motives on del.icio.us, an early popular system with tagging. There were lots of people who made similar arguments at the time as well! I thought they were wrong then and nothing has really appeared along the lines of what you're describing to convince me otherwise but it's probably worth taking more whacks at.
Ah, interesting! I never used del.icio.us myself, but from what I understand, it's fairly similar to Instagram (for example) in how tags work, optimizing for ease of use rather than ease of finding specific content. In my opinion, this is almost certainly the right decision for any platform where "absurdly detailed search" is not job #1, and I'm pretty sure I would have argued the same way as you did.
That said, having seen some of the centralized, intricate tagging systems out there, that let you filter down from Earth to one specific ant in the blink of an eye, that's what I think of when I think of "tagging" that's really effective. YMMV, but I would argue that if you can't type in 10 different tags and get 1 result that's exactly what you're looking for, tags aren't really delivering on their promise.
no. tags were a separate field, and you were shown the tags you used as prominently as the things themselves.
note that it was built as a memory aid so that you had a chance of finding something again later. your idea that it needs be exact, perfect, and precise or it won’t work is silly.
We did suggest tags as the intersection of "tags used by others for this url" and "tags you have used in the past" to increase cohesiveness.
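For illustration, a minimal Python sketch of that suggestion rule. The function name, the popularity ranking, and the `limit` parameter are assumptions made for the example, not a description of how del.icio.us actually implemented it.

```python
from collections import Counter

def suggest_tags(others_tags_for_url, my_past_tags, limit=5):
    """Suggest tags that others used for this URL and that I have used before."""
    mine = set(my_past_tags)
    # Count how often each of my known tags was applied to the URL by others.
    popularity = Counter(t for t in others_tags_for_url if t in mine)
    # Most popular first; ties broken alphabetically so the output is stable.
    ranked = sorted(popularity, key=lambda t: (-popularity[t], t))
    return ranked[:limit]

# Hypothetical data: tags other users applied to one URL, and my own vocabulary.
others = ["python", "programming", "python", "tutorial", "howto"]
mine = ["python", "howto", "recipes"]
print(suggest_tags(others, mine))
```

Restricting suggestions to the intersection nudges each user toward tags they already use, which is the cohesiveness effect described above.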
Hmm yeah, kinda, or at least I remember it slightly differently - that the sort of 'winning point' for it was that it was useful for individual users because it helps you pick out tags you already have or add a new tag you didn't have, plus it tells you something about the url. The purposes overlap, of course and it's been a while.
The thing above is much closer to what some of the librarians were really into.
image boards like danbooru and similar websites are an example of the things mentioned by the parent comment. and from my personal experience they are the best implementation of tags I've seen on the internet. they are not perfect and still have lot of room to improve on but they are way better than what's used and available elsewhere.
they have their own description, they get moderated, other people can add tags, they can report them, you can alias tags, see related tags, and get feedback on them.
disclaimer: most of these image boards are NSFW.
> Usually, all of these conditions together are only found in highly niche and specialized forums that care a lot about the quality of their content.
Ooh do any of these still exist? If you know of any I'd love the links to look at how they're doing.
I was an inveterate tagger, debating taxonomies and ontologies late into the night (I have now forgotten the difference between the two!) and tried to run a curated forum. Eventually I gave up for most of the reasons you highlight - but mainly because I realised no one was as OCD about classification as I was.
In another life I would have run and catalogued a university library.
Stack Overflow exhibits (or exhibited) all the points that parent mentioned. If you look at [the discussion of tags on the Meta site], and especially what's called ["burnination"] you'll see these issues being hashed out over time.
To sustain a tagging system like that it takes dedicated and invested individuals, and the corollary of that is that such people tend to generate a lot of discussion.
The social cataloging site Rate Your Music has a very in-depth genre tagging system. For each album and track, users debate and vote on which primary genres and secondary genres apply. For example Radiohead's OK Computer has Alternative Rock and Art Rock primary genres and a highly controversial Space Rock Revival secondary.
Each genre has a lineage of parent genres so each release tagged with a genre must also be a part of each parent genre. For example: Electronic > Electronic Dance Music > House > Tribal House. Also: Rock > Metal > Thrash Metal > Technical Thrash Metal.
There's a queue for submitting proposals for new genres and modifying the definitions of existing ones. There's also a complex chart system for filtering releases by genres, types, and descriptors. I think I last heard there were ~1300 music genres on the site.
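The "each release must also belong to every parent genre" rule can be sketched as a walk up a child-to-parent map. This is only an illustration in Python: the `PARENT` table holds just the two lineages quoted above, and the function names are hypothetical, not anything taken from Rate Your Music.

```python
# Hypothetical child -> parent map built from the two lineages quoted above.
PARENT = {
    "Electronic Dance Music": "Electronic",
    "House": "Electronic Dance Music",
    "Tribal House": "House",
    "Metal": "Rock",
    "Thrash Metal": "Metal",
    "Technical Thrash Metal": "Thrash Metal",
}

def lineage(genre):
    """Return the genre followed by all of its ancestors, most specific first."""
    chain = [genre]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def implied_genres(tagged):
    """Expand a release's explicit genres to include every parent genre."""
    result = set()
    for genre in tagged:
        result.update(lineage(genre))
    return result

print(" > ".join(reversed(lineage("Tribal House"))))
print(sorted(implied_genres({"Technical Thrash Metal"})))
```

With this shape, a search for "Rock" can match a release tagged only as Technical Thrash Metal without anyone adding the parent tags by hand.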
Some good examples have been posted prior to my reply here --- I'll reiterate Archive of Our Own (fanfiction) and Danbooru (anime porn) as two fairly big sites with well-maintained tagging systems.
Both sites have abundant guides and documentation about their systems and it's very interesting to see how they manage the real-world complexity of their domains.
Here are some good entry points if you're interested:
Archive of Our Own:
Danbooru: (linked pages are text-only, but individual tag pages, as well as the rest of the site, can be highly NSFW)
Two impressive site-wide systems I've seen are the categories of Wikimedia Commons (multimedia) and tags of Archive of Our Own (fanfiction). The Commons guideline elucidates its system and interesting ontological theory well. Its scope is extremely broad, aiming to simultaneously include any possibly useful categorization scheme, and overall is a fairly freeform (ideally) directed acyclic graph. Variations are handled with redirects and disambiguation pages in a typical wiki manner, with the limitation that individual category uses must have the canonical name. Ao3, in contrast, has a schema of sorts, and synonyms are made equivalent during resolution (its tags FAQ is also an interesting read).
I tried to write a more thorough comment but also struggled with being coherent. Thus, some ideas, only briefly:
- At an even higher level, the web itself and the overlapping userbases/communities ('intersectionality', without the discrimination--the original set-theory kind?) of individual sites can also be considered a way of organizing content
- Thus, analogously: Search engines replaced directories and webrings as algorithms did tags. The present SEO meta, though...
- Generalizing from Commons, all Wikimedia wikis (Wikipedia, etc.) have parallel category structures, only less developed due to the greater reliance on links. So do most wikis in general, though Wikimedia also unifies categorization and structured data with Wikidata. From there are knowledge graphs and databases in general, wrapping back around to Google trying to determine the Knowledge Graph item that each query refers to.
 all the typical keying on depicted people, things, times, and places, plus the ways that we categorize those. Niches from 'horizontal bicolor blue and white flags' to 'Luxembourgish pronunciation by gender', 'trams on route 709', 'ships with 6 funnels'. There's a tool (now called vCat) to visualize categories, some outputs here: https://commons.wikimedia.org/wiki/Category:Wikimedia_catgra...
Edit: specific examples
> tags of Archive of Our Own (fanfiction)
On a similar note, Danbooru-style image boards often have highly developed tagging systems, ranging from tags for specific characters or artists to tags for art styles, poses, or even specific features which happen to appear in the artwork (like "hat bow" or "blue eyes").
Just for fun, here are your examples applied to Commons (and a conjecture that tag systems naturally converge as they become more fine-grained):
There's also a tool to intersect or subtract categories hidden in the dropdown of the 'Good pictures' button at the top right.
 (NSFW-ish) https://en.wikipedia.org/wiki/Seedfeeder
> I tried to write a more thorough comment but also struggled with being coherent.
I guess it's always about neighbourhoods. In your street, in your pew, in your bookshelf, inside your brain, in your zettelkasten.
I just used synonyms and a tag hierarchy (nested sets).
Works pretty well.
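For readers who haven't met the nested set model: every tag gets a (left, right) pair from a depth-first walk, and "all descendants of X" becomes a simple range check. Below is a rough Python sketch with a made-up hierarchy. The names are hypothetical, and a real deployment would normally keep the two numbers in database columns rather than a dict.

```python
def build_nested_sets(tree, root):
    """Assign (left, right) numbers to each tag via depth-first traversal."""
    bounds = {}
    counter = 0

    def visit(tag):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(tag, []):
            visit(child)
        counter += 1
        bounds[tag] = (left, counter)

    visit(root)
    return bounds

def descendants(bounds, tag):
    """A tag is a descendant if its interval lies strictly inside the ancestor's."""
    left, right = bounds[tag]
    return sorted(t for t, (l, r) in bounds.items() if left < l and r < right)

# Hypothetical hierarchy, purely for illustration.
tree = {"math": ["algebra", "geometry"], "algebra": ["linear-algebra"]}
bounds = build_nested_sets(tree, "math")
print(bounds)
print(descendants(bounds, "math"))
```

The trade-off is that reads are cheap while inserting or moving a tag means renumbering part of the tree.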
I ended up building out a hierarchy as well. But figuring out the structure of that hierarchy was not trivial at all. How does the name of a repeatable class (Algebra 1) fit with the name of a specific class (Algebra 1 Fall 2020 Section 2)? How does that relate to an area of math like algebra, geometry, or number theory? How does that relate to things like context (i.e., problems about Minecraft, Lego, Physics, etc.)?
I developed a closed system of tags, and then gave people the ability to define aliases.
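A small sketch of what "closed tags plus aliases" might look like, in Python. The class and method names are invented for the example, and lowercasing as normalization is an assumption; the comment above doesn't describe its actual implementation.

```python
class ClosedTagSet:
    """A fixed set of canonical tags plus user-defined aliases that map onto them."""

    def __init__(self, canonical):
        self.canonical = {self._norm(t) for t in canonical}
        self.aliases = {}

    @staticmethod
    def _norm(tag):
        # Assumed normalization: trim whitespace and ignore case.
        return tag.strip().lower()

    def add_alias(self, alias, target):
        """Register an alias; it may only point at an existing canonical tag."""
        target = self._norm(target)
        if target not in self.canonical:
            raise ValueError(f"unknown canonical tag: {target!r}")
        self.aliases[self._norm(alias)] = target

    def resolve(self, tag):
        """Return the canonical tag, or None if the input is not recognised."""
        tag = self._norm(tag)
        if tag in self.canonical:
            return tag
        return self.aliases.get(tag)

tags = ClosedTagSet(["algebra", "geometry"])
tags.add_alias("alg", "algebra")
print(tags.resolve("Alg"), tags.resolve("calculus"))
```

Users can type whatever shorthand they like, but everything stored and searched stays within the closed vocabulary.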
closed system and aliases are good. Taxonomies, not so much. See https://oc.ac.ge/file.php/16/_1_Shirky_2005_Ontology_is_Over... and http://www.dlib.org/dlib/january06/guy/01guy.html
Seems like this would be easily countered by down-weighting each tag by something like log(number of tags on the item).
Basically have just a few tags? They count a lot.
Have a bazillion? They count next to zero
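One possible reading of that idea, sketched in Python: divide by a log term so each tag counts for less as the item piles on more tags. The exact formula `1 / (1 + log(n))` and the function names are assumptions chosen for the example; the comment only says "something like log".

```python
import math

def tag_weight(n_tags):
    """Weight of each tag on an item that carries n_tags tags (n_tags >= 1)."""
    return 1.0 / (1.0 + math.log(n_tags))

def score(item_tags, query_tag):
    """Score an item for a query tag; zero if the item lacks the tag."""
    if query_tag not in item_tags:
        return 0.0
    return tag_weight(len(item_tags))

focused = {"lofi"}
spammy = {"lofi"} | {f"genre-{i}" for i in range(29)}  # 30 tags in total
print(score(focused, "lofi"), score(spammy, "lofi"))
```

A log penalty is fairly gentle, so a track with 30 tags still keeps a noticeable share of the weight; dividing by the tag count itself would punish tag spam much harder.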
Users are not given enough power to penalize bad actors. What do you expect?
I suspect if tags were implemented properly, companies would make less money.
> You can see this issue in play easily on soundcloud where people will tag their music with whatever tags they think will get their tracks played.
When a Lo-Fi Chillwave stream also includes Grindcore Death Metal tracks, it’s an especially annoying taxonomy misapplication.
Most tag spam is less obvious to people, but still makes for dirty data.
See the list of issues Cory Doctorow identified back in 2000:
(Posted by @andyback below.)
Hey, and thanks for bringing the concept of tagging to the world!
The idea that tags were meant for the individual makes a lot of sense. That’s how I used delicious. It was like my bookmarks folder, but links could be in multiple folders.
When it comes to collaborative tagging, are there any successful examples that you’ve seen? Or sites you feel are using tags in interesting or surprisingly useful ways?
Oh, I miss del.icio.us - UI / execution / social discovery, all those fantastic finds... :)
I love tagging a lot, but the problem with tagging is that only a handful of software/services support look-up with tag1 AND tag2 (AND tag3...) filtering. It's such a simple concept: when I'm querying by tags, narrow the list of available tags to those that co-occur with the ones already selected. I cannot understand how people don't get that without this, tagging is more-or-less useless.
A few months ago I discovered Bibsonomy [0,1], which is open-source and written in Java, but deploying it is far beyond my abilities.
I've been using Obsidian for almost two years for my notes exclusively and it's a life changer, but devs do not seem to be interested in implementing this simple filtering mechanism in the Tags Pane when working with tags, which defeats the purpose of using tags extensively (like Del.icio.us allowed).
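To make the request concrete, here is a minimal Python sketch of AND filtering plus the narrowing of the remaining tag list. The data layout (a dict of item id to tag set) and the function names are assumptions for illustration only.

```python
def filter_items(items, selected):
    """Return ids of items that carry every selected tag (tag1 AND tag2 AND ...)."""
    selected = set(selected)
    return {item_id for item_id, tags in items.items() if selected <= tags}

def remaining_tags(items, selected):
    """Tags that still co-occur with the selection, i.e. valid next refinements."""
    selected = set(selected)
    co_tags = set()
    for item_id in filter_items(items, selected):
        co_tags |= items[item_id]
    return co_tags - selected

# Hypothetical notes and their tags.
items = {
    "note-1": {"python", "search", "solr"},
    "note-2": {"python", "tagging"},
    "note-3": {"tagging", "search"},
}
print(sorted(filter_items(items, ["python"])))
print(sorted(remaining_tags(items, ["python"])))
print(sorted(filter_items(items, ["python", "search"])))
```

Showing only `remaining_tags` in the tag pane guarantees that every further click still returns at least one result.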
Thank you for making the concept of tags! Even though it's not perfect, I'm sure you've probably saved anywhere between thousands to perhaps millions of man hours globally (if not more)! That is an achievement very few people can claim.
>Nobody really built collaborative tagging
Why don't you build it now, or at least finance somebody who does? Is it too risky to build because the big social networks will copy it and thus destroy any successful exit?
I was on a call with Science to potentially take over delicious but the momentum was super low back then and IIRC 99% of it was porn. Should have taken that poison pill in hindsight.
Pinboard (https://pinboard.in/) is still a thing, and the developer bought the del.icio.us domain too I think.
In any case, it has at least replicated the del.icio.us functionality, and then added more, such as archiving page contents. The tags are still there too, and it prompts with other users' tags when you add a bookmark. Oh, and an API, which is very useful for programmatic use of the data once saved.
Pinboard doesn’t have the social features that del.icio.us had—you can’t see the list of others who bookmarked a link, for example.
In my opinion, social sharing usually requires free accounts. If the account costs money, then the social sharing features are probably less important. Pinboard says it is for introverts.
If proper free accounts with all functions are supported, then advertising is a likely revenue model.
To combat this problem, I've added sharing over email, SMS and Slack to my project, which is a personal search engine and document manager. There's no reason to not allow a very basic and free account function to read these shared links, without a lot of auth and registration getting in the way. The service already has the guest's email address, phone number or Slack account from the other user. This can be used to send a token for easy login. Limits would be added so these accounts can only store a few URLs themselves unless they upgrade.
This keeps privacy protected, hopefully.
Yes, you can see people who bookmarked a link (if they haven't made it private).
I'm building a personal search engine/document management system that uses tags similar to how del.icio.us worked. URLs and screenshots can be saved via the browser, or by instructing the system to crawl a page (which gets done with Firefox/webdriver). It's like a split-brained version of the Grub crawler. It also supports uploading PDFs and images.
Tags, objects, labels, synthesized commentary, etc. are provided by machine learning models and GPT3. Eventually the pipelines will be customizable, so running a plant identification model will be possible. Full text search and analytics is provided via a customized Solr deployment manager. I've built a unique UI for it based on my original cut of a simple timeseries interface at Loggly. Love using it, but have no idea if others will want to pay for it. I seriously hate ads, trackers and user privacy violations.
merry-zebra|> !crawl https://news.ycombinator.com/item?id=31459103
merry-zebra|> Please wait while I index https://news.ycombinator.com/item?id=31459103.
merry-zebra|> Site has been indexed. An image of the site will be added in ~10 seconds.
merry-zebra|> ...
merry-zebra|> updated 2022-05-21T18:55:06Z
merry-zebra|> ID UmXyyk3tZJdGZW4uv
merry-zebra|> title What Happened to Tagging? (2019) | Hacker News
merry-zebra|> description The article discusses the potential reasons why "tagging" (i.e. adding labels to content for organizational purposes) has declined in popularity in recent years, despite its usefulness.
merry-zebra|> URL https://news.ycombinator.com/item?id=31459103
merry-zebra|> Tags #What, #Happened, #Tagging, #2019, #HackerNews, #News
merry-zebra|> ...
merry-zebra|> To search me for the document, click on one of the action links.
system=> Do you have any comments about this webpage, @merry-zebra?
merry-zebra|> I find tagging to be extremely useful for organizing content. I think the decline in popularity is likely due to the fact that it can be time consuming to tag everything, and people are often lazy. However, I think it is worth the effort to tag things, as it makes it much easier to find what you're looking for later on.
I've built something similar for my crawler at biztoc.com — It does OpenGraph extraction, finding the body, summarizing it, tag & entity extraction, sentiment analysis, Oembed, stock symbol detection, screenshot & favicon, etc.
Great job on this! It looks fantastic and I think you'll do well. I like how you moved to token use for logins. Passwords are dumb.
I thought about these types of features for Mitta.us (which is NOT done, but operational), but it was too much work. Glad I put it off, because you did a much better job.
I'm adding a !biztoc command to Mitta for search, but it would be cool to be able to add some post parameters like https://biztoc.com/post?title=foo&url=https://zombo.com to post as well.
What's the best way to reach you? I am building something with a lot of the same ideas and would love to talk shop. Trying to network with others in the collaborative search/organization/knowledge space.
My email is also in my bio.
Nice work. How do you crawl dynamic websites that barely use links, or those which have scraping countermeasures like Amazon?
Thanks! I use GPT3 to synthesize a title and description from the URL and also use it to generate a description if the site simply lacks one. I use webdriver running Firefox to image the site. Some DOM information can be pulled that typically isn't blocked, but it isn't implemented yet.
My argument for these companies to allow a "scraper" like mine, is that I'm adding their full URL and tags for the user, on the user's behalf. I'm not scraping URLs or doing breadth/depth crawls. I ask for a single page the user gives me, then take an image only that user can see, unless they chose to share it with someone over email or Slack.
When a site blocks "crawlers" from certain IP ranges, there's an extension I've written for Chrome/Firefox which allows the user to image the screen and upload it. This adds the site to the index just as if they had asked the system to crawl it. I gave up on scrolling the window, however: Chrome now limits screen grabs to one every 0.5 seconds.
It also supports image uploads, so if the user wants to just use their own screenshotting method, they can just upload the image. Extraction of text and synthesis of titles and descriptions can be handled by GPT3 (as well as URL synthesis from keywords, command translation and Solr query synthesis).
I'm working on training a model to tell me whether or not it's an image, web page screenshot or a desktop shot.
Completely agree. Delicious was like the perfect bookmark manager. Then it went to complete shit and ever since then I’ve barely bookmarked anything.
Honestly though I don’t think bookmarks serve much of a purpose anymore. Like I’ll just search my history if I need something specific. Or maybe I’ve just forgotten how useful they are.
Am I missing something because I use Firefox? I bookmark and tag every interesting site I come across. Is tagging not a thing in Chrome?
> A way to supply that functionality in the modern world would be a visited pages search feature on Google or Chrome
I've been wondering about a plugin that does that. Maybe built over this? https://lunrjs.com/
I am ABSOLUTELY CERTAIN this does not yet exist.
(Easier to type that last sentence than actually Google for it).
Why couldn't such a thing be a local browser extension or similar?
The concept of machine tags is the core premise of RDF. RDF is essentially the standardized way of describing relationships in a structured way (commonly serialized as XML). In fact, an early version of RSS was based on RDF (RSS 0.9 stood for "RDF Site Summary").
One of the downsides is that it's pretty hard for "average" folks to produce these feeds. There's a steep learning curve for modeling the relationships. Getting other sites to agree on a format, use it, and maintain it without breaking compatibility was hard.
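To show the parallel with RDF, here is a rough Python sketch that turns a machine tag into a subject-predicate-object triple. It assumes the Flickr-style `namespace:predicate=value` syntax, which this thread never spells out; the regular expression, the function names, and the example item id are all hypothetical.

```python
import re

# Assumed syntax: namespace:predicate=value (e.g. "geo:lat=57.64").
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.+)$"
)

def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value), or None for plain tags."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return (match["namespace"], match["predicate"], match["value"])

def to_triple(item_id, tag):
    """Build an RDF-like triple: the tagged item is the subject."""
    parsed = parse_machine_tag(tag)
    if parsed is None:
        return None
    namespace, predicate, value = parsed
    return (item_id, f"{namespace}:{predicate}", value)

print(to_triple("photo-123", "geo:lat=57.64"))
print(to_triple("photo-123", "sunset"))
```

A plain tag such as `sunset` carries no predicate, so it yields no triple; that gap is what the machine-tag syntax is meant to fill.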
I’m not sure I’m following, how is the machine tag format setting up automatic relationships and contexts? How is it helpful and how might you see it being effective today?
Edit: only saw the first link. Seems second link breaks it down but can’t review it yet. Got pulled away. Thanks!
Machine tags aren't tags at all. Not all metadata is tagging.
I think this is a great use case for some algorithm to help you combine tags (by recognizing synonyms/plurals, text summarization, crowd-sourcing, something else?). Then it could keep you "on the rails" when tagging and periodically ask if you want to combine tags that seem similar.
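A crude version of the "ask whether to combine similar tags" step can be sketched with Python's standard library alone. The normalization rules, the 0.85 similarity threshold, and the function names are assumptions; real synonym detection would need a thesaurus or usage data, which this does not attempt.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(tag):
    """Crude normalization: lowercase, unify separators, strip a trailing plural 's'."""
    tag = tag.lower().replace("_", "-").replace(" ", "-")
    if tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        tag = tag[:-1]
    return tag

def merge_candidates(tags, threshold=0.85):
    """Return pairs of tags that look like variants of each other."""
    pairs = []
    for a, b in combinations(sorted(set(tags)), 2):
        na, nb = normalize(a), normalize(b)
        similar = SequenceMatcher(None, na, nb).ratio() >= threshold
        if na == nb or similar:
            pairs.append((a, b))
    return pairs

# Hypothetical personal tag list.
for a, b in merge_candidates(["recipe", "recipes", "Recipe", "python", "travel"]):
    print(f"Merge {a!r} and {b!r}?")
```

The output is only a list of prompts; the user still decides, which matters because near-identical spellings can be different concepts.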
Perhaps, but after even just a few relatively short attempts to start organizing some of my files with tags I don't think this would be sufficient. I found the meaning of tags frequently started to drift. What I cared about and why just wasn't that consistent. Never mind being consistent with hair-splitting judgement calls in categorization.
And the more you tag, the more difficult it is to fix. Either you retag everything to fit the new standard or you accept that trying to retrieve things by tag will return some weird set defined by the intersection of your changing definition over time and the time at which you applied the tag.
I don't doubt a more structured and principled approach would help, but I found it just ended up soaking up tons of time, and thought, without actually providing much back.
I’ve always thought that Gmail’s hierarchical tagging (‘folders’) is a great solution to the organization problem.
Same. I've been making and remaking a bookmarking/notetaking site for my personal use over the years, and this is the solution I landed on. They look like and can be organized like folders, but you can quickly add items to multiple folders. I think it's working well for me so far.
That’s awesome. Just this past week I’ve started looking for a Chrome extension that does the same.
Doesn’t that always happen? Some people will push the tools so far that they lose all their initial usefulness.