When I say “alphabetical order”, I mean “alphabetical order”

sebastiano.tronto.net

armchairhacker

I agree with Microsoft/Google/KDE's order. The author's situation is extremely rare, and the situation where someone wants "10" to be before "9" is far more common. Moreover, desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criterion), so technically they're not lying.

> I miss the time when computers did what you told them to, instead of trying to read your mind.

You may be looking at that time through rose-tinted glasses. I don't like when computers lie to me either, but "mind-reading" is really helpful in ways we take for granted, like autosave. Desktops can have an option to sort files truly alphabetically, but the more common case should always be the default; that's the definition of "intuitive".

* https://news.ycombinator.com/item?id=45404022#45405279

Wowfunhappy

I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

I don't want to put leading zeroes before all the single-digit numbers in my file names. (And then potentially come back later and add even more leading zeroes once the maximum number reaches three digits.)

---

I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

This works, but it looks kind of ugly and creates extra work—yes I have scripts to automate it, it's still an extra step—and it would be great if I could just trust that every device will understand numbers.

AdieuToLogic

> I don't want to put leading zeroes before all the single-digit numbers in my file names.

> ... it would be great if I could just trust that every device will understand numbers.

Strings are not numbers, even if some part of their content "looks like a number."

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

So what are programs to do?

Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.

Wowfunhappy

> Display strings in a consistent, documented, manner.

IMO, "Treat any sequence of digits as a number for the purpose of sorting" is consistent. I'm not sure if it's documented—I've never needed to look up the documentation—but if it's not, the developers could certainly fix that.

> this is your preference for a specific situation.

Sure, but we generally make decisions based on which situations we think will be most common. I think having ten or more things (screenshots, audio samples, whatever) named "Thing 1" – "Thing 10" in a folder is extremely common. And if Thing 10 comes before 9, it's really annoying!

Let's say I have a directory of 32 numbered files. Under the author's preferred sorting method, they'll get displayed:

If I download a folder with files like this, I basically have to pause whatever I'm doing and edit the files to have leading zeroes before I can make sense of what I'm looking at.
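The ordering described above is easy to reproduce. A minimal sketch, assuming the 32 files are named `1.txt` through `32.txt` (hypothetical names, chosen only for illustration) and sorted with plain character-by-character comparison:

```python
# Hypothetical file names 1.txt .. 32.txt, used only for illustration.
names = [f"{n}.txt" for n in range(1, 33)]

# sorted() on strings compares code point by code point,
# i.e. the strict "lexicographic" order discussed in the thread.
lexicographic = sorted(names)

for name in lexicographic:
    print(name)
```

The listing starts with 1.txt, 10.txt, 11.txt and only reaches 2.txt after 19.txt; 9.txt comes last.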

throw10920

> Strings are not numbers, even if some part of their content "looks like a number."

Irrelevant and intentionally obtuse. Filenames can't be anything but strings - there's literally no way to mark part of a filename as "this is an integer", so the idea that "strings are not numbers" is ridiculous because the only way to encode numbers (which people constantly want to encode) is as part of a string - which means that parts of filenames are numbers, because that's exactly how people use them.

> Problem is, this is your preference for a specific situation. Which may not be another person's preference in the same situation nor yours in a different situation.

> So what are programs to do?

> Display strings in a consistent, documented, manner. Which is lexicographical ordering in all cases lacking meta-data to indicate otherwise.

These do not follow from each other.

First, the assertion that "people's preferences are different, so we shouldn't pick an overwhelmingly common preference" is laughably false. The vast majority of computer users (which happen to not be people on HN) prefer "sort numbers by number rather than by UTF-8 value", so that's simply the correct way to sort.

Second, even regardless of the above, there's nothing preventing a "by name" sorting from being consistent and documented.

Either way, this line of reasoning is just wrong.

Scarblac

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense

Strictly speaking 9, 1 and 0 are not in the alphabet so can't be sorted alphabetically.

And I think most "normal users" wouldn't expect that programmers generalize the alphabet like we do.

tripzilch

> This works, but it looks kind of ugly

Maybe I'm weird but I prefer the way zero padding looks :)

I personally think the misalignment of lines where the numbers have different lengths looks (a lot) uglier than having zero padding. Sometimes it even throws _me_ off because the numbers have different lengths and ... well it just doesn't look sorted to me! :)

So the bonus of zero padding is that it'll be sorted correctly even if the file manager tries to be "smart" and sort incorrectly.

marcosdumay

Well, that's not alphabetical order.

It's great if DEs build this and give it a name. It's even better if they have a different one that deals with SI prefixes too. But it's not good if "alphabetical order" means that.

pseudalopex

What desktop environment called this alphabetical?

pixl97

I mean, nine does come before ten in alphabetical order.

sandreas

> I will add that I'm plenty "smart" enough to understand that "10" comes before "9" in a strictly alphabetical sense, and I still want my file managers to sort "9" before "10".

Amen.

> I split all of my audiobooks into chapters. I use the format "Chapter 01.mp3" (or "Chapter 001.mp3" when there are > 99 chapters) because some (all?) MP3 players are too stupid to sort numbers properly and I want my audiobooks to work everywhere.

Well, some car and kitchen radio manufacturers will probably never get this right. In my car (which tends not to be brand new) they even messed up UTF-8 chars, which gets me laughing every time a track has them. It's become a running gag with my wife, "Oh, listen up, it's &%=?! again".

> (all?)

Well, I kind of hate to say this, but Apple got this right with the iPods. They even regarded the metadata fields `sort-*` (e.g. sort-album), movement-name (for series) and movement-index (for part). With these fields they really group and sort my audio books as I expect it to be.

I even wrote my own software to fill these tags appropriately, so that I don't need to split my audio books. I'm pretty happy using `m4b` files - an mp4 / m4a container with chapter support, which is supported perfectly fine on my iPod Nano 7g and my Android Phone (using Audiobookshelf[1] and Voice[2]). After all these years, the iPod Nano 7g to me is the PERFECT portable audio book player with 2 exceptions: Repairability and the proprietary Apple headphone remote protocol [3].

1: https://audiobookshelf.org

2: https://github.com/PaulWoitaschek/Voice

3: https://tinymicros.com/wiki/Apple_iPod_Remote_Protocol

Wowfunhappy

There’s a couple of reasons I don’t use m4b files:

- A lot of my audiobooks come as mp3, and converting to m4b (which is AAC based) would mean losing quality.

- Some MP3 players (even those that support AAC) don’t support M4B.

- I want playback to stop automatically at the end of a chapter, unless I actively decide to start the next chapter. (Admittedly, some MP3 players don’t have an option for this anyway and will always start the next track. This annoys me.)

- Even with chapter metadata, I find it difficult to seek through a 10+ hour m4b file. Seeking through a 10 – 60 minute chapter is more manageable. (Of course, this doesn’t always work out; A Memory of Light has a single chapter that’s more than ten hours long. Whatever, I want to split in a way that follows the author’s structure, and Sanderson purposefully chose to write one extremely long chapter.)

I probably sound like I regularly switch between 20+ different models of MP3 player. In fact, I mostly use my computer or iPhone these days; however, I expect my audiobook collection to outlast any one piece of hardware.

fsckboy

[flagged]

eloisius

And maybe someone else uses “American” style dates in their file names mm-dd-YYYY, can those also be put in correct order for those users?

Jaxan

That is just silly notation used by a minority in this world ;-)

derriz

I'm not sure I agree. I think I could be convinced if there was a unique and universal representation for numeric values using characters.

But we have so many textual representations of numeric values that I'm assuming the "mind-reading" goodness only works for a small subset. And the subset will be somewhat intuitive for developers but unlikely to be so for non-technical people.

For example, does the order handle numbers with fractions (decimal points)? If yes, does it require at least one leading digit (zero)? Does a.12345 come before or after a.345?

Does it handle thousand separators? What about international thousand and decimal separators (e.g. Euro-style . for thousand separation and , for decimal separation).

Does it handle scientific notation?

If the answer is no to any of these questions, it's likely to lead to surprise/confusion.

It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.

Certhas

The sort rules are simple (1). Treat any consecutive sequence of digits as a number when sorting. So for example version numbers (which must be massively more common than decimals in filenames) work correctly, and 5.9 is indeed smaller than 5.10 and the latter is not identical to 5.1 .

Given that this idea goes back more than two decades, has been the default behaviour of the most used OSes for many years, with no major outcry, I think empirically we can be fairly certain that it does not routinely lead to a lot of surprises and confusion.

(1) https://en.m.wikipedia.org/wiki/Natural_sort_order
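The rule stated above fits in a few lines. This is only a sketch of the idea, not what any particular OS or file manager actually runs; `natural_key` is a hypothetical helper name, and it ignores case and locale entirely:

```python
import re

def natural_key(name):
    """Split a name into text and digit runs; digit runs compare as integers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, re.split alternates text (even index)
    # and digit runs (odd index), so keys always line up type-for-type.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

versions = ["v5.10", "v5.9", "v5.1"]
print(sorted(versions))                   # ['v5.1', 'v5.10', 'v5.9']
print(sorted(versions, key=natural_key))  # ['v5.1', 'v5.9', 'v5.10']
```

With this key, 5.9 sorts before 5.10, and 5.10 stays distinct from 5.1 because the runs "10" and "1" become different integers.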

derriz

> The sort rules are simple

In considering the simplicity of the rule, I think you're using a developer's perspective here, where we automatically classify numbers and have a clear mental model of the separation between value and representation.

But I'm not sure how simple it would be to explain to a non-technical user why size_5, size_10 and size_15 are in order but size_0.25, size_0.5 and size_0.75 are out-of-order.
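To make that concrete, here is a small sketch that applies the "every digit run is an integer" rule to both sets of names. The helper `digit_run_key` is hypothetical, and real file managers may handle the dot differently:

```python
import re

def digit_run_key(name):
    # Digit runs (odd positions after the split) compare as integers.
    return [int(p) if i % 2 else p for i, p in enumerate(re.split(r"(\d+)", name))]

whole = ["size_15", "size_5", "size_10"]
fractions = ["size_0.75", "size_0.25", "size_0.5"]

print(sorted(whole, key=digit_run_key))      # ['size_5', 'size_10', 'size_15']
print(sorted(fractions, key=digit_run_key))  # ['size_0.5', 'size_0.25', 'size_0.75']
```

The fractional names come out as 0.5, 0.25, 0.75 because the rule compares the runs after the dot as the integers 5, 25 and 75.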

> with no major outcry

I'm regularly amazed at how little non-developer/technical users complain about strange and confusing behavior.

xigoi

> Treat any consecutive sequence of digits as a number when sorting.

Based on this description, I have no idea how the following would be sorted:

• photo.jpg

• photo1.jpg

• photo01.jpg

• photos.jpg

tobyjsullivan

There is a standard algorithm - CLDR collation. There are several options available but, generally speaking, it’s a standard.

The specific option for numeric sorting is “kn”.

As far as I can tell, every operating system and many other interfaces tend to use this standard algorithm.

https://www.unicode.org/reports/tr35/tr35-collation.html#CLD...

ori_b

> If the answer is no to any of these questions, it's likely to lead to surprise/confusion.

Worse, if the answer is yes to any of these questions, it's also likely to lead to surprise/confusion. The only way to win is not to play.

systoll

The entire idea that numbers would be treated on a character by character basis rather than as numbers is somewhat intuitive for developers and not for non-technical people.

The answer to all of those questions is no for lexicographic ordering. Lexicographic ordering leads to surprise and confusion as a result.

> It's like a feature request that initially sounds reasonable and useful but once you explore the requirements in detail you realize there are too many edge cases to be able to meet the request in a non-brittle way.

It's been on windows and macOS for coming up on 25 years, and is in practically every modern UI. It’s reasonable.

queenkjuul

Are filenames likely to include those representations? I feel like probably not (can you even include commas in Windows filenames?)

More to the point of the article--if you want things sorted by date, sort by date. I think most laypeople aren't looking at long CHAR1234_5678 filenames anyway, they're looking at thumbnails and dates.

mulmen

> can you even include commas in Windows filenames?

Yes.

> Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following reserved characters:

• < (less than)

• > (greater than)

• : (colon)

• " (double quote)

• / (forward slash)

• \ (backslash)

• | (vertical bar or pipe)

• ? (question mark)

• * (asterisk)

https://learn.microsoft.com/en-us/windows/win32/fileio/namin...

crazygringo

> if you want things sorted by date, sort by date

Unfortunately it doesn't work. When I copy the files, they all get new dates in whatever random order they happened to be copied in.

derriz

The most common date format used in Europe uses period separators so can often appear in filenames. Commas are probably more rare. Things like versions are often fractional like v1.3 or v1.11 and can appear embedded in filenames.

coldtea

Ah, the classic filenames with decimal points and scientific notation in them, so common...

II2II

Here's a different scenario: filenames with dates in them. Consider September Budget and October Budget. September is the equivalent of 9, October of 10. Which comes first for natural sorting? Remember, the file modify date may not be useful here since you may have wrapped up the September budget on October 1st while the prior edit to the October budget may have been on September 20th.

The problem is that there is no such thing as natural, and it is quite hard to determine what is more common. (Quite often more common is culturally dependent or, worse, context dependent).

Certhas

Agreed. What's more, the idea that people learn to put leading zeros is wrong and impractical, unless you know in advance how many digits you need. When you go from version 5.9.17 to 5.10.0 you don't go back and relabel every existing folder as 5.09.17.

Today's standard way of sorting is well defined, unambiguous, and natural. Lexicographic has its place, but user facing interfaces ain't it.

whatevertrevor

Had this in the Beat Saber mod manager recently. The game released 1.40.10 and my mod manager suddenly thought that game went backwards from 1.40.9
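A guess at what goes wrong in a case like this, as a sketch only: nothing here is taken from the actual mod manager, and `parse_version` is a hypothetical helper that assumes purely numeric, dot-separated versions.

```python
def parse_version(text):
    """Turn '1.40.10' into (1, 40, 10) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))

old, new = "1.40.9", "1.40.10"

# Comparing the raw strings looks at '1' vs '9' in the last component.
print(new > old)                                # False

# Comparing integer tuples sees 10 > 9.
print(parse_version(new) > parse_version(old))  # True
```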

hakfoo

I had a similar fun problem with a little tool for use with an ATSC TV tuner.

For context, while NTSC program selections were typically indexed by channel ("ABC here is channel 4, NBC is channel 6"), ATSC uses "subchannels" like "12.1" or "21.5". I had assumed these could be safely stored as a decimal type.

Then one of the broadcasters here introduced both "42.1" and "42.10" and it broke the key model in the underlying SQLite database I kept the channel info in.
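The collision is easy to see outside the database too. A minimal sketch, assuming labels always have the form `major.minor`; `parse_channel` is a hypothetical helper, not part of the original tool:

```python
from decimal import Decimal

# As numbers, the two labels collapse into one value.
print(Decimal("42.1") == Decimal("42.10"))  # True
print(float("42.1") == float("42.10"))      # True

def parse_channel(label):
    """Split a 'major.minor' label into two integers (assumed format)."""
    major, minor = label.split(".")
    return int(major), int(minor)

# As a pair of integers, they stay distinct and sort as expected.
print(parse_channel("42.1"), parse_channel("42.10"))  # (42, 1) (42, 10)
```

Keeping the label as text, or as two integer columns, would avoid treating the dot as a decimal point.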

worik

Just no

User interfaces that try to be clever are a pita.

Keep it simple, and avoid the confusion with corner cases that otherwise will baffle users. Like this

TuringTest

Lexicographic order is great when you need an unambiguous criterion that will work the same in every implementation; but you only need that for automated processing, i.e. for coding.

For user-facing presentation, having 5.9.xxx before 5.10.xxx is simpler; the corner case that baffles users is having 5.1 and 5.10 before 5.2.

rs186

LOL I can tell you don't have the experience of designing UI and shipping product to end users

> Keep it simple

What's simple? Good defaults make things simple, which means putting 9 before 10 in this case, for the reason explained by the parent.

ploxiln

I think the only problem is that it's a surprise and mystery, particularly because "dumb" alphabetical sort has existed forever. When they "fixed this" for the 99% of regular users' cases, they should have made it a separate "smart natural sort" option, distinct from the "strict alphabetical sort" option (next to date, size, etc). Simple and obvious, rather than surprisingly different from the decades of experience that even non-technical users already have.

wvenable

It's not just the one decision though; there are literally thousands, maybe tens of thousands, of these decisions in most software. You want every single one of them to have an option? You want it to support every single combination? At some point, it is ridiculous. Sometimes you just have to decide how your software is going to work and not leave every single decision to the user.

eviks

You don’t leave every decision to the user, you make good defaults, but leave the option to override to the user! And thousands isn’t scary as long as groups/tags/search work, so what’s ridiculous about empowering the user?

mcdonje

How the files sort seems kinda important. It gets at the core behavior of the program. It's not something superficial like a default icon, which the user probably can change.

oneeyedpigeon

It may be one of thousands of decisions, but it's one of a handful that are exposed in the user interface as a fundamental action.

ploxiln

There's such thing as too many options, and there's also such thing as too few. This is one of the important ones. I'd say that macOS, Gnome, and Windows have definitely hidden or removed a lot of important options in the past decade, and despite the modern slickness mesmerizing people into thinking they're easier to use, they're actually harder to use as a result.

(I say this as a professional developer and power-user of all 3 desktops over the past 25 ish years, who also helps non-technical family and friends a few times every year. Some people will be like "oh I'm so bad at computers lol" or "oh this is a piece of junk huh" but really the UI just got dumber in the name of "ease of use", and the expert has to be called in to decipher it.)

lstamour

I might be wrong on this, but I vaguely recall that on macOS back when you could commonly option-click to reveal advanced options, if you held option when clicking a sort it would change how it sorted from alphabetical to lexical or vice versa. I’m not a thousand percent sure of it, though, I think when I needed it I was able to set a directory preference via terminal to change how a specific directory was sorted and it was an option there. MacOS had (or has) a lot of buried options which I presume date back to its origins as a Unix as well as a convenience to its developers. A lot of the command line utilities were hacked calls to graphical settings code though, so it wasn’t very stable version to version as the UI calls changed and nobody prioritized non-UI bug fixes or breaking changes. These days CLI is nearly forgotten or assumed to be an exploit vector - see Screen Time data for example.

armchairhacker

But the alternative would be a surprise to people who assume "by name" will order numbers, including those who are new to technology (and I think most non-technical people who sort things manually unknowingly order numbers).

We want to minimize surprises and mysteries, but computers have so much hidden complexity it's impossible to eliminate them. If users were shown a full description of how every feature on their computer worked before using it, they'd quickly start ignoring the descriptions. There should probably be a tooltip or "manual entry" for "by name" for those who are curious, and it should never be labeled "alphabetical" because it's not. But cases like the author's, where he assumes a feature works differently than most people (including the designers) assume, can't be helped.

SkiFire13

> and the situation where someone wants "10" to be before "9" is far more common.

I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.

> desktops don't label this sorting "alphabetical" (E: and it would really be "lexicographic"*), they label it "by name" (an informal criterion), so technically they're not lying.

FYI the more formal name for the "by name" order is "natural sort order".

messe

> I guess you mean "after"? Otherwise it seems to me you're agreeing with OP.

Depends on which direction you're sorting in, no?

DrammBA

> Depends on which direction you're sorting in, no?

In a vacuum: yes. In this particular case: no, because we have the article's context clarifying that we're talking about ascending order.

zweifuss

You mean file9 before file10?

I have some beef with Microsoft: you can only change this at the computer level, not per user (see registry key below). Also, they call it natural sorting for users, but logical sorting internally. Unify your termini!

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer] "NoStrCmpLogical"=dword:00000001

nerdile

To change it per user, set it in the user's hive instead of in the local machine hive (e.g. HKEY_CURRENT_USER instead of HKEY_LOCAL_MACHINE)

yegle

TIL they are called "hives". Windows Registry is an interesting thing. Even casual users have to interact with it once or twice w/o fully understanding it.

https://learn.microsoft.com/en-us/windows/win32/sysinfo/regi...

pdonis

> I agree with Microsoft/Google/KDE's order.

I don't. I want string sorting to be string sorting. Filenames are strings.

I wouldn't mind if there was an option to tell the file manager to do this "wrangle numbers out of strings and treat them as numbers" thing--so that I could turn that option off, and others who want that behavior could turn it on.

But for this to be the default, without even a way to change it (except in Dolphin, it looks like)? That seems daft to me.

Btw, I use Trinity Desktop, and I just verified that in TDE's version of Konqueror, the sorting of filenames is the same as for ls on the command line, e.g., 'item-10.txt' comes before 'item-9.txt'. Another good reason for me not to have switched to a more "modern" desktop.

> The author's situation is extremely rare

I don't think it is. But that's really beside the point. The computer is my tool. If it doesn't do what I want or expect it to do, it's a bad tool for me. And designers of tools shouldn't be making assumptions about how I want to use it. They should be giving me ways to tune it to how I want to use it.

> "mind-reading" is really helpful in ways we take for granted, like autosave.

I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.

zapzupnz

I generally agree with your points (and love TDE) but

> I don't use autosave either. I don't want the computer to assume when I want to save a file. The computer is too stupid to know that.

That’s why, with auto save systems, you flag/name a version as your canonical save point.

Rather like a video game, I’d rather have the autosaves and not need them, because I generally save the game myself, than not have them at all.

A computer can be helpful and obedient at the same time, when it’s done correctly and puts the user in control.

pdonis

> with auto save systems, you flag/name a version as your canonical save point.

You mean each saved version is stored separately, like a version control system?

A system like that would be fine (in fact I use version control all the time for this kind of thing). But that's often not how auto save is implemented; the auto save just clobbers the last version you saved. That's the kind I don't use.

whycome

The file sorting isn’t something relegated to niche users because of the prevalence of tv episode file name sorting (eg S01e01) and it has necessitated the leading zeroes to make it work properly with “alphabetical sorting”.

queenkjuul

And that would sort correctly with both methods, though, especially when each "field" is delineated (e.g. Show.S0XE0Y.Episode.Name.HEVC.1080p.mkv)

whycome

You’re saying that files with s1e10 and s1e9 would place 9 first?

epistasis

This is reminding me of the whole "Worse is better" essay and debate:

https://news.ycombinator.com/item?id=27916370

The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.

For their purposes, they've assumed something about the implementation, to the point that a convenience feature is actually a misfeature for them. But the author here is probably a developer, or close to one, so they do not represent the needs of most people using computers.

Understanding the target audience for your product results in very different design decisions. Better is better might be great for products, but worse is better is probably better for systems that need to grow and evolve.

wvenable

It's an issue of mental models. As a developer, his mental model is one of how naive software would sort items with mixed numbers in them. Most people, of course, naturally sort 10 after 9 -- their mental model doesn't contain software developer assumptions.

BeFlatXIII

> The author wants the "worse" sort, one based on ASCII/Unicode codepoints, without any intelligence for numbers that 99% of GUI users want.

I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?

_9ptr

And where do you sort the letter ä? (After a is correct in German, but I think Swedish does it differently.)

dvdkon

This feels like the right moment to mention "ch", which is considered a letter in orthodox Czech, sorted between "h" and "i". The problem is, you can't reliably distinguish between "ch"-the-letter and "ch" as just "c" and "h" combined, which are present in loan words but also some original Czech compound words.

So if you're doing it "properly", sorting strings in Czech involves understanding the etymology of every word.

jcynix

That's why we have all this LC_* stuff in Linux, which you can configure to your needs:

  export LC_MEASUREMENT="de_DE"
  export LC_MONETARY="de_DE"
  export LC_PAPER="de_DE"
  export LC_CTYPE="de_DE.UTF-8"
  export LC_MESSAGES="en_US.UTF-8"
  export LC_RESPONSE="en_US.UTF-8"
  export LC_TIME="en_US.UTF-8"

Mix in your Swedish or Swahili, maybe even the Vatican State:

   e.g. de_DE, sw_TZ, it_VA (not guaranteed ;-).

sebtron

> I want the author's opinion on how capital and lowercase letters should be sorted. Do they follow strict ASCII/Unicode codepoints, or do they normalize into actual alphabetical order and sort upper/lower within each letter?

I prefer the strict ASCII / Unicode sorting (all capitals first, then all lowercase).
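The two options from the question can be compared side by side. A small sketch with made-up names; the second sort is just one plausible reading of "sort upper/lower within each letter", not a standard:

```python
names = ["banana", "Apple", "apple", "Banana", "cherry"]

# Strict code-point order: every capital sorts before every lowercase letter.
print(sorted(names))
# ['Apple', 'Banana', 'apple', 'banana', 'cherry']

# Letters grouped regardless of case; the original string breaks ties,
# so the capitalized form comes first within each letter.
print(sorted(names, key=lambda s: (s.casefold(), s)))
# ['Apple', 'apple', 'Banana', 'banana', 'cherry']
```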

jowea

Asciibetical sorting

card_zero

> most people using computers

> the target audience

Which is it? Those should be different groups.

"Most people" have incoherent ideas that can't even be used. So instead a designer cherry-picks some ideas - setting the agenda - and declares that they're popular. That doesn't make them good ideas. Also, "most people" are easily influenced and will like the terrible things that they've been told to like.

AlienRobot

>Understanding the target audience for your product results in very different design decisions

This is an excuse. Just add an option to sort both ways. It isn't hard.

There is no target audience in this planet that benefits from less options or less features. Even if you had the features under an "advanced mode" UI that's still better software than not having the feature in the first place.

Have people forgotten the 80/20 rule? Most features will be used by only a small slice of users, that doesn't mean they're out of scope.

Sorry, I'm just kind of exhausted of software not being able to do the most obvious things because it didn't align to some perfect vision of how the user should be.

pteraspidomorph

> There is no target audience in this planet that benefits from less options or less features.

I'm currently involved in UI design and, to my frustration, adding more options or features seems to send a vocal minority of the user base into a foaming-at-the-mouth violent rage. It's like any change resets the entire contents of their brain, and it's our fault we're making things so confusing for everyone...

And let's not get started on how we're wasting time adding things that they don't personally need, and therefore no one could possibly need, ever. No, clearly by adding this sorting method, we must have directly stolen development time from the feature they want, which is a personal attack directed at them and every member of their family going three generations back.

efreak

> any change resets the entire contents of their brain

That's because it does. Consistency is incredibly important.

The problem isn't that you're adding a feature, the problem is that you're adding a feature in an obtrusive way. Add as many features as you like (while preserving performance), but keep the day-to-day UI as stable as you possibly can. Place entry points (buttons) for new features in menus first, and make sure they're both used frequently and by many users before moving them to a crowded toolbar (and then give good thought about where it belongs on said toolbar/menu). Don't remove features unless they're truly problematic, and don't change UI.

bmn__

It is best to not engage with these demons.

KDE welcomes configurable complexity, Gnome deemphasises it. I am glad that broad user choice exists.

userbinator

The most irritating circumstance for this is looking for files named with a hash:

    3ea4f...
    ...
    97dce...
    ...
    126b9...

This is one of the settings I immediately turn off on Windows via the registry key mentioned in the other comments here.
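The listing above is what a digit-run sort produces for names that begin with hex digits. A sketch using only the visible prefixes from that listing; `natural_key` is a hypothetical helper and not necessarily what Explorer does internally:

```python
import re

def natural_key(name):
    # Digit runs (odd positions after the split) compare as integers.
    return [int(p) if i % 2 else p for i, p in enumerate(re.split(r"(\d+)", name))]

hashes = ["126b9", "97dce", "3ea4f"]  # prefixes taken from the listing above

print(sorted(hashes))                   # ['126b9', '3ea4f', '97dce']
print(sorted(hashes, key=natural_key))  # ['3ea4f', '97dce', '126b9']
```

The leading runs 3, 97 and 126 are compared as integers, so a name starting with "1" ends up after one starting with "9".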

> I miss the time when computers did what you told them to, instead of trying to read your mind.

These days, it's more like "trying to change your mind". I absolutely hate the "the user is wrong" authoritarian mentality that unfortunately has infected a ton of software, even open-source.

krick

Exactly. This is even more annoying when it isn't exactly a hash, but some gibberish you cannot really make sense of that has a numeric section in it: like a user ID, or unix time, or who knows what else it could be. You are trying to visually find a file abcd89764237 somewhere after abcd683426834, and it isn't evident why you cannot, until you notice that the latter has more digits in its "ID" for some reason.

antonyh

It looks like GTK & KDE both suffer from this - I get this behaviour in Thunar and in Dolphin. This is the kind of thing that makes me lose sleep. It's the same on MacOS too, at least in the latest version.

zahlman

> Well, apparently all these operating systems have decided that no, users are too dumb and they cannot possibly understand what alphabetical order means. So when you ask them to sort your files alphabetically, they don’t. Instead, they decide that if some piece of the file name is a number, the real numerical value must be used.

Well, no. You don't actually ask them to sort in alphabetical order. You ask them to sort "by name", and that is up to their interpretation. And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.

Maybe future versions of those OSes will add a rule that says that if any of the number groups have leading zeros then it reverts back to actual alphabetic order. Or maybe they'll give you configurable options. (Maybe some of them already do.)

jameshart

Clearly a leading zero means the number is in octal (but only if all the subsequent digits are between 0 and 7). I think that would lead to the most intuitive results.

sebtron

> And they choose the interpretation that (per their reasoning, and possibly some actual data) seems most likely to correspond to what the user wants.

Yes, that makes sense, but the problem is that this interpretation changed in the last 10 (15? 20?) years. It used to be that "by name" meant "by name, in alphabetical / lexicographical order" in pretty much every file manager.

pseudalopex

Microsoft and Apple changed to natural order in 2001.

janc_

It never was "alphabetical" but rather an order determined by the numeric index into the used encoding table.

KuSpa

Reminds me of https://xkcd.com/1172/

johanyc

nice. first time seeing this

JoshTriplett

I almost always want the version-sorting that's being presented in this article, rather than an "alphabetical" sort. But on the other hand, it absolutely seems like a valid bug that this is presented as an "alphabetical" sort, rather than something like "alphabetic/numeric" or similar. In other words, a problem of labeling rather than one of sorting.

plorkyeran

It’s not being presented as an alphabetical sort, though. The author assumed that sorting by name meant an alphabetical sort, but that’s not how it’s labeled.

lsaferite

In fairness, sorting by name has, for many years, been an alphabetic sort. Doing a mixed alpha/numeric sort is a relatively new thing.

pseudalopex

Natural sorting is relatively new in KDE, but Windows has had it since 2001.

lisper

Yeah, exactly. The behavior described is actually very useful. The problem is imposing it on the user with no warning or option to turn it off.

sebtron

Author here - I agree both with you and with the parent comment. Having two options in the "sort by" menu - like "Name (natural)" and "Name (strict)" or something - would have solved everything.

parineum

> The problem is imposing it on the user with no warning or option to turn it off.

You can say that about every single design decision made about every product.

The gripe about this particular feature seems misplaced, because almost all users will want the sort that's offered. Actual alphabetical sort is more likely the desire of an advanced user, who is in fact offered a choice through registry editing and/or a more advanced CLI option for the occasions when they need an alternative sort.

This is a sensible default.

lisper

> You can say that about every single design decision made about every product.

No, that's not true. Many aspects of my computer's UI are user-configurable.

bee_rider

Notably, some versions of “sort” on Linux have version sort nowadays. sort -V

I actually don’t know exactly how it works internally and it is a little bit magical, but I use it all the time when looking through my files because it just sorta works in most cases. Of course, a nice thing about it is that it is easy to turn on or off.

xerox13ster

The term for the sort in the article is lexical, but the problem is that people are stupid.

The average user does not know the difference between lexical and alphabetic sort.

Someone

https://www.unicode.org/reports/tr10/#Contextual_Sensitivity:

“There are additional complications in certain languages, where the comparison is context sensitive and depends on more than just single characters compared directly against one another,

[…]

Numbers. A customization may be desired to allow sorting numbers in numeric order. If strings including numbers are merely sorted alphabetically, the string “A-10” comes before the string “A-2”, which is often not desired. This behavior can be customized, but it is complicated by ambiguities in recognizing numbers within strings (because they may be formatted according to different language conventions). Once each number is recognized, it can be preprocessed to convert it into a format that allows for correct numeric sorting, such as a textual version of the IEEE numeric format.”

I think those file browsers made the right choice, even given that they don’t (as in this example) always do the right thing.
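The preprocessing idea in the quoted passage can be sketched in Python. This is not the IEEE-style encoding the report mentions; it swaps in plain zero-padding, which is simpler and only works for digit runs shorter than the padding width. The names `PAD_WIDTH` and `padded_key` are made up for illustration.

```python
import re

# Assumption: no digit run in a name is longer than this many characters.
PAD_WIDTH = 12


def padded_key(name):
    """Rewrite every run of ASCII digits as a fixed-width, zero-padded run.

    After this rewrite a plain string comparison orders the digit runs
    by numeric value, because all runs have the same length.
    """
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(PAD_WIDTH), name)


names = ["A-10", "A-2", "A-1"]

print(sorted(names))                  # ['A-1', 'A-10', 'A-2']
print(sorted(names, key=padded_key))  # ['A-1', 'A-2', 'A-10']
```

The sort key is an ordinary string, so it can be stored or fed to any tool that only knows plain string comparison.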

afrisch

But -10 is smaller than -2, right?

JoshTriplett

Filenames rarely have negative numbers in them, and it'd usually be ambiguous whether they were negative or dash-separated positive.

ZoomZoomZoom

I know you jest, but this just further demonstrates why Natural Sorting is complicated and might not be the best default choice.

my_photos_at_-3c

my_photos_at_-10c

Do users want smaller numbers first, or do they want them in counting order, away from zero?
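A small Python sketch of the ambiguity, assuming a simple regex-based natural sort (the helper `key_with` and both key names are hypothetical, not taken from any file manager). The only difference between the two keys is whether a hyphen directly before the digits counts as part of the number.

```python
import re


def key_with(pattern):
    """Build a sort key that splits a name on `pattern`.

    The pattern has exactly one capturing group, so re.split returns
    text and number pieces alternately: odd positions are the numbers.
    """
    def key(name):
        parts = re.split(pattern, name)
        return [int(p) if i % 2 else p for i, p in enumerate(parts)]
    return key


hyphen_as_separator = key_with(r"([0-9]+)")    # "-3" is a dash, then 3
hyphen_as_minus = key_with(r"(-?[0-9]+)")      # "-3" is negative three

names = ["my_photos_at_-3c", "my_photos_at_-10c"]

print(sorted(names, key=hyphen_as_separator))
# ['my_photos_at_-3c', 'my_photos_at_-10c']
print(sorted(names, key=hyphen_as_minus))
# ['my_photos_at_-10c', 'my_photos_at_-3c']
```

Both results are defensible, which is the point: the file name alone does not say which reading the user had in mind.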

pwdisswordfishz

That's a hyphen, not a minus sign, silly.

meindnoch

I thought this was pretty well known. E.g. the macOS Foundation library even exposes NSString.localizedStandardCompare() [1] which implements the sorting algorithm used by Finder, and should be used by any well-behaved macOS application. Windows uses StrCmpLogicalW [2].

[1] https://developer.apple.com/documentation/foundation/nsstrin...:)

[2] https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/...

freetime2

I would have assumed it worked the same as ls, so I found the article interesting. But now that I know, I think this way is better.

I can’t think of any case where I would need purely alphabetical sort. In most photo browsing apps, photos will be sorted by timestamp rather than filename. If I really needed it to sort properly in file explorer, I would try sorting on created date. And failing that I would probably just normalize the file names.

nielsbot

I tried it just for kicks.

The Finder sorts these as:

    IMG_20250820_095716_607.jpg
    IMG_20250820_103857_991.jpg
    IMG_20250820_103903_811.jpg
    IMG_20250820_055436307.jpg
    IMG_20250820_092016029_HDR.jpg
    IMG_20250820_092440966_HDR.jpg
    IMG_20250820_092832138_HDR.jpg

Whereas `ls -l` gives me

    IMG_20250820_055436307.jpg
    IMG_20250820_092016029_HDR.jpg
    IMG_20250820_092440966_HDR.jpg
    IMG_20250820_092832138_HDR.jpg
    IMG_20250820_095716_607.jpg
    IMG_20250820_103857_991.jpg
    IMG_20250820_103903_811.jpg
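Both listings can be reproduced with a few lines of Python. This is only an approximation: the Finder's real comparison is locale-aware, while the hypothetical `natural_key` below just splits on ASCII digit runs and compares them by value.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs; compare digit runs as integers.

    re.split with one capturing group alternates text and digits, so the
    pieces at odd positions are always the digit runs.
    """
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


ls_order = [
    "IMG_20250820_055436307.jpg",
    "IMG_20250820_092016029_HDR.jpg",
    "IMG_20250820_092440966_HDR.jpg",
    "IMG_20250820_092832138_HDR.jpg",
    "IMG_20250820_095716_607.jpg",
    "IMG_20250820_103857_991.jpg",
    "IMG_20250820_103903_811.jpg",
]

# Plain string comparison gives the same order as the ls listing.
print(sorted(ls_order) == ls_order)  # True

# Natural order: "095716" is read as 95716, which is smaller than
# 55436307, so the names with an underscore before the milliseconds
# move to the front, as in the Finder listing.
for name in sorted(ls_order, key=natural_key):
    print(name)
```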

kens

Sorting so "foo9" is before "foo10" is called natural sort. I found out about natural sort a week ago and I am thrilled that my programs now print their output in a sensible order. Give natural sort a try and see if it improves your life too :-)

I found the magic two lines of Python to do a natural sort here, by the way: https://stackoverflow.com/questions/11150239/natural-sorting...

jcynix

Natural sort is an option in sort(1):

  for i in $(seq 2 10) ; do
    touch img_$i-hn.txt
  done

  ls img_* | sort -V
  img_2-hn.txt
  img_3-hn.txt
  img_4-hn.txt
  img_5-hn.txt
  img_6-hn.txt
  img_7-hn.txt
  img_8-hn.txt
  img_9-hn.txt
  img_10-hn.txt

And we have "sort -h" to sort the output of e.g. "du -sh *" properly.

Edit: formatting and add sort -h
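For comparison, a rough Python sketch of what a human-readable size sort has to do. It is not how sort(1) implements -h; it assumes binary (1024-based) suffixes and lines shaped like `du -sh` output, and `human_size` plus the sample lines are made up for illustration.

```python
import re

# Assumption: suffixes are powers of 1024, as du -h prints them.
EXPONENT = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4}


def human_size(line):
    """Convert the leading size of a 'SIZE<tab>PATH' line to bytes."""
    match = re.match(r"([0-9]+(?:\.[0-9]+)?)([KMGT]?)", line)
    if match is None:
        raise ValueError(f"no size at start of line: {line!r}")
    number, suffix = match.groups()
    return float(number) * 1024 ** EXPONENT[suffix]


lines = ["1.5G\tvideos", "12K\tnotes", "800M\tphotos", "4.0K\tempty"]

print(sorted(lines))
# ['1.5G\tvideos', '12K\tnotes', '4.0K\tempty', '800M\tphotos']
print(sorted(lines, key=human_size))
# ['4.0K\tempty', '12K\tnotes', '800M\tphotos', '1.5G\tvideos']
```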

d1sxeyes

I am surprised how many people are comfortable calling sorting numbers alphabetical sorting (including TFA).

In true alphabetical sorting, sorting numbers is undefined behaviour. Both of these sorting methods are valid extensions of alphabetical sorting, and which you prefer is just that: a preference.

So actually when he says ‘alphabetical order’, he does not, in fact, mean ‘alphabetical order’.

geon

Yes. This is called ”natural order”.

thrance

I personally call it "ASCII sorting", or "UTF-8 sorting".

bapak

Maybe it's just me but I don't miss this at all:

  Image-1.jpg
  Image-11.jpg
  Image-2.jpg

The only time natural sort bit me was with nonsensical names like <md5>.jpg

ziml77

Nah that's not just you. That is an unnatural way to sort things because that's not how numbers are ordered. I remember when Windows changed to sorting numbers by their value and, despite my programmer brain finding it strange in a way, I was super happy to have files display in an order that actually made sense.

RHSeeger

I think it depends on the person. That order is exactly what I expect and want.

jbeninger

Same here. I was surprised at everyone here who prefers the more-complicated-but-arguably-more-intuitive natural sort. Naive alphabetical sorts break some expectations, but don't produce any weird edge cases.

I wonder if there's an age divide at play here, where those of us who grew up with the naive alphabetical sort prefer it.

sneak

More importantly, it is how computers work, and how computers have worked for many decades.

Anyone with experience expects them to work this way. Trying to be clever to cater to the inexperienced only harms both groups.

crazygringo

Computers have been sorting with natural sort for decades. By now, it is "how computers work".

Were you under the impression this was something new?

crazygringo

You prefer looking at photos in that weirdly particular shuffled order that isn't the order they were taken in?

sparkie

The mistake is software which doesn't follow a recognized standard for date/time representation in its filenames, i.e., RFC 3339, ISO 8601 or their union/intersection[1] (but preferably just ignore ISO 8601 because it's overcomplicated and RFC 3339 is simpler and more intuitive).

In OP's examples, the filenames are YYYYMMDD_hhmmssssss, which is neither valid ISO 8601 nor valid RFC 3339, as the former doesn't accept underscores (only 'T'), and the latter doesn't accept basic format dates (YYYYMMDD), only the equivalent of extended format (YYYY-MM-DD).

And if dates in file names simply used the extended format, the problem disappears. The lexical order is the natural order.
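A quick Python check of that claim, under some assumptions: every timestamp is written in UTC with whole seconds, so all names have the same width. The naming scheme `photo_name` and the `natural_key` helper are hypothetical, and the colons would not be allowed in file names on Windows.

```python
import re
from datetime import datetime, timedelta, timezone


def photo_name(moment):
    """Hypothetical scheme: RFC 3339 extended format, UTC, whole seconds."""
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"IMG_{stamp}.jpg"


def natural_key(name):
    """Digit runs compared by value, everything else as text."""
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


start = datetime(2025, 8, 20, 5, 54, 36, tzinfo=timezone.utc)
moments = [start + timedelta(minutes=53 * i) for i in range(6)]
chronological = [photo_name(m) for m in moments]

# Scramble the names, then sort them both ways.
scrambled = chronological[::2] + chronological[1::2]

print(sorted(scrambled) == chronological)                   # True
print(sorted(scrambled, key=natural_key) == chronological)  # True
```

Because every field is zero-padded to a fixed width, comparing the digits as text and comparing them as numbers give the same answer, so the two sort orders cannot disagree.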

Alternatively, file managers that treat any digits as a number should be improved to recognize when a sequence of digits is not actually a number but a date/time, and order those chronologically. This might occasionally produce a few false positives, but I'd suspect it would be a rare occurrence.

[1]:https://ijmacd.github.io/rfc3339-iso8601/

kalleboo

If I want to sort by date, I sort by the "Date" column, not the file name

PetahNZ

If I wanted to sort by date taken I would do just that using the EXIF data on them.

dfxm12

I get it, but if all these major operating systems are handling this same ambiguous [0] situation in the same way, perhaps one needs to reevaluate their mental model or expectations.

Am I out of touch? No, it's the operating systems who are wrong

0 - numbers are not part of the alphabet.

pbw

"I created the Alphanum Algorithm to solve this problem. The Alphanum Algorithm sorts strings containing a mix of letters and numbers. Given strings of mixed characters and numbers, it sorts the numbers in value order, while sorting the non-numbers in ASCII order. The end result is a natural sorting order."

https://web.archive.org/web/20210207124255/http://www.daveko...

JoshTriplett

There are many older instances of that, such as "versionsort" from various Linux tools and libraries. I think this has likely been independently recreated several times, with various subtle differences.
