amluto
JonathonW
UUIDs (as GUIDs) in Windows predate RFC 4122, so I don't think it's that unreasonable that they're not compliant with the spec (since the exact contents of a UUID aren't typically that important, but consistency in how you produce them is).
The GUID implementation in Windows derives from the DCE RPC specification [1]. That's where the multibyte integers in the RFC 4122 specification come from (they're stated the same way in the DCE RPC spec), and it doesn't explicitly specify their endianness. It does call them "NDR integers", but NDR integers can be little endian or big endian depending on implementation. DCE specifies a mechanism by which you'd indicate which you're using in an RPC call, but that data's not included in the UUID format-- you get whatever byte order the system's decided to use, which, for Windows, is little-endian.
[1] https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm#tagc...
firebird84
It's actually worse than that. The first 3 groupings (textually) of the uuid might be little endian while the other 2 are big endian. Learning this cost me more time than I care to admit.
https://en.wikipedia.org/wiki/Universally_unique_identifier#...
formerly_proven
Making mixed-endian even more haunted is quite the achievement. I congratulate whoever did this at Microsoft for their lasting contribution.
amluto
This is consistent with the misguided structure in the RFC. The first three fields (the time fields) are multibyte integers. The remainder is just bytes. The dashes in the textual representation are just there to confuse you.
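For anyone who wants to see the two layouts side by side, here is a minimal sketch using Python's standard `uuid` module. The sample value is arbitrary; `bytes` is the all-big-endian RFC 4122 layout and `bytes_le` is the layout with the first three fields byte-swapped, which matches the Windows behaviour described above.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# RFC 4122 layout: every field in big-endian (network) byte order.
big_endian = u.bytes

# Mixed-endian layout: time_low (4 bytes), time_mid (2 bytes) and
# time_hi_and_version (2 bytes) are byte-swapped; the last 8 bytes are unchanged.
mixed_endian = u.bytes_le

print(big_endian.hex())    # 00112233445566778899aabbccddeeff
print(mixed_endian.hex())  # 33221100554477668899aabbccddeeff

# Reading mixed-endian bytes as if they were big-endian yields a different UUID.
misread = uuid.UUID(bytes=mixed_endian)
print(misread)             # 33221100-5544-7766-8899-aabbccddeeff
```

Nothing in the 16 bytes says which layout was used, which is why the mix-up is so easy to make.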
throwaway9870
I had to write some EFI/GPT code earlier this year and was dumbfounded to learn this. This is right up there with the mork file format.
qwerty456127
Why would this matter (except when you build an index for a big database)?
Note:
> 6.8. Opacity: UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible.
legulere
It matters when you translate UUIDs between binary and text form and pass the binary UUIDs to other code. The other code might expect the UUIDs in a different encoding.
cpach
What the…?
This fact will haunt me in my dreams :-p
blueflow
This mishap lives on in the UUID stored in a machine's DMI data, as well as in GPT partition tables, which are required when EFI is used. It would be really cool if we had some replacement for EFI that would not harbour this kind of painful legacy.
dtech
On the other hand, this is an easy to implement conversion, while changing such a fundamental thing from EFI sounds pretty hard, making it not worth it.
blueflow
You cannot fix this with a conversion because you do not know whether your UUID is correct or needs conversion. DMI data, for example, has inconsistent endianness depending on the vendor. So if you have a UUID sticker on a new server, there are still two possibilities for which UUID the machine will send during PXE: the printed UUID in big-endian encoding or the one in Microsoft's mixed-endian encoding.
Use BIOS boot instead of EFI; it has less legacy to implement. EFI needs PE executables, the FAT file system, and the Win64 ABI.
WorldMaker
Raymond Chen's post on UUID sort orders has been indispensable to me at various points: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=10...
It's fun to note that the worst possible UUID sort order in existence isn't Microsoft's fault but Sun's. They missed the "unsigned" catch to those integers in the struct, and Java sorts UUIDs as signed integers. (Which is why you'll sometimes notice, for instance in Android apps, that GUIDs get sorted such that 8000... < FFFF... < 0000... < 7FFF...)
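A small sketch of that ordering, assuming (as I understand Java's `UUID.compareTo`) that the value is compared as two signed 64-bit halves, most significant half first. The helper names and sample values are made up for illustration.

```python
import uuid

def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value

def java_style_key(u: uuid.UUID) -> tuple:
    # Split into most/least significant 64-bit halves and treat each as signed.
    msb = u.int >> 64
    lsb = u.int & ((1 << 64) - 1)
    return (to_signed64(msb), to_signed64(lsb))

samples = [
    uuid.UUID("00000000-0000-0000-0000-000000000000"),
    uuid.UUID("7fffffff-0000-0000-0000-000000000000"),
    uuid.UUID("80000000-0000-0000-0000-000000000000"),
    uuid.UUID("ffffffff-0000-0000-0000-000000000000"),
]

unsigned_order = sorted(samples, key=lambda u: u.int)
signed_order = sorted(samples, key=java_style_key)

print([str(u)[:8] for u in unsigned_order])  # ['00000000', '7fffffff', '80000000', 'ffffffff']
print([str(u)[:8] for u in signed_order])    # ['80000000', 'ffffffff', '00000000', '7fffffff']
```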
throw868788
He's linked it here: https://devblogs.microsoft.com/oldnewthing/20190913-00/?p=10.... It's a good read. The LE format, especially with UUIDv7-like UUIDs (sequential UUIDs), seems to be easier/faster to sort just using binary sorting on LE architectures, which is most computers nowadays. Interesting to see all these tradeoffs.
Someone
> Hint: do not use integer types in C code for portable data structures. ntohl, etc are a mess. Just use arrays of bytes.
I don’t see how that helps much. If a developer forgets to call ntohl on multi-byte integer fields, I don’t trust them to correctly convert said integers to arrays of bytes, either.
kevincox
If you write it the naive way it works.
uint8_t bytes[4];
uint32_t value = (bytes[0] << 24) + (bytes[1] << 16) + (bytes[2] << 8) + bytes[3];
The endianness is whatever you write in the indexing and will be the same across architectures.
e4m2
Unfortunately, the naive way also turns out to be wrong in C. uint8_t gets promoted to a signed int when shifting, which in turn causes undefined behavior for specific input. One way of fixing this is casting to the desired type before the shift, thus avoiding surprising conversions.
On a side note, compiler warnings and sanitizers help with this kind of stuff greatly, use them if you have the option: https://godbolt.org/z/8oq9GTcze
secondcoming
I thought Windows uses GUIDs, not UUIDs
takeda
It looks like GUID is synonymous with UUID, but the name GUID also implies that it could contain that bug/feature mentioned[1]
[1] https://en.wikipedia.org/wiki/Universally_unique_identifier#...
quotemstr
Little endian won. I don't see the point of maintaining theoretical compatibility with big endian systems that are at best esoteric right now and soon to be extinct. Likewise, every reasonable platform aligns data fields on natural alignment these days. It's just a waste of effort to make software portable to evolutionary dead ends.
mort96
This isn't about compatibility with big-endian machines. This is about compatibility between different uuid libraries, potentially on different operating systems, all on little endian CPU architectures.
ntauthority
Every existing library follows the standard which specifies host byte order, which usually means little endian. The Rust library cited a few levels up this chain ignored that, somehow assuming big endian, and then had to correct for this mistake.
wolverine876
You're saying it won for this application (UUIDs)? Universally?
What is the most common remaining use of big endian?
tech2
TCP/IP might be a reasonable example. There's a reason "network byte order" and "big endian" are the same thing.
GiorgioG
I used to be a big proponent of using UUIDs for database PKs but I've found them inherently difficult to work with. It's much easier to remember/recognize an integer based PK when troubleshooting a data problem.
This isn't to say you shouldn't use UUIDs at all, but I much prefer to use an "ExternalId" column of UUID type if you don't want to expose your integer based PKs externally.
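A minimal sketch of that layout, using an in-memory SQLite database from Python's standard library. The table and column names (`users`, `external_id`) are hypothetical; the point is only that the integer key stays internal while clients see the UUID.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,            -- internal key, never exposed
        external_id TEXT NOT NULL UNIQUE,  -- UUID handed to clients
        name TEXT NOT NULL
    )
    """
)

def create_user(name: str) -> str:
    """Insert a row and return the external UUID, not the integer key."""
    external_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (external_id, name) VALUES (?, ?)",
        (external_id, name),
    )
    return external_id

def find_user(external_id: str):
    """Look a row up by its external UUID; returns None if there is no match."""
    return conn.execute(
        "SELECT id, name FROM users WHERE external_id = ?", (external_id,)
    ).fetchone()

alice = create_user("alice")
print(find_user(alice))  # (1, 'alice')
```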
cameronh90
UUIDs have a few distinct advantages: you'll never run out, you don't need a roundtrip to find out what they are after saving them, they often make a good partitioning key and it makes things easier if you ever need to combine multiple data sources together in migration and recovery type scenarios. I also quite like how they're unique across all data sources and tables, so if you just encounter a random contextless UUID in the wild, for example in a support ticket, you can probably still find what it refers to.
They are quite unwieldy though. There are a few compact representations you can use in URLs which make it a bit less ugly, but they can make your database and logs quite bloated, in particular if you've got a large number of small records.
feoren
> if you ever need to combine multiple data sources together in migration and recovery type scenarios
This insane idea that combining data sources is a rare event in some unusual "migration and recovery" scenarios is one of the most poisonous and yet pervasive ideas in all of database design. You are always combining multiple data sources, all the time. Users submitting data from a form is a data source. Test, staging, and production deployments, with multiple of each. External APIs. Multiple clients. Eventual consistency. Replication. Microservices with distributed systems. Or even sharing any common data at all between different systems, like unit conversions, chemical data, country names, engineering constants, etc.
Anyone who even considers using a single authoritative source for all entity identity had better be making a system in an underground bunker that will never talk to any other system. Otherwise they are making a serious and extremely avoidable mistake. Never use auto-incrementing IDs.
It's even wrong in a monolith! Why does everyone abandon this idea of "separation of concerns" and "single responsibility principle" and "bounded contexts" and proper abstraction and limited communication between system parts and literally every design principle they've ever been taught when they go to design a database? It all just goes out the window! Why do you guys bother reading books about system design if you ignore them when you build a database? "Multiple systems communicating with each other" should apply recursively all the way from deployment and external integration down to individual functions. That means database, too. Auto-incrementing IDs are anathema to that.
bruce511
While your tone might be a tad hyperbolic, I agree with your basic premise.
If I could go back and tell my 30 year ago self one tip, it would be to use uuids over auto-increments. And this is back when that was expensive - in disk space and database time.
Instead I'm stuck with my design, and as time has passed the real cost of auto-Inc has slowly revealed itself.
What's interesting to me though is that this view is not universal. I get a lot of push-back when promoting uuids, but I can really only speak to my experience.
remram
A problem here is that UUIDs are not actually a silver bullet against collision. You will never generate a UUID that will collide with someone else's, but it is easy for someone else to collide with you, either on purpose or because they made changes to data from your platform before sending it back (PKs tend to leak in URLs, APIs, data exports, etc).
I find that I have to implement a system for re-numbering incoming data on migration/import anyway, so the advantage of UUIDs is not that huge.
CharlesW
> There are a few compact representations you can use in URLs which make it a bit less ugly…
Any thoughts on where to find best-practices guidance? I need to create an external ID scheme for several million items. hashids (hashids.org) seems interesting, but I have anxiety about choosing a solution with weaknesses that I can't identify given my current level of experience in regards to this.
munawwar
I took a shot at the math behind this at https://www.codepasta.com/databases/2020/09/10/shorter-uniqu...
Using the equation listed in the article I couldn't generate a collision so far. Yet, I still check (in code) for id collision, and pick new id, just to be 100% sure.
selcuka
You may consider encoding the uuid in base58 as it is shorter, safe to use in URLs, and easier to debug as it doesn't use characters that can be confused such as 0, O, I, and l.
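A rough sketch of what that could look like, assuming the Bitcoin-style Base58 alphabet and a fixed width of 22 characters (58^22 is just over 2^128). The function names are mine, not from any particular library.

```python
import uuid

# Bitcoin-style Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def uuid_to_base58(u: uuid.UUID) -> str:
    """Encode the 128-bit integer value of a UUID as fixed-width Base58."""
    n = u.int
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # 58**22 > 2**128, so 22 characters always suffice; pad to a fixed width.
    return "".join(reversed(digits)).rjust(22, ALPHABET[0])

def base58_to_uuid(s: str) -> uuid.UUID:
    """Inverse of uuid_to_base58."""
    n = 0
    for ch in s:
        n = n * 58 + ALPHABET.index(ch)
    return uuid.UUID(int=n)

u = uuid.uuid4()
short = uuid_to_base58(u)
print(u, "->", short)
```

That gives 22 characters instead of the usual 36, and the mapping is reversible, so the database can keep storing the plain 16-byte value.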
RhodesianHunter
Something like Snowflake ids has many of these advantages while being integers.
Merad
> UUIDs have a few distinct advantages: you'll never run out
I'm curious what kind of applications are limited by the range of bigint values? I have no doubt that such applications exist somewhere, but most software engineers won't ever come close to encountering those limits. Even if you have a table that is consistently consuming a billion (with a B) bigint id values _every second_ (is that even feasible with current hardware and RDBMS software?) you won't run out for almost 300 years.
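The back-of-the-envelope arithmetic, as a quick sketch (assuming a signed 64-bit bigint and a year of 365.25 days):

```python
MAX_BIGINT = 2**63 - 1          # largest signed 64-bit value
IDS_PER_SECOND = 1_000_000_000  # one billion ids consumed every second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = MAX_BIGINT / IDS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.0f} years")  # 292 years
```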
jandrewrogers
Some famous RDBMS bugs had their root cause in this kind of reasoning, because the model embeds assumptions on how those IDs will be generated and used which may not hold true in the future for reasons that are difficult to anticipate.
For example, Oracle had a 48-bit ID rollover bug many years ago that by all calculations should never occur in real systems. This calculation was made under the assumption that the IDs were mostly actually used. However, many features added later necessitated generating or reserving vast numbers of IDs in bulk, a low-cost optimization, the vast majority of which were ultimately discarded. It got to the point where very large systems started running out of these IDs due to the fact that such a low percentage were used in the way the designers had anticipated.
Extremely large systems do not run into the limitations of bigint because at that scale the identifiers are naturally segmented, often implicitly.
Gigachad
For some reason a lot of things used to default to 32-bit primary keys, and you absolutely did run out of those; it's been a major issue for a lot of apps for the last few years. But yeah, you'll never run out of bigints. SQL servers couldn't cope with the amount of data that would require.
thayne
Also, using an incrementing id can lead to information disclosure if the ids are visible in user-facing APIs. For example, if I know an id for my user, document, cart, etc., then I know all ids lower than that are probably also valid.
zbuf
"Unique IDs" _can_ be really easy to work with if they're not so bafflingly complicated.
A random string generated using quality randomness can be adjusted to length to suit the quantity of data (negligible probability of a collision) which in most cases is very short.
It's easy to increase the length as you get more data.
They are visually very different for each item of data.
They're evenly spread which means they hash/index well.
You can tune a subset of characters if you want to decrease ambiguity eg. when exchanged by voice (no zero vs. letter O, upper/lower case etc.)
And a final bonus: when working with user input, only a short prefix is needed to uniquely identify an item (in contrast, it seems like UUIDs deliberately share a common prefix).
I'm very happy to concede I must be missing something here, and would be interested to know. But the above approach has served me well in a range of uses.
I can see how UUIDs work, and perhaps "looks like a UUID" is a useful feature. But reading the URL above and a bit of Wikipedia doesn't give me much to go on as to _why_ any of this is happening, and why the hyphens aim to retain meaning to what is ostensibly a 'unique' number.
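To make the approach concrete, here is a minimal sketch. The alphabet is a hypothetical choice (lowercase letters and digits minus the look-alikes 0/o and 1/l/i), and the length is picked with the usual birthday approximation; none of the names come from an existing library.

```python
import math
import secrets

# Hypothetical alphabet: 23 letters + 8 digits, look-alikes removed.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"

def random_id(length: int) -> str:
    """Random string drawn from ALPHABET using the OS's secure randomness."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def collision_probability(num_ids: int, length: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N))."""
    space = len(ALPHABET) ** length
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))

def min_length(num_ids: int, max_probability: float) -> int:
    """Shortest id length that keeps the collision probability under the bound."""
    length = 1
    while collision_probability(num_ids, length) > max_probability:
        length += 1
    return length

needed = min_length(1_000_000, 1e-6)
print(needed, random_id(needed))
```

Growing the data set just means calling `min_length` again with a bigger `num_ids` and issuing longer ids from then on.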
jasonwatkinspdx
Using purely random ids in your database destroys locality. They mention this in the introduction:
> Non-time-ordered UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic.
The V7 ids work similarly to what you like, as they're just a unix timestamp and 74 bits of pseudorandom data (they present several different schemes you could use to generate this randomness), but the basic birthday bound says we'd need to be above 100 billion ids generated in a single millisecond to worry about collisions. Obviously most systems are nowhere near that territory.
So using these ids gives you the practical advantages of random unique ids, but with the performance of autoincrement ids.
jandrewrogers
100 billion UUIDs per millisecond is the 50% collision probability threshold. Achieving an acceptable collision probability for most applications would limit the UUID generation rate to more like thousands of UUIDs per millisecond.
Even if one was not generating millions of UUIDs per second on average, the risk of spiky temporal distributions when generating UUIDs would still need to be considered.
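These numbers can be sanity-checked with the birthday approximation. A rough sketch, assuming 74 random bits per id and ids drawn uniformly at random within a single millisecond (the function names are mine):

```python
import math

RANDOM_BITS = 74  # random bits per id, as stated earlier in the thread
SPACE = 2 ** RANDOM_BITS

def collision_probability(ids_per_ms: int) -> float:
    """Chance of at least one collision among ids created in the same millisecond."""
    return -math.expm1(-ids_per_ms * (ids_per_ms - 1) / (2 * SPACE))

def ids_for_probability(p: float) -> float:
    """Inverse of the approximation: n ~= sqrt(2N * ln(1 / (1 - p)))."""
    return math.sqrt(2 * SPACE * -math.log1p(-p))

for n in (1_000, 1_000_000, 100_000_000_000):
    print(f"{n:>15,} ids/ms -> p = {collision_probability(n):.3g}")
print(f"50% threshold: {ids_for_probability(0.5):.3g} ids/ms")
```

Note this is per millisecond; the chance of a collision somewhere over the lifetime of a system adds up across every millisecond in which ids are generated, which is why the acceptable rate is far below the 50% threshold.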
zbuf
Thanks for drawing my attention to that, it's the useful answer I was looking for; my use cases haven't been bound by write performance in this manner. However, I'd still be considering carefully before making use of these UUID schemes.
jxcole
We used almost this exact scheme for app id indices, and the curious problem we had to design against was inadvertent profanity. At some point we decided to just never use vowels, to avoid ever having a complaint about 12f*ck in the URL.
jasonwatkinspdx
Another approach is to use something like EFF's dice words lists. One of the smaller lists in particular is interesting as it's 6^4 words, filtered for profanity, and where all words have both a unique 3 letter prefix and an edit distance of 3. That makes them robust for the use case of someone reading out the phrase to someone typing or such.
Never using vowels is a smart idea I wish I'd used in the past. Previously when I've needed something like this I've used other dictionary lists vs EFF's, and those were not curated sufficiently to avoid some really unfortunate combinations.
manigandham
Use integer IDs and a library like Hashids for friendly alphanumeric representations: https://hashids.org/
This particular implementation is available in dozens of languages.
manigandham
> "They're evenly spread which means they hash/index well."
What do you mean by this? Why would you hash it further? Hash distribution is primarily down to the hashing algorithm, not the input data.
Also indexes are better with somewhat ordered and smaller data. A 64-bit int sequential counter is much faster and half the size, and compatible everywhere without the annoyances of a UUID.
chrismorgan
I’ve been working on a robust scheme for encrypted sequential IDs, which is done, including library implementations in Rust, JavaScript and Python, pending just a smidgeon more writing about it and reviewing a decision on naming. You store an integer in the database, then encrypt it with a real block cipher, and stringify with Base58. I have three modes: one for 32-bit IDs, using Speck32/64 and producing 4–6 character IDs; one for 64-bit IDs, using Speck64/128 and producing 8–11 character IDs; and one hybrid, using the 32-bit mode for IDs below 2³² and the 64-bit mode above that, providing both a forwards-compatibility measure and a way of producing short IDs as long as possible. Contact me (see my profile) if you’re interested, or I’ll probably publish it in another day or two. Trouble is that I’ve been getting distracted with other related concepts, like optimally-short encoding by using encryption domains [0, 58¹), [58¹, 58²), …, [58¹⁰, 2⁶⁴) (this is format-preserving encryption; the main reputable and practical choices I’ve found are Hasty Pudding, which I’ve just about finished implementing but would like test vectors for but they’re on a dead FTP site, and NIST’s FF1 and FF3, which are patent-encumbered), and ways of avoiding undesirable patterns (curse words and such) by skipping integers from the database’s ID sequence if they encode to what you don’t want, and check characters with the Damm algorithm. If I didn’t keep getting distracted with these things, I’d have published a couple of weeks ago.
(I am not aware of any open-source library embodying a scheme like what I propose—all that I’ve found have either reduced scope or badly broken encryption; https://github.com/yi-jiayu/presents encrypts soundly, but doesn’t stringify; Hashids is broken almost beyond belief and should not be considered encryption; Optimus uses an extremely weak encryption.)
UUIDs are crazy overkill in any situation where you can have centralised ID allocation. Fully decentralised? Sure, 128 bits of randomness or mixed clock and randomness or similar, knock yourself out. But got a master database? Nah, you’re just generating unreasonably long values that take up unnecessary space and make for messy URLs and such.
iroddis
> UUIDs are crazy overkill in any situation where you can have centralised ID allocation.
Except that’s specifically the use case of UUIDs: to have a decentralized method to generate unique IDs with minimal chance of collisions. If you have centralized control, of course there will be options with more attractive properties: they aren’t dealing with the same constraints.
chrismorgan
Sure, decentralised allocation is the intent of UUIDs, but in practice UUIDv4 is used extremely widely where it’s completely unnecessary, because it’s a convenient way of generating non-sequential IDs with good tooling support (UUID libraries, UUID data types in some databases, &c.).
mappu
I've done something similar to obfuscate private DB IDs in a large existing application - just ensure they're all Skip32-encoded in all query parameters with an app-wide secret.
It works well but you have to be very disciplined to catch every case individually. Using GUID PKs from the start just removes this entire category of problem.
chrismorgan
If it requires discipline, you’re doing it wrong. For best results, it should be threaded through your ORM or equivalent so that you can’t do it wrong, that at your app layer you never get raw IDs.
yencabulator
I dabbled with doing that. XTEA is a 64-bit block cipher, which suits 64-bit IDs pretty well, and well you can't really ask for much more without using larger IDs. I use z-base-32 to stringify -- I considered base58, but I think case-sensitive URLs are not nice.
Looks like
user=1 doc=1
-> /users/a7e34gz71r4ig/documents/xs69f1c878rzq
user=42 doc=13
-> /users/am8hng8rnoopg/documents/9othzs4tgujrw
I have the code in Go (it's near-trivial), but ended up doing something else for that project.
Also, I hate that Postgres doesn't actually have unsigned integers.
kortex
> It's much easier to remember/recognize an integer based PK when troubleshooting a data problem.
How often are you actually relying on memory of an ID to troubleshoot a problem? I mean sure, if you are scanning visually, it's good to recognize the same ID over and over again, but my ability to do so caps around 4-6 characters. So I just look at the last 4 chars regardless when fast scanning.
I use copy-paste for any time I need to transport IDs between contexts (that isn't just scripted, which is best). Having a copy-paste stack (Alfred, Raycast and others have this feature) is a huge game changer here.
marcus_holmes
I have exactly the reverse experience.
I do comparisons using ```<uuid fieldname>::text like '<first few chars of the uuid>%'``` when debugging in a command line. Or just copy/paste the whole thing. Yes it's marginally more annoying than integers, but only marginally.
I have several times wondered why I was getting no match on a query. And then discovered that I was using a user_id on an account_id field. UUID's have saved me from shooting myself in the foot so many times.
Aeolun
How do UUID’s help in that situation? You get no result at all instead of the wrong result?
marcus_holmes
yes, that.
The extreme case (that I had and is the one where I was finally convinced that uuid's save me from myself) was setting admin permissions on a user. I accidentally copy/pasted their account_id instead of their user_id. If I had been using integers I would have given admin permissions to a random user and never been aware of it. But because I was using uuid's I got a nice, safe "updated 0 records" response and knew there was a problem.
tpetry
That's in my experience also the best approach. I wrote an article a few days ago about exactly this: an auto-incrementing integer PK with a UUID you use externally:
hn_throwaway_99
That is how I used to do (and currently still do) DB design, but honestly I think if DBs start supporting UUID v7s well that I would use that as the sole primary DB key as well as the external ID:
1. They are still sorted in increasing timestamp order (at millisecond granularity), so they should have good DB index characteristics.
2. At the same time, they contain 62 bits of randomness, which should pretty much eliminate IDOR attacks if there is a bug elsewhere that isn't doing proper access checks. Not good enough for secure tokens, but just good defense against access permission check bugs.
That is, you should basically get the best of both worlds: ordered keys with enough randomness to make ID-increment attacks infeasible.
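For illustration only, a sketch of how such an id could be assembled, assuming the field layout of a 48-bit millisecond timestamp, 4 version bits, 12 random bits, 2 variant bits and 62 random bits. `uuid7_sketch` is a made-up name, not a standard library function, and a real implementation should follow whatever the final spec says.

```python
import secrets
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a time-ordered UUID under the layout assumed above."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp in the top bits
    value |= 0x7 << 76                          # version nibble = 7
    value |= secrets.randbits(12) << 64         # 12 random bits
    value |= 0b10 << 62                         # variant bits
    value |= secrets.randbits(62)               # 62 random bits
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_650_000_000_000)
later = uuid7_sketch(1_650_000_000_001)
print(earlier, later, earlier < later)
```

Ids from different milliseconds sort by creation time, both as integers and as hex strings; ids within the same millisecond are ordered only by their random bits.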
RobertRoberts
Yep, I want UUID v7, because right now I am using ULID and it's fantastic, but I'd like more official and wide support as well.
tylerscott
Just seconding this as a sane way to use UUIDs IME. Basically the sequential integer PK is the “internal ID”. IIRC in SQLite regardless of type or existence of PK there is a private sequential integer. Super handy pattern to use.
ciupicri
Are you talking about ROWID? [1]
> Except for WITHOUT ROWID tables, all rows within SQLite tables have a 64-bit signed integer key that uniquely identifies the row within its table. This integer is usually called the "rowid". The rowid value can be accessed using one of the special case-independent names "rowid", "oid", or "_rowid_" in place of a column name. If a table contains a user defined column named "rowid", "oid" or "_rowid_", then that name always refers the explicitly declared column and cannot be used to retrieve the integer rowid value.
> [...] If an INSERT statement attempts to insert a NULL value into a rowid or integer primary key column, the system chooses an integer value to use as the rowid automatically. A detailed description of how this is done is provided separately. [2]
evil-olive
my personal favorite UUID replacement, when not concerned with external compatibility or standards-compliance: a 96-bit random value, base64-urlsafe encoded to 16 ASCII characters
>>> import secrets
>>> secrets.token_urlsafe(12)
'uUBpBk2eENDslHyw'
with UUID, you can store it as a compact 16 bytes in a database, but it needs 36 characters if you want to embed it in a URL or a JSON payload. and there's a temptation to be "clever" and strip out the hyphens to shave off 4 bytes and create a nonstandard UUID format. by comparison these have one single canonical representation that is always a 16-character URL-safe string.
citrin_ru
Being impossible to remember is, to me, a minor problem compared to how much RAM and HDD space is wasted on UUIDs across all systems being built nowadays. In addition to being larger than, say, a 64-bit integer ID, they are almost incompressible. DB compression will not reduce the space used by UUIDs in the DB; logs with UUIDs are bigger, and compression is not much help there either.
lewisl9029
I've been using ULID for a while now, which is analogous to UUID v7 but with a different (better IMHO) string representation. They've been awesome for using as sort keys in dynamo for instance, since they're lexicographically sortable as strings.
But one thing I'm still wary about is exposing these IDs with millisecond-precision time components to end users, since I've seen multiple discussions here on HN about the potential for timing attacks.
How worried should I really be? Do people have useful heuristics on the kinds of data where it's safe/unsafe to expose timing information, or should I just only expose a separate UUID v4 externally across the board just to be safe?
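To make the exposure concrete, here is a small sketch of the ULID string layout as I understand it: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford base32 characters. The helper names are mine; the point is that anyone holding the id can read the creation time back out of the first 10 characters.

```python
import secrets
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O or U

def encode_ulid(unix_ms: int, randomness: int) -> str:
    """48-bit millisecond timestamp + 80 random bits -> 26 base32 characters."""
    value = (unix_ms << 80) | randomness
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))

def decode_timestamp_ms(ulid: str) -> int:
    """Recover the creation time from the first 10 characters."""
    ms = 0
    for ch in ulid[:10]:
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms

now_ms = time.time_ns() // 1_000_000
ulid = encode_ulid(now_ms, secrets.randbits(80))
print(ulid, decode_timestamp_ms(ulid) == now_ms)
```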
djbusby
ULID has like 80 bits of random per millisecond. I found some calculator online that showed the probability of collision on 80bits was very, very low even across time much longer than millisecond. I think knowing when an ID was created is harmless - and if it isn't there is a design problem that is larger than ID choice.
There is another lib called hashid which can be used if masking that ID algo is important.
lewisl9029
> I think knowing when an ID was created is harmless - and if it isn't there is a design problem that is larger than ID choice.
I hope you're right, since I've been cautiously operating under that assumption so far. Cautiously, because of warnings, from people better versed in cryptography than I, in discussions like this one: https://news.ycombinator.com/item?id=29805433. I still haven't quite been able to internalize when this should vs shouldn't be something I should be concerned about, hence the comment.
doliveira
I came up with a scheme in which the random part of the ULID is a slice of the hash of a UUID, which seems to work fine because I'm generating it all server side, I guess? It works very well for insertion into NoSQL databases, for instance.
So I'd also like to know the threat model for these timing attacks.
stevesimmons
This had appeared a few times before at earlier stages of the draft process...
https://news.ycombinator.com/item?id=28088213 [244 comments]
Personally, I really like UUIDv7, except I transform the UUID to a 25-character string which has all the same properties except it doesn't look like a UUID. The last thing I want is my index of time-sortable UUIDs getting contaminated with some UUIDv4 fully random ones. Since UUIDs may be generated and persisted in a distributed manner, it's a simple way to at least spot this.
ikornaselur
The UUID version is in the actual ID, so you would be able to spot the version as it isn't fully random.
Although you might mean that just visually you'd spot the difference much more easily!
theptip
Can’t you just check the “version” bits and reject if it’s not 4/7? Or are you worried about someone generating a completely random (non-compliant) set of bits that happens to parse as a v4/7
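For what it's worth, the check itself is tiny. A sketch using Python's standard `uuid` module (`is_version` is a made-up helper; the v7 sample value is only an illustration):

```python
import uuid

def is_version(value: str, expected: int) -> bool:
    """True if the string parses as a UUID with the RFC 4122 variant and given version."""
    u = uuid.UUID(value)
    return u.variant == uuid.RFC_4122 and u.version == expected

random_id = str(uuid.uuid4())
print(random_id[14])               # '4': the version nibble leads the third group
print(is_version(random_id, 4))    # True
print(is_version(random_id, 7))    # False
print(is_version("017f22e2-79b0-7cc3-98c4-dc0c0c07398f", 7))  # True
```

It only proves the bits claim to be a given version, of course; random bits that happen to carry a 7 in that nibble would still pass.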
operator-name
With the introduction of UUIDv8, my fun script to generate vanity uuids[0] can finally be spec conformant!
davidjfelix
I've been staring at the supposedly readable and memorable uuid for like 4 minutes without any idea what it says.
operator-name
Yeah, looking back I could have done a bit more than simple substitution. Since the sentences don't have meaning, interpreting between 1 as i vs l is especially difficult.
5eedbed5-f05e-b055-ada0-d15ab11171e5
seedbeds-fose-boss-adao-disabilities
"Memorable" was definitely tongue in cheek, as was the spec bending.
kbumsik
UUIDv7 looks interesting, but how is it different from ULID [1] in practice? I was considering using ULID for an upcoming new project because it is lexicographically sortable, but it looks like UUIDv7 can just replace that.
[1]: https://cran.r-project.org/web/packages/ulid/vignettes/intro...
rekwah
As the author of a popular ULID implementation in python[1], the spec has no stewardship anymore. The specification repo[2] has plenty of open issues and no real guidance or communication beyond language implementation authors discussing corner cases and the gaps in the spec. The monotonic functionality is ambiguous (at best), doesn't consider distributed id generation, and is implemented differently per-language [3].
Functionally, UUIDv7 might be the _same_ but the hope would be for a more rigid specification for interoperability.
[1]: https://github.com/ahawker/ulid
kortex
I've been using ULIDs in python for about a year now and so far have been super happy with them, so a) thank you for maintaining this! b) I always felt a bit uneasy about the way the spec describes the monotonicity component. Personally I just rely on the random aspect as I am fortunate enough to say that two events in the same millisecond are effectively simultaneous.
At that point, it's basically just UUID7 with Crockford base32 encoding, more or less.
IMHO the in-process monotonically increasing feature of ULID is misguided. As you mention, distributed ids are a pain. The instant you start talking distributed, monotonic counters or orderable events (two threads count as distributed in this case), you need to talk things like Lamport clocks or other hybrid clock strategies. It's better to reach for the right tools in this case, vs half-baked monotonic-only-in-this-process vague guarantee.
RobertRoberts
Thank you, I've been using ULID for a while now, and it serves my purposes. But I have long term support concerns.
UUIDv7 really seems like the sweet spot between pure INT/BIGINT auto incrementing PKs and universally sortable universal ids.
duxup
I’m always amazed how much work goes into these.
Meanwhile 99% of the time I just call a function when “I just need something unique here”, -calls function-, and it works!
Thanks everyone!
Daegalus
I maintain the Dart UUID library. For anyone who uses Dart or wants to see one of many implementations of v6, v7 and a custom v8 UUID, feel free to look at my in-progress branch linked below. I plan to merge it in once they add different string representations in a future draft (I've been involved in the conversations).
kdps
My main concern with random-based UUIDs has always been running out of entropy and causing the application to remain in a blocking state (e.g. as described here: https://blog.fastthread.io/2022/03/09/java-uuid-generation-p...). Not due to any negative experiences of my own, but due to a particular colleague of mine who sees it as a dealbreaker.
Is this an actual issue? Most people don't seem to care when talking about random UUIDs. The target platform of our applications is mostly Kubernetes on cloud environments, if that makes any difference.
Why I'm asking: UUID Version 7 looks quite interesting to me, and the document describes rand_a and rand_b just as "pseudo-random data"... which made me think that in the context of "uniqueness per millisecond", a source of entropy is conceptually not required. However, chapter 6.6 clearly advises the usage of CSPRNGs, so I guess the overall problem remains :(
astrange
“Running out of entropy” is not possible; that’s a property of ancient PRNGs written by confused people.
Even if your PRNG could run out of entropy, rdrand would give it all it needs.
ciupicri
From the documentation for java.security.SecureRandom from Java 17 [1]:
> Note: Depending on the implementation, the generateSeed, reseed and nextBytes methods may block as entropy is being gathered, for example, if the entropy source is /dev/random on various Unix-like operating systems.
[1]: https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
astrange
Like I said, ancient PRNGs written by confused people.
yencabulator
On current day Linux, /dev/random will only block during bootup, never after it has once started.
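For what it's worth, a minimal Python sketch of drawing a random UUID straight from the OS CSPRNG. My understanding (an assumption, check the docs for your runtime) is that secrets.token_bytes goes through os.urandom, which on recent Linux can only wait until the kernel pool is initialized at boot and not afterwards. random_uuid4 is a hypothetical helper name; uuid.uuid4() in the standard library does essentially the same job.

```python
import secrets
import uuid

def random_uuid4():
    """Draw 16 bytes from the OS CSPRNG and stamp the version/variant bits."""
    raw = secrets.token_bytes(16)
    # version=4 makes the constructor overwrite the version and variant bits.
    return uuid.UUID(bytes=raw, version=4)

print(random_uuid4())
```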
swyx
(poster here)
this RFC is not new new, but is still pretty new and i was surprised to learn that UUIDv7 and v8 are being worked on.
the context is i keep a list of uuid impls and knowledge for my own reference. posted this up today simply because I got a PR from some subscribers https://github.com/sw-yx/brain/pull/36
gigatexal
Uuidv7 looks really interesting. I like that the article mentions all the other projects that attempted to fix uuid issues for different use cases.
nixpulvis
Maybe I'm being slow right now, but can someone help me understand why Max UUID is ever specifically useful?
okr
My guess is that it is just a constant, ready to be used for bitwise operations.
nixpulvis
Yep... I was just being slow. I thought the specialized variant was a new kind of UUID (being inverse of RFC4122), but it's just the inverse of RFC4122's Nil UUID; a single value.
Sorry for the silly comment. This value is just a bunch of binary 1s.
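A small Python sketch of the two constants, in case it helps. NIL_UUID and MAX_UUID are names chosen here for illustration. Using them as lower and upper sentinels for range checks is only one plausible use, alongside the bitwise-mask guess above.

```python
import uuid

NIL_UUID = uuid.UUID(int=0)               # all 128 bits are 0
MAX_UUID = uuid.UUID(int=(1 << 128) - 1)  # all 128 bits are 1

print(NIL_UUID)  # 00000000-0000-0000-0000-000000000000
print(MAX_UUID)  # ffffffff-ffff-ffff-ffff-ffffffffffff

# Every UUID compares between the two, so they can act as open-ended bounds.
some_id = uuid.uuid4()
print(NIL_UUID <= some_id <= MAX_UUID)  # True
```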
Fortunately this is a bit less relevant today as Windows loses market share in database and server applications, but:
UUIDs have historically massively screwed up endian handling. While this new draft discusses sorting UUIDs as strings of octets (bytes) and the text of RFC4122 is fairly explicit about most significant bytes coming first, the C UUID structure in RFC 4122 appendix A is entirely misguided:
Those aren’t bytes; they’re integers of various sizes. (Hint: do not use integer types in C code for portable data structures. ntohl, etc. are a mess. Just use arrays of bytes.)
I don’t know the whole history, but MS somehow took this structure at face value and caused problems like this:
https://github.com/uuid-rs/uuid/issues/277
So, if you want to do anything (e.g. sorting) that depends on the representation of a UUID (or even depends on converting between string and binary representations), be aware that UUIDs coming from Windows may be little-endian. In my book, this is a Windows bug, but opinions may differ here.
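Here is a short Python sketch of what that looks like at the byte level, using the standard uuid module's bytes (big-endian throughout) and bytes_le (first three fields little-endian) views. The example UUID value and the helper name from_windows_guid_bytes are made up for illustration.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

print(u.bytes.hex())     # 00112233445566778899aabbccddeeff
print(u.bytes_le.hex())  # 33221100554477668899aabbccddeeff

def from_windows_guid_bytes(raw):
    """Parse 16 bytes laid out Windows-style (first three fields little-endian)."""
    return uuid.UUID(bytes_le=raw)

# Reading the mixed-endian layout as plain big-endian silently yields another UUID.
wrong = uuid.UUID(bytes=u.bytes_le)
right = from_windows_guid_bytes(u.bytes_le)
print(wrong)  # 33221100-5544-7766-8899-aabbccddeeff
print(right)  # 00112233-4455-6677-8899-aabbccddeeff
```

Only the first three fields get swapped; the last eight bytes are identical in both layouts, which is why the mistake is easy to miss when eyeballing values.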