Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

tptacek

I'd probably use a Blake too. But:

> SHA256 was based on SHA1 (which is weak). BLAKE was based on ChaCha20, which was based on Salsa20 (which are both strong).

> NIST/NSA have repeatedly signaled lack of confidence in SHA256: first by hastily organising the SHA3 contest in the aftermath of Wang's break of SHA1.

No: SHA2 lacks the structure the SHA1 attack relies on (SHA1 has a linear message schedule, which made it possible to work out a differential cryptanalysis attack on it).

Blake's own authors keep saying SHA2 is secure (modulo length extension), but people keep writing stuff like this. Blake3 is a good and interesting choice on the real merits! It doesn't need the elbow throw.

pbsd

While there is more confidence now in the security of SHA-2, or rather in the lack of transference of the SHA-1 approach to SHA-2, this was not the case in 2005-2006 when NIST decided to hold the SHA-3 competition. See for example the report on Session 4 of the 2005 NIST workshop on hash functions [1].

[1] https://csrc.nist.gov/events/2005/first-cryptographic-hash-w...

honzaik

Also, the NSA is currently recommending to change SHA3/Keccak inside Dilithium and Kyber into SHA2-based primitives... https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/SPTp...

twiss

For those who didn't click the link, it should be noted that they're suggesting this because it would be easier to deploy (in places that have a SHA-2 implementation but not SHA-3), not for reasons related to security or anything like that. Looking at the responses, there's also some disagreement on whether it would offer equal security for the particular use case of ML-DSA and ML-KEM (as the final version of Dilithium and Kyber standardized by NIST will be called).

hellcow

> they're suggesting this because it would be easier to deploy (in places that have a SHA-2 implementation but not SHA-3), not for reasons related to security

That’s a bit absurd, right? Sure, the NSA didn’t overtly say, “we propose you use SHA-2 because we can break it.” That doesn’t mean it’s secure against them.

We can’t look at their stated justification for supporting one algorithm over another because the NSA lies. Their very _purpose_ as an organization is to defeat encryption, and one tactic is to encourage the industry to use something they can defeat while reassuring people it’s secure. We need to look at their recommendations with a lot of suspicion and assume an ulterior motive.

pclmulqdq

Most people who publicly opine on the Blake vs. SHA2 debate seem to be relatively uninformed on the realities of each one. SHA2 and the Blakes are both usually considered to be secure.

The performance arguments most people make are also outdated or specious: the original comparisons of Blake vs SHA2 performance on CPUs were largely done before Intel and AMD had special SHA2 instructions.

ianopolous

The author is one of the creators of blake3, Zooko.

tptacek

Sorry, I should have been more precise. JP Aumasson is specifically who I'm thinking of; he's made the semi-infamous claim that SHA2 won't be broken in his lifetime. The subtext I gather is that there's just nothing on the horizon that's going to get it. SHA1 we saw coming a ways away!

rainsford

Who I'm sure actually is informed, but in this particular case is tweeting things that do honestly sound like one of the uninformed commentators pclmulqdq mentioned. I'm not sure why, since as tptacek said, blake3 is good and maybe even preferable on its own merits without venturing into FUD territory. And if you still wanted to get into antiquated design arguments, picking on SHA256's use of a construction that allows length extension attacks seems like more fair game.

ianopolous

Would be interesting to hear Zooko's response to this. (Peergos lead here)

omginternets

What do you mean by "weak" and "strong", here?

_ugfj

There are fundamentally two kinds of attacks. The first is preimage, which itself splits into two:

In a first-preimage attack, you know a hash value but not the message that created it, and you want to discover any message with the known hash value; in the second-preimage attack, you have a message and you want to find a second message that has the same hash. Attacks that can find one type of preimage can often find the other as well. A weak algorithm allows this to be done in less than 2^(hash length) attempts.

And then there is collision: two messages which produce the same hash. A weak algorithm allows this to be done in less than 2^(half of hash length) attempts.

Source: https://www.rfc-editor.org/rfc/rfc4270
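A toy sketch (my addition, not from the thread) makes these bounds visible: truncate SHA-256 to 32 bits, and a birthday-style collision search succeeds after roughly 2^16 attempts, while a brute-force preimage would need on the order of 2^32.

```python
import hashlib
from itertools import count

def toy_hash(msg: bytes, nbytes: int = 4) -> bytes:
    """SHA-256 truncated to 32 bits, so the bounds are small enough to see."""
    return hashlib.sha256(msg).digest()[:nbytes]

def find_collision() -> tuple[bytes, bytes, int]:
    """Birthday search: expected to succeed after ~2**16 attempts,
    versus ~2**32 for a (first or second) preimage."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode()
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, i
        seen[digest] = msg
```

For a full-width 256-bit hash the same gap holds, just at 2^128 vs 2^256, both far out of reach.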

nabla9

Weak means that a mathematical flaw has been discovered that makes it inherently insecure, or that it is so simple that modern computing power makes a brute-force attack feasible. Strong means that neither applies.

undefined

[deleted]

gavinhoward

Good, terse article that basically reinforces everything I've seen in my research about cryptographic hashing.

Context: I'm building a VCS meant for any size of file, including massive ones. It needs a cryptographic hash for the Merkle Tree.

I've chosen BLAKE3, and I'm going to use the original implementation because of its speed.

However, I'm going to make it easy to change hash algorithms per commit, just so I don't run into the case that Git had trying to get rid of SHA1.

AdamN

Smart idea doing the hash choice per-commit. Just make sure that somebody committing with an obscure hash doesn't break the repo for everyone who doesn't have a library installed to evaluate that hash.

gavinhoward

I agree.

There will be a set of presets of hash function and settings; if BLAKE3 fails, then I'll actually have to add SHA3 or something, with a set of settings, as presets.

The per-commit storage will then be an enum identifying the hash and its settings.

This will let me do other things, like letting companies use a 512-bit hash if they expect the repo to be large.
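A minimal sketch of what that preset enum could look like (hypothetical names, my illustration; stdlib algorithms only, with BLAKE2 standing in for BLAKE3 since the latter isn't in hashlib):

```python
import hashlib
from enum import IntEnum

class HashPreset(IntEnum):
    """Per-commit hash identifier, stored as a small enum so every
    commit records exactly which algorithm and settings produced it."""
    BLAKE2B_256 = 1   # stand-in for BLAKE3 in this sketch
    SHA3_256 = 2
    SHA3_512 = 3      # e.g. for repos expected to grow very large

def hasher_for(preset: HashPreset):
    """Return a fresh hash object for the given preset."""
    if preset is HashPreset.BLAKE2B_256:
        return hashlib.blake2b(digest_size=32)
    if preset is HashPreset.SHA3_256:
        return hashlib.sha3_256()
    if preset is HashPreset.SHA3_512:
        return hashlib.sha3_512()
    raise ValueError(f"unknown hash preset: {preset!r}")
```

Because the enum value travels with the commit, old commits stay verifiable even after the default algorithm changes.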

FabHK

> letting companies use a 512-bit hash if they expect the repo to be large.

A repo would have to have more than 1e32 documents for a one in a trillion chance of a collision with a 256 bit hash. (Total annual world data production is estimated at less than 1e24 bytes.)

A 512 bit hash thus seems overkill for almost all purposes.

https://en.wikipedia.org/wiki/Birthday_problem

https://www.weforum.org/agenda/2021/05/world-data-produced-s...
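The arithmetic behind that figure, as a quick sketch (the standard birthday-bound approximation, my addition):

```python
from math import sqrt

def items_for_collision_prob(hash_bits: int, p: float) -> float:
    """Birthday-bound approximation: about sqrt(2 * N * p) random items
    give a collision probability of roughly p, with N = 2**hash_bits."""
    return sqrt(2 * 2**hash_bits * p)

# A one-in-a-trillion chance with a 256-bit hash needs on the order of
# 1e32 documents, matching the estimate above.
print(f"{items_for_collision_prob(256, 1e-12):.1e}")
```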

agodfrey

Maybe you’re already aware, but you glossed over something: Since you’re using the hash to locate/identify the content (you mentioned Merkle and git), if you support multiple hash functions you need some assurance that the chance of collisions is low across all supported hash functions. For example, two otherwise-identical functions that differ only in the value of their padding bytes (when the input size doesn’t match the block size) can’t coexist.

MikusR

zpaq archiver solves that by including decompression bytecode inside archives. So: check whether the repository supports your algorithm, and if not, include it inside your commit.

tatersolid

Remote code execution by design. What could possibly go wrong?

tromp

For short inputs, Blake3 behaves very similar to Blake2, on which it is based. From Blake's wikipedia page [1]:

> BLAKE3 is a single algorithm with many desirable features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE and BLAKE2, which are algorithm families with multiple variants. BLAKE3 has a binary tree structure, so it supports a practically unlimited degree of parallelism (both SIMD and multithreading) given long enough input.

[1] https://en.wikipedia.org/wiki/BLAKE_(hash_function)
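As a stand-in illustration (my addition: the stdlib has BLAKE2 but not BLAKE3), BLAKE2 already exposes keyed and variable-length modes that cover the MAC and XOF-style uses listed above:

```python
import hashlib

# Keyed mode doubles as a MAC, with no HMAC wrapper needed:
mac = hashlib.blake2b(b"message", key=b"secret key", digest_size=32)

# A caller-chosen digest_size gives XOF-like flexibility (capped at
# 64 bytes for BLAKE2b; BLAKE3's XOF output is unbounded):
short_digest = hashlib.blake2b(b"message", digest_size=16).hexdigest()

# The person (personalization) parameter gives domain separation,
# loosely analogous to BLAKE3's derive_key mode:
derived = hashlib.blake2b(b"input key material",
                          person=b"app-v1 email", digest_size=32)
```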

cesarb

While I really like Blake3, for all reasons mentioned in this article, I have to say it does have one tiny disadvantage over older hashes like SHA-256: its internal state is slightly bigger (due to the tree structure which allows it to be highly parallelizable). This can matter when running on tiny microcontrollers with only a few kilobytes of memory.

londons_explore

The internal state is no bigger when hashing small things though right?

I assume most microcontrollers are unlikely to be hashing things much bigger than RAM.

oconnor663

It's hard to give a short answer to that question :)

- Yes, if you know your input is short, you can use a smaller state. The limit is roughly a BLAKE2s state plus (32 bytes times the log_2 of the number of KiB you need to hash). Section 5.4 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak... goes into this.

- But it's hard to take advantage of this space optimization, because no libraries implement it in practice.

- But the reason libraries don't implement it is that almost no one needs it. The max state size is just under 2 KiB, which is small enough even for https://github.com/oconnor663/blake3-6502.

- But it would be super easy to implement if we just put the "CV stack" on the heap instead of allocating the whole thing as an array up front.

- But the platforms that care about this don't have a heap.

@cesarb mentioned really tiny microcontrollers, even tinier than the 6502 maybe. The other place I'd expect to see this optimization is in a full hardware implementation, but those are rare. Most hardware accelerators for hash functions provide the block operation, and they leave it to software to deal with this sort of bookkeeping.

Retr0id

Blake3 is a clear winner for large inputs.

However, for smaller inputs (~1024 bytes and down), the performance gap between it and everything else (blake2, sha256) gets much narrower, because you don't get to benefit from the structural parallelization.

If you're mostly dealing with small inputs, raw hash throughput is probably not high on your list of concerns: in the context of a protocol or application, other costs like IO latency probably completely dwarf the actual CPU time spent hashing.

If raw performance is no longer high on your list of priorities, you care more about the other things - ubiquitous and battle-tested library support (blake3 is still pretty bleeding-edge, in the grand scheme of things), FIPS compliance (sha256), greater on-paper security margin (blake2). Which is all to say, while blake3 is great, there are still plenty of reasons not to prefer it for a particular use-case.

zahllos

I agree that if you can, BLAKE3 (or even BLAKE2) are nicer choices than SHA2. However I would like to add the following comments:

* SHA-2 fixes the problems with SHA-1. SHA-1 was a step up over SHA-0 that did not completely resolve flaws in SHA-0's design (SHA-0 was broken very quickly).

* JP Aumasson (one of the B3 authors) has said publicly a few times SHA-2 will never be broken: https://news.ycombinator.com/item?id=13733069 is an indirect source, can't seem to locate a direct one from Xitter (thanks Elon).

Thus it does not necessarily follow that SHA-2 is a bad choice because SHA-1 is broken.

gavinhoward

All that may be true.

However, I don't think we can say for sure if SHA2 will be broken. Cryptography is hard like that.

In addition, SHA2 is still vulnerable to length extension attacks, so in a sense, SHA2 is broken, at least when length extension attacks are part of the threat model.
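To make the length-extension point concrete, here's a toy Merkle-Damgård sketch (my illustration, not real SHA-2: the compression function and padding are simplified, and real MD padding also encodes the message length, which changes the pad but doesn't stop the attack). Knowing only H(secret), an attacker can compute a valid hash for secret plus a chosen suffix:

```python
import hashlib

BLOCK = 32

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function; the point is the Merkle-Damgård
    # chaining structure, not the function itself.
    return hashlib.sha256(state + block).digest()

def toy_md_hash(msg: bytes, state: bytes = b"\x00" * 32) -> bytes:
    # Simplified padding to a multiple of BLOCK bytes.
    msg += b"\x80" + b"\x00" * (-(len(msg) + 1) % BLOCK)
    for i in range(0, len(msg), BLOCK):
        state = compress(state, msg[i:i + BLOCK])
    return state

secret = b"secret key"
known_digest = toy_md_hash(secret)  # all the attacker ever sees
suffix = b"admin=true"

# Length extension: resume hashing from the published digest...
forged = toy_md_hash(suffix, state=known_digest)

# ...and it equals the honest hash of secret || padding || suffix:
padding = b"\x80" + b"\x00" * (-(len(secret) + 1) % BLOCK)
assert forged == toy_md_hash(secret + padding + suffix)
```

This is why H(key || message) is unsafe as a MAC with SHA-2, and why HMAC (or a keyed hash like BLAKE2/BLAKE3) is used instead.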

zahllos

If you want to be pedantic we can say there is definitely a collision in SHA-2. Assume we have 2^256 unique inputs. Hash them all and assume no collisions. Now, if we have one more unique input (so 2^256 + 1 inputs) we have a collision. The same logic applies to BLAKE3.

However we do actually know quite a bit about how to design hash functions to make this hard to do in practice. The latest cryptanalysis (to actually find a collision) either requires a vastly reduced number of rounds or is computationally infeasible. There's no clear flaw like there was with SHA1, where the path to finding a collision has been known since ~2004.

Length extension "attacks", sure, that's an unfortunate design choice. But it doesn't impact collision resistance at all, which is what's implied by suggesting that because SHA1 is vulnerable, SHA2 must be too.

In the end, if you can use BLAKE3 or BLAKE2, great, I probably would as well. There isn't always a choice (e.g. there's no blake3 support in most crypto hardware) and if there isn't, sha3 or sha2 are fine choices.

EdSchouten

What I dislike about BLAKE3 is that they added explicit logic to ensure that identical chunks stored at different offsets result in different Merkle tree nodes (a.k.a. the ‘chunk counter’).

Though this feature is well intended, it makes this hash function hard to use for a storage system where you try to do aggressive data deduplication.

Furthermore, on platforms that provide native instructions for SHA hashing, BLAKE3 isn’t necessarily faster, and is certainly more power hungry.

oconnor663

We go over some of our reasoning around that in section 7.5 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak.... An early BLAKE3 prototype actually didn't include the chunk counter (https://github.com/oconnor663/bao/blob/master/docs/spec_0.9....), so I'm definitely sympathetic to the use cases that wish it wasn't there. However, after publication we found out that something like a chunk counter is necessary for the security of the Bao streaming verification tool: https://github.com/oconnor663/bao/issues/41. It could be that there's a design that's the best of both worlds, but I'm not sure.

lazide

Huh?

The storage system doing this wouldn’t use that part of the hash, it would do it itself so no issues? (Hash chunks, instead of feeding everything in linearly)

Otherwise the hash isn’t going to be even remotely safe for most inputs?

jasonwatkinspdx

Answer: identify chunks via something like rsync's rolling window or GearHash, then name those chunks by Blake3.

Trying to use Blake3's tree structure directly to dedupe is a misunderstanding of the problem you're trying to solve. Removing the counter would not let you use Blake3 alone for this purpose.
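A toy sketch of that approach (my illustration, parameters not from any particular tool): a Gear-style rolling hash picks chunk boundaries from the content itself, so an insertion near the start of a file only disturbs the chunks around it, and each chunk is then named by a cryptographic hash (BLAKE3 in the comment above; SHA-256 here since BLAKE3 isn't in the stdlib).

```python
import hashlib

# Per-byte random-looking constants for the rolling hash, derived
# deterministically for reproducibility.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK = (1 << 13) - 1  # boundary test; gives ~8 KiB average chunks

def chunks(data: bytes):
    h, start = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        if (h & MASK) == 0:       # content-defined boundary
            yield data[start:i + 1]
            h, start = 0, i + 1
    if start < len(data):
        yield data[start:]

def chunk_ids(data: bytes) -> list[str]:
    # Name each chunk by its hash; identical chunks dedup by name.
    return [hashlib.sha256(c).hexdigest() for c in chunks(data)]
```

After an insertion the boundaries re-synchronize within a few bytes, so every later chunk keeps its ID, which is what makes dedup work across shifted offsets.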

persnickety

Could you point to how this is implemented and how it can be used? From the sound of it, you're trying to do something like rsync's running-window comparison?

EdSchouten

Imagine the case where you're trying to create a storage system for a large number of virtual machine images (e.g., you're trying to build your own equivalent of AWS Machine Images). There is obviously a lot of duplication between parts of images. And not necessarily at the same offset, but also at different offsets that are n*2^k bytes apart, where 2^k represents the block/sector size.

You could consider building this storage system on top of BLAKE3's tree model. Namely, you store blocks as small Merkle trees, and an image is basically a collection of blocks with a different 'hat' on top. Unfortunately, BLAKE3 makes this hard, because the same block will end up having a different Merkle tree node depending on the offset at which it's stored.

prirun

Author of HashBackup here. I don't see how any kind of hash tree would be effective at de-duplicating VM machine images, other than the degenerate case of an exact copy, which is easy to detect with a single file hash.

Most OSes use 4K block sizes. To get the best dedup you have to hash every 4K block and lookup each one individually in a dedup table. Two VM images could both contain an identical 4GB file, but every 4K block of that file could be stored at different offsets in the VM images. A tree hash wouldn't let you dedup anything but identical sections stored at identical offsets, whereas a dedup table and 4K blocks allows you to dedup the entire file.
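The dedup-table approach described above can be sketched like this (hypothetical helper, my illustration; SHA-256 for block IDs):

```python
import hashlib

BLOCK_SIZE = 4096  # match the OS block size for best dedup

def dedup_store(image: bytes, table: dict[str, bytes]) -> list[str]:
    """Store an image as a list of 4K-block IDs. Identical blocks at
    any 4K-aligned offset, in any image sharing the table, are kept
    only once."""
    block_ids = []
    for off in range(0, len(image), BLOCK_SIZE):
        block = image[off:off + BLOCK_SIZE]
        block_id = hashlib.sha256(block).hexdigest()
        table.setdefault(block_id, block)  # keep first copy only
        block_ids.append(block_id)
    return block_ids
```

Two images containing the same file at different 4K-aligned offsets then share those blocks, which a position-dependent tree hash can't give you directly.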

londons_explore

Sounds to me like you are trying to use the innards of a hash algorithm for something for which it was not designed...

Either modify the algorithm to your needs, and rename it.

Or just use something that's already suitable off-the-shelf. Plenty of Merkle trees out there.

luoc

I think CDC is what you're looking for. Some backup tools, like restic, use it. See https://en.m.wikipedia.org/wiki/Rolling_hash

marktangotango

> You could consider building this storage system on top of BLAKE3's tree model.

Consider a cryptocurrency PoW that did that without the chunk counter. It'd be trivially exploitable by precalculating all of the tree except the chunk that changes per nonce.

luoc

You mean something like a CDC algorithm? I know that some Backup tools like Restic use this.

https://en.m.wikipedia.org/wiki/Rolling_hash

ndsipa_pomu

At this rate, it's going to take over 700 years before we get Blake's 7

benj111

I had to scroll disappointingly far down to get to the Blake's 7 reference.

Thank you for not disappointing though.

The down side of that algorithm though is that everything dies at the end.

nayuki

It's an interesting set of reasons, but I prefer Keccak/SHA-3 over SHA-256, SHA-512, and BLAKE. I trust the standards body and public competition and auditing that took place - more so than a single author trumpeting the virtues of BLAKE.

jasonwatkinspdx

Ironic, because the final NIST report explaining their choice mentions that BLAKE has more open examination of cryptanalysis than Keccak as a point in favor of BLAKE.

stylepoints

Until it starts coming installed by default on Linux and other major OSes, it won't be mainstream.

theamk

Python 3.11 will have it https://bugs.python.org/issue39298

latexr

That says “Resolution: rejected” and Python is currently at 3.12.0. Did the feature land?

theamk

oops I misread it.. seems it was rejected because it was not standard enough...

https://github.com/python/cpython/issues/83479#issuecomment-...

sylvain_kerkour

At the end of the day, what really matters for most people is

1) Certifications (FIPS...)

2) Speed.

SHA-256 is fast enough for maybe 99.9% of use cases, as you will saturate your I/O way before SHA-256 becomes your bottleneck[0][1]. Also, from my experience with the different available implementations, SHA-256 is up to 1.8 times faster than Blake3 on arm64.

[0] https://github.com/skerkour/go-benchmarks/blob/main/results/...

[1] https://kerkour.com/fast-hashing-algorithms

oconnor663

I mostly agree with you, but there are a couple other bullet points I like to throw in the mix:

- Length extension attacks. I think all of the SHA-3 candidates did the right thing here, and we would never accept a new cryptographic hash function that didn't do the right thing here, but SHA-2 gets a pass for legacy reasons. That's understandable, but we need to replace it eventually.

- Kind of niche, but BLAKE3 supports incremental verification, i.e. checking the hash of a file while you stream it, rather than learning whether it was valid only at the end of the stream. https://github.com/oconnor663/bao. That's useful if you know the hash of a file but you don't necessarily trust the service that's storing it.

jandrewrogers

I think SHA-256 is still marginal for speed in modern environments unless your I/O is unusually limited relative to CPU. Current servers can support 10s of GB/s combined throughput for network and storage, which is achievable in practice for quite a few workloads. Consequently, you have to plan for the CPU overhead of the crypto at the same GB/s throughput since it is usually applied at the I/O boundaries. The fact that SHA256 requires burning the equivalent of several more cores relative to Blake3 has been a driver in Blake3 anecdotally creeping into a lot of data infrastructure code lately. At these data rates, the differences in performance of the hash functions is not a trivial cost in the cases where you would use a hash function (instead of e.g. authenticated encryption).

The arm64 server case is less of a concern for other reasons. Those cores are significantly weaker than amd64 cores, and therefore tend to not be used for data-intensive processing regardless. This allows you to overfit for AVX-512 or possibly use SHA256 on arm64 builds depending on the app.

There is a strong appetite for as much hashing performance per core as possible for data-intensive processing because it consumes a significant percentage of the total CPU time in many cases. Due to the rapidly growing scale, non-cryptographic hash functions are no longer fit for purpose much of the time.

jrockway

Fast hash functions are really important, and SHA256 is really slow. Switching the hash function where you can is enough to result in user-visible speedups for common hashing use cases; verifying build artifacts, seeing if on-disk files changed, etc. I was writing something to produce OCI container images a few months ago, and the 3x SHA256 required by the spec for layers actually takes on the order of seconds. (.5s to sha256 a 50MB file, on my 2019-era Threadripper!) I was shocked to discover this. (gzip is also very slow, like shockingly slow, but fortunately the OCI spec lets you use Zstd, which is significantly faster.)

adrian_b

SHA256 is very fast on most modern CPUs, i.e. all AMD Zen, all Intel Atom since 2016, Intel Core Ice Lake or newer, Armv8 and Armv9.

I use every day both SHA-256 and BLAKE3. BLAKE3 is faster only because it is computed by multiple threads using all available CPU cores. When restricted to a single thread, it is slower on CPUs with fast hardware SHA-256.

The extra speed of BLAKE3 is not always desirable. The fact that it uses all cores can slow down other concurrent activities, without decreasing the overall execution time of the application.

There are cases when the computation of a hash like SHA-256 can be done as a background concurrent activity, or when the speed of hashing is limited by the streaming speed of data from the main memory or from a SSD, so spawning multiple threads does not gain anything and it only gets in the way of other activities.

So the right choice between SHA-256 and BLAKE3 depends on the application. Both can be useful. SHA-256 has the additional advantage that it needs negligible additional code, only a few lines are necessary to write a loop that computes the hash using the hardware instructions.

dralley

>I use every day both SHA-256 and BLAKE3. BLAKE3 is faster only because it is computed by multiple threads using all available CPU cores. When restricted to a single thread, it is slower on CPUs with fast hardware SHA-256.

That's not actually my experience. Last I tested, BLAKE3 was about twice as fast, single-threaded, as SHA256 on a Zen 3 CPU, which has the extensions.

Lower down in the thread someone else did a comparison, and came out with a similar result.

coppsilgold

sha256 is not slow on modern hardware. openssl doesn't have blake3, but here is blake2 (openssl speed output; throughput in 1000s of bytes per second):

    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    BLAKE2s256        75697.37k   308777.40k   479373.40k   567875.81k   592687.09k   591254.18k
    BLAKE2b512        63478.11k   243125.73k   671822.08k   922093.51k  1047833.51k  1048959.57k
    sha256           129376.82k   416316.32k  1041909.33k  1664480.49k  2018678.67k  2043838.46k
This is with the x86 sha256 instructions: sha256msg1, sha256msg2, sha256rnds2

dralley

"modern hardware" deserves some caveats. AMD has supported those extensions since the original Zen, but Intel CPUs generally lacked them until only about 2 years ago.

adrian_b

For many years, starting in 2016, Intel has supported SHA-256 only in their Atom CPUs.

The reason seems to be that the Atom CPUs were compared in Geekbench with ARM CPUs, and without hardware SHA the Intel CPUs would have obtained worse benchmark scores.

In their big cores, SHA has been added in 2019, in Ice Lake (while Comet Lake still lacked it, being a Skylake derivative), and since then all newer Intel CPUs have it.

So except for the Intel Core CPUs, the x86 and ARM CPUs have had hardware SHA for at least 7 years, while the Intel Core CPUs have had it for the last 4 years.

richardwhiuk

If you want a fast hash function (and don't care about its cryptographic properties), don't use a cryptographic hash function.

dralley

BLAKE3 is actually competitive with non-cryptographic hashes like crc32.

RaisingSpear

Not even close. CRC32 can easily run at >50GB/s single thread on this i7-12700K CPU (VPCLMULQDQ implementation). The BLAKE3 page claims around 7GB/s single thread. Fudging the figures a bit to cater to CPU differences, BLAKE3 is still a far cry from CRC32.

oconnor663

To be fair, it really depends on the platform. There's an argument to be made that platforms where you care about the difference are specifically the ones where BLAKE3 is slower (no SIMD, no threads).


Reasons to Prefer Blake3 over Sha256 - Hacker News