Whatever happened to SHA-256 support in Git?

corbet

It's nice to see LWN on HN for the second time in one day, but please remember: it is only LWN subscribers that make this kind of writing possible. If you are enjoying it, please consider becoming a subscriber yourself — or, even better, getting your employer to subscribe.

O__________O

For ease of reference, here is the link to subscribe, which includes a description of the benefits:

https://lwn.net/subscribe/Info

And the Wikipedia page for LWN, if you’re not familiar with it:

https://en.m.wikipedia.org/wiki/LWN.net

williadc

Googlers can subscribe through work by visiting go/lwn and following the instructions.

jra_samba

Just want to second this! Please subscribe to lwn. I learn new things from lwn every week. It's really worth the money.

bloopernova

subbed, thanks for the reminder!

avar

I'm the person and Git developer (Ævar) quoted in the article. I didn't expect this to end up on LWN. I'm happy to answer any questions here that people might have.

I don't think the LWN article can be said to take anything out of context. But I think it's worth emphasizing that this is a thread on the Git ML in response to a user who's asking if Git/SHA-256 is something "that users should start changing over to[?]".

I stand by the comments that I think the current state of Git is that we shouldn't be recommending to users that they use SHA-256 repositories without explaining some major caveats, mainly to do with third party software support, particularly the lack of support from the big online "forges".

But I don't think there's any disagreement in the Git development community (and certainly not from me) that Git should be moving towards migrating away from SHA-1.

lalaland1125

Have you considered moving over to a combined SHA-1, SHA-256 model where both hashes are calculated, with SHA-1 shown to the user and SHA-256 only used in the background to prevent collisions?

There is a compute cost for that, but it should be minimal relative to the security benefits?

avar

Someone probably brought it up at some point, I can't remember. But I'm not aware of any known scenario where the SHA1DC library Git uses doesn't give you the benefits of that and more.

"And more" because to detect a collision with a background SHA-256 you'll need both objects, whereas SHA1DC detects attempts to spoof SHA1 in a way that leads to collisions. So it won't pass along an object that collides with another one, even though it only has 1/2 objects.

That distinction is something that's generally considered important, e.g. there's been past exploits in Git where you could trick a client into doing something bad by e.g. a crafted .gitmodules file.

The fix has not only been to patch clients, but also to patch "git fsck" to detect and reject such bad contents, so that e.g. the forges can't be used to relay a repository exploit to users running older versions.

A viable hash collision exploit in the wild might likewise want to make use of such an attack scenario, so having servers capable of detecting collisions without having both sides is preferable to doing so by re-hashing with SHA-256.

bostik

> SHA1DC detects attempts to spoof SHA1 in a way that leads to collisions

This is the kind of detail I would have loved to see quoted directly in the article. Sure enough, it's prominently displayed on the project's README::about section, but your very succinct explanation here made it clear in immediate context.

The idea of counter-cryptanalysis is eye-opening.

pmarreck

This is what I would have recommended... A transition period where both are used

michaelt

Thanks for your work on Git!

> I'm happy to answer any questions here that people might have.

Is there any way to achieve a gradual, staged rollout of SHA256?

What's the impact of converting a repo to SHA256 - will old commit IDs become invalid? Would signed commits' signatures be invalidated?

avar

The answer is somewhat hand-wavy, because this code doesn't exist as anything except out-of-tree WIP code (and even in that case, incomplete). But yes, the plan is definitely to support a gradual, hopefully mostly seamless rollout.

The design document for that is shipped as part of git.git, and available online. Here's the relevant part: https://git-scm.com/docs/hash-function-transition/#_translat...

Basically the idea is that you'd have, say, a SHA-256 local repository, and talk to a SHA-1 upstream server. Each time you'd "pull" or "push" we'd "rehash" the content (which we do anyway, even when using just one hash).

The interop-specific magic (covered in that documentation) is that we'd use a translation table, so you could e.g. "git show" on a SHA-1 object ID, and we'd be able to serve up the locally packed SHA-256 content as a result.
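
A minimal sketch of the translation-table idea, not the design from the transition document. It assumes a blob's object ID under either algorithm is the hash of the same `blob <size>\0<content>` buffer; `TranslationTable`, `add_blob`, and `lookup` are hypothetical names. Trees and commits would need more work, since they embed other object IDs.

```python
import hashlib


def object_id(kind, body, algo):
    """Hash a Git-style object: '<kind> <size>' + NUL header, then the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).hexdigest()


class TranslationTable:
    """Hypothetical two-way map between SHA-1 and SHA-256 names of one object."""

    def __init__(self):
        self._by_sha1 = {}
        self._by_sha256 = {}

    def add_blob(self, body):
        # Hash the same content once per algorithm and record both directions.
        sha1 = object_id("blob", body, "sha1")
        sha256 = object_id("blob", body, "sha256")
        self._by_sha1[sha1] = sha256
        self._by_sha256[sha256] = sha1
        return sha1, sha256

    def lookup(self, oid):
        """Return the other algorithm's name for a known object ID."""
        if oid in self._by_sha1:
            return self._by_sha1[oid]
        return self._by_sha256[oid]


table = TranslationTable()
old_id, new_id = table.add_blob(b"hello world\n")
print(old_id, "<->", table.lookup(old_id))
```

With a table like this, a lookup by the 40-character name can be answered from storage that is keyed by the 64-character name, and vice versa.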

But the hard parts of this still need to be worked out, and problems shaken out. E.g. for hosting providers what you get when you "git clone" is an already-hashed *.pack file that's mostly served up as-is from disk. For simultaneously serving clients of both hash formats you'd essentially need to double your storage space.

There's also been past in-person developer meet-up discussion (the last one being before Covid, the next one in fall this year) about the gritty details of how such a translation table will function exactly.

E.g. if linux.git switches they'd probably want a "flag day" where they'd transition 100% to SHA-256, but many clients would still probably want the SHA-1<->SHA-256 translation table kept around for older commits, to e.g. look up hash references from something like the mailing list archive, or old comments in ticketing systems.

Currently the answer to how that'll work exactly is that we'll see when someone submits completed patches for that sort of functionality, and doubtless issues & edge cases will emerge that we didn't or couldn't expect until the rubber hits the road.

dwheeler

Has anyone considered doing "add16" on the first character of the sha-256 hash, e.g., so the SHA-256 hash 1d06... becomes hd06... ? Then you could see, from the first character, if it is SHA-1 or SHA-256. Having a clear distinction on the first character would make it clear which hash is being used (without needing lots of chars).
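
A rough sketch of that "add16" idea, assuming the shifted digit is drawn from a base-32 style alphabet (0-9, a-v); `tag_sha256` and `parse` are made-up helper names, and nothing like this exists in Git.

```python
HEX = "0123456789abcdef"
# Digits 16..31 of a base-32 style alphabet (0-9, a-v): g, h, ..., v.
SHIFTED = "ghijklmnopqrstuv"


def tag_sha256(hex_digest):
    """Add 16 to the first hex digit so the ID is visibly not SHA-1."""
    first = HEX.index(hex_digest[0])
    return SHIFTED[first] + hex_digest[1:]


def parse(object_id):
    """Return (algorithm, plain_hex), judging only by the first character."""
    if object_id[0] in SHIFTED:
        return "sha256", HEX[SHIFTED.index(object_id[0])] + object_id[1:]
    return "sha1", object_id


print(tag_sha256("1d06"))  # hd06
```

Because the marker lives in the first character, even a heavily abbreviated ID would still reveal which hash it came from.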

tux3

Has there been any feedback/communication with forges happening, on or off-list?

I'm curious how closely (if at all) they've been following this effort

avar

The forges are keenly aware of this effort, e.g. the person who's by far done most of the work on the SHA-256 transition (brian m. carlson) has, I believe, done so mostly or entirely on behalf of GitHub.

I myself do some work on upstream Git on behalf of GitLab, although none of it's been on anything related to the SHA-256 transition.

As to why no big forge has SHA-256 support, I think it's a bit of a chicken & egg problem (and these comments are entirely my own, and not on behalf of anyone).

I think it's safe to say that all of the forges are expecting the transition, e.g. I don't think there's anyone creating CHAR(40) database tables for Git hashes anymore (or if they are, someone is planning to deal with it).

Another is that for a successful transition for anything except entirely new repository networks (which already use SHA-1) you really need the "git" client to play along, see my other comment discussing hash interop plans. Some of that same code then needs to run on the server-side.

That code isn't part of git yet, and it's really needed for any sort of viable migration plan.

I mean, it's not really needed, at some point a lot of people reading this migrated from CVS and/or SVN to Git. But a full export/import with a lot of users is painful. We really want it to suck less for Git, to the point that it should Just Work for most or all users.

And a major one is the human factor. For things to happen in free software development someone needs to submit patches, and brian m. carlson has put a heroic amount of effort into the transition over the years. As he notes in the linked ML thread, he's had life reasons for why he hasn't been able to work on it as actively recently as he did in the past.

rurban

brian moved from Texas to Canada, but mostly his employer, GitHub, is not prioritizing the remaining sha256 patches. Someone needs to finish up the transition patches, and forges need to double their disk space.

Zamicol

This is one of the reasons why Go has its own versioning system. From a project's `go.sum`:

example.com/example v0.0.0-20171218180944-5ea4d0ddac55 h1:jbGlDKdzAZ92NzK65hUP98ri0/r50vVVvmZsFP/nIqo=

Where "h1" is an upgradeable hash (h1 is SHA-256). If there's ever a problem with h1, the hash can be simply upgraded.
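
To make the prefix idea concrete, here is a small sketch that parses the line above. It assumes only what the comment states (that "h1" means SHA-256) plus the observation that the part after the colon is base64; `parse_go_sum_line` and `DIGEST_SIZES` are illustrative names, not part of the Go toolchain.

```python
import base64

LINE = ("example.com/example v0.0.0-20171218180944-5ea4d0ddac55 "
        "h1:jbGlDKdzAZ92NzK65hUP98ri0/r50vVVvmZsFP/nIqo=")

# Expected digest size in bytes per hash version; only "h1" is known here.
DIGEST_SIZES = {"h1": 32}


def parse_go_sum_line(line):
    module, version, tagged = line.split()
    scheme, _, encoded = tagged.partition(":")
    if scheme not in DIGEST_SIZES:
        # An unknown prefix is a clear "please upgrade" signal, not corrupt data.
        raise ValueError(f"unknown hash version {scheme!r}; upgrade needed")
    digest = base64.b64decode(encoded, validate=True)
    if len(digest) != DIGEST_SIZES[scheme]:
        raise ValueError("digest has the wrong length")
    return module, version, scheme, digest


module, version, scheme, digest = parse_go_sum_line(LINE)
print(scheme, len(digest))  # h1 32
```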

Git's documentation describes how to sign a git commit:

$ git commit -a -S -m 'signed commit'

When signing a git commit using the built in gpg function the project is not rehashed with a secure hash function, like SHA-256 or SHA3-256. Instead gpg signs the SHA-1 commit digest directly. It's not signing the result of a secure hash algorithm.

SHA-1 has been considered weak for a long time (about 17 years). Bruce Schneier warned in February 2005 that SHA-1 needed to be replaced. Git development didn't start until April 2005. Before git started development, SHA-1 was identified as needing deprecation.

brasic

> Instead gpg signs the SHA-1 commit digest directly

A minor correction: when signing a commit, gpg does not sign the SHA-1 digest of that commit. This is impossible since the signature becomes part of the commit header which is one of the inputs to the hash function that produces the oid.

Instead, GPG signs the serialized data (parents,headers,tree,message) which would otherwise be the input to SHA-1. Then the sig is inserted into the buffer at the end of the header and the string is digested to produce an oid.
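
A simplified sketch of that ordering, assuming the usual `commit <size>\0` object header and a `gpgsig` header whose continuation lines start with a space. The signature below is a placeholder rather than real gpg output, and `insert_signature` is a made-up helper, not Git's code.

```python
import hashlib


def commit_oid(body):
    """SHA-1 object ID of a commit: hash of 'commit <size>' + NUL plus the body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def insert_signature(unsigned, signature):
    """Place a 'gpgsig' header at the end of the headers, before the message."""
    headers, message = unsigned.split(b"\n\n", 1)
    # Continuation lines of a multi-line header start with a single space.
    folded = signature.rstrip("\n").replace("\n", "\n ").encode()
    return headers + b"\ngpgsig " + folded + b"\n\n" + message


# This buffer is what gets signed: headers and message, no signature yet.
unsigned = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1650000000 +0000\n"
    b"committer A U Thor <author@example.com> 1650000000 +0000\n"
    b"\n"
    b"signed commit\n"
)
# Placeholder standing in for what gpg would produce over `unsigned`.
fake_signature = (
    "-----BEGIN PGP SIGNATURE-----\n\nplaceholder\n-----END PGP SIGNATURE-----\n"
)

signed = insert_signature(unsigned, fake_signature)
print(commit_oid(unsigned) != commit_oid(signed))  # True
```

The object ID is computed last, over a buffer that already contains the signature, which is why the signature cannot be over the commit's own ID.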

Source: https://github.com/git/git/blob/39c15e485575089eb77c769f6da0...

Zamicol

Thank you for pointing this out, and thank you for the link to the relevant code section. Excellent comment.

omegalulw

Lmao how were you that confident in your original comment which basically claimed git signatures are only as secure as SHA-1.

lewisl9029

Also check out multihash from the IPFS folks: https://github.com/multiformats/multihash

It's a more robust, well-specified, interoperable version of this concept.

Though it's probably overkill if you control both the consumer and producer side (i.e. don't need the interoperability) and are just looking to make hash upgrades smoother; in that case a simple version prefix like Go's approach described above has lower overhead.

Groxx

There's no need to explicitly version your first version of this though. Those first-version values are easy to identify: they don't contain versioning information :)

E.g. say you have `5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8`. What version is that?

Well. It's exactly as long as a SHA1 hash. It doesn't start with "sha256:" or "md5:" or "h1:" or "rot13:". So it's SHA1. Easy and totally unambiguous.
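
As a sketch of that rule: anything with a known prefix is a later version, and a bare 40-digit hex string is implicitly version 1. The prefix set is just the examples from this comment, and `identify` is a hypothetical helper.

```python
import re

# Example prefixes for later versions; the unprefixed form is "version 1".
KNOWN_PREFIXES = {"sha256", "md5", "h1", "rot13"}


def identify(value):
    """Name the scheme of a hash string; a bare 40-digit hex string is SHA-1."""
    prefix, sep, _rest = value.partition(":")
    if sep:
        if prefix not in KNOWN_PREFIXES:
            raise ValueError(f"unknown hash version {prefix!r}")
        return prefix
    if re.fullmatch(r"[0-9a-f]{40}", value):
        return "sha1"
    raise ValueError("neither a versioned hash nor a bare SHA-1")


print(identify("5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"))  # sha1
```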

Versioning can almost always begin with version 2.

morelisp

Me, sowing: "Each record begins with a 4 octet BE value indicating the record length."

Me, reaping: "Each record begins with a single byte indicating the record format version. In version 0, this is followed by a 3 octet BE value indicating the record length."

kazinator

That's not applicable to Groxx's example. The initial version uses only hexadecimal digits for the SHA1.

If you had: "each record begins with an 8 character record length, in hexadecimal, giving 32 bits", you have no problems. The new version has a 'V' character in byte 0, which is rejected as invalid by the old implementation.
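
A small sketch of why the text-based format extends cleanly. The v1 layout (8 hex characters of length, then the payload) follows the description above; the layout after the 'V' (two hex digits of version, then a v1-style record) is invented for illustration.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_v1(record):
    """Old reader: 8 hex characters of length, then the payload."""
    header = record[:8]
    if len(header) != 8 or not set(header) <= HEX_DIGITS:
        raise ValueError("invalid record length field")
    length = int(header, 16)
    return record[8:8 + length]


def parse_v2(record):
    """New reader: a leading 'V' marks a versioned record, else fall back to v1."""
    if not record.startswith("V"):
        return parse_v1(record)
    version = int(record[1:3], 16)
    if version != 2:
        raise ValueError(f"unsupported record version {version}")
    return parse_v1(record[3:])


old = "00000005hello"
new = "V02" + old
print(parse_v2(old), parse_v2(new))  # hello hello
```

The old reader rejects the new record at its first byte, so it fails loudly instead of misreading the version marker as part of a length.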

Groxx

if you're storing the raw binary rather than hex or base64: yeah. there are often no illegal values, so there's no way to safely extend it, unless you can differentiate on length.

for those, you have to leave versioning room up-front. even 1 bit is enough, since a `1` can imply "following data describes the version", if a bit wastefully in the long run.

nh23423fefe

sow then reap

arccy

versioning also allows you to change the inputs in the future to include/exclude more info

inconsistencies in how data is presented (optional version number) is a pain to deal with in code

kelnos

I think the implications for Go are a bit different, though. It's a very simple matter to change the hash algorithm used for go.mod. Even if there was no hash version prefix, it's trivial to add one after the fact, though older tools would probably give a confusing error message without foreknowledge of the concept of an unrecognized hash algorithm. And adding a new hash algorithm is just a matter of writing a relatively small amount of code, and then probably waiting a few Go releases before making it the default and assuming most people will have it.

Git's entire foundation relies on SHA1 hashes. Each commit is its own hash, and contains a list of the hashes of all files that are a part of it. Branches have hashes, tags have hashes. Everything has a hash. A repository that uses a different hash algorithm is a completely different repository, even if the contents and commits are otherwise identical. You can't even store your code on someone else's server (well, aside from manually copying the repository data over, though that won't be too useful) unless that server has upgraded their git version.

samatman

The counterpoint: Fossil did it, it was easy, no big deal.

Well, Fossil's database is much better designed, you reply.

That it is!

jupp0r

I think the argument that gp is trying to make is that it's really hard for git to implement this in a backwards compatible way. You may be right (I don't know anything about Fossil, will take a look!) that Fossil allowed for this by making good design decisions in the past. This is not something that git maintainers can do right now without a time machine though. Old versions are in use out there and will need to keep working if the goal is to make the transition easier for users.

er4hn

Just to nit on your portion of signing: wouldn't you need to rehash all prior commits as well so that they used the better hash function? Otherwise someone could find a collision for a prior commit hashed with sha-1, slip that in, and the final commit being hashed with sha256 wouldn't matter.

This then makes the signing code use its own form of hashing that is different from the rest of git's commit hashing, and seems like a novel way to introduce tooling issues / bugs / etc.

chimeracoder

> and the final commit being hashed with sha256 wouldn't matter.

Git stores content, not diffs. So the signature verifies all content stored in that commit. It doesn't verify anything that came before it, unless those are specifically signed as well.

ElectricalUnion

> Git stores content, not diffs.

But the "contents" is just pointers to tree roots with a trusted hash. If the hash is no longer secure, you can't guarantee that any such trees are your content, or safe.

kazinator

Whenever the word "upgrade" rears its head, beware.

The intent behind it is obsolescence and phasing out, resulting in an endless make-work treadmill for the users.

If there is ever a "problem with h1", and you neglect to upgrade your data right there and then, in five to ten years it will be unreadable.

howinteresting

What in the world are you talking about? Generally, systems with upgradeable hashes will remain backwards-compatible with old ones forever.

arccy

or you know, find the version that handles transition and run it to upgrade

stepping through required versions is a common operation

bawolff

Versioning hashes is definitely not a new idea with go - just look at how unix stores password hashes.

barsonme

The author of the comment did not imply this.

harryvederci

Relevant quote from the Fossil website[0]:

"Fossil started out using 160-bit SHA-1 hashes to identify check-ins, just as in Git. That changed in early 2017 when news of the SHAttered attack broke, demonstrating that SHA-1 collisions were now practical to create. Two weeks later, the creator of Fossil delivered a new release allowing a clean migration to 256-bit SHA-3 with full backwards compatibility to old SHA-1 based repositories. [...] Meanwhile, the Git community took until August 2018 to publish their first plan for solving the same problem by moving to SHA-256, a variant of the older SHA-2 algorithm. As of this writing in February 2020, that plan hasn't been implemented, as far as this author is aware, but there is now a competing SHA-256 based plan which requires complete repository conversion from SHA-1 to SHA-256, breaking all public hashes in the repo."

[0]: https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

ludwigvan

Migrations are easier when you are the only one using your software. :p

Joking aside, that's expected from a developer whose work is the recommended storage format for the Library of Congress.

tremon

This raises an interesting point: given that git has been dragging its feet for so long on the transition to SHA-256, it would be better if they dragged their feet a bit longer and moved directly to SHA3-256 too, and never let the current SHA-256 implementation get widely deployed.

YesThatTom2

GitHub won’t feel any heat about this until Microsoft salespeople start demanding it.

I’ve added to my todo list a reminder to raise this issue with mine. In fact, I’m going to give them a deadline for when we will start evaluating competitors that do support SHA256.

I suspect that most people on HN do not interact with their MS account team. That relationship is probably managed by your CIO or IT department. They probably have monthly or quarterly “business review” meetings. You should get this issue on the agenda of that meeting.

lewisl9029

Just the other day, I was actually forced to downgrade the file hash used in the product I'm working on to sha1 in order to interact with GitHub's APIs efficiently (to avoid having to download the entire file just to recompute a sha256 for matching).

Luckily I've versioned the internal hash so the upgrade path back to sha256 should be as smooth as the downgrade was. I'm still bitter about it though.

codazoda

Is there something special about GitHub on this? This seems like a Git issue and not a GitHub issue to me; unless I'm missing something.

lucb1e

They don't accept pushes of repositories in that format.

The article says "none of the Git hosting providers appear to be supporting SHA-256", and while GH is not mentioned by name (and I applaud them for indeed not strengthening this "git == github-the-brand" trap), I can't imagine GH was left out of scope when checking the major hosting providers.

evil-olive

as the article says, you can create a local git repository with SHA-256 hashes today, and it should work fine...but the moment you try to push your repo up to Github, you'll hit a brick wall.

Gitlab also appears to be lacking support [0], and the same with Gitea [1].

so it's a grey area where Git itself supports SHA-256-based repos, but without the major Git hosting services also supporting them, the support in core Git is somewhat useless.

0: https://gitlab.com/groups/gitlab-org/-/epics/794

1: https://github.com/go-gitea/gitea/issues/13794

sdfhdhjdw3

Thank you.

bradhe

girvo

You sound like someone who didn’t read the article.

Git basically supports it already. GitHub et al do not, and that is what is holding it back.

vulcan01

Git is not GitHub and GitHub is not Git. This article is about Git, the software, not GitHub, the Git hosting service.

sdfhdhjdw3

Did you skip the bit that discusses hosting providers?

ElectricalUnion

Git supports it, GitHub doesn't. People use forges, therefore they are misled to believe Git doesn't support it.

simias

It's frankly amateurish for the git dev to delay this. The longer this lasts, the more painful it'll be whenever the switch will finally take place.

Linus shouldn't have used SHA-1 in the first place, it was already being deprecated by the time git got its original release. Then every time a new milestone is reached to break SHA-1 we see the same rationalization about how it's not a big deal and it's not a direct threat to git and blablabla.

It'll keep not mattering until it matters and the longer they wait the more churn it'll create. Let's rip the bandaid that's been hanging there for over 15 years now.

runeks

> Linus shouldn't have used SHA-1 in the first place, it was already being deprecated by the time git got its original release.

Using SHA-1 to begin with was fine. However, commit hashes should have been prepended with a version byte to make it easier to transition to the next hash algorithm.

This would mean an old Git client could report an error to the user of the nature “please upgrade your software to support cloning from this Git server” instead of failing with an error that’s inseparable from “the Git server is broken” when trying to clone a Git repo using SHA-256.

jackweirdy

There’s already a version byte: if it’s [0-9a-f], that’s version 1 ;)

LeifCarrotson

That's a 4-bit nibble, the version byte is 0x00 to 0xFF.

simias

By the time Git was first released the first attacks on SHA-1 had already been published, but I agree with your general point about allowing for backward compatible updates.

layer8

The problem is not a missing version byte. SHA-256 is trivially distinguishable from SHA-1 by hash length. The problem is that the length of a SHA-1 hash (20 bytes) is (or was) hardcoded in too many places.
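
For illustration, the digest sizes as reported by Python's hashlib, next to the kind of fixed-size assumption that causes the trouble. `LEGACY_HASH_BYTES` and `fits_legacy_buffer` are made-up names, not identifiers from Git's source.

```python
import hashlib

# Digest sizes in bytes, as reported by hashlib.
sizes = {
    name: hashlib.new(name).digest_size
    for name in ("sha1", "sha256", "sha3_256")
}
print(sizes)  # {'sha1': 20, 'sha256': 32, 'sha3_256': 32}

# The kind of constant that ends up baked into buffers and database schemas.
LEGACY_HASH_BYTES = 20


def fits_legacy_buffer(digest):
    """True only for digests that match the hardcoded SHA-1 size."""
    return len(digest) == LEGACY_HASH_BYTES
```

Length separates SHA-1 from SHA-256, but it cannot separate two different 256-bit hashes from each other.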

tremon

Is SHA3-256 similarly distinguishable from SHA-256 by hash length?

wahern

Linus' original excuse for using SHA-1 was that Git hash trees and hash identifiers were never meant to be cryptographically secure. GnuPG signing support, the popular belief that Git trees had a strong security property, etc, came afterward, along with increasingly awkward excuse-making.

So strictly speaking Linus and subsequent maintainers weren't being amateurish in the beginning. (You didn't say that explicitly, but it would be a fair criticism given what was known about SHA-1 at the time, including known by Linus--he knew and made a choice.) Rather, in the beginning it was naivety in believing that people wouldn't begin to depend on Git's apparent security properties.

avar

I don't know if/how this played into it, but if you check out the original version of Git whose commit date is April 7th, 2005 it uses OpenSSL for SHA-1.

The first OpenSSL release that has general SHA-256 support seems to have been 0.9.8, released on July 5th, 2005, the code first appeared in OpenSSL's source tree in May of 2004.

Perhaps Linus has commented on it. I don't know, but I wouldn't be surprised if the actual reason is that Git was thrown together as a weekend project, that he vaguely knew SHA-256 was preferable, but his distro's OpenSSL didn't have it yet.

So the initial version used SHA-1 instead, and the rest is history...

1. https://marc.info/?l=openssl-users&m=135355590501495

jopsen

Yeah, in hindsight maybe he should have made his own 160bit CRC variant :)

Honestly, I think it's fair to say that hashes aren't meant to be a security feature.

But signed tags/commits/etc. probably need a better hash.

hinkley

I worked on code signing for civilian aviation years ago and there were people trying to pressure me into supporting MD5 and SHA-1 signatures. I told the first group to jump off a cliff, and the second group got a firm no. The first papers on theoretical SHA-1 attacks had already been published, we were still a couple years out from active use, and people were already beginning to talk about starting to organize the SHA-3 process.

Once a system expects to handle SHA-1, then you have to deal with old assets that have deprecated signatures, and that's a fight I 1) didn't want to have and 2) was fairly sure I wouldn't be around to win.

Git was still brand new, largely unproven at that point, and I don't understand why he picked SHA-1.

armada651

> Adding my own 0.02, what some of us are facing is resistance to adopting git in our or client organizations because of the presence of SHA-1. There are organizations where SHA-1 is blanket banned across the board - regardless of its use. [...] Getting around this blanket ban is a serious amount of work and I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

Seems like this company could just use the current SHA-256 support then? Especially if it's the type of company that does all its development in-house and there's no need for SHA-1 interoperability.

skissane

> > There are organizations where SHA-1 is blanket banned across the board - regardless of its use.

Reminds me of the time a security audit (which literally just involved running some scanning tool and dumping the results on us) complained that some code I had written was using MD5 - but in a use case in which we weren’t relying on it for any security purposes. I ended up replacing MD5 with CRC-32 - which is even weaker than MD5, but made the security scanning tool mark the issue as remediated. It was easier than trying to argue that it was a false positive.

bawolff

Honestly, this isn't a bad idea.

The big problem with using sha1/md5 in non-secure contexts is:

* Someone later might think it's secure and rely on that when extending the system.

* It can make it difficult for security people to audit code later, as you have to figure out if each usage is security critical.

Using a non crypto hash makes both those concerns go away since everyone knows crc32 is insecure. The alternative of using sha256 also works (performance wise it is close enough, so why not just use the secure one and be done with it.)

gorkish

> There are organizations where SHA-1 is blanket banned across the board - regardless of its use.

> I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

A company this dysfunctional has problems far beyond their choice of revision control system.

bostik

I can name a couple of industries where compliance (and their enforcement arm, security[0]) teams require N+1 different monitoring and enforcement agents on all systems because Compliance[TM]. Due to these agents the systems' IDLE load is approaching 1.00 - on a good day. On a less good day you need four cores to have one of them available for workload processing.

0: I use the word "security" only because the teams themselves are named like that. You can probably infer my opinion from the tone.

avar

In a past life I used to work for an anti-virus company who in addition to the Windows product sold the very portable virus-scanning engine for pretty much any other OS you could name. I worked in the *nix department, where we ported it to everything from Linux to the BSDs, HP/UX, Solaris & beyond, as well as more obscure setups like z/OS.

So, we sold people software that would run on some fridge-sized Sun machine running Solaris, to ensure that their Solaris machine wasn't about to get infected with the latest Windows virus.

The occasional support calls with technically minded *nix admins were amusing. We knew that what we were selling them was completely useless and made no secret of that fact, they likewise knew that the software they were running was useless to them. The one thing they cared about was that it didn't contribute to the load, and we did our best.

But some PHB somewhere in their organizations had decreed that all computers everywhere must have an anti-virus scanner, and if you're sufficiently motivated to buy something eventually someone will sell it to you, even while telling you that you don't need it :)

the_biot

I definitely see your point -- who hasn't seen or heard of companies ruined by officious rulemakers with no clue, rules to make something more secure that do the exact opposite etc. I've seen my share.

But blanket-banning an obsolete and insecure hash algorithm isn't a bad thing, it's entirely reasonable. In this case, as the article makes clear, it's git that's at fault.

cratermoon

Except said company likely uses one of the Git forge providers, either in-house or as a SaaS, as the (oxymoronic for git) central repo. Until they support SHA-256, or the company goes with its own git repo solution that is set up for it, companies won't make the move.

wepple

Not just git forge but probably the myriad other ancillary tools that assume SHA1

ivoras

Is there an explanation of what would go wrong with the naive approach? E.g.:

- Change the binary file format in repos to support arbitrary hash algorithms, in a way which unambiguously makes old software fail.

- Increment the Git major version number to 3.0

- Make the new version support both the old version repos and the new ones. Make it a per-repo config item that allows/disallows old/new hash formats. In theory, there's nothing wrong with having objects hashed with mixed algorithms as long as the software knows how to deal with that.

- The old format will probably have to be supported forever because of Linux.

Most user-facing utilities don't care what the hash algo actually is, they just use the hash as an opaque string.

runeks

Releasing new software is the simple part. The problem is that versioning is lacking in the old software, and therefore it doesn’t know how to talk to the new software. So for the old software there’s no difference between “invalid data” and “I’m too old, please upgrade me”.

dingleberry420

> So for the old software there’s no difference between “invalid data” and “I’m too old, please upgrade me”.

And why is this an issue? Release the new version that can read new repo formats, but doesn't write them yet. Wait a year. Release new version that can write new repo formats and encourage users to upgrade.

Anyone who hasn't upgraded in the past year probably doesn't care about security and should be left behind. Besides, once they google the error message they'll figure it out soon enough. It's not like git is known for its great UX anyway.

hamilyon2

I might be mistaken, but github could be using their own version of git and accompanying tools. So, unless they implement the upgrade themselves, no amount of waiting will make git interoperable with them.

kzrdude

All of what you wrote, except the version bump, is already implemented. It's the nicer features that are missing, the nice migration path.

kelnos

> In theory, there's nothing wrong with having objects hashed with mixed algorithms as long as the software knows how to deal with that.

That's an interesting idea, actually. I'm not sure they plan to support that, though? That would make things a lot easier on existing repositories; without support for mixed hashes, repos would have to have their history entirely rewritten, which would invalidate things like signed commits/tags.

rurban

No, study the transition document, please.

there is one hash version, plus a translation table for the other format. no history rewrite.

new repos will use the new hash. old repos will eventually fully convert to the new hash, then all old hash links after the transition period will become obsolete.

yjftsjthsd-h

> In his view, the only "defensible" reason to use SHA-1 at this point is interoperability with the Git forge providers.

Okay, but that's a pretty big reason! A git repo that can't be pushed to github/lab is... not always useless, but certainly extremely impaired.

kragen

In case anyone has forgotten, the process for pushing it to your own server is three shell commands. You run, on the server:

    git init --bare public_html/mything.git
    cd public_html/mything.git/hooks/
    mv post-update.sample post-update  # runs git update-server-info on push

(This assumes that your public_html directory exists and is mapped into webspace, as with the usual configuration of Apache, NCSA httpd, and CERN httpd. If you don't have an account on such a thing you can get such PHP shared hosting accounts with shell access anywhere in the world for a dollar or two a month.)

And then on your dev machine, it's precisely the same as for pushing to Gitlab or whatever, except that you use your own username instead of git@:

    git remote add someremotename user@myserver:public_html/mything.git
    git push -u someremotename master # assuming you want it to be your upstream

Then anyone can clone from your repo with a command like this:

    git clone https://myserver/~user/mything.git

They can also add the URL as a remote for pulls.

If you want them to be able to push, you'll need to give them an account on the same server and either set umasks and group ownerships and permissions appropriately or set a POSIX ACL. Alternatively they can do the same thing on their server and you can pull from it. There are reportedly permission bugs in recent versions of Git (the last five years) that prevent this from being safe with people you don't trust (https://www.spinics.net/lists/git/msg298544.html).

Of course source control is only part of the overall development project workflow, so for many purposes adding SHA-256 support to Gogs or Gitlab or Gitea or sr.ht is probably pretty important: you want a Wiki and CI integration and bug tracking and merge requests. But the git repo still works fine with a bog-standard ssh and HTTP server, though slightly less efficiently. It's easier than setting up a new repo on GitLab etc.

Running a git repack -an && git update-server-info in the repo on the server can help a lot with the efficiency, and for having a browseable tree on the server as well as a clonable repo I put this script at http://canonical.org/~kragen/sw/dev3.git/hooks/post-update:

    #!/bin/sh
    set -e

    echo -n 'updating... '
    git update-server-info
    echo 'done. going to dev3'
    cd /home/kragen/public_html/sw/dev3
    echo -n 'pulling... '
    env -u GIT_DIR git pull
    echo -n 'updating... '
    env -u GIT_DIR git update-server-info
    echo 'done.'
That's very far from being GitLab (contrast http://canonical.org/~kragen/sw/dev3 with any GitHub tree view), and it's potentially dangerously powerful: if you're doing this in a repo where you pull from other people, and the server is configured to run PHP files or server-side includes in your webspace (mine isn't!) or CGI scripts (mine is!), then just dropping a file in the repo can run programs on the server with your account privileges. This is great if that's what you want, and it's a hell of a lot better than updating your PHP site over FTP, but that code has full authority to, for example, rewrite your Git history.

In theory you can do other things from your post-update hook as well, like rebuild a Jekyll site, send a message on IRC or some other message queueing system, or fire off a CI build in a Docker container. (Some of these would run afoul of guardrails common in cheap PHP shared hosting providers and you'd have to upgrade to a US$5/month VPS.)

isomorphic

People also forget about Gitolite, which provides lightweight shared access control around Git+SSH+server-repos. For me it's a much simpler alternative than systems with a heavyweight web UI. Although to be honest I don't know whether Gitolite handles SHA256 hashes (I've never tested it).

https://gitolite.com

https://github.com/sitaramc/gitolite

kragen

I did forget about Gitolite! Thanks for the reminder! Do you have suggestions for what sorts of CI tooling and bug trackers people might want to use with it?

dikei

Most developers don't run their own server, and that's probably for the best.

kragen

That's ridiculous.

It would have made sense to say that in 01990 when the hardware cost US$12000 and the software required constant hand-feeding. But now, virtually every home internet connection has a server built into the cable modem, you can rent a VPS for US$5 a month, and you can bring up a running nginx configuration with a single docker command.

Running a server isn't any more difficult than running an Ubuntu laptop — in fact, it's mostly the same tasks, except that you can version-control the server setup in Git — and considerably more educational.

So I would say that most developers don't run their own server, and that's a criminal failure of education that imperils the future of civilization.

donatj

Potentially stupid question, would it be reasonable to use SHA-256 truncated to the first 40 digits?

It seems like that could ease much of the migration problems if it's not a problem?
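
For illustration, truncating a SHA-256 digest to 40 hex digits is a one-liner. This is only a sketch of the idea in the question, not anything Git does; the function name is made up for the example.

```python
import hashlib


def truncated_sha256_hex(data: bytes, digits: int = 40) -> str:
    """Return the first `digits` hex characters of the SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:digits]


if __name__ == "__main__":
    name = truncated_sha256_hex(b"abc")
    # Same width as a SHA-1 hex name, so fixed-width fields would still fit.
    print(name, len(name))
```

The result is the same width as a SHA-1 name, but it is still a different value for the same content, so every stored reference would change anyway.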

Zamicol

I don't believe the length is a major issue. It's "upgrading" references to a new hashing algorithm that's the issue.

If for some reason length was an issue, a base64 encoded 256 bit string, like a SHA-256 digest, is 43 characters. That too can be truncated to 40 characters, which has 240 bits of security. SHA-256 is not only a better hashing algorithm than SHA-1 but it could also result in higher effective security even when truncated.

stingraycharles

I found this, which says that the SHA algorithm allows for truncation: https://csrc.nist.gov/publications/detail/sp/800-107/rev-1/f...

Dylan16807

Not just allows, it becomes more secure when you truncate.

tatersolid

Truncated SHA-* hashes are more secure against length-extension attacks, but are very much less secure against collision and pre-image attacks (which are more important in most scenarios).

neon_electro

Care to elaborate? This is not something I would've intuited.

jjtheblunt

that makes collisions more likely

pornel

Sigh, no it doesn't in any meaningful way.

160 bit output, without a cryptographic weakness, is good for about 30 trillion commits per second continuously for 1000 years.
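
A rough check of that figure, assuming the usual birthday bound (random collisions become likely after about 2^(n/2) hashes for an n-bit output) and a 365.25-day year. The function name is just for this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rate_until_birthday_bound(bits: int, years: float) -> float:
    """Hashes per second needed to reach ~2^(bits/2) hashes in `years` years."""
    total_hashes = 2.0 ** (bits / 2)
    return total_hashes / (years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    rate = rate_until_birthday_bound(160, 1000)
    print(f"{rate:.2e} commits per second")
```

For 160 bits over 1000 years this lands in the tens of trillions of commits per second, the same order of magnitude as the number above.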

For SHA the cryptographic strength isn't primarily from the length of the hash, but from the number of internal rounds (e.g. 160-bit SHA-1 with fewer rounds has been badly broken way earlier, and 160-bit SHA-1 with more rounds would be safer).

Cryptographic hashes are designed to be safe to truncate and still have all the safety the truncated length can provide. It's basically a requirement for them being cryptographically strong. Even in the SHA-2 family, the SHA-224 and SHA-384 are just truncated versions of larger hashes.

dspillett

It makes random collisions more likely when comparing truncated SHA256 to pure SHA256, but given the collisions and pre-image attacks shown so far is truncated SHA256 still safer than SHA1 in that respect? I have seen an article that claimed so (sorry, I can't re-find it ATM so I can't offer it for criticism, if anyone else has good information either way please respond with relevant links), and it is immune to extension attacks which is a significant advantage if this is part of your threat sensitivity surface and SHA1 is used without other protective wrappers like HMAC.

bawolff

Truncated sha256 is safer than sha-1 (depending of course on how much you truncate it, but given context let's assume truncating to size of sha-1 - 160 bits).

SHA-1 is quite broken at this point. SHA-256 is not. There aren't any practical non-generic attacks on full sha-256 and thus there wouldn't be any on the truncated version. The Wikipedia article goes into the different attacks on the two algorithms.

That said, if your concern is length extension attacks - strongly recommend using sha-512/256 instead of trying to do your own custom thing.

mjw1007

> All that is left is the hard work of making the transition to a new hash easy for users — what could be thought of as "the other 90%" of the job.

If that was all that was left, we could at least be using sha256 for new repositories.

It seems to me the big missing piece is support in libgit2, which is at least showing signs of progress:

https://github.com/libgit2/libgit2/pull/6191

xyzzy_plugh

libgit2 isn't an official library, and even if it did support sha256, dependents would still need to update, so I really don't perceive this as a missing piece.

If everyone started using sha256 then all these problems would be addressed practically overnight.

jiggawatts

If you’re going to “fix” the hash algorithm, do it properly!

Sha256 can only be computed in a single sequential stream (thread) by definition.

For large files this is increasingly becoming a performance limitation.

A Merkle tree based on SHA512 would have significant benefits.

SHA512 is faster than SHA256 on modern CPUs because it processes 64 bits per internal register instead of 32 bits.

A tree-structured hash can be parallelised across all cores.

For repositories with files over 100MB in them on an SSD this would make a noticeable difference…
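
A minimal sketch of what such a tree hash could look like, assuming fixed 1 MiB leaves, SHA-512 at every node, and one-byte prefixes to keep leaf and interior hashes distinct. These parameters and names are choices made for the example, not an existing Git format. The leaves are hashed in a thread pool, since each leaf is independent of the others.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LEAF_SIZE = 1 << 20  # assumed leaf size of 1 MiB


def _leaf(chunk: bytes) -> bytes:
    # Prefix 0x00 marks a leaf so it cannot be confused with an interior node.
    return hashlib.sha512(b"\x00" + chunk).digest()


def _node(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an interior node built from two child digests.
    return hashlib.sha512(b"\x01" + left + right).digest()


def merkle_root(data: bytes, leaf_size: int = LEAF_SIZE, workers: int = 4) -> str:
    """Return the hex root of a binary Merkle tree over fixed-size leaves."""
    chunks = [data[i:i + leaf_size] for i in range(0, len(data), leaf_size)] or [b""]
    # Leaf hashes do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(_leaf, chunks))
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0].hex()
```

Whether this wins in practice depends on file sizes and on how fast the storage can feed the hash.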

dchest

Most git objects are tiny files, so internal tree-based parallelization won't bring much compared to file parallelization (git is a hash tree itself, with variable-length leaves).

SHA256 is actually a lot faster on modern CPUs due to https://en.wikipedia.org/wiki/Intel_SHA_extensions (and similar on Arm), which are implemented for SHA-256 but not for SHA-512, e.g. openssl speed sha256 sha512 on M1:

  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha256           89474.97k   283341.15k   901724.41k  1730980.24k  2339109.86k
  sha512           66160.19k   262139.03k   365675.96k   487572.26k   545142.91k
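
To get comparable numbers on another machine without the openssl tool, a rough sketch using Python's hashlib is below. The results depend on the CPU and on how the underlying crypto library was built, so treat them as indicative only; the function name and default sizes are arbitrary.

```python
import hashlib
import time


def throughput_mb_per_s(algo: str, block_size: int = 8192,
                        total_bytes: int = 32 * 1024 * 1024) -> float:
    """Hash `total_bytes` of zeros in `block_size` pieces and return MB/s."""
    block = b"\0" * block_size
    h = hashlib.new(algo)
    start = time.perf_counter()
    for _ in range(total_bytes // block_size):
        h.update(block)
    h.digest()
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6


if __name__ == "__main__":
    for algo in ("sha256", "sha512"):
        print(f"{algo}: {throughput_mb_per_s(algo):.0f} MB/s")
```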

jiggawatts

A fair point about the instruction sets, and it is also true that “most” files are small.

But again, due precisely to their size, large files take a disproportionate amount of time to process.

Don’t confuse the typical use-case with the fundamental concept: versioning.

Git could be a general purpose versioning system with many more use-cases, but limitations like this hold it back unnecessarily…

dchest

Hashing is not the only thing that stops git from being useful for large file versioning. For this purpose, splitting files into chunks using a rolling hash (similar to how git packs, rsync, tarsnap or IPFS work) would work better. This again doesn't require "internal" tree hashing, since each chunk would be hashed separately.
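
A minimal sketch of content-defined chunking, using a gear-style rolling hash. That is one common choice and not necessarily what any of the tools named above use; the table construction, size limits, and function names are assumptions for the example. A chunk boundary is declared wherever the low bits of the rolling hash are all zero, so boundaries follow the content rather than fixed offsets.

```python
import hashlib

# Deterministic pseudo-random table with one 64-bit value per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def split_chunks(data: bytes, avg_bits: int = 13,
                 min_size: int = 2048, max_size: int = 65536):
    """Split data where the rolling hash's low `avg_bits` bits are zero."""
    boundary_mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the low bits, so only recent bytes matter.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h & boundary_mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(data: bytes):
    """Hash each chunk separately; no tree hashing inside a chunk is needed."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data)]
```

Because each chunk is hashed on its own, two versions of a large file that differ in one region can share the names of most of their chunks.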

akvadrako

Actually, SHA256 is faster since many common processors have special instructions to accelerate it.