u_fucking_dork
dijit
> the GitHub team is shit, their tech stack is shit
1) Criticising the inability to keep a service running isn't a criticism of any individual; it's a criticism of the system. You can criticise the system, it's permissible — especially when the company has more resources than many countries and some of the best tech talent in the world on staff.
2) Their tech stack is shit, and they've gone on record for years defending it, quite arrogantly in some cases, as if nobody could possibly know anything unless they've worked at GitHub. Even if you've built things that scale, or someone comes in operating at even larger scale, people on HN will happily say "but it's not GitHub", which is valid but not intellectually curious or open.
Azure is terrible, and it's being foisted on the team: even if they've found some technical people to put at the top who say it'll be OK, it's a pretty cruel platform to use.
I've personally had a few conversations about their choice of relational database which were handled pretty defensively, and I think we're all somewhat cognisant of their frontend rewrite.
It's a waste of time to rewrite the UI and push AI tools when you can't even keep the site lit.
I have nothing against the engineers- I don't know why people keep chiming in as if we're punching down at "lowly engineers" when the reality is that it's a management failure of the highest order.
They're a billion-dollar company owned by a trillion-dollar one... it's very hard to "punch down" at this system. Nobody is going after the engineers; we're punching at the fact that a system that is a de facto monopoly due to network effects is putting new features and pleasing its owners over the core offering. How is that an engineering failure? That's an active choice by management.
linsomniac
>not intellectually curious or open
This checks out. I once was at a conference where they (Azure) had a giant booth. A fairly well known person in the community brings me over to talk to his manager who is working the booth. "We should hire him, he's really smart." Within a minute of talking to this manager he says "You're a Linux guy? We do Windows." and physically turns away from me, conversation over. You know, fair enough, was an easy way to find that it wasn't a good fit. But the lack of curiosity about "what do you bring to the table" was pretty stunning.
Be curious.
edit: Clarifying "they"
vtbassmatt
Wait, is this Azure or GitHub who had the booth? If it was GitHub, I’m super confused and there must have been some serious missing context. I was at GitHub from 2020-2023 and am not aware of _any_ Windows usage in the service. The only meaningful Windows footprint was for client dev (`gh`, GitHub Desktop, etc.) and even there, Windows was the exception. Service side is all Linux; most engineers worked from a Mac.
If the context was an Azure booth, I’m still mildly surprised (they’ve long been invested in beyond-Windows) but not shocked.
(Edit: I forgot about the Actions stack. Some of that was on Windows. I was pretty far removed from that world and much closer to the classic Ruby monolith side.)
netule
Oof, that’s rough, especially considering that GitHub used to be a Linux shop. I wonder what happened to all the Rails folks who built the OG platform.
someguyiguess
If they were curious they wouldn't "do Windows"
etruong42
Your story (and the other posts commenting on lack of intellectual curiosity) fits into a larger model of the world that I subscribe to. Being labeled "well-known" or "smart" doesn't seem to require intellectual openness anymore. In fact, openness seems to be penalized. Being open means potentially exposing yourself to scenarios where you are not the smartest or most authoritative, and that reduces your authority, so you avoid those scenarios to preserve it. Even when you are not "the authority", being open can be a threatening signal to the authority, because you and your "openness" could be a vector for ideas and scenarios that reduce their authority. So long as authority is solidified by this lack of openness, actually being open could limit your career potential.
Seeing this happen in real time is helping me understand how authoritarian regimes/institutions/movements rise to power.
sam-cop-vimes
Wow - why anyone would build a serious SaaS platform in this day and age on Windows is beyond me.
xp84
> It's a waste of time to rewrite the UI and push AI tools when you can't even keep the site lit.
This is a flawed argument. There are many designers and frontend engineers there who have zero role in improving site reliability. They might as well keep doing their jobs, instead of having the CSS wizards and art school grads team up and try to crack Azure.
dijit
The implication here is that, after 8 years of having issues, management has not intentionally hired UX designers or programmers to work on AI features over people who could help build more reliability.
We've reframed this argument from the original "stop punching down" to "well, management's allocation of resources is fine because they have staff that would otherwise do nothing".
Thing is, I agree with the base of your argument: over the course of a quarter (or 3, or even 5...), the release of a feature does not mean that resources have been taken from the core.
However... it's been a really long time, and we're now hitting an inflection point where the added load of AI, the rot that has been allowed to set in at the core, and the fact that they haven't been allocating staff to improving those pieces all collide.
I can't say for sure, as I don't work there, but I think that if the trend has been going lower for literally years, management could have changed course.
Those frontend designers didn't hire themselves, and normal turnover is something like 5% for a healthy org: there was a conscious effort there. And those feature designers on AI could certainly have been doing work on reliability instead.
u_fucking_dork
The avalanche of identical comments on every meme-tier post about this is the opposite of curious.
Very little discussion of any merit happens on these posts. It’s mostly bandwagoning and repeating the same comments they read on the last iteration.
dijit
I agree... https://news.ycombinator.com/item?id=48026924
Yet here we are.
I just don't feel comfortable with you defending the trillion dollar company as if we owe them something, or as if they're somehow the victim in all of this.
I can buy that there's more demand for service, but;
A) They are the ones pushing the AI hype (microsoft especially but github too)
B) These issues existed before the AI hype anyway
and, obviously:
C) We're not saying they're bad engineers, we're saying it's become a bad service... THAT is everyone's problem, management's especially. We're not attacking the developers specifically, we're attacking the state of a core service that is failing.
bastardoperator
Wrap it up: this guy doesn't like the database (they use two), Azure is terrible despite being the cash cow for MSFT, and OP could easily build a more scalable SCM service with their pinky and half their brain because they know better than thousands of engineers. I don't know what's more comical: GH going down every day, or watching bros trying to flex.
s_dev
It's common knowledge that official status pages don't actually reflect real downtime, both because of SLAs and because the status page could be weaponised against them. So comparing them is useless.
You rarely see "outages" even if that's what happened in reality; in marketing speak it's referred to as 'degraded performance', i.e. the cheque is in the post, your data is in the tubes on its way, it's just slow! A business-oriented lie.
Far more useful are the 'independent status pages' maintained by enthusiasts that are unaffiliated with whomever they are measuring.
john_strinlai
>Far more useful are the 'independent status pages' maintained by enthusiasts that are unaffiliated with whomever they are measuring.
unless, like this one, they:
- treat "some Copilot chat models are failing" and "Teams notifications app down" as major outages, the same as git operations or Actions failing... those are very obviously nowhere near the same operational impact, and it's dishonest to group them as the same
- aggregate downtime so that there can be more than 1 day of downtime in a 24-hour period. If 3 services are down during the same 1pm-2pm window, that is counted as 3 hours of downtime even though a developer was only impacted for 1 hour.
It would be cool to have an accurate status page. The only two options seem to be company-owned status pages (incentivised to understate impact) and karma-hunting/meme status pages (incentivised to make as much red as possible for retweets or whatever).
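The aggregation problem described above is just naive summing versus taking the union of overlapping intervals. A minimal sketch with made-up interval data (hours on a 24h clock):

```python
def naive_downtime(incidents):
    """Sum every incident's duration; concurrent incidents double-count."""
    return sum(end - start for start, end in incidents)

def union_downtime(incidents):
    """Merge overlapping intervals first, so concurrent incidents count once."""
    merged = []
    for start, end in sorted(incidents):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the open interval
        else:
            merged.append([start, end])  # start a new interval
    return sum(end - start for start, end in merged)

# Three services down over the same 13:00-14:00 window:
incidents = [(13.0, 14.0), (13.0, 14.0), (13.0, 14.0)]
print(naive_downtime(incidents))  # 3.0 hours of "downtime" reported
print(union_downtime(incidents))  # 1.0 hour actually experienced
```

A status page that reports the union figure would match what a developer actually experienced that day.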
michaelcampbell
> I’ve personally can’t remember the last time there was an outage that prevented me from doing work.
You and I are in different domains. It's not daily, but I can't remember the last time I (in my company) went a week WITHOUT having to workaround some outage. Perhaps semantics, but I can "do work" through most of them, but that work isn't getting built or deployed in the same time frame it would have been had the outages not occurred. So "affected" is at least weekly for me.
k_roy
It's weekly for me. And that's just with PRs, not even builders. I can't imagine if I relied on their runners.
senko
> If the problems didn’t revolve around load
GitHub is not a mom&pop operation.
I expect the $3T company to handle the load, or at least place a prominent "only for hobby use" warning on top.
jorl17
Ironically, I am in this very moment incapable of creating threads in PRs within my org because of a GH bug. It's on their status page, too.
I can reply to an existing thread, just not create a new one.
How does something like this even slip by?...and why has it been like this for an hour?
EDIT: Oh, good, the issue should be solved in the next 3 or 4 hours. How lovely of them.
somehnguy
I think 2 things may be combined here.
'GitHub Enterprise Server' is hosted on your own resources, not their cloud. It makes sense that it wouldn't have the same downtime as their cloud, but that's hardly relevant.
'GitHub Enterprise Cloud' is their offering hosted on their own resources and what I suspect most enterprise customers use. It's what I at $extremelylargecompany use. It follows the same uptime/downtime as their public non-enterprise offering.
voxic11
No, if you use GitHub Enterprise Cloud with data residency, then you are on separate infrastructure. Here is the status page for the US Enterprise Cloud data residency: https://us.githubstatus.com/posts/dashboard (which, funnily enough, is reporting an issue at the moment).
You can tell if you are on GitHub Enterprise with data residency because you will access GitHub at a GHE.com domain rather than github.com. It definitely has better uptime than the public cloud, but it is not without its own issues.
everfrustrated
It's still on Azure, though, so it's subject to Azure's underlying problems...
tiagod
I don't find your conclusion that obvious. They could also be deploying changes to their regular infrastructure before updating enterprise, shielding it from some mistakes.
sqircles
Odd, our Enterprise side has been jacking up for a few days now on PRs.
collinmanderson
> Disruption with Gemini 2.5 Pro model
> Disruption with Grok Code Fast 1 in Copilot
> Incident with Copilot Grok Code Fast 1
> Claude Opus 4 is experiencing degraded performance
It doesn't seem fair to blame Github for this? There's nothing they can do about it?
Aurornis
The pattern recently is to collect every individual service degradation and present them as all equally significant.
Erase the severity and then present them all as “GitHub outages” or reduce it to an uptime graph.
I’m not happy with GitHub’s recent major outages either, but there is an ugly side of the pile-on where we’re getting these vibecoded attention seeking websites and social media posts to collect upvotes, likes, karma, and attention that blur the lines between small service degradations and total site outages to be more dramatic.
Anon1096
Every time a GitHub outage is posted, I wonder more and more what % of Hacker News commenters have actually worked on a system with >10k active hosts and have seen what it takes to run them and how internal dashboards are presented. So much of the criticism just makes zero sense, especially the third-party uptime pages.
yreg
To me this makes it uninteresting. Degraded performance of hyperscalers seems off-topic to bundle with, e.g., github.com availability. I think the author just wanted the chart to be as red as possible.
YetAnotherNick
I think it is fair to blame GitHub if they repackage other services. We run a much smaller service than GitHub and have all sorts of fallbacks to different providers and different models.
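For illustration, a fallback layer like the one described can be sketched roughly like this. The provider functions are hypothetical stand-ins, not real APIs:

```python
def call_primary(prompt):
    """Hypothetical primary provider; simulates an outage."""
    raise ConnectionError("primary provider is down")

def call_backup(prompt):
    """Hypothetical backup provider."""
    return f"backup answer to: {prompt}"

def complete_with_fallback(prompt, providers, attempts_each=2):
    """Try each provider in order, retrying a couple of times
    before falling through to the next one."""
    last_error = None
    for provider in providers:
        for _ in range(attempts_each):
            try:
                return provider(prompt)
            except Exception as exc:  # a real system would catch narrower errors
                last_error = exc
    raise RuntimeError("all providers failed") from last_error

print(complete_with_fallback("hello", [call_primary, call_backup]))
# backup answer to: hello
```

A production version would add backoff between retries and per-provider timeouts, but the ordering-plus-fallthrough shape is the core of it.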
collinmanderson
Github Copilot also lets you use other models when one has an outage.
rnotaro
But GitHub also allows you to fall back to other models...
lenerdenator
Depends on who's hosting the models.
sd9
Weekends are the untapped frontier. Still room to scale.
ahstilde
yup! When I did an analysis last month, GitHub was up 89.3% on weekdays and 96.5% on weekends. Incidents touched 62% of weekdays and 11% of weekends. Claude shows the same pattern: 92.5% weekday, 97.8% weekend. Tuesday through Thursday is the danger zone. Sunday is practically a different service.
https://www.aakash.io/tech-chase/github-and-claude-are-down-...
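The weekday/weekend split in an analysis like that can be computed with a simple bucket-by-weekday pass. A minimal sketch; the incident log here is made up for illustration:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical incident log: (ISO start time, duration in hours)
incidents = [
    ("2025-11-18T10:00", 2.0),  # a Tuesday
    ("2025-11-20T14:00", 3.5),  # a Thursday
    ("2025-11-23T09:00", 0.5),  # a Sunday
]

# Accumulate downtime hours per weekday name.
downtime_by_day = defaultdict(float)
for start, hours in incidents:
    weekday = datetime.fromisoformat(start).strftime("%A")
    downtime_by_day[weekday] += hours

weekend_hours = downtime_by_day["Saturday"] + downtime_by_day["Sunday"]
weekday_hours = sum(downtime_by_day.values()) - weekend_hours
print(weekday_hours, weekend_hours)  # 5.5 0.5
```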
jrumbut
I had an occasion recently where I was working a lot of late nights/early mornings with AI use. And I'd be getting these instant, beautiful responses, and then, as soon as the sun started coming in the windows, it would take longer and fail more, and by the time the clock struck 9 AM, every LLM had turned back into a pumpkin.
user_7832
Which service(s) were you using, if you don't mind sharing?
I'm curious whether most of the big players, including e.g. Google, do this thing of nerfing models, or whether it's limited to more "smart" (read: black-box) models like ChatGPT.
jannes
Are you saying US data centers idle in the night rather than serving European/Asian users?
pojzon
Inference results for Copilot are also a lot better during weekends than on workdays. It's my personal experience, so take it with a grain of salt, but I work on personal projects mostly on weekends, largely because of that Mon-Fri Copilot brain drain.
skor
Change is the biggest cause, then?
kshahkshah
Wait until they go 996!
jve
I have to question whether the graph is even accurate.
> Across 170 days with at least one incident · worst day Thu, Nov 20, 2025 (1.1 days)
1.1 days total? How is that even possible? Hovering over that day doesn't reveal the math behind the scenes; there's a single bullet point showing 1.3 hours.
Also, Nov 19 has a bullet point showing a 1.3-day outage, but the total is 8.1 hours.
hxtk
The missing status page [1] treats it as downtime any time any component of the system is down. It calculates the overall uptime as time that doesn't overlap with any individual category outage, and the overall downtime as any time overlapping with at least one category outage, to avoid double-counting. They show 24h of minor outage on that date.
I'm guessing that this site is taking the downtime in a given day across all services and adding it up, which would mean the worst possible day has 10 days of downtime (a day of downtime for each major category).
thenewnewguy
I see a bullet point for "1.0 days of 1.3 days", and when I mouse over the previous day (Wednesday 2025-11-19), I see "7.8 hours of 1.3 days".
I haven't actually checked any sources to confirm there really was downtime on those days, but if we assume those numbers are true, 7.8 hours + 1 day is about 1.3 days.
figmert
Far fewer outages during the weekends. Perfect, wasn't gonna do any work then anyway.
__natty__
The contrast between the official [0] and third-party [1] status pages is huge. How are their SLA terms of service legal if they are so different from the real-world experience of their product? I really like GitHub and their services, but every time it's broken and their status page is green, something screams inside me.
[0] https://www.githubstatus.com/ [1] https://mrshu.github.io/github-statuses/
xyzzy_plugh
Their terms of service are legal because their terms of service require YOU, the CUSTOMER, to track their availability against the agreed upon SLA and to pursue credits when they break their SLA.
At a recent gig we experienced many, many GitHub outages that were not tracked on their status page, and we kept a log (i.e. just search in slack). After our business people argued with our account executives at GitHub we got hundreds of dollars of credits.
Then the business people complained because hundreds of dollars of credits is not worth their time. And so GitHub continues to have terrible uptime, and nothing is done about it.
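For teams in that position, even a crude probe run on a schedule is enough to build the paper trail the SLA process demands. A minimal sketch; the URL and log path are placeholders:

```python
import datetime
import urllib.request

def probe(url, timeout=5.0):
    """Return True if the endpoint answers below HTTP 400 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

def log_probe(url, logfile="availability.log"):
    """Append a timestamped UP/DOWN line for later SLA-credit arguments."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    status = "UP" if probe(url) else "DOWN"
    with open(logfile, "a") as f:
        f.write(f"{stamp} {url} {status}\n")
```

Run `log_probe("https://github.com/")` every minute from cron and you have an independent availability record, which is more than some account teams apparently have.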
everfrustrated
This. We talked to our account reps and engineering folks at GitHub - they had no monitoring to track if they had kept their end of the contracted SLA.
They expected us to log any faults, and as you say the process wasn't worth it, even with massive outages, just for a few beans in credits.
GitHub has low availability simply because it doesn't cost them anything: they wear no legal or contractual damage from it.
If a competitor came to me and said, we will _pay_ you damages for the time your developers are offline not able to use our product to do their jobs, we would sign up immediately.
duiker101
Funnily enough, yesterday, when things were breaking, a coworker linked to the mrshu one, and it showed all green while the official showed issues with actions.
philipwhiuk
> How their terms of service for SLA are legal if they are so different from real world usage of their product?
Because the SLA likely doesn't cover some features of GitHub, whereas an outage or issue with a single model is counted as a problem on the third-party page.
gen220
This idea has been around!
I made this one in January to help slice and dice uptime by incident category.
culi
"Billing" is all the way in the green and "Pull Requests" all the way in the red
Raed667
This is a much better project
predkambrij
The answer is: yes
keyle
This is one of the most creative ideas I've seen this year. Tasteful and clever. Bravo!
elAhmo
Funny to see this closely match contribution graphs with effectively no downtime on weekends.
SlightlyLeftPad
Wow, that’s a great visualization. How many 7s of uptime is that?
devy
We need one for Anthropic's Claude: https://status.claude.com/
IMO, Claude is not faring any better than GitHub.
mawax
Except Claude is not catching flak for it.
dvh
I didn't know azure was this bad, completely changed my opinion on their cloud offerings
vaylian
I know that there was a plan to move GitHub to Azure, but I don't know what the status is.
It could very well be that GitHub is not running on Azure yet.
dijit
Really? A 10 minute interaction with the platform was enough to inform me that no serious engineer is in charge, and no serious engineer chooses this platform.
It is a platform for CFOs to avoid having another vendor relationship.
chrisweekly
It's even worse than I'd imagined. See this peer comment w/ link to scathing analysis from an insider:
Every time one of these vibe coded meme sites gets posted there’re endless comments about how it’s not actually because of load, the GitHub team is shit, their tech stack is shit, Microsoft is shit, Azure is shit, etc.
Just compare the GitHub status page for public GitHub vs the enterprise cloud pages.
Enterprise has much better numbers, and I personally can't remember the last time there was an outage that prevented me from doing work.
If the problems didn’t revolve around load, I’d expect to see the same uptime problems reflected on the enterprise offering.