I Didn't Need Kubernetes, and You Probably Don't Either

Daily Digest email

Get the top HN stories in your inbox every day.

tombert

I’ve come to the conclusion that I hate “cloud shit”, and a small part of me is convinced that literally no one actually likes it, and everyone is playing a joke on me.

I have set up about a dozen rack mount servers in my life, installing basically every flavor of Unix and Linux and message busses under the sun in the process, but I still get confused by all the Kubectl commands and GCP integration with it.

I might just be stupid, but it feels like all I ever do with Kubernetes is update and break YAML files, and then spend a day fixing them by copy-pasting increasingly-convoluted things on stackexchange. I cannot imagine how anyone goes to work and actually enjoys working in Kubernetes, though I guess someone must in terms of “law of large numbers”.

If I ever start a company, I am going to work my damndest to avoid “cloud integration crap” as possible. Just have a VM or a physical server and let me install everything myself. If I get to tens of millions of users, maybe I’ll worry about it then.

voidfunc

I'm always kind of blown away by experiences like this. Admittedly, I've been using Kubernetes since the early days and I manage an Infra team that operates a couple thousand self-managed Kubernetes clusters so... expert blindness at work. Before that I did everything from golden images to pushing changes via rsync and kicking a script to deploy.

Maybe it's because I adopted early and have grown with the technology it all just makes sense? It's not that complicated if you limit yourself to the core stuff. Maybe I need to write a book like "Kubernetes for Greybeards" or something like that.

What does fucking kill me in the Kubernetes ecosystem is the amount of add-on crap that is pitched as "necessary". Sidecars... so many sidecars. Please stop. There's way too much vendor garbage surrounding the ecosystem and dev's rarely stop to think about whether they should deploy something when it's easy as dropping in some YAML and letting the cluster magically run it.

jq-r

Those "necessary" add-ons and sidecars are out of control, but its the people problem. I'm part of the infra team and we manage just couple of k8s clusters, but those are quite big and have very high traffic load. The k8s + terraform code is simple, with no hacks, reliable and easy to upgrade. Our devs love it, we love it too and all of this makes my job pleasant and as little stressful as possible.

But we recently hired a staff engineer to the team (now the most senior) and the guy just cannot rest still. "Oh we need a service mesh because we need visibility! I've been using it on my previous job and its the best thing ever." Even though we have all the visibility/metrics that we need and never needed more than that. Then its "we need a different ingress controller, X is crap Y surely is much better!" etc.

So its not inexperienced engineers wanting newest hotness because they have no idea how to solve stuff with the tools they have, its sometimes senior engineers trying to justify their salary, "seniority" by buying into complexity as they try to make themselves irreplaceable.

ysofunny

> Then its "we need a different ingress controller, X is crap Y surely is much better!" etc.

I regard these as traits of a junior dev. they're thinking technology-first, not problem-first

jppittma

Service mesh is complicated, but the reason you use it integrate services across clusters. That and it has a bunch of useful reverse proxy features. On the other hand, It took me and 2 other guys two evenings of blood, sweat, and tears to understand what the fuck a virtual service actually does.

It’s not strictly necessary, but if you’ve had to put in the work elsewhere, I’d use it.

cyberpunk

To be fair, istio and cilium are extremely useful tools to have under your belt.

There’s always a period of “omgwhat” when new senior engineers join and they want to improve things. There’s a short window between joining and getting bogged into a million projects where this is possible.

Embrace it I recon.

withinboredom

> So its not inexperienced engineers wanting newest hotness because they have no idea how to solve stuff with the tools they have, its sometimes senior engineers trying to justify their salary, "seniority" by buying into complexity as they try to make themselves irreplaceable.

The grass is always greener where you water it. They joined your company because the grass was greener there than anywhere else they could get an offer at. They want to keep it that way or make it even greener. Assuming that someone is doing something to become 'irreplaceable' is probably not healthy.

alienchow

How do you scale mTLS ops when the CISO comes knocking?

undefined

[deleted]

stiray

I would buy the book. Just translate all "new language" concepts into well known concepts from networking and system administration. It would be best seller.

If I would only have a penny for each time I wasted hours trying to figure out what something in "modern IT" is, just to figure out that I already knew what it is, but it was well hidden under layers of newspeak...

deivid

This[0] is my take on something like that, but I'm no k8s expert -- the post documents my first contact with k8s and what/how I understood these concepts, from a sysadmin/SWE perspective.

[0]: https://blog.davidv.dev/posts/first-contact-with-k8s/

radicalbyte

The book I read on K8S written by a core maintainer made is very clear.

carlmr

>expert blindness at work.

>It's not that complicated if you limit yourself to the core stuff.

Isn't this the core problem with a lot of technologies. There's a right way to use it, but most ways are wrong. An expert will not look left and right anymore, but to anyone entering the technology with fresh eyes it's a field with abundance of landmines to navigate around.

It's simply bad UX and documentation. It could probably be better. But now it's too late to change everything because you'd annoy all the experts.

>There's way too much vendor garbage surrounding the ecosystem

Azure has been especially bad in this regard. Poorly documented in all respects, too many confusing UI menus that have similar or same names and do different things. If you use Azure Kubernetes the wrapper makes it much harder to learn the "core essentials". It's better to run minkube and get to know k8s first. Even then a lot of the Azure stuff remains confusing.

wruza

This and a terminology rug pull. You wanted to upload a script and install some deps? Here’s your provisioning genuination frobnicator tutorial, at the end of which you’ll learn how to maintain the coalescing encabulation for your appliance unit schema, which is needed for automatic upload. It always feels like thousands times bigger complexity (just in this part!) than your whole project.

rbanffy

> There's a right way to use it, but most ways are wrong.

This is my biggest complaint. There is no simple obvious way to set it up. There is no "sane default" config.

> It's better to run minkube and get to know k8s first.

Indeed. It should be trivial to set up a cluster from bare metal - nothing more than a `dnf install` and some other command to configure core functionality and to join machines into that cluster. Even when you go the easy way (with, say, Docker Desktop) you need to do a lot of steps just to have an ingress router.

t-writescode

> Admittedly, I've been using Kubernetes since the early days and I manage an Infra team

I think this is where the big difference is. If you're leading a team and introduced all good practices from the start, then the k8s and Terraform or whatever config files can never get so very complicated that a Gordian knot isn't created.

Perhaps k8s is nice and easy to use - many of the commands certainly are, in my experience.

Developers have, over years and decades, learned how to navigate code and hop from definition to definition, climbing the tree and learning the language they're operating in, and most of the languages follow similar-enough patterns that they can crawl around.

Configuring a k8s cluster has absolutely none of that knowledge built up; and, reading something that has rough practices is not a good place to learn what it should look like.

paddy_m

Thank you. I can always xargs grep for a function name, at worst. Dir() in python at a debugger for other things. With YAML, kubernetes and other devops hotness, I frequently can’t even find the relevant scripts/YAML that are executed nor their codebases.

This also happens with configuration based packaging setups. Python hatch in particular, but sometimes node/webpack/rollup/vite.

figassis

This, all the sidecars. Use kubernetes to run your app like you would without it, take advantage of the flexibility, avoid the extra complexity. Service discovery sidecars? Why not just use the out of the box dns features?

tommica

Because new people don't know better - I've never used k8s, but have seen sidecars being promoted as a good thing, so I might have used them

karmarepellent

Agreed. The best thing we did back when we ran k8s clusters, was moving a few stateful services to dedicated VMs and keep the clusters for stateless services (the bulk) only. Running k8s for stateless services was an absolute bliss.

At that time stateful services were somewhat harder to operate on k8s because statefulness (and all that it encapsulates) was kinda full of bugs. That may certainly have changed over the last few years. Maybe we just did it wrong. In any case if you focused on the core parts of k8s that were mature back then, k8s was (and is) a good thing.

AtlasBarfed

So you don't run any databases in those thousands of clusters?

To your point, and I have not used k8s I just started to research it when my former company was thinking about shoehorning cassandra into k8s...

But there was dogma around not allowing access to VM command exec via kubectl, while I basically needed it in the basic mode for certain one-off diagnosis needs and nodetool stuff...

And yes, some of the floated stuff was "use sidecars" which also seemed to architect complexity for dogma's sake.

voidfunc

> So you don't run any databases in those thousands of clusters?

We do, but not of the SQL variety (that I am aware of). We have persistent key-value and document store databases hosted in these clusters. SQL databases are off-loaded to managed offering's in the cloud. Admittedly, this does simplify a lot of problems for us.

pas

postgresql operators are pretty nice, so it makes sense to run stateful stuff on k8s (ie. for CI, testing, staging, dev, etc.. and probably even for prod if there's a need to orchestrate shards)

> exec

kubectl exec is good, and it's possible to audit access (ie. get kubectl exec events with arguments logged)

and I guess and admissions webhook can filter the allowed commands

but IMHO it's shouldn't be necessary, the bastion host where the "kubectl exec" is run from should be accessible only through an SSH session recorder

thiht

> It's not that complicated if you limit yourself to the core stuff

Completely agree. I use Kubernetes (basically just Deployments and CronJobs) because it makes deployments simple, reliable and standard, for a relatively low cost (assuming that I use a managed Kubernetes like GKE where I don’t need to care at all about the Kubernetes engine). Using Kubernetes as a developer is really not that hard, and it gives you no vendor lock-in in practice (every cloud provider has a Kubernetes offer) and easy replication.

It’s not the only solution, not the simplest either, but it’s a good one. And if you already know Kubernetes, it doesn’t cost much to just use it.

torginus

Kubernetes starts sucking at first sight and is kind of the sign of the ailment of modern times.

Let me try to explain:

First, you encounter the biggest impedance mismatch between cloud and on prem: Kubernetes works with pods, while AWS works with instances as the unit of useful work, so they must map to each other, right?

Wrong, first each instance needs to run a Kubernetes node, which duplicates the management infrastructure hosted by AWS, and reduces the support for granularity, like if I need 4 cores for my workload, I start an instance with 4 cores, right?. Not so with k8s, you have to start up a big node, then schedule pods there.

Yay, extra complexity and overhead! And it's like when you need 3 carrots for the meal your cooking, but the store only sells it in packs of 8, you have to pay for the ones you don't need, and then figure out how to utilize the waste.

I'm not even going to talk about on-prem kubernetes, as I've never seen anyone IRL use it.

Another thing is that the need for kubernetes is manifestation of crappy modern dev culture. If you wrote your server in node, Python and Ruby, you're running a single threaded app in the era of even laptops having 32 core CPUs. And individual instances are slow, so you're even more dependent on scaling.

So, to make use of the extra CPU power, you're forced to turn to Docker/k8s and scale your infra that way, whereas if you went with something like Go, or god forbid, something as deeply uncool as ASP.NET, you could just put your 32 core server, and you get fast single threaded perf, and perfect and simple multi-threaded utilization out of the box without any of the headaches.

Also I've found stuff like rolling updates to be a gimmick.

Also a huge disclaimer, I don't think k8s is a fundamentally sucky or stupid thing, it's just I've never seen it used as a beneficial architectural pattern in practice.

erinaceousjones

Being nitpicky about Python specifically, Python not necessarily single-threaded; gunicorn gets you a multiprocess HTTP/WSGI server if you configure it to. asyncio and gevent have made it easier to do things actually-concurrently. Judicious use of generator functions lets you stream results back instead of blocking I/O for big chunks. And we have threads. Yeah, the Global Interpreter Lock is still hanging around, it's not the fastest language, but there are ways to produce a Docker image of an API written in Python which can handle thousands of concurrent HTTP requests and actively use all available CPU cores to do intensive computation.

zelphirkalt

Here I thought you would bring multiprocessing and process pool in Python.

regularfry

It really, really wants a higher-level abstraction layer over the top of it. I can see how you'd build something heroku-like with it, but exposing app developers to it just seems cruel.

switch007

> I can see how you'd build something heroku-like with it

That's what many teams end up half-arsing without realising they're attempting to build a PaaS.

They adopt K8S thinking it's almost 90% a PaaS. It's not.

They continue hiring, building a DevOps team just to handle K8S infrastructure things, then a Platform team to build the PaaS on top of it.

Then because so many people have jobs, nobody at this point wants to make an argument that perhaps using an actual PaaS might make sense. Not to mention "the sunk cost" of the DIY PaaS.

Then on top of that, realising they've built a platform mostly designed for microservices, everything then must become a microservice. 'Monolith' is a banned word.

Argh

torginus

Thing is, k8s is already an abstraction layer on top of something like AWS or GCP

ndjdjddjsjj

Run a node process on each core?

jmb99

I have pretty much the exact same opinion, right down to “cloud shit” (I have even used that exact term at work to multiple levels of management, specifically “I’m not working on cloud shit, you know what I like working on and I’ll do pretty much anything else, but no cloud shit”). I have worked on too many projects where all you need is a 4-8 vCPU VM running nginx and some external database, but for some reasons there’s like 30 containers and 45 different repos and 10k lines of terraform, and the fastest requests take 750ms which should be nginx->SQL->return in <10ms. “But it’s autoscaling and load balanced!” That’s great, we have max 10k customers online at once and the most complex requests are “set this field to true” and “get me an indexed list.” This could be hosted on a raspberry pi with better performance.

But for some reason this is what people want to do. They would rather spend hours debugging kubernetes, terraform, and docker, and spending 5 digits on cloud every month, to serve what could literally be proxied authenticated DB lookups. We have “hack days” a few times a year, and I’m genuinely debating rewriting the entire “cloud” portion of our current product in gunicorn or something, host it on a $50/month vps, point it at a mirror of our prod db, and see how many orders of magnitude of performance I can knock off in a day.

I’ve only managed to convert one “cloud person” to my viewpoint but it was quite entertaining. I was demoing a side project[0] that involved pulling data from ~6 different sources (none hosted by me), concatenating them, deduping them, doing some math, looking up in a different source an image (unique to each displayed item), and then displaying the list of final items with images in a list or a grid. ~5k items. Load time on my fibre connection was 200-250ms, sorting/filtering was <100ms. As I was demoing this, a few people asked about the architecture, and one didn’t believe that it was a 750 line python file (using multiprocessing, admittedly) hosted on an 8 core VPS until I literally showed up. He didn’t believe it was possible to have this kind of performance in a “legacy monolithic” (his words) application.

I think it’s so heavily ingrained in most cloud/web developers that this is the only option that they will not even entertain the thought that it can be done another way.

[0] This particular project failed for other reasons, and is no longer live.

jiggawatts

Speaking of Kubernetes performance: I had a need for fast scale-out for a bulk testing exercise. The gist of it was that I had to run Selenium tests with six different browsers against something like 13,000 sites in a hurry for a state government. I tried Kubernetes, because there's a distributed Selenium runner for it that can spin up different browsers in individual pods, even running Windows and Linux at the same time! Very cool.

Except...

Performance was woeful. It took forever to spin up the pods, but even once things had warmed up everything just ran in slow motion. Data was slow to collect (single-digit kilobits!), and I even saw a few timeout failures within the cluster.

I gave up and simply provisioned a 120 vCPU / 600 GB memory cloud server with spot pricing for $2/hour and ran everything locally with scripts. I ended up scanning a decent chunk of my country's internet in 15 minutes. I was genuinely worried that I'd get put on some sort of "list" for "attacking" government sites. I even randomized the read order to avoid hammering any one site too hard.

Kubernetes sounds "fun to tinker with", but it's actually a productivity vampire that sucks up engineer time.

It's the Factorio of cloud hosting.

pdimitar

> I gave up and simply provisioned a 120 vCPU / 600 GB memory cloud server with spot pricing for $2/hour and ran everything locally with scripts. I ended up scanning a decent chunk of my country's internet in 15 minutes.

Now that is a blog post that I would read with interest, top to bottom.

throwaway2037

These comments:

    > He didn’t believe it was possible to have this kind of performance in a “legacy monolithic” (his words) application.

    > I think it’s so heavily ingrained in most cloud/web developers that this is the only option that they will not even entertain the thought that it can be done another way.

One thing that I need to remind myself of periodically: The amount of work that a modern 1U server can do in 2024 is astonishing.

sgarland

Hell, the amount of work that an OLD 1U can do is absurd. I have 3x Dell R620s (circa-2012), and when equipped with NVMe drives, they match the newest RDS instances, and blow Aurora out of the water.

I’ve tested this repeatedly, at multiple companies, with Postgres and MySQL. Everyone thinks Aurora must be faster because AWS is pushing it so hard; in fact, it’s quite slow. Hard to get around physics. My drives are mounted via Ceph over Infiniband, and have latency measured in microseconds. Aurora (and RDS for that matter) has to traverse much longer physical distances to talk to its drives.

HPsquared

It's nice to think about the amount of work done by game engines, for instance. Factorio is a nice example, or anything graphics-heavy.

mschuster91

I mostly agree with you, but at least using Docker is something one should be doing even if one is on bare metal.

Pure bare metal IME only leads to people ssh'ing to hotfix something and forgetting to deploy it. Exclusively using Docker images prevents that. Also, it makes firewall management much, much easier as you can control containers' network connectivity (including egress) each on their own, on a bare-metal setup it involves loads of work with network namespaces and fighting the OS-provided systemd unit files.

mrweasel

I wouldn't throw "cloud shit" in the same bucket as Kubernetes. Professional cloud services are mostly great, but super expensive.

Kubernetes is interesting, because it basically takes everything you know and sort of pushes it down the stack (or up, depending on your viewpoint). To some extend I get the impression that the idea was: Wouldn't it be great if we took all the infrastructure stuff, and just merged the whole thing into one tool, which you can configure using a completely unsuitable markup language. The answer is that "No, that would in fact not be great".

For me the issue is that Kubernetes is overused. You can not really argue that it's not useful or has its place, but that place is much much small than the Internet wants us to believe. It's one of the few services where I feel like "Call us" would be an appropriate sales method.

The article is correct, you probably don't need Kubernetes. It's a amazing piece of technology, but it's not for the majority of us. It is and should be viewed as a niche product.

nixpulvis

What do you think its niche is?

mrweasel

Very large "single product" services, the focus being on "very large". It would also be relevant in the cases where your product is made up of a large number of micro-services, though if you have more than 50 services to support your main product I'd be interested in knowing why you have that many services. There might be completely sane reasons, but it is a special case.

Mostly a question of scale to me, I'd guess that the majority (80-90%) of people running Kubernetes don't have large enough scale that it makes sense to take on the extra complexity. Most Kubernetes installations I've seen runs on VMs, three for control plane and 2 - 5 for worker node and I don't think the extra layer is a good trade off for a "better" deployment tool.

If you do use Kubernetes as a deployment tool, then I can certainly understand that. It is a reasonably well-known, and somewhat standardised interface and there's not a lot of good alternatives for VMs and bare metal. Personally I'd just much rather see better deployment tools being developed, rather than just picking Kubernetes because Helm charts are a thing.

You'd also need to have a rather dynamic workload, in the sense that some of your services is need a lot of capacity at one point in time, while other need the capacity at another time. If you have constant load, then why?

It's like Oracle's Exadata servers, it's a product that has its place, but the list of potential buyers isn't that long.

stiray

I agree, but what pisses me the most is that today higher level abstractions (like cloud, spring boost,...) are hiding lower level functionality so well, that you are literally forced to use obnoxious amounts of time to study documentation (if you are in luck and it is well written), while everything is decorated with new naming of known concepts that was invented by people who didn't know that the concept already exists and has a name or some marketing guy figured out it will sell better with more "cool" name.

Like Shakespeare work would be clumsy and half translated to french advertising jargoon and you are forced to read it and make it work on a stage.

datadeft

The real benefits of LLMs is to give me a good summary of these documents. Still, using these abstractions is probably not worth it.

PittleyDunkin

Well, a summary. It's impossible to evaluate the quality of the summary without actually reading the document you're trying to avoid reading.

stiray

LLMs are a symptom, not a solution. Just another over-hyped bullshit (those downvoting will agree in few years or never as they are the ones riding the hype) that its only concern is to boost company stocks. Google is the proof, their search started to suck immediately when they added AI into concept. It promotes bullshit, while it doesn't hit the really relevant results, even if you specify them into the details. If AI would have any real value, it would never be given out for free and this is the case: they have a solution and they are searching for a problem it solves.

andyjohnson0

Kubernetes was developed by Google, and it seems to me it's a classic example of a technology that can work well in a resource-rich (people, money) environment. But in an averge business it becomes a time/money sink quite quickly.

Way too many businesses cargo-cult themselves into thinking that they need FAANG-class infra, even though they haven't got the same scale or the same level of resourcing. Devs and ops people love it because they get to cosplay and also get the right words on their CV.

If you're not Google-scale then, as you say, a few VMs or physical boxes are all you need for most systems. But its not sexy, so the business needs people who don't mind that.

xorcist

K8s is absolutely not Google-scale. Not even close. The scheduler is not built for it.

chromanoid

IMO the actual experience you want is something as simple as you can do with those old-school simple PHP-based "serverless" web hosters. You just upload your stuff and maybe configure a database URL and that's it.

Regarding AWS, function.zip Lambda + DynamoDB + S3 + SQS is basically this at "enterprise-grade".

Now you have the in-between left, where you want enterprise-grade (availability, scaling and fault tolerance) but with a catch (like lower costs, more control, data sovereignty or things that are not as serverless as you want, e.g. search engine etc.). In these cases you will run into many trade-off decisions, that may lead you to K8s, cloud specific stacks and also simple VM ClickOps / ShellOps setups.

As long as you can still be on the pet instead of cattle range of problems, K8s is not what you want. But as soon as you want cattle, reproducible throw-away online environments etc. "just have a VM or a physical server" will become a "build your own private cloud solution" that may or may not become a bad private cloud / K8s.

devjab

I don’t mind the cloud, but even in enterprise organisations I fail to see the value of a lot of the more complex tools. I’ve anlways worked with Azure because Denmark is basically Microsoft territory in a lot of non-tech organisations because of the synergy between pricing and IT operations staff.

I’ve done bicep, terraform and both Kubernetes and the managed (I forgot what azure conteiner apps running on top of what is basically Kubernetes is called). When I can get away with it I always use the Azure CLI through bash scripts in a pipeline however and build directly into Azure App services for contained which is just so much less complicated than what you probably call “cloud shit”. The cool part about the Azure CLI and their app services is that it hasn’t really changed in the past 3 years, and they are almost one size fit any organisation. So all anyone needs to update in the YAML scripts are the variables. By contrast working with Bicep/Terraform, Jenkins and whatever else people use has been absolutely horrible, sometimes requiring full time staff just to keep it updated. I suppose it may be better now that azure co-pilot can probably auto-generate what you need. A complete waste of resources in my opinion. It used to be more expensive, but with the last price hike of 600% on azure container apps it’s usually cheaper. It’s also way more cost efficient in terms of maintaining since it’ll just work after the initial setup pipeline has run. This is the only way I have found that is easier than what it was when organisations ran their own servers. Whether it was in the basement or at some local hardware house (not exactly sure what you call the places where you rent server rack space). Well places like Digital Ocean are even easier but they aren’t used by enterprise.

I’m fairly certain I’ve ever worked with an organisation that needed anything more than that since basically nothing in Denmark scales beyond what can run on a couple of servers behind a load balancer. One of the few exceptions is the tax system which sees almost 0 usage except for the couple of weeks where the entire adult population logs in in at the same time. When DevOps teams push back, I tend to remind them that StackOverflow ran on a couple of IIS servers for a while and that they don’t have even 10% of the users.

Eventually the business case for Azure will push people back to renting hardware space or jumping to Hetzner and similar. But that’s a different story.

DanielHB

Terraform has the same problem as Kubernetes sidecars with terraform providers trying to do magic for you. If you stick to the cloud platform provider it is actually much nicer than using the CLI.

Although my experience is with AWS, I find the terraform AWS provider docs better documentation than the official AWS docs for different options. If they don't answer any question I have right away they at least point me where to look for answers in the mess that is AWS docs.

MortyWaves

This was a good read! I have similar thoughts especially about IaC vs a bash script. Definitely clear pros and cons to both, but I’m wondering how you handle infrastructure drift with imperative bash scripts?

I mean hopefully no one is logging into Azure to fuck with settings but I’m sure we’ve all worked with that one team that doesn’t give a flying fuck about good practices.

Or say you wish to now scale up a VM, how does your bash script deal with that?

Do you copy past the old script, pass new flags to the Azure CLI, and then run that, then delete the old infrastructure somehow?

I’m curious because I think I’d like to try your approach.

devjab

I think you’ve nailed the issues with this approach. I think the best approach to control “cowboy” behaviour is to make everything run through a service connection so that developers don’t actually need access to your azure resources. Though to be fair, I’ve never worked with a non-tech enterprise organisation where developers didn’t have at least some access into Azure directly. I also think the best way to avoid dangers in areas like networking is to make sure the responsibility for these are completely owned by IT-Operations. With VNETs and private DNS zones in places all you really need to allow is for the creation of private end points and integration to the network resources. Similarity I think it’s best practice to have things like key vaults managed by IT operations with limited developer access, but this can depend on your situation.

One of the things I like about the Azure CLI is that it rarely changes. I would like to clarify that I’m mainly talking about Azure App Services and not VMs. Function apps for most things, web apps for things like APIs.

As far as the script goes they are basically templates which are essentially “copy paste”. One of the things I tend to give developers in these organisations is “skeleton” projects that they can git clone. So they’ll typical also have some internal CLI scripts to automate a lot of the code generation and an azure-pipelines-resource-creation.yml plays into this. Each part of your resource creation is its own “if not exist” task. So there is a task to create a resource group. Then a task to create an app service plan and so on.

It won’t scale. But it will scale enough for every organisation I’ve worked with.

To be completely honest it’s something which grew out of my frustration of repeating the same tasks in different ways over the years. I don’t remember exactly but I think quite a few of the AZ CLI commands haven’t change for the past three years. It’s really the one constant across organisations, even the Azure Poetal hasn’t remained the same.

paxys

> Kubernetes comes with substantial infrastructure costs that go beyond DevOps and management time. The high cost arises from needing to provision a bare-bones cluster with redundant management nodes.

That's your problem right there. You really don't want to be setting up and managing a cluster from scratch for anything less than a datacenter-scale operation. If you are already on a cloud provider just use their managed Kubernetes offering instead. It will come with a free control plane and abstract away most of the painful parts for you (like etcd, networking, load balancing, ACLs, node provisioning, kubelets, proxies). That way you just bring your own nodes/VMs and can still enjoy the deployment standardization and other powerful features without the operational burden.

dikei

Even for on-prem scenario, I'd rather maintain a K8S control plane and let developer teams manage their own apps deployment in their own little namespace, than provisioning a bunch of new VMs each time a team need some services deployed.

mzhaase

This for me is THE reason for using container management. Without containers, you end up with hundreds of VMs. Then, when the time comes that you have to upgrade to a new OS, you have to go through the dance, for every service:

- set up new VMs

- deploy software on new VMs

- have the team responsible give their ok

It takes forever, and in my experience, often never completes because some snowflake exists somewhere, or something needs a lib that doesn't exist on the new OS. VMs decouple the OS from the hardware, but you should still decouple the service from the OS. So that means containers. But then managing hundreds of containers still sucks.

With container management, I just

- add x new nodes to cluster

- drain x old nodes and delete them

rtpg

Even as a K8s hater, this is a pretty salient point.

If you are serious about minimizing ops work, you can make sure people are deploying things in very simple ways, and in that world you are looking at _very easy_ deployment strategies relative to having to wire up VMs over and over again.

Just feels like lots of devs will take whatever random configs they find online and throw them over the fence, so now you just have a big tangled mess for your CRUD app.

guitarbill

> Just feels like lots of devs will take whatever random configs they find online

Well it usually isn't a mystery. Requiring a developer team to learn k8s likely with no resources, time, or help is not a recipe for success. You might have minimised someone else's ops work, but at what cost?

dikei

> Just feels like lots of devs will take whatever random configs they find online and throw them over the fence, so now you just have a big tangled mess for your CRUD app.

Agree.

To reduce the chance a dev pull some random configs out of nowhere, we maintain a Helm template that can be used to deploy almost all of our services in a sane way, just replace the container image and ports. The deployment is probably not optimal, but further tuning can be done after the service is up and we have gathered enough metrics.

We've also put all our configs in one place, since we found that devs tend to copy from existing configs in the repo before searching the internet.

spockz

I can imagine. Do you have complete automation setup around maintaining the cluster?

We are now on-prem using “pet” clusters with namespace as a service automated on it. This causes all kinds of issues with different workloads with different performance characteristics and requirements. They also share ingress and egress nodes so impact on those has a large blast radius. This leads to more rules and requirements.

Having dedicated and managed clusters where everyone can determine their sizing and granularity of workloads to deploy to which cluster is paradise compared to that.

solatic

> This causes all kinds of issues with different workloads with different performance characteristics and requirements.

Most of these issues can be fixed by setting resource requests equal to limits and using integer CPU values to guarantee QoS. You should also have an interface with developers explaining which nodes in your datacenter have which characteristics, using node labels and taints, and force developers to pick specific node groups as such by specifying node affinity and tolerations, by not bringing online nodes without taints.

> They also share ingress and egress nodes so impact on those has a large blast radius.

This is true regardless of whether or not you use Kubernetes.

DanielHB

> than provisioning a bunch of new VMs each time a team need some services deployed.

Back in the old days before cloud providers this was the only option. I started my career in early 2010s and got the tailend of this, it was not fun.

I remember my IT department refusing to set up git for us (we were using SVN before) so we just asked a VM and set up a git repo in there ourselves to host our code.

jillesvangurp

For most small setups, the cost of running an empty kubernetes cluster (managed) is typically higher than setting up a db, a couple of vms and a loadbalancer, which goes a long way for running a simple service. Add some buckets, a CDN and you are pretty much good to go.

If you need dedicated people just to stay on top of running your services, you have a problem that's costing you hundreds of thousands per year. There's a lot of fun and easy stuff you can do with that kind of money. This is a pattern I see with a lot of teams that get sucked into using Kubernetes, micro services, terraform, etc. Once you need a few people just to stay on top of the complexity that comes from that, you are already spending a lot. I tend to keep things simple on my own projects because any amount of time I spend on that, I'm not spending on more valuable work like adding features, fixing bugs, etc.

Of course it's not black and white and there's always a trade off between over and under engineering. But a lot of teams default to over engineering simply by using Kubernetes from day one. You don't actually need to. There's nothing wrong with a monolith running on two simple vms with a load balancer in front of it. Worked fine twenty years ago and it is still perfectly valid. And it's dead easy to setup and manage in most popular cloud environments. If you use some kind of scaling group, it will scale just fine.

dikei

> For most small setups, the cost of running an empty kubernetes cluster (managed) is typically higher than setting up a db, a couple of vms and a loadbalancer, which goes a long way for running a simple service.

Not really, the cost of an empty EKS cluster is the management fee of $0.1/hour, or roughly the price of a small EC2 instance.

jillesvangurp

0.1 * 24 * 30 = 720$/month

That's about 2x our monthly cloud expenses. That's not a small VM. You can buy a mac mini for that.

oofbey

If you do find yourself wanting to create a cluster by hand, it's probably because you don't actually need lots of machines in the "cluster". In my experience it's super handy to run tests on a single-node "cluster", and then k3s is super simple. It takes something like 8 seconds to install k3s on a bare CI/CD instance, and then you can install your YAML and see that it works.

Once you're used to it, the high-level abstractions of k8 are wonderful. I run k3s on raspberry pi's because it takes care of all sorts of stuff for you, and it's easy to port code and design patterns from the big backend service to a little home project.

sbstp

Most control planes are not free anymore, they cost like 70$/mo on AWS & GCP. Used to be a while back.

dikei

GCP has $74 free credit for Zonal cluster, so you effectively have the first cluster for free.

And even $70 is cheap, considering that a cluster should be shared by all the services from all the teams in the same environment, bar very few exceptions.

szszrk

That's around the cost of a single VM (cheapest 8GB ram I found quickly).

Azure has a free tier with control plane completely free (but no SLA) - great deal for test clusters and testing infra.

If you are that worried about costs, then public cloud may not be for you at all, or you should look at ECS/App containers or serverless.

ants_everywhere

People talk about Kubernetes as container orchestration, but I think that's kind of backwards.

Kubernetes is a tool for creating computer clusters. Hence the name "Borg" (Kubernetes's grandpa) referring to assimilating heterogeneous hardware into a collective entity. Containers are an implementation detail.

Do you need a computer cluster? If so k8s is pretty great. If you don't care about redundancy and can get all the compute you need out of a single machine, then you may not need a cluster.

Once you're using containers on a bunch of VMs in different geographical regions, then you effectively have hacked together a virtual cluster. You can get by without k8s. You just have to write a lot of glue code to manage VMs, networking, load balancing, etc on the cloud provider you use. The overhead of that is probably larger than just learning Kubernetes in the long run, but it's reasonable to take on that technical debt if you're just trying to move fast and aren't concerned about the long run.

stickfigure

K8s doesn't help you solve your geographical region problem, because the geographical region problem is not running appserver instances in multiple regions. Almost any PaaS will do that for you out of the box, with way less fuss than k8s. The hard part is distributing your data.

Less overhead than writing your own glue code, less overhead than learning Kubernetes, is just use a PaaS like Google App Engine, Amazon Elastic Beanstalk, Digital Ocean App Platform, or Heroku. You have access to the same distributed databases you would with k8s.

Cloud Run is PaaS for people that like Docker. If you don't even want to climb that learning curve, try one of the others.

photonthug

> just use a PaaS like Google App Engine, Amazon Elastic Beanstalk, Digital Ocean App Platform, or Heroku.

This is the right way for web most of the time, but most places will choose k8s anyway. It’s perplexing until you come to terms with the dirty secret of resume driven development, which is that it’s not just junior engs but lots of seniors too and some management that’s all conspiring to basically defraud business owners. I think the unspoken agreement is that Hard work sucks, but easy work that helps you learn no transferable skills might be worse. The way you evaluate this tradeoff predictably depends how close you are to retirement age. Still, since engineers are often disrespected/discarded by business owners and have no job security, oaths of office, professional guilds, or fiduciary responsibility.. it’s no wonder things are pretty mercenary out there.

Pipelines are as important as web these days but of course there are many options for pipelines as a service also.

K8s is the obviously correct choice for teams that really must build new kinds of platforms that have many diverse kinds of components, or have lots of components with unique requirements for coupling (like say “scale this thing based on that other thing”, but where you’d have real perf penalties for leaving the k8s ecosystem to parse events or whatever).

The often mentioned concern about platform lock in is going to happen to you no matter what, and switching clouds completely rarely happens anyway. If you do switch, it will be hard and time consuming no matter what.

To be fair, k8s also enables brand new architectural possibilities that may or may not be beautiful. But it’s engineering, not art, and beautiful is not the same as cheap, easy, maintainable, etc.

vrosas

PaaS get such a bad rap from devs in my experience, even though they would solve so many problems. They'd rather keep their k8s clusters scaled to max traffic and spend their nights dealing with odd networking and configuration issues than just throw their app on Cloud Run and call it a day.

Spivak

This has got to be the most out there k8s take I've read in a while. k8s doesn't save you from learning your cloud providers infrastructure, you have to learn k8s in addition to your cloud provider's infrastructure. It's all ALBs, ASGs, Security Groups, EBS Volumbes and IAM policy underneath and k8s, while very clever, isn't so clever as to abstract much of any of it away from you. On EKS you get to enjoy more odd limitations with your nodes than EC2 would give you on its own.

You're already building on a cluster, your cloud provider's hypervisor. They'll literally build virtual compute of any size and shape for you on demand out of heterogeneous hardware and the security guarantees are much stronger than colocated containers on k8s nodes.

There are quite a few steps between single server and k8s.

ants_everywhere

The same argument can be made about Borg. Someone at Google needs to know about things like Juniper switches and hard drive bays. Someone needs to repair or replace defective compute nodes and power switches. Before they had software load balancing, someone had to manage the hardware load balancers.

But the idea of Borg is that all of that's abstracted away for the typical developer. It's the same with k8s. The infrastructure team in your org needs to understand the implementation details, but really only a few.

You can also configure load balancers, IAM groups & policy etc as k8s CRDs. So all that stuff can be in one place in the code base alongside the rest of your infrastructure. So in that sense it does abstract those concepts. You still need to know something about them, but you don't have to configure them programmatically yourself since k8s will do that.

psini

You can self host Kubernetes on "dumb" VMs from Hetzner or OVH.

p_l

K8s was designed around deployment on premise on bare metal hardware.

The cloud extensions were always just a convenience.

davidgl

Yep, it's a cluster OS. If you need to run a cluster, you need to explore and understand the trade offs of k8s versus other approaches. Personally I run a small cluster on k3s, for internal tools, and love it. Yes it's a load of new abstractions to learn, but once learnt if really helps in designing large scalable systems. I manage lots of pet machines and VMs for clients, and it would be soooo much easier on k8.

politelemon

I like to describe it similarly, but as a way of building platforms.

ashishmax31

Exactly. I've come to describe k8s as a distributed operating system for servers.

K8s tries to abstract away individual "servers" and gives you an API to interact with all the compute/storage in the cluster.

_flux

What is the container orchestration tool of choice beyond docker swarm, then?

rixed

Is nomad still around?

_flux

Thanks, hadn't heard of that.

Seems pretty active per its commit activity: https://github.com/hashicorp/nomad/graphs/commit-activity

But the fact that I hadn't heard of it before makes it sound not very popular, at least not for the bubble I live in :).

Does anyone have any practical experiences to share about it?

otabdeveloper4

> Containers are an implementation detail.

They really aren't.

Personally I have a big Nix derivation to deploy my (heterogeneous) cluster to bare metal.

None of the k8s concepts or ideas apply here.

pas

how does it work? how many nodes? what do you use for consensus, request routing, etc...?

lkrubner

Interesting that the mania for over-investment in devops is beginning to abate. Here on Hacker News I was a steady critic of both Docker and Kubernetes, going to at least 2017, but most of these posts were unpopular. I have to go back to 2019 to find one that sparked a conversation:

https://news.ycombinator.com/item?id=20371961

The stuff I posted about Kubernetes did not draw a conversation, but I was simply documenting what I was seeing: vast over-investment in devops even at tiny startups that were just getting going and could have easily dumped everything on a single server, exactly as we used to do things back in 2005.

OtomotO

It's just the hype moving on.

Every generation has to make similar mistakes again and again.

I am sure if we had the opportunity and the hype was there we would've used k8s in 2005 as well.

The same thing is true for e.g. JavaScript on the frontend.

I am currently migrating a project from React to HTMX.

Suddenly there is no build step anymore.

Some people were like: "That's possible?"

Yes, yes it is and it turns out for that project it increases stability and makes everything less complex while adding the exact same business value.

Does that mean that React is always the wrong choice?

Well, yes, React sucks, but solutions like React? No! It depends on what you need, on the project!

Just as a carpenter doesn't use a hammer to saw, we as a profession should strive to use the right tool for the right job. (Albeit it's less clear than for the carpenter, granted)

sarchertech

>Just as a carpenter doesn't use a hammer to saw, we as a profession should strive to use the right tool for the right job. (Albeit it's less clear than for the carpenter, granted)

The problem is that most devs don’t view themselves as carpenters. They view themselves as hammer capenters or saw carpenters etc…

It’s not entirely their fault, some of the tools are so complex that you really need to devote most of your time to 1 of them.

I realize that this kind of tool specialization is sometimes required, but I that it’s overused by at the very least an order of magnitude.

The vast majority of companies that are running k8s, react, kafka etc… with a team of 40+, would be better off running rails (or similar) on heroku (or similar), or a VPS, or a couple servers in the basement. Most of these companies could easily replace their enormous teams of hammer carpenters and saw carpenters with 3-4 carpenters.

But devs have their own gravity. The more devs you have the faster you draw in new ones, so it’s unclear to me if a setup like the above is sustainable long term outside of very specific circumstances.

But if it were simpler there wouldn’t be nearly many jobs, so I really shouldn’t complain. And it’s not like every other department isn’t also bloated.

ajayvk

Along those lines, I am building https://github.com/claceio/clace for teams to deploy internal tools. It provides a Cloud Run type interface to run containers, including scaling down to zero. It implements an application server than runs containerized apps.

Since HTMX was mentioned, Clace also makes it easy to build Hypermedia driven apps.

MortyWaves

Would you be open to non Python support as well? This tool seems useful, very useful in fact, but I mainly use .NET (which yes can run very well in containers).

esperent

> Just as a carpenter doesn't use a hammer to saw, we as a profession should strive to use the right tool for the right job

I think this is a gross misunderstanding of the complexity of tools available to carpenters. Use a saw. Sure, electric, hand powered? Bandsaw, chop saw, jigsaw, scrollsaw? What about using CAD to control the saw?

> Suddenly there is no build step anymore

How do you handle making sure the JS you write works on all the browsers you want to support? Likewise for CSS: do you use something like autoprefixer? Or do you just memorize all the vendor prefixes?

OtomotO

Htmx works on all browsers I want to support.

I don't use any prefixed CSS and haven't for many years.

Last time I did knowingly and voluntarily was about a decade ago.

creesch

As far as browser prefixes go, you know that browser vendors have largely stopped using those? Not even recently, that process started already way back in 2016. Chances are that if you are using prefixes in 2024 you are supporting browsers versions who, by all logic, should no longer have internet access because of all the security implications....

augbog

It's actually kinda hilarious how RSC (React Server Components) is pretty much going back to what PHP was but yeah proves your point as hype moves on people begin to realize why certain things were good vs not

fud101

where does tailwind stand on this? you can use it without a build step but it's strongly recommended in production

fer

A build step in your pipeline is fine because, chances are, you already have a build step in there.

raxxorraxor

[dead]

harrall

People gravely miss-understand containerization and Docker.

All it lets you do is put shell commands into a text file and be able to run it self-contained anywhere. What is there to hate?

You still use the same local filesystem, the same host networking, still rsync your data dir, still use the same external MySQL server even if you want -- nothing has changed.

You do NOT need a load balancer, a control plane, networked storage, Kubernetes or any of that. You ADD ON those things when you want them like you add on optional heated seats to your car.

skydhash

Why would you want to run it anywhere. People mostly select an OS and just update that. It may be great when distributing applications for others to host, but not when it’s the only strategy. I have to reverse engineer dockerfiles when the developer wouldn’t provide a proper documentation.

ftmch

OS upgrades are a pain. Even just package updates could break everything. Having everything in containers makes migrating to another system much easier.

sobellian

I've worked at a few tiny startups, and I've both manually administered a single server and run small k8s clusters. k8s is way easier. I think I've spent 1, maybe 2 hours on devops this year. It's not a full-time job, it's not a part-time job, it's not even an unpaid internship. Perhaps at a bigger company with more resources and odd requirements...

nicce

But how much this costs extra? Sounds like you are using cloud-provided k8s.

sobellian

EKS is priced at $876 / yr / cluster at current rates.

Negligible for me personally, it's much less than either our EC2 or RDS costs.

p_l

I currently use k8s to control bunch of servers.

The amount of work/cost of using k8s for handling them in comparison to doing it "old style" is probably negative by now.

valenterry

So, let's say you want to deploy server instances. Let's keep it simple and say you want to have 2 instances running. You want to have zero-downtime-deployment. And you want to have these 2 instances be able to access configuration (that contains secrets). You want load balancing, with the option to integrate an external load balancer. And, last, you want to be able to run this setup both locally and also on at least 2 cloud providers. (EDIT: I meant to be able to run it on 2 cloud providers. Meaning, one at a time, not both at the same time. The idea is that it's easy to migrate if necessary)

This is certainly a small subset of what kubernetes offers, but I'm curious, what would be your goto-solution for those requirements?

bruce511

That's an interesting set of requirements though. If that is indeed your set of requirements then perhaps Kubernetes is a good choice.

But the set seems somewhat arbitrary. Can you reduce it further? What if you don't require 2 cloud providers? What if you don't need zero-downtime?

Indeed given that you have 4 machines (2 instances, x 2 providers) could a human manage this? Is Kubernetes overkill?

I ask this merely to wonder. Naturally if you are rolling out hundreds of machines you should, and no doubt by then you have significant revenue (and thus able to pay for dedicated staff) , but where is the cross-over?

Because to be honest most startups don't have enough traction to need 2 servers, never mind 4, never mind 100.

I get the aspiration to be large. I get the need to spend that VC cash. But I wonder if Devops is often just premature and that focus would be better spent getting paying customers.

valenterry

> Can you reduce it further? What if you don't require 2 cloud providers? What if you don't need zero-downtime?

I think the "2 cloud providers" criteria is maybe negotiable. Also, maybe there was a misunderstanding: I didn't mean to say I want to run it on two cloud providers. But rather that I run it on one of them but I could easily migrate to the other one if necessary.

The zero-downtime one isn't. It's not necessarily so much about actually having zero-downtime. It's about that I don't want to think about it. Anything besides zero-downtime actually adds additional complexity to the development process. It has nothing to do with trying to be large actually.

osigurdson

"Imagine you are in a rubber raft, you are surrounded by sharks, and the raft just sprung a massive leak - what do you do?". The answer, of course, is to stop imagining.

Most people on the "just use bash scripts and duct tape" side of things assume that you really don't need these features, that your customers are ok with downtime and generally that the project that you are working on is just your personal cat photo catalog anyway and don't need such features. So, stop pretending that you need anything at all and get a job at the local grocery store.

The bottom line is there are use cases, that involve real customers, with real money that do need to scale, do need uptime guarantees, do require diverse deployment environments, etc.

QuiDortDine

Yep. I'm one of 2 Devops at an R&D company with about 100 employees. They need these services for development, if an important service goes down you can multiply that downtime by 100, turning hours into man-days and days into man-months. K8 is simply the easiest way to reduce the risk of having to plead for your job.

I guess most businesses are smaller than this, but at what size do you start to need reliability for your internal services?

ozim

You know that you can scale servers just as well, you can use good practices with scripts and deployments in bash and having them documented and in version control.

Equating bash scripts and running servers to duct taping and poor engineering vs k8s yaml being „proper engineering„ is well wrong.

caseyohara

I think you are proving the point; there are very, very few applications that need to run on two cloud providers. If you do, sure, use Kubernetes if that makes your job easier. For the other 99% of applications, it’s overkill.

Apart from that requirement, all of this is very doable with EC2 instances behind an ALB, each running nginx as a reverse proxy to an application server with hot restarting (e.g. Puma) launched with a systemd unit.

osigurdson

To me that sounds harder than just using EKS. Also, other people are more likely to understand how it works, can run it in other environments (e.g. locally), etc.

globular-toast

Hmm, let's see, so you've got to know: EC2, ALB, Nginx, Puma, Systemd, then presumably something like Terraform and Ansible to deploy those configs, or write a custom set of bash scripts. And all of that and you're tied to one cloud provider.

Or, instead of reinventing the same wheels for Nth time, I could just use a set of abstractions that work for 99% of network services out there, on any cloud or bare metal. That set of abstractions is k8s.

valenterry

Sorry, that was a misunderstanding. I meant that I want to be able to run it on two cloud providers, but one at a time is fine. It just means that it would be easy to migrate/switch over if necessary.

tootubular

My personal goto-solution for those requirements -- well 1 cloud provider, I'll follow up on that in a second -- would be using ECS or an equivalent service. I see the OP was a critic of Docker as well, but for me, ECS hits a sweet spot. I know the compute is at a premium, but at least in my use-cases, it's so far been a sensible trade.

About the 2 cloud providers bit. Is that a common thing? I get wanting migrate away from one for another, but having a need for running on more than 1 cloud simultaneously just seems alien to me.

mkesper

Last time I checked ECS was even more expensive than using Lambda but without the ability of fast starting your container, so I really don't get the niche it fits into, compared to Lambda on one side and self-hosting docker on minimal EC2 instances on the other side.

valenterry

Actually, I totally agree. ECS (in combination with secret manager) is basically fulfilling all needs, except being not so easy to reproduce/simulate locally and of course with the vendor lock-in.

shrubble

Do you know of actual (not hypothetical) cases, where you could "flip a switch" and run the exact same Kubernetes setups on 2 different cloud providers?

InvaderFizz

I run clusters on OKE, EKS, and GKE. Code overlap is like 99% with the only real differences all around ingress load balancers.

Kubernetes is what has provided us the abstraction layer to do multicloud in our SaaS. Once you are outside the k8s control plane, it is wildly different, but inside is very consistent.

threeseed

Yes. I've worked on a number of very large banking and telco Kubernetes platforms.

All used multi-cloud and it was about 95% common code with the other 5% being driver style components for underlying storage, networking, IAM etc. Also using Kind/k3d for local development.

devops99

Both EKS (Amazon) and GKE (Google Cloud) run Cilium for the networking part of their managed Kubernetes offerings. That's the only real "hard part". From the users' point of view, the S3 buckets, the network-attached block devices, and compute (CRIO container runtime) are all the same.

You are using some other cloud provider or want uniformity there's https://Talos.dev

hi_hi

Yes, but it would involve first setting up a server instance and then installing k3s :-)

brodo

If you are located in germany and run critial IT infrastructure (banks, insurance companies, energy companies) you have to be able to deal with a cloud provider completely going down in 24 houres. Not everyone who has to can really do it, but the big players can.

stitched2gethr

I'm just happy to see the tl;dr at the TOP of the document.

kccqzy

I've worked at tiny startups before. Tiny startups don't need zero-downtime-deployment. They don't have enough traffic to need load balancing. Especially when you are running locally, you don't need any of these.

anon7000

Tiny startups can’t afford to loose customers because they can’t scale though, right? Who is going to invest in a company that isn’t building for scale?

Tiny startups are rarely trying to build projects for small customer bases (eg little scaling required.) They’re trying to be the next unicorn. So they should probably make sure they can easily scale away from tossing everything on the same server

p_l

Tiny startups don't have money to spend on too much PaaS or too many VMs or faff around with custom scripts for all sorts of work.

Admittedly, if you don't know k8s, it might be non-starter... but if you some knowledge, k3s plus cheap server is a wonderful combo

whatever1

Why does a startup need zero-downtime-deployment? Who cares if your site is down for 5 seconds? (This is how long it takes to restart my Django instance after updates).

valenterry

Because it increases development speed. It's maybe okay to be down for 5 seconds. But if I screw up, I might be down until I fix it. With zero-downtime deployment, if I screw up, then the old instances are still running and I can take my time to fix it.

everfrustrated

If you're doing CD where every push is an automated deploy a small company might easily have a hundred deploys a day.

So you need seamless deployments.

lkjdsklf

We’ve been deploying software like this for a long ass time before kubernetes.

There’s shitloads of solutions.

It’s like minutes of clicking in a ui of any cloud provider to do any of that. So doing it multiple times is a non issue.

Or automate it with like 30 lines of bash. Or chef. Or puppet. Or salt. Or ansible. Or terraform. Or or or or or.

Kubernetes brings in a lot of nonsense that isn’t worth the tradeoff for most software.

If you feel it makes your life better, then great!

But there’s way simpler solutions that work for most things

valenterry

I'm actually not using kubernetes because I find it too complex. But I'm looking for a solution for that problem and I haven't found one, so I was wondering what OP uses.

Sorry, but I don't want to "click in a UI". And it is certainly not something you can just automate with 30 lines of bash. If you can, please elaborate.

pclmulqdq

The attraction of this stuff is mostly the ability to keep your infrastructure configurations as code. However, I have previously checked in my systemd cofig files for projects and set up a script to pull them on new systems.

It's not clear that docker-compose or even kubernetes* is that much more complicated if you are only running 3 things.

* if you are an experienced user

honkycat

Having done both: running a small Kubernetes cluster is simpler than managing a bunch of systemd files.

worldsayshi

Yeah this is my impression as well which makes me not understand the k8s hate.

santoshalper

As an industry, we spent so much time sharpening our saw that we nearly forgot to cut down the tree.

sunshine-o

Kubernetes, as an industry standard that a lot of people complain about is just a sitting duck waiting to be disrupted.

Anybody who doesn't have the money, time or engineering resources will jump on whatever appear as a decent alternative.

My intuition is that alternative already exist but I can't see it...

A bit like Spring emerged as an alternative to J2EE or what HTMX is to React & co.

Is it k3s or something more radical?

Is it on a chinese Github?

ftmch

I wish Docker Swarm would get more attention. It could be the perfect Kubernetes lightweight alternative. Instead it seems like it could get deprecated any day now.

rozap

ZIRP is over.

philbo

We migrated to Cloud Run at work last year and there were some gotchas that other people might want to be aware of:

1. Long running TCP connections. By default, Cloud Run terminates inbound TCP connections after 5 minutes. If you're doing anything that uses a long-running connection (e.g. a websocket), you'll want to change that setting otherwise you will have weird bugs in production that nobody can reproduce in local. The upper limit on connections is 1 hour, so you will need some kind of reconnection logic on clients if you're running longer than that.

Ref: https://cloud.google.com/run/docs/configuring/request-timeou...

2. First/second generation. Cloud Run has 2 separate execution environments that come with tradeoffs. First generation emulates Linux (imperfectly) and has faster cold starts. Second generation runs on actual Linux and has faster CPU and faster network throughput. If you don't specify a choice, it defaults to first generation.

Ref: https://cloud.google.com/run/docs/about-execution-environmen...

3. Autoscaling. Cloud Run autoscales at 60% CPU and you can't change that parameter. You'll want to monitor your instance count closely to make sure you're not scaling too much or too little. For us it turned out to be more useful to restrict scaling on request count, which you can control in settings.

Ref: https://cloud.google.com/run/docs/about-instance-autoscaling

czhu12

I'm not sure google cloud run can be considered a fair comparison to Kubernetes. It would be like saying AWS Lambda is a lot easier to use than EC2. I've used both Kubernetes and GCR at the current company I cofounded, and theres pros and cons to both. (Team of about 10 engineers)

GCR was simple to run simple workloads, but, an out of the box Postgres database can't just handle unlimited connections and so connecting to it from GCR without having a DB connection proxy like PG bouncer risks exhausting the connection pool. For a traditional web app at any moderate scale, you typically need some fine grained control over per process, per server and per DB connection pools, which you'd lose with GCR.

Also, to take advantage of GCR's fine grained CPU pricing, you'd have to have an app that boots extremely quickly, so it can be turned off during periods of inactivity, and rescheduled when a request comes in.

Most of our heaviest workloads run on Kubernetes for those reasons.

The other thing thats changed since this author probably explored Kubernetes is that there are a ton of providers now that offer a Kubernetes control plane for no cost. The ones that I know of are Digital Ocean and Linode, where the pricing for a Kubernetes cluster is the same as their droplet pricing for the same resources. That didn't use to be the case. [1] The cheapest you can get is a $12 / month, fully featured cluster on Linode.

I've been building, in my spare time, a platform that tries to make Kubernetes more usable for single developers: https://canine.sh, based on my learnings that the good parts of Kubernetes are actually quite simple to use and manage.

[1] Digital oceans pricing page references its free control plane offering https://www.digitalocean.com/pricing

gizzlon

> an out of the box Postgres database can't just handle unlimited connections and so connecting to it from GCR without having a DB connection proxy like PG bouncer risks exhausting the connection pool.

Good point. How many connections can it handle? Seems like it's up to 262142 in theory? Or am I reading this wrong: https://cloud.google.com/sql/docs/postgres/flags#postgres-m ??

But even 1000 seems ok? 1 per container, so 1000 running containers? Quite a lot in my world, especially since they can be quite beefy. Would be very worried about the cost way before 1000 simultaneously running containers :)

czhu12

This assumes your Postgres has a an infinite amount of memory which would also be very expensive. I think you'd probably want to assume each connection takes ~10-20MB of memory.

igor47

Why are GCR and pgbouncer incompatible? Could you run a pgbouncer instance in GCR?

seabrookmx

GCR assumes it's workload is HTTP, or a "job" (container that exits once it's task has completed). It scales on request volume and CPU, and the load balancer is integrated into the service. It's not obvious to me how you'd even run a "raw" TCP service like pgbouncer on it.

czhu12

I’m not an expert, but from what I understand, the standard set up is like:

4x(Web processes) -> 1x(pgbouncer) -> database

This ensures that the pgbouncer instance is effectively multiplexing all the connections across your whole fleet.

In each individual web process, you can have another shared connection pool.

This is how we set it up

solatic

Cloud Run is fine if you're a small startup and you're thinking about your monthly bill in three-figure or even four-figure terms.

Like most serverless solutions, it does not permit you to control egress traffic. There are no firewall controls exposed to you, so you can't configure something along the lines of "I know my service needs to connect to a database, that's permitted, all other egress attempts are forbidden", which is a foundational component of security architecture that understands that getting attacked is a matter of time and security is something you build in layers. EDIT: apparently I'm wrong on Cloud Run not being deployable within a VPC! See below.

GCP and other cloud providers have plenty of storage products that only work inside a VPC. Cloud SQL. Memorystore. MongoDB Atlas (excluding the expensive and unscalable serverless option). Your engineers are probably going to want to use one or some of them.

Eventually you will need a VPC. You will need to deploy compute inside the VPC. Managed Kubernetes solutions make that much easier. But 90% of startups fail, so 95% of startups will fail before they get to this point. YMMV.

jedi3335

Cloud Run has had network egress control for a while: https://cloud.google.com/run/docs/configuring/vpc-direct-vpc

solatic

Nice, I didn't know about this, it wasn't available last time I checked.

With that said... there are so many limitations on that list, that seriously, I can't imagine it would really be so much easier than Kubernetes.

bspammer

I’m surprised Cloud Run doesn’t let you do this. You can put an AWS lambda in a VPC no problem.

p_l

kubernetes is how I keep compute costs in 2-3 digits :V

madjam002

I don't get these recent anti-Kubernetes posts, yes if you're deploying a simple app then there are alternatives which are easier, but as your app starts to get more complex then suddenly you'll be wishing you had the Kubernetes API.

I'd use Kubernetes even if I was spinning up a single VM and installing k3s on it. It's a universal deployment target.

Spinning up a cluster isn't the easiest thing, but I don't understand how a lot of the complaints around this come from sysadmin-type people who replace Kubernetes with spinning up VMs instead. The main complexity I've found from managing a cluster is the boring sysadmin stuff, PKI, firewall rules, scripts to automate. Kubernetes itself is pretty rock solid. A lot of cluster failure modes still result in your app still working. Etcd can be a pain but if you want you can even replace that with a SQL database for moderate sized clusters if that's more your thing (easier to manage HA control plane then too if you've already got a HA SQL server).

Or yes just use a managed Kubernetes cluster.

noname44

lol, even if you have complex apps there are always easier solutions than Kubernetes. It is evident that you have never run such an app and are just talking about it. Otherwise, you would know the issues you would encounter with every update due to breaking changes. Not to mention that you need a high level of expertise and a dedicated team, which costs far more than running an app on Fargate. Recommending a managed Kubernetes cluster is nonsense, as it goes against the whole purpose of Kubernetes itself.

madjam002

I've been running apps on Kubernetes clusters for the past 6 years and the only thing that really comes to mind that was a breaking change was when the ingress class resource type was introduced. Everything else has been incremental. Maybe I'm forgetting something.

What's wrong with recommending a managed cluster? I wouldn't use one but it is certainly an option for teams that don't want to spin up a cluster from scratch, although it comes with its own set of tradeoffs.

My project at the moment is definitely easier thanks to Kubernetes as pods are spun up dynamically and I've migrated to a different cloud provider and since migrated to a mix of dedicated servers and autoscaled VMs, all of which was easy due to the common deployment target rather than building on top of a cloud provider specific service.

p_l

There was breaking change around 1.18, which was spread over few releases to make migration easier. Similar fix pattern as with graduating beta to stable APIs for things like Ingress, they just IIRC covered all the core APIs or so? Don't have time to look it up right now.

Generally the only issue was forgetting to update whatever you use to setup the resources, because apiserver auto-updated the formats to the point worst case you could just grab them with kubectl get ... -o yaml/json and trim the read-only fields.

mplewis

This is obvious FUD from a throwaway account. 1.x Kubernetes breakage has rarely affected me. I’m a team of one and k8s has added a lot of value by allowing me to build and run reliable, auto-managed applications that I’m confident I can transfer across clouds.

semitones

Kubernetes has a steep learning curve, and certainly a lot of complexity, but when used appropriately for the right case, by god it's glorious

threeseed

Kubernetes has a proportional learning curve.

If you're used to managing platforms e.g. networking, load balancers, security etc. then it's intuitive and easy.

If you're used to everything being managed for you then it will feel steep.

alienchow

That's pretty much it. I think the main issue nowadays is that companies think full stack engineering means OG(FE BE DB) + CICD + Infra + security compliance + SRE.

If a team of 5-10 SWEs have to do all of that while only graded on feature releases, k8s would massively suck.

I also agree that experienced platform/infra engineers tend to whine less about k8s.

ikiris

Nah the difference between managing k8 and the system it was based on is VASTLY different. K8 is much harder than it needs to be because there wasn't tooling for a long time to manage it well. Going from google internal to K8 is incredibly painful.

t-writescode

I think this is only true if the original k8s cluster you're operating against was written by an expert and laid out as such.

If you're entering into k8s land with someone else's very complicated mess across hundreds of files, you're going to be in for a bad time.

A big problem, I feel, is that if you don't have an expert design the k8s system from the start, it's just going to be a horrible time; and, many people, when they're asked to set up a k8s setup for their startup or whatever, aren't already experts, so the thing that's produced is not maintainable.

And then everyone is cursed.

p_l

Thanks to kubernetes "flattening" that mess into somewhat flat object map (I like to call it a blackboard system :D) it can be reasonably easy to figure out what's the desired state and what's the current state for given cluster, even if the files are a mess.

However...

Talking with people who started using kubernetes later than me[1], it seems like a lot of confusion starts by trying to start with somewhat complete example like using a Deployment + Ingress + Services to deploy, well, a typical web application. The stuff that would be trivial to run in typical PaaS.

The problem is that then you do not know what a lot of those magic incantations mean, and the actually very, very simple mechanism of how things work in k8s are lost, and you can't find your way in a running cluster.

[1] I started learning around 1.0, went with dev deployment with 1.3, graduated it to prod with 1.4. Smooth sailing since[2]

[2] The worst issues since involved dealing with what was actually global GCP networking outage that we were extra sensitive to due to extensive DNS use in kubernetes, and once naively assuming that the people before me set sensible sizes for various nodes, only to find a combination of too small to live EC2 instances choking till control plane death, and outdated etcd (because the rest of the company twas too conservative in updating) getting into rare but possible bug that corrupted data which was triggered by the flapping caused by too small instances. Neither I count as k8s issue, would have killed anything else I could setup given the same constraints.

threeseed

The exact can be said for your Terraform, Pulumi, Shell scripts etc. Not to mention unique config for every component and piece of infrastructure.

At least Kubernetes is all YAML, consistent and can be tested locally.

jauntywundrkind

And there's very few investment points below it.

You can cobble together your own unique special combination of services to run apps on! It's an open ended adventure into itself!

I'm all for folks doing less, if it makes sense! But there's basically nothing except slapping together the bits yourself & convincing yourself your unique home-made system is fine. You'll be going it alone, & figuring out on the fly, all to save yourself from getting good at the one effort that has a broad community, practitioners, and massive extensibility via CRD & operators.

winwang

Using it to run easily-scalable Spark clusters. Previously used it for large distributed builds. It's been pretty great (even if annoying at times).

JohnMakin

I don't understand how these posts exist when much of my consulting and career for the last few years has been on companies that have basically set up a bare bones out of the manual EKS/GCP solution and just essentially let it sit for 3+ years untouched until it got to a crisis. That to me as a systems engineer is nuts and a testament to how good this stuff is when you get it even kind of right. Of course, I'm referring to managed systems. Doing Kubernetes from scratch I would not dream of doing.

OtomotO

Right.

Depending on my client's needs we do it oldschool and just rent a beefy server.

Using your brain to actually assess a situation and decide without any emotional or monetary attachment to a specific solution actually works like a charm most of the time.

I also had customers who run their own cloud based on k8s.

And I heard some people have customers that are on a public cloud ;-)

Choose the right solution for the problem at hand.

seabrookmx

We dabbled with Cloud Run and Cloud Functions (which as of v2 are just a thin layer over Cloud Run anyways).

While they worked fine for HTTP workloads, we wanted to use them to consume from Pub/Sub and unfortunately the "EventArc" integrations are all HTTP-push based. This means there's no back pressure, so if you want the subscription to buffer incoming requests while your daemons work away there's no graceful way to do this. The push subscription will just ramp up attempting to DoS your Cloud Run service.

GKE is more initial overhead (helm and all that junk) but using a vanilla, horizontally scaled Kubernetes deployment with a _pull_ subscription solves this problem completely.

For us, Cloud Run is just a bit too opinionated and GKE Autopilot seems to be the sweet spot.

gizzlon

> The push subscription will just ramp up attempting to DoS your Cloud Run service

Interesting. My assumption would be that Cloud Run should quickly* spin up more containers to handle the spike and then spin them down again. So there would be no need for back pressure? Guess it depends on the scale? How big of a spike are we talking about? :)

*Let's say a few seconds

seabrookmx

You typically have it configured with a cap on instances for cost reasons.

Even if you don't, if you have other services (like a database) downstream, you might not want it to scale infinitely as then you're simply DoS'ing the DB instead of the Cloud Run service.

Backpressure is really important for the resiliency of any distributed system, IMO.

Daily Digest email

Get the top HN stories in your inbox every day.