
Falling for Kubernetes

August 9, 2022


Anyone managing a k8s cluster who is fatigued with memorizing and reciting kubectl commands should definitely take a look at k9s[0]. It provides a curses-like interface for managing k8s, which makes it really easy to operate and to dive into issues when debugging. Move from grabbing logs for a pod, to a terminal on the container, and back out to viewing or editing the YAML for the resource definition in only a few key presses.



I've used k9s every day for the last 6 months and it's really superior to everything if you have any vi-fu. It even plays nice with the terminal emulator's colour scheme. It's simply an all-around pleasant experience in a way no dashboard is.


I like Lens, as more of a GUI fan and very occasional k8s-er. It has saved me a lot of time.


Lens has been bought by another company, the same one that bought Docker, and they are not playing nice with the community.

Some people have forked it to remove the newly added trackers, the forced login, and the remote execution of unknown code, but I sadly guess that it will become a corporate nightmare and the forks will not be popular enough to really take over the development.


Which fork do you recommend?


For those who use emacs, I'd also recommend the very nice `kubel` plugin - an emacs UI for K8S, based on Magit.


I had to look up k9s because I wondered what you meant by "curses like interface" - it couldn't be where my mind went: "f*ck_u_k8s -doWhatIwant -notWhatISaid"

And upon lookup I was transported back to my youth of ascii interfaces.


K9s made my learning of k8s way way way easier. I still use it every single day and I absolutely adore it. The terminal user interface was so absolutely fantastic that it genuinely sparked my motivation to build more TUIs myself.


Do you pronounce it 'canines' or 'K-9-S'?


The logo is a dog, and everybody I know calls it "canines".


I use k9s every day, love it. The only problem is that the log viewing interface is buggy and slower than `kubectl logs`. Still love it though.


The larger the log buffer is, the slower k9s gets unfortunately. For me the builtin log viewer is useful for going back short time periods or pods with low log traffic.

You can work around it by using a plugin to invoke Stern or another tool when dealing with pods with high log traffic.
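
For illustration, such a plugin is declared in k9s's `plugins.yaml`; a sketch along these lines binds a key to Stern inside the pod view (the shortcut and args here are illustrative, so check the plugin docs for your k9s version):

```yaml
plugins:
  stern:
    shortCut: Ctrl-L
    description: Tail logs with stern
    scopes:
      - pods
    command: stern
    background: false
    args:
      - --tail
      - "50"
      - $FILTER
      - --namespace
      - $NAMESPACE
      - --context
      - $CONTEXT
```

With something like this in place, pressing the shortcut on a pod hands log tailing off to Stern instead of the builtin viewer.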


Can’t vouch for k9s enough, it’s great and I think it helped me to gain a much better understanding of the resource/api system.


k9s is by far the most productive tool in my toolshed for managing & operating k8s clusters.


Lots of people complain about Kubernetes complexity but I have found it is as complex as you make it. If you are running some simple workloads then, once you have the pipeline set up, there is almost no maintenance required.

When people complain and then start talking about super complex configuration, bespoke networking functionality and helm charts that have "too many options" then surely that just means you don't have the skills to use the system to that degree?

I could say that .Net is too complicated because it has MSIL and library binding sequences involving versions and public keys and the fact you can not always link e.g. netfx with netstandard but these are either just things you need to learn, or things that you can't use until you do learn them.

It's like someone complaining that a Ferrari is too complicated because you can't work out how to change the cylinder head when most people will just drive it.


Where people collide and disagree about complexity depends on their roles.

If you're a consumer, then yes, it's as complex as you make it. If you keep it super simple you may lose out on some features, but that's a reasonable trade-off.

If you're the person responsible for running and maintaining the Kubernetes cluster, then you're kinda out of luck. It's honestly not that bad to install; you can do that in an afternoon. Where I find Kubernetes to be exceedingly complex is in debug-ability. I'm not sure there's any way around that: it's basically a software-defined datacenter, with all the complexity that brings... For some of us it's a software-defined datacenter on top of an actual datacenter, just to make things worse.

When I read about a company that just spins up a new Kubernetes cluster because it's quicker than debugging the existing one, I get concerned. For running payloads, absolutely: just use the subset of the features you're comfortable with and build from there. Still, I'd argue that most of us will never have problems large enough or complex enough that Kubernetes is a hard requirement.


This is a bit like people being apologetic for PHP. Sure, technically, it is possible to write good PHP code. It doesn't have to turn into a spaghetti.

I have several issues with Kubernetes that superficially look like I'm just avoiding the complexity, but I've dealt with systems that are much more complex with ease.

1. In most orgs and environments, it's a cloud-on-a-cloud. A few years ago I used to joke with people that the virtualisation in the cloud is 7 layers deep and no human can understand or troubleshoot it any longer. Adding something like Docker adds 1-2 layers. Kubernetes doubles the layers. Everything you do with the underlying cloud is duplicated in Kubernetes, but incompatible. E.g.:

    Azure has:        Kubernetes has:

    Resource Groups   Namespaces
    Tags              Labels
    Disks             PVs & Container Images
    VMs               Nodes
    (various)         Pods
    Load balancers    Ingress
    NSGs & FWs        (various)
    Policy            Policies
    Key Vault         etcd
    ARM Templates     Helm charts
    JSON APIs         gRPC APIs
    Azure AD          Pluggable auth
    Azure Metrics     Prometheus
    Log Analytics     (various)
    PowerShell        cli tool
These interact in weird and wonderful ways. Azure NATs all traffic, and then Kubernetes NATs it again by default. There's "security" at every layer, but just to be irritating, all Kubernetes traffic comes from unpredictable IPs in a single Subnet, making firewalling a nightmare. You're running cloud VMs already, but Windows containers run in nested VMs by default on Kubernetes. Kubernetes has its own internal DNS service for crying out loud!

2. Trying to do everything means being less than optimal for everyone. There are four distinct ways of managing a cluster, and they're not compatible: you can run imperative commands, upload Helm charts, sync the cluster with an entire folder of stuff, or use a plugin like Flux to do GitOps. But if different people in a large team mix these styles, it causes a giant mess. (To be fair, this is an issue with all of the major public cloud providers also.)

3. Google-isms everywhere. Every aspect of Kubernetes uses their internal shorthand. I'm not an ex-Googler. Nobody at any of my customers is. Nobody around here "speaks this dialect", because we're 12,000 kilometres from Silicon Valley. I'm sure this is not deliberate, but there are definite "cliques" with distinct cultures in the IT world. As FAANG employees flush with cash jump ship to start projects and startups like Kubernetes, they take their culture with them. In my opinion, mixing these together at random into a larger enterprise architecture is generally a mistake.

4. Kubernetes is not much better than bare VMs for developers, especially when compared with something like Azure App Service. The latter will "hold your hand for you" and has dozens of slick diagnostic tools integrated with it. On Kubernetes, if you want to do something as simple as capture crash dumps when there's a memory leak detected, you have to set this up yourself.

5. Microservices and RPC-oriented by default. Sure: you're not forced to implement this pattern, but it's a very steep slippery slope with arrows pointing downhill. In my experience, this is unnecessary complexity 99.99% of the time. Just last week I had to talk a developer out of adding Kubernetes to a trivial web application. Notice that I said "a" developer? Yes, a solo developer was keen on adopting this fad "just because". He was seriously planning on splitting individual REST endpoints out into containerised microservices. He's the third solo developer I've had to talk out of adopting Kubernetes this year.

6. Startup culture. Kubernetes shipped too early in my opinion. Really basic things are still being worked out, and it is already littered with deprecation warnings in the documentation. It's the type of product that should have been fleshed out a bit better at one or two large early adopter customers, and only then released to the general public. But its authors had a lot of competition (Docker Swarm, etc...) so they felt a lot of pressure to ship an MVP and iterate fast. That's fine I suppose, for them, but as an end-user I have to deal with a lot of churn and breakage. A case-in-point is that the configuration file formats are so inadequate that they have spawned a little ecosystem of config-file-generator-generators. I don't even know how deep those layers go these days. (Again, to be fair, Azure now has Bicep -> ARM as a standard transpilation step.)

7. Weak security by default because containers aren't security boundaries as far as Linus Torvalds or Microsoft Security Response Center are concerned. Everyone I've ever talked to about Kubernetes in the wild assumes the opposite, that it's magically more secure than hypervisors.

I get the purpose of Kubernetes in the same way that I get the purpose of something like Haskell, coq, or hand-rolled cryptography. They all have their legitimate uses, and can be useful in surprising ways. Should they be the norm for typical development teams? I'm not convinced.

Maybe one day Kubernetes v3 will be mature, stable, and useful for a wider audience. Especially once the underlying cloud is removed and there are smaller providers offering "pure Kubernetes clouds" where node pools are bare metal and there's no NAT and there isn't an external load balancer in front of the Kubernetes load balancer to make things extra spicy when diagnosing performance issues late at night across multiple incompatible metrics collector systems...


k8s is deceptively simple (or is that deceptively complex?). Anyway, what I mean is that spinning up a basic cluster isn't hard. Maintaining a cluster on premises while following every existing infosec and ops guideline is. It's not that you can't do this, it's just a very non-trivial amount of work.


This is what infra/ops people are there for. I think a lot of the problems here are devs with no ops background having to maintain these platforms. It’s understandably daunting for those in this position.


> once you have the pipeline setup, there is almost no maintenance required.

You could apply this to a traditional deployment. Once you setup all the CI/CD there’s no maintenance required.

But the non kubernetes would probably be cheaper.


Depends on many details, and how you use k8s.

I mainly use it to save money, as I can pay for fewer cloud resources.


You’re paying for resource isolation. You can put many things on 1 server and it will be cheaper. We have done that for decades.


...or deploy your code on Google App Engine, Heroku, Elastic Beanstalk, Digital Ocean App Platform, (etc etc) and spend all your time implementing user-facing features instead of maintaining infrastructure.

Yeah, I get it, compared to maintaining bare metal, k8s is amazing. But you're still wasting your time working on plumbing.


Google Cloud Functions for the win. Very reasonable pricing model too. We're doing 55 requests per second, 24/7, and it's about $100 a month, including the managed Cloud SQL Postgres instance.


That's what I thought as well, but now I do have some long-running jobs that exceed GCF's 60min limit. So I'm stuck with docker on Compute Engine, where GCP treats you like a 2nd class citizen as the OP found out.


I've worked on systems that did that and it was a huge huge mess, especially as the company grew. When jobs run that long, any failure means that they have to start over again and you lose all that time. Even worse, is that it stacks up. One ETL job leads into the next and it becomes a house of cards.

It is better to design things from the start to cut things up into smaller units and parallelize as much as possible. By doing that, you solve the problem I mention... as well as the problem you mention. Two birds.


you need to split those jobs into smaller ones that read their parameters from a queue. Then it will fit in serverless and also be more reliable


AWS Lambda > Google Functions!


I have a lot of experience with both clouds.



Agree, especially at early stage you don't need to overcomplicate your infrastructure.


Amazon EKS + Fargate.

No bare metal to manage. Control plane complexity is abstracted away. Fargate namespaces + profiles, no worker node configuration.

EKS costs about $90/m. Thereafter you only pay for whatever cpu/mem limits you assign to your deployments/pods.

Otherwise, why bare metal? For your basic needs, bare metal, self managed control planes, etc are definitely over complicating things.


If you're abstracting away most of the complexity of k8s, why not just go use ECS and spend nothing on the cluster? You will probably have to do some rewriting of your deployment scripts when you move off of EKS anyway (just like when you move off of ECS), so you might as well use ECS and save the $90/m (and it's generally easier to use).




Using Fargate for long / permanently running workloads in EKS is only an option when the costs are none of your concern.


As opposed to?

Fargate is cheap, you can have permanently running workloads and still pay less than non-fargate options.


> This deployment needed to serve docker images on boot-up, which instance templates do support. However they don't support them via API, only in the web console or the CLI.

Not exactly true. I got around this limitation by sending a startup script[1] in API metadata which basically just invokes `docker run ...`, and it works just fine. This allows spinning up/down container-based VMs via API only, which is nice.
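
The approach described can be sketched roughly like this, attaching the startup script as instance metadata at creation time (the instance name, image family, and container image below are illustrative, not from the original post):

```sh
gcloud compute instances create my-vm \
  --image-family=debian-12 --image-project=debian-cloud \
  --metadata=startup-script='#!/bin/bash
docker run -d --restart=always gcr.io/my-project/my-image:latest'
```

The same metadata key can be set through the instances.insert REST API, which is what makes the API-only workflow possible.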



Yeah, you can do the same thing via any image that supports cloud-init or Ignition (e.g. Fedora CoreOS) and have the exact same setup deployed in almost any cloud.


> Keep it managed since that legitimately becomes a reliability headache.

This is the thing that I think will always give me pause. If I have to pay a third party to manage my cluster orchestration backplane, that seems like a pretty big piece of overhead.

Sure, I can do it myself, but then I have to deal with said reliability headache. It seems ironic that a cluster management framework -- that touts its ability to reliably keep your applications running -- has its own reliability issues.


This may not be a surprise to some, but when folks talk about reliability of the control plane, they usually think failure means their web service goes down. That's not true. If you shoot the Kubernetes control plane, the individual servers can't talk to it anymore - so they do nothing. Containers that were running stay running. They even get restarted if they crash (via restartPolicy). Services that had specific other pods they were referencing continue referencing those pods. In net: everything except for kubectl and other Kubernetes internals keeps working.

That said, one piece that isn’t talked about frequently is the network overlay. Kubernetes virtualizes IPs (so each pod gets an IP), which is awesome to work with when it works. But if your overlay network goes down - god help you. DNS failures are the first to show up, but it’s all downhill from there. Most overlays take great care to degrade well, so they’re not tied to the control plane, but I have yet to find one that’s perfect. The overlay is the part of kube that truly isn’t failure tolerant in my experience.


> Kubernetes virtualizes IPs (so each pod gets an IP), which is awesome to work with when it works

Kubernetes does no such thing.

Weave Net, which is likely the most used CNI, does. There are other options however, and some of them use baremetal routers via bridging or even VLANs for example.


The fact that each pod has an IP is a core assumption of Kubernetes. Sure, the CNIs are responsible for actually implementing this, but it is a required part of their contract to provide 1 unique IP per pod (or, more precisely, either 1 IPv4 or 1 IPv6 or both per virtual NIC per pod - to cover dual-stack support in 1.24+ and Multus).


Exactly. We are building these incredible open source tools... but they grow so complex that we need to pay others in order to use them effectively?

What would you say if you had to pay Google to use Golang effectively (because the language had become so complex that it's difficult to handle on your own)? Crazy.

I wanted to take a look at how to use K8s on my own cluster, and damn it, installing the whole thing is not that straightforward. So now, to keep my sanity, I need to pay a cloud provider to use k8s! I guess that's the trick: build some open-source monster that's very hard to install/maintain/use but has cool features. People will love it and they'll pay for it.


I think people are overestimating the difficulty of setting up k8s. It's not that hard. Grab Debian, install, set one sysctl, load one kernel module and you're all set. Install a few more packages, run `kubeadm init` and that's all.
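
As a rough sketch of those steps (package sources and flags vary by Debian and Kubernetes version; treat this as illustrative rather than a tested recipe):

```sh
# the one kernel module and one sysctl: let iptables see bridged pod traffic
modprobe br_netfilter
echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/k8s.conf
sysctl --system

# install a container runtime plus the kube packages, then init the control plane
apt-get install -y containerd kubelet kubeadm kubectl
kubeadm init --pod-network-cidr=10.244.0.0/16
```

After `kubeadm init` you still pick and apply a CNI, but the cluster itself is up.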

The only thing that I've found truly hard is autoscaling VMs. Managed clouds have it easy; I haven't been able to make it work so far. But it doesn't seem that hard either, I just need to write one particularly careful bash script; I just don't have time for that yet.

Google does have some secret sauce. For example, there's horizontal scaling, which spins up more pods as load grows, and vertical scaling, which adjusts resources for individual pods. Those are generally incompatible with each other, so people adjust resources manually. GKE Autopilot has multi-dimensional scaling, which they didn't open source; it does both.
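
For reference, the horizontal variant is driven by a HorizontalPodAutoscaler object along these lines (the target name and threshold are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The incompatibility mentioned above is that a VPA adjusting the same pods' CPU requests fights with an HPA scaling on CPU utilization, since utilization is computed relative to those requests.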

Maybe I just haven't been hit with something nasty yet. But so far I think those complaints about managing k8s are somewhat strange. Yes, if you manage thousands of VMs, it might become another full-time job, but that's scale. I manage a few VMs and it's not an issue.


Something that isn't appreciated enough is how reliability issues demolish your team's throughput. Got a random cluster-restart heisenbug taking customers offline? Good luck debugging it for the next 6 weeks. Alternatively, ignore the problem while your engineers get woken up at night until they quit...

The progress of software has been towards managed offerings for a reason. A company can make it their entire business to own fixing these reliability issues for you and keeping them at bay. Do you really want to be in the business of managing a cloud on your cloud?


I don't fully understand; there are benefits to using a managed service in cases where the control plane is something you only interact with but don't manage. Not every ops team will have a CKA-certified administrator at hand to delve into etcd or the controller manager. Open a ticket and it's generally fixed.

Then there are situations where you want full control over the control plane itself. I've worked with companies that had clusters installed on bare metal in their stores in the UK. A CKA engineer is essential in this case, but that brings its own reliability headaches.


I don't disagree with you, but if you can reliably trade your dataplane outages for control plane outages, that's still usually a good tradeoff.


That's a really good point. Certainly you don't want either to go down, but outages are more tolerable if customer requests are still getting serviced regardless.


Vanilla k8s is pretty good. But once the 8 trillion vendors have you 'curl | helm' ing you end up with a knot of a system.

Keep it simple, use GitOps (ArgoCD is great), let k8s do what it's good at, managing workloads, not as a delivery mechanism for a vendor.

As an aside, the existence of the '{{ | indent 4 }}' function in helm should disqualify it from any serious use. Render, don't template.
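
To illustrate the complaint: because Helm's template engine is YAML-unaware, an included block has to be re-indented by hand to land at the right nesting level, so template lines sit at column 0 while the author counts spaces (the named template here is hypothetical):

```yaml
spec:
  template:
{{ include "app.podMetadata" . | indent 4 }}
```

Get the count wrong by two and the chart still renders; it just produces a structurally different, or invalid, manifest.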


> As an aside, the existence of the '{{ | indent 4 }}' function in helm should disqualify it from any serious use. Render, don't template.

This. My first thought when I saw the indentation hack was "it can't be serious, production-ready software".

My take on this is as follows.

If you have a simple use case, write your K8s manifests directly.

If you have a complex use case, Helm is often more pain than it's worth. Use alternatives, for example Jsonnet[0] with kubecfg[1]. Or emit manifests from your language of choice. Just don't use Helm.

[0]: [1]:


For a complex use case: cdk8s is cool


It is shocking that such a clearly bad design choice has stuck, and that Helm has become so popular in spite of it. I've had my eye on jsonnet, I'll have to try that next time I do something in k8s.


It makes sense if you consider how poor K8s' design already is. They somehow made it overcomplicated, yet lacking in critical features, with terrible security, no multitenancy, unnecessarily duplicated concepts, confusing and redundant configuration, a nightmarish setup process, no VCS integration, with mutable state, and lock-in reliance on specific external software components, while also having microservices, when the components were never really intended to be replaced, and misbehave when their assumptions change between versions. Then they invented and sunsetted multiple iterations of the same concept (plugins, essentially), redesigned other concepts (Roles) without consolidating them, and decided long-term support/stable ABIs are for chumps, so that anyone using it was stuck doing infinite migrations. It's up there with Jenkins and Terraform as one of the worst designed systems I've seen. The fact that you need a million integrations to make it do anything useful at scale is proof that it's more of a development toy than a practical tool.


We are using Helm at work but I've never touched it, so I cannot comment on it being bad. However, cargo-cult habits have made quite a few technologies take off over the years when they never should have, just because someone with the right sort of intelligence and/or charisma made people believe they were good, and then the ripple effect made them take off.


I like helm because it's like a package manager. I don't know what's inside rpm or dpkg; I suspect they're terrible inside too. But the fact that I can `apt install gimp` makes it awesome.

Same about helm. I haven't written a single Helm chart yet. But I like packages that I can `helm install`, and I'd probably avoid software without first-class helm support. Helm is good for users: I can install it, upgrade it, configure it with values. Awesome.


I judiciously delete any and all helm I can find, and fight text-based templating where possible :/

Especially since k8s doesn't even use "text" for the format; it just happens that JSON is the serialization. Use data structures and serialize them, dammit :/


I'm fairly convinced that Helm in production is an anti-pattern, with you instead having all K8s manifests checked into your Git repository and CI/CD to handle any changes.

Helm just has too much auto-magic and removes one of k8s' best features: the git-blame / git-diff for infrastructure.


I don't understand why this isn't the prevailing philosophy. I'm over here Terraforming helm deployments and having no idea what is actually happening under the covers.


We have our CI run a `helm --dry-run --debug` before the deploy step, so we can see the exact Kubernetes resources that are going to be applied. You can even save that as an artifact so you can diff it later.
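
A minimal sketch of such a CI step (the release name, chart path, and values file are illustrative):

```sh
helm upgrade --install myapp ./chart -f values-prod.yaml \
  --dry-run --debug > rendered-manifests.txt
```

Archiving `rendered-manifests.txt` per build gives you a plain-text diff between deploys, even though the source of truth is templated.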


Helm still saves you an incredible amount of work for setting up all the third party services.

If your team is big enough you can just write your own configs, but that takes a lot of time and often quite a bit of knowledge about the relevant service.

Rollbacks and resource cleanup are not to be underestimated either, if you are not getting those from other tools like Argo.

Note: You can still use the Git-centric approach by generating configuration via `helm template`.


I'm convinced that the only reason Helm saves you time with 3rd-party services is that 3rd-party services only provide a Helm chart, or don't provide any means of deployment to k8s at all.

I've been using kustomize to deploy some things with ArgoCD and it's so much easier. Now, I'm trying to never use helm for 3rd party.

However, for your internal things, Helm is hard to replace. It's easy to start a chart that is capable of deploying most of your internal services, but maintaining it is a nightmare.

Actually, using helm, as in `helm` the binary, directly sounds bonkers to me and I wouldn't wish this upon anyone.


I use Helm only for installing "supporting applications" - the Elasticsearch operator, JupyterHub, etc. Our normal deployments are standard K8s configs. These apps use Helm because a lot of the settings are complicated, co-dependent, etc.

Absolutely would not write helm charts from scratch for normal deployments, and if I got these apps in a better format than helm, I’d probably drop it immediately.


It’s the fact that they are complicated which warrants manual configuration. If the org can’t write the configuration, how will they support it when something goes wrong? It’s a problem waiting to happen.


I can't edit this anymore - but to anyone reading, I ought to specify that I do _not_ use Helm directly, only via Flux, our CD tool. It ablates a lot of the issues of dealing with helm charts.


> not as a delivery mechanism for a vendor

Amen. I got turned off from k8s following a tutorial that used Helm. I ran it and watched mysteries and line noise scroll past and walked away for a year. I thought "no, I will never put this squirming mass of tapeworms in front of anyone."

Then I took up with k3s and got underway.


> As an aside, the existence of the '{{ | indent 4 }}' function in helm should disqualify it from any serious use. Render, don't template.

Yeah, I think helm will be the death of Kubernetes. Some other workload management tool will come out, it will have a computer-science-inspired templating system (think Lisp macros, not C #defines) that is also easy to use, and the increased maintainability will be a breath of fresh air to people in helm hell and they'll switch over, even if the workload management fundamentals are worse.

It is a shame that ksonnet was abandoned. jsonnet is a very pleasant programming language for problems of similar scope. I think that people have to see some adoption or they give up; so Helm stays motivated to continue existing despite the fact that the design is fundamentally flawed, while alternatives give up when nobody uses them. If you're looking for a lesson in "worse is better", look no further than helm. Easy things are hard, but everything is possible. That turns out to be important.

I also liked kustomize a lot, but it definitely sacrificed features for security and a clean overall design posture. I don't really know what it's missing, but it must be missing something. (I use kustomize for everything, but it seems rare in the industry. And I don't use it for complicated things like "make a namespace and app deployment for every PR"; I think to manage things like that it's missing features people want. I say just run your development branch on your workstation and commit it when it works. Running the app you write shouldn't be rocket science, and so a computer program that configures it shouldn't be necessary. The industry definitely disagrees with me there, though.)

One of the biggest problems I have with helm is that it's not easy to make local modifications to charts. That needs to be a first class feature, and something everyone knows how to use. As a vendor who ships a helm chart, I feel like almost all of my work these days is hearing from users "our k8s security team requires that every manifest set field X", and then I have to go add that for them and release it. If kustomize were the default, they'd just add that themselves instead of asking. But hey, opportunity to provide good customer service, I guess. (Luckily, most of the requests are compatible with each other. At one point we ran as root because that's the default, a customer required running as a user, and now everyone can run a locked down version; nobody NEEDED to run as root. So these requests generally improve the product, but it's a little tedious to make them ask.)


kustomize supports using a helm chart as a resource


Can you please elaborate on “render, don’t template”?


With templating, you treat the YAML as text with a YAML-unaware templating engine (like Go's text/template). You need to make sure that the end result is valid YAML.

With rendering, you use something that is aware of YAML; you feed it data and it outputs valid YAML.
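
A sketch of the rendering approach in Python, using only the standard library (k8s accepts JSON manifests directly; the `deployment` helper is hypothetical, but the fields it emits are standard apps/v1 Deployment fields):

```python
import json

def deployment(name, image, replicas=1):
    """Build a k8s Deployment as plain data, then serialize it.

    Because we construct a data structure and serialize at the end,
    the output is well-formed by construction - no indentation to
    count, no text templates to keep valid.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

manifest = json.dumps(deployment("web", "nginx:1.25", replicas=3), indent=2)
print(manifest)
```

The result can be piped straight to `kubectl apply -f -`, and any loops, defaults, or shared fragments live in ordinary functions rather than template syntax.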


I don't understand this comment. How else are you going to deploy pieces of k8s infra into k8s if not with Helm and Helm charts? Sure, you can use Argo to deploy and sync Helm charts into k8s, but you're still going to be using Helm (if only indirectly via Argo), and you will inevitably need to template things that need to be dynamically configured at render time.


I don't use templates for manifests and avoid them like the plague.

I use my preferred language to emit manifests and built tooling around it. It does not template, and instead generates the manifest by transforming a data structure (hash) into json. I can then use whatever language feature or library I need to generate that data structure. This is much easier to work with than trying to edit template files.

I don't need to save the output of these because when I use the tooling, it generates and feeds that directly into kubectl. There's also a diff tool that works most of the time that lets me see what I am about to change before changing it.

In fact, I ended up adding a wrapper for Helm so that I can have all the various --set and values, source chart, chart repo, chart version pinning all versioned in git, and use the same tooling to apply those with the flags to install things idempotently turned on by default. It sets that up and calls helm instead of kubectl.

That tooling I wrote is open-source. You never heard of it because (1) I don't have the time and energy to document it or promote it, and (2) it only makes sense for teams that use that particular language. Helm is language-agnostic.

EDIT: and reading this thread, someone mentioned Kustomize. If that had been around in '16, I might not have written my own tool. It looks like it also treat YAML manifests as something to be transformed rather than templates to be rendered.


Just kubectl apply the manifests. You can even use kubectl -k for the Kustomize configuration engine that can more or less replace most of what helm does today.


So what, I'm going to have a big Makefile or something with a bunch of kubectl applies? For each environment too? What if one of my dependencies (cert-manager, for example) doesn't support directly applying via kubectl but has to be rendered with Helm? How do I manage versions of these dependencies too?

For better or for worse, Helm is the de facto standard for deploying into k8s. Kustomizing toy apps or simple things may work, but I have yet to see a large production stack use anything but Helm.


The only thing helm provides is awkward templating, IME. Ideally you'd never use a text template library to manipulate YAML or JSON structured objects. Instead you'd have scripts that generate and commit whole YAML files, or you'd just update the YAML manually (less optimal), and then you'd write those to the k8s API directly or through a purpose-built tool.

(Or, hell, helm but with no templating whatsoever).


> How else are you going to deploy pieces of k8s infra into k8s if not with Helm?

kustomize? raw manifests? "sed 's/MY_PER_DEPLOYMENT_VALUE/whatever/' < manifest.yaml | kubectl apply -f -"? "jsonnet whatever.jsonnet | kubectl apply -f -"?

But yeah, a lot of people think Helm is mandatory and ask for it by name, and if you don't provide a chart they won't use your thing.


You can use envsubst instead of sed


> How else are you going to deploy pieces of k8s infra into k8s if not with Helm and Helm Charts?

kubectl apply -f $MANIFESTS

> you're still going to be using Helm (if not indirectly via Argo) and you will inevitably need to template things that need to be dynamically configured at render-time.

Use Kustomize for dynamic vars and keep them to a minimum. Templating is the root of all evil.
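A sketch of what "dynamic vars kept to a minimum" can look like with a Kustomize overlay (paths, names, and values are hypothetical): the base stays generic, and each environment carries only a tiny patch.

```shell
mkdir -p base overlays/prod

# Generic base manifest, shared by all environments.
cat > base/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
EOF

cat > base/kustomization.yaml <<'EOF'
resources:
  - deployment.yaml
EOF

# The prod overlay only overrides what actually differs: the replica count.
cat > overlays/prod/kustomization.yaml <<'EOF'
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: web
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
EOF

# Against a real cluster:
#   kubectl apply -k overlays/prod
ls overlays/prod
```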

Helm mostly adds unnecessary complexity and obscurity. Sure, it's faster to deploy supporting services with it, but how often do you actually need to do that anyway? The time you initially gain by using Helm can cost you an order of magnitude more in maintenance later on, because you've created a situation where the underlying mechanics are both hidden from you and unknown to you.


> kubectl apply -f $MANIFESTS

How do you configure it? Say you're installing a new version: do you go over the manifests and edit them by hand on every update? Do you maintain some sed scripts?

helm is awesome because it separates configuration from the manifests.
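Concretely, the separation looks roughly like this (the chart path, release name, and values are hypothetical):

```shell
# Per-environment configuration lives in a small, git-versioned values
# file; the chart's templates themselves are never edited by hand.
cat > values-prod.yaml <<'EOF'
replicaCount: 3
image:
  tag: "1.4.2"
EOF

# Idempotent install-or-upgrade against a cluster (not run here):
#   helm upgrade --install myapp ./charts/myapp -f values-prod.yaml
cat values-prod.yaml
```

Upgrading to a new chart version is then the same `helm upgrade` with the same values file; only settings that genuinely changed require touching it.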



So instead of Go Templates, I'm going to use Dhall? Why? I'd be losing interop with the entire Helm ecosystem too, so there go dependencies and such to maintain my IaC.

That blog post doesn't alleviate any issues one might have using the traditional Go Templates + Helm.


It gives you a TypeScript API for generating resources. The difference is that these templates are semantic (via the type system), not just syntactic as with text templates. Relatedly, they also factor and compose better, because you get functions and imports.


It’s probably best you avoid using Kubernetes in production for a long while. At the very least until you understand why your comment is being so heavily downvoted.


I’ve been using Kubernetes in production for over 4 years. I think I fully understand what’s going on, it’s just I have a different opinion than those downvoting me.


I really think this is why most people dislike and misunderstand the value of kube. If you raw dog it, it’s pretty amazing what you can build. It’s not very hard to roll your own Heroku (tailored to your workflows and workloads) with kube, if you shy away from all the noise of helm and other vendors as you say.


We are currently building a database-as-a-service platform using Kubernetes. I have to say it is a love-hate story.

On the bright side, k8s is almost the only option for an abstraction layer on top of different clouds for a complex system with tens of components. A database is more than masters and workers: there are so many components you need to take care of. For example, we may need monitoring agents, certificate managers, health checkers, admin proxies, etc. Without k8s, you end up running that kindergarten yourself.

On the other side, k8s IS complicated. It's like an unexplored garden: people just enter it, try to use whatever they see, and cause all kinds of problems. The problems we ran into:

* Trying to apply the operator pattern to everything makes debugging really painful, and the learning curve is steep.

* Small services still cost a lot. VPA is not mature enough, and many tiny services may be better off on Lambda.

* k8s is not really designed for small per-tenant clusters. Managing a fleet of clusters is no easy job, but it is something SaaS companies have to deal with.


> We are currently building a Database-as-a-service platform using Kubernetes.

I worked for a company that did exactly that (database-as-a-service on k8s); no one in the entire company knew how to run a cluster from scratch. This is a real problem if your developers want to do crazy bizarre advanced stuff like run tests, because no one knows how anything fits with anything else. At least, I thought it was a real problem, as it wrecked any semblance of productivity, but no one else seemed to mind much and thought "it kind-of works on CI on a good day if the planetary alignments are right" was fine. But hey, VC money, so feel free to piss it all away on a small army of wildly unproductive devs shrug

Also, the reliability of it all was nothing short of embarrassing, and debugging issues was hard because it was all pretty darn complex. Any technology is fantastic if it works, but I think that's the wrong metric. You need to look at how well things will go for you when it doesn't work – for any reason – because you only set your infrastructure up once, but you will "read" (debug) it countless times.

I hope it works out better for you, but I always felt that the advantages that k8s gave us could have been implemented from scratch significantly better by a few devs working on it for a few months. The other day I spent about an hour writing a shell script (and then spent a bit of time fixing a bug a few days later) to automate a ~15 minute task. k8s kinda felt like that.


It actually works better for us. The system is definitely complex, but we still have some ways to debug and develop locally. For example:

* You can test different parts of the system individually, via APIs.

* With the k8s operator model, it's more like a white-box debugging experience. It is not ideal, but doable.

* You can have the rest of the system running remotely, but only your own piece locally. As long as the piece can have access to k8s api server, it just works.

The best thing k8s offers is repeatability. Scripts are fragile once the system becomes more complicated (with monitoring, management agents, etc.). And the product is a distributed database, which itself has so many moving parts...


> no one in the entire company knew how to run a cluster from scratch

How is that even possible? I've worked for 2 dbaas companies that both used k8s and standing up a cluster with a control plane and data plane was as simple as a bash/terraform script. The only thing that was pretty annoying, as I recall, was cert manager because debugging it if you didn't configure it properly was painful, but once it worked it was great.

I mean even the non-dbaas companies I worked at didn't have that issue. It sounds like you would have had that problem even if you _didn't_ use k8s.


It's actually been an issue everywhere I've seen k8s deployed. Part of the problem isn't k8s per se, but rather "microservices" and the fact that no one really has a good view of how everything ties in with everything else, so running a complete product from scratch becomes really hard. IMO k8s adds to this confusion and complexity.

No doubt there would have been issues if k8s hadn't been used, but at least that's (often) easier to understand, so it's 'only' "a mess I can understand and fix" rather than "a mess and wtf, I have no idea".


Probably the next closest is just plain VMs (with potentially a backplane/management layer running on k8s or whatever).

But yeah... even then, each cloud has quirks with Kubernetes, and there are still quite a few resources needed just to stand up a cluster. Kubernetes can partially solve the initial provisioning, but you generally need the cluster running with nodes before you can use something like CAPI or Crossplane (so you still need Terraform or Pulumi or scripts or whatever).

Having worked with a similar system, I'd say shared tenancy with a tenant per namespace is just as bad, but in a different way (if you use the classic operator pattern with 1 operator per cluster, you potentially have a massive blast radius). Then there's security...


1 operator per cluster is not ideal, since most clusters are "stable" and don't need much care. Having plenty of them would be a headache.

The operator crash on our side does sound scary. But as a DBaaS system, as long as the blast radius doesn't touch the data plane, it is manageable.


Judging from Google Trends[0] and the historical data of the Kubernetes repo on GitHub[1], k8s has crossed the chasm to become the dominant choice for infrastructure, whether as the result of several companies working together or of developers' own choices. I think k8s will remain mainstream until there are big changes in the infrastructure world.

[0] [1]


Seems like everyone is forgetting about PaaS, and I don't understand why.

For many use cases it's going to be much simpler and cheaper than managed k8s.

There's no more lock-in with Cloud Run than with GKE. (The actual lock-in comes with proprietary databases and the like.)

edit: Missed the GPU part, might make the OP's project the exception to the rule

People also forget about auto-scaling groups of VMs, such as Managed Instance Groups in GCP.


"Azure is the only major provider that still has free control planes"

Oracle Cloud Infrastructure does as well. Perhaps it does not yet qualify as major... It's major to Oracle, that's for sure.


This is one of the pet peeves of HN submitters and readers :)

Sure, here's my two cents, FWIW: Kubernetes is complex for some sets of folks but not for others. So the answer is: it depends, on a lot of external factors beyond just the technical capabilities of the team.

Kubernetes solves many non-trivial challenges, but it's not a silver bullet. I could certainly learn from the Mercedes platform/infra team's "story from the trenches" (they reportedly run 900+ k8s clusters in production).