Running stateful application workloads in Kubernetes

One of the most common questions we get from clients and prospective clients is, "Will you run our database in Kubernetes?" Typically our answer is "No, and you probably shouldn’t either," but there are certain cases where running stateful application workloads such as databases can be the right choice if you are willing to take on additional work and complexity.

Where are you starting from?

There are many reasons (some good, some not as good) you may wish to run your database on Kubernetes. Unless this is a truly greenfield project, chances are that you have already been happily running one or many database instances in your cloud provider or data center.

Let’s examine some of the possible reasons you may have, and more importantly, whether Kubernetes is likely to help or hurt.

Maybe you have some Terraform or other configuration management code that configures your MySQL replicas or your Elasticsearch cluster, and you are not quite happy with it.
Maybe you are happy with it, but you would like to centralize on a Kubernetes infrastructure deployment workflow, and that makes it very tempting to run helm install mongodb.
Maybe you read an article where the author created a WordPress instance on Kubernetes in less than 10 minutes.
Maybe you've been trying to manually manage a PostgreSQL instance, it's been unreliable and flaky, and you heard that Kubernetes helps with application reliability.
Maybe you'd like to reuse as much of your application deployment automation as possible and would like to have your Kafka instance deployed in the same way as your application.

What do all these scenarios have in common? The idea that running a database on a Kubernetes cluster is going to be about the same or less effort than running that same database on metal or VMs. Sorry, that just won't be the case; the complexity is additive. You still need to understand the underlying database configuration and how to optimize it, and now you also need to know how to run it on Kubernetes. Production data stores can be tricky. Kubernetes can be tricky and data persistence with Kubernetes can be extra tricky.

Considerations for persistent data in Kubernetes in the Cloud:

StatefulSets in Kubernetes have come a long way since the PetSets days, but the old name indicates the intent the early versions of Kubernetes had for managing state. Kubernetes takes the "Cattle not Pets" idea right down to the pod level. If losing an instance (or pod in this case), no matter how quickly it is restored, is going to be a problem for you, you’re going to have challenges managing it on Kubernetes. This goes double when it doesn’t come back automatically for some reason.

Fairwinds has had our fair share of midnight pages when a PersistentVolume EBS volume was orphaned because it could not re-attach to a node in the cluster (and many other misadventures). Thankfully, the Kubernetes core has come a long way and newer solutions like Rook and Portworx have much improved the state of state in Kubernetes (once you know how to run them properly). That said, there are several things you should consider when picking a database that will run and stay healthy on Kubernetes.

What is Kubernetes good at?

Service discovery and load balancing
Storage orchestration
Automated rollouts and rollbacks
Bin packing
Self-healing
Secret and configuration management

This paints a picture of a platform that load balances connections, scales horizontally under load, automatically adds new workload instances to the load balancer, and replaces failed instances. For stateless workloads this is a holy grail of orchestration, but it throws a bit of a wrench in the works for traditional database technologies like MySQL. Looking at this picture, it appears that the best types of databases to run in Kubernetes are ones that are:

distributed, capable of having multiple replicas, and still able to function while missing some number of those replicas, or
ephemeral, like a cache or a development environment.

No worries, I'm using <insert sharded multi-multi replicated data store>!

Great, you're on your way to a reasonable database solution on Kubernetes! Assuming:

you have worked through all the settings and tweaks that make it performant for your use case,
you have a repeatable method for deployment, and
the community Helm chart works for you or you have the expertise in-house to write your own.

All you have left to do is make it run well on Kubernetes: setting resource requests and limits, Network Policies, Pod Disruption Budgets, and autoscaling, and considering all of the other myriad complications of running any application on Kubernetes.
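As one small illustration of the requests and limits item, here is a minimal Python sketch that builds a container spec and prints it as JSON, which kubectl accepts alongside YAML. The container name, image, and sizes are placeholders, not recommendations. It relies on one Kubernetes behavior: a pod whose containers all set CPU and memory requests equal to their limits is given the Guaranteed QoS class, which makes it less likely to be evicted when a node is under resource pressure.

```python
import json


def guaranteed_resources(cpu: str, memory: str) -> dict:
    """Build a container `resources` block with requests equal to limits.

    Equal requests and limits for CPU and memory on every container is what
    qualifies a pod for the Guaranteed QoS class.
    """
    amounts = {"cpu": cpu, "memory": memory}
    return {"requests": dict(amounts), "limits": dict(amounts)}


# Hypothetical container spec; the name, image, and sizes are placeholders.
container = {
    "name": "db",
    "image": "example.com/my-database:1.0",
    "resources": guaranteed_resources(cpu="2", memory="8Gi"),
}

print(json.dumps(container, indent=2))
```

The right numbers still have to come from knowing how your database uses memory and CPU; Kubernetes only enforces what you tell it.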

Do you need multi-zone? Multi-region? Multi-cloud? If you need multi-region or multi-cloud, there is a good chance you'll need to add a service mesh like Linkerd or Istio to the list, and you’ll need to tackle latency issues. Are your PodDisruptionBudgets configured? How many replicas can you afford to lose during an upgrade or during an outage? Do you have your affinity rules configured properly to ensure each replica lands on a different node? On AWS, are your PersistentVolumes spread across zones, and what happens when an instance is scheduled in a zone without a PersistentVolume? Production data stores can be tricky. Kubernetes can be tricky and data persistence with Kubernetes can be extra tricky.
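Two of those questions can be made concrete with a rough Python sketch. It assumes a data store that needs a majority of its replicas to stay writable, which is a common design but not a universal one, so check your own database's rules. The label, resource name, and replica count are hypothetical. The output is JSON, which kubectl accepts.

```python
import json


def quorum_size(replicas: int) -> int:
    """Smallest majority of `replicas` (assumes majority-quorum semantics)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerable_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains."""
    return replicas - quorum_size(replicas)


APP_LABELS = {"app": "my-datastore"}  # hypothetical label
REPLICAS = 5                          # hypothetical replica count

# A PodDisruptionBudget only limits voluntary disruptions such as node drains.
# minAvailable is set to the quorum floor here; a stricter value leaves
# headroom for an unplanned failure in the middle of an upgrade.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "my-datastore"},
    "spec": {
        "minAvailable": quorum_size(REPLICAS),
        "selector": {"matchLabels": APP_LABELS},
    },
}

# Goes under the pod template's `affinity` field: refuse to schedule two
# replicas with the same label onto the same node.
anti_affinity = {
    "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
            {
                "labelSelector": {"matchLabels": APP_LABELS},
                "topologyKey": "kubernetes.io/hostname",
            }
        ]
    }
}

print(json.dumps(pdb, indent=2))
print(json.dumps(anti_affinity, indent=2))
```

With five replicas this leaves room to lose two; with three, only one. A required anti-affinity rule also means a replica stays unscheduled if there is no free node for it, which is one more failure mode to plan for.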

What if my database cannot be cattle?

Is it ephemeral? If not, you can still run MySQL on Kubernetes in a StatefulSet with one replica, like the simple cases you'll see in most examples and in guides for setting up something like WordPress. In doing so, however, you'll hamstring Kubernetes' ability to provide a higher level of service than running the "old school" way. Without load-balanced replicas, you will experience a service disruption any time that workload moves between nodes.
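For reference, the single-replica case looks roughly like the Python sketch below, which prints a StatefulSet manifest as JSON. Everything specific in it is an assumption for illustration: the names, the image tag, the 20Gi volume size, and a Secret called mysql-secret that would have to exist already, along with a headless Service matching serviceName.

```python
import json

LABELS = {"app": "mysql"}  # hypothetical label

statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "mysql"},
    "spec": {
        # Name of a headless Service that must be created separately.
        "serviceName": "mysql",
        # One replica: any reschedule of this pod is an outage.
        "replicas": 1,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [
                    {
                        "name": "mysql",
                        "image": "mysql:8.0",
                        "ports": [{"containerPort": 3306}],
                        "env": [
                            {
                                "name": "MYSQL_ROOT_PASSWORD",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "name": "mysql-secret",  # hypothetical Secret
                                        "key": "root-password",
                                    }
                                },
                            }
                        ],
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/mysql"}
                        ],
                    }
                ]
            },
        },
        # One PersistentVolumeClaim is created per replica from this template.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "20Gi"}},
                },
            }
        ],
    },
}

print(json.dumps(statefulset, indent=2))
```

The manifest is short, which is part of why it is tempting. Nothing in it addresses backups, tuning, or what happens while the volume detaches from one node and attaches to another.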

If it is ephemeral, non production, or any other case where you can afford to not have it for some period of time, rock on.

Where are you going?

At Fairwinds, I've had several conversations where organizations want to do away with their data specialists and have their platform team operationalize their data store on Kubernetes, thinking it will eliminate the complexity of managing the database. You should see by now that running your database on Kubernetes does not eliminate complexity; it adds complexity. That complexity could result in a higher quality of service, to be sure, but it won't be a magic bullet, and there are more ways to do it wrong than to do it right.

Ok, then what do you recommend?

You should run your production database how and where works for you. Running it on Kubernetes can be a great option if you and your team are willing to put in some significant effort to fully understand the implications and requirements of doing so. Production data stores can be tricky. Kubernetes can be tricky and data persistence with Kubernetes can be extra tricky.
If you choose to do so, I highly recommend not running your stateful application mixed in with your other highly scalable workloads; use a separate node pool or cluster. A cluster or node pool that is optimized for quick scaling provides more chance that something will go wrong with your persistent state as your cloud volumes are mounted and unmounted and your replicas are moved around. Production data stores can be tricky. Kubernetes can be tricky and data persistence with Kubernetes can be extra tricky.
Don't fire your DBAs! Databases on Kubernetes will still require database experts. They will also require Kubernetes experts. Production data stores can be tricky. Kubernetes can be tricky and data persistence with Kubernetes can be extra tricky.
Ephemeral cache and development environments are A-OK. Chances are good here that if you lose a pod, the ~90 seconds it will take to remount the volume or repopulate the cache won't hurt. This is not a production data store.
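The separate node pool recommendation above can be sketched in Python as follows. It assumes the database nodes carry both a label and a NoSchedule taint with the hypothetical key and value dedicated=database: the taint keeps other workloads off those nodes, and the node selector keeps the database on them. The helper name is made up for this example.

```python
import copy
import json

POOL_KEY = "dedicated"   # hypothetical label and taint key
POOL_VALUE = "database"  # hypothetical label and taint value


def pin_to_dedicated_pool(pod_spec: dict) -> dict:
    """Return a copy of `pod_spec` restricted to the dedicated node pool."""
    pinned = copy.deepcopy(pod_spec)
    # Only schedule onto nodes labeled dedicated=database.
    pinned.setdefault("nodeSelector", {})[POOL_KEY] = POOL_VALUE
    # Tolerate the matching NoSchedule taint that repels everything else.
    pinned.setdefault("tolerations", []).append(
        {
            "key": POOL_KEY,
            "operator": "Equal",
            "value": POOL_VALUE,
            "effect": "NoSchedule",
        }
    )
    return pinned


# Placeholder pod spec standing in for your data store's pod template.
pod_spec = {"containers": [{"name": "db", "image": "example.com/my-database:1.0"}]}

print(json.dumps(pin_to_dedicated_pool(pod_spec), indent=2))
```

Both halves matter: a toleration alone lets the pod land on the dedicated nodes but does not stop it from landing elsewhere, and a node selector alone does not keep other workloads out.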

Your mileage may vary.