- Managed Kubernetes
In the technology world, you may have heard that Kubernetes is great — but complicated. But what does that really mean? Since we've been practitioners in the space for a while, helping other organizations build and deploy workloads in Kubernetes, we thought it might be helpful to share some of the real-world pitfalls and successes we've seen with Kubernetes and answer your questions, too. Let's dive into the questions we got from attendees!
There's a lot to unpack here, so let's get started.
When folks are first learning about Kubernetes, they might be used to an environment where everything is always running, and restarts are unexpected. So, coming into Kubernetes, they're surprised to learn that disruptions are not only expected, but planned for. Kubernetes provides configuration that allows you to ensure that these expected disruptions (or pod restarts), happen in a safe and predictable way.
The first configuration that Kubernetes provides is an object called a Pod Disruption Budget (PDB). This object specifies the number or percentage of identical pods in a Deployment or StatefulSet that can be disrupted at a time when performing routine maintenance or scaling down the cluster. This helps prevent accidental downtime when performing these actions.
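As a sketch of what that looks like in practice (the name and labels here are illustrative), a PDB that keeps at least two replicas of an app running during voluntary disruptions might look like this:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # illustrative name
spec:
  minAvailable: 2          # could instead use maxUnavailable; both accept a count or a percentage like "50%"
  selector:
    matchLabels:
      app: web             # must match the labels on the pods you want to protect
```

With this in place, a `kubectl drain` during node maintenance will refuse to evict a pod if doing so would drop the number of ready `app: web` pods below two.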
The second configuration, called pod anti-affinity, is a Kubernetes feature that allows you to make sure pods aren't scheduled on the same node, or in the same availability zone. This can help you prevent pods from competing for resources, isolate pods from one another to reduce the risk of failures affecting multiple pods, and spread pods across nodes to improve reliability and performance. You can configure pod anti-affinity at the pod, Deployment, or StatefulSet level by defining an affinity rule that specifies which pods should be kept apart and at what scope (node, zone, and so on). There are two types of pod anti-affinity rules: required rules (`requiredDuringSchedulingIgnoredDuringExecution`), which the scheduler must satisfy or leave the pod unscheduled, and preferred rules (`preferredDuringSchedulingIgnoredDuringExecution`), which the scheduler tries to satisfy but can ignore if no suitable node exists.
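To make this concrete, here is a hedged sketch (names and image are placeholders) of a Deployment that uses a preferred anti-affinity rule to spread its replicas across nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web
                # Spread across nodes; use topology.kubernetes.io/zone to spread across zones instead
                topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx:1.25   # placeholder image
```

Using the preferred form means the scheduler will still place all three replicas on a smaller cluster if it has to, rather than leaving pods pending.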
Pod anti-affinity and PDBs are both important tools that help you improve the reliability, security, and performance of applications running in Kubernetes.
Frequently, when people are porting a legacy application to run in the cloud, they're also starting with a monolithic application and then breaking it up. What you need to understand when you do that is that you're changing your monolithic application into a distributed application. That means you have to worry about everything that goes with distributed applications. You need to shift your mindset, because the application architecture is the first important thing to think about. The application architecture informs where it should be running, not the other way around. So start by thinking about your desired architecture, and then work towards your method of deployment. It might even make sense to keep your monolith and deploy that to Kubernetes. Just remember that Kubernetes is a tool, not a solution.
There are a lot of great monitoring tools out there for Kubernetes. We use Datadog quite a lot in-house at Fairwinds.
The Datadog pricing model is cryptic and can get very expensive very quickly if you're not careful. It's a great product that can help you protect, optimize, and monitor your mission-critical Kubernetes applications, but you have to be cognizant of the potential cost.
Prometheus is an open source system monitoring and alerting toolkit. You can do a lot of great things with the Prometheus stack, but your cost comes from a different place. It requires you to build it, design it, and maintain it yourself. There are a lot of good materials out there, but that's an expected cost of running open source.
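As a small taste of that "build it yourself" cost, here is a hedged, minimal sketch of a Prometheus configuration that uses Kubernetes service discovery to find pods to scrape (in practice you would add relabeling rules, alerting, and storage configuration on top of this):

```yaml
# prometheus.yml — minimal illustrative config, not production-ready
global:
  scrape_interval: 30s     # how often to pull metrics
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod          # discover every pod in the cluster as a potential target
```

Everything beyond this starting point — filtering which pods to scrape, retention, high availability, dashboards — is design and maintenance work you take on yourself.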
If you're talking about monitoring, then you need to consider the entire stack, and you can't monitor anything that you don't have metrics for. If you are trying to achieve good observability, you actually need to have good metrics feeding into your monitoring solution. That is a real strength of service meshes, such as Linkerd (an open source service mesh). Service meshes monitor all of the network traffic happening in your cluster, giving you really good observability metrics all the way up and down your call stack.
If you’re creating a platform for your developers to use to deploy their applications in Kubernetes, you have to give them the tools that they need to ensure that they are using Kubernetes securely. Building a good platform requires you to put guardrails in place upfront in order to make sure that you are doing things correctly the first time, including proper configuration, admission controllers, and feedback into your CI/CD system.
It’s also about providing good documentation and patterns for developers upfront. That includes creating templates and Helm starters that are standard for your company. Providing education and putting guardrails in place that enforce those best practices are the two main pillars of how you can enable developers and improve developer experience.
There's also the principle of least surprise. If you can, arrange your platform and development environment such that when a developer does something, they do not get a surprising response. Developers tend to feel better about that rather than getting an unpredictable response. There are ways to deploy Kubernetes that can result in some very unpredictable behaviors, but in general it is better to keep it simple and predictable.
There are a lot of great examples of cases where microservices are stateless or the databases are separate from the microservices. Frequently, you have many more microservices than you do databases or tables. There are a lot of applications that use microservices for the front end and API layers, and then have another place where things are stored. You also have to think about where to put databases, because running databases in the Kubernetes cluster can get interesting. This is an application architecture question, and that's always going to be specific to what your application is doing. Both can work, it just depends on what those services are doing and what they need.
There's a lot of information available on the Kubernetes website. There are also a lot of blogs about securing Kubernetes and different takes on what it means to secure Kubernetes. In addition, there are a lot of decisions that you make when you're deploying it that determine whether or not it's secure. Keep in mind that there are some defaults that probably don't make sense from a Kubernetes security standpoint. Here are a few things you should think about:
You need to think about securing your network and what your network policies are, so if something happens, you can control your blast radius. You also need to think about role-based access control (RBAC) policies, taking a least-privilege approach using permissions, cluster roles, and cluster role bindings. Out of the box, Kubernetes is not very secure.
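As a hedged sketch of both ideas (namespace and names are illustrative), a default-deny ingress NetworkPolicy paired with a least-privilege, read-only Role might look like this:

```yaml
# Deny all inbound traffic to pods in the namespace unless another policy allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app          # illustrative namespace
spec:
  podSelector: {}            # empty selector = applies to all pods in the namespace
  policyTypes:
    - Ingress
---
# Least-privilege Role: read-only access to pods in a single namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader           # illustrative name
  namespace: my-app
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```

Starting from deny-everything and read-only, then adding back only what each workload and user actually needs, is what keeps the blast radius small.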
Another great resource is the NSA Hardening Guide for Kubernetes. It provides a ton of information on where Kubernetes is insecure by default, and how to configure it securely.
The Kubernetes infrastructure deployed using cloud providers tends to provide authentication and authorization for people who are trying to manipulate the Kubernetes API itself. However, they don't do very much in terms of securing network communications in your cluster. For example, if you run a service mesh, the cloud providers don't do anything for authentication and authorization at a higher level, such as determining whether the user using your application can use a particular microservice. Security can mean lots of different things, and it's important to figure out which one you are talking about. A good start might be reading about Kubernetes security posture management, which can help you understand some of the challenges related to managing Kubernetes security.
The first thing you have to do when you deploy any workload into Kubernetes is say, "I need this many resources." That number affects the Kubernetes scheduler, auto-scaling, bin packing, which things run next to each other, whether your cluster is stable or not, and how much things cost in the end. If you can set your resource requests and limits correctly, then everything will run better in Kubernetes.
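The actual numbers always depend on profiling your own workload, but as an illustrative sketch, requests and limits are set per container like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.25     # placeholder image
      resources:
        requests:           # what the scheduler reserves on a node for this container
          cpu: 100m         # 0.1 CPU core
          memory: 128Mi
        limits:             # hard caps at runtime
          cpu: 500m         # CPU is throttled above this
          memory: 256Mi     # the container is OOM-killed above this
```

Requests drive scheduling and bin packing; limits drive runtime enforcement. Setting requests far above real usage wastes money, while setting memory limits too low causes restarts, which is why profiling matters.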
But it's a hard problem. It can be difficult to profile your applications, run load tests, and identify how many resources you need for each pod. There are a lot of tools available to help you, both open source and paid. Goldilocks is an open source project that helps you get started with setting resource requests and limits, while Fairwinds Insights can help you do that at scale.
Kubernetes is gaining momentum, and organizations large and small are deploying more workloads into production environments. If you have questions or need help implementing Kubernetes, reach out to Fairwinds and our Slack Community to help you get started (we have a lot of open source projects designed to help you handle Kubernetes challenges, as well as a free tier of our Insights platform). And check out part one of this blog series to get more answers from us!