Before we dive too deep, we should clarify a couple of things about the two types of scaling available: horizontal and vertical. Horizontal scaling means you add hardware instances to your pool of resources to increase the number of application instances you can run at any one time. Vertical scaling means you add more power (CPU and RAM) to the hardware instances you already manage, for the same purpose. In the first case, you have more instances; in the second, you have bigger instances.
A lot of companies don't distinguish between the two in their day-to-day operations, let alone optimize for either.
For example, an application might be single-threaded (meaning it can't take advantage of more than one CPU core) and CPU-bound, and also require a decent amount of memory. To scale this application, you'll need to run more instances of it. You can either run more application instances on bigger hardware instances (vertical scaling), or you can keep running a single application instance per machine and spin up more machines to host them (horizontal scaling). If you choose to scale horizontally (the most common approach I see), you then run into instance-sizing issues. Because cloud providers offer a fixed set of instance sizes, you must choose a size large enough to provide the RAM your application needs, which often leaves CPU capacity sitting idle. As a result, you end up paying for expensive hardware and using only half of it, or less. This type of scaling is also slow, because every new instance takes time to spin up, which limits your ability to respond to sudden surges in traffic.
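To make the instance-sizing problem concrete, here is a minimal sketch that picks the smallest instance type able to hold one application instance and reports how much of it sits idle. The instance catalog and its names are hypothetical, not any real provider's offering, and the app's requirements (1 CPU, 6 GiB) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    mem_gib: int


# Hypothetical instance catalog; real cloud providers offer different sizes.
CATALOG = [
    InstanceType("small", 1, 2),
    InstanceType("medium", 2, 4),
    InstanceType("large", 2, 8),
    InstanceType("xlarge", 4, 16),
]


def smallest_fit(cpu_needed: float, mem_needed_gib: float) -> InstanceType:
    """Return the smallest instance type (by RAM, then CPU) meeting both needs."""
    candidates = [
        t for t in CATALOG
        if t.vcpus >= cpu_needed and t.mem_gib >= mem_needed_gib
    ]
    if not candidates:
        raise ValueError("no instance type is large enough")
    return min(candidates, key=lambda t: (t.mem_gib, t.vcpus))


def idle_fraction(t: InstanceType, cpu_needed: float, mem_needed_gib: float):
    """Fraction of the instance's CPU and RAM left unused by one app instance."""
    return (1 - cpu_needed / t.vcpus, 1 - mem_needed_gib / t.mem_gib)


if __name__ == "__main__":
    # A single-threaded app uses at most 1 core but needs 6 GiB of RAM.
    chosen = smallest_fit(1, 6)
    cpu_idle, mem_idle = idle_fraction(chosen, 1, 6)
    print(chosen.name, f"CPU idle: {cpu_idle:.0%}", f"RAM idle: {mem_idle:.0%}")
```

With this catalog, the RAM requirement forces the "large" type (2 vCPUs, 8 GiB), so half of the CPU you pay for stays idle on every machine you add.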
Let's say you spin up a bunch of instances because you're running a Super Bowl commercial or promoting a major Black Friday event. You end up spending a lot of money on infrastructure for that one big day, and those same resources sit unused or underused on most other days. If you keep the same infrastructure on Black Friday as on an average Tuesday in March, you'll either waste money on the quiet days or struggle to keep up at the peak, depending on which day you sized it for.
Fortunately, there's another way. Instead of an infrastructure made of large, half-used instances, containerization lets you manage a more focused infrastructure made up of a controlled number of clustered host nodes. In the Kubernetes world, you have a communal pool of node resources: CPUs and RAM. Because your application is containerized and no longer tied to hardware limits in the same way as before, you can fit multiple replicas of it into the cluster. Rather than spinning up a new machine for every application instance, you can stack many application replicas on the same machine. Kubernetes even lets you specify very small amounts of CPU and RAM (as little as one-thousandth of a CPU, or a single KiB of RAM), so you can request exactly what your application needs. This level of precision in resource allocation is extremely empowering.
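As a rough illustration of stacking replicas, the sketch below parses a small subset of Kubernetes resource quantities (the "m" millicore suffix for CPU, binary suffixes such as "Mi" and "Gi" for memory) and estimates how many replicas fit on one node based on their requests, the values you would set under `resources.requests.cpu` and `resources.requests.memory`. It is a simplification under stated assumptions: the real scheduler works against each node's allocatable capacity (after system reservations) and other pods, and decimal suffixes like "M" or "G" are deliberately not handled here. The function names are hypothetical.

```python
from fractions import Fraction

# Binary suffixes Kubernetes accepts for memory quantities (subset only).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(q: str) -> Fraction:
    """Parse a CPU quantity such as '250m' (millicores) or '1.5' into cores."""
    if q.endswith("m"):
        return Fraction(int(q[:-1]), 1000)
    return Fraction(q)


def parse_memory(q: str) -> int:
    """Parse a memory quantity such as '512Mi' into bytes.

    Only binary suffixes (Ki, Mi, Gi, Ti) and plain byte counts are supported.
    """
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def replicas_that_fit(node_cpu: str, node_mem: str,
                      req_cpu: str, req_mem: str) -> int:
    """Estimate how many replicas with the given requests fit on one node.

    The scarcer resource (CPU or memory) determines the answer.
    """
    cpu_fit = parse_cpu(node_cpu) // parse_cpu(req_cpu)
    mem_fit = parse_memory(node_mem) // parse_memory(req_mem)
    return int(min(cpu_fit, mem_fit))


if __name__ == "__main__":
    # A hypothetical 4-CPU, 16 GiB node running replicas that request
    # half a core and 3 GiB each: memory is the limiting resource.
    print(replicas_that_fit("4", "16Gi", "500m", "3Gi"))
```

Using exact fractions for CPU avoids floating-point surprises when dividing millicore values, and taking the minimum across resources reflects that a replica only fits if both its CPU and memory requests can be satisfied.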
As your business grows, your infrastructure and application can grow with it. And as you grow, you can adjust the requests and limits of your containers without having to redeploy your infrastructure. In other words, your infrastructure can respond more flexibly as your applications change over time. You'll also waste less money, because you'll be making fuller use of the resources you pay for.
Picture a company that's running hundreds of instances to keep up with demand. Because their application is not containerized and can't be run in a clustered, parallel fashion, they're spending thousands of dollars a month on infrastructure. By containerizing their application and moving to a container platform like Kubernetes, they could dramatically reduce the number of instances they run and make full use of the pooled resources, which results in immense savings.
Finally, a focused, flexible infrastructure also makes for a more portable cluster. During the AWS outage on February 28, 2017, the entire us-east-1 region lost the ability to launch new instances. For companies relying on horizontal instance scaling, this was crippling. Furthermore, the same outage caused cascading failures in several key AWS services. Imagine if the region hadn't recovered and the hypothetical company above had had to move its infrastructure to a different region or cloud provider to get back online quickly: it would have been a massive undertaking. With proper resource management and a right-sized cluster, moving a smaller, more focused set of instances would have been faster and simpler.
This is an important topic. If you want to learn more, here's a useful presentation: "Everything You Ever Wanted to Know on Resource Scheduling … Almost" (by Tim Hockin of Google).