Working with HPAs in Kubernetes

5 min read · Feb 8, 2023

How to make your Kubernetes workloads scale with a few simple steps

What is an HPA?

HPAs (horizontal pod autoscalers) are one of the two ways to scale your services elastically within Kubernetes. When your pods are under enough load, the HPA increases the number of replicas. When they are underutilized, it scales back down, freeing up resources within your cluster.

The other way to scale, a VPA or vertical pod autoscaler, simply allocates more resources to your pods. We’ll ignore that for this deep dive.

Remember:

Horizontal -> More replicas of the same thing
Vertical -> A larger amount of resources for one thing
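
To make "more replicas" concrete, here is a rough sketch of the replica calculation given in the Kubernetes HPA documentation: desired replicas = ceil(current replicas * current metric / target metric). The function name desired_replicas is made up for illustration, and the sketch leaves out the tolerance band, min/max bounds, and stabilization that the real controller also applies.

```shell
#!/bin/sh
# Sketch of the HPA replica formula:
#   desired = ceil(current_replicas * current_metric / target_metric)
# desired_replicas is a hypothetical helper, not a kubectl command.
# Arguments: $1 = current replicas, $2 = current metric, $3 = target metric
desired_replicas() {
  # Integer ceiling division: (a + b - 1) / b
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# 4 pods averaging 90% CPU against a 60% target: scale up
desired_replicas 4 90 60   # prints 6
# 4 pods averaging 20% CPU against a 60% target: scale down
desired_replicas 4 20 60   # prints 2
```

The further the observed metric is from the target, the bigger the jump in replica count, in either direction.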

So how do we tell the HPA to scale up? Well, first we’ll need to have metrics about the pods themselves. That means we need:

The Metrics Server

The most basic way to get resource metrics exported in a Kubernetes cluster is to run the metrics server. By default it collects CPU and memory metrics every 15 seconds, so there’s a bit of a delay, but it’s good enough for our HPA needs.

The installation is very simple if you have cluster-admin privileges.

With kubectl apply:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

With Helm (installing into kube-system so it matches the check below):

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server --namespace kube-system

You can check that it’s running with:

kubectl get deployment metrics-server -n kube-system

Once it’s up, commands like kubectl top pods will return the memory and CPU usage of the running pods in any namespace.

There are more advanced ways to configure an HPA metrics source, such as Prometheus, but the metrics server is the simplest way to get started with pod scaling.
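
As a rough sketch of where this leads, the snippet below writes a minimal HPA manifest that scales on the CPU metrics the metrics server provides. The Deployment name my-app, the replica bounds, and the 50% CPU target are all assumptions chosen for illustration, and the manifest assumes a cluster that serves the autoscaling/v2 API.

```shell
# Write a minimal HPA manifest to hpa.yaml.
# "my-app" is a hypothetical Deployment name; the replica bounds and the
# 50% CPU target are illustrative values, not recommendations.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF
```

You could then apply it with kubectl apply -f hpa.yaml and watch it with kubectl get hpa my-app. Note that CPU utilization is measured relative to the CPU requests on your containers, so the target pods need resource requests set for this to work.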


Written by Matt Kornfield
