Kubernetes provides several autoscaling mechanisms, and one of them is the Vertical Pod Autoscaler (VPA). Unlike the Horizontal Pod Autoscaler (HPA), which scales the number of pods, VPA adjusts the CPU and memory requests (and, proportionally, the limits) of the containers in each pod.

This article breaks down when VPA makes sense, what advantages it brings, and the critical limitations you must understand before deploying it in production.


1. What Is Vertical Pod Autoscaler?

VPA automatically adjusts resource requests and limits for containers based on historical usage.

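It supports the following update modes:

  • Off: only computes recommendations; nothing is applied to pods
  • Initial: applies recommendations only at pod creation
  • Recreate: applies recommendations at pod creation and evicts running pods whose requests differ significantly from the recommendation
  • Auto: currently behaves like Recreate, evicting pods so they are recreated with updated requests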

Example manifest:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  # Workload whose pods VPA should manage
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  # "Auto" lets VPA evict pods to apply new requests
  updatePolicy:
    updateMode: "Auto"
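
The manifest above lets VPA pick any value it likes. A minimal sketch of the same object with guard rails is shown below, using the resourcePolicy section of the VPA API. The specific bounds (100m/128Mi and 2 CPUs/2Gi) are illustrative assumptions, not recommendations for any particular workload.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      # "*" applies the policy to every container in the pod
      - containerName: "*"
        # Assumed floor: VPA will not recommend less than this
        minAllowed:
          cpu: 100m
          memory: 128Mi
        # Assumed ceiling: VPA will not recommend more than this
        maxAllowed:
          cpu: "2"
          memory: 2Gi
        # Adjust limits along with requests, keeping their ratio
        controlledValues: RequestsAndLimits
```

Setting a ceiling is a cheap way to keep one noisy container from requesting more than a node can offer, which would otherwise leave the pod unschedulable.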

2. Benefits of Using VPA

  • 🔄 Automated resource tuning: No more guesswork in setting requests/limits.
  • 📉 Reduced over-provisioning: Especially useful in dev/staging environments.
  • 🚫 Fewer OOM kills: VPA raises memory requests (and limits) when a container is under-allocated.
  • 🔍 Good for single-instance apps: Where HPA doesn’t make sense.

3. Downsides and Limitations

While VPA is useful, it comes with serious trade-offs:

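  • 🔁 Restarts required: VPA applies changes by evicting pods so they are recreated with the new requests, which can disrupt workloads.
  • ⚖️ Not compatible with HPA on CPU/memory: Don’t point both at the same resource metric for the same workload, or they will work against each other. HPA on custom or external metrics can coexist with VPA.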
  • 🧠 No autoscaling for replicas: It doesn’t change pod count, only resources.
  • 📈 Slow to adapt: Recommendations are based on past metrics, not real-time spikes.
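  • 🚫 Poor fit for Jobs and CronJobs: Short-lived pods rarely produce enough usage history for useful recommendations; VPA works best with long-running workloads like Deployments or StatefulSets.
  • 📉 Usage history needed: VPA builds recommendations from observed usage (it relies on a metrics source such as Metrics Server), so a new workload gets weak recommendations at first. Limits are not required; if they are set, VPA scales them in proportion to the requests.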
  • 📦 Doesn’t track initContainers or ephemeral containers
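
Two of the limitations above can be softened with configuration. The sketch below assumes a Deployment named my-app whose pods carry the label app: my-app (the label and all numeric values are hypothetical). A PodDisruptionBudget bounds how many pods can be evicted at once, since VPA evictions go through the eviction API, and VPA is restricted to memory so that HPA can own CPU.

```yaml
# Keep at least one pod available while others are being evicted
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app   # assumed pod label
---
# VPA manages memory only, leaving CPU to HPA
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]
---
# HPA scales replicas on CPU utilization only
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2      # assumed
  maxReplicas: 10     # assumed
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target
```

Splitting the resources this way keeps the two autoscalers from reacting to the same signal, at the cost of VPA no longer right-sizing CPU.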

4. Best Use Cases for VPA

  • CI/CD tools with variable usage
  • Internal APIs with stable traffic
  • Machine learning workloads with changing memory profiles
  • Background workers with steady CPU usage

5. Should You Use VPA?

Use VPA if:

  • You have long-running, stateful workloads
  • Your pods are often over- or under-requested
  • You need resource efficiency, not horizontal scale

Avoid it for:

  • Latency-sensitive applications
  • Microservices needing real-time scaling
  • Short-lived batch jobs
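
Even for workloads on the "avoid" list, the Off mode from section 1 can still be useful, because it never evicts anything. A minimal sketch, assuming a hypothetical Deployment named latency-sensitive-api:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: latency-sensitive-api-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: latency-sensitive-api   # hypothetical workload name
  updatePolicy:
    # Recommendation-only: no pod is evicted or modified
    updateMode: "Off"
```

The recommendations appear in the object's status (for example via kubectl describe vpa latency-sensitive-api-vpa) and can be copied into the Deployment during a planned rollout.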

6. Conclusion

VPA is a powerful tool for right-sizing containers in Kubernetes, but it comes with important limitations. Understanding its strengths and weaknesses helps you decide when it’s the right fit, and when HPA or KEDA might be better suited.