Optimizing etcd on Slow Disks in Kubernetes

In Kubernetes, etcd is the key-value store that holds the entire cluster state.
When etcd runs on slow disks, the whole control plane suffers: API requests slow down, pods take longer to schedule, and the cluster feels “laggy.”


Why etcd Struggles on Slow Disks

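etcd is very sensitive to disk write latency. Every write is appended to the write-ahead log (WAL) and flushed to disk with fsync before it is acknowledged, so a slow fsync delays every request.
On spinning HDDs or cheap cloud disks with low IOPS, etcd quickly becomes the bottleneck, and in bad cases members miss heartbeats and trigger leader elections.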
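
One way to check whether a disk is fast enough is to reproduce etcd's write pattern with fio: small sequential writes, each followed by fdatasync. The sketch below assumes fio is installed; the function name, job name, and scratch directory are illustrative, and the sizes follow a commonly cited etcd benchmark recipe rather than anything specific to this cluster. Point it at a directory on the same disk as /var/lib/etcd, preferably while the node is not serving traffic.

```shell
# Simulate etcd's WAL write pattern on the disk that holds the given directory.
# $1: an existing directory on the disk under test.
bench_etcd_disk() {
    dir="${1:?usage: bench_etcd_disk <directory>}/fio-etcd-test"
    mkdir -p "$dir" || return 1
    # Small sequential writes, each followed by fdatasync, like etcd's WAL.
    fio --name=etcd-wal-sim --directory="$dir" \
        --rw=write --ioengine=sync --fdatasync=1 \
        --size=22m --bs=2300
    status=$?
    # Remove the scratch files that fio created.
    rm -rf "$dir"
    return "$status"
}

# Usage:
#   bench_etcd_disk /var/lib
```

In the fio output, the fsync/fdatasync percentiles are the figures to read; the etcd documentation suggests that the 99th percentile should stay below 10 ms.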

Typical symptoms:

  • Slow kubectl responses
  • Pods stuck in Pending
  • Increased API server latency
  • High disk usage in /var/lib/etcd
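
These symptoms have many possible causes, so it helps to confirm that the disk is responsible before tuning anything. A rough check, sketched below, reads etcd's Prometheus histogram for WAL fsync latency and reports the percentage of fsyncs slower than a chosen bucket boundary. The metrics URL in the usage comment is an assumption (kubeadm-style setups commonly expose it through --listen-metrics-urls), and the function name is illustrative.

```shell
# Read Prometheus text from stdin and print the percentage of WAL fsyncs
# slower than the given histogram bucket bound (in seconds, default 0.016).
slow_fsync_percent() {
    bound="${1:-0.016}"
    awk -v le="$bound" '
        $1 == ("etcd_disk_wal_fsync_duration_seconds_bucket{le=\"" le "\"}") {
            within = $2; seen = 1
        }
        $1 == "etcd_disk_wal_fsync_duration_seconds_count" { total = $2 }
        END {
            # Print n/a when the bucket or the total count was not found.
            if (seen && total > 0) printf "%.2f\n", (total - within) * 100 / total
            else print "n/a"
        }'
}

# Usage on a control-plane node (the URL is an assumption):
#   curl -s http://127.0.0.1:2381/metrics | slow_fsync_percent 0.016
```

The counters accumulate from the moment etcd starts, so this gives a long-run picture rather than the current state. On a healthy disk the result should be close to zero.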

Running Defragmentation

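etcd keeps a history of changes (MVCC), so the database grows over time even when keys are deleted.
Compaction discards old revisions (the Kubernetes API server requests it periodically, every 5 minutes by default), but the space it frees stays inside the database file. Defragmentation (defrag) rewrites that file and returns the free space to the filesystem.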

Example:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  defrag
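
In a multi-member cluster, each member has to be defragmented separately, and a member blocks reads and writes while it rewrites its database file. A cautious approach is to work through the members one at a time and confirm that each is healthy before moving on. The sketch below assumes the TLS paths from the example above are supplied through etcdctl's ETCDCTL_CACERT, ETCDCTL_CERT, and ETCDCTL_KEY environment variables; the function name and the addresses in the usage comment are hypothetical.

```shell
# TLS settings, read from the environment by etcdctl.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# Defragment members one by one and stop at the first failure.
# $1: comma-separated client URLs, one per member.
defrag_members() {
    for ep in $(printf '%s' "$1" | tr ',' ' '); do
        echo "defragmenting $ep"
        # Generous timeout: defrag on a slow disk can exceed the 5s default.
        etcdctl --endpoints="$ep" --command-timeout=60s defrag || return 1
        # Only continue once this member reports healthy again.
        etcdctl --endpoints="$ep" endpoint health || return 1
    done
}

# Usage (hypothetical addresses):
#   defrag_members "https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379"
```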

Best Practices for Slow Disks

  • Defragment regularly – prevents DB bloat.
  • Set a quota – use --quota-backend-bytes to cap the etcd database size (the default is 2 GiB).
  • Move etcd to faster disks if possible (SSD/NVMe).
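  • Monitor latency – watch etcd_disk_wal_fsync_duration_seconds and etcd_disk_backend_commit_duration_seconds; the etcd documentation suggests keeping their 99th percentiles below 10 ms and 25 ms respectively.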
  • Avoid noisy neighbors – give etcd its own disk so that other workloads do not compete with it for I/O.
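
The quota flag takes a byte count, and once the database exceeds it etcd raises a NOSPACE alarm and accepts only reads and deletes until space is reclaimed and the alarm is cleared. The sketch below shows the arithmetic for the flag and a recovery sequence modeled on the one in the etcd maintenance documentation: compact to the current revision, defragment, then disarm the alarm. It assumes TLS settings are already supplied to etcdctl (for example through ETCDCTL_CACERT, ETCDCTL_CERT, and ETCDCTL_KEY); the function names are illustrative.

```shell
# Convert a size in GiB to the byte count that --quota-backend-bytes expects.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Pull the current revision out of `etcdctl endpoint status --write-out=json`
# (read from stdin).
extract_revision() {
    grep -o '"revision":[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Recover one member after a NOSPACE alarm.
# $1: client URL of the member.
recover_from_nospace() {
    ep="$1"
    rev=$(etcdctl --endpoints="$ep" endpoint status --write-out=json | extract_revision)
    [ -n "$rev" ] || return 1
    # Drop history older than the current revision, release the freed space,
    # then clear the alarm so writes are accepted again.
    etcdctl --endpoints="$ep" compact "$rev" &&
    etcdctl --endpoints="$ep" defrag &&
    etcdctl --endpoints="$ep" alarm disarm
}

# Usage:
#   gib_to_bytes 8
#   recover_from_nospace https://127.0.0.1:2379
```

Here gib_to_bytes 8 prints 8589934592, the value to pass as --quota-backend-bytes for an 8 GiB quota. Raising the quota on a slow disk also means larger snapshots and longer defragmentation runs, so a modest value with regular defragmentation is usually the safer combination.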

Conclusion

Running etcd on slow disks is risky, but with proper defragmentation, quotas, and monitoring, you can keep the cluster responsive. If your cluster is critical, always prefer fast SSD storage for etcd.