Optimizing etcd on Slow Disks in Kubernetes
In Kubernetes, etcd is the central database that stores the entire cluster state.
If etcd runs on slow disks, you might notice performance issues: API requests slow down, pods take longer to schedule, and sometimes the cluster feels “laggy.”
Why etcd Struggles on Slow Disks
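etcd is very sensitive to disk latency. Every write is appended to its write-ahead log (WAL) and flushed with fsync before etcd acknowledges it. This is how etcd guarantees durability and consistency.
On spinning HDDs or cheap cloud disks with low IOPS or high fsync latency, these synchronous writes quickly make etcd the bottleneck for the whole control plane.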
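Because every write waits on fsync, the useful question is how long a small synchronous write takes on your disk, not its raw throughput. A fio job that issues small sequential writes, each followed by fdatasync, is commonly used for this. The wrapper below is a minimal sketch: the function name check_etcd_disk and the default directory /var/lib/etcd-disk-test are hypothetical, and it assumes fio is installed and the directory sits on the same disk as /var/lib/etcd.

```shell
#!/bin/sh
# Hypothetical helper around a fio job commonly used to judge disks for etcd.
# Assumptions: fio is installed; the test directory is on the same disk as /var/lib/etcd.
check_etcd_disk() {
  dir="${1:-/var/lib/etcd-disk-test}"
  mkdir -p "$dir" || return 1
  # Small sequential writes (2300 bytes), each followed by fdatasync,
  # roughly imitating etcd's WAL write pattern.
  fio --rw=write --ioengine=sync --fdatasync=1 \
      --directory="$dir" --size=22m --bs=2300 --name=etcd-disk-check
}
```

In fio's output, look at the fdatasync (sync) percentiles; a commonly cited target is a 99th percentile below 10 ms. Remove the test directory afterwards.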
Typical symptoms:
- Slow kubectl responses
- Pods stuck in Pending
- Increased API server latency
- High disk usage in /var/lib/etcd
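When these symptoms appear, etcd's own WAL fsync histogram can confirm whether the disk is the cause. The sketch below is a hypothetical helper that reads Prometheus text-format metrics on stdin and prints the average WAL fsync latency in milliseconds, using the _sum and _count series. It is an average since the etcd process started, so it is only a rough first look; for alerting, a recent percentile such as histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) in Prometheus is more meaningful.

```shell
#!/bin/sh
# Hypothetical helper: average WAL fsync latency (ms) from Prometheus text metrics on stdin.
wal_fsync_avg_ms() {
  awk '
    $1 == "etcd_disk_wal_fsync_duration_seconds_sum"   { sum = $2 }
    $1 == "etcd_disk_wal_fsync_duration_seconds_count" { cnt = $2 }
    END {
      # No fsync samples: report it and fail instead of dividing by zero.
      if (cnt + 0 > 0) printf "%.2f\n", sum / cnt * 1000
      else { print "no samples"; exit 1 }
    }'
}
```

Usage, assuming the metrics endpoint is reachable (kubeadm clusters often expose it at http://127.0.0.1:2381): curl -s http://127.0.0.1:2381/metrics | wal_fsync_avg_ms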
Running Defragmentation
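etcd is a multi-version (MVCC) store: every change creates a new revision, and old revisions are kept until they are compacted. As a result, the database file grows over time, even if keys are deleted.
Compaction discards old revisions, but the space it frees stays inside the database file. Defragmentation (etcdctl defrag) rewrites the file and returns that space to the filesystem. In Kubernetes, the API server already compacts etcd periodically (every 5 minutes by default, set by --etcd-compaction-interval), so defrag is usually the step you run by hand.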
Example:
# Defragment the etcd member at this endpoint (kubeadm default certificate paths).
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
defrag
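Compaction normally happens automatically in Kubernetes, but when you need to free space by hand (for example after hitting the backend quota), the sequence is: compact to the current revision, then defragment. The sketch below follows the approach in the etcd maintenance documentation. The helper names etcdctl_k8s and compact_and_defrag and the ETCD_ENDPOINT variable are hypothetical; it assumes kubeadm's default certificate paths. Compaction is replicated to every member, while defrag only affects the member you point it at.

```shell
#!/bin/sh
# Hypothetical variable: the single member to work on.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"

# Hypothetical wrapper: etcdctl with kubeadm's default client certificates.
etcdctl_k8s() {
  ETCDCTL_API=3 etcdctl --endpoints="$ETCD_ENDPOINT" \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    "$@"
}

compact_and_defrag() {
  # Show DB SIZE before the cleanup.
  etcdctl_k8s endpoint status --write-out=table || return 1
  # Extract the current revision ("revision":N in the response header).
  rev=$(etcdctl_k8s endpoint status --write-out=json |
        grep -o '"revision":[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$')
  [ -n "$rev" ] || return 1
  # Compaction discards all key history older than $rev.
  etcdctl_k8s compact "$rev" || return 1
  # Defrag returns the freed pages to the filesystem (this member only).
  etcdctl_k8s defrag || return 1
  # Show DB SIZE afterwards.
  etcdctl_k8s endpoint status --write-out=table
}
```

On a multi-member cluster, run the defrag step against each member in turn rather than all at once.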
Best Practices for Slow Disks
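- Defragment regularly – reclaims the space freed by compaction and prevents DB bloat. Defrag blocks the member while it runs, so defragment one member at a time.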
- Enable quotas – set --quota-backend-bytes to cap the etcd database size (the default is 2 GiB).
- Move etcd to faster disks if possible (SSD/NVMe).
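- Monitor latency – watch etcd_disk_wal_fsync_duration_seconds and etcd_disk_backend_commit_duration_seconds; a commonly cited target is a p99 WAL fsync latency below 10 ms.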
- Avoid running etcd with noisy neighbors – dedicate resources.
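For the quota item above: etcd takes the limit in bytes, defaults to 2 GiB, and its documentation suggests 8 GiB as a maximum for normal environments. When the quota is exceeded, etcd raises a cluster-wide NOSPACE alarm and only accepts reads and deletes until the alarm is disarmed. The sketch below shows two hypothetical helpers: one prints the flag for a size in GiB, the other lists and then disarms alarms (assuming kubeadm's default certificate paths and a hypothetical ETCD_ENDPOINT variable).

```shell
#!/bin/sh
# Hypothetical helper: print --quota-backend-bytes for a size given in GiB.
quota_flag() {
  printf '%s\n' "--quota-backend-bytes=$(( $1 * 1024 * 1024 * 1024 ))"
}

# Hypothetical helper: list alarms, then clear them.
# Run it only after compaction and defrag have brought the DB below the quota.
clear_nospace_alarm() {
  # Reuse the positional parameters to hold the shared connection flags.
  set -- --endpoints="${ETCD_ENDPOINT:-https://127.0.0.1:2379}" \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt
  # A full backend shows up as "alarm:NOSPACE".
  ETCDCTL_API=3 etcdctl "$@" alarm list || return 1
  ETCDCTL_API=3 etcdctl "$@" alarm disarm
}
```

On kubeadm clusters, the printed flag would go into the etcd command in /etc/kubernetes/manifests/etcd.yaml; the kubelet recreates the static pod when that manifest changes.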
Conclusion
Running etcd on slow disks is risky, but with proper defragmentation, quotas, and monitoring, you can keep the cluster responsive. If your cluster is critical, always prefer fast SSD storage for etcd.