Helm Error: UPGRADE FAILED - Another Operation in Progress

When working with Helm in Kubernetes, you might encounter the following error:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

This usually happens when a Helm release is already in the middle of an action, but another upgrade or rollback is triggered.

Common Pending States

Helm releases can get stuck in several states:

- pending-install: Helm started installing, but something went wrong before it finished.
- pending-upgrade: Helm tried to upgrade, but the process didn’t complete.
- pending-rollback: A rollback started but got stuck in the middle.

These states prevent you from running another helm upgrade or helm rollback. ...
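As a minimal sketch of the rule described above, the Python snippet below decides whether a release status would block a new operation. The function name `blocks_new_operation` is hypothetical, and the status strings are simply the three pending states listed in the excerpt. In practice the status would be read from Helm itself (for example from `helm status <release>`), which this sketch does not call.

```python
# The three pending states named in the excerpt above.
PENDING_STATES = frozenset({"pending-install", "pending-upgrade", "pending-rollback"})


def blocks_new_operation(status: str) -> bool:
    """Return True if a release in this status would reject a new upgrade or rollback.

    Hypothetical helper for illustration; it only compares strings and does not
    talk to Helm or the cluster.
    """
    return status.strip().lower() in PENDING_STATES


if __name__ == "__main__":
    for status in ("deployed", "pending-upgrade", "failed"):
        print(f"{status}: blocked={blocks_new_operation(status)}")
```

A check like this could run in a CI job before `helm upgrade`, so the pipeline fails early with a clear message instead of hitting the error mid-deploy.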

September 26, 2025 · 2 min · 258 words · John Cena

How to Automatically Restart Deployment on ConfigMap Change

By default, Kubernetes does not automatically restart a Deployment when its ConfigMap changes. This can lead to situations where your pods keep running with outdated configuration until you trigger a rollout manually. Fortunately, there are common patterns to solve this.

Why It Happens

Kubernetes exposes ConfigMaps to pods as mounted files or environment variables, but the Deployment controller does not track changes in ConfigMap content. Because the pod template itself is unchanged, no rollout is triggered.

Solution

Checksum annotations: Add a hash of the ConfigMap into the Deployment’s pod template annotations. Example in Helm: ...
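The checksum idea can also be sketched outside Helm. The Python snippet below is a hypothetical helper, not the article's Helm example (which is truncated in this listing). It hashes a ConfigMap's data into a stable SHA-256 digest. If that digest is written into a pod template annotation, any change to the ConfigMap content changes the pod template, and a changed pod template is what triggers a rollout. The annotation key `checksum/config` is an assumed naming convention, not something Kubernetes requires.

```python
import hashlib
import json


def configmap_checksum(data: dict[str, str]) -> str:
    """Return a SHA-256 hex digest of ConfigMap data, independent of key order."""
    # Sorting keys makes the serialized form (and therefore the hash) stable.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pod_template_annotations(data: dict[str, str]) -> dict[str, str]:
    """Build the annotation to merge into the Deployment's pod template metadata.

    The key "checksum/config" is an assumption made for this sketch.
    """
    return {"checksum/config": configmap_checksum(data)}


if __name__ == "__main__":
    before = pod_template_annotations({"LOG_LEVEL": "info"})
    after = pod_template_annotations({"LOG_LEVEL": "debug"})
    # Different content yields a different annotation value.
    print(before != after)
```

Hashing a canonical, key-sorted serialization avoids spurious rollouts when only the ordering of keys changes.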

September 25, 2025 · 1 min · 197 words · John Cena

Changing Node IPs in Kubernetes: Why It's a Bad Idea and What to Do Instead

Changing the IP addresses of Kubernetes nodes is rarely a good idea: it can lead to broken networking, node unavailability, or even complete cluster failure. This article explains why you should avoid it, and provides a step-by-step recovery plan if you must do it.

1. Why Node IPs Matter

Kubernetes heavily relies on the IP addresses of nodes for:

- Scheduling and node identity
- kubelet and API server communication
- CNI and network overlays
- DNS and service discovery
- TLS certificates tied to node IPs

Changing an IP breaks all these associations: kubelet may fail to register, Pods may not communicate, and the control plane may mark the node as NotReady. ...

September 15, 2025 · 2 min · 347 words · DevOps Insights

Common etcd Errors and How to Fix Them

Introduction

etcd is a distributed key-value store that plays a critical role in Kubernetes by storing cluster configuration and state. When etcd runs into problems, it can cause cluster instability or downtime. This article covers common etcd errors, their underlying causes, and actionable solutions.

1. etcdserver: request timed out

❓ Cause

Occurs when etcd members can’t communicate efficiently, often due to network issues or disk I/O latency.

🛠️ Solution

- Check disk performance: iostat -xz 1
- Ensure etcd data is on SSD storage.
- Check network latency and connectivity between cluster members: ping <etcd-member-IP>

2. etcdserver: leader changed

❓ Cause

This is often seen when leadership changes too frequently, indicating instability in the etcd cluster. ...

September 13, 2025 · 2 min · 284 words · John Cena

How to Defend Against DDoS Attacks: Techniques for DevOps and Developers

DDoS (Distributed Denial of Service) attacks are among the most common threats to cloud-native infrastructure and APIs. They can flood your services with traffic, exhausting resources and causing downtime. In this article, we’ll explore effective strategies to prevent and mitigate DDoS attacks, from rate limiting to cloud-based protections.

1. What Is a DDoS Attack?

A DDoS attack occurs when a network of compromised machines sends overwhelming traffic to a target server or service, aiming to exhaust bandwidth or system resources. ...
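Rate limiting, the first strategy mentioned above, is often built on a token bucket. The sketch below is one possible illustration and not necessarily the approach the article takes. The class name `TokenBucket` and its parameters are hypothetical, and a real deployment would usually keep one bucket per client (for example per source IP) and enforce limits at the edge rather than in application code.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock              # injectable so the limiter can be tested
        self._tokens = float(capacity)   # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False when the caller should be throttled."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    allowed = sum(bucket.allow() for _ in range(20))
    print(f"allowed {allowed} of 20 burst requests")
```

A token bucket tolerates short legitimate bursts while capping the sustained request rate, which is why it is a common choice for API rate limiting.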

September 11, 2025 · 2 min · 278 words · DevOps Insights