Site Reliability Engineering (SRE) is not just about “keeping things up” — it’s about building systems that are reliable and understandable. At the heart of this idea lies a simple but powerful toolset: the four golden signals.

Let’s break them down in human terms — no jargon, just practical insights.

🚨 What Are the Golden Signals?

Golden signals are the four key metrics that Google’s SRE team recommends tracking for any user-facing service:

  1. Latency — how long does it take to handle a request?
  2. Traffic — how many requests are coming in?
  3. Errors — how many requests fail?
  4. Saturation — how close is your system to its limits?

🕒 1. Latency

This is how long your system takes to respond. A user clicks a button — how fast do they get a response?

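💡 Tip: track latency for successful and failed requests separately. Failures often return quickly, so mixing them in can make your latency look better than it really is. And a slow failure is even worse than a fast one.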

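Prometheus metric example (commonly exposed as a histogram, so you can look at percentiles such as p95 or p99 instead of an average):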

http_request_duration_seconds
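
To make the percentile idea concrete, here is a minimal sketch of estimating a quantile from cumulative histogram buckets by linear interpolation, similar in spirit to what Prometheus does with histogram_quantile. The function name `estimate_quantile` and the bucket values are made up for illustration, and the real Prometheus implementation handles more edge cases.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs sorted
    by upper bound, ending with math.inf for the open-ended "+Inf" bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        # Skip empty buckets so the interpolation never divides by zero.
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # Nothing to interpolate against: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical bucket data: 100 requests, most of them under 250 ms.
example_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p90 = estimate_quantile(0.90, example_buckets)
print(f"p90 latency is roughly {p90:.3f}s")
```

The result is only as precise as the bucket boundaries, so it helps to pick buckets around the latencies you actually care about.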

📈 2. Traffic

Traffic shows the volume of activity. It could be requests per second (RPS), active connections, user sessions, or data throughput.

Metric example (a counter that only goes up; its per-second rate of increase gives you RPS):

http_requests_total
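
A counter like this is only useful once you turn it into a rate. The sketch below shows one simplified way to do that from a series of scraped samples, including the case where the counter drops back to zero after a restart. The name `per_second_rate` and the sample values are hypothetical, and Prometheus's own rate() additionally extrapolates to the edges of the time window, which this version skips.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples is a list of (unix_timestamp_seconds, counter_value) pairs,
    oldest first.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter went down, so the process restarted and counted
            # up from zero again: everything in curr is new traffic.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes every 15 seconds, with a restart before the fourth one.
scrapes = [(0, 100), (15, 250), (30, 400), (45, 30), (60, 180)]
print(f"traffic is about {per_second_rate(scrapes):.1f} requests/second")
```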

❌ 3. Errors

Errors are failed requests — 5xx codes, timeouts, logic exceptions, etc. Even a small percentage of errors can ruin the user experience.

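Metric example (the exact name depends on your instrumentation; many services instead put a status code label on http_requests_total):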

http_requests_errors_total
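
A raw error count means little without the traffic next to it, so it is usually tracked as a ratio. Here is a minimal sketch, assuming you already have request counts grouped by HTTP status code. The function name `error_ratio` and the numbers are illustrative, and a status code check like this will not catch timeouts or wrong-but-200 responses, which need their own instrumentation.

```python
def error_ratio(requests_by_status):
    """Share of requests that ended in a 5xx response.

    requests_by_status maps an integer HTTP status code to a request count.
    """
    total = sum(requests_by_status.values())
    if total == 0:
        # No traffic at all: report zero rather than dividing by zero.
        return 0.0
    failed = sum(
        count
        for status, count in requests_by_status.items()
        if 500 <= status <= 599
    )
    return failed / total


# Hypothetical counts over the last five minutes.
counts = {200: 970, 404: 20, 500: 7, 503: 3}
print(f"error ratio: {error_ratio(counts):.2%}")
```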

💥 4. Saturation

Saturation means how “full” your system is: CPU, memory, disk I/O, database connections. Many systems start to degrade well before they reach 100%, so if you’re constantly near the limit, you’re living dangerously.

Metric examples (raw values; compare them against a capacity or limit to get a “how full” percentage):

node_cpu_seconds_total
container_memory_usage_bytes
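
Neither of these metrics is a percentage out of the box: the CPU one is a running total of seconds spent per mode, and the memory one is a byte count. The sketch below shows one way to turn each into a 0 to 1 saturation value. The helper names, the core count, and the memory limit are assumptions made for the example.

```python
def cpu_utilization(idle_start, idle_end, elapsed_seconds, cores):
    """Fraction of CPU time spent busy between two scrapes.

    idle_start and idle_end are idle CPU seconds summed across all cores
    at the start and end of the window.
    """
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cores)
    # Clamp to [0, 1] to absorb small timing differences between scrapes.
    return min(1.0, max(0.0, 1.0 - idle_fraction))


def memory_saturation(usage_bytes, limit_bytes):
    """Fraction of the memory limit currently in use."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return usage_bytes / limit_bytes


# Hypothetical 4-core host: 60 idle CPU-seconds accumulated over 60 seconds.
cpu = cpu_utilization(idle_start=1000.0, idle_end=1060.0, elapsed_seconds=60, cores=4)
# Hypothetical container using 1.5 GiB of a 2 GiB limit.
mem = memory_saturation(usage_bytes=1536 * 1024**2, limit_bytes=2048 * 1024**2)
print(f"CPU {cpu:.0%} busy, memory {mem:.0%} of limit")
```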

🛠 How to Use Golden Signals

To make the most of them:

  • Collect these metrics with Prometheus, Datadog, New Relic, etc.
  • Build dashboards for each signal.
  • Set up alerts for sustained threshold breaches, not single spikes.
  • Watch for trends — rising latency, creeping saturation, error spikes.
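
As a rough illustration of the alerting step, here is a small sketch of a threshold check that only fires when the last few readings are all over the line, which cuts down on noise from one-off spikes. The function name `should_alert`, the 0.8 threshold, and the readings are all hypothetical. In practice your monitoring tool's own alert rules would do this job.

```python
def should_alert(values, threshold, min_consecutive):
    """True if the most recent min_consecutive readings all exceed threshold.

    values is a list of readings, oldest first.
    """
    if min_consecutive <= 0 or len(values) < min_consecutive:
        return False
    recent = values[-min_consecutive:]
    return all(value > threshold for value in recent)


# Hypothetical saturation readings, one per minute.
steady_climb = [0.20, 0.90, 0.95, 0.97]
single_spike = [0.90, 0.95, 0.20, 0.97]

print(should_alert(steady_climb, threshold=0.8, min_consecutive=3))
print(should_alert(single_spike, threshold=0.8, min_consecutive=3))
```

The first series stays above the threshold for three readings in a row and would page someone; the second dips in between and would not.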

🎯 Final Thoughts

If you only track four things, track these.

Golden signals give you fast feedback when your system is in trouble, or about to be. They won’t tell you everything, but they cover the most common failure modes and show you where to start digging.
