How Tripswitch Evaluates Health

Tripswitch decides whether a dependency is healthy or degraded based on reported signals, evaluated over time. This page explains how it makes that decision.

If you haven’t read Getting Started yet, start there.


Signals, Not Requests

Tripswitch does not sit in your request path. It never sees the actual calls your service makes.

Instead, your service reports samples — small data points that describe what happened after a call completes. Each sample includes a metric identifier, a value, and optional metadata.
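The sketch below makes the idea of a sample concrete in Python. The `Sample` fields and the `sample_from_call` helper are hypothetical. They mirror the description above (metric identifier, value, optional metadata, plus the call outcome) and are not the Tripswitch SDK's actual types.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Sample:
    """Hypothetical sample shape; the real Tripswitch wire format may differ."""
    metric: str                      # metric identifier, e.g. "payments.latency_ms"
    value: float                     # measured value, e.g. latency in milliseconds
    ok: bool = True                  # whether the call succeeded
    timestamp: float = field(default_factory=time.time)     # when the call completed
    metadata: Dict[str, Any] = field(default_factory=dict)  # optional: trace ID, endpoint, tags


def sample_from_call(metric: str, started: float, ended: float, ok: bool, **tags: Any) -> Sample:
    """Build a sample once a call has completed; latency is recorded in milliseconds."""
    return Sample(metric=metric, value=(ended - started) * 1000.0, ok=ok,
                  timestamp=ended, metadata=dict(tags))


# Usage: a call that took about 320ms and succeeded.
s = sample_from_call("payments.latency_ms", started=100.0, ended=100.32, ok=True, endpoint="/charge")
```

This sketch does not show how a sample is sent to Tripswitch, because that depends on your client library.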

Tripswitch evaluates these samples centrally to determine breaker state. It knows what you tell it. Nothing more.

This separation is intentional. Tripswitch provides a decision signal. Your service owns execution.


Evaluation Windows

Samples are evaluated over a rolling time window. A breaker configured with a 60-second window looks at the last 60 seconds of samples when deciding whether to trip.

Why windows?

  • Recency matters. A failure from five minutes ago is less relevant than one from five seconds ago.
  • Noise filtering. A single bad sample shouldn’t trip a breaker. A pattern of bad samples should.
  • Recovery detection. When samples improve, the window eventually contains only healthy data.

Windows are time-based. Samples older than the window are discarded from evaluation, and the window slides forward continuously. The breaker is re-evaluated as new samples arrive.
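A time-based rolling window can be sketched with a deque that evicts entries as the window slides forward. This is an illustrative Python sketch, not Tripswitch's implementation. The `RollingWindow` class and its methods are hypothetical, and the sketch assumes samples arrive roughly in timestamp order.

```python
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[float, float, bool]  # (timestamp in seconds, value, ok)


class RollingWindow:
    """Hypothetical time-based window: keeps entries younger than window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._entries: Deque[Entry] = deque()

    def add(self, ts: float, value: float, ok: bool) -> None:
        # Assumes entries arrive in (roughly) increasing timestamp order.
        self._entries.append((ts, value, ok))

    def evict(self, now: float) -> None:
        # Drop entries that have slid out of the window.
        cutoff = now - self.window_s
        while self._entries and self._entries[0][0] <= cutoff:
            self._entries.popleft()

    def snapshot(self, now: float) -> List[Entry]:
        """Return the entries currently inside the window."""
        self.evict(now)
        return list(self._entries)


w = RollingWindow(window_s=60)
w.add(0, 100.0, True)
w.add(30, 200.0, False)
w.add(70, 150.0, True)
```

At `now=75` the sample from t=0 has aged out, so only the samples from t=30 and t=70 are evaluated. At `now=95` only the t=70 sample remains.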


Thresholds

Thresholds turn aggregated samples into a decision.

A threshold defines what “bad” means for a given breaker.

For example:

  • Error rate > 0.1 means “more than 10% of samples are failures”
  • P95 latency > 500ms means “95th percentile response time exceeds 500ms”
  • Consecutive failures >= 5 means “five failures in a row without a success”
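One way to model these thresholds is as data: an aggregated statistic, a comparison operator and a limit. The Python sketch below encodes the three examples that way. The `Threshold` type, the statistic names and `consecutive_failures` are hypothetical and do not describe Tripswitch's rule format.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Comparison operators a threshold might use (hypothetical rule format).
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}


@dataclass(frozen=True)
class Threshold:
    statistic: str   # name of an aggregated statistic, e.g. "error_rate"
    op: str          # one of the keys in OPS
    limit: float

    def is_bad(self, stats: Dict[str, float]) -> bool:
        """True when the aggregated statistic crosses the limit."""
        return OPS[self.op](stats[self.statistic], self.limit)


def consecutive_failures(outcomes: Sequence[bool]) -> int:
    """Failures in a row at the end of the sequence, with no success in between."""
    count = 0
    for ok in reversed(outcomes):
        if ok:
            break
        count += 1
    return count


# The three example thresholds from the list above.
ERROR_RATE = Threshold("error_rate", ">", 0.1)
P95_LATENCY = Threshold("p95_ms", ">", 500)
CONSECUTIVE = Threshold("consecutive_failures", ">=", 5)
```

Note that an error rate of exactly 0.1 does not cross `> 0.1`. Boundary operators matter when tuning.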

Thresholds are not one-size-fits-all. Too sensitive, and the breaker trips on noise. Too forgiving, and it trips too late to help.

Tuning thresholds requires understanding your dependency’s normal behavior and your tolerance for degradation.


State Transitions

A breaker has three states: Closed, Open, and Half-Open.

Closed → Open

The breaker is Closed when the dependency is considered healthy. Traffic flows normally.

When the rule condition is met — for example, error rate exceeds the threshold over the evaluation window — the breaker transitions to Open.

This is a trip. Tripswitch records the event, including the samples that triggered it.
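The Closed → Open step can be sketched as a breaker that checks a rule condition against the current window and records the triggering samples when it trips. This Python sketch is illustrative. `Breaker`, `TripEvent` and `error_rate_over` are hypothetical names, and real evaluation happens centrally in Tripswitch, not in your service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Sequence, Tuple

Entry = Tuple[float, float, bool]  # (timestamp, value, ok)


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


@dataclass
class TripEvent:
    at: float
    samples: List[Entry]  # the window contents that triggered the trip


@dataclass
class Breaker:
    condition: Callable[[Sequence[Entry]], bool]  # True when the rule condition is met
    state: State = State.CLOSED
    trips: List[TripEvent] = field(default_factory=list)

    def evaluate(self, window: Sequence[Entry], now: float) -> State:
        # Only a Closed breaker trips on the rule condition; an Open breaker
        # ignores new samples until its cooldown elapses.
        if self.state is State.CLOSED and window and self.condition(window):
            self.state = State.OPEN
            self.trips.append(TripEvent(at=now, samples=list(window)))
        return self.state


def error_rate_over(limit: float) -> Callable[[Sequence[Entry]], bool]:
    """Rule condition: the fraction of failed entries exceeds `limit`."""
    def condition(window: Sequence[Entry]) -> bool:
        failures = sum(1 for _, _, ok in window if not ok)
        return failures / len(window) > limit
    return condition
```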

Open → Half-Open

When a breaker is Open, Tripswitch stops using new samples to change state. The decision is made: the dependency is unhealthy.

After a cooldown period, the breaker transitions to Half-Open. This is a probing phase. The goal is to test whether the dependency has recovered.

If the breaker trips repeatedly, the cooldown increases via exponential backoff. This prevents hammering a struggling dependency.
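Exponential backoff on the cooldown can be sketched as a base duration that doubles with each consecutive trip. In this Python sketch, `base_s`, `factor` and the `max_s` cap are illustrative values, not Tripswitch defaults. The cap in particular is an assumption that this page does not mention.

```python
def cooldown_seconds(consecutive_trips: int, base_s: float = 30.0,
                     factor: float = 2.0, max_s: float = 600.0) -> float:
    """Cooldown before Half-Open, growing exponentially with repeated trips.

    base_s, factor and max_s are hypothetical parameters for illustration.
    """
    if consecutive_trips < 1:
        raise ValueError("consecutive_trips must be >= 1")
    return min(max_s, base_s * factor ** (consecutive_trips - 1))


def half_open_at(opened_at: float, consecutive_trips: int) -> float:
    """Time at which an Open breaker would move to Half-Open."""
    return opened_at + cooldown_seconds(consecutive_trips)
```

With these example values, successive trips wait 30, 60, 120, 240 and 480 seconds, then stay at the 600-second cap.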

Half-Open → Closed or Open

During Half-Open, a percentage of traffic is allowed through as probes. Outcomes are reported back to Tripswitch.

If probes succeed consistently over a confirmation period, the breaker transitions to Closed. Recovery is complete.

If any probe fails during confirmation, the breaker transitions back to Open, and a new cooldown begins, lengthened by exponential backoff.
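The Half-Open phase can be sketched as two decisions: whether to let a given call through as a probe, and what state follows a probe's outcome. This Python sketch is hypothetical throughout. `HalfOpenProbe`, `probe_fraction`, `confirm_s` and `min_successes` are illustrative names. Because Tripswitch is not in your request path, the sketch assumes your service makes the admission decision.

```python
import random
from typing import Optional


class HalfOpenProbe:
    """Hypothetical Half-Open logic: admit a fraction of calls, close after a clean confirmation period."""

    def __init__(self, started_at: float, probe_fraction: float = 0.1, confirm_s: float = 30.0,
                 min_successes: int = 3, rng: Optional[random.Random] = None) -> None:
        self.started_at = started_at
        self.probe_fraction = probe_fraction
        self.confirm_s = confirm_s
        self.min_successes = min_successes
        self.rng = rng or random.Random()
        self.successes = 0

    def allow(self) -> bool:
        # Let roughly probe_fraction of calls through as probes.
        return self.rng.random() < self.probe_fraction

    def record(self, ok: bool, now: float) -> str:
        """Return the next state: 'open', 'closed', or 'half_open'."""
        if not ok:
            return "open"  # any failed probe re-opens the breaker
        self.successes += 1
        confirmed = now - self.started_at >= self.confirm_s
        if confirmed and self.successes >= self.min_successes:
            return "closed"  # probes succeeded throughout the confirmation period
        return "half_open"
```

Requiring a minimum number of successful probes keeps a single lucky call at the end of the confirmation period from closing the breaker.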

For more on Half-Open behavior, see Half-Open: The Critical Recovery Phase.


What Tripswitch Knows

Tripswitch knows   | Example
-------------------|------------------------------------
Sample values      | Latency was 320ms
Sample timing      | The sample arrived at 14:32:01 UTC
Sample outcomes    | The call succeeded or failed
Sample metadata    | Trace ID, endpoint, tags

Tripswitch aggregates this data to compute statistics: averages, percentiles, error rates, counts.
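Computing these statistics for one window can be sketched in a few lines of Python. The `aggregate` function is hypothetical. It uses the nearest-rank method for the 95th percentile, which is an assumption, since this page does not say which percentile method Tripswitch uses.

```python
import statistics
from typing import Dict, Sequence, Tuple


def aggregate(entries: Sequence[Tuple[float, bool]]) -> Dict[str, float]:
    """Summarize (value, ok) pairs from one window into basic statistics."""
    if not entries:
        return {"count": 0}
    values = sorted(v for v, _ in entries)
    failures = sum(1 for _, ok in entries if not ok)
    rank = -(-95 * len(values) // 100)  # nearest-rank 95th percentile, 1-based (integer ceiling)
    return {
        "count": len(entries),
        "mean": statistics.fmean(values),
        "p95": values[rank - 1],
        "error_rate": failures / len(entries),
    }


# 18 fast successes and 2 slow failures in one window.
stats = aggregate([(100.0, True)] * 18 + [(900.0, False)] * 2)
```

Here the two slow calls are enough to push the 95th percentile to 900ms, while the mean stays at 180ms. This illustrates why latency thresholds are usually written against percentiles rather than averages.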


What Tripswitch Does Not Know

Tripswitch does not know | Why it matters
-------------------------|----------------------------------------------------------------------
Dependency capacity      | It cannot tell whether the dependency is at 10% or 90% load.
Your fleet size          | It does not know how many instances exist. Breakdown views rely on reported metadata, not discovered topology.
Retry behavior           | It sees each attempt as a separate sample.
Fallback logic           | It does not know what your code does when a call is blocked.
Request importance       | It does not infer importance. All samples are evaluated according to how you model breakers.

Tripswitch provides a signal. You provide the context.


Summary

Tripswitch evaluates health by aggregating samples over a time window and comparing the result to a threshold. When the threshold is exceeded, the breaker trips. When probes succeed during Half-Open, the breaker recovers.

The model is simple by design. Complexity belongs in your service, where you have the context to act on it.


Next Steps