Sometimes, the safest system is the one that refuses to run.

Circuit breakers are often described as a way of “keeping things up.” That framing is incomplete. The real job is choosing the least-wrong outcome when you do not fully trust a dependency.

The Cost Comparison

When a dependency falters, you are forced to choose between two distinct costs:

Option             Outcome                          Impact
1. No Answer       Stop, wait, and retry later.     Downtime / Latency
2. Wrong Answer    Proceed with unreliable data.    Permanent Inconsistency

The Rule of Thumb: If the cost of a “wrong answer” is higher than the cost of “no answer,” then turning the system off is a feature, not a failure.
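The rule of thumb can be written down as a tiny function. This is a minimal sketch, not anything from a real library: the names Action and choose_action are hypothetical, and it assumes you can put both costs on the same rough scale (dollars, support tickets, whatever you already use).

```python
from enum import Enum


class Action(Enum):
    STOP = "stop"        # give no answer: fail closed and retry later
    PROCEED = "proceed"  # keep serving and accept the risk of a wrong answer


def choose_action(cost_of_no_answer: float, cost_of_wrong_answer: float) -> Action:
    """Stop when a wrong answer would cost more than no answer at all."""
    if cost_of_wrong_answer > cost_of_no_answer:
        return Action.STOP
    return Action.PROCEED
```

The numbers will always be estimates. The point of writing the comparison out is that the choice becomes a visible decision instead of an accident of retry settings.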


When “OFF” is a Feature

In the following scenarios, a flaky dependency is more than an error: it is a threat to the integrity of your system.

  • Payments and Billing: Flaky dependencies can lead to double charges, missing ledger entries, or inconsistent refunds. These are trust events, not just log entries.
  • Permissions and Identity: If you cannot validate access reliably, you should never guess.
  • Irreversible Side Effects: Sent emails, fired webhooks, and mutations to permanent state cannot be recalled if they were based on bad data.
  • Stale Data: A cached value is acceptable only if it is clearly labeled as a snapshot and can be reconciled later. Otherwise, serving it counts as a wrong answer.
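For the permissions case, "never guess" might look like the sketch below. It assumes a hypothetical lookup callable that asks your identity service and raises TimeoutError or ConnectionError when it cannot answer. DependencyUnavailable and check_access are illustrative names, not part of any real API.

```python
class DependencyUnavailable(Exception):
    """Raised when access cannot be validated and we refuse to guess."""


def check_access(user_id: str, resource: str, lookup) -> bool:
    """Return the authoritative answer, or raise. Never fall back to a default."""
    try:
        return bool(lookup(user_id, resource))
    except (TimeoutError, ConnectionError) as exc:
        # "Unknown" is kept distinct from both "allowed" and "denied".
        raise DependencyUnavailable(
            f"cannot validate access for {user_id!r} on {resource!r}"
        ) from exc
```

Raising, rather than returning False, keeps "we could not check" from being recorded as "access was denied," so the caller has to handle the outage explicitly.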

When to Keep Running

Conversely, continuing is often the right move if you can be transparent about the data’s state:

  • Read-heavy UIs showing the “last known good” state.
  • Non-critical analytics.
  • Tasks that can be safely queued for later processing.
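Being transparent about the data's state can be as simple as never returning a bare value. The sketch below is one possible shape, under the assumption that fetch is a hypothetical zero-argument callable that raises TimeoutError or ConnectionError when the dependency is down. Snapshot and read_with_fallback are made-up names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Snapshot(Generic[T]):
    value: T
    fetched_at: datetime  # when the value was actually read from the source
    is_live: bool         # False means "last known good", not current truth

    def age(self, now: datetime) -> timedelta:
        return now - self.fetched_at


def read_with_fallback(fetch, last_known_good: Snapshot, now: datetime) -> Snapshot:
    """Prefer a live read; otherwise return the old value, labeled as stale."""
    try:
        return Snapshot(value=fetch(), fetched_at=now, is_live=True)
    except (TimeoutError, ConnectionError):
        return Snapshot(
            value=last_known_good.value,
            fetched_at=last_known_good.fetched_at,
            is_live=False,
        )
```

A UI can then render "as of 5 minutes ago" from age() and is_live instead of presenting the cached value as current.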

The Decision Rubric

Before deciding whether to trip the breaker, ask these four questions:

  1. Is the action reversible?
  2. Can we reconcile the data later?
  3. Would stale data pass for current data in a way that misleads?
  4. Is a wrong outcome a trust or compliance problem?

When the answers are “No, No, Yes, Yes,” the only correct behavior is to stop.
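The four questions fit in a small record, which makes the answers reviewable. This is a sketch with hypothetical names (Rubric, must_stop). It encodes only the one pattern stated above. Every other combination is left as a judgment call, because nothing here defines a rule for those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    reversible: bool                # 1. Is the action reversible?
    reconcilable: bool              # 2. Can we reconcile the data later?
    stale_misleads: bool            # 3. Would stale data pass for current?
    trust_or_compliance_risk: bool  # 4. Is a wrong outcome a trust problem?


def must_stop(r: Rubric) -> bool:
    """True only for the "No, No, Yes, Yes" pattern."""
    return (
        not r.reversible
        and not r.reconcilable
        and r.stale_misleads
        and r.trust_or_compliance_risk
    )
```

For example, charging a card against an unverified balance would be Rubric(False, False, True, True), so must_stop returns True.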


Final Thoughts

This logic is the core motivation behind Tripswitch. I want the choice to be explicit and observable, rather than buried in a mess of scattered retries and timeouts.

If we are going to keep running, we should be able to state clearly:

  • What we know.
  • What we don’t know.
  • What we chose to do about it.
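One way to make those three statements observable is to emit them as a structured log line whenever the system decides to keep running. The sketch below is purely illustrative and is not Tripswitch's API. DecisionRecord and to_log_line are hypothetical names, and JSON is an assumed log format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    known: str     # what we know
    unknown: str   # what we don't know
    decision: str  # what we chose to do about it


def to_log_line(record: DecisionRecord) -> str:
    """Serialize the record as one JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record such as DecisionRecord("price cached 5 minutes ago", "current price", "served labeled snapshot") turns an implicit fallback into something you can search for later.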