Most circuit-breaker logic is built around one assumption: you have enough observations to judge health. Then reality shows up.
Sometimes you have no samples. No requests. No probes. No events. Just silence.
Silence is not a signal. It does not mean healthy, and it does not mean unhealthy. It means Unknown. The mistake teams make is forcing “Unknown” into a binary outcome. That is how you manufacture failures and accidentally declare false recoveries.
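One way to make this concrete is to have the health check return three values instead of a boolean. Below is a minimal Python sketch that assumes a simple count-based sample window. The names `Health`, `classify`, `min_samples`, and `max_failure_rate` are hypothetical and do not come from any particular breaker library.

```python
# Sketch only: Health, classify, and the thresholds are hypothetical names.
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"  # not enough evidence either way


def classify(successes: int, failures: int,
             min_samples: int = 10, max_failure_rate: float = 0.5) -> Health:
    """Judge health only when there is enough evidence to do so."""
    total = successes + failures
    if total < min_samples:
        # Silence (or near-silence) is not a signal: report it explicitly
        # instead of collapsing it into healthy or unhealthy.
        return Health.UNKNOWN
    if failures / total > max_failure_rate:
        return Health.UNHEALTHY
    return Health.HEALTHY


print(classify(0, 0))  # Health.UNKNOWN
print(classify(2, 1))  # Health.UNKNOWN (below min_samples)
print(classify(9, 1))  # Health.HEALTHY
print(classify(3, 7))  # Health.UNHEALTHY
```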
Why “No Data” Happens
- Total Blackout: The dependency is down so hard you cannot even attempt calls.
- Zero Traffic: Natural drops due to time of day, feature flags, or tenant behavior.
- Infrastructure Issues: Probes are misconfigured or rate limited.
- Self-Inflicted: You cut off calls during the OPEN state, so you stop collecting new evidence.
That last one matters: a breaker can create its own “no data” condition, so your policy must account for it.
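To see how a breaker manufactures its own silence, consider a time-based sliding window. In this Python sketch, `SampleWindow`, `window_s`, `min_samples`, the 50% failure threshold, and the explicit `now` timestamps are all illustrative assumptions. Once the breaker stops sending calls, old samples expire and the window empties. An empty window has to read as Unknown, not as a clean bill of health.

```python
# Sketch only: SampleWindow and its parameters are hypothetical.
from collections import deque
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


class SampleWindow:
    """Keeps (timestamp, ok) samples no older than window_s seconds."""

    def __init__(self, window_s: float = 60.0, min_samples: int = 5):
        self.window_s = window_s
        self.min_samples = min_samples
        self.samples = deque()  # (timestamp, ok) pairs, oldest first

    def record(self, now: float, ok: bool) -> None:
        self.samples.append((now, ok))

    def health(self, now: float) -> Health:
        # Evict samples that have aged out of the window.
        while self.samples and now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        if len(self.samples) < self.min_samples:
            return Health.UNKNOWN  # too little evidence, not "recovered"
        failures = sum(1 for _, ok in self.samples if not ok)
        if failures * 2 > len(self.samples):  # assumed threshold: >50% failures
            return Health.UNHEALTHY
        return Health.HEALTHY


w = SampleWindow(window_s=60.0, min_samples=5)
for t in range(5):
    w.record(now=float(t), ok=False)  # dependency failing; breaker trips OPEN
print(w.health(now=5.0))    # Health.UNHEALTHY
# While OPEN, no calls go out, so nothing new is ever recorded.
print(w.health(now=120.0))  # Health.UNKNOWN: the window drained; nothing improved
```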
The Two Wrong Moves
1. “No data means healthy.”
You move from OPEN → HALF_OPEN → CLOSED with zero evidence. You re-enable traffic because time passed, not because the dependency improved.
2. “No data means failure.”
You punish systems for low traffic or intentional gating. You trip the breaker again without any new evidence of a fault.
Both are guesses. The honest answer is: We do not know yet.
A Better Default: Treat “Unknown” as a First-Class State
Once you accept Unknown as a state, policy becomes clearer. When you have no data, the system should:
- Hold Position: Stay where you are and wait for evidence.
- Probe: Actively gather evidence using controlled, isolated probes.
- Honest Degradation: Degrade in a way that doesn’t require guessing, such as queueing work or labeling responses explicitly as “unknown.”
The Mantra: Time passing is not evidence. Time passing is only a reason to try to collect evidence again.
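The three behaviors and the mantra fit in a small decision function. This Python sketch handles only a tick on which the verdict is Unknown. `Decision`, `on_unknown`, `next_probe_at`, and the 30-second probe interval are hypothetical. The function never changes the breaker state. An elapsed timer produces only a request to probe, and callers are told “unknown” rather than “healthy.”

```python
# Sketch only: Decision, on_unknown, and the probe interval are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    state: str            # breaker state after this tick
    send_probe: bool      # send one controlled probe now?
    next_probe_at: float  # earliest time another probe may be sent
    health_label: str     # what callers are told about the dependency


def on_unknown(state: str, now: float, next_probe_at: float,
               probe_interval_s: float = 30.0) -> Decision:
    """Policy for a tick on which there is not enough evidence to judge health."""
    # Hold position: `state` is passed through unchanged on every path.
    # Honest degradation: callers are told "unknown", never "healthy".
    if now >= next_probe_at:
        # Time passing is only a reason to try collecting evidence again.
        return Decision(state, True, now + probe_interval_s, "unknown")
    return Decision(state, False, next_probe_at, "unknown")


print(on_unknown("OPEN", now=100.0, next_probe_at=90.0))
# Decision(state='OPEN', send_probe=True, next_probe_at=130.0, health_label='unknown')
print(on_unknown("OPEN", now=100.0, next_probe_at=120.0).send_probe)  # False
```

Because `state` is passed through on every path, no amount of elapsed time can move the breaker toward CLOSED from here. Only probe results, fed back into the normal transition rules, can do that.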
State-Specific Hazards
In OPEN
If you are OPEN and not attempting calls, you will produce no new samples. Your policy should not interpret this as improvement. It is simply a state of “still unknown.”
In HALF_OPEN
This is the danger zone where people are most tempted to auto-close. A sane approach requires meeting a threshold before declaring a system healthy:
- A minimum number of successful samples.
- A minimum duration plus at least one successful probe.
- A structured, bounded probe plan.
If you do not meet these criteria, you do not declare healthy. You remain in HALF_OPEN or revert to OPEN.
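These HALF_OPEN rules can be expressed as a small evaluator. This Python sketch treats the first two criteria as alternative ways to close and enforces the bound with a deadline. The `HalfOpenTrial` name, every threshold, and the rule that a single failed probe reopens the breaker are assumptions to tune, not prescriptions.

```python
# Sketch only: HalfOpenTrial and all thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class HalfOpenTrial:
    """Evidence gathered during one HALF_OPEN trial."""
    started_at: float
    min_successes: int = 3         # close on this many successful probes...
    min_duration_s: float = 30.0   # ...or on this much time plus >= 1 success
    max_duration_s: float = 120.0  # bound: give up and reopen after this long
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def next_state(self, now: float) -> str:
        elapsed = now - self.started_at
        if self.failures > 0:
            return "OPEN"  # assumption: one failed probe is enough to reopen
        if self.successes >= self.min_successes:
            return "CLOSED"
        if elapsed >= self.min_duration_s and self.successes >= 1:
            return "CLOSED"
        if elapsed >= self.max_duration_s:
            return "OPEN"  # the bounded plan ran out with too little evidence
        return "HALF_OPEN"  # still unknown: keep probing, do not guess


trial = HalfOpenTrial(started_at=0.0)
print(trial.next_state(now=200.0))  # OPEN: silence never closes the breaker
```

The silent case, with no probe results at all, ends in OPEN and never in CLOSED. The deadline forces a decision without letting elapsed time stand in for evidence.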
The Product Implication
If your system returns a “normal” response while the breaker’s health is Unknown, you are lying by omission.
Serving cached data may be acceptable for some reads, provided the response is marked as stale. It is never acceptable for high-stakes actions. “No recent samples” is a fundamentally different message from “Healthy.” Users can reason about that difference, but only if you are transparent enough to show it.
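One way to keep this honest is to carry health and freshness in every response instead of returning a bare payload. This minimal Python sketch assumes a hypothetical `Response` envelope, a simple read/write split, and string health labels. Reads may fall back to cached data only when the response is marked stale. High-stakes writes are refused unless health is known to be good.

```python
# Sketch only: Response, handle, and the labels are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: str             # "ok", "stale", or "refused"
    dependency_health: str  # "healthy", "unhealthy", or "unknown"
    body: Optional[str] = None


def handle(kind: str, health: str, cached: Optional[str]) -> Response:
    """kind is "read" or "write"; health is the breaker's current verdict."""
    if health == "healthy":
        # Placeholder for the real call to the dependency.
        return Response("ok", health, "fresh result")
    if kind == "read" and cached is not None:
        # Acceptable only because the staleness is stated, not hidden.
        return Response("stale", health, cached)
    # High-stakes actions never proceed on a guess.
    return Response("refused", health)


print(handle("read", "unknown", cached="v1"))
# Response(status='stale', dependency_health='unknown', body='v1')
print(handle("write", "unknown", cached=None).status)  # refused
```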