Blog
Field Note #1
Why Your Load Shedder Won't Save Your Caller
Server-side rate limiting and service meshes manage network connectivity. They cannot manage application resilience — and the difference shows up at 2am.
The Witness Problem: Circuit Breakers in a Distributed Fleet
A circuit breaker that cannot compare notes with its peers is not a fleet-level protection mechanism. It is a local sensor you forgot to aggregate.
Recovery Is a Confidence Curve
Stop treating recovery like a switch and start treating it like a process.
Circuit Breakers Without Lies: What the UI Should Prove
If the UI is certain, it should prove it.
How Retries Become Incidents
Retries without bounds are an incident multiplier.
Recovery Is a Product Decision
Half-open should be treated like a controlled experiment.
The "No Data" Problem in Breakers
Silence is not evidence of recovery.
Fail Open vs Fail Closed: Pick the Default, Then Earn Exceptions
Pick a default, then make exceptions reviewable.
Degrade vs. Stop: A Simple Decision Table
A simple table for choosing "degrade" vs. "stop."
When Refusing to Run Is the Safer Choice
Because a wrong answer is more expensive than no answer.