Day 11 — Risk Limits: Hard-Coding Max-Drawdown Circuit Breakers

Mar 08, 2026

A circuit breaker that doesn’t know its own state is not a circuit breaker. It’s a bug that occasionally does the right thing.

What We’re Building

Think of a circuit breaker in your house. When too much current flows, it trips — once — and it stays tripped until a person physically resets it. It doesn’t re-trip 32 times while the electrician is still at the door.

Today we build the software version of that. When your trading strategy loses more than 5% from its peak value, we want the system to:

Stop accepting new trades, immediately
Close all open positions, exactly once
Stay closed until a human decides to re-open it

That sounds simple. It isn’t. By the end of this lesson you will understand why naive implementations cause real financial losses, and you will have a production-grade solution running on your machine.

Part 1 — The Problem

The “Flag-in-a-Loop” Trap

Here is what a junior engineer writes at 2am after reading about drawdown protection:

MAX_DRAWDOWN = 0.05
peak_value = 100_000.0

def on_tick(current_value: float):
    global peak_value
    peak_value = max(peak_value, current_value)
    drawdown = (peak_value - current_value) / peak_value
    if drawdown > MAX_DRAWDOWN:
        close_all_positions()   # blocks, not idempotent
        print("BREAKER TRIPPED")

This will blow up your account. Not metaphorically: literally. During a fast-moving market event, on_tick may be called 400 times per second, and if ticks are dispatched concurrently (worker threads or async callbacks) those calls overlap. The breaker trips on call #1. close_all_positions() fires an HTTP request to Alpaca. Before it returns (say, 80ms at paper-trading latency), on_tick has been called about 32 more times (400/s × 0.08s), and every one of those calls passes the same drawdown check. You have just issued 32 identical liquidation orders. Alpaca will fill as many as inventory allows. You will be short your own positions. Margin call follows.
The second failure is subtler. peak_value is a bare Python float. Python floats are IEEE 754 double-precision, which carries ~15–17 significant decimal digits. After 10,000 tick updates with fractional P&L, your cumulative drawdown figure has drifted. You’re checking 0.04999999999998 > 0.05 — it never fires. Your strategy loses 7% before you notice.
The third failure: there is no state. The breaker has no concept of “already tripped.” After positions are closed, the next tick re-arms and re-trips instantly, firing another round of close orders. This is not a circuit breaker. It’s a bug that occasionally does the right thing.

The Failure Mode: Three Technical Crashes

1. Non-Idempotent Action Under Re-entrancy

close_all_positions() is a side-effectful, stateful operation. Calling it N times is not equivalent to calling it once. Without an idempotency guard, every duplicate call fights the previous one. Order fills arrive out of order. Your position state becomes undefined.
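One way to picture an idempotency guard is a small wrapper that lets the action run at most once, no matter how many threads ask for it. This is a minimal sketch, not the lesson's implementation: OnceGuard is a hypothetical helper, and the lambda stands in for the real close_all_positions() call.

```python
import threading

class OnceGuard:
    """Runs an action at most once, even when fired from many threads."""

    def __init__(self, action):
        self._action = action
        self._lock = threading.Lock()
        self._fired = False

    def fire(self) -> bool:
        # Decide under the lock, act outside it, so slow actions
        # never block other callers.
        with self._lock:
            if self._fired:
                return False  # duplicate request, dropped
            self._fired = True
        self._action()
        return True

# Simulate 32 overlapping ticks that all see the same drawdown breach.
calls = []
guard = OnceGuard(lambda: calls.append("close_all"))  # stand-in for the real close
threads = [threading.Thread(target=guard.fire) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # 1
```

The flag is flipped before the action runs, so a failed action is not retried automatically. Whether that is the right trade-off depends on how the liquidation path handles errors.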

2. Float Precision Drift in Cumulative P&L

Don’t use float for money logic. Use decimal.Decimal with a fixed precision context, or represent all monetary values as integers in the smallest unit you need (cents, or hundredths of a cent). The worst-case error of repeated float addition grows as O(ε * N), where ε ≈ 2.2e-16 is machine epsilon and N is the tick count. At 100k ticks/day, that is a relative drift of up to ~2.2e-11 per day, roughly 22 trillionths of a dollar per dollar of equity. Sounds small, but your trigger threshold is also a float, and at the comparison boundary a single unit of rounding error decides whether the check is true or false.
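The effect is easy to reproduce on a tiny scale. The sketch below adds ten cents ten times in three representations; the variable names are illustrative only. An explicit loop is used rather than sum(), because newer Python versions compensate float sums internally and would hide the drift.

```python
from decimal import Decimal

# Binary float: 0.1 has no exact representation, so the error accumulates.
float_total = 0.0
for _ in range(10):
    float_total += 0.1

# Decimal: built from a string, so every step is exact.
decimal_total = Decimal("0")
for _ in range(10):
    decimal_total += Decimal("0.1")

# Integer cents: exact by construction.
cents_total = 0
for _ in range(10):
    cents_total += 10

print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True
print(cents_total == 100)               # True
```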

3. Missing State Machine → Re-entry and Thrash

A circuit breaker that doesn’t track its own state is not a circuit breaker. It’s a conditional. You need explicit states, ARMED → MONITORING → TRIGGERED → COOLING_DOWN, plus an explicit reset step that is the only way back to an active state. Any action taken in TRIGGERED state must be guarded by a re-entrancy lock. Any transition must be atomic.
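A compact way to make the legal transitions explicit is a lookup table that rejects everything else. The sketch below is an illustration under assumptions: the State enum, the ALLOWED table, and the choice that an operator reset returns the breaker to ARMED are not taken from the lesson's code, which does not show the reset path.

```python
from enum import Enum, auto

class State(Enum):
    ARMED = auto()
    MONITORING = auto()
    TRIGGERED = auto()
    COOLING_DOWN = auto()

# Each state lists the only states it may move to.
ALLOWED = {
    State.ARMED: {State.MONITORING},
    State.MONITORING: {State.TRIGGERED},
    State.TRIGGERED: {State.COOLING_DOWN},
    State.COOLING_DOWN: {State.ARMED},  # assumed: operator reset re-arms
}

def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

state = transition(State.MONITORING, State.TRIGGERED)
print(state.name)  # TRIGGERED

try:
    transition(State.TRIGGERED, State.MONITORING)  # skipping cooldown and reset
except ValueError as exc:
    print(exc)  # illegal transition TRIGGERED -> MONITORING
```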

Part 2 — The Architecture

The AutoQuant-Alpha Architecture

We implement a DrawdownCircuitBreaker as a full state machine, separated from execution logic. The key architectural decisions:

TickStream ──► P&L Calculator ──► CircuitBreaker.evaluate()
                                        │
                    ┌───────────────────┼────────────────────┐
                    ▼                   ▼                    ▼
               MONITORING          TRIGGERED           COOLING_DOWN
            (update HWM)       (liquidate once,     (no new orders,
                                 lock acquired)       await reset)

Core principles:

Single-writer high-watermark (HWM): The HWM update and drawdown check are atomic under one threading.Lock. No two threads can read-then-write the HWM simultaneously.
Idempotent trigger: State transitions are one-way until explicit reset. Cooldown expiry only moves TRIGGERED → COOLING_DOWN; returning to MONITORING requires an operator action (manual reset).
Decimal P&L: All monetary math uses decimal.Decimal with ROUND_HALF_UP. The trigger threshold is stored as Decimal("0.05") — not 0.05.
Rate-limited cascade close: The liquidation function batches cancel + market-close orders with exponential backoff, respecting Alpaca’s 200 req/min limit.
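The last principle can be sketched as two small helpers: one that retries a single request with exponential backoff, and one that paces a batch to stay under the request budget. This is a rough sketch under assumptions: call_with_backoff and send_all are hypothetical names, the retry is limited to ConnectionError for simplicity, and the pacing interval is derived from the 200 req/min figure above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Assumed budget: 200 requests per minute, so 0.3 s between requests.
MIN_INTERVAL_S = 60 / 200

def call_with_backoff(send: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(); on ConnectionError wait 0.5s, 1s, 2s, ... and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")

def send_all(requests: List[Callable[[], T]],
             sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Send requests one by one, spaced to respect the rate budget."""
    results = []
    for i, send in enumerate(requests):
        if i:
            sleep(MIN_INTERVAL_S)
        results.append(call_with_backoff(send, sleep=sleep))
    return results

# Demo with a fake sleep so nothing actually waits.
delays: List[float] = []
attempts = {"n": 0}

def flaky_cancel() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "cancelled"

print(call_with_backoff(flaky_cancel, sleep=delays.append))  # cancelled
print(delays)  # [0.5, 1.0]
```

Injecting sleep as a parameter keeps the helper testable without real waiting. A production version would also need to decide which HTTP status codes count as retryable.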

Part 3 — Implementation Deep Dive

The State Machine Core

from enum import Enum, auto
from decimal import Decimal, ROUND_HALF_UP, getcontext
from threading import Lock
from dataclasses import dataclass, field
from typing import Callable
import time

getcontext().prec = 28  # 28 significant digits (also Python's default context precision)

class BreakerState(Enum):
    ARMED        = auto()  # pre-market, no HWM set
    MONITORING   = auto()  # active, tracking drawdown
    TRIGGERED    = auto()  # breaker fired, liquidation in progress
    COOLING_DOWN = auto()  # post-trigger, waiting for reset window
    HALTED       = auto()  # operator halt, requires manual reset

@dataclass
class DrawdownCircuitBreaker:
    max_drawdown: Decimal
    cooldown_seconds: float
    on_trigger: Callable[[], None]
    
    _state: BreakerState = field(default=BreakerState.ARMED, init=False)
    _hwm: Decimal = field(default=Decimal("0"), init=False)
    _trigger_time: float = field(default=0.0, init=False)
    _lock: Lock = field(default_factory=Lock, init=False)
    _trigger_count: int = field(default=0, init=False)

    def evaluate(self, current_equity: Decimal) -> BreakerState:
        with self._lock:
            return self._evaluate_locked(current_equity)

    def _evaluate_locked(self, equity: Decimal) -> BreakerState:
        match self._state:
            case BreakerState.ARMED:
                self._hwm = equity
                self._state = BreakerState.MONITORING
                
            case BreakerState.MONITORING:
                if equity > self._hwm:
                    self._hwm = equity
                drawdown = (self._hwm - equity) / self._hwm
                if drawdown >= self.max_drawdown:
                    self._state = BreakerState.TRIGGERED
                    self._trigger_count += 1
                    self._trigger_time = time.monotonic()
                    self.on_trigger()  # called under lock — must be non-blocking!
                    
            case BreakerState.TRIGGERED:
                elapsed = time.monotonic() - self._trigger_time
                if elapsed >= self.cooldown_seconds:
                    self._state = BreakerState.COOLING_DOWN
                    
            case BreakerState.COOLING_DOWN:
                pass  # awaiting manual reset
                
        return self._state

Notice the match statement — this is a Python 3.10+ feature. Each state has its own branch, and there is no way to accidentally fall through from one state to another. If _state is TRIGGERED, the MONITORING branch never runs. The idempotency guard is built into the structure itself, not bolted on with an if check.

Critical: `on_trigger` Must Be Non-Blocking

The on_trigger callback is invoked while holding _lock. If it blocks (HTTP request, time.sleep), every other evaluate() call stalls on the lock until it returns. If the callback itself calls evaluate(), it deadlocks, because threading.Lock is not re-entrant. The pattern:

# WRONG: blocks under lock
def on_trigger():
    alpaca_client.close_all_positions()  # 80ms HTTP call: stalls every evaluate()

# CORRECT: enqueue to a dedicated liquidation thread
import queue
_liquidation_queue: queue.Queue = queue.Queue(maxsize=1)

def on_trigger():
    try:
        _liquidation_queue.put_nowait("LIQUIDATE")  # non-blocking, O(1)
    except queue.Full:
        pass  # already queued, idempotent

The liquidation worker runs in a separate threading.Thread with a maxsize=1 queue. Duplicate trigger calls are silently dropped. The worker handles rate limiting and retries independently of the tick processing loop.

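Why maxsize=1? Because if the queue can hold 10 items, 10 liquidation attempts will run sequentially. With maxsize=1, a second put_nowait that arrives while the first request is still queued raises queue.Full and we discard it. Combined with the state machine, which calls on_trigger only on the MONITORING → TRIGGERED transition, that gives one liquidation, exactly once.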
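The consumer side of that queue might look like the sketch below. It is a minimal illustration, not the project's liquidation.py: close_all_positions here is a stand-in that only records the call, and the None sentinel used for shutdown is an assumption.

```python
import queue
import threading

liquidation_queue: queue.Queue = queue.Queue(maxsize=1)
results = []

def close_all_positions() -> None:
    # Stand-in for the real (slow) broker call.
    results.append("closed")

def liquidation_worker() -> None:
    while True:
        item = liquidation_queue.get()  # blocks until a request arrives
        if item is None:                # sentinel: shut the worker down
            break
        close_all_positions()
        liquidation_queue.task_done()

worker = threading.Thread(target=liquidation_worker, daemon=True)
worker.start()

def on_trigger() -> None:
    try:
        liquidation_queue.put_nowait("LIQUIDATE")  # never blocks the tick loop
    except queue.Full:
        pass  # a request is already pending

on_trigger()
liquidation_queue.join()   # wait until the worker has processed the request
print(results)             # ['closed']

liquidation_queue.put(None)  # stop the worker
worker.join(timeout=1)
```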

High-Watermark Precision Test

Here is the kind of test that separates production code from notebook code:

from decimal import Decimal

# Float version: 0.001 has no exact binary representation, so each += rounds
hwm_float = 100_000.0
for _ in range(10_000):
    hwm_float += 0.001
float_error = hwm_float - 100_010.0  # exact total is 100010; any residue is drift
drawdown_float = (hwm_float - 100_000.0) / hwm_float  # ≈ 10 / 100010, not exact

# Decimal version: exact
hwm_dec = Decimal("100000.0")
increment = Decimal("0.001")
for _ in range(10_000):
    hwm_dec += increment
drawdown_dec = (hwm_dec - Decimal("100000.0")) / hwm_dec
assert hwm_dec == Decimal("100010")
assert drawdown_dec == Decimal("10") / Decimal("100010")

Run this yourself. The Decimal assertions hold exactly. The float total carries whatever rounding residue 10,000 additions left behind (print float_error to see it), so any equality or boundary check against it is unreliable. Put the threshold right at the boundary, add noise, and the comparison starts failing unpredictably. That unpredictability is what kills real accounts.

Part 4 — Production Readiness

Metrics to Watch

| Metric | Target | Measurement Point |
|---|---|---|
| Breaker Evaluation Latency | < 500µs P99 | time.perf_counter() around evaluate() |
| HWM Update Drift | 0 bps | Compare Decimal vs float HWM over 1M ticks |
| Trigger-to-Cancel Latency | < 2s | From on_trigger() to Alpaca cancel confirmation |
| False Trigger Rate | 0% | Log trigger count vs expected trigger count |
| Liquidation Fill Rate | 100% of open positions | Compare pre-trigger position count to post-cooldown |
| Lock Contention | < 0.1% of evaluations blocked | Track Lock.acquire() wait time histogram |

Lock contention is the silent killer at scale. If you’re running 3 strategies simultaneously with a shared breaker, measure contention rate. Above 0.5%, shard the breaker per strategy.
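The latency target can be checked with a few lines of standard-library code. The sketch below is one possible measurement harness; measure_p99_us is a hypothetical helper, and the lambda stands in for a call such as breaker.evaluate(equity).

```python
import statistics
import time
from typing import Callable

def measure_p99_us(fn: Callable[[], object], iterations: int = 10_000) -> float:
    """Time fn() repeatedly and return the 99th-percentile latency in µs."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]

# Stand-in workload; replace with the real evaluate() call.
p99 = measure_p99_us(lambda: sum(range(100)), iterations=2_000)
print(f"P99 eval latency: {p99:.1f} µs")
```

The same loop can wrap Lock.acquire() instead of evaluate() to build the wait-time histogram for the lock contention metric.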

Part 5 — Build, Run, and Test

GitHub Link:

https://github.com/sysdr/quantpython-p/tree/main/day11/autoquant-day11

What the Workspace Generator Creates

Before touching your terminal, understand what you’re about to generate. Running generate_workspace.py scaffolds a full project with this layout:

autoquant-day11/
├── .env.example                  # Alpaca credentials template
├── README.md
├── src/
│   ├── circuit_breaker.py        # The state machine (core logic)
│   ├── liquidation.py            # Daemon thread + rate-limited API calls
│   ├── equity_tracker.py         # Decimal equity snapshots from Alpaca
│   ├── mock_client.py            # Deterministic test double — no real API needed
│   ├── demo.py                   # Runnable crash/normal/recovery scenarios
│   └── dashboard.py              # Live Rich terminal dashboard
├── tests/
│   ├── test_circuit_breaker.py   # Unit tests: state, precision, concurrency, latency
│   └── test_stress.py            # 1M-tick stress test + throughput benchmark
└── scripts/
    ├── start.sh
    ├── demo.sh
    ├── verify.sh
    └── cleanup.sh

Every file is real, working code. There are no placeholders.

Step 1 — Check Your Python Version

python --version

You must see Python 3.11.x or higher. The match statement used in the state machine was introduced in 3.10, and the type hint syntax requires 3.11+. If you see an older version, install 3.11 first.

Step 2 — Install Dependencies

pip install alpaca-py rich pandas numpy python-dotenv pytest

What each package does here:

alpaca-py — the official Alpaca SDK for order management and account data
rich — the terminal dashboard renderer used in dashboard.py
pandas / numpy — data manipulation for the stress test scenarios
python-dotenv — loads your .env API keys without hardcoding them
pytest — the test runner

Step 3 — Generate the Project

Download generate_workspace.py and run it once from whatever folder you want to work in:

python generate_workspace.py

You should see:

Generating workspace at: /your/path/autoquant-day11
  ✓  README.md
  ✓  .env.example
  ✓  src/__init__.py
  ✓  src/circuit_breaker.py
  ✓  src/liquidation.py
  ✓  src/equity_tracker.py
  ✓  src/mock_client.py
  ✓  src/demo.py
  ✓  src/dashboard.py
  ✓  tests/__init__.py
  ✓  tests/test_circuit_breaker.py
  ✓  tests/test_stress.py
  ✓  scripts/start.sh
  ...

===================================================
Workspace generated. Next steps:
  cd autoquant-day11
  bash scripts/start.sh
===================================================

Then move into the project folder:

cd autoquant-day11

Step 4 — Configure Your Alpaca Keys (Optional)

The demo and tests work perfectly without real API keys — the MockAlpacaClient handles everything offline. But if you want to connect to a live paper trading account:

cp .env.example .env

Open .env and replace the placeholder values:

ALPACA_API_KEY=your_paper_api_key_here
ALPACA_SECRET_KEY=your_paper_secret_key_here
ALPACA_BASE_URL=https://paper-api.alpaca.markets

Get your paper trading keys from the Alpaca dashboard at alpaca.markets. Paper trading is free and uses simulated money — nothing real is at risk.

If .env is not configured, the system automatically falls back to MockAlpacaClient and logs a note telling you so.
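The fallback decision can be pictured as a simple check on the two credential variables. This is only a sketch of the idea: select_client_mode, the "mock"/"paper" return values, and the [CONFIG] log line are hypothetical and are not taken from the generated project.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("ALPACA_API_KEY", "ALPACA_SECRET_KEY")

def select_client_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "paper" when both credentials are set, otherwise "mock"."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Illustrative log line; the real project may word this differently.
        print(f"[CONFIG] Missing {', '.join(missing)}; using mock client")
        return "mock"
    return "paper"

print(select_client_mode({}))  # mock
print(select_client_mode({"ALPACA_API_KEY": "key", "ALPACA_SECRET_KEY": "secret"}))  # paper
```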

Step 5 — Run the Unit Tests

Before running the demo, make sure everything is correct:

python -m pytest tests/ -v

You should see all 12 tests pass:

tests/test_circuit_breaker.py::TestStateTransitions::test_armed_to_monitoring_on_first_tick PASSED
tests/test_circuit_breaker.py::TestStateTransitions::test_hwm_updates_on_equity_increase PASSED
tests/test_circuit_breaker.py::TestStateTransitions::test_triggers_at_exact_threshold PASSED
tests/test_circuit_breaker.py::TestStateTransitions::test_does_not_trigger_below_threshold PASSED
tests/test_circuit_breaker.py::TestStateTransitions::test_triggered_to_cooling_down_after_cooldown PASSED
tests/test_circuit_breaker.py::TestIdempotency::test_on_trigger_called_exactly_once PASSED
tests/test_circuit_breaker.py::TestPrecision::test_decimal_hwm_no_float_drift PASSED
tests/test_circuit_breaker.py::TestConcurrency::test_concurrent_evaluations_no_race PASSED
tests/test_circuit_breaker.py::TestLatency::test_p99_latency_under_500us PASSED
tests/test_stress.py::test_1m_ticks_no_trigger_in_normal_market PASSED
tests/test_stress.py::test_trigger_fires_at_correct_drawdown PASSED
tests/test_stress.py::test_evaluation_throughput PASSED

12 passed in X.XXs

If any test fails, read the error message before touching the implementation. Pytest tells you exactly which assertion failed and what values were involved. Fix the root cause, not the assertion.

Step 6 — Run the Crash Scenario Demo

This simulates a market crash that pushes your account from $100,000 down past the 5% drawdown threshold:

python src/demo.py --scenario crash --drawdown 0.052 --ticks 120

Watch the log output. You will see the account equity drift downward over 120 ticks until it crosses the 5% threshold. At that point the breaker fires, the liquidation worker runs in its daemon thread, and the tick loop stops.

Your output must contain these lines:

[BREAKER] State: MONITORING → TRIGGERED
[BREAKER] HWM: $102,340.00 | Current: $97,122.00 | Drawdown: 5.10%
[LIQUIDATION] Cancelling 3 open orders...
[LIQUIDATION] Closing 2 positions: AAPL, TSLA
[BREAKER] State: TRIGGERED → COOLING_DOWN (cooldown: 300s)
[METRICS] Trigger latency: 212µs | Cancel latency: 847ms
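The drawdown figure in the sample log can be reproduced by hand from the HWM and current equity it reports. The snippet below is a worked check using the same Decimal arithmetic as the breaker; the two-decimal rounding for display is an assumption about how the log line is formatted.

```python
from decimal import Decimal, ROUND_HALF_UP

hwm = Decimal("102340.00")      # high-watermark from the log
current = Decimal("97122.00")   # equity at the trigger tick
max_drawdown = Decimal("0.05")

drawdown = (hwm - current) / hwm  # 5218 / 102340
drawdown_pct = (drawdown * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(f"Drawdown: {drawdown_pct}%")  # Drawdown: 5.10%
print(drawdown >= max_drawdown)      # True
```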

Try the other scenarios to understand normal and recovery behaviour:

# Market moves randomly — breaker should never fire
python src/demo.py --scenario normal --ticks 200

# Equity rises — HWM should track upward correctly
python src/demo.py --scenario recovery --ticks 100

Step 7 — Launch the Live Dashboard

For a visual real-time view of everything happening tick by tick:

python src/dashboard.py

The dashboard uses the Rich library to render a live terminal panel that updates 10 times per second. You will see:

Breaker state (colour-coded: green for MONITORING, red for TRIGGERED)
Current equity vs the high-watermark
Drawdown percentage with a colour gradient
A rolling sparkline showing equity movement
P99 evaluation latency counter
Trigger count

Press Ctrl+C to exit at any time.

Step 8 — Run the Stress Test in Isolation

The stress test simulates 1,000,000 ticks of a noisy-but-healthy market. The breaker must not fire once:

python -m pytest tests/test_stress.py::test_1m_ticks_no_trigger_in_normal_market -v -s

This also validates throughput — the system must process more than 50,000 evaluations per second on a single thread. On most modern laptops you will see 200,000–500,000/sec.

Lifecycle Scripts

For convenience, the generator also creates shell scripts:

bash scripts/start.sh     # install deps, confirm setup
bash scripts/demo.sh      # run the standard crash demo
bash scripts/verify.sh    # run the full test suite
bash scripts/cleanup.sh   # remove __pycache__ and .pyc files

Part 6 — Gates and Success Criteria

You pass Day 11 when your output satisfies all six of the following gates. All six are required. There is no partial credit.

Gate 1 — Correct State Transition

Your demo log must contain this exact sequence (order matters):

state=ARMED      → tick 1
state=MONITORING → tick 2 onward
state=TRIGGERED  → the tick where drawdown first hits ≥ 5.00%
state=COOLING_DOWN (after cooldown_seconds elapsed)

Gate 2 — Exact Drawdown Value at Trigger

At the TRIGGERED tick, your log line must show a drawdown value computed using decimal.Decimal, not float. Verify it with:

from decimal import Decimal
drawdown_at_trigger = Decimal("0.05010")   # replace with your logged value
assert drawdown_at_trigger >= Decimal("0.05"), "Drawdown below threshold at trigger"
assert drawdown_at_trigger < Decimal("0.20"), "Drawdown suspiciously large — check HWM logic"

Gate 3 — on_trigger() Called Exactly Once

Even if your demo runs 120 ticks after the breaker fires, trigger_count must equal 1:

trigger_count=1

Gate 4 — Evaluation P99 Latency Under 500µs

Your log must show:

P99 eval latency: XXX µs    ← must be < 500

If your machine shows more than 500µs, you likely have a blocking call inside the locked evaluation path (_evaluate_locked() in this lesson) or a print() statement running under the lock.

Gate 5 — Liquidation Fired Independently

The liquidation result line must appear after the tick loop completes, proving it ran in a separate thread:

Liquidation: cancelled=3 closed=2 duration=XXX.Xms errors=0

errors=0 is required.

Gate 6 — All Tests Pass

python -m pytest tests/ -v --tb=short 2>&1 | tail -3

Final line must read: 12 passed in X.XXs

Fail Conditions

| Symptom | Root Cause |
|---|---|
| trigger_count > 1 | Missing idempotency guard on state transition |
| P99 > 500µs | Blocking I/O or print() under _lock |
| drawdown is a float | You didn’t use decimal.Decimal |
| errors > 0 in liquidation | Retry logic or mock client broken |
| Concurrent test fails intermittently | Race condition: _lock not covering full update-and-check |
| test_decimal_hwm_no_float_drift fails | Using float somewhere in the HWM calculation path |

Part 7 — Homework: Production Challenge

The current implementation uses a single global HWM from strategy start. Production systems use a rolling 30-day HWM — the peak is reset daily but the drawdown is measured against the peak of the trailing 30 sessions.

Your task:

Implement RollingDrawdownCircuitBreaker that accepts a window_days: int parameter.
Store per-day equity snapshots in a collections.deque(maxlen=window_days).
The effective HWM is max(deque). Update at market close (4:00 PM ET).
Write a stress test that seeds 30 days of synthetic equity data, then injects a 6% single-day drawdown on day 31. Assert breaker fires exactly once.
Benchmark: HWM computation for window_days=252 (1 trading year) must complete in < 100µs. Hint: a deque max is O(N). You need a monotonic deque.

The monotonic deque pattern makes each HWM lookup O(1) and each update O(1) amortized, instead of an O(N) scan per lookup. This is the same pattern used in sliding-window maximum problems in competitive programming, and it suits risk engines that need to track intraday high-watermarks at high tick rates.
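For reference, the generic sliding-window maximum looks like the sketch below. It shows the pattern only and is not a solution to the homework: RollingMax is a hypothetical name, it has no notion of market close or trading days, and the breaker integration is left out.

```python
from collections import deque
from decimal import Decimal

class RollingMax:
    """Maximum of the last `window` pushed values via a monotonic deque."""

    def __init__(self, window: int) -> None:
        self._window = window
        self._candidates: deque = deque()  # (index, value), values strictly decreasing
        self._count = 0

    def push(self, value: Decimal) -> Decimal:
        index = self._count
        self._count += 1
        # Older values that are <= the new one can never be the max again.
        while self._candidates and self._candidates[-1][1] <= value:
            self._candidates.pop()
        self._candidates.append((index, value))
        # Drop the front candidate once it has slid out of the window.
        while self._candidates[0][0] <= index - self._window:
            self._candidates.popleft()
        return self._candidates[0][1]  # front is the current window max

rolling = RollingMax(window=3)
closes = [Decimal(v) for v in ("5", "3", "4", "2", "1", "6")]
print([str(rolling.push(c)) for c in closes])  # ['5', '5', '5', '4', '4', '6']
```

Each value is appended once and removed at most once, which is where the amortized O(1) update cost comes from.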

Quant Python: Architecting Autonomous Trading Systems
