Day 64 — Persistence & Storage: Path Management
Your engine crashed at 14:31:07 during a write to fills/AAPL.json. On restart, that file was 0 bytes. FillAuditLog loaded an empty list, decided there was nothing to reconcile, and your position drifted by 40 shares with no alert.
The Naive Approach
Every persistence path in a trading engine starts the same way: a dict of fill data, a file path built from string concatenation, and open(path, "w").
# WRONG: non-atomic write, OS-specific path string
import json

def save_fills(symbol, fills):
    path = "data/fills/" + symbol + ".json"
    # "w" truncates the existing file before any new content is written
    with open(path, "w") as f:
        json.dump(fills, f)
This looks fine because it works every time you test it manually. The string-concatenated path is fragile on Windows (most APIs there accept /, but the native separator is \, so hand-built strings stop matching the paths the OS or pathlib produce), and the write fails with FileNotFoundError if the data/fills/ directory doesn’t already exist. Neither of those is the real problem.
The Failure Mode in Detail
open(path, "w") truncates the destination file to zero bytes before it writes a single character of new content. json.dump then streams the new JSON in. If the process is killed between those two steps (SIGKILL from a container orchestrator, an OOM kill during a memory spike, a power loss on a bare-metal box), the file on disk is left at whatever state the write reached. Usually that’s an empty file or a truncated fragment.

This never shows up in backtesting, because backtests don’t crash mid-write: they run to completion or they don’t run at all. It only shows up in live paper trading, usually during a high-fill-rate period (market open, a volatility spike) when the process is under enough memory or CPU pressure to get killed.
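The truncation step is easy to observe without crashing anything. The sketch below is illustrative only: demo_truncation is a hypothetical helper, and the fill payload is made up. It writes a valid file, reopens it with mode "w", and measures the size before any new content is written, which is the state a kill at that instant would leave on disk.

```python
import json
import tempfile
from pathlib import Path


def demo_truncation() -> tuple[int, int]:
    """Return (size before, size right after open) for a file reopened with "w"."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "AAPL.json"
        # Start from a complete, valid fills file (made-up payload).
        path.write_text(json.dumps([{"qty": 40, "price": "189.50"}]))
        size_before = path.stat().st_size

        f = open(path, "w")  # truncation happens here, before any write
        size_after_open = path.stat().st_size  # what a crash at this point leaves
        f.close()
        return size_before, size_after_open


before, after = demo_truncation()
print(f"before reopen: {before} bytes, after reopen: {after} bytes")
```

The second number is 0: the old content is already gone before json.dump has produced anything.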
The observable symptom is worse than a crash:
FillAuditLog.load() either throws json.JSONDecodeError: Expecting value: line 1 column 1 (char 0), or, if your loader is defensive and returns [] on a parse error, it silently reports zero prior fills. PositionReconciler then has nothing to compare against the broker’s reported position, so it never raises ReconciliationError. The drift is real, but the safety mechanism designed to catch it just stayed quiet.
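One way to keep the loader from going quiet is to make corruption loud. The sketch below is an assumption about how such a loader could look, not the FillAuditLog implementation: load_fills_strict is a hypothetical function, and StorageCorruptionError is defined here only so the example is self-contained.

```python
import json
from pathlib import Path


class StorageCorruptionError(Exception):
    """Hypothetical error: the file exists but its content cannot be trusted."""


def load_fills_strict(path: Path) -> list:
    # A missing file raises FileNotFoundError; it is deliberately not caught.
    raw = path.read_text(encoding="utf-8")
    try:
        fills = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A 0-byte or truncated file lands here instead of becoming [].
        raise StorageCorruptionError(f"{path} is not valid JSON: {exc}") from exc
    if not isinstance(fills, list):
        raise StorageCorruptionError(
            f"{path} holds {type(fills).__name__}, expected a list of fills"
        )
    return fills
```

With a loader shaped like this, an empty fills file stops the reconciliation path with an exception rather than feeding it an empty list.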
GitHub link:
https://github.com/sysdr/discord-flux-p/tree/main/day60/flux-gateway-observability
The Shape of the Fix
The fix is to never let a half-written file become the live file. Write the new content to a sibling temp file first, flush it to disk, then atomically rename it over the destination.
# CORRECT: atomic write via temp file + Path.replace()
import json
from pathlib import Path

def save_fills(base_dir: Path, symbol: str, fills: list[dict]) -> None:
    target = base_dir / "fills" / f"{symbol}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sibling temp file: same directory, so the rename stays on one filesystem
    tmp = target.with_name(f"{target.name}.tmp")
    tmp.write_text(json.dumps(fills))
    tmp.replace(target)  # atomic rename: old or new, never partial
Path.replace() maps to a single OS rename operation, which POSIX requires to be atomic when source and destination are on the same filesystem; keeping the temp file next to the target guarantees that. The destination file is either the old complete content or the new complete content, with no window where it’s half of either. A crash before tmp.replace(target) leaves the original file untouched and an orphaned .tmp file you can safely delete on next startup.
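The startup cleanup can be as small as a glob over the fills directory. This is a minimal sketch, assuming temp files follow the <name>.json.tmp pattern used above and that it runs before any writer is active; remove_orphaned_tmp is a hypothetical helper name.

```python
from pathlib import Path


def remove_orphaned_tmp(fills_dir: Path) -> list[Path]:
    """Delete leftover *.json.tmp files and return the paths that were removed."""
    removed = []
    for tmp in sorted(fills_dir.glob("*.json.tmp")):
        # Safe only at startup: no writer can be mid-write on this file yet.
        tmp.unlink()
        removed.append(tmp)
    return removed
```

Returning the removed paths gives the caller something to log, since an orphaned temp file is evidence that the previous run died mid-write.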
This snippet doesn’t cover what happens when two coroutines write to the same path concurrently — you’ll get two temp files racing for the same rename, and the loser’s data is gone. It also doesn’t handle fsync ordering (without fsync, “written” doesn’t mean “on disk” — a power loss can still lose it), and it doesn’t validate the file on read before trusting it. Those three gaps are exactly what the paid workspace closes.
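To make the fsync-ordering gap concrete, here is a rough sketch of one common pattern: flush and fsync the temp file before the rename, then fsync the parent directory so the rename itself is recorded. It is not the paid workspace's code; save_fills_durable is a hypothetical name, and the directory fsync is attempted only on POSIX systems.

```python
import json
import os
import tempfile
from pathlib import Path


def save_fills_durable(base_dir: Path, symbol: str, fills: list[dict]) -> Path:
    target = base_dir / "fills" / f"{symbol}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Unique temp name in the target's directory, so writers never share a temp file.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=f"{target.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(fills, f)
            f.flush()             # Python buffer -> OS page cache
            os.fsync(f.fileno())  # OS page cache -> storage device
        os.replace(tmp_name, target)  # rename only after the bytes are durable
    except BaseException:
        # Best-effort cleanup of the temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Persist the directory entry so the rename survives a power loss.
        dir_fd = os.open(target.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return target
```

A unique temp name only stops concurrent writers from overwriting each other's temp file. It does not decide which write should win, so serializing writers per path (for example with a lock) is still a separate concern.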
Gate Check
Before moving on, verify your storage layer against these three assertions:
assert not Path("fills/AAPL.json.tmp").exists() - no orphaned temp file after a successful write
assert isinstance(loaded_payload["price"], str) - Decimal values hit disk as strings, never as float
assert read_json("missing.json") raises an exception - a missing or corrupted file must never silently return {} or []
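The second assertion depends on how Decimal values are serialized. A minimal sketch, assuming a json.dumps default hook is used for encoding; encode_decimal and the sample fill are illustrative names and data, not part of the workspace.

```python
import json
from decimal import Decimal


def encode_decimal(value):
    """json.dumps `default` hook: write Decimal as a string to keep exact digits."""
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


fill = {"symbol": "AAPL", "qty": 40, "price": Decimal("189.50")}
payload = json.dumps(fill, default=encode_decimal)

loaded_payload = json.loads(payload)
restored_price = Decimal(loaded_payload["price"])  # parse back without a float step

assert isinstance(loaded_payload["price"], str)
assert restored_price == fill["price"]
```

Going through str on both sides means the price never passes through a binary float, so the trailing zero in 189.50 and the exact value both survive the round trip.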
What’s in the Paid Post
This week’s paid post delivers the complete PathStorage workspace: atomic writes with fsync-before-replace, path-traversal protection on every resolved path, and a StorageCorruptionError boundary that PositionReconciler can actually catch — plus all 16 pytest gates green.




