Day 17 — JSON Schemas: Pydantic Models for Trade Records
What You Will Build Today
By the end of this lesson you will have a working system that takes raw, unpredictable JSON from a real broker API and turns it into clean, guaranteed-correct Python objects — the kind of foundation that every serious trading system depends on. You will also understand why getting this wrong destroys accounts, and exactly how to get it right.
Learning Goals
Understand why raw dictionaries are dangerous in financial systems
Build a TradeRecord model using Pydantic v2 with strict financial invariants
Implement a thread-safe, memory-bounded TradeLog
Connect your schema to real Alpaca Paper Trading API responses
Measure validation performance and set meaningful production alerts
Part 1 — The “Dict Soup” Trap
Let’s start with what a beginner — or honestly, a lot of intermediate programmers — would write when they first get fill notifications from a broker API.
import json

def process_fill(raw: str) -> dict:
    record = json.loads(raw)
    pnl = (record["filled_avg_price"] - record["cost_basis"]) * record["filled_qty"]
    return {"symbol": record["symbol"], "pnl": pnl}
That code looks completely fine. It is clean, readable, and short. It passes unit tests on synthetic data. It clears the backtest. Then it hits a real Friday afternoon near market close during an earnings release, and any of the following can happen:
filled_qty comes back as "10" (a string, not a number). If the price difference happens to be an integer, Python silently performs string repetition (2 * "10" is "1010"), so the P&L is garbage and nothing crashes to warn you. If the difference is a float, a TypeError fires instead.
cost_basis is null on a short position. A TypeError fires. The process dies mid-trade.
filled_avg_price is 0.0 on a partial fill still pending settlement. A stop-loss triggers on a phantom price.
None of these are bugs in your code in the traditional sense. There is no syntax error. The logic is correct — it is just that you let unvalidated data from a third party, over a network, subject to undocumented changes flow directly into financial math. That is the bug.
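To make these failure modes concrete, here is a small sketch that replays them against the same process_fill logic. The payloads and the try_fill helper are hypothetical, invented for illustration; they are not captured Alpaca responses.

```python
import json


def process_fill(raw: str) -> dict:
    # Same logic as the example above: no validation at all.
    record = json.loads(raw)
    pnl = (record["filled_avg_price"] - record["cost_basis"]) * record["filled_qty"]
    return {"symbol": record["symbol"], "pnl": pnl}


def try_fill(payload: dict) -> str:
    """Hypothetical helper: run process_fill and describe what happened."""
    try:
        result = process_fill(json.dumps(payload))
    except TypeError as exc:
        return f"TypeError: {exc}"
    return f"pnl={result['pnl']!r}"


# Hypothetical payloads, one per failure mode.
payloads = {
    # Quantity arrives as a string and the price difference is an integer.
    "string_qty": {"symbol": "AAPL", "filled_avg_price": 152,
                   "cost_basis": 150, "filled_qty": "10"},
    # Cost basis is null, so the subtraction itself fails.
    "null_basis": {"symbol": "AAPL", "filled_avg_price": 152.5,
                   "cost_basis": None, "filled_qty": 10},
    # Price is a placeholder 0.0, so the math runs and reports a huge loss.
    "zero_price": {"symbol": "AAPL", "filled_avg_price": 0.0,
                   "cost_basis": 150.0, "filled_qty": 10},
}

outcomes = {name: try_fill(payload) for name, payload in payloads.items()}

for name, outcome in outcomes.items():
    print(f"{name}: {outcome}")
```

Only one of the three cases raises an exception. The other two return a value that looks like a result, which is why they are the more dangerous ones.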
Part 2 — Why This Breaks: The Missing Validation Boundary
Here is the deeper technical problem. Alpaca’s v2/orders endpoint has historically returned:
filled_qty as both string and float depending on API version and order type
null for filled_avg_price on GTD orders that have not triggered yet
ISO 8601 timestamps with and without millisecond precision depending on the event source
A plain Python dict enforces nothing. By the time bad data surfaces five call frames deep, the original payload is gone and the stack trace tells you nothing useful. You are debugging a symptom, not the cause.
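One way to picture a validation boundary, before reaching for a library, is a single function that either returns a fully typed record or fails immediately with the original payload attached. The sketch below uses only the standard library. ValidatedFill, FillValidationError, parse_fill, and the filled_at field name are assumptions made for illustration, and this is not the Pydantic TradeRecord model this lesson builds.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal, InvalidOperation


class FillValidationError(ValueError):
    """Hypothetical error type that keeps the offending payload attached."""

    def __init__(self, message: str, payload: object) -> None:
        super().__init__(f"{message}; payload={payload!r}")
        self.payload = payload


@dataclass(frozen=True)
class ValidatedFill:
    """Hypothetical record: every field has exactly one type after parsing."""
    symbol: str
    filled_qty: Decimal
    filled_avg_price: Decimal
    filled_at: datetime


def _to_positive_decimal(value: object, field: str, payload: dict) -> Decimal:
    # Reject null and booleans outright (bool is a subclass of int in Python).
    if value is None or isinstance(value, bool):
        raise FillValidationError(f"{field} is missing or not numeric", payload)
    try:
        # str() first, so a float like 152.25 does not drag in binary noise.
        number = Decimal(str(value))
    except InvalidOperation:
        raise FillValidationError(f"{field} is not a number: {value!r}", payload) from None
    if not number.is_finite() or number <= 0:
        raise FillValidationError(f"{field} must be finite and positive", payload)
    return number


def parse_fill(raw: str) -> ValidatedFill:
    """Validate once, at the edge; invalid JSON raises json.JSONDecodeError."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise FillValidationError("payload is not a JSON object", payload)

    symbol = payload.get("symbol")
    if not isinstance(symbol, str) or not symbol:
        raise FillValidationError("symbol is missing or empty", payload)

    timestamp = payload.get("filled_at")
    if not isinstance(timestamp, str):
        raise FillValidationError("filled_at is missing", payload)
    try:
        # Accept a trailing "Z", with or without fractional seconds.
        filled_at = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    except ValueError:
        raise FillValidationError("filled_at is not ISO 8601", payload) from None

    return ValidatedFill(
        symbol=symbol,
        filled_qty=_to_positive_decimal(payload.get("filled_qty"), "filled_qty", payload),
        filled_avg_price=_to_positive_decimal(
            payload.get("filled_avg_price"), "filled_avg_price", payload
        ),
        filled_at=filled_at,
    )


# String quantity and float quantity both normalize to the same Decimal type.
fill_a = parse_fill('{"symbol": "AAPL", "filled_qty": "10", '
                    '"filled_avg_price": 152.25, "filled_at": "2024-01-05T20:59:58Z"}')
fill_b = parse_fill('{"symbol": "AAPL", "filled_qty": 10.0, '
                    '"filled_avg_price": "152.25", "filled_at": "2024-01-05T20:59:58.123Z"}')
```

The design choice worth noticing is that the error is raised where the payload is still in hand, so the exception message contains the cause rather than a symptom five call frames later.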
The second problem is numeric precision. When JSON numbers become Python floats:
>>> 0.1 + 0.2
0.30000000000000004
>>> total = 0.0
>>> for _ in range(10_000):
...     total += 0.1
...
>>> total
1000.0000000001588 # about 1.6e-10 of drift from the exact 1000.0
At ten thousand fills per day, this is not a theoretical concern. It is a measurable P&L discrepancy that will appear in end-of-day reconciliation with no obvious cause — because the drift accumulated one fill at a time, invisibly, all day long.
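A common remedy for this kind of drift is to accumulate money as decimal.Decimal values built from strings rather than as floats. The sketch below compares the two approaches; the function names are hypothetical and the 0.1 step is only a stand-in for a per-fill amount.

```python
from decimal import Decimal


def accumulate_float(count: int, step: float = 0.1) -> float:
    """Add a float step one fill at a time, as naive P&L code would."""
    total = 0.0
    for _ in range(count):
        total += step
    return total


def accumulate_decimal(count: int, step: str = "0.1") -> Decimal:
    """Add the same step as a Decimal built from its string form."""
    increment = Decimal(step)  # exact: parsed from text, never from a float
    total = Decimal("0")
    for _ in range(count):
        total += increment
    return total


float_total = accumulate_float(10_000)
decimal_total = accumulate_decimal(10_000)

print("float  :", repr(float_total))
print("decimal:", decimal_total)
print("float matches exact total:  ", float_total == 1000.0)
print("decimal matches exact total:", decimal_total == Decimal("1000"))
```

Building the Decimal from a string matters: Decimal(0.1) would faithfully copy the float's binary approximation, while Decimal("0.1") is exactly one tenth.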



