Original build
The project started as a live crypto options surface.
The proof of concept was simple on paper: ingest live options market data, compute implied volatility, fit an SVI surface, and stream smiles, skews, risk reversals, butterflies, and diagnostics to a frontend.
That description hides most of the real work. The surface fit mattered, but the surrounding system determined whether the output stayed usable while markets were moving.
- Consume websocket market data.
- Track live order-book state.
- Recalculate implied volatility.
- Maintain smile state by expiry.
- Calibrate an arbitrage-aware SVI surface.
- Stream surface updates to the dashboard.
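For reference, the raw SVI parameterisation (Gatheral) writes the total implied variance of one expiry at log-moneyness k = ln(K/F) as w(k) = a + b(ρ(k − m) + √((k − m)² + σ²)). The sketch below is a minimal illustration, not the project's calibrator. It evaluates a slice, converts it to implied volatility, and checks the Gatheral and Jacquier butterfly condition g(k) ≥ 0 on a sampled grid. The function names, the grid bounds and the parameter ordering (a, b, rho, m, sigma) are assumptions made for the example. A full arbitrage-aware surface would also need a calendar check across expiries, meaning total variance must not decrease with maturity.

```python
import math


def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k = ln(K/F)."""
    d = k - m
    return a + b * (rho * d + math.sqrt(d * d + sigma * sigma))


def svi_derivatives(k, b, rho, m, sigma):
    """Closed-form first and second derivatives of raw SVI with respect to k."""
    d = k - m
    root = math.sqrt(d * d + sigma * sigma)
    return b * (rho + d / root), b * sigma * sigma / root ** 3


def butterfly_density(k, a, b, rho, m, sigma):
    """Gatheral-Jacquier g(k); g(k) < 0 signals butterfly arbitrage at k."""
    w = svi_total_variance(k, a, b, rho, m, sigma)
    w1, w2 = svi_derivatives(k, b, rho, m, sigma)
    return (1 - k * w1 / (2 * w)) ** 2 - (w1 * w1 / 4) * (1 / w + 0.25) + w2 / 2


def implied_vol(k, t, params):
    """Annualised implied vol for expiry t (in years) from total variance."""
    return math.sqrt(svi_total_variance(k, *params) / t)


def is_butterfly_free(params, k_min=-1.5, k_max=1.5, n=301):
    """Sampled check of g(k) >= 0 on a uniform grid (a grid check, not a proof)."""
    step = (k_max - k_min) / (n - 1)
    return all(butterfly_density(k_min + i * step, *params) >= 0 for i in range(n))


if __name__ == "__main__":
    params = (0.04, 0.4, -0.4, 0.0, 0.1)  # hypothetical slice: a, b, rho, m, sigma
    for k in (-0.5, 0.0, 0.5):
        print(k, round(implied_vol(k, 0.5, params), 4))
    print("butterfly-free on grid:", is_butterfly_free(params))
```

A negative ρ tilts the smile towards downside strikes, the usual shape for put skew; the grid check is cheap enough to run on every refit before a patch is broadcast.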
First version
The single-exchange version held together until the workload changed.
The first version connected to one exchange, ran on a medium EC2 instance, and maintained a live surface well enough to prove the architecture was directionally sound.
Then the second exchange was added. The visible symptoms looked like websocket reliability problems: delayed heartbeats, messier reconnects, stale UI updates, and queues backing up. Profiling showed that the transport was not the real issue.
Root cause
The system had quietly moved from I/O-bound to CPU-bound.
Adding a venue did not just add messages. It multiplied downstream work. Every update could trigger order-book aggregation, implied-volatility recalculation, ATM refreshes, Greeks, smile state, fit preparation, arbitrage validation, surface patch generation, frontend broadcasts, persistence writes, and lifecycle logging.
Once the CPU was saturated, the connection layer started missing its own timing obligations. Heartbeats looked unreliable because downstream computation held the event loop long enough that it could no longer service them on schedule.
- Market data became stale.
- Websocket queues backed up.
- Frontend latency increased.
- Reconnect handling degraded.
- The dashboard inherited uneven state.
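The failure mode is easy to reproduce in miniature. The sketch below assumes an asyncio-based service, since the write-up does not name the runtime. A heartbeat task records how late each tick fires while a stand-in CPU-bound function either runs inline on the event loop or is handed to an executor. `cpu_bound_work`, `heartbeat` and `run` are hypothetical names, and the busy loop only stands in for implied-volatility and fitting work.

```python
import asyncio
import time


def cpu_bound_work(seconds):
    """Stand-in for IV/fit work: spins the CPU without yielding to the event loop."""
    end = time.perf_counter() + seconds
    x = 0
    while time.perf_counter() < end:
        x += 1
    return x


async def heartbeat(interval, beats, lags):
    """Record how late each heartbeat fires relative to its schedule."""
    loop = asyncio.get_running_loop()
    for _ in range(beats):
        expected = loop.time() + interval
        await asyncio.sleep(interval)
        lags.append(loop.time() - expected)


async def run(offload):
    """Return the worst heartbeat lag (seconds) while 0.3 s of CPU work runs."""
    lags = []
    hb = asyncio.create_task(heartbeat(0.05, 10, lags))
    await asyncio.sleep(0)  # let the heartbeat schedule its first tick
    if offload:
        # Worker thread: the loop keeps servicing timers while the work runs.
        await asyncio.get_running_loop().run_in_executor(None, cpu_bound_work, 0.3)
    else:
        cpu_bound_work(0.3)  # blocks the loop: timers and socket reads all wait
    await hb
    return max(lags)


if __name__ == "__main__":
    blocked = asyncio.run(run(offload=False))
    offloaded = asyncio.run(run(offload=True))
    print(f"max heartbeat lag: inline={blocked:.3f}s offloaded={offloaded:.3f}s")
```

Inline, the first heartbeat fires roughly 250 ms late because the loop cannot run any timer until the work returns. The default executor is a thread pool, so under the GIL it keeps the loop responsive without adding parallelism; genuinely CPU-heavy stages would typically move to a `ProcessPoolExecutor` or a separate process.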
Optimisation phase
The useful fixes reduced work per update.
The obvious answer would have been to rewrite hot paths in Rust or C++. Lower-level code would help in places, but the bigger issue was architectural: the system was doing too much unnecessary work.
The next stage became less about making individual functions faster and more about deciding when full precision was actually needed.
- Ignore tiny spot moves that would not move displayed volatility by even 1bp.
- Use approximation paths for small moves instead of full implied-volatility recalculation.
- Batch incoming updates so duplicate recomputations collapse into one.
- Separate ingestion, fitting, persistence, and frontend broadcasting.
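The first three ideas in that list can be sketched together. This is a minimal illustration under stated assumptions, not the project's code. It prices with undiscounted Black-76 on the forward (zero rates, plain rather than inverse payoff) and treats one Newton step, warm-started from the previous implied volatility, as the approximation path. When that step is below 1bp of volatility the update is skipped, and a full bisection solve runs only for large moves. `update_iv`, `solve_iv`, `Coalescer`, `ONE_BP` and the 0.01 full-solve threshold are all hypothetical.

```python
import math

ONE_BP = 1e-4  # hypothetical threshold: 1bp of volatility (0.0001 in decimal vol)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1(f, k, t, vol):
    sd = vol * math.sqrt(t)
    return (math.log(f / k) + 0.5 * sd * sd) / sd, sd


def black_call(f, k, t, vol):
    """Undiscounted Black-76 call on forward f (zero-rate assumption)."""
    d1, sd = _d1(f, k, t, vol)
    return f * norm_cdf(d1) - k * norm_cdf(d1 - sd)


def black_vega(f, k, t, vol):
    """Sensitivity of black_call to vol."""
    d1, _ = _d1(f, k, t, vol)
    return f * norm_pdf(d1) * math.sqrt(t)


def solve_iv(price, f, k, t, lo=1e-4, hi=5.0, tol=1e-10):
    """Full-precision path: bisection, valid because the call price rises with vol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if black_call(f, k, t, mid) > price:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def update_iv(price, f, k, t, prev_iv, full_solve_above=0.01):
    """Return (iv, path) with path in {"skip", "newton", "full"}.

    One Newton step from prev_iv gives a first-order estimate of the IV change.
    """
    step = (price - black_call(f, k, t, prev_iv)) / black_vega(f, k, t, prev_iv)
    if abs(step) < ONE_BP:
        return prev_iv, "skip"  # displayed vol would not move by 1bp
    if abs(step) < full_solve_above:
        return prev_iv + step, "newton"  # small move: approximation path
    return solve_iv(price, f, k, t), "full"  # large move: full recalculation


class Coalescer:
    """Keeps only the latest update per instrument between processing ticks."""

    def __init__(self):
        self._pending = {}

    def push(self, instrument, update):
        self._pending[instrument] = update  # newer updates overwrite older ones

    def drain(self):
        batch, self._pending = self._pending, {}
        return batch
```

The Newton step doubles as the skip test: its size, residual divided by vega, is already the first-order estimate of how far implied volatility would move, so deciding whether to do the work costs one pricing call. `Coalescer.drain()` hands the processing loop at most one update per instrument per tick, which is where duplicate recomputation disappears.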

Outcome
The bottleneck became systems engineering.
After the optimisation work, the system stabilised around 5,000 market-data messages per second while still maintaining live smile state, updating the surface, broadcasting frontend updates, and persisting state.
That was enough to make the MVP useful, but not enough to scale cleanly forever. More exchanges, more currencies, and more downstream analytics all pointed toward a distributed ingestion and processing model with independent workers, distributed state, asynchronous fit jobs, decoupled persistence, and scalable broadcast infrastructure.
The quant model matters. The systems engineering around the model matters just as much.
Links