The Architecture of Precision: Variations in model quantization

How quantization bridges the gap between trillion-parameter models and the hardware they run on, and why ‘smaller’ is almost always ‘faster’.

April 6, 2026 · 10 min

The Need for Speed: KV caching and memory optimization at inference

An introduction to KV Caching and its role in optimizing Transformer inference.

January 13, 2026 · 7 min