Probabilistic Report Cards: LLM Evaluation Metrics
From N-Grams to LLM-as-a-Judge: A deep dive into the evolution of evaluation metrics.
The USB-C of the AI Space: The Model Context Protocol
MCP is the open standard for connecting AI models to data and tools. Discover how Anthropic’s new protocol solves the $N \times M$ integration problem, creating a plug-and-play ecosystem for AI agents.
Electronic Executives: RAG, ReAct and MCP
A deep dive into the cognitive architectures of modern AI agents, exploring Retrieval-Augmented Generation (RAG), the ReAct reasoning pattern, and the Model Context Protocol (MCP).
The Need For Speed: KV Cache and Memory Optimization at Inference
An introduction to KV Caching and its role in optimizing Transformer inference.
Crafty Patchwork: Parameter-Efficient Fine-Tuning
An introduction to Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA, QLoRA, and more.
Mission Impossible: Fitting Trillion-Parameter Giants into 80GB GPUs
An introduction to optimizations for Large Language Models, covering GPU utilization, precision control, and memory management.
Anatomy of Trillion-Parameter Switchboards: Understanding Feedforward Blocks
Exploring the hidden layers of trillion-parameter switchboards: Feedforward Neural Networks and Activation Functions.
Attention Is All You Need, but exactly which one?: MHA, GQA and MLA
We are about to touch the holy grail of modern AI. From the original 2017 paper to DeepSeek’s MLA, how has the definition of ‘Attention’ transformed?
The Geometry of Meaning: Sine, ALiBi, RoPE, and HoPE
From Sinusoidal to RoPE and HoPE: How Transformers learn to process word order and sequence length.
The Global Accountant and the Subword Surgeon: Decoding GloVe and FastText
How subword units and global co-occurrence matrices allow GloVe and FastText to capture nuances that Word2Vec missed.