Probabilistic Report Cards: LLM Evaluation Metrics
From N-Grams to LLM-as-a-Judge: A deep dive into the evolution of evaluation metrics.
MCP is the open standard for connecting AI models to data and tools. Discover how Anthropic’s new protocol solves the $N \times M$ integration problem, creating a plug-and-play ecosystem for AI agents.
A deep dive into the cognitive architectures of modern AI agents, exploring Retrieval-Augmented Generation (RAG), the ReAct reasoning pattern, and the Model Context Protocol (MCP).
An introduction to KV Caching and its role in optimizing Transformer inference.
An introduction to Parameter-Efficient Fine-Tuning (PEFT) techniques, including LoRA, QLoRA, and more.
An introduction to optimizations for Large Language Models, covering GPU utilization, precision control, and memory management.
Exploring the hidden layers of trillion-parameter switchboards: Feedforward Neural Networks and Activation Functions.
We are about to touch the holy grail of modern AI. From the original 2017 ‘Attention Is All You Need’ paper to DeepSeek’s Multi-head Latent Attention (MLA), how has the definition of ‘Attention’ transformed?
From Sinusoidal to RoPE and HoPE: How Transformers learn to process word order and sequence length.
How global co-occurrence matrices and subword units allow GloVe and FastText to capture nuances that Word2Vec missed.