Probabilistic Report Cards: LLM Evaluation Metrics

From N-Grams to LLM-as-a-Judge: A deep dive into the evolution of evaluation metrics.

January 21, 2026 · 8 min

Electronic Executives: RAG, ReAct and MCP

A deep dive into the cognitive architectures of modern AI agents, exploring Retrieval-Augmented Generation (RAG), the ReAct reasoning pattern, and the Model Context Protocol (MCP).

January 15, 2026 · 6 min

The Need for Speed: KV Cache and Memory Optimization at Inference

An introduction to KV Caching and its role in optimizing Transformer inference.

January 13, 2026 · 7 min

Crafty Patchwork: Parameter-Efficient Fine-Tuning

An introduction to Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA and QLoRA.

January 12, 2026 · 7 min

Attention Is All You Need, But Exactly Which One? MHA, GQA, and MLA

Touching the holy grail of modern AI: from the original 2017 paper to DeepSeek’s MLA, how has the definition of ‘Attention’ transformed?

January 10, 2026 · 7 min

Semantic Alchemy: Cracking Word2Vec with CBOW and Skip-Gram

Understanding the mathematics behind Word2Vec’s CBOW and Skip-Gram architectures, and how they map language into vector space.

January 7, 2026 · 12 min

The DNA of Language: A Deep Dive into LLM Tokenization Concepts

A comprehensive guide to tokenization strategies: BPE, WordPiece, Unigram, and SentencePiece.

January 5, 2026 · 14 min