Attention Is All You Need, but Exactly Which One? MHA, GQA, and MLA

We are about to touch the holy grail of modern AI. From the original 2017 paper to DeepSeek’s MLA, how has the definition of ‘Attention’ transformed?

January 10, 2026 · 7 min

The Global Accountant and the Subword Surgeon: Decoding GloVe and FastText

How global co-occurrence matrices (GloVe) and subword units (FastText) capture nuances that Word2Vec missed.

January 8, 2026 · 5 min

Semantic Alchemy: Cracking Word2Vec with CBOW and Skip-Gram

Understanding the mathematics behind Word2Vec, CBOW, and Skip-Gram and how they map language to vector space.

January 7, 2026 · 12 min

The DNA of Language: A Deep Dive into LLM Tokenization Concepts

A comprehensive guide to tokenization strategies: BPE, WordPiece, and Unigram, plus the SentencePiece toolkit.

January 5, 2026 · 14 min