Attention Is All You Need, but Exactly Which One? MHA, GQA, and MLA
We are about to touch the holy grail of modern AI. From the original 2017 paper to DeepSeek’s MLA, how has the definition of ‘Attention’ transformed?