The Global Accountant and the Subword Surgeon: Decoding GloVe and FastText

Imagine trying to teach a computer the difference between “Apple” the fruit and “Apple” the company. To us, the distinction is intuitive. To a machine, it’s just a string of characters. How do we turn these strings into meaningful math? While early attempts like Word2Vec gave us a great start, they missed the forest for the trees—or in some cases, the twigs for the branches. Enter GloVe and FastText: two algorithms that revolutionized how machines understand the nuances of human language. ...

January 8, 2026 · 5 min

Semantic Alchemy: Cracking Word2Vec with CBOW and Skip-Gram

Before Large Language Models were writing poetry, we had to teach computers that “king” and “queen” are related not just by spelling, but by meaning. This is the story of that breakthrough. It’s the moment we stopped counting words and started mapping their souls, turning raw text into a vector space where arithmetic can solve analogies. Welcome to the world of Word2Vec. 🔮 Language models require vector representations of words to capture semantic relationships. Before the 2010s, models relied on sparse representations such as one-hot encoding and word counts, which recorded only a word’s identity or frequency, not its meaning. The Problems: 🚧 ...

January 7, 2026 · 12 min

The DNA of Language: A Deep Dive into LLM Tokenization Concepts

Imagine you have to build a house. You cannot build a stable house using only massive boulders as walls (too big), nor can you build one using only tiny pebbles (too small). You need exactly the right-sized bricks. The same applies to language. We need strategies to break down petabytes of language data into usable, atomic chunks. In the context of Large Language Models (LLMs), these bricks are called tokens. Tokenization transforms vast amounts of fluid language data into discrete units that machines can process. It is the invisible filter at the heart of LLMs through which every prompt passes and from which every response is born. ...

January 5, 2026 · 14 min