First Principles
Transformers from Scratch
Build a decoder-only transformer end to end: tokenisation, embeddings, feed-forward layers, layer norms, attention heads, training loops, and inference.
Part 1 of Decoder-Only Transformers from Scratch Series
Embeddings, tokenisation, vocabulary, learned vs fixed embeddings, and why geometry matters
May 4, 2026