First Principles
Transformers from Scratch
Build a decoder-only transformer end to end: tokenisation, embeddings, feed-forward layers, layer norms, attention heads, training loops, and inference.
Part 1 of Decoder-Only Transformers from Scratch Series
Embeddings, tokenisation, vocabulary, learned vs fixed embeddings, and why geometry matters
May 4, 2026