First Principles
Making the unintuitive intuitive.
First Principles is a blog series covering machine learning and distributed systems from the ground up.
Series
Transformers from Scratch
Build a decoder-only transformer end to end: tokenisation, embeddings, feed-forward layers, layer norms, attention heads, training loops, and inference.
In progress
Distributed Training from Scratch
Data parallelism, model parallelism, gradient checkpointing, and the systems that make large-scale training possible.
Coming soon