First Principles
Making the unintuitive intuitive.
First Principles is a blog series covering machine learning and distributed systems from the ground up.
Series
Transformers from Scratch
Build a decoder-only transformer end to end: tokenisation, embeddings, feed-forward layers, layer norms, attention heads, training loops, and inference.
In progress
Distributed Training from Scratch
Data parallelism, model parallelism, gradient checkpointing, and the systems that make large-scale training possible.
Coming soon