Giles' blog
Automating starting Lambda Labs instances
Writing an LLM from scratch, part 32g -- Interventions: weight tying
Writing an LLM from scratch, part 32f -- Interventions: weight decay
Writing an LLM from scratch, part 32e -- Interventions: the learning rate
Writing an LLM from scratch, part 32d -- Interventions: adding attention bias
Writing an LLM from scratch, part 32c -- Interventions: removing dropout
Writing an LLM from scratch, part 32b -- Interventions: gradient clipping
Writing an LLM from scratch, part 32a -- Interventions: training a baseline model
Getting a custom PyTorch LLM onto the Hugging Face Hub (Transformers: AutoModel, pipeline, and Trainer)
Writing an LLM from scratch, part 31 -- the models are now on Hugging Face
Writing an LLM from scratch, part 30 -- digging into the LLM-as-a-judge results
Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud
Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090
Why smart instruction-following makes prompt injection easier
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Writing an LLM from scratch, part 24 -- the transcript hack
A classifier using Qwen3
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Writing an LLM from scratch, part 23 -- fine-tuning for classification
Writing an LLM from scratch, part 22 -- finally training our LLM!
Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
How do LLMs work?
An addendum to 'the maths you need to start understanding LLMs'
The maths you need to start understanding LLMs
What AI chatbots are actually doing under the hood
Writing an LLM from scratch, part 19 -- wrapping up Chapter 4
Writing an LLM from scratch, part 18 -- residuals, shortcut connections, and the Talmud
The fixed length bottleneck and the feed forward network
Writing an LLM from scratch, part 17 -- the feed-forward network
Writing an LLM from scratch, part 16 -- layer normalisation