Giles' blog
Automating starting Lambda Labs instances
Writing an LLM from scratch, part 32g -- Interventions: weight tying
Writing an LLM from scratch, part 32f -- Interventions: weight decay
Writing an LLM from scratch, part 32e -- Interventions: the learning rate
Writing an LLM from scratch, part 32d -- Interventions: adding attention bias
Writing an LLM from scratch, part 32c -- Interventions: removing dropout
Writing an LLM from scratch, part 32b -- Interventions: gradient clipping
Writing an LLM from scratch, part 32a -- Interventions: training a baseline model
Getting a custom PyTorch LLM onto the Hugging Face Hub (Transformers: AutoModel, pipeline, and Trainer)
Writing an LLM from scratch, part 31 -- the models are now on Hugging Face
Writing an LLM from scratch, part 30 -- digging into the LLM-as-a-judge results
Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud
Writing an LLM from scratch, part 28 -- training a base model from scratch on an RTX 3090
Why smart instruction-following makes prompt injection easier
Writing an LLM from scratch, part 27 -- what's left, and what's next?
Writing an LLM from scratch, part 26 -- evaluating the fine-tuned model
Writing an LLM from scratch, part 25 -- instruction fine-tuning
Writing an LLM from scratch, part 24 -- the transcript hack
A classifier using Qwen3
Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch
Writing an LLM from scratch, part 23 -- fine-tuning for classification
Writing an LLM from scratch, part 22 -- finally training our LLM!
Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'
Writing an LLM from scratch, part 21 -- perplexed by perplexity
Writing an LLM from scratch, part 20 -- starting training, and cross entropy loss
How do LLMs work?
An addendum to 'the maths you need to start understanding LLMs'
The maths you need to start understanding LLMs
What AI chatbots are actually doing under the hood
Writing an LLM from scratch, part 19 -- wrapping up Chapter 4
Writing an LLM from scratch, part 18 -- residuals, shortcut connections, and the Talmud
The fixed length bottleneck and the feed forward network
Writing an LLM from scratch, part 17 -- the feed-forward network
Writing an LLM from scratch, part 16 -- layer normalisation