LLM from scratch, part 28 – training a base model from scratch on an RTX 3090

created: Dec. 2, 2025, 6:17 p.m. | updated: Dec. 9, 2025, 5:36 p.m.

This report says in the "Replicating GPT-2" section that OpenAI trained it for 800k iterations with a batch size of 512.

    Tokens per second: 10,005
    Testing with batch size 2
    100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:17<00:00, 5.60it/s]
    Done, trained on 204,800 tokens in 17.8631s.

And it looks like a batch size of six is what we can fit into the RTX 3090's 24 GiB of VRAM.

Training loss is a bit choppy, but that's because I erroneously plotted only the most recent iteration's training loss rather than an average over all iterations between the last and current validation run; the validation loss is correct because I did average all of the validation numbers.

In the book, Raschka says that this is not normally done these days, which is why I didn't do it for this base model train.
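To make that plotting fix concrete, here's a minimal sketch of what the corrected logging would look like: accumulate the per-iteration training losses and, at each validation check, record their mean rather than just the latest value. This is not the actual training loop from this series -- the model call, the data loaders, and the assumption that the model returns logits shaped (batch, seq_len, vocab_size) are all placeholders for illustration.

```python
import torch


def train_with_smoothed_loss_logging(
    model, optimizer, train_loader, val_loader,
    device, eval_interval=100, eval_iters=20,
):
    """Hypothetical training loop illustrating the logging fix: the
    training-loss curve is built from the *mean* loss over all iterations
    since the last validation run, not just the most recent iteration."""
    train_losses, val_losses = [], []
    losses_since_eval = []  # per-iteration losses accumulated here

    for step, (inputs, targets) in enumerate(train_loader, start=1):
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)  # assumed shape: (batch, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), targets.flatten()
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses_since_eval.append(loss.item())

        if step % eval_interval == 0:
            # Average the training losses accumulated since the last check,
            # which smooths out the choppiness of single-iteration values.
            train_losses.append(sum(losses_since_eval) / len(losses_since_eval))
            losses_since_eval = []

            # Validation loss, averaged over a few batches as in the post.
            model.eval()
            with torch.no_grad():
                v_batch_losses = []
                for i, (v_inputs, v_targets) in enumerate(val_loader):
                    if i >= eval_iters:
                        break
                    v_logits = model(v_inputs.to(device))
                    v_loss = torch.nn.functional.cross_entropy(
                        v_logits.flatten(0, 1), v_targets.to(device).flatten()
                    )
                    v_batch_losses.append(v_loss.item())
                val_losses.append(sum(v_batch_losses) / len(v_batch_losses))
            model.train()

    return train_losses, val_losses
```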

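Likewise, here's a rough sketch of the kind of throughput test that produces output like the "Testing with batch size 2" lines above: run a fixed number of training steps at a given batch size, time them, and report tokens per second. The progress bar format looks like tqdm's, but the model, optimizer, and `get_batch` helper here are assumptions rather than the code actually used in this series.

```python
import time

import torch
from tqdm import tqdm


def measure_throughput(model, optimizer, get_batch, batch_size,
                       seq_len=1024, iterations=100, device="cuda"):
    """Hypothetical benchmark: time `iterations` training steps at a given
    batch size and report tokens/second. `get_batch` is assumed to return
    (inputs, targets) tensors of shape (batch_size, seq_len)."""
    print(f"Testing with batch size {batch_size}")
    model.train()
    start = time.time()
    for _ in tqdm(range(iterations)):
        inputs, targets = get_batch(batch_size, seq_len)
        logits = model(inputs.to(device))
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), targets.to(device).flatten()
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.time() - start
    tokens = iterations * batch_size * seq_len  # e.g. 100 * 2 * 1024 = 204,800
    print(f"Done, trained on {tokens:,} tokens in {elapsed:.4f}s.")
    print(f"Tokens per second: {tokens / elapsed:,.0f}")
```

Bumping `batch_size` up until a loop like this hits a CUDA out-of-memory error is one way to find the largest size that fits in the 3090's 24 GiB.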