Presented at UWORCS 2025, Western University.
This talk provides an accessible end-to-end walkthrough of modern large language models—from raw text pretraining to practical deployment. The goal is to demystify the engineering and research decisions behind models like GPT and LLaMA.
Topics Covered
- Tokenization & vocabulary design — BPE, SentencePiece, and the trade-offs involved
- Pretraining objectives — next-token prediction, masked language modeling, and why they work
- Scaling laws — the Chinchilla compute-optimal regime and what it means for model design
- Instruction tuning — supervised fine-tuning (SFT) on curated demonstrations
- Alignment via RLHF — reward modeling, PPO, and DPO as a simpler alternative
- Practical considerations — mixed precision, gradient checkpointing, and distributed training
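To make the tokenization bullet concrete, here is a minimal sketch of the BPE merge-learning loop on a toy corpus. The function name and corpus are illustrative, not from the talk; real tokenizers (e.g. SentencePiece) add byte fallback, pre-tokenization rules, and much larger merge tables.

```python
from collections import Counter

def byte_pair_merges(corpus, num_merges):
    """Learn BPE merges on a toy space-separated corpus.

    Each word starts as a sequence of characters; at every step the
    most frequent adjacent symbol pair is merged into a new token.
    """
    # Represent each word as a tuple of symbols, weighted by frequency.
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the weighted vocabulary.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

merges = byte_pair_merges("low low low lower lowest", 3)
# First merges fuse the frequent stem: l+o, lo+w, low+e.
```

The vocabulary-size trade-off in the bullet shows up here directly: more merges mean shorter token sequences but a larger embedding table.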
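The next-token-prediction objective in the pretraining bullet is just cross-entropy on a shifted sequence. A minimal NumPy sketch (function and shapes are illustrative assumptions, not from the talk):

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Average cross-entropy for next-token prediction.

    logits: (T, V) scores where row t predicts token t+1;
    tokens: (T+1,) integer ids. Targets are the inputs shifted by one.
    """
    targets = tokens[1:]
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each observed next token, averaged.
    return -log_probs[np.arange(len(targets)), targets].mean()

# Uniform logits over a 4-token vocabulary give a loss of log(4).
loss = next_token_loss(np.zeros((3, 4)), np.array([0, 1, 2, 3]))
```

The shift by one is the whole trick: the same forward pass yields a training signal at every position, which is part of why the objective scales so well.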
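The Chinchilla bullet reduces to simple arithmetic under two widely used approximations: training compute C ≈ 6·N·D FLOPs, and compute-optimal data D ≈ 20·N tokens. A sketch under those assumptions (the function name is ours):

```python
def chinchilla_optimal(compute_flops):
    """Rough compute-optimal split of a FLOP budget.

    Assumes C ~ 6 * N * D and the Chinchilla rule of thumb D ~ 20 * N,
    so N ~ sqrt(C / 120) parameters and D ~ 20 * N tokens.
    """
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# For a 1e23 FLOP budget this suggests a model of roughly 29B
# parameters trained on roughly 580B tokens.
n, d = chinchilla_optimal(1e23)
```

The design implication for the talk's point: for a fixed budget, a smaller model trained on more tokens often beats a larger undertrained one.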
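Why DPO counts as "a simpler alternative" in the alignment bullet is easiest to see from its loss: it needs only per-sequence log-probabilities from the policy and a frozen reference model, with no reward model or PPO loop. A per-pair sketch (argument names are illustrative):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Pushes the policy to widen the gap between its chosen and rejected
    log-probs relative to the reference model; beta controls how hard.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): zero margin costs log(2),
    # large positive margins drive the loss toward zero.
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))
```

In practice the log-probabilities come from summing token log-softmax scores over each response, but the optimization itself is ordinary supervised gradient descent on this objective.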
