Learning Modules
Follow the core RLHF curriculum. Each module combines equations, intuition, analogies, and interactive visualizations.
Introduction to RLHF
What RLHF does and why it matters
Problem Setup & Context
Foundations: definitions, pipelines, and data collection
Instruction Tuning
Teach the model the chat format before RLHF
Reward Modeling
Training reward models from preference data
Regularization & KL Control
Tame over-optimization with KL penalties and auxiliary losses
Rejection Sampling
A simple baseline: filter sampled completions with a reward model
Policy Gradients (PPO)
Core RL optimization techniques
Direct Preference Optimization (DPO)
Alignment without reinforcement learning
Constitutional AI & AI Feedback
Use constitutions and AI feedback to scale alignment data
Reasoning Training & Inference Scaling
Train models with verifiable rewards (RLVR) and scale inference-time compute for reasoning
Tool Use & Function Calling
Teach models to call APIs and orchestrate external tools
Advanced RLHF Topics
Synthetic data, evaluation, over-optimization, style trade-offs, and UX