
Enhanced Features - Chapter 10

Concept Playground

Experiment with RLHF training strategies using the pre-built scenarios derived from Chapters 10–12 of the RLHF book. Adjust parameters, compare methods, and record runs in a lightweight session log.

How to use

  1. Select a scenario tab to load its interactive simulation.
  2. Follow the guided experiment steps and observe the expected signals.
  3. Click "Record run" to log a configuration for later comparison.

Rejection sampling playground

Simulate Chapter 10's baseline: generate N completions per prompt, score each one with a reward model, then keep the highest-scoring completions as finetuning data.
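Below is a minimal Python sketch of that loop. It assumes hypothetical `generate` and `reward` callables that stand in for a real policy and reward model. The sketch keeps the single best completion for each prompt. Another common variant pools every prompt-completion pair and keeps the top K overall.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],          # hypothetical policy sampler
    reward: Callable[[str, str], float],     # hypothetical reward model
    n: int = 4,
) -> List[Tuple[str, str, float]]:
    """For each prompt, draw n completions and keep the highest-reward one."""
    selected = []
    for prompt in prompts:
        # Sample N candidate completions for this prompt.
        completions = [generate(prompt) for _ in range(n)]
        # Score every candidate with the reward model.
        scored = [(c, reward(prompt, c)) for c in completions]
        # Keep the best candidate; these pairs become the finetuning set.
        best, best_score = max(scored, key=lambda pair: pair[1])
        selected.append((prompt, best, best_score))
    return selected


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_generate(prompt: str) -> str:
        # Hypothetical stand-in for a policy: appends a random-length answer.
        return prompt + " " + "ok" * rng.randint(1, 5)

    def toy_reward(prompt: str, completion: str) -> float:
        # Hypothetical stand-in for a reward model: prefers shorter answers.
        return -float(len(completion))

    for prompt, best, score in rejection_sample(
        ["Hi", "Explain RLHF"], toy_generate, toy_reward, n=8
    ):
        print(f"{prompt!r} -> {best!r} (reward {score:.1f})")
```

Raising `n` makes it more likely that a high-reward sample turns up for each prompt. The cost is N generation and scoring calls per prompt, which is the quality versus cost trade-off that the comparison table tracks.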


Method comparison snapshot

Side-by-side summary of the most recent reading for each method. Values are normalised so you can quickly check trade-offs across quality, cost, and stability.
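The page does not say how the proxies are normalised. One plausible choice is min-max scaling of each metric across methods, and the sketch below assumes that approach. The method names, metric keys, and numbers are illustrative only and are not measurements. After scaling, a higher cost value still means more expensive, so read that column in the opposite direction from quality and stability.

```python
from typing import Dict


def min_max_normalise(
    readings: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Rescale each metric to [0, 1] across methods (min-max normalisation)."""
    metrics = {m for values in readings.values() for m in values}
    out: Dict[str, Dict[str, float]] = {method: {} for method in readings}
    for metric in metrics:
        # Collect this metric's raw value for every method that reported it.
        column = {
            method: vals[metric]
            for method, vals in readings.items()
            if metric in vals
        }
        lo, hi = min(column.values()), max(column.values())
        span = hi - lo
        for method, value in column.items():
            # A constant column carries no trade-off signal; map it to 0.5.
            out[method][metric] = (value - lo) / span if span else 0.5
    return out


if __name__ == "__main__":
    # Illustrative raw readings only, not results from the playground.
    raw = {
        "Rejection Sampling Baseline": {"quality": 0.62, "cost": 8.0},
        "PPO Policy Update": {"quality": 0.71, "cost": 20.0},
        "DPO Weighting": {"quality": 0.68, "cost": 4.0},
    }
    for method, scores in min_max_normalise(raw).items():
        print(method, {k: round(v, 2) for k, v in sorted(scores.items())})
```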

Method | Quality proxy | Cost proxy | Stability proxy | Last note
Rejection Sampling Baseline (Chapter 10) | - | - | - | Interact with the scenario to populate metrics.
PPO Policy Update (Chapter 11) | - | - | - | Interact with the scenario to populate metrics.
DPO Weighting (Chapter 12) | - | - | - | Interact with the scenario to populate metrics.

Session log

A lightweight record of the runs you captured this session (clears on refresh).

No runs captured yet.

Performance summary

Signals from this session, aggregated per scenario. Use the summary as a quick retrospective before you move on.
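The session log and this summary could be modelled as an in-memory list of runs plus a per-scenario average. Keeping the list in memory matches the clear-on-refresh behaviour. The `Run` record and `summarise` helper below are hypothetical. The page does not specify which statistics it aggregates, so using the mean is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    """One recorded configuration (hypothetical schema)."""
    scenario: str       # e.g. "PPO Policy Update"
    quality: float      # normalised quality proxy
    cost: float         # normalised cost proxy
    stability: float    # normalised stability proxy
    note: str = ""


def summarise(log: List[Run], scenarios: List[str]) -> Dict[str, str]:
    """Aggregate the session log into one summary line per scenario."""
    by_scenario: Dict[str, List[Run]] = defaultdict(list)
    for run in log:
        by_scenario[run.scenario].append(run)
    summary: Dict[str, str] = {}
    for name in scenarios:
        runs = by_scenario.get(name, [])
        if not runs:
            # Mirrors the empty state shown on the page.
            summary[name] = "No runs recorded yet."
            continue
        summary[name] = (
            f"{len(runs)} run(s); mean quality {mean(r.quality for r in runs):.2f}, "
            f"cost {mean(r.cost for r in runs):.2f}, "
            f"stability {mean(r.stability for r in runs):.2f}"
        )
    return summary


if __name__ == "__main__":
    session_log: List[Run] = []  # lives only for this session
    session_log.append(Run("PPO Policy Update", 0.5, 0.25, 1.0, "low clip range"))
    session_log.append(Run("PPO Policy Update", 1.0, 0.75, 0.5, "high clip range"))
    names = ["Rejection Sampling Baseline", "PPO Policy Update", "DPO Weighting"]
    for scenario, line in summarise(session_log, names).items():
        print(f"{scenario}: {line}")
```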

Rejection Sampling Baseline

No runs recorded yet.

PPO Policy Update

No runs recorded yet.

DPO Weighting

No runs recorded yet.