Advanced Concepts

Constitutional AI & AI Feedback

Build synthetic critique and preference data with constitutions, compare AI vs human feedback, and inspect iteration workflows.

Estimated time
35 minutes
Difficulty
advanced
Prerequisites
2 module(s)
Equation

Constitutional Feedback Pipeline

Chapter 13 models Constitutional AI as two data-generation loops governed by a written constitution $\mathcal{C}$. Each loop keeps the instruction prompting intact while substituting an LLM critic or judge in place of a human labeler.

$$
\begin{aligned}
\text{Instruction Loop:}\quad &y_0 = \pi_{\text{draft}}(x),\\
&\text{for } i = 0 \ldots n-1:\quad c_i \sim \mathcal{C},\; y_{i+1} = \text{Revise}(x, y_i, c_i)\\
&\mathcal{D}_{\text{instr}} \leftarrow \mathcal{D}_{\text{instr}} \cup \{(x, y_n)\}\\[4pt]
\text{Preference Loop:}\quad &c \sim \mathcal{C},\; (y^A, y^B) \sim \mathcal{D}_{\text{rlhf}}\\
&r = \text{Judge}(x, y^A, y^B, c),\; \mathcal{D}_{\text{pref}} \leftarrow \mathcal{D}_{\text{pref}} \cup \{(x, y^A, y^B, r)\}
\end{aligned}
$$

The chapter stresses that these loops are cheap to run (sub-cent per judgement) but can import the biases of the source model and the constitution itself. Blending diverse principles with periodic human audits keeps that drift in check.
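The two loops above can be sketched in a few lines of Python. Here `draft`, `revise`, and `judge` stand in for LLM calls; their signatures are assumptions for illustration, not a specific API from the chapter.

```python
import random

def constitutional_loops(prompts, pairs, constitution,
                         draft, revise, judge, n_revisions=3):
    """Sketch of the two Constitutional AI data-generation loops.

    prompts:      instructions x for the instruction loop
    pairs:        (x, y_A, y_B) candidate pairs for the preference loop
    constitution: list of written principles C
    """
    d_instr, d_pref = [], []

    # Instruction loop: critique-and-revise each draft n_revisions times.
    for x in prompts:
        y = draft(x)                          # y_0 = pi_draft(x)
        for _ in range(n_revisions):
            c = random.choice(constitution)   # c_i ~ C
            y = revise(x, y, c)               # y_{i+1} = Revise(x, y_i, c_i)
        d_instr.append((x, y))                # keep only the final revision y_n

    # Preference loop: an LLM judge labels each pair under one sampled principle.
    for x, y_a, y_b in pairs:
        c = random.choice(constitution)       # c ~ C
        r = judge(x, y_a, y_b, c)             # r = Judge(x, y^A, y^B, c)
        d_pref.append((x, y_a, y_b, r))

    return d_instr, d_pref
```

Note that only the final revision enters the instruction dataset; intermediate critiques are discarded, exactly as in the equations above.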

Intuition

Why AI Feedback Works

Constitutional AI is a specific instance of RL from AI Feedback (RLAIF). We swap costly human critiques for a calibrated rubric applied by a large model. Because the rubric is public and repeatable, data quality is higher (low noise) even if it carries the constitution's bias. Chapter 13 encourages mixing a modest slice of human checks to keep that bias visible while scaling experimentation.
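The "modest slice of human checks" can be implemented as a simple random routing step. The 5% default below is an illustrative placeholder, not a figure from the chapter.

```python
import random

def sample_for_human_audit(synthetic_labels, audit_fraction=0.05, seed=0):
    """Route a random slice of AI-labeled records to human review.

    Returns (audited, auto): records flagged for human audit,
    and records accepted on AI feedback alone.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    audited, auto = [], []
    for record in synthetic_labels:
        (audited if rng.random() < audit_fraction else auto).append(record)
    return audited, auto
```

Comparing human verdicts on the audited slice against the AI labels gives an ongoing estimate of how much constitution bias the synthetic pipeline is introducing.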

Think of the process as writing a playbook for your future reviewers. Once the playbook is in place, the same base model can self-critique, generate new instruction data, and produce preference comparisons for reward-model training. The playbook can evolve - expanding the constitution or swapping the critic model - without resetting the entire pipeline.

Claude, ChatGPT, Llama 2/3, and Nemotron all use constitution-style prompts or critic models to expand their safety data, confirming the approach scales to production systems.

Analogy

Analogy: Editorial Board Charter

In a newsroom, an editorial charter states what counts as publishable. Junior writers revise their drafts until senior editors sign off. Constitutional AI mirrors this dynamic with an LLM filling the role of the editor.

Editorial charter

A newsroom agrees on a charter so every editor applies the same lens before publishing. Chapter 13 uses written principles the same way.

Mock debate judges

Critics score practice debates using a rubric. Constitutional AI has an LLM judge the samples, using the constitution as its rubric.

Visualization

Constitution Lab

Use the tools below to assemble a constitution, compare AI versus human labelling costs, and size the critique loop you need before launching a self-improvement run.

Constitution workshop

Assemble a constitution, then watch how Chapter 13's critique-and-revise loop rewrites a draft answer.


AI vs human feedback trade-offs

Estimate cost, turnaround, and risk when mixing Chapter 13's synthetic feedback with human annotations.
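The trade-off the tool explores can be approximated with a back-of-the-envelope model. All unit costs and times below are illustrative placeholders (the chapter itself only notes that AI judgements run sub-cent).

```python
def feedback_mix_estimate(n_samples, human_fraction,
                          cost_ai=0.005, cost_human=2.50,
                          sec_ai=2, sec_human=90):
    """Rough cost and turnaround estimate for a blended labeling run.

    human_fraction: share of samples routed to human annotators;
    the rest are labeled by the AI judge.
    """
    n_human = round(n_samples * human_fraction)
    n_ai = n_samples - n_human
    cost = n_ai * cost_ai + n_human * cost_human
    hours = (n_ai * sec_ai + n_human * sec_human) / 3600
    return {"cost_usd": round(cost, 2), "labeling_hours": round(hours, 2)}
```

Even a small human fraction dominates the budget here, which is why the chapter treats human labels as an audit mechanism rather than the primary data source.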


Self-improvement iteration lab

Project how many critique iterations and principles you need for Chapter 13 style self-training.

Takeaways

Operational Notes

  • Keep your constitution explicit: the chapter's examples use 8-16 short principles covering harmlessness, honesty, and helpfulness.
  • Blend data: synthetic critiques give scale, but retain periodic human audits to surface bias.
  • Track iterations: 2-4 critique passes per prompt usually reach parity with human review.
  • Version constitutions alongside model checkpoints so you can explain policy shifts.
  • When exporting datasets, annotate which critic model and constitution revision produced each sample.
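The last two notes amount to attaching provenance metadata to every exported sample. A minimal sketch, with illustrative field names and hypothetical model/revision tags:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class PreferenceRecord:
    """One exported preference sample with provenance metadata.

    critic_model and constitution_rev record which judge and which
    constitution revision produced the label, per the notes above.
    """
    prompt: str
    chosen: str
    rejected: str
    critic_model: str
    constitution_rev: str

record = PreferenceRecord(
    prompt="How do I stay safe online?",
    chosen="Use strong, unique passwords and enable 2FA.",
    rejected="Just click anything.",
    critic_model="judge-model-v2",      # hypothetical critic identifier
    constitution_rev="2024-05-r3",      # hypothetical revision tag
)
line = json.dumps(asdict(record))       # one JSONL line per sample
```

Storing the revision tag alongside each sample is what makes later policy shifts explainable: you can diff two constitution revisions and re-judge the affected slices.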
Self-check

Constitutional AI Check

Confirm the workflow, cost trade-offs, and practical guardrails introduced in Chapter 13.


  1. What distinction does Chapter 13 draw between human preference data and AI feedback?

  2. In the instruction data workflow Bai et al. describe, what happens after sampling a principle c_i?

  3. Why does Chapter 13 highlight AI-written feedback as a lever for experimentation?

  4. Which models does the chapter cite as early adopters of Constitutional AI?

  5. According to Chapter 13, what is a recommended mitigation when heavy AI feedback introduces constitution bias?