
RLHF

Reinforcement Learning from Human Feedback
AI Ethics & Safety · Advanced

Definition

A training technique in which human evaluators rate or rank model outputs, and those preferences are used to train a reward model. The reward model then supplies the training signal for fine-tuning the LLM with reinforcement learning. RLHF is a widely used method for aligning LLMs with human preferences and safety requirements, and is used by OpenAI, Anthropic, and Google.
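To make the reward-model step concrete, here is a minimal sketch of a pairwise preference loss. It assumes the human feedback arrives as (chosen, rejected) pairs and uses the pairwise logistic (Bradley-Terry style) loss, which is one common choice rather than something this entry specifies. The function names and the example scores are hypothetical, and a real reward model would produce the scores from a neural network rather than take them as plain floats.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(chosen - rejected)).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large when it does not.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called on a large positive number.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over a batch of (chosen_score, rejected_score) pairs."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward-model scores for three human comparisons.
    batch = [(1.5, 0.2), (0.1, 0.4), (2.0, -1.0)]
    print(f"mean loss: {mean_preference_loss(batch):.4f}")
```

Minimizing this loss over many comparisons pushes the reward model to agree with the evaluators. The fine-tuning stage then optimizes the LLM against that learned reward, a step this sketch does not cover.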

Tags

#alignment #human-feedback #fine-tuning #safety
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners