RLHF
Reinforcement Learning from Human Feedback
AI Ethics & Safety · Advanced
Definition
A training technique in which human evaluators compare or rank model outputs, and these preferences are used to train a reward model that then guides fine-tuning of the LLM via reinforcement learning. RLHF has been one of the primary methods for aligning LLMs with human preferences and safety requirements, and it has been used by OpenAI, Anthropic, and Google.
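The two stages in the definition (fit a reward model to human preferences, then optimize the LLM against that reward) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: each "output" is a hypothetical two-number feature vector [helpfulness, toxicity] rather than text, the reward model is linear, and the pairwise (Bradley-Terry) loss plus the KL-penalized reward follow a commonly published formulation, not any particular lab's implementation. The RL optimizer itself (often PPO) is omitted; `shaped_reward` only shows the per-sample reward such an optimizer would maximize.

```python
import math
import random

# ASSUMPTION: outputs are hypothetical feature vectors [helpfulness, toxicity];
# real reward models score full text with a neural network.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def reward(w: list[float], x: list[float]) -> float:
    # Linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_loss(w: list[float], pairs) -> float:
    # Mean of -log sigmoid(r(chosen) - r(rejected)) over preference pairs
    return sum(-math.log(sigmoid(reward(w, c) - reward(w, r)))
               for c, r in pairs) / len(pairs)

def train_reward_model(pairs, dim: int, lr: float = 0.1,
                       epochs: int = 200, seed: int = 0) -> list[float]:
    """Stage 1: fit w so the human-preferred output scores higher."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(pairs)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for chosen, rejected in order:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/d(margin) of -log sigmoid(margin) = -(1 - sigmoid(margin))
            grad_margin = -(1.0 - sigmoid(margin))
            for i in range(dim):
                # chain rule: d(margin)/d(w_i) = chosen[i] - rejected[i]
                w[i] -= lr * grad_margin * (chosen[i] - rejected[i])
    return w

def shaped_reward(rm_score: float, logp_policy: float,
                  logp_ref: float, beta: float = 0.1) -> float:
    """Stage 2 reward: reward-model score minus a KL penalty.

    logp_policy - logp_ref is a single-sample estimate of the KL divergence
    between the policy being tuned and the frozen reference model.
    """
    return rm_score - beta * (logp_policy - logp_ref)

# Hypothetical human judgments: (chosen, rejected) feature vectors.
pairs = [
    ([0.9, 0.1], [0.4, 0.2]),
    ([0.7, 0.0], [0.6, 0.8]),
    ([0.8, 0.3], [0.2, 0.1]),
    ([0.5, 0.0], [0.5, 0.9]),
]
w = train_reward_model(pairs, dim=2)
print("learned weights:", w)
print("pairwise loss:", pairwise_loss(w, pairs))
# Hypothetical log-probabilities of one sampled response under each model.
print("shaped reward:", shaped_reward(reward(w, [0.9, 0.1]), -1.0, -2.0))
```

With these toy preferences the learned weight on helpfulness should come out positive and the weight on toxicity negative. The `beta` term is the design lever: larger values keep the tuned model closer to the reference model, at the cost of chasing the reward model less aggressively.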
Tags
#alignment #human-feedback #fine-tuning #safety
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners
Maxx University