What is RLHF (Reinforcement Learning from Human Feedback)?

AI Ethics & Safety · Advanced

Definition

A training technique in which human evaluators rate or rank model outputs, and those preferences are used to train a reward model. The reward model then scores new outputs so the LLM can be fine-tuned via reinforcement learning. RLHF is a widely used method for aligning LLMs with human preferences and safety requirements, and is used by OpenAI, Anthropic, and Google.
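To make the reward-model step concrete, here is a minimal sketch in Python. It assumes a pairwise (Bradley-Terry style) preference loss, which is one common choice for this step and is not specified by the definition above. The two-number feature vectors, the toy data in `PAIRS`, and the function names are all hypothetical, chosen only for illustration. A real reward model is a neural network over text, and the final reinforcement learning stage is not shown.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(weights, features):
    """Toy linear reward model: a weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    """Pairwise loss: -log sigmoid(r(chosen) - r(rejected)).

    The loss is small when the human-preferred response scores higher.
    """
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit the weights by plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected), so we step the
            # weights in the direction of (chosen - rejected).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Hypothetical human preference data. Each pair is (chosen, rejected),
# and each response is described by made-up features [helpfulness, toxicity].
PAIRS = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(PAIRS, dim=2)

for chosen, rejected in PAIRS:
    print(
        f"chosen={reward(weights, chosen):+.2f} "
        f"rejected={reward(weights, rejected):+.2f}"
    )
```

After training, the model assigns a higher reward to each preferred response than to its rejected counterpart. In a full RLHF pipeline, a reward signal of this kind would then guide the reinforcement learning update of the LLM itself.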

Keep learning. Keep building.

250+ terms. 5 learning paths. AI maturity assessment. Jargon translator. All free, always.
