Multi-Head Attention

Neural Networks · Advanced

Definition

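The mechanism at the core of transformer models that lets the model jointly attend to information from different representation subspaces at different positions. Several attention 'heads' run in parallel. Each head has its own learned query, key, and value projections, so different heads can learn to focus on different aspects of the input, such as syntactic relations, semantic similarity, or long-range dependencies. The heads' outputs are concatenated and passed through a final linear projection.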
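To make the definition concrete, here is a minimal sketch of multi-head self-attention in plain Python. It assumes randomly initialized weights, no masking, no biases, and the common convention that each head uses a d_model / num_heads slice of the full query, key, and value projections. The helper names (`multi_head_attention`, `matmul`, `random_matrix`, and so on) are hypothetical and for illustration only, not any library's API; a real model would use a tensor library and trained weights.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random, std: float = 1.0) -> Matrix:
    """Hypothetical stand-in for trained weights or token embeddings."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                         w_o: Matrix, num_heads: int) -> Matrix:
    """Self-attention over the rows of x (one row per token), split across num_heads heads."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # Project every token once; each head then reads its own block of columns.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Scaled dot-product attention: softmax(Q_h K_h^T / sqrt(d_head)) V_h.
        scale = 1.0 / math.sqrt(d_head)
        scores = [[s * scale for s in row] for row in matmul(q_h, transpose(k_h))]
        head_outputs.append(matmul(softmax_rows(scores), v_h))
    # Concatenate head outputs along the feature axis, then mix them with W_O.
    concat = [[val for head in head_outputs for val in head[i]] for i in range(len(x))]
    return matmul(concat, w_o)


rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng, std=d_model ** -0.5)
                      for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(len(out), len(out[0]))  # 4 8
```

Slicing one large projection into per-head column blocks is equivalent to giving each head its own smaller projection matrices, which is why implementations commonly compute Q, K, and V with one matrix multiply each and then split the result into heads.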

Enterprise Context

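Multi-head attention is a large part of what lets LLMs model complex relationships in text, because different heads can track different kinds of relationships at the same time. The number of heads is a design trade-off rather than a dial where more is always better. In the standard formulation each head works in a subspace of size d_model / h, so at a fixed model width adding heads divides the same capacity more finely instead of adding projection parameters. What does grow with the head count is the number of separate attention-weight matrices, one per head, each sized by the square of the sequence length.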
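The head-count trade-off can be checked with a little arithmetic. This sketch counts the cost of one self-attention layer under the standard assumption d_head = d_model / num_heads, ignoring biases, masking, and the rest of the transformer block. `attention_cost` is a hypothetical helper, the sizes are illustrative rather than taken from any particular model, and `weight_entries` assumes the attention weights are materialized explicitly (optimized kernels may avoid storing them in full).

```python
def attention_cost(d_model: int, num_heads: int, seq_len: int) -> dict[str, int]:
    """Rough cost of one standard multi-head self-attention layer (biases ignored)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    # W_Q, W_K, W_V: num_heads blocks of shape (d_model, d_head) each, plus W_O (d_model, d_model).
    proj_params = 3 * num_heads * d_model * d_head + d_model * d_model
    # Q_h K_h^T for every head: seq_len * seq_len * d_head multiply-adds per head.
    score_macs = num_heads * seq_len * seq_len * d_head
    # Each head produces its own seq_len x seq_len attention-weight matrix.
    weight_entries = num_heads * seq_len * seq_len
    return {"d_head": d_head, "proj_params": proj_params,
            "score_macs": score_macs, "weight_entries": weight_entries}


for h in (1, 8, 32):
    print(h, attention_cost(d_model=1024, num_heads=h, seq_len=512))
```

With d_model = 1024 and a 512-token sequence, `proj_params` and `score_macs` are identical for 1, 8, and 32 heads, while each head's dimension shrinks from 1024 to 32 and `weight_entries` grows 32-fold from 1 head to 32.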

Keep learning. Keep building.

250+ terms. 5 learning paths. AI maturity assessment. Jargon translator. All free, always.
