Multi-Head Attention
Neural Networks · Advanced
Definition
The mechanism at the core of transformer models that lets the model jointly attend to information from different representation subspaces at different positions. Multiple attention 'heads' run in parallel, each with its own learned query, key, and value projections, so each can learn to focus on a different aspect of the input, such as syntax, semantics, or long-range dependencies. The head outputs are concatenated and passed through a final linear projection.
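The definition above can be sketched in plain Python. This is a minimal illustration and not a production implementation: it assumes self-attention with no masking, biases, or dropout, random rather than trained weights, and the common convention of slicing one full-width projection into equal head-sized chunks. All function names and sizes here are made up for the example.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose the keys
    scores = matmul(q, k_t)               # seq_len x seq_len similarity scores
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights

def split_heads(x, num_heads):
    """Slice the feature axis into num_heads equal-width chunks."""
    d_head = len(x[0]) // num_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(num_heads)]

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x (seq_len x d_model) using num_heads heads."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs, all_weights = [], []
    for qh, kh, vh in zip(split_heads(q, num_heads),
                          split_heads(k, num_heads),
                          split_heads(v, num_heads)):
        out, weights = attention(qh, kh, vh)  # each head attends independently
        head_outputs.append(out)
        all_weights.append(weights)
    # Concatenate the heads along the feature axis, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), all_weights

def random_matrix(rows, cols, rng):
    """Illustrative stand-in for trained weights or input embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2  # hypothetical toy sizes
x = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
output, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Here `output` has the same shape as `x` (4 positions by 8 features), and `attn` holds one 4-by-4 weight matrix per head, each row of which sums to 1. Comparing the matrices in `attn` shows how different heads can weight the same positions differently.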
Enterprise Context
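Multi-head attention is a large part of what lets LLMs model complex relationships in text. Adding heads is not a free improvement: in the standard design the model width is divided among the heads, so more heads means narrower heads at roughly constant compute, while adding heads at a fixed head width increases both parameters and computation.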
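How much extra computation additional heads cost depends on how the head width is chosen. The sketch below counts the projection weights in one attention layer, assuming the common convention that each head has width d_model / num_heads and ignoring biases. The function name and the width of 512 are illustrative only.

```python
def attention_param_count(d_model, num_heads):
    """Weights in the Q, K, V, and output projections (biases ignored)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    per_head_qkv = 3 * d_model * d_head           # one Q, K, V projection per head
    output_proj = (num_heads * d_head) * d_model  # mixes the concatenated heads
    return num_heads * per_head_qkv + output_proj

# Assumed model width of 512; compare a few head counts.
counts = {heads: attention_param_count(512, heads) for heads in (1, 8, 16)}
print(counts)
```

Under this convention the count is the same for every head count, because the heads divide a fixed width among themselves. If the head width were instead held fixed while heads were added, the count would grow in proportion to the number of heads.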
Tags
#transformer #attention #architecture
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners