Multi-Head Attention

Neural Networks · Advanced

Definition

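Multi-head attention is the mechanism at the core of transformer models that lets the model jointly attend to information from different representation subspaces at different positions. Multiple attention 'heads' run in parallel, each with its own learned projections of the queries, keys, and values; their outputs are concatenated and projected back to the model dimension. Individual heads can learn to focus on different aspects of the input, such as syntax, semantics, or long-range dependencies.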
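
The definition can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes self-attention over a single unbatched sequence, no masking, no bias terms, and random weights. The names `multi_head_attention`, `split_heads`, and the `w_q`, `w_k`, `w_v`, `w_o` matrices are chosen here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model); no masking, no bias."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed for all heads at once.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    per_head = weights @ v                                # (heads, seq, d_head)

    # Concatenate the heads back to (seq_len, d_model), then mix them with w_o.
    concat = per_head.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

# Illustrative sizes and random weights (assumptions, not from any real model).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))

out, weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)
```

Each head produces its own (seq_len, seq_len) attention pattern over the same input, which is what allows different heads to specialize. The final multiplication by `w_o` is the step that combines their outputs into one representation.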

Enterprise Context

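Multi-head attention is a large part of what lets LLMs model complex relationships in text. In the standard design the model dimension is split evenly across heads, so adding heads at a fixed model size makes each head narrower and adds little computation. In practice, more heads usually come with a larger model, and that larger model is what costs more compute and memory.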
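
The arithmetic behind the head-count trade-off can be checked directly. The sketch below assumes the common convention that each head gets `d_model / num_heads` dimensions and that bias terms are ignored; `attention_projection_params` is a hypothetical helper written for this example.

```python
def attention_projection_params(d_model, num_heads):
    """Return (d_head, total projection weights) for one attention layer."""
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    # Per head: query, key, and value projections, each d_model x d_head.
    per_head = 3 * d_model * d_head
    # Output projection maps the concatenated heads back to d_model.
    output = (num_heads * d_head) * d_model
    return d_head, num_heads * per_head + output

for h in (1, 8, 16):
    d_head, params = attention_projection_params(512, h)
    print(f"heads={h:2d}  d_head={d_head:3d}  params={params}")
```

Under these assumptions the weight count stays at 4 × d_model² regardless of the number of heads, and only the per-head width changes. That is why head count alone is a weak proxy for compute cost; model dimension and sequence length dominate.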

Tags

#transformer #attention #architecture
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners