Transformer

Neural Networks · Advanced

Definition

The dominant neural network architecture in modern AI, introduced in the paper 'Attention Is All You Need' (Vaswani et al., 2017). Uses self-attention to process all input tokens in parallel rather than one at a time, as recurrent networks do, which lets it capture long-range dependencies directly. Foundation of GPT, Claude, Gemini, and BERT.

Maxx Stacks Context

Key insight: Self-attention lets each token attend to every other token in the sequence at once, so each token's representation reflects its full context.
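
To make the idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: the function names (softmax, matmul, self_attention) and the toy inputs are hypothetical, the projection matrices are fixed to the identity rather than learned, and multi-head attention, masking, and positional encodings are left out.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is (n_tokens, d_model); w_q, w_k, w_v are (d_model, d_k) projections.
    Returns (output, weights); weights[i][j] is how much token i attends to token j.
    """
    q = matmul(x, w_q)  # queries: what each token is looking for
    k = matmul(x, w_k)  # keys: what each token offers
    v = matmul(x, w_v)  # values: the content that gets mixed
    d_k = len(k[0])
    # Every query is scored against every key, so all token pairs are compared.
    scores = matmul(q, transpose(k))
    # Scale by sqrt(d_k), then normalize each row into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of all value vectors.
    output = matmul(weights, v)
    return output, weights

# Toy input: three tokens with 2-dimensional embeddings (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
output, weights = self_attention(x, identity, identity, identity)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of weights sums to 1 and covers every token in the sequence, which is the "relate to all other tokens" behavior described above. In a trained model the three projection matrices are learned, and many such heads run side by side in each layer.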

Tags

#attention #self-attention #GPT #BERT #architecture
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners