
Tokenization

Natural Language Processing · Intermediate

Definition

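The process of breaking text into smaller units called tokens (words, subwords, or characters), each of which is mapped to an integer ID that a model processes numerically. Modern LLMs use subword tokenization, typically byte-pair encoding (BPE) or a unigram language model, often implemented with toolkits such as SentencePiece. As a rule of thumb for English text, one token is roughly 0.75 words; the ratio varies with the tokenizer, the language, and the type of content.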
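To make the BPE idea concrete, here is a minimal sketch of how merge rules can be learned from a tiny corpus and then applied to a new word. It is a toy illustration under stated assumptions, not how any production tokenizer is implemented: the function names (`learn_bpe_merges`, `merge_pair`, `bpe_encode`) and the sample words are hypothetical, and real tokenizers add details such as byte-level fallback, word-boundary markers, and far larger corpora.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of single characters, weighted by frequency.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the pair that was seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged_corpus = Counter()
        for symbols, freq in corpus.items():
            merged_corpus[merge_pair(symbols, best)] += freq
        corpus = merged_corpus
    return merges


def bpe_encode(word, merges):
    """Split `word` into characters, then apply the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, chosen only for illustration.
merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(bpe_encode("lowly", merges))
```

With this corpus the frequent fragment "low" is merged into a single symbol, so an unseen word such as "lowly" is split into a known subword plus leftover characters rather than being treated as entirely unknown.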

Maxx Stacks Context

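Example: 'enterprise AI' → ['enterprise', 'AI'] (word-level) or ['enter', '##prise', 'AI'] (subword-level). The '##' prefix is the WordPiece convention for a piece that continues the preceding piece within the same word; the exact split depends on the tokenizer's learned vocabulary.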
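The example above can be reproduced with a short sketch of greedy longest-match subword splitting in the WordPiece style. The vocabulary `TOY_VOCAB` and the function names are hypothetical, chosen only so that the output matches the example; a real vocabulary is learned from data and contains tens of thousands of entries, and real tokenizers may also normalize case and punctuation first.

```python
# Hypothetical three-entry vocabulary, chosen to reproduce the example above.
TOY_VOCAB = {"enter", "##prise", "AI"}


def word_tokenize(text):
    """Word-level tokenization: split on whitespace."""
    return text.split()


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal continuation
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


def subword_tokenize(text, vocab=TOY_VOCAB):
    """Subword-level tokenization: split on whitespace, then split each word."""
    return [piece for word in text.split() for piece in wordpiece_tokenize(word, vocab)]


print(word_tokenize("enterprise AI"))     # ['enterprise', 'AI']
print(subword_tokenize("enterprise AI"))  # ['enter', '##prise', 'AI']
```

A word with no matching pieces, such as "xyz" under this toy vocabulary, maps to the single unknown token `[UNK]`; keeping such cases rare is one reason subword vocabularies are preferred over pure word-level ones.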

Tags

#preprocessing #tokens #BPE #vocabulary
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners