What is Tokenization?


Natural Language Processing

Tokenization

Natural Language Processing · Intermediate

Definition

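The process of breaking text into smaller units called tokens (words, subwords, or characters), each of which is mapped to an integer ID that a model processes numerically. Modern LLMs use subword tokenization, typically byte-pair encoding (BPE) or a unigram model, often through libraries such as SentencePiece. For English text, a token is roughly 0.75 words on average; the exact ratio varies by tokenizer and language.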
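To make "processes numerically" and the 0.75 words-per-token rule of thumb concrete, here is a minimal Python sketch. The function names (`encode`, `estimate_tokens`) and the tiny ID table are hypothetical and chosen only for illustration; real tokenizers ship their own vocabularies, and the estimate is a rough planning figure, not an exact count.

```python
import math

def encode(tokens, token_to_id, unk_id=0):
    """Map each token string to an integer ID; unknown tokens get unk_id."""
    return [token_to_id.get(tok, unk_id) for tok in tokens]

def estimate_tokens(text, words_per_token=0.75):
    """Rough token estimate from a whitespace word count.

    Assumes the ~0.75 words-per-token average quoted above, which is
    only a rule of thumb and depends on the tokenizer and the language.
    """
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)

# Hypothetical toy vocabulary; ID 0 is reserved for unknown tokens.
TOY_IDS = {"enterprise": 1, "AI": 2}

print(encode(["enterprise", "AI"], TOY_IDS))   # token strings -> integer IDs
print(estimate_tokens("enterprise AI"))        # 2 words / 0.75, rounded up
```

An estimate like this is useful for budgeting context windows or cost before running a real tokenizer; for exact counts, use the tokenizer that belongs to the model in question.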

Maxx Stacks Context

Example: 'enterprise AI' → ['enterprise', 'AI'] (word) or ['enter', '##prise', 'AI'] (subword; the '##' prefix is the WordPiece convention for a piece that continues the previous one).
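The two splits in the example can be reproduced with a short Python sketch. It uses whitespace splitting for the word level and a greedy longest-match-first lookup in the WordPiece style for the subword level. The three-entry vocabulary is a toy assumption made up for this example; production vocabularies are learned from data and hold tens of thousands of entries.

```python
def word_tokenize(text):
    """Word-level tokenization: split on whitespace."""
    return text.split()

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces.

    Pieces after the first carry a '##' prefix. If no vocabulary entry
    matches at some position, the whole word becomes the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces

def subword_tokenize(text, vocab):
    """Split text into words, then split each word into subword pieces."""
    return [p for w in word_tokenize(text) for p in wordpiece_tokenize(w, vocab)]

# Toy vocabulary, assumed purely for this illustration.
TOY_VOCAB = {"enter", "##prise", "AI"}

print(word_tokenize("enterprise AI"))                 # ['enterprise', 'AI']
print(subword_tokenize("enterprise AI", TOY_VOCAB))   # ['enter', '##prise', 'AI']
```

The benefit of the subword approach is visible even in this toy: 'enterprise' does not need its own vocabulary entry, because it can be assembled from smaller pieces the vocabulary already contains.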

Keep learning. Keep building.

250+ terms. 5 learning paths. AI maturity assessment. Jargon translator. All free, always.
