What is Quantization?



Quantization

Large Language Models · Advanced

Definition

Quantization reduces the numerical precision of a model's weights from 32-bit floating point to lower-precision formats (16-bit, 8-bit, 4-bit). Because each weight takes fewer bits, weight storage shrinks roughly in proportion: about 2x at 16-bit, 4x at 8-bit, and 8x at 4-bit. Quantization can also speed up inference, usually with minimal quality loss, which makes it critical for deploying large models on consumer hardware or at scale.
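
To make the definition concrete, here is a minimal sketch of one common scheme: symmetric, per-tensor 8-bit quantization with round-to-nearest. It is an illustration only, not the method of any particular library; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical. Real toolchains typically add per-channel or per-group scales, calibration data, and packed storage.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage for the weights alone: 4 bytes each at 32-bit, 1 byte each at 8-bit.
fp32_bytes = len(weights) * 4
int8_bytes = len(weights) * 1

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
print("bytes (fp32 vs int8):", fp32_bytes, "vs", int8_bytes)
```

The stored model keeps the integers plus the scale, and the "minimal quality loss" in the definition corresponds to the small round-trip error between `weights` and `restored`. Lower bit widths use fewer integer levels, so the step size and the error grow as the memory savings increase.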

Keep learning. Keep building.

