Quantization

Large Language Models · Advanced

Definition

Quantization reduces the numerical precision of model weights from 32-bit floating point to lower-precision formats (16-bit, 8-bit, 4-bit). Because memory use scales with the number of bits per weight, this cuts a model's memory footprint substantially and can speed up inference, typically with only a small loss in quality. That makes it critical for deploying large models on consumer hardware or at scale.
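To make the definition concrete, here is a minimal sketch in plain Python of one common scheme, symmetric per-tensor 8-bit quantization, together with the memory arithmetic behind the footprint claim. The function names (`quantize_int8`, `dequantize`, `weight_memory_gb`), the sample weights, and the 7-billion-parameter model size are illustrative assumptions, not part of any particular library or model. Production toolkits generally use finer-grained scales and other refinements.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def weight_memory_gb(num_params, bits_per_weight):
    """Raw weight storage in GB (10^9 bytes), ignoring scales and other metadata."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Hypothetical 7-billion-parameter model at each precision named above.
footprint = {bits: weight_memory_gb(7_000_000_000, bits) for bits in (32, 16, 8, 4)}

if __name__ == "__main__":
    print("codes:", codes)
    print("max round-trip error:", max_error)
    for bits, gb in footprint.items():
        print(f"{bits}-bit weights: {gb} GB")
```

In this sketch the round-trip error for each weight is bounded by half the scale, which is where the "small quality loss" comes from. The footprint falls in direct proportion to the bit width, so the assumed 7-billion-parameter model needs an eighth of the weight storage at 4 bits that it needs at 32 bits.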

Tags

#compression #efficiency #inference #hardware #memory
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners