KV Cache
Key-Value Cache
Large Language Models · Advanced
Definition
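An inference optimization that stores the key and value attention vectors computed for previous tokens so they are not recomputed each time a new token is generated. Without a cache, every decoding step re-projects all earlier tokens into keys and values; with one, only the newest token is projected and the rest are read from memory. This sharply reduces the compute cost of autoregressive generation, in exchange for memory that grows with sequence length, and is essential for real-time LLM applications.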
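The idea can be illustrated with a minimal single-head sketch in plain Python. It is not taken from any particular library: the names `matvec`, `attend`, `decode_with_cache`, and `decode_without_cache` are hypothetical, and the weight matrices `Wq`, `Wk`, `Wv` are assumed to be simple lists of rows. Both decoders return the same attention outputs, along with a count of how many key/value projections each performed.

```python
import math

def matvec(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in rows, d_out columns)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values seen so far."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def decode_with_cache(xs, Wq, Wk, Wv):
    """Project each token to K/V once, append to the cache, and reuse it afterwards."""
    cached_keys, cached_values = [], []  # the KV cache (hypothetical layout)
    outputs, kv_projections = [], 0
    for x in xs:
        cached_keys.append(matvec(x, Wk))
        cached_values.append(matvec(x, Wv))
        kv_projections += 1
        outputs.append(attend(matvec(x, Wq), cached_keys, cached_values))
    return outputs, kv_projections

def decode_without_cache(xs, Wq, Wk, Wv):
    """Recompute K/V for the whole prefix at every step (the cost the cache avoids)."""
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(x, Wk) for x in prefix]
        values = [matvec(x, Wv) for x in prefix]
        kv_projections += len(prefix)
        outputs.append(attend(matvec(xs[t], Wq), keys, values))
    return outputs, kv_projections
```

For a sequence of n tokens, the cached decoder performs n key/value projections while the uncached one performs n(n+1)/2, which is where the savings come from. Production implementations typically keep one such cache per layer and per attention head, usually as preallocated tensors rather than Python lists.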
Tags
#inference #optimization #attention #memory #speed
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners