KV Cache

Key-Value Cache
Large Language Models · Advanced

Definition

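An inference optimization that stores the key and value attention vectors computed for previous tokens, so they need not be recomputed each time a new token is generated. With a KV cache, each decoding step projects only the newest token and attends over the stored keys and values, instead of re-processing the entire sequence. This sharply reduces the computational cost of autoregressive generation, at the price of memory that grows linearly with sequence length, and is essential for real-time LLM applications.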
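To make the definition concrete, here is a minimal sketch of single-head attention decoding with and without a cache, written in plain Python. The names `matvec`, `attend`, `decode_with_cache`, and `decode_without_cache` are hypothetical helpers for illustration, the weights are random, and real models add multiple heads, layers, and batching. It is a toy under those assumptions, not how any particular library implements caching.

```python
import math
import random


def matvec(x, W):
    """Project vector x (length d_in) with matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over all stored keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def decode_with_cache(xs, Wq, Wk, Wv):
    """Each step projects only the newest token and appends it to the cache."""
    keys, values, outs = [], [], []
    kv_projections = 0  # counts tokens projected into key/value space
    for x in xs:
        keys.append(matvec(x, Wk))
        values.append(matvec(x, Wv))
        kv_projections += 1
        outs.append(attend(matvec(x, Wq), keys, values))
    return outs, kv_projections


def decode_without_cache(xs, Wq, Wk, Wv):
    """Each step recomputes keys and values for the whole prefix."""
    outs = []
    kv_projections = 0
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        keys = [matvec(x, Wk) for x in prefix]
        values = [matvec(x, Wv) for x in prefix]
        kv_projections += t
        outs.append(attend(matvec(prefix[-1], Wq), keys, values))
    return outs, kv_projections


# Hypothetical toy setup: 5 tokens with 4-dimensional embeddings.
rng = random.Random(0)
d, n = 4, 5
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
Wq, Wk, Wv = ([[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(3))

cached_out, cached_steps = decode_with_cache(xs, Wq, Wk, Wv)
uncached_out, uncached_steps = decode_without_cache(xs, Wq, Wk, Wv)
```

Both functions produce the same attention outputs, but for these 5 tokens the cached version performs 5 key/value projection steps while the uncached version performs 1 + 2 + 3 + 4 + 5 = 15. The gap grows quadratically with sequence length, which is where the speedup comes from; the `keys` and `values` lists that persist across steps are the memory cost.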

Tags

#inference #optimization #attention #memory #speed
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners