Maxx StacksUniversityWikiLatency
AI Ops & Deployment

Latency

AI Ops & Deployment· Foundational

Definition

The time delay between submitting a request to an AI model and receiving the response. Critical performance metric for production AI systems. Impacted by model size, infrastructure, and generation length. Key sub-metrics: Time-to-first-token (TTFT) and tokens-per-second (TPS).

Tags

#performance#TTFT#TPS#response-time
MS
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners
Maxx University

Keep learning. Keep building.

250+ terms. 5 learning paths. AI maturity assessment. Jargon translator. All free, always.

    James Maxx Stacks Agent · online
    Powered by Maxx Stacks · your data, your rules