AI Ops & Deployment
Latency
AI Ops & Deployment· Foundational
Definition
The time delay between submitting a request to an AI model and receiving the response. Critical performance metric for production AI systems. Impacted by model size, infrastructure, and generation length. Key sub-metrics: Time-to-first-token (TTFT) and tokens-per-second (TPS).
Tags
#performance#TTFT#TPS#response-time
MS
Maxx Stacks Editorial
Reviewed by enterprise AI practitioners
Maxx University
Keep learning. Keep building.
250+ terms. 5 learning paths. AI maturity assessment. Jargon translator. All free, always.