The time delay between sending a request and receiving a response from an AI system. Critical for real-time applications.
Latency in AI systems measures the time from when a request is sent to when a response is fully received. For user-facing applications, latency directly impacts user experience and satisfaction.
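As a minimal sketch of the definition above, latency for a single blocking request can be taken as the wall-clock time around the call. The names `measure_latency` and `fake_model` are hypothetical and used only for illustration; `fake_model` sleeps to stand in for a real AI service.

```python
import time
from typing import Any, Callable, Tuple


def measure_latency(send_request: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Return (response, seconds elapsed) for one blocking request."""
    start = time.perf_counter()               # high-resolution clock suited to timing
    response = send_request(*args, **kwargs)  # returns once the full response is received
    elapsed = time.perf_counter() - start
    return response, elapsed


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real AI call; sleeps to simulate work."""
    time.sleep(0.05)
    return f"echo: {prompt}"


if __name__ == "__main__":
    reply, seconds = measure_latency(fake_model, "hello")
    print(f"{reply!r} took {seconds * 1000:.0f} ms")
```

A single measurement is noisy, so in practice it is usually worth repeating the call several times and looking at the spread rather than one number.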
Latency components:
Factors affecting latency:
Latency benchmarks:
Optimisation strategies:
Users typically expect responses in under 2 seconds. High latency frustrates customers and reduces AI adoption. Reduce it with caching and careful model selection.
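The caching idea can be sketched as a small wrapper that stores answers for prompts it has already seen, so repeated questions skip the slow model call. This is an illustration under stated assumptions, not a production design: `CachedModel` and the `call_model` argument are hypothetical names, and treating prompts as equal after lower-casing and trimming whitespace is an assumption that may not suit every application.

```python
from typing import Callable, Dict


class CachedModel:
    """Wraps a model call and reuses answers for prompts seen before."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model   # hypothetical slow model call
        self._cache: Dict[str, str] = {}
        self.misses = 0                 # how many times the model was actually called

    def ask(self, prompt: str) -> str:
        # Assumption: case and surrounding whitespace do not change the question.
        key = prompt.strip().lower()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._call_model(prompt)
        return self._cache[key]
```

An exact-match cache like this only helps when the same question recurs, for example frequently asked questions in a support chatbot. It also returns the stored answer indefinitely, so a real deployment would likely need some form of expiry.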
We optimise latency for Australian business deployments, using appropriate models, caching, and streaming to ensure responsive user experiences.
"A chatbot with 500ms latency feels instant; 5 seconds feels broken. The difference significantly impacts user satisfaction and adoption."