Using a trained model to make predictions or generate outputs on new data. This is the "runtime" phase of AI, as opposed to training.
Inference is the process of using a trained AI model to produce outputs from new inputs. While training teaches the model, inference puts that learning to work on real-world data.
The inference process: the trained model receives a new input, converts it into the numerical representation it expects, runs a forward pass through its learned parameters, and returns an output such as a prediction, classification, or generated text. A minimal sketch follows.
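A minimal sketch of that step, assuming a PyTorch model (the same pattern applies in any framework): the model here is a stand-in for one whose trained weights you would normally load from disk.

```python
import torch
import torch.nn as nn

# Stand-in for a trained model; in practice you would load saved weights.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()  # disable training-only behaviour such as dropout

new_input = torch.tensor([[0.2, 1.4, -0.3, 0.7]])  # one new, unseen example

with torch.no_grad():          # no gradients needed: we are not learning here
    logits = model(new_input)  # the forward pass: this is inference
    prediction = logits.argmax(dim=1)

print(prediction.item())
```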
Key inference considerations: latency (how quickly each response comes back), throughput (how many requests the system can serve at once), cost per request, and output quality. A simple latency measurement is sketched below.
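A quick sketch of tracking the latency consideration, timing each call so you can report median or p95 response times; the `infer` function is a hypothetical placeholder for your actual model call.

```python
import time
import statistics

def infer(prompt: str) -> str:
    time.sleep(0.05)  # simulate a 50 ms model call
    return "ok"

latencies = []
for prompt in ["hello", "pricing?", "opening hours?"]:
    start = time.perf_counter()
    infer(prompt)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

print(f"median latency: {statistics.median(latencies):.1f} ms")
```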
Inference optimisation techniques: quantisation (running the model at lower numerical precision), batching multiple requests together, caching repeated responses, and distilling large models into smaller ones. A caching sketch follows.
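A minimal sketch of response caching, one of the techniques above: identical requests skip the model entirely, so you pay the inference cost only once. `call_model` is a hypothetical stand-in for whatever inference call your stack uses.

```python
import hashlib

_cache: dict[str, str] = {}

def call_model(prompt: str) -> str:
    # Placeholder for an expensive inference call (local model or hosted API).
    return f"response to: {prompt}"

def cached_inference(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # pay the inference cost only once
    return _cache[key]

print(cached_inference("What are your opening hours?"))  # runs the model
print(cached_inference("What are your opening hours?"))  # served from cache
```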
Inference costs are your ongoing AI expenses. Optimising inference speed and efficiency directly impacts operational costs and user experience.
We help Australian businesses optimise inference costs and performance. Smart model selection and caching strategies can reduce AI costs by 50-80% while maintaining quality.
"When a customer asks your chatbot a question, the model performs inference to generate a response in real-time."