Online inference is more challenging than batch inference. Why? Because of the latency constraints it places on our systems.
Online inference is about responding to an end user's request with a prediction at low latency.
- What to optimize: latency
- End user: usually interacts with the model directly through an API
- Validation: offline, and online via A/B testing
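As a concrete illustration, a minimal online inference endpoint could look like the sketch below. It assumes FastAPI as the web framework and a pre-trained scikit-learn model saved as `model.joblib`; both are illustrative assumptions, not prescriptions.

```python
# Minimal online inference endpoint (sketch, not production code).
# Assumptions: a pre-trained scikit-learn model serialized to model.joblib,
# and FastAPI + uvicorn installed.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # load once at startup, not per request

class PredictRequest(BaseModel):
    features: List[float]  # a single feature vector sent by the end user

@app.post("/predict")
def predict(request: PredictRequest):
    # Low latency: the model is already in memory, so each request only pays
    # for input validation and a single forward pass.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()[0]}
```

Run it locally with `uvicorn main:app --reload` and POST a JSON body like `{"features": [1.0, 2.0, 3.0]}` to `/predict`; the same request/response pattern carries over to managed real-time endpoints on Azure or AWS.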
Learn the general MLOps concepts first.
Next, learn how to build and run pipelines for online serving:
on Azure cloud:
- Orchestrate machine learning with pipelines on Azure
- Create Azure Machine Learning Pipeline
- Deploy real-time machine learning services with Azure Machine Learning
- Create a real-time inferencing service on Azure
- Deploy a machine learning model to Azure Functions
on AWS:
- Operationalizing machine learning pipeline on AWS
- Safe MLOps deployment pipeline on AWS
- AWS Lambda ML Model Deployment
overall:
This workshop is a work in progress (WIP).
It will cover a real-life use case: deploying a machine learning model to Azure Functions with the Python runtime, along with troubleshooting.
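To give a flavor of the use case, a minimal HTTP-triggered Azure Function for model inference could look like the sketch below. It assumes the Azure Functions Python v2 programming model and a scikit-learn model shipped with the function app as `model.joblib`; the route and payload shape are illustrative, not the workshop's final code.

```python
# function_app.py -- sketch of an HTTP-triggered Azure Function serving predictions.
# Assumptions: Azure Functions Python v2 programming model, scikit-learn and
# joblib listed in requirements.txt, and model.joblib deployed with the app.
import json
from pathlib import Path

import joblib
import azure.functions as func

app = func.FunctionApp()

# Load the model once per worker process so individual requests stay fast.
MODEL_PATH = Path(__file__).parent / "model.joblib"
model = joblib.load(MODEL_PATH)

@app.route(route="predict", auth_level=func.AuthLevel.FUNCTION)
def predict(req: func.HttpRequest) -> func.HttpResponse:
    try:
        payload = req.get_json()
        features = payload["features"]  # e.g. {"features": [1.0, 2.0, 3.0]}
    except (ValueError, KeyError):
        return func.HttpResponse(
            "Expected a JSON body with a 'features' list.", status_code=400
        )

    prediction = model.predict([features])
    body = json.dumps({"prediction": prediction.tolist()[0]})
    return func.HttpResponse(body, mimetype="application/json", status_code=200)
```

Deploying and troubleshooting a function like this is the kind of use case the workshop will walk through.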