Online inference is more challenging than batch inference. Why? Because of the latency constraints it places on our systems.
Online inference is about responding to an end user's request with a prediction at low latency.
- What to optimize: latency
- End user: usually interacts with the model directly through an API
- Validation: offline, and online via A/B testing
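As a concrete illustration, a minimal online inference endpoint could look like the sketch below. It assumes FastAPI as the web framework and a pre-trained scikit-learn model saved as `model.joblib`; both are illustrative assumptions, not prescriptions.

```python
# Minimal online inference endpoint (sketch, not production code).
# Assumptions: a pre-trained scikit-learn model serialized to model.joblib,
# and FastAPI + uvicorn installed.
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # load once at startup, not per request

class PredictRequest(BaseModel):
    features: List[float]  # a single feature vector sent by the end user

@app.post("/predict")
def predict(request: PredictRequest):
    # Low latency: the model is already in memory, so each request only pays
    # for input validation and a single forward pass.
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()[0]}
```

Run it locally with `uvicorn main:app --reload` and POST a JSON body like `{"features": [1.0, 2.0, 3.0]}` to `/predict`; the same request/response pattern carries over to managed real-time endpoints on Azure or AWS.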
Learn the general MLOps concepts first.
Next, learn how to build and run pipelines for online serving:
on Azure cloud:
- Orchestrate machine learning with pipelines on Azure
- Create Azure Machine Learning Pipeline
- Deploy real-time machine learning services with Azure Machine Learning
- Create a real-time inferencing service on Azure
- Deploy a machine learning model to Azure Functions
on AWS:
- Operationalizing machine learning pipeline on AWS
- Safe MLOps deployment pipeline on AWS
- AWS Lambda ML Model Deployment
overall:
This workshop is a work in progress (WIP).
It will cover a real-life use case: deploying a machine learning model to Azure Functions with the Python runtime, along with troubleshooting.
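To give a flavor of the use case, a minimal HTTP-triggered Azure Function for model inference could look like the sketch below. It assumes the Azure Functions Python v2 programming model and a scikit-learn model shipped with the function app as `model.joblib`; the route and payload shape are illustrative, not the workshop's final code.

```python
# function_app.py -- sketch of an HTTP-triggered Azure Function serving predictions.
# Assumptions: Azure Functions Python v2 programming model, scikit-learn and
# joblib listed in requirements.txt, and model.joblib deployed with the app.
import json
from pathlib import Path

import joblib
import azure.functions as func

app = func.FunctionApp()

# Load the model once per worker process so individual requests stay fast.
MODEL_PATH = Path(__file__).parent / "model.joblib"
model = joblib.load(MODEL_PATH)

@app.route(route="predict", auth_level=func.AuthLevel.FUNCTION)
def predict(req: func.HttpRequest) -> func.HttpResponse:
    try:
        payload = req.get_json()
        features = payload["features"]  # e.g. {"features": [1.0, 2.0, 3.0]}
    except (ValueError, KeyError):
        return func.HttpResponse(
            "Expected a JSON body with a 'features' list.", status_code=400
        )

    prediction = model.predict([features])
    body = json.dumps({"prediction": prediction.tolist()[0]})
    return func.HttpResponse(body, mimetype="application/json", status_code=200)
```

Deploying and troubleshooting a function like this is the kind of use case the workshop will walk through.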