In most use cases, a RESTful API is the preferred way to deploy machine learning models. In this section, I describe some popular approaches.
Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.
GraphPipe is a protocol and collection of software designed to simplify machine learning model deployment and decouple it from framework-specific model implementations.
TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
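The systems above share one basic pattern: a server exposes the model behind a network endpoint, and remote clients send inputs and receive predictions. The following is a minimal sketch of that pattern using only the Python standard library. It is not the API of Clipper, GraphPipe, or TRTIS; the `/predict` path, the `features` and `prediction` JSON fields, and the `predict` function (a stand-in that just sums its inputs) are all hypothetical names chosen for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Hypothetical stand-in for a real model: returns the sum of the inputs.
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the (assumed) /predict endpoint is served.
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "invalid request body")
            return
        body = json.dumps({"prediction": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def request_prediction(url, features):
    # Client side: POST the inputs as JSON and read back the prediction.
    data = json.dumps({"features": features}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prediction"]


def demo():
    # Port 0 asks the OS for any free port, so the demo never collides.
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/predict"
        return request_prediction(url, [1.0, 2.0, 3.5])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A dedicated serving system adds what this sketch leaves out, such as batching, model versioning, GPU scheduling, and a binary or gRPC transport, but the request/response shape seen by the client stays much the same.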
See the "Upgrade Django Built-in Server" section of README.md.