Machine Learning in Production

Serving ML/DL Models

In most use cases, a RESTful API is the preferred way to deploy machine learning models. This section describes some popular approaches.
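To make the RESTful approach concrete, here is a minimal sketch of a prediction endpoint built only on the Python standard library. Everything in it is an assumption made for illustration: the `/predict` path, the `{"features": [...]}` payload shape, and the `predict` function, which is a fixed linear scorer standing in for a real trained model. The systems described below provide far more (batching, model management, GPU scheduling); this only shows the request/response shape.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Placeholder model (hypothetical): a fixed linear scorer."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one (assumed) endpoint is exposed: POST /predict
        if self.path != "/predict":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected JSON with a features list")
            return
        body = json.dumps({"score": score}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_predict(port, features):
    """Client side: POST the features as JSON and return the decoded reply."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"features": features}),
            headers={"Content-Type": "application/json"},
        )
        resp = conn.getresponse()
        data = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"prediction failed with HTTP {resp.status}")
        return json.loads(data)
    finally:
        conn.close()
```

A caller would start the server with `server = start_server()`, read the chosen port from `server.server_address[1]`, and send feature vectors through `call_predict`. In a real deployment the placeholder `predict` would wrap a loaded model, and the development server would sit behind a production-grade web server.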

Clipper

Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.

GraphPipe

GraphPipe is a protocol and collection of software designed to simplify machine learning model deployment and decouple it from framework-specific model implementations.

NVIDIA TensorRT Inference Server

TensorRT Inference Server (TRTIS) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.

Use Nginx Proxy to Enable Multi-services

See the "Upgrade Django Built-in Server" section in README.md.
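As a conceptual illustration of what a reverse proxy does when it fronts several services, the sketch below picks an upstream by longest matching path prefix, which is how Nginx chooses among plain prefix `location` blocks. It is not Nginx configuration and not the setup described in README.md: the route table, the paths, and the ports are all hypothetical.

```python
# Hypothetical route table: URL path prefix -> upstream service address.
ROUTES = {
    "/": "127.0.0.1:8000",              # e.g. the Django web application
    "/api/predict/": "127.0.0.1:8001",  # e.g. a model-serving backend
}


def pick_upstream(path, routes=ROUTES):
    """Return the upstream whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None  # no route applies to this path
    return routes[max(matches, key=len)]
```

With this table, a request for `/api/predict/mnist` goes to the model-serving backend, while any other path starting with `/` falls through to the web application.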
