Deploying a Sentiment Analysis NLP Model as a Web API with Kubernetes

Motivation

This project runs through the production phase of the MLOps life cycle by containerizing a web API that serves a machine learning model and deploying it to Azure Kubernetes Service (AKS).

Application Features & Deployment Infrastructure

This application was built with FastAPI, with model inputs and outputs defined as Pydantic models. Outputs are also cached in Redis, so repeat inputs are served without re-running the model.
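As a rough sketch of this pattern, the snippet below defines Pydantic request/response models and a cache-then-predict helper. The names (`SentimentRequest`, `cached_predict`) are illustrative, not the repo's actual identifiers, and a plain in-memory dict stands in for the Redis client:

```python
# Hypothetical sketch of Pydantic models plus output caching for repeat inputs.
# A dict stands in for redis.Redis; the real app would use the Redis client instead.
import hashlib
import json
from typing import Callable

from pydantic import BaseModel


class SentimentRequest(BaseModel):
    text: list[str]


class SentimentOutput(BaseModel):
    label: str
    score: float


_cache: dict[str, list[dict]] = {}  # stand-in for the Redis cache


def cached_predict(
    request: SentimentRequest,
    model: Callable[[list[str]], list[dict]],
) -> list[dict]:
    """Return cached predictions for repeat inputs; otherwise run the model."""
    # Hash the validated input to form a stable cache key.
    key = hashlib.sha256(json.dumps(request.text).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = model(request.text)
    return _cache[key]
```

With Redis, the same lookup would be a `GET`/`SET` on the hashed key, typically with a TTL so stale predictions expire.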

The application endpoints were tested locally with pytest.

The application was containerized with a Dockerfile, using python:3.11-slim as the base image.

The application's deployment was first tested locally with minikube. The deployment infrastructure is as follows:

Application Deployment

After deploying to AKS, the application was load tested with k6 and its traffic was monitored with Grafana.

How to Deploy the Application

Requirements

This app requires Azure authentication and access to the W255 organization on Azure.

AKS Deployment

  1. Authenticate to Azure with your [email protected] email.
az login --tenant berkeleydatasciw255.onmicrosoft.com
  2. Authenticate to the AKS cluster.
az aks get-credentials --name w255-aks --resource-group w255 --overwrite-existing
  3. Set the context and namespace for the AKS cluster.
kubectl config use-context w255-aks

kubectl config set-context --current --namespace=cynthiaxu04
  4. Run the build-push.sh script to:
  • set the image prefix, i.e. the DNS-normalized form of the [email protected] email
  • set the image name, project
  • set the ACR domain name, w255mids.azurecr.io
  • get the latest git commit hash, which is used as the image tag
  • build and push the latest Docker image to ACR
  • pull the latest Docker image from ACR based on the image tag
  5. Deploy to the AKS cluster.
kubectl apply -k .k8s/overlays/prod
  6. Wait approximately 45 seconds for the pods to launch. Check that they are running and ready with the command:
kubectl get deployments

Running the Application

  1. Once deployed, copy and paste the given URL into your browser of choice:
https://cynthiaxu04.w255mids.com
  2. Access the application endpoints by appending the following paths to the URL:
https://cynthiaxu04.w255mids.com/project/health

https://cynthiaxu04.w255mids.com/project/bulk-predict
  3. To submit input to the sentiment analysis model, open a new terminal window and use the command:
curl -X 'POST' \
  'https://cynthiaxu04.w255mids.com/project/bulk-predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": ["I am awesome", "This stinks"]
}'

The expected output is:

[
  [
    {"label": "POSITIVE", "score": 0.9964176416397095},
    {"label": "NEGATIVE", "score": 0.003582375356927514}
  ]
]

Any invalid inputs will generate an error.
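The rejection of invalid inputs follows from the Pydantic request model: a payload that does not parse raises a validation error, which FastAPI returns automatically as a 422 response. A sketch, assuming a request model like the hypothetical one below:

```python
# Sketch of the validation behavior behind the error responses.
# SentimentRequest is illustrative; the repo's actual model may differ.
from pydantic import BaseModel, ValidationError


class SentimentRequest(BaseModel):
    text: list[str]


def is_valid(payload: dict) -> bool:
    """Return True if the payload parses against the request model."""
    try:
        SentimentRequest(**payload)
        return True
    except ValidationError:
        return False
```

For example, `{"text": "not a list"}` or a payload missing the `text` field fails validation, while `{"text": ["I am awesome", "This stinks"]}` parses cleanly.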

Monitoring Application Traffic

  1. To test the application's performance, run the load test:
k6 run -e NAMESPACE=cynthiaxu04 load.js
  2. Access Grafana with the command:
kubectl port-forward -n prometheus svc/grafana 3000:3000
  3. With limits of cpu=1100m and memory=1Gi, the Grafana metrics for service and workload are:

Istio Service Workload Dashboard

Istio Deployment Workload Dashboard

P50 and P90 latencies are under 2 seconds. P99 is usually under 3 seconds, with occasional spikes to 4 seconds.

k6 Load Test results

A sustained rate of 25 requests/s was also achieved.