This project runs through the production phase of the MLOps life cycle by containerizing a web API that serves a machine learning model and deploying it to Azure Kubernetes Service (AKS).
The application was built with FastAPI, and model inputs and outputs are defined with Pydantic. Outputs are also cached in Redis for repeat inputs.
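A minimal sketch of this API layer is shown below. The module structure, field names, Redis connection details, and the model call are illustrative assumptions rather than the exact implementation (the `/project` prefix seen in the deployed URLs is assumed to be added by the cluster's routing):

```python
import hashlib
import json
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from redis import asyncio as aioredis

app = FastAPI()
cache = aioredis.from_url("redis://localhost:6379")  # assumed Redis address; the real one is cluster-specific


class SentimentRequest(BaseModel):
    # Pydantic model validating the request body: {"text": ["...", "..."]}
    text: List[str]


def run_model(texts: List[str]) -> list:
    # Stand-in for the actual sentiment model call
    return [[{"label": "POSITIVE", "score": 0.5}] for _ in texts]


@app.get("/health")
async def health():
    return {"status": "healthy"}


@app.post("/bulk-predict")
async def bulk_predict(request: SentimentRequest):
    # Key the cache on the input texts so repeat inputs are served from Redis
    key = hashlib.sha256(json.dumps(request.text).encode()).hexdigest()
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)
    predictions = run_model(request.text)
    await cache.set(key, json.dumps(predictions))
    return predictions
```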
The application endpoints were tested locally with pytest.
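As an illustration, such tests might look like the following (assuming the app object is importable from `src.main`; the actual module path and assertions in the test suite may differ):

```python
from fastapi.testclient import TestClient

from src.main import app  # assumed import path for the FastAPI app

client = TestClient(app)


def test_health_returns_200():
    response = client.get("/health")
    assert response.status_code == 200


def test_bulk_predict_returns_label_and_score_per_input():
    response = client.post(
        "/bulk-predict", json={"text": ["I am awesome", "This stinks"]}
    )
    assert response.status_code == 200
    for prediction in response.json():
        for entry in prediction:
            assert {"label", "score"} <= entry.keys()
```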
The application was containerized with a Dockerfile using `python:3.11-slim` as the base image.
The application's deployment was first tested locally with minikube before being deployed to AKS. After deploying to AKS, load testing of the application was performed with k6 and the application's traffic was monitored with Grafana.
This app requires Azure authentication and access to the W255 organization on Azure.
- Authenticate to Azure with your [email protected] email:

  ```bash
  az login --tenant berkeleydatasciw255.onmicrosoft.com
  ```
- Authenticate to the AKS cluster:

  ```bash
  az aks get-credentials --name w255-aks --resource-group w255 --overwrite-existing
  ```
- Set the kubectl context and namespace for the AKS cluster:

  ```bash
  kubectl config use-context w255-aks
  kubectl config set-context --current --namespace=cynthiaxu04
  ```
- Run the `build-push.sh` script to:
  - set the image prefix, i.e. the DNS-normalized form of the [email protected] email
  - set the image name, `project`
  - set the ACR domain name, `w255mids.azurecr.io`
  - get the latest git commit hash, which is used as the image tag
  - build and push the latest Docker image to ACR
  - pull the latest Docker image from ACR based on the image tag
- Deploy to the AKS cluster:

  ```bash
  kubectl apply -k .k8s/overlays/prod
  ```
- Wait approximately 45 seconds for the pods to launch. Check that they are running and ready with:

  ```bash
  kubectl get deployments
  ```
- Once deployed, copy and paste the given URL into your browser of choice:
https://cynthiaxu04.w255mids.com
- Access the application endpoints at the following URLs:
https://cynthiaxu04.w255mids.com/project/health
https://cynthiaxu04.w255mids.com/project/bulk-predict
- To send input to the sentiment analysis model, open a new terminal window and use the following command (a Python equivalent is sketched after the expected output below):

  ```bash
  curl -X 'POST' \
    'https://cynthiaxu04.w255mids.com/project/bulk-predict' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "text": ["I am awesome", "This stinks"]
  }'
  ```
The expected output is:

```json
[
  [
    {"label": "POSITIVE", "score": 0.9964176416397095},
    {"label": "NEGATIVE", "score": 0.003582375356927514}
  ]
]
```
Any invalid inputs will generate an error.
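The same request can also be made from Python; this is only a convenience sketch and assumes the `requests` package is installed and the service is reachable at the URL above:

```python
import requests

URL = "https://cynthiaxu04.w255mids.com/project/bulk-predict"
payload = {"text": ["I am awesome", "This stinks"]}

response = requests.post(URL, json=payload, timeout=30)
response.raise_for_status()

# Each inner list holds the label/score pairs for one input sentence
for prediction in response.json():
    print(prediction)
```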
- To test the application's performance, run the load test:

  ```bash
  k6 run -e NAMESPACE=cynthiaxu04 load.js
  ```
- Access Grafana with the command:

  ```bash
  kubectl port-forward -n prometheus svc/grafana 3000:3000
  ```
- With resource limits of cpu=1100m and memory=1Gi, the Grafana metrics for the service and workload show P50 and P90 latencies under 2 seconds, and a P99 latency usually under 3 seconds with occasional spikes to 4 seconds. A throughput of 25 requests/s was also achieved.