Daya #2

Open
wants to merge 9 commits into
base: master
16 changes: 12 additions & 4 deletions .github/workflows/cicd.yml
@@ -1,10 +1,13 @@
name: ci-cd

on: pull_request, push
# run the action on pull_requests and pushes
on: [pull_request, push]

jobs:
# first job to test the application using pytest
build:
runs-on: ubuntu-latest
runs-on: ubuntu-latest # choose the OS for running the action
# define the individual sequential steps to be run
steps:
- name: Checkout the repository
uses: actions/checkout@v2
@@ -18,11 +21,16 @@ jobs:
- name: Run pytest
run: |
pytest


# second job to zip the codebase and upload it as an artifact when build succeeds
upload_zip:
runs-on: ubuntu-latest
runs-on: ubuntu-latest # choose the OS for running the action
needs: build

# only run this action for pushes
if: ${{ github.event_name == 'push' }}

# define the individual sequential steps to be run
steps:
- name: Checkout the repository
uses: actions/checkout@v2
20 changes: 20 additions & 0 deletions README.md
@@ -0,0 +1,20 @@
# ML-Ops Demo/Assignment

This repository demonstrates ML-Ops with a `FastAPI` application that predicts the flower class from the Iris dataset (https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html).

## Running Instructions
- Create a fork of the repo using the `fork` button.
- Clone your fork using `git clone https://www.github.com/<your-username>/mlops-iris.git`
- Install dependencies using `pip3 install -r requirements.txt`
- Run application using `python3 main.py`
- Run tests using `pytest`

## CI/CD
- `build` (test) for all the pull requests
- `build` (test) and `upload_zip` for all pushes

## Assignment Tasks
1. Change this README to add your name here: <Dayakar Kodirekka>. Add and commit changes to a new branch and create a pull request ONLY TO YOUR OWN FORK to see the CI/CD build happening. If the build succeeds, merge the pull request with master and see the CI/CD `upload_zip` take place.
2. Add 2 more unit tests of your choice to `test_app.py` and make sure they are passing.
3. Add one more classifier to startup and use only the one with better accuracy.
4. Add the attribute `timestamp` to the response and return the current time with it.
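
Task 3 asks for only the classifier with the better accuracy to be used. A minimal sketch of that selection step follows, assuming each candidate exposes scikit-learn style `fit` and `predict` methods. The names `accuracy`, `select_best_model`, `MajorityClassifier`, and `NearestNeighbourClassifier` are hypothetical and not part of this repository; the two toy classifiers are stand-ins so the example runs without any dependencies.

```python
from typing import Any, Dict, Tuple


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best_model(
    candidates: Dict[str, Any], X_train, y_train, X_test, y_test
) -> Tuple[str, Any, Dict[str, float]]:
    """Fit every candidate on the same split and return the most accurate one.

    Ties go to the candidate listed first in `candidates`.
    """
    scores: Dict[str, float] = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, candidates[best_name], scores


class MajorityClassifier:
    """Hypothetical stand-in: always predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbourClassifier:
    """Hypothetical stand-in: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(self.X_[i], row))]
            for row in X
        ]


# Toy one-feature data, purely illustrative.
best_name, best_model, scores = select_best_model(
    {"majority": MajorityClassifier(), "nearest_neighbour": NearestNeighbourClassifier()},
    X_train=[[1.0], [1.2], [3.0], [3.1], [3.3]],
    y_train=[0, 0, 1, 1, 1],
    X_test=[[1.1], [3.2]],
    y_test=[0, 1],
)
print(best_name, scores)
```

In this project the candidates would presumably be `{"gaussian_nb": GaussianNB(), "random_forest": RandomForestClassifier()}`, fitted on a single shared train/test split so the two accuracies are comparable, with the returned model kept as the only one used for prediction.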
93 changes: 83 additions & 10 deletions main.py
@@ -1,36 +1,109 @@
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from ml_utils import load_model, predict
from ml_utils import load_model, predict, retrain, load_model_r,predict_r, retrain_r
from typing import List

app = FastAPI(
title="Iris Predictor",
docs_url="/"
)
# defining the main app
app = FastAPI(title="Iris Predictor", docs_url="/")

# calling the load_model during startup.
# this will train the model and keep it loaded for prediction.
app.add_event_handler("startup", load_model)

# class which is expected in the payload
class QueryIn(BaseModel):
sepal_length: float
sepal_width: float
petal_length: float
petal_width: float


# class which is returned in the response
class QueryOut(BaseModel):
flower_class: str

# class which is expected in the payload while re-training
class FeedbackIn(BaseModel):
sepal_length: float
sepal_width: float
petal_length: float
petal_width: float
flower_class: str

# Route definitions
@app.get("/ping")
# Healthcheck route to ensure that the API is up and running
def ping():
return {"ping": "pong"}


@app.post("/predict_flower", response_model=QueryOut, status_code=200)
def predict_flower(
query_data: QueryIn
):
output = {'flower_class': predict(query_data)}
# Route to do the prediction using the ML model defined.
# Payload: QueryIn containing the parameters
# Response: QueryOut containing the flower_class predicted (200)
def predict_flower(query_data: QueryIn):
output = {"flower_class": predict(query_data)}
return output

@app.post("/feedback_loop", status_code=200)
# Route to further train the model based on user input in form of feedback loop
# Payload: FeedbackIn containing the parameters and correct flower class
# Response: Dict with detail confirming success (200)
def feedback_loop(data: List[FeedbackIn]):
retrain(data)
return {"detail": "Feedback loop successful"}

# calling the load_model during startup. Random Forest
# this will train the model and keep it loaded for prediction.
app.add_event_handler("startup", load_model_r)

# class which is expected in the payload
class QueryIn(BaseModel):
sepal_length: float
sepal_width: float
petal_length: float
petal_width: float


# class which is returned in the response
class QueryOut(BaseModel):
flower_class: str

# class which is expected in the payload while re-training
class FeedbackIn(BaseModel):
sepal_length: float
sepal_width: float
petal_length: float
petal_width: float
flower_class: str

# Route definitions
@app.get("/ping")
# Healthcheck route to ensure that the API is up and running
def ping():
return {"ping": "pong"}


@app.post("/predict_flower_r", response_model=QueryOut, status_code=200)
# Route to do the prediction using the ML model defined.
# Payload: QueryIn containing the parameters
# Response: QueryOut containing the flower_class predicted (200)
def predict_flower_r(query_data: QueryIn):
output = {"flower_class": predict_r(query_data)}
return output

@app.post("/feedback_loop_r", status_code=200)
# Route to further train the model based on user input in form of feedback loop
# Payload: FeedbackIn containing the parameters and correct flower class
# Response: Dict with detail confirming success (200)
def feedback_loop_r(data: List[FeedbackIn]):
retrain_r(data)
return {"detail": "Feedback loop successful"}



# Main function to start the app when main.py is called
if __name__ == "__main__":
uvicorn.run("main:app", host='0.0.0.0', port=8888, reload=True)
# Uvicorn is used to run the server and listen for incoming API requests on 0.0.0.0:8888
uvicorn.run("main:app", host="0.0.0.0", port=8888, reload=True)
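
Task 4 in the README asks for a `timestamp` attribute in the prediction response, while `QueryOut` in this diff declares only `flower_class`. The sketch below shows one way the response payload could be built, using only the standard library. The helper name `build_prediction_response` and the choice of a UTC ISO 8601 string are assumptions, not something this PR defines.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_prediction_response(
    flower_class: str, now: Optional[datetime] = None
) -> Dict[str, str]:
    """Return the prediction together with the time it was produced.

    `now` can be injected so tests can pin the timestamp; by default the
    current UTC time is used.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    return {"flower_class": flower_class, "timestamp": moment.isoformat()}


# Example call with the current time.
print(build_prediction_response("Iris Setosa"))
```

Because the route is declared with `response_model=QueryOut`, FastAPI filters the returned dict down to the fields of that model, so `QueryOut` would also need a `timestamp` field for the value to reach the client. Tests that compare the whole JSON body, as `test_app.py` does, would then need to assert on `flower_class` only or pin the time.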
79 changes: 65 additions & 14 deletions ml_utils.py
@@ -1,31 +1,82 @@
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# define a Gaussian NB classifier
clf = GaussianNB()

classes = {
0: "Iris Setosa",
1: "Iris Versicolour",
2: "Iris Virginica"
}
# define the class encodings and reverse encodings
classes = {0: "Iris Setosa", 1: "Iris Versicolour", 2: "Iris Virginica"}
r_classes = {y: x for x, y in classes.items()}

# function to train and load the model during startup
def load_model():
X, y = datasets.load_iris(return_X_y=True)
# load the dataset from the official sklearn datasets
X, y = datasets.load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)
clf.fit(X_train, y_train)
# do the test-train split and train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Model trained with accuracy: {round(acc, 3)}")
# calculate and print the accuracy score
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Model trained with accuracy (Gaussian): {round(acc, 3)}")


# function to predict the flower using the model
def predict(query_data):
x = list(query_data.dict().values())
prediction = clf.predict([x])[0]
print(f"Model prediction: {classes[prediction]}")
return classes[prediction]
x = list(query_data.dict().values())
prediction = clf.predict([x])[0]
print(f"Model prediction: {classes[prediction]}")
return classes[prediction]

# function to retrain the model as part of the feedback loop
def retrain(data):
# pull out the relevant X and y from the FeedbackIn object
X = [list(d.dict().values())[:-1] for d in data]
y = [r_classes[d.flower_class] for d in data]

# fit the classifier again based on the new data obtained
clf.fit(X, y)


# Adding the Random Forest Classifier to the model

# define a Random Forest classifier
clf_r = RandomForestClassifier()

# define the class encodings and reverse encodings
classes = {0: "Iris Setosa", 1: "Iris Versicolour", 2: "Iris Virginica"}
r_classes = {y: x for x, y in classes.items()}

# function to train and load the model during startup
def load_model_r():
# load the dataset from the official sklearn datasets
X, y = datasets.load_iris(return_X_y=True)

# do the test-train split and train the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf_r.fit(X_train, y_train)

# calculate and print the accuracy score
acc_r = accuracy_score(y_test, clf_r.predict(X_test))
print(f"Model trained with accuracy (Random Forest): {round(acc_r, 3)}")


# function to predict the flower using the model
def predict_r(query_data):
x = list(query_data.dict().values())
prediction = clf_r.predict([x])[0]
print(f"Model prediction: {classes[prediction]}")
return classes[prediction]

# function to retrain the model as part of the feedback loop
def retrain_r(data):
# pull out the relevant X and y from the FeedbackIn object
X = [list(d.dict().values())[:-1] for d in data]
y = [r_classes[d.flower_class] for d in data]

# fit the classifier again based on the new data obtained
clf_r.fit(X, y)
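
`retrain` and `retrain_r` build each feature row with `list(d.dict().values())[:-1]`, which depends on the field declaration order of `FeedbackIn`, and they look labels up in `r_classes` directly, so an unknown label surfaces as a bare `KeyError`. A hedged sketch of a more explicit conversion follows. It takes plain dicts (for example `d.dict()` for each feedback item); the names `FEATURE_ORDER` and `feedback_to_xy` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Feature order expected by the classifiers (matches the QueryIn fields).
FEATURE_ORDER = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Class encodings and reverse encodings, as in ml_utils.py.
CLASSES = {0: "Iris Setosa", 1: "Iris Versicolour", 2: "Iris Virginica"}
R_CLASSES = {name: idx for idx, name in CLASSES.items()}


def feedback_to_xy(records: Iterable[Dict]) -> Tuple[List[List[float]], List[int]]:
    """Convert feedback records into feature rows X and encoded labels y.

    Features are read by name, so the result does not depend on dict order.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for record in records:
        label = record["flower_class"]
        if label not in R_CLASSES:
            raise ValueError(f"unknown flower_class: {label!r}")
        X.append([float(record[name]) for name in FEATURE_ORDER])
        y.append(R_CLASSES[label])
    return X, y


# Example: keys deliberately out of order.
X, y = feedback_to_xy(
    [
        {
            "petal_width": 0.2,
            "flower_class": "Iris Setosa",
            "sepal_length": 4.6,
            "sepal_width": 3.1,
            "petal_length": 1.5,
        }
    ]
)
print(X, y)
```

Note that calling `fit` on a scikit-learn estimator retrains it from scratch, so `clf.fit(X, y)` inside `retrain` leaves the model trained on the feedback rows alone. `GaussianNB` also provides `partial_fit` for incremental updates; whether that suits this feedback loop is a design decision the PR does not address.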
56 changes: 48 additions & 8 deletions test_app.py
@@ -1,21 +1,61 @@
from fastapi.testclient import TestClient
from main import app
# from datetime import datetime


# test to check the correct functioning of the /ping route
def test_ping():
with TestClient(app) as client:
response = client.get("/ping")
# asserting the correct response is received
assert response.status_code == 200
assert response.json() == {"ping":"pong"}
assert response.json() == {"ping": "pong"}


# test to check if Iris Virginica is classified correctly
def test_pred_virginica():
# defining a sample payload for the testcase
payload = {
"sepal_length": 3,
"sepal_width": 5,
"petal_length": 3.2,
"petal_width": 4.4,
}
with TestClient(app) as client:
response = client.post("/predict_flower", json=payload)
# asserting the correct response is received
assert response.status_code == 200
assert response.json() == {"flower_class": "Iris Virginica"}
# print(datetime.strftime("%H:%M:%S.%f",response.elapsed.total_seconds()))

# test to check if Iris Setosa is classified correctly
def test_pred_Setosa():
# defining a sample payload for the testcase
payload = {
"sepal_length": 3,
"sepal_width": 5,
"petal_length": 3.2,
"petal_width": 4.4
"sepal_length": 4.6,
"sepal_width": 3.1,
"petal_length": 1.5,
"petal_width": .2,
}
with TestClient(app) as client:
response = client.post('/predict_flower', json=payload)
response = client.post("/predict_flower", json=payload)
# asserting the correct response is received
assert response.status_code == 200
assert response.json() == {'flower_class': "Iris Virginica"}
assert response.json() == {"flower_class": "Iris Setosa"}

# test to check if Iris Versicolour is classified correctly
def test_pred_Versicolour():
# defining a sample payload for the testcase
payload = {
"sepal_length": 5.9,
"sepal_width": 3.0,
"petal_length": 4.2,
"petal_width": 1.5,
}
with TestClient(app) as client:
response = client.post("/predict_flower", json=payload)
# asserting the correct response is received
assert response.status_code == 200
assert response.json() == {"flower_class": "Iris Versicolour"}