Hi 😄, this post walks through the implementation of the lung cancer prognosis model. Basic machine learning knowledge is helpful for following along.
Lung cancer prognosis was implemented with machine learning libraries such as sklearn. This is a multi-class classification problem, so classification algorithms were used. The dataset has around 1000 rows (records) and 25 columns, of which only 23 could be used as features for prediction. Notebook | Dataset
Before a model can be trained, we must first run a few checks on the data: whether the classes are imbalanced and whether any null values are present.
There are no null values; however, the classes are quite imbalanced.
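The two checks above can be sketched with pandas as follows. The tiny DataFrame and its column names (`Age`, `Smoking`, `Level`) are hypothetical stand-ins for the real dataset, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: two feature columns and a
# target column named "Level" (all column names here are assumptions).
df = pd.DataFrame({
    "Age": [33, 17, 35, 37, 46, 52, 28, 44],
    "Smoking": [3, 2, 2, 7, 8, 2, 3, 7],
    "Level": ["Low", "Medium", "High", "High", "High", "High", "Low", "High"],
})

# Count missing values per column; all zeros means nothing needs imputing.
null_counts = df.isnull().sum()
print(null_counts)

# Count records per class to see how balanced the target is.
class_counts = df["Level"].value_counts()
print(class_counts)
```

If one class dominates `class_counts`, as "High" does in this toy data, the dataset is imbalanced.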
Since the dataset is quite imbalanced, sampling must be done.
The following graphs compare the classes before and after sampling.
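The post does not say which sampling technique was used, so the following is only a minimal sketch of one common option: random oversampling of the smaller classes with `sklearn.utils.resample`. The data and column names are hypothetical.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced data; "Level" is an assumed target column name.
df = pd.DataFrame({
    "Age": [33, 17, 35, 37, 46, 52, 28, 44],
    "Level": ["Low", "Medium", "High", "High", "High", "High", "Low", "High"],
})

# Size of the largest class; every class is upsampled to this size.
majority_size = df["Level"].value_counts().max()

balanced_parts = []
for label, group in df.groupby("Level"):
    # Sample with replacement so small classes can grow to majority_size.
    upsampled = resample(
        group, replace=True, n_samples=majority_size, random_state=42
    )
    balanced_parts.append(upsampled)

balanced_df = pd.concat(balanced_parts).reset_index(drop=True)
print(balanced_df["Level"].value_counts())
```

After this step every class has the same number of records, which is what a before/after class count plot would show.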
Once all the data preprocessing and cleaning is complete, the dataset is split into a test set (25%) and a train set (75%).
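A 25/75 split like this can be sketched with sklearn's `train_test_split`. The random arrays below are placeholders with the same shape as the dataset described earlier (about 1000 rows, 23 features). The `stratify` and `random_state` arguments are assumptions added for reproducibility and may not match the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the dataset: 1000 rows, 23 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.integers(1, 9, size=(1000, 23))
y = rng.integers(0, 3, size=1000)

# Hold out 25% of the rows for testing; stratify keeps class ratios similar
# in both sets (an assumption, not confirmed by the post).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```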
Now that the data is prepped, we can start model training. Since we do not know in advance which algorithm is best for this use case, multiple algorithms are tested to make sure that the saved model is the best-performing one. (All model evaluations are present in the linked notebook.)
After comparing the evaluations, it was determined that the SVM performed the best.
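As a rough illustration of testing several algorithms side by side, the sketch below fits a few sklearn classifiers on synthetic data and compares their test accuracy. The list of candidates is an assumption (the exact set that was tried is in the notebook), and the data comes from `make_classification` rather than the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 3-class data with 23 features, standing in for the real dataset.
X, y = make_classification(
    n_samples=1000, n_features=23, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Hypothetical candidate list; swap in whichever algorithms you want to try.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}

# Fit each model on the train set and record its accuracy on the test set.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")

best_name = max(scores, key=scores.get)
print("Best:", best_name)
```

Which model comes out on top depends on the data, so the ranking on this synthetic set says nothing about the ranking on the lung cancer dataset.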
Precision and recall, derived from the confusion matrix, are used as evaluation metrics alongside accuracy, since accuracy alone is not reliable. (Read more on these metrics)
To finish off the model implementation, the trained model is saved so that it can be used in the backend without having to repeat the training process.
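To make the link between the confusion matrix and these metrics concrete, here is a small worked example. The labels and predictions are made up for illustration. Per-class precision is the diagonal divided by the column sums, and per-class recall is the diagonal divided by the row sums.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for three classes.
labels = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "High", "High", "High", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "High", "High", "Low"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

correct = np.diag(cm)
accuracy = correct.sum() / cm.sum()
precision = correct / cm.sum(axis=0)  # of everything predicted as a class
recall = correct / cm.sum(axis=1)     # of everything truly in a class

for label, p, r in zip(labels, precision, recall):
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Here the accuracy is 0.75, but "Low" has a precision and recall of only 0.50, which the single accuracy figure hides.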
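Saving and reloading a trained model can be sketched with Python's built-in `pickle` module. The file name `lung_model.pkl` and the synthetic training data are assumptions; `joblib` would work in a similar way.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Train a small stand-in SVM on synthetic data.
X, y = make_classification(
    n_samples=200, n_features=23, n_informative=10, n_classes=3, random_state=0
)
model = SVC().fit(X, y)

# Hypothetical file name, written to the system temp directory.
model_path = os.path.join(tempfile.gettempdir(), "lung_model.pkl")

# Serialize the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later (for example in the backend), load it back without retraining.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Pickle files should only be loaded from sources you trust, and ideally with the same sklearn version that was used to save them.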
Now that the model is available, we can whip up a basic Flask server to serve as our backend. app.py
Start off by initializing Flask and loading the model.
We can make a Flask resource that will handle the lung prognosis. The request body is a JSON object containing all of the properties that we register as arguments on our request parser (the argument names match the column names of our dataset). Once the data is obtained, it is converted into a numpy array so that it can be fed into our model as input.
Once this is done, we can add our resource as an endpoint so that it can be accessed once deployed (or when running locally).
Now that our server is implemented, all that remains is to deploy it to a service that can host a Flask app. In this case, Heroku was used 😁.
- Since the free tier of Heroku is used, the app size limit is 512 MB. Machine learning models (pkl) are small, so Heroku is practical for them. Larger deep learning models, however, need an alternative. For the deep learning models implemented in this project, Microsoft Azure was used through Azure Functions.