
Lung Cancer Classification with CT Scans [Labs of AI and DS Course Project]


EstevesX10/Lung-Cancer-Classification-with-CT-Scans


Labs AI & DS | Lung Cancer Classification with CT Scans


Project Overview

Lung cancer remains the leading cause of cancer-related mortality worldwide. Unfortunately, only 16% of cases are diagnosed at an early, localized stage, where patients have a five-year survival rate exceeding 50%. When lung cancer is identified at more advanced stages, the survival rate plummets to just 5%.

Given this stark difference, early diagnosis is critical for improving patient outcomes. Non-invasive imaging methods, such as computed tomography (CT), have proven effective in providing crucial information regarding tumor status. This opens opportunities for developing computer-aided diagnosis (CAD) systems capable of assessing the malignancy risk of lung nodules and supporting clinical decision-making.

The goal of this project is to create a machine learning-based solution for classifying lung nodules as benign or malignant using the CT images available in the LIDC-IDRI dataset.

Project Development

Dependencies & Execution

At our professor's request, this project was developed in a Notebook. If you'd like to try it out yourself, use either an Anaconda distribution or third-party software that lets you inspect and execute notebooks.

For more information about the virtual environment used with Anaconda, see the DEPENDENCIES.md file.

Planned Work

The project will involve several key phases, including:

  • Data Preprocessing : Cleaning and preparing the CT scan data to ensure its quality and consistency for further analysis.
  • Feature Engineering : Leveraging radiomics to extract meaningful features from the scans.
  • Model Development and Evaluation : Training and fine-tuning machine learning models to accurately classify lung nodules based on their malignancy status. This phase also covers assessing model performance with key metrics such as balanced accuracy and AUC, and validating results with robust methods such as k-fold cross-validation.
  • Statistical Inference : Conducting a statistical analysis to determine performance differences between the models and identify which one delivers the best results for this classification task.
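The evaluation metrics and validation scheme named in the phases above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's notebook code: the helper names (`balanced_accuracy`, `roc_auc`, `stratified_k_fold`) are hypothetical, labels are assumed to be 0 (benign) and 1 (malignant), and in practice a library such as scikit-learn would normally supply these routines.

```python
import random
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_k_fold(y, k, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class proportions per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    # Deal each class's shuffled indices round-robin across the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Balanced accuracy is a natural fit when benign and malignant nodules are unevenly represented, since a model that always predicts the majority class scores only 0.5 on a two-class problem.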

The ultimate objective of this automated classification system is to aid in clinical decision-making, offering a supplementary screening tool that reduces the workload on radiologists while improving early detection rates for lung cancer.

Datasets

If you're interested in inspecting and executing this project yourself, you'll need access to all the datasets we've created.

Because GitHub limits file sizes, we've made them all available on Google Drive, which you can access here.

Project Results

[Initial] Target Class Distribution

Here’s a quick overview of how nodule malignancy in the dataset is distributed across the five malignancy levels.
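A distribution like this can be tallied with a few lines of Python. The sketch below is illustrative only: the `ratings` list is made-up data, and the `to_binary` mapping (levels 1-2 as benign, 4-5 as malignant, 3 treated as ambiguous) is an assumption about one common way to reach a benign/malignant target, not necessarily the mapping used in this project.

```python
from collections import Counter

# Hypothetical malignancy levels (1 = least suspicious, 5 = most); not real data.
ratings = [1, 2, 2, 3, 3, 3, 4, 5, 5, 1]


def level_distribution(levels):
    """Return {level: (count, fraction)} for malignancy levels 1 through 5."""
    counts = Counter(levels)
    total = len(levels)
    return {lvl: (counts.get(lvl, 0), counts.get(lvl, 0) / total) for lvl in range(1, 6)}


def to_binary(level):
    """Assumed mapping: 1-2 -> 0 (benign), 4-5 -> 1 (malignant), 3 -> None (ambiguous)."""
    if level < 3:
        return 0
    if level > 3:
        return 1
    return None


for lvl, (count, frac) in level_distribution(ratings).items():
    print(f"level {lvl}: {count} nodules ({frac:.0%})")
```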

Machine Learning Models Evaluation

Here are some of the results for the machine learning algorithms we found most interesting, selected by their balanced accuracy scores.

Performance Evaluation (metrics per algorithm):

  • SVM
  • Random Forest
  • XGBoost
  • Voting Classifier

Critical Differences Diagram

To better illustrate the performance differences between the models, let's examine their critical differences diagram.


In this diagram, XGBoost and the Voting Classifier share the same rank (2.2), suggesting that they performed similarly and may be the best suited to this classification problem.
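The ranks in a critical differences diagram are typically average ranks: the models are ranked on each evaluation fold (rank 1 = best score), and those ranks are averaged per model. Here is a minimal sketch of that calculation, assuming per-fold scores are available. The function name `average_ranks`, the model names, and the scores are hypothetical and are not the project's results.

```python
def average_ranks(scores):
    """Average rank per model across folds.

    scores: {model_name: [score_per_fold]}, higher is better.
    Rank 1 is best; tied models share the mean of the positions they occupy.
    """
    models = list(scores)
    n_folds = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in models}
    for f in range(n_folds):
        ordered = sorted(models, key=lambda m: scores[m][f], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j to cover every model tied with the one at position i.
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][f] == scores[ordered[i]][f]:
                j += 1
            shared = (i + 1 + j + 1) / 2  # mean of 1-based positions i+1 .. j+1
            for m in ordered[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return {m: totals[m] / n_folds for m in models}


# Made-up balanced accuracy scores over two folds for three placeholder models.
example = {"model_a": [0.9, 0.8], "model_b": [0.9, 0.7], "model_c": [0.5, 0.9]}
print(average_ranks(example))
```

Two models with equal average ranks, as in the diagram described above, did equally well on average across folds. That is why they are read as performing similarly.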

Authorship

README.md by Gonçalo Esteves

