Labs AI & DS | Lung Cancer Classification with CT Scans

Project Overview

Lung cancer remains the leading cause of cancer-related mortality worldwide. Unfortunately, only 16% of cases are diagnosed at an early, localized stage, where patients have a five-year survival rate exceeding 50%. When lung cancer is identified at more advanced stages, the survival rate plummets to just 5%.

Given this stark difference, early diagnosis is critical for improving patient outcomes. Non-invasive imaging methods, such as computed tomography (CT), have proven effective in providing crucial information regarding tumor status. This opens opportunities for developing computer-aided diagnosis (CAD) systems capable of assessing the malignancy risk of lung nodules and supporting clinical decision-making.

The goal of this project is to create a machine learning-based solution for classifying lung nodules as benign or malignant using the CT images available in the LIDC-IDRI dataset.

Project Development

Dependencies & Execution

At our professor's request, this project was developed in a Notebook. If you want to try it out yourself, use either an Anaconda Distribution or third-party software that lets you inspect and execute notebooks.

For more information about the virtual environment used in Anaconda, see the DEPENDENCIES.md file.

Planned Work

The project will involve several key phases, including:

Data Preprocessing : Cleaning and preparing the CT scan data to ensure its quality and consistency for further analysis.
Feature Engineering : Leveraging radiomics to extract meaningful features from the scans.
Model Development and Evaluation : Training and fine-tuning machine learning models to classify lung nodules by malignancy status, assessing their performance with metrics such as balanced accuracy and AUC, and validating the results with k-fold cross-validation.
Statistical Inference : Running a statistical analysis to test for performance differences between the models and identify which one delivers the best results for this classification task.
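To make the evaluation phase concrete, here is a minimal, dependency-free sketch of the two metrics named above and of a plain k-fold split. It is an illustration rather than the notebook's actual code: the function names and the toy labels are assumptions, and in practice a library such as scikit-learn (`balanced_accuracy_score`, `roc_auc_score`) would normally be used instead.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy labels and scores, made up for illustration (0 = benign, 1 = malignant).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.3, 0.6, 0.9, 0.4]

print(balanced_accuracy(y_true, y_pred))  # 0.625
print(roc_auc(y_true, scores))            # 0.875
```

Balanced accuracy is a reasonable headline metric here because benign and malignant nodules are rarely equally frequent. With imbalanced classes, a stratified split that preserves the class ratio in every fold is usually preferable to the contiguous split sketched above.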

The ultimate objective of this automated classification system is to aid in clinical decision-making, offering a supplementary screening tool that reduces the workload on radiologists while improving early detection rates for lung cancer.

Datasets

If you're interested in inspecting and executing this project yourself, you'll need access to all the datasets we've created.

Since GitHub has file size limits, we've made them all available on Google Drive, which you can access here.

Project Results

[Initial] Target Class Distribution

Here’s a quick overview of how nodule malignancy in the dataset is distributed across the five malignancy levels.
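Since the project's target is binary (benign or malignant) while the annotations use five malignancy levels, the levels have to be collapsed at some point. The sketch below shows one common way to do this and how to tally the distribution. The threshold, the choice to drop the middle level, the function name, and the ratings are all assumptions for illustration, not the mapping used in the notebook.

```python
from collections import Counter


def to_binary_label(malignancy, ambiguous=3):
    """Map a 1-5 malignancy level to 0 (benign) or 1 (malignant).

    Assumption: the middle level is treated as ambiguous and returns None.
    """
    if malignancy == ambiguous:
        return None
    return 1 if malignancy > ambiguous else 0


# Made-up ratings, one per nodule, purely for illustration.
ratings = [1, 2, 2, 3, 3, 3, 4, 5]

# Count how many nodules fall into each of the five levels.
distribution = Counter(ratings)

# Collapse to binary labels, skipping the ambiguous middle level.
binary = [label for label in map(to_binary_label, ratings) if label is not None]

print(sorted(distribution.items()))  # [(1, 1), (2, 2), (3, 3), (4, 1), (5, 1)]
print(binary)                        # [0, 0, 0, 1, 1]
```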

Machine Learning Models Evaluation

Below are the results for the machine learning algorithms we found most interesting, selected by their balanced accuracy scores.

Performance Evaluation
Algorithm	Metrics
SVM
Random Forest
XGBoost
Voting Classifier

Critical Differences Diagram

To better illustrate the performance differences between the models, let's examine their critical difference diagram.

In this diagram, XGBoost and the Voting Classifier share the same rank (2.2), suggesting that they performed similarly and may be the best suited to this classification problem.
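For readers unfamiliar with these diagrams, the ranks come from ordering the models on each evaluation run (for example, each cross-validation fold) and averaging the positions. Two models are usually considered distinguishable only if their average ranks differ by more than a critical difference. The sketch below assumes the Nemenyi post-hoc test, which is the usual basis for such diagrams but is not confirmed by this README. The scores are made up, and `q_alpha` is the tabulated critical value for three models at a 0.05 significance level.

```python
import math


def average_ranks(scores):
    """scores[d][m] is the score of model m on run d (higher is better).

    Returns each model's mean rank (1 = best); tied models share the mean
    of the positions they occupy.
    """
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda m: -row[m])
        i = 0
        while i < n_models:
            j = i
            # Extend j over the run of models tied with the one at position i.
            while j + 1 < n_models and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
            for m in order[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


def nemenyi_cd(n_models, n_runs, q_alpha):
    """Critical difference: q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(n_models * (n_models + 1) / (6 * n_runs))


# Hypothetical scores for three unnamed models over four runs.
scores = [
    [0.80, 0.85, 0.85],
    [0.70, 0.90, 0.75],
    [0.60, 0.65, 0.70],
    [0.75, 0.80, 0.78],
]

ranks = average_ranks(scores)
cd = nemenyi_cd(n_models=3, n_runs=4, q_alpha=2.343)

print(ranks)         # [3.0, 1.375, 1.625]
print(round(cd, 3))  # 1.657
```

In this toy example the second and third models differ by 0.25 in average rank, far below the critical difference, so they would be joined by a bar in the diagram, which is the same kind of conclusion drawn above for models with equal ranks.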

Authorship

Authors → Francisco Macieira, Gonçalo Esteves and Nuno Gomes
Course → Laboratory of AI and DS [CC3044]
University → Faculty of Sciences, University of Porto

README.md by Gonçalo Esteves
