1.Abnormality detection in mammography

Convolutional Neural Network for Medical Imaging Analysis - Abnormality detection in mammography

The objective is to classify abnormalities in mammography images using Convolutional Neural Networks.

Original Dataset

The dataset we will focus on is CBIS-DDSM: Curated Breast Imaging Subset of the Digital Database for Screening Mammography.

The original data, along with a detailed description of the collection and the policies about usage and citation, can be found here: https://wiki.cancerimagingarchive.net/display/Public/CBIS-DDSM

This collection is freely available to browse, download, and use for commercial, scientific and educational purposes as outlined in the Creative Commons Attribution 3.0 Unported License.

A description of the dataset is provided in:

Lee, Rebecca Sawyer, et al. "A curated mammography data set for use in computer-aided detection and diagnosis research." Scientific data 4 (2017): 170177. URL: https://www.nature.com/articles/sdata2017177

The original images are in DICOM format, the standard format for the communication and management of medical imaging information and related data.

Description of abnormalities and Classification Tasks

The CBIS-DDSM dataset contains images of two classes of abnormalities. It therefore supports an abnormality classification task, which aims at distinguishing the following classes:

Mass
Calcification

Furthermore, several CSV files hosted with the dataset provide a detailed description of each image through the following fields:

patient_id, breast density, left or right breast, image view, abnormality id, abnormality type, calc type, calc distribution, assessment, pathology, subtlety, image file path, cropped image file path, ROI mask file path

This description enables a second, more fine-grained task: abnormality diagnosis classification. It aims at distinguishing the following classes:

Mass, Benign (with or without callback)
Mass, Malignant
Calcification, Benign
Calcification, Malignant
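The mapping from CSV fields to the four diagnosis classes can be sketched as follows. This is a minimal illustration, not the procedure used to build the project files: the function name `diagnosis_class` is hypothetical, and the field spellings it accepts (`mass` / `calcification` for the abnormality type, `BENIGN`, `BENIGN_WITHOUT_CALLBACK` and `MALIGNANT` for the pathology) are assumptions that should be checked against the actual CSV files.

```python
def diagnosis_class(abnormality_type: str, pathology: str) -> str:
    """Map one CSV row to one of the four diagnosis classes (hypothetical helper)."""
    kind = abnormality_type.strip().lower()
    outcome_raw = pathology.strip().upper()

    if kind not in ("mass", "calcification"):
        raise ValueError(f"unexpected abnormality type: {abnormality_type!r}")

    # Assumption: benign rows are spelled BENIGN or BENIGN_WITHOUT_CALLBACK,
    # and both are merged into a single "Benign" class.
    if outcome_raw.startswith("BENIGN"):
        outcome = "Benign"
    elif outcome_raw == "MALIGNANT":
        outcome = "Malignant"
    else:
        raise ValueError(f"unexpected pathology: {pathology!r}")

    return f"{kind.capitalize()}, {outcome}"
```

Merging the two benign spellings into one class mirrors the "Benign (with or without callback)" wording above.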

In the following you can find a sample image from the original dataset:

Project Dataset

Please note that:

Full images have a high resolution, e.g. 3000x4000 pixels
Full images and patches are grayscale images with a depth of 16 bits
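Because the pixel values are 16-bit, a common preprocessing step is to rescale them to floating point before feeding them to a network. The sketch below assumes the values are stored as unsigned 16-bit integers spanning 0 to 65535; the helper name `to_unit_range` is hypothetical, and the provided arrays should be inspected (`dtype`, `min()`, `max()`) before relying on this range.

```python
import numpy as np

def to_unit_range(images: np.ndarray) -> np.ndarray:
    """Scale 16-bit grayscale values into float32 values in [0, 1]."""
    # Assumption: the full uint16 range (0..65535) is the reference scale.
    return images.astype(np.float32) / 65535.0

# Synthetic stand-in for a batch of two tiny 16-bit patches.
demo = np.array([[[0, 65535]], [[32768, 1]]], dtype=np.uint16)
scaled = to_unit_range(demo)
```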

Dataset as it is provided for the final project

Working directly with the original dataset is challenging, since the full images are high resolution and the DICOM format is not natively supported in tf.keras.

For this reason, you are provided with numpy arrays containing the images and labels of the training and test sets.

The steps performed on each original image are described below:

the abnormality patch has been extracted from the original image according to the binary mask;
a patch of healthy tissue (baseline patch) adjacent to the abnormality patch has been extracted from the original image (to the left, right, top or bottom, with no overlap). Abnormality and baseline patches have always been added to the images tensor as a pair: an abnormality patch has been discarded if no related baseline patch could be extracted;
both abnormality patch and baseline patch have been resized to shape (150x150) using OpenCV resize function: cv2.resize(img, dsize=(shape, shape), interpolation=cv2.INTER_NEAREST)
class labels have been assigned to the patches according to the following mapping:
- 0: Baseline patch
- 1: Mass, benign
- 2: Mass, malignant
- 3: Calcification, benign
- 4: Calcification, malignant
images of baseline patch and abnormality patch, and their related labels, have been added to distinct numpy arrays for images and labels.
- train_tensor.npy: images tensor for training
- train_labels.npy: labels tensor for training
- public_test_tensor.npy: images tensor for test
- public_test_labels.npy: labels tensor for test
The images tensor of a private test set is also provided. The corresponding labels tensor is not published within the project files.
- private_test_tensor.npy
- private_test_labels.npy (not published)
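The patch extraction steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the code used to build the project files: the function names are hypothetical, the search order (right, left, top, bottom) is a guess, the baseline patch is assumed to have the same size as the abnormality's bounding box, and no check is made that the neighbouring region is actually healthy tissue. The resize to 150x150 is omitted.

```python
import numpy as np

def bounding_box(mask):
    """Return (top, bottom, left, right) of the nonzero region, end-exclusive."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def extract_patches(image, mask):
    """Return (baseline, abnormality) patches, or None if no neighbour fits."""
    top, bottom, left, right = bounding_box(mask)
    h, w = bottom - top, right - left
    # Top-left corners of the non-overlapping neighbours.
    # Hypothetical search order: right, left, top, bottom.
    candidates = [(top, left + w), (top, left - w), (top - h, left), (top + h, left)]
    for r, c in candidates:
        inside = r >= 0 and c >= 0 and r + h <= image.shape[0] and c + w <= image.shape[1]
        if inside:
            return image[r:r + h, c:c + w], image[top:bottom, left:right]
    return None  # the abnormality is skipped when no baseline can be extracted

# Tiny synthetic example: a 2x2 abnormality touching the right border.
image = np.arange(36, dtype=np.uint16).reshape(6, 6)
mask = np.zeros((6, 6), dtype=np.uint8)
mask[2:4, 4:6] = 1
baseline, abnormality = extract_patches(image, mask)
```

In this example the right-hand neighbour would fall outside the image, so the left-hand neighbour is used as the baseline.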

Dataset Structure

Training set:
- images tensor shape (5352, 150, 150)
- labels tensor shape (5352,)
Public Test set:
- images tensor shape (672, 150, 150)
- labels tensor shape (672,)
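The image tensors have shape (N, 150, 150), with no channel axis. Keras convolutional layers generally expect one, so a trailing axis of size 1 is usually added before training. A minimal sketch, with a hypothetical helper name and a synthetic array in place of the real tensor:

```python
import numpy as np

def add_channel_axis(images: np.ndarray) -> np.ndarray:
    """Turn (N, H, W) grayscale images into (N, H, W, 1), channels-last."""
    return images[..., np.newaxis]

# Synthetic stand-in for a small slice of the training tensor.
demo_images = np.zeros((4, 150, 150), dtype=np.uint16)
demo_with_channel = add_channel_axis(demo_images)
```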

Classes distribution of images

Train

-	benign	malignant	total
Train Masses	620	598	1218
Train Calcification	948	510	1458
Total	1568	1108	2676

Test

-	benign	malignant	total
Global Test Masses	214	144	358
Global Test Calcification	192	122	314
Total	406	266	672
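The distribution of a labels tensor can be checked against these tables with a short count per class. The sketch below uses the label mapping given earlier; the names `CLASS_NAMES` and `class_counts` are hypothetical, and the demo array is a small synthetic sample rather than the real labels.

```python
import numpy as np

CLASS_NAMES = [
    "Baseline patch",
    "Mass, benign",
    "Mass, malignant",
    "Calcification, benign",
    "Calcification, malignant",
]

def class_counts(labels: np.ndarray) -> dict:
    """Count how many patches carry each of the five labels (0..4)."""
    counts = np.bincount(labels, minlength=len(CLASS_NAMES))
    return dict(zip(CLASS_NAMES, counts.tolist()))

# Synthetic sample of ten labels.
demo_labels = np.array([0, 2, 0, 2, 0, 1, 0, 1, 0, 1])
demo_counts = class_counts(demo_labels)
```

If the training table holds, running `class_counts` on the training labels should report 620, 598, 948 and 510 patches for labels 1 to 4, plus one baseline patch for each of the 2676 abnormalities.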

Baseline patches number and position

-	right	left	top	bottom	total baselines
Baseline for train Masses	836	240	89	53	1218
Baseline for train Calcification	894	246	163	155	1458
Baseline for global Test Masses	241	78	27	12	358
Baseline for global Test Calcification	180	50	42	42	314

Given the procedure described above, particular attention should be paid to the structure of the input tensors:

odd indices [2*i + 1 for i in range(len(tensor) // 2)] refer to abnormality patches
the preceding even indices [2*i for i in range(len(tensor) // 2)] refer to the respective baseline patches
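With this interleaving, baseline and abnormality patches can be separated with strided slices. The sketch below is one possible way to do it; the helper name `split_pairs` is hypothetical, and the sanity check assumes that label 0 is used only for baseline patches, as in the mapping above.

```python
import numpy as np

def split_pairs(images: np.ndarray, labels: np.ndarray):
    """Split interleaved tensors into (baseline, abnormality) parts."""
    baseline_images, abnormality_images = images[0::2], images[1::2]
    baseline_labels, abnormality_labels = labels[0::2], labels[1::2]
    # Sanity check: even positions hold label 0, odd positions never do.
    if np.any(baseline_labels != 0) or np.any(abnormality_labels == 0):
        raise ValueError("tensor does not follow the baseline/abnormality interleaving")
    return (baseline_images, baseline_labels), (abnormality_images, abnormality_labels)

# Synthetic stand-in for the first six entries of a tensor.
demo_labels = np.array([0, 2, 0, 2, 0, 1])
demo_images = np.zeros((6, 150, 150), dtype=np.uint16)
(base_x, base_y), (abn_x, abn_y) = split_pairs(demo_images, demo_labels)
```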

You can load the arrays with the NumPy load function (np.load).

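import os
import numpy as np

# Placeholder: folder containing the provided .npy files.
out_path = '.'

def load_training():
  # Load the training images and their labels from disk.
  images = np.load(os.path.join(out_path, 'train_tensor.npy'))
  labels = np.load(os.path.join(out_path, 'train_labels.npy'))
  return images, labels

images, labels = load_training()
print(labels[:10])
# [0 2 0 2 0 1 0 1 0 1]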

The first item is the baseline patch (label 0) that is adjacent to the patch described by the second element of the array, i.e. the first abnormality patch (malignant mass, label 2), and so on.
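For the two classification tasks described earlier, the five labels can be remapped after dropping the baseline patches. The sketch below is one possible encoding, not a prescribed one: the function names and the choice of 0/1 targets (0 for mass or benign, 1 for calcification or malignant) are assumptions made for illustration.

```python
import numpy as np

def mass_vs_calcification(images: np.ndarray, labels: np.ndarray):
    """Drop baseline patches; target 0 = mass (1, 2), 1 = calcification (3, 4)."""
    keep = labels != 0
    targets = (labels[keep] >= 3).astype(np.int64)
    return images[keep], targets

def benign_vs_malignant(images: np.ndarray, labels: np.ndarray):
    """Drop baseline patches; target 0 = benign (1, 3), 1 = malignant (2, 4)."""
    keep = labels != 0
    targets = np.isin(labels[keep], (2, 4)).astype(np.int64)
    return images[keep], targets

# Synthetic stand-in for a small interleaved tensor.
demo_labels = np.array([0, 2, 0, 3, 0, 1, 0, 4])
demo_images = np.zeros((8, 4, 4), dtype=np.uint16)
type_x, type_y = mass_vs_calcification(demo_images, demo_labels)
diag_x, diag_y = benign_vs_malignant(demo_images, demo_labels)
```

Keeping label 0 instead of dropping it would turn either task into a three-class problem that also separates healthy tissue.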
