diff --git a/README.md b/README.md index 978819d..bc13744 100644 --- a/README.md +++ b/README.md @@ -114,24 +114,28 @@ identified as an outlier. 0 refers to the first volume. For example (these outlier IDs are completely random, for illustration): ``` -data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19, 133, 134, 135, 136, 154, 155, 157 -data/group-01/sub-08/func/sub-08_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 9, 17, 53, 54, 63, 78, 79, 151, 152, 153 -data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 157, 158 -data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 17, 19, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160 -data/group-01/sub-06/func/sub-06_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 153, 154, 155, 156, 157, 158, 159, 160, 161 -data/group-01/sub-06/func/sub-06_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 19, 24, 25, 26, 27, 28, 29, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 -data/group-01/sub-07/func/sub-07_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 -data/group-01/sub-07/func/sub-07_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 132, 136, 137, 138, 139, 140, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161 -data/group-01/sub-09/func/sub-09_task-taskzero_run-01_bold.nii.gz, 0, 134, 135, 136, 143, 144, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160 -data/group-01/sub-09/func/sub-09_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 79, 80, 150, 151, 152, 153, 154, 155, 156, 157 -data/group-01/sub-10/func/sub-10_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 26, 104 -data/group-01/sub-10/func/sub-10_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159 -data/group-01/sub-05/func/sub-05_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 26, 27, 48, 49, 52, 76, 77, 150 -data/group-01/sub-05/func/sub-05_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 50, 51, 52, 54, 157, 158, 159, 160, 161 -data/group-01/sub-02/func/sub-02_task-taskzero_run-02_bold.nii.gz, 34, 65, 105, 106, 107, 135, 140, 148 -data/group-01/sub-02/func/sub-02_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 -data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 101, 102, 103, 160, 161 -data/group-01/sub-03/func/sub-03_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 136, 137, 138, 139, 140, 142, 156, 157, 158, 159 -data/group-01/sub-04/func/sub-04_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 59, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161 -data/group-01/sub-04/func/sub-04_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 49, 50, 52, 53, 54, 55, 57, 58, 148, 149, 157 -``` +data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 11, 157 +data/group-01/sub-08/func/sub-08_task-taskzero_run-02_bold.nii.gz, 79, 153 +data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 0, 153 +data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 151 +data/group-01/sub-06/func/sub-06_task-taskzero_run-02_bold.nii.gz, 0, 1, 21, 22, 23, 24, 25, 26, 28, 29, 155 +data/group-01/sub-06/func/sub-06_task-taskzero_run-01_bold.nii.gz, 1 +data/group-01/sub-07/func/sub-07_task-taskzero_run-02_bold.nii.gz, 79, 80 +data/group-01/sub-07/func/sub-07_task-taskzero_run-01_bold.nii.gz, 85 +data/group-01/sub-09/func/sub-09_task-taskzero_run-02_bold.nii.gz, 23, 24, 25, 26, 27, 28, 30 +data/group-01/sub-10/func/sub-10_task-taskzero_run-02_bold.nii.gz, 104 +data/group-01/sub-05/func/sub-05_task-taskzero_run-01_bold.nii.gz, 0, 49, 77, 150 +data/group-01/sub-05/func/sub-05_task-taskzero_run-02_bold.nii.gz, 3, 4, 5, 6, 9, 20, 23, 28, 49, 54 +data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 160, 161 +data/group-01/sub-03/func/sub-03_task-taskzero_run-01_bold.nii.gz, 11, 14, 15, 132, 156, 157 +data/group-01/sub-04/func/sub-04_task-taskzero_run-01_bold.nii.gz, 1, 144, 154, 158, 159, 160, 161 +data/group-01/sub-04/func/sub-04_task-taskzero_run-02_bold.nii.gz, 0, 1, 28, 35, 49, 52, 53 +``` + +Shown below are the plots of the mean of voxel intensities for each time point vs time points. The detected outliers are marked in orange colour: +![](sub_01_02.png) +![](sub_03_04.png) +![](sub_05_06.png) +![](sub_07_08.png) +![](sub_09_10.png) + diff --git a/algorithm.txt b/algorithm.txt index 0e0ed20..c4d6179 100644 --- a/algorithm.txt +++ b/algorithm.txt @@ -1,10 +1,28 @@ Algorithm: 1. The 4D image is first segmented (using otsu threshold) to segment the brain voxels from the background. (x) + +I. Median absolute deviation over voxels and median absolute deviation over time points 2. The median(med_voxel(x)) and median absolute deviation(mad_voxel(x)) is calculated for each of the brain voxels. 3. The brain voxels lying outside the interval [med_voxel(x)-a*mad_voxel(x), med_voxel(x)+a*mad_voxel(x)] are considered as outliers. a = 3.5 4. For each time t, the number of outlying voxels n(t) is counted. 5. The median (n_med) and MAD (n_mad) of n(t) are calculated. Any time t with n(t)>n_med+3.5*n_mad are considered as outliers. +II. DVARS and median absolute deviation over time points +6. The dvars(t) of the brain voxels are calculated. +7. The median (dvars_med) and MAD (dvars_mad) of dvars(t) are calculated. Any time t with |dvars(t)-dvars_med|>3.5*dvars_mad are considered as outliers. + +III. Sliding window and median absolute deviation over time points +8. A fraction of the time points are chosen using a sliding window, mean over voxel intensities for each time point (m(t)) in this sliding window is calculated. +9. The median (m_med) and MAD (m_mad) of m(t) in this sliding window are calculated. Any time t with |m(t)-m_med|>3.5*m_mad are considered as outliers. +10. This is repeated till the sliding window covers all the time points. +11. All the outliers in each sliding window are merged. + +12. The outliers from the I approach is a more global approach and filters a lot of time-points as outliers. +13. The outliers from the II(DVARS) approach compares successive volumes and if the difference is large considers the preceding volumes as outliers. +14. The outliers from the III(sliding window) approach compares volumes within a certain range and then detect outliers. This is mostly done to take care of drift. + +15. Outliers from I and II are merged (o_total) (Global + Local). If these o_total agree with the outliers from sliding window, they are classified as final outliers (o_final). + References: 1. Cox, R.W. Outlier Detection in FMRl Time Series. ISMRM(2002). \ No newline at end of file diff --git a/findoutlie/detectors.py b/findoutlie/detectors.py index 564203f..2ebcc0a 100644 --- a/findoutlie/detectors.py +++ b/findoutlie/detectors.py @@ -45,7 +45,7 @@ def mad_voxel_detector(img,threshold=3.5): outlier_tf = np.abs(img-med)>(threshold*mad) return outlier_tf -def mad_time_detector(measures, threshold=3.5): +def mad_time_detector(measures, lower_bound, threshold=3.5): """ Detect outliers in 'measures' using median absolute deviation. Returns 1D vector of same length as 'measures', where True means the corresponsding value in 'measures' is an outlier. @@ -71,7 +71,10 @@ def mad_time_detector(measures, threshold=3.5): # Calculate median absoulte deviation of measures mad = np.median(np.abs(measures-med)) # Calculate the outliers - outlier_tf = measures>med+threshold*mad + if lower_bound: + outlier_tf = np.abs(measures-med)>threshold*mad + else: + outlier_tf = measures>med+threshold*mad return outlier_tf def iqr_detector(measures, iqr_proportion=1.5): diff --git a/findoutlie/metrics.py b/findoutlie/metrics.py index 8681597..d334fb8 100644 --- a/findoutlie/metrics.py +++ b/findoutlie/metrics.py @@ -5,6 +5,26 @@ # +++your code here+++ import numpy as np +def dvars_voxel(voxels): + """ Calculate dvars metric on 2D array with voxels in rows and time-points(volumes) in columns + + The dvars calculation between two volumes is defined as the square root of + (the mean of the (voxel differences square)). + + Parameters + ---------- + voxels : 2D array + + Returns + ------- + dvals : 1D array + One-dimensional array with n-1 elements, where n is the number of + volumes in 'img'. + """ + vol_diff = voxels[..., 1:] - voxels[..., :-1] + dvar_val = np.sqrt(np.mean(vol_diff ** 2, axis=0)) + return dvar_val + def dvars(img): """ Calculate dvars metric on Nibabel image `img` @@ -33,14 +53,8 @@ def dvars(img): data = img.get_fdata() voxel_by_time = np.reshape(data, (-1, data.shape[-1])) + dvar_val = dvars_voxel(voxel_by_time) - vol_diff = voxel_by_time[..., 1:] - voxel_by_time[..., :-1] # 2D array - # print(vol_diff.shape()) - # print(vol_diff) - # vol_diff_1D=vol_diff.flatten() - dvar_val = np.sqrt(np.mean(vol_diff ** 2, axis=0)) - # print(dvar_val.shape()) - # print(dvar_val) return dvar_val raise NotImplementedError("Code up this function") diff --git a/findoutlie/outfind.py b/findoutlie/outfind.py index 9779d57..ce44619 100644 --- a/findoutlie/outfind.py +++ b/findoutlie/outfind.py @@ -9,7 +9,7 @@ from skimage.filters import threshold_otsu -from .metrics import dvars +from .metrics import dvars,dvars_voxel from .detectors import iqr_detector,mad_voxel_detector,mad_time_detector def segment_brain(img): @@ -27,14 +27,12 @@ def segment_brain(img): mean_img = np.mean(img, axis=-1) # calculate the threshold for segmenting brain from background threshold = threshold_otsu(mean_img) - mask = np.expand_dims(mean_img > threshold, axis=1) - mask_2D = np.tile(mask, (1, img.shape[-1])) - thresholded_img = np.where(mask_2D, img, np.nan) + mask = mean_img > threshold # filter only brain voxels - brain_voxels = thresholded_img[~np.isnan(thresholded_img).all(axis=1)] + brain_voxels = img[mask] return brain_voxels -def detect_outliers_mean_absolute_deviation_mask(fname): +def detect_outliers_mad_median_absolute_deviation_mask(fname): """ Detect outliers given image file path 'filename' Parameters @@ -47,7 +45,7 @@ def detect_outliers_mean_absolute_deviation_mask(fname): outliers : array Indices of outlier volumes. """ - # A mask is used to first segment the brain regions from the background, then mean absolute deviation is used to detect outliers + # A mask is used to first segment the brain regions from the background, then median absolute deviation is used to detect outliers img = nib.load(fname) img_data = img.get_fdata() # reshape from 4D to 2D @@ -59,9 +57,71 @@ def detect_outliers_mean_absolute_deviation_mask(fname): # calculate the number of outlying voxels for each time point voxel_outliers_per_time = np.nansum(outliers_voxel,axis=0) # find the outliers in the time-series - outliers_time = mad_time_detector(voxel_outliers_per_time) + outliers_time = mad_time_detector(voxel_outliers_per_time, lower_bound=False) # Return indices of True values from Boolean array. - return np.nonzero(outliers_time)[0] + return np.nonzero(outliers_time) + +def detect_outliers_mad_dvars_mask(fname): + """ Detect outliers given image file path 'filename' + + Parameters + ---------- + fname : str or Path + Filename of 4D image, as string or Path object + + Returns + ------- + outliers : array + Indices of outlier volumes. + """ + # A mask is used to first segment the brain regions from the background, dvars is calculated and then median absolute deviation is used to detect outliers + img = nib.load(fname) + img_data = img.get_fdata() + # reshape from 4D to 2D + img_data_2D = np.reshape(img_data, (-1,img_data.shape[-1])) + # segment brain from background + brain_voxels = segment_brain(img_data_2D) + # calculate dvars + dvs = dvars_voxel(brain_voxels) + # detect outliers + is_outlier = mad_time_detector(dvs, lower_bound=True) + return np.nonzero(is_outlier) + +def detect_outliers_mad_sliding_window_mask(fname): + """ Detect outliers given image file path 'filename' + + Parameters + ---------- + fname : str or Path + Filename of 4D image, as string or Path object + + Returns + ------- + outliers : array + Indices of outlier volumes. + """ + # A mask is used to first segment the brain regions from the background, sliding window approach is used to detect outliers in each window using median absolute deviation + img = nib.load(fname) + img_data = img.get_fdata() + # reshape from 4D to 2D + img_data_2D = np.reshape(img_data, (-1,img_data.shape[-1])) + # segment brain from background + brain_voxels = segment_brain(img_data_2D) + # apply sliding window + overlap = 10 + window_length = 20 + outliers = [] + for i in range(0, brain_voxels.shape[-1],overlap): + if i+window_length>=brain_voxels.shape[-1]: + elements = np.mean(brain_voxels[:,i:],axis=0) + outliers_1 = np.nonzero(mad_time_detector(elements, lower_bound=True))[0] + i + outliers.extend(outliers_1) + break + else: + elements = np.mean(brain_voxels[:,i:i+window_length],axis=0) + outliers_1 = np.nonzero(mad_time_detector(elements, lower_bound=True))[0] + i + outliers.extend(outliers_1) + return np.unique(outliers) def detect_outliers(fname): @@ -82,7 +142,7 @@ def detect_outliers(fname): dvs = dvars(img) is_outlier = iqr_detector(dvs, iqr_proportion=2) # Return indices of True values from Boolean array. - return np.nonzero(is_outlier)[0] + return np.nonzero(is_outlier) def find_outliers(data_directory): @@ -102,7 +162,13 @@ def find_outliers(data_directory): image_fnames = Path(data_directory).glob("**/sub-*.nii.gz") outlier_dict = {} for fname in image_fnames: - outliers = detect_outliers_mean_absolute_deviation_mask(fname) + # detect outliers using mad over voxels and mad over time points + outliers_mad = detect_outliers_mad_median_absolute_deviation_mask(fname) + # detect outliers using dvars and mad over time points + outliers_dvars = detect_outliers_mad_dvars_mask(fname) + # detect outliers using sliding window and mad over time points + outliers_sliding_window = np.array(detect_outliers_mad_sliding_window_mask(fname)) + outliers = np.intersect1d(outliers_sliding_window,np.union1d(outliers_mad, outliers_dvars)) #outliers = detect_outliers(fname) outlier_dict[fname] = outliers return outlier_dict diff --git a/sub_01_02.png b/sub_01_02.png new file mode 100644 index 0000000..e77f609 Binary files /dev/null and b/sub_01_02.png differ diff --git a/sub_03_04.png b/sub_03_04.png new file mode 100644 index 0000000..014ad2a Binary files /dev/null and b/sub_03_04.png differ diff --git a/sub_05_06.png b/sub_05_06.png new file mode 100644 index 0000000..2c5ef11 Binary files /dev/null and b/sub_05_06.png differ diff --git a/sub_07_08.png b/sub_07_08.png new file mode 100644 index 0000000..5bc910e Binary files /dev/null and b/sub_07_08.png differ diff --git a/sub_09_10.png b/sub_09_10.png new file mode 100644 index 0000000..f243514 Binary files /dev/null and b/sub_09_10.png differ