Skip to content

Commit

Permalink
Merge pull request #27 from jagruti8/add-detectors-metrics
Browse files Browse the repository at this point in the history
Added new outlier detector methods, modified the README.md file and modified the algorithm.txt
  • Loading branch information
jagruti8 authored Oct 26, 2022
2 parents f0129d5 + 6e7d954 commit 6653c80
Show file tree
Hide file tree
Showing 10 changed files with 146 additions and 41 deletions.
46 changes: 25 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,24 +114,28 @@ identified as an outlier. 0 refers to the first volume. For example (these
outlier IDs are completely random, for illustration):

```
data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19, 133, 134, 135, 136, 154, 155, 157
data/group-01/sub-08/func/sub-08_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 9, 17, 53, 54, 63, 78, 79, 151, 152, 153
data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 157, 158
data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 17, 19, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160
data/group-01/sub-06/func/sub-06_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 153, 154, 155, 156, 157, 158, 159, 160, 161
data/group-01/sub-06/func/sub-06_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 19, 24, 25, 26, 27, 28, 29, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159
data/group-01/sub-07/func/sub-07_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
data/group-01/sub-07/func/sub-07_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 132, 136, 137, 138, 139, 140, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161
data/group-01/sub-09/func/sub-09_task-taskzero_run-01_bold.nii.gz, 0, 134, 135, 136, 143, 144, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160
data/group-01/sub-09/func/sub-09_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 36, 79, 80, 150, 151, 152, 153, 154, 155, 156, 157
data/group-01/sub-10/func/sub-10_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 26, 104
data/group-01/sub-10/func/sub-10_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159
data/group-01/sub-05/func/sub-05_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 25, 26, 27, 48, 49, 52, 76, 77, 150
data/group-01/sub-05/func/sub-05_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 50, 51, 52, 54, 157, 158, 159, 160, 161
data/group-01/sub-02/func/sub-02_task-taskzero_run-02_bold.nii.gz, 34, 65, 105, 106, 107, 135, 140, 148
data/group-01/sub-02/func/sub-02_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 101, 102, 103, 160, 161
data/group-01/sub-03/func/sub-03_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 136, 137, 138, 139, 140, 142, 156, 157, 158, 159
data/group-01/sub-04/func/sub-04_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 59, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161
data/group-01/sub-04/func/sub-04_task-taskzero_run-02_bold.nii.gz, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 49, 50, 52, 53, 54, 55, 57, 58, 148, 149, 157
```
data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 11, 157
data/group-01/sub-08/func/sub-08_task-taskzero_run-02_bold.nii.gz, 79, 153
data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 0, 153
data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 151
data/group-01/sub-06/func/sub-06_task-taskzero_run-02_bold.nii.gz, 0, 1, 21, 22, 23, 24, 25, 26, 28, 29, 155
data/group-01/sub-06/func/sub-06_task-taskzero_run-01_bold.nii.gz, 1
data/group-01/sub-07/func/sub-07_task-taskzero_run-02_bold.nii.gz, 79, 80
data/group-01/sub-07/func/sub-07_task-taskzero_run-01_bold.nii.gz, 85
data/group-01/sub-09/func/sub-09_task-taskzero_run-02_bold.nii.gz, 23, 24, 25, 26, 27, 28, 30
data/group-01/sub-10/func/sub-10_task-taskzero_run-02_bold.nii.gz, 104
data/group-01/sub-05/func/sub-05_task-taskzero_run-01_bold.nii.gz, 0, 49, 77, 150
data/group-01/sub-05/func/sub-05_task-taskzero_run-02_bold.nii.gz, 3, 4, 5, 6, 9, 20, 23, 28, 49, 54
data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 160, 161
data/group-01/sub-03/func/sub-03_task-taskzero_run-01_bold.nii.gz, 11, 14, 15, 132, 156, 157
data/group-01/sub-04/func/sub-04_task-taskzero_run-01_bold.nii.gz, 1, 144, 154, 158, 159, 160, 161
data/group-01/sub-04/func/sub-04_task-taskzero_run-02_bold.nii.gz, 0, 1, 28, 35, 49, 52, 53
```

Shown below are the plots of the mean of voxel intensities for each time point vs time points. The detected outliers are marked in orange colour:
![](sub_01_02.png)
![](sub_03_04.png)
![](sub_05_06.png)
![](sub_07_08.png)
![](sub_09_10.png)

18 changes: 18 additions & 0 deletions algorithm.txt
Original file line number Diff line number Diff line change
@@ -1,10 +1,28 @@
Algorithm:

1. The 4D image is first segmented (using otsu threshold) to segment the brain voxels from the background. (x)

I. Median absolute deviation over voxels and median absolute deviation over time points
2. The median(med_voxel(x)) and median absolute deviation(mad_voxel(x)) is calculated for each of the brain voxels.
3. The brain voxels lying outside the interval [med_voxel(x)-a*mad_voxel(x), med_voxel(x)+a*mad_voxel(x)] are considered as outliers. a = 3.5
4. For each time t, the number of outlying voxels n(t) is counted.
5. The median (n_med) and MAD (n_mad) of n(t) are calculated. Any time t with n(t)>n_med+3.5*n_mad are considered as outliers.

II. DVARS and median absolute deviation over time points
6. The dvars(t) of the brain voxels are calculated.
7. The median (dvars_med) and MAD (dvars_mad) of dvars(t) are calculated. Any time t with |dvars(t)-dvars_med|>3.5*dvars_mad are considered as outliers.

III. Sliding window and median absolute deviation over time points
8. A fraction of the time points are chosen using a sliding window, mean over voxel intensities for each time point (m(t)) in this sliding window is calculated.
9. The median (m_med) and MAD (m_mad) of m(t) in this sliding window are calculated. Any time t with |m(t)-m_med|>3.5*m_mad are considered as outliers.
10. This is repeated till the sliding window covers all the time points.
11. All the outliers in each sliding window are merged.

12. The outliers from the I approach is a more global approach and filters a lot of time-points as outliers.
13. The outliers from the II(DVARS) approach compares successive volumes and if the difference is large considers the preceding volumes as outliers.
14. The outliers from the III(sliding window) approach compares volumes within a certain range and then detect outliers. This is mostly done to take care of drift.

15. Outliers from I and II are merged (o_total) (Global + Local). If these o_total agree with the outliers from sliding window, they are classified as final outliers (o_final).

References:
1. Cox, R.W. Outlier Detection in FMRl Time Series. ISMRM(2002).
7 changes: 5 additions & 2 deletions findoutlie/detectors.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ def mad_voxel_detector(img,threshold=3.5):
outlier_tf = np.abs(img-med)>(threshold*mad)
return outlier_tf

def mad_time_detector(measures, threshold=3.5):
def mad_time_detector(measures, lower_bound, threshold=3.5):
""" Detect outliers in 'measures' using median absolute deviation.
Returns 1D vector of same length as 'measures', where True means the corresponsding
value in 'measures' is an outlier.
Expand All @@ -71,7 +71,10 @@ def mad_time_detector(measures, threshold=3.5):
# Calculate median absoulte deviation of measures
mad = np.median(np.abs(measures-med))
# Calculate the outliers
outlier_tf = measures>med+threshold*mad
if lower_bound:
outlier_tf = np.abs(measures-med)>threshold*mad
else:
outlier_tf = measures>med+threshold*mad
return outlier_tf

def iqr_detector(measures, iqr_proportion=1.5):
Expand Down
28 changes: 21 additions & 7 deletions findoutlie/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,26 @@
# +++your code here+++
import numpy as np

def dvars_voxel(voxels):
""" Calculate dvars metric on 2D array with voxels in rows and time-points(volumes) in columns
The dvars calculation between two volumes is defined as the square root of
(the mean of the (voxel differences square)).
Parameters
----------
voxels : 2D array
Returns
-------
dvals : 1D array
One-dimensional array with n-1 elements, where n is the number of
volumes in 'img'.
"""
vol_diff = voxels[..., 1:] - voxels[..., :-1]
dvar_val = np.sqrt(np.mean(vol_diff ** 2, axis=0))
return dvar_val


def dvars(img):
""" Calculate dvars metric on Nibabel image `img`
Expand Down Expand Up @@ -33,14 +53,8 @@ def dvars(img):
data = img.get_fdata()

voxel_by_time = np.reshape(data, (-1, data.shape[-1]))
dvar_val = dvars_voxel(voxel_by_time)

vol_diff = voxel_by_time[..., 1:] - voxel_by_time[..., :-1] # 2D array
# print(vol_diff.shape())
# print(vol_diff)
# vol_diff_1D=vol_diff.flatten()
dvar_val = np.sqrt(np.mean(vol_diff ** 2, axis=0))
# print(dvar_val.shape())
# print(dvar_val)
return dvar_val

raise NotImplementedError("Code up this function")
88 changes: 77 additions & 11 deletions findoutlie/outfind.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@

from skimage.filters import threshold_otsu

from .metrics import dvars
from .metrics import dvars,dvars_voxel
from .detectors import iqr_detector,mad_voxel_detector,mad_time_detector

def segment_brain(img):
Expand All @@ -27,14 +27,12 @@ def segment_brain(img):
mean_img = np.mean(img, axis=-1)
# calculate the threshold for segmenting brain from background
threshold = threshold_otsu(mean_img)
mask = np.expand_dims(mean_img > threshold, axis=1)
mask_2D = np.tile(mask, (1, img.shape[-1]))
thresholded_img = np.where(mask_2D, img, np.nan)
mask = mean_img > threshold
# filter only brain voxels
brain_voxels = thresholded_img[~np.isnan(thresholded_img).all(axis=1)]
brain_voxels = img[mask]
return brain_voxels

def detect_outliers_mean_absolute_deviation_mask(fname):
def detect_outliers_mad_median_absolute_deviation_mask(fname):
""" Detect outliers given image file path 'filename'
Parameters
Expand All @@ -47,7 +45,7 @@ def detect_outliers_mean_absolute_deviation_mask(fname):
outliers : array
Indices of outlier volumes.
"""
# A mask is used to first segment the brain regions from the background, then mean absolute deviation is used to detect outliers
# A mask is used to first segment the brain regions from the background, then median absolute deviation is used to detect outliers
img = nib.load(fname)
img_data = img.get_fdata()
# reshape from 4D to 2D
Expand All @@ -59,9 +57,71 @@ def detect_outliers_mean_absolute_deviation_mask(fname):
# calculate the number of outlying voxels for each time point
voxel_outliers_per_time = np.nansum(outliers_voxel,axis=0)
# find the outliers in the time-series
outliers_time = mad_time_detector(voxel_outliers_per_time)
outliers_time = mad_time_detector(voxel_outliers_per_time, lower_bound=False)
# Return indices of True values from Boolean array.
return np.nonzero(outliers_time)[0]
return np.nonzero(outliers_time)

def detect_outliers_mad_dvars_mask(fname):
""" Detect outliers given image file path 'filename'
Parameters
----------
fname : str or Path
Filename of 4D image, as string or Path object
Returns
-------
outliers : array
Indices of outlier volumes.
"""
# A mask is used to first segment the brain regions from the background, dvars is calculated and then median absolute deviation is used to detect outliers
img = nib.load(fname)
img_data = img.get_fdata()
# reshape from 4D to 2D
img_data_2D = np.reshape(img_data, (-1,img_data.shape[-1]))
# segment brain from background
brain_voxels = segment_brain(img_data_2D)
# calculate dvars
dvs = dvars_voxel(brain_voxels)
# detect outliers
is_outlier = mad_time_detector(dvs, lower_bound=True)
return np.nonzero(is_outlier)

def detect_outliers_mad_sliding_window_mask(fname):
""" Detect outliers given image file path 'filename'
Parameters
----------
fname : str or Path
Filename of 4D image, as string or Path object
Returns
-------
outliers : array
Indices of outlier volumes.
"""
# A mask is used to first segment the brain regions from the background, sliding window approach is used to detect outliers in each window using median absolute deviation
img = nib.load(fname)
img_data = img.get_fdata()
# reshape from 4D to 2D
img_data_2D = np.reshape(img_data, (-1,img_data.shape[-1]))
# segment brain from background
brain_voxels = segment_brain(img_data_2D)
# apply sliding window
overlap = 10
window_length = 20
outliers = []
for i in range(0, brain_voxels.shape[-1],overlap):
if i+window_length>=brain_voxels.shape[-1]:
elements = np.mean(brain_voxels[:,i:],axis=0)
outliers_1 = np.nonzero(mad_time_detector(elements, lower_bound=True))[0] + i
outliers.extend(outliers_1)
break
else:
elements = np.mean(brain_voxels[:,i:i+window_length],axis=0)
outliers_1 = np.nonzero(mad_time_detector(elements, lower_bound=True))[0] + i
outliers.extend(outliers_1)
return np.unique(outliers)


def detect_outliers(fname):
Expand All @@ -82,7 +142,7 @@ def detect_outliers(fname):
dvs = dvars(img)
is_outlier = iqr_detector(dvs, iqr_proportion=2)
# Return indices of True values from Boolean array.
return np.nonzero(is_outlier)[0]
return np.nonzero(is_outlier)


def find_outliers(data_directory):
Expand All @@ -102,7 +162,13 @@ def find_outliers(data_directory):
image_fnames = Path(data_directory).glob("**/sub-*.nii.gz")
outlier_dict = {}
for fname in image_fnames:
outliers = detect_outliers_mean_absolute_deviation_mask(fname)
# detect outliers using mad over voxels and mad over time points
outliers_mad = detect_outliers_mad_median_absolute_deviation_mask(fname)
# detect outliers using dvars and mad over time points
outliers_dvars = detect_outliers_mad_dvars_mask(fname)
# detect outliers using sliding window and mad over time points
outliers_sliding_window = np.array(detect_outliers_mad_sliding_window_mask(fname))
outliers = np.intersect1d(outliers_sliding_window,np.union1d(outliers_mad, outliers_dvars))
#outliers = detect_outliers(fname)
outlier_dict[fname] = outliers
return outlier_dict
Binary file added sub_01_02.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added sub_03_04.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added sub_05_06.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added sub_07_08.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added sub_09_10.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 6653c80

Please sign in to comment.