
[MRG] Initial implementation of Hilbert detector #23

Merged · 31 commits · Apr 6, 2021

Conversation

@pmyers16 (Collaborator) commented Mar 12, 2021

PR Description

Addresses part of: #18

Refactors the detector into a three-step process (compute statistic, threshold, merge) and implements the Hilbert detector in that scheme.

Merge checklist

Maintainer, please confirm the following before merging:

  • All comments resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • PR description includes phrase "closes <#issue-number>"

@@ -79,8 +84,144 @@ def _write_json(fname, dictionary, overwrite=False, verbose=False):
print(os.linesep + f"Writing '{fname}'..." + os.linesep)
print(json_output)

def _band_zscore_detect(signal, sfreq, band_idx, l_freq, h_freq, n_times,
Collaborator Author:

I did not quite fix the loop issue here, but I figured out how this works and tried to document it better.

mne_hfo/utils.py:
@@ -139,26 +282,166 @@ def compute_line_length(signal, win_size=6):
return data[start:-stop]


def threshold_std(signal, threshold):
def compute_hilbert(signal, extra_params):
Collaborator Author:

This one is a bit different because you compute the metric per freq band, so the return ends up being an ndarray

Member:

Can you document what the array axes mean, so I can take a closer look?
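For reference, a minimal numpy-only sketch of the shape convention under discussion, assuming the return is `(n_bands, n_times)` with row i holding the z-scored envelope for band `(freq_cutoffs[i], freq_cutoffs[i + 1])`. The function name, explicit signature, and the crude FFT band-pass are illustrative stand-ins, not the actual `compute_hilbert` code:

```python
import numpy as np

def _analytic_envelope(x):
    """Envelope via the analytic signal, built with numpy's FFT
    (equivalent to np.abs(scipy.signal.hilbert(x)))."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))

def compute_hilbert_sketch(signal, freq_cutoffs, sfreq):
    """Return shape (n_bands, n_times): row i is the z-scored Hilbert
    envelope of `signal` band-passed to (freq_cutoffs[i], freq_cutoffs[i+1])."""
    n_times = len(signal)
    n_bands = len(freq_cutoffs) - 1
    freqs = np.fft.fftfreq(n_times, d=1.0 / sfreq)
    spec = np.fft.fft(signal)
    out = np.empty((n_bands, n_times))
    for i in range(n_bands):
        lo, hi = freq_cutoffs[i], freq_cutoffs[i + 1]
        mask = (np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)
        band = np.real(np.fft.ifft(spec * mask))  # crude FFT band-pass
        env = _analytic_envelope(band)
        out[i] = (env - env.mean()) / env.std()   # z-score within each band
    return out
```

Documenting the axes this way (axis 0 = frequency band, axis 1 = time) would answer the question above directly in the docstring.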

cycles_threshold, gap_threshold,
zscore_threshold]
tdetects.append(_band_zscore_detect(args))

Collaborator Author:

From here, we should have band indices, start and stop times, and max values during that band

mne_hfo/utils.py:
return tdetects


def _run_detect_branch(detects, det_idx, HFO_outline):
Collaborator Author:

I believe all this does is merge overlapping events

Member:

Yeah, let's see if we can refactor this then to use our version, which imo is more explicit.

This recursion imo is completely unnecessary and hard to decipher heh

mne_hfo/utils.py:
# corresponding to start and stop times
outlines = []
if len(tdetects):
while sum(tdetects[:, 0] != 0):
Collaborator Author:

I need to look more into what this function's purpose is...I don't see how tdetects is changing, so not sure how this loop would ever end

mne_hfo/utils.py:
max_amplitude = max(outline[:, 3])
ch_hfos.append((start, stop, freq_min, freq_max,
frq_at_max, max_amplitude))

Collaborator Author:

This is where the additional info comes in. Hilbert calculates start and stop times, the frq band that the detection was found in, the max amplitude of the signal during this window, and the freq where that max amplitude occurred. Should we try to work this info into the output data structures?

Member:

So based on the threshold applied, the start/stop times can get converted to our hfo_event_arr_, while we store in parallel a same-size array with the freq band (l_freq and h_freq) for each index, another same-size array with the max_amplitude, and another with the freq of the max amplitude.

We can either use these downstream, or scrap these and pipe them into a cohesive dataframe. This is TBD, since I'm not sure how this would look, so better to be explicit rn.
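A tiny sketch of the row-aligned "parallel arrays" idea (all names here are hypothetical; the real detector attributes such as hfo_event_arr_ may be organized differently):

```python
import numpy as np

class HilbertResults:
    """Parallel, row-aligned arrays: row i of every array describes
    the same detected event, so lookups just match row indices."""

    def __init__(self, starts, stops, band_lo, band_hi, max_amp, freq_at_max):
        arrs = [np.asarray(a) for a in
                (starts, stops, band_lo, band_hi, max_amp, freq_at_max)]
        n = len(arrs[0])
        assert all(len(a) == n for a in arrs), "rows must stay aligned"
        (self.starts, self.stops, self.band_lo, self.band_hi,
         self.max_amp, self.freq_at_max) = arrs

    def event(self, i):
        """Everything known about event i, gathered by row index."""
        return dict(start=self.starts[i], stop=self.stops[i],
                    l_freq=self.band_lo[i], h_freq=self.band_hi[i],
                    max_amplitude=self.max_amp[i],
                    freq_at_max=self.freq_at_max[i])
```

Piping these into one dataframe later is then just a column-per-array concatenation.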

Collaborator Author:

Just to be clear, you are proposing that a HilbertDetector would have additional objects assigned to it at the end of class. We put start/stop times into the hfo_event_arr_ that we have been using. Then we have another array for freq bands and another array for amplitudes. Then we compare them just by matching row indices? Seems easy enough

Member:

Yep. These can all be "additional" data structures that are unique to just Hilbert detectors. We can later figure out how to best combine these if one would need programmatic access to them when doing post-hoc analysis.

verbose: bool = False):

def __init__(self,
sfreq: float,
Member:

don't need.

X = mne.filter.filter_data(X, sfreq=self.sfreq,
l_freq=self.l_freq,
h_freq=self.h_freq,
method='iir', verbose=self.verbose)
Member:

can you remind me why iir vs fir for this one?

Do they use IIR in the original implementation?

Member:

If so, just document it here perhaps inline comment.

Contributor:

Yes, that is what was in the original implementation, so I can comment

@pemyers27 (Contributor) commented:

So right now, _post_process_ch_hfos simply calls the threshold_func to determine the threshold, then applies this threshold to the data to identify events. Obviously this is too simple for a Hilbert detector or anything more complex. My initial attempt seems slightly wrong, so what do you think about this workflow?

```
if threshold_method == '<method>':
    threshold_det_func = threshold_det_<method>
    event_identification_func = event_identification_<method>
    event_merging_func = event_merging_<method>
    extra_params = dict(values=values, ...)
```

This will allow us to keep things simple for the RMS/Line Length detectors, but allow for any level of complexity at each step for the more complicated detectors.
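The dispatch above could look something like this. A sketch only: the registry and every step function here are hypothetical stand-ins, not names from mne-hfo:

```python
import numpy as np

def threshold_det_std(stat, thr):
    """Toy stand-in for simple detectors: windows above mean + thr * std."""
    return stat > stat.mean() + thr * stat.std()

def threshold_det_hilbert(stat, thr):
    """Toy stand-in for the more involved Hilbert thresholding."""
    return stat > thr

# Hypothetical registry keyed by threshold_method.
THRESHOLD_FUNCS = {
    'std': threshold_det_std,
    'hilbert': threshold_det_hilbert,
}

def run_threshold(threshold_method, stat, thr):
    """Look the per-method step up once, then apply it."""
    threshold_det_func = THRESHOLD_FUNCS[threshold_method]
    return threshold_det_func(stat, thr)
```

Parallel registries for the event-identification and event-merging steps would follow the same pattern, so simple detectors register trivial functions and Hilbert registers complex ones.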

@adam2392 (Member) commented Mar 15, 2021

> So right now, _post_process_ch_hfos simply calls the threshold_func to determine the threshold, then applies this threshold to the data to identify events. Obviously this is too simple for a Hilbert detector, or something more complex. My initial attempt seems slightly wrong, so what do you think about this workflow.
>
> ```
> if threshold_method == '<method>':
>     threshold_det_func = threshold_det_<method>
>     event_identification_func = event_identification_<method>
>     event_merging_func = event_merging_<method>
>     extra_params = dict(values=values, ...)
> ```

Would you mind explaining with some more detail here? I'm not quite following. So Hilbert rn requires 3 thresholds: i) zscore of the Hilbert transform envelope, ii) number of cycles found and iii) gaps (I don't remember what gaps were, so maybe we need to document that better).

The issue is _post_process_ch_hfos performs thresholding of only one metric. It seems based on our convos so far, that Hilbert needs to store the output of the thresholding step? Or how is the thresholding step connected to later steps that are not amenable to the current API of _compute_hfo_event, _post_process ?

Some thoughts: My impression is that Hilbert only needs to refactor the _band_z_score_detect step into what we call _post_process. It is essentially, taking the thresholds of the detector and looking for contiguously aligned windows that meet this "relatively complicated" threshold criterion. Thus, I thought all one needed to do was to rewrite fit, _compute_hfo_event, and _post_process_ch_hfos.

If thresholding needs to be a distinct step in and of itself, then we can have each fit be: i) _compute_hfo_metrics, ii) threshold and iii) post_process into HFO detections. Would that work? You could then i) refactor the threshold step for all detectors out of the _post_proc function and then ii) refactor the _band_z_score_detect function into postprocess. This might make things more generalizable.

If I'm missing anything, lmk.
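The three-step fit flow being proposed might be sketched like this. The method names mirror the discussion (_compute_hfo_statistic, threshold, post-process), but the base class and the toy RMS subclass are illustrative assumptions, not the package's actual code:

```python
import numpy as np

class DetectorSketch:
    """Hypothetical three-step flow; real method names may differ."""

    def fit(self, X):
        stat = self._compute_hfo_statistic(X)      # Step 1: metric per window
        mask = self._threshold_statistic(stat)     # Step 2: apply threshold(s)
        self.chs_hfos_ = self._post_process(mask)  # Step 3: merge into events
        return self

class RMSSketch(DetectorSketch):
    """Toy RMS detector showing how a simple method fills in each step."""

    def __init__(self, win_size=100, threshold=3.0):
        self.win_size, self.threshold = win_size, threshold

    def _compute_hfo_statistic(self, X):
        w = self.win_size
        n = len(X) // w
        return np.sqrt((X[:n * w].reshape(n, w) ** 2).mean(axis=1))

    def _threshold_statistic(self, stat):
        return stat > stat.mean() + self.threshold * stat.std()

    def _post_process(self, mask):
        # merge runs of consecutive supra-threshold windows into (start, stop)
        events, start = [], None
        for i, hit in enumerate(mask):
            if hit and start is None:
                start = i
            elif not hit and start is not None:
                events.append((start * self.win_size, i * self.win_size))
                start = None
        if start is not None:
            events.append((start * self.win_size, len(mask) * self.win_size))
        return events
```

Hilbert would then override all three hooks with its band-wise statistic, multi-criterion thresholding, and cross-band merging, without touching fit itself.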

@pmyers16 (Collaborator Author) commented Mar 17, 2021

So the basic idea of _run_detect_branch is to find the entire freq band that the HFO occurred in. Each freq band has a band_idx, so the goal is to check whether there exist event_i[band_idx] and event_j[band_idx+1] that overlap, and the recursion lets the chain keep going, so that event_i[band_idx], event_j[band_idx+1], event_k[band_idx+2], ... all count as one event.

Once the overlapping events are merged, the HFO is defined as [min start time, max end time] and the freq band is defined as [l_freq(band_idx), h_freq(band_idx+(n-1))] for n overlapping events.

Unfortunately I don't think computational efficiency is going to be good for this no matter what, but this is generally the new idea:
```python
# For simplicity let's assume each detection has the form [start, stop, band_idx]
outlines = []
for detection in detections:
    if detection[2] == 0:
        outlines.append(detection)
    else:
        merged = False
        for i, outline in enumerate(outlines):
            # overlap: the two intervals share any stretch of time
            if detection[0] < outline[1] and outline[0] < detection[1]:
                outlines[i] = merge_outline(outline, detection)
                merged = True
                break
        if not merged:
            outlines.append(detection)
```

And the merging step would be something simple like:

```python
def merge_outline(outline, detection):
    start = min(outline[0], detection[0])
    stop = max(outline[1], detection[1])
    freq_band = [outline[2][0], detection[2][1]]
    return [start, stop, freq_band]
```

Then outlines should only have distinct HFO events.

print(f'Using {threshold_method} to perform HFO '
f'thresholding.')
if method == "time-windows":
return detections
Member:

we don't merge time-windows?

Collaborator Author:

We do, but we do it upstream. I left a TODO where it's done to refactor it here if you wanted. I think refactoring will slow it down a bit, which is why I held off.

Member:

Hmm okay I will take a look

Comment on lines 198 to 207
if band_method == 'log':
low_fc = float(filter_band[0])
high_fc = float(filter_band[1])
freq_cutoffs = np.logspace(0, np.log10(high_fc), n_bands)
self.freq_cutoffs = freq_cutoffs[(freq_cutoffs > low_fc) &
(freq_cutoffs < high_fc)]
self.freq_span = len(freq_cutoffs) - 1
elif band_method == 'linear':
self.freq_cutoffs = np.arange(filter_band[0], filter_band[1])
self.freq_span = (filter_band[1] - filter_band[0]) - 1
Member:

This looks good, but will not work because it goes against the sklearn-style API. Stick it inside fit() at the beginning instead.

Member:

Meaning, sklearn imposes that one does not define additional things inside __init__ beyond what is passed in. Not sure why, but just something to do w/ its API compliance.
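A sketch of that sklearn convention: __init__ stores parameters verbatim, and derived attributes (trailing underscore) are computed in fit(). The class and attribute names here are illustrative, and the cutoff computation differs slightly from the diff, spacing the log bands directly between the band edges rather than filtering a logspace from 1 Hz:

```python
import numpy as np

class HilbertSketch:
    """Illustrative sklearn-style estimator, not mne-hfo's actual class."""

    def __init__(self, filter_band=(80, 250), band_method='log', n_bands=10):
        # Store params verbatim -- required for get_params()/set_params()
        # round-tripping; no derived state may be computed here.
        self.filter_band = filter_band
        self.band_method = band_method
        self.n_bands = n_bands

    def fit(self, X):
        # Derived, trailing-underscore attributes belong in fit().
        low_fc, high_fc = map(float, self.filter_band)
        if self.band_method == 'log':
            cutoffs = np.logspace(np.log10(low_fc), np.log10(high_fc),
                                  self.n_bands)
        else:  # 'linear'
            cutoffs = np.arange(low_fc, high_fc + 1)
        self.freq_cutoffs_ = cutoffs
        self.freq_span_ = len(cutoffs) - 1
        # ... detection proper would follow ...
        return self
```

With this split, cloning the estimator (as sklearn's clone() does, by re-calling __init__ with get_params()) always reproduces the same object.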

Collaborator Author:

Should I stick it in Hilbert's _compute_hfo_statistic function? That way we avoid a bunch of if statements in the base fit method

Member:

Sure that makes sense, unless there's any logic in fit that relies on this before calling _compute_hfo_statistic.

Collaborator Author:

Nope, it just calls _check_input_raw, then right into _compute_hfo_statistic

@adam2392 (Member) left a comment:
Some changes, but overall looks good, especially if unit tests are still passing.

Can you also add the general flow of an algorithm to the docstring of Detector? This way, any downstream Detectors follow this abstraction.

E.g.

Step 1: compute a statistic on the data
Step 2: ...
etc.

Once you also add unit tests for Hilbert and validate that this works, then LGTM.

mne_hfo/utils.py:

# return the absolute value of the Hilbert transform.
# (i.e. the envelope)
hfx = np.abs(hilbert(signal))
Collaborator Author:

This step is what was throwing the memory error
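One common mitigation, sketched under the assumption that the transform was being applied to the whole (n_bands, n_times) array at once: process one band at a time and pad to a fast FFT length (scipy.signal.hilbert is far cheaper at such lengths). The function name is hypothetical:

```python
import numpy as np
from scipy.fft import next_fast_len
from scipy.signal import hilbert

def envelope_lowmem(band_signals):
    """Hilbert envelope of a (n_bands, n_times) array, one band at a time.

    Keeps only one complex analytic signal in memory at once, instead of
    allocating a complex array for every band simultaneously.
    """
    n_times = band_signals.shape[-1]
    n_fft = next_fast_len(n_times)  # hilbert is much cheaper at fast lengths
    out = np.empty(band_signals.shape, dtype=float)
    for i, row in enumerate(band_signals):
        out[i] = np.abs(hilbert(row, N=n_fft))[:n_times]
    return out
```

If memory is still tight, the same loop can chunk along time per band, at the cost of some edge effects at chunk boundaries.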

@adam2392 (Member) commented:

If you fix the CI and resolve the resolved comments, then I can take another review. Can merge and then we can move on with other high pri tasks.

@codecov bot commented Apr 1, 2021

Codecov Report

Merging #23 (5c95c39) into master (beb0932) will decrease coverage by 0.95%.
The diff coverage is 35.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
- Coverage   70.96%   70.00%   -0.96%     
==========================================
  Files          13       13              
  Lines        1054     1167     +113     
==========================================
+ Hits          748      817      +69     
- Misses        306      350      +44     
Impacted Files Coverage Δ
mne_hfo/utils.py 37.76% <26.35%> (-26.17%) ⬇️
mne_hfo/detect.py 59.09% <38.88%> (+21.65%) ⬆️
mne_hfo/base.py 74.85% <54.90%> (+1.13%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update beb0932...5c95c39. Read the comment docs.

@adam2392 (Member) commented Apr 1, 2021

Looks like _band_zscore_detect in utils.py isn't covered. I think you don't need that anymore right since you refactored things?

@pmyers16 (Collaborator Author) commented Apr 1, 2021

> Looks like _band_zscore_detect in utils.py isn't covered. I think you don't need that anymore right since you refactored things?

No, it is used by the apply_hilbert function. I can add a test.

@pmyers16 pmyers16 self-assigned this Apr 6, 2021
@codecov-io

Codecov Report

Merging #23 (f255b7e) into master (b9cc29c) will increase coverage by 1.26%.
The diff coverage is 46.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
+ Coverage   70.96%   72.23%   +1.26%     
==========================================
  Files          13       13              
  Lines        1054     1167     +113     
==========================================
+ Hits          748      843      +95     
- Misses        306      324      +18     
Impacted Files Coverage Δ
mne_hfo/detect.py 59.09% <38.88%> (+21.65%) ⬆️
mne_hfo/utils.py 51.59% <46.51%> (-12.34%) ⬇️
mne_hfo/base.py 74.85% <54.90%> (+1.13%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b9cc29c...f255b7e. Read the comment docs.

@adam2392 adam2392 changed the title [WIP] Initial implementation of Hilbert detector [MRG] Initial implementation of Hilbert detector Apr 6, 2021
@adam2392 adam2392 merged commit 91c8ed7 into master Apr 6, 2021
@adam2392 adam2392 deleted the Hilbert branch April 6, 2021 15:59