Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Correct annotation onset for exportation to EDF and EEGLAB #12656

Merged
merged 65 commits into from
Jan 21, 2025

Conversation

qian-chu
Copy link
Contributor

@qian-chu qian-chu commented Jun 10, 2024

When cropping the start of a recording, raw.first_time is updated while annotations.onset is conveniently untouched. However, when exporting to another format where times are reset (starting from zero), annotations.onset should be corrected so that they represent relative time from the first sample.

This correction has been performed when fmt=‘brainvision’:

if events is not None:
# subtract raw.first_samp because brainvision marks events starting from the
# first available data point and ignores the raw.first_samp
assert isinstance(events, np.ndarray), msg
assert events.ndim == 2, msg
assert events.shape[-1] == 3, msg
events[:, 0] -= raw.first_samp
events = events[:, [0, 2]] # reorder for pybv required order

But is curiously missing when fmt=‘edf’ or fmt=‘eeglab’:

annotations = []
for desc, onset, duration, ch_names in zip(
raw.annotations.description,
raw.annotations.onset,
raw.annotations.duration,
raw.annotations.ch_names,
):
if ch_names:
for ch_name in ch_names:
annotations.append(
EdfAnnotation(onset, duration, desc + f"@@{ch_name}")
)
else:
annotations.append(EdfAnnotation(onset, duration, desc))

annotations = [
raw.annotations.description,
raw.annotations.onset,
raw.annotations.duration,
]

This PR aims to fix this by performing the similar correction (annotations.onset - raw.first_time

Comment on lines 125 to 165
@pytest.mark.parametrize("tmin", (0, 1, 5, 10))
def test_export_raw_eeglab_annotations(tmp_path, tmin):
"""Test that exporting EEGLAB preserves annotations and corects for raw.first_time."""
pytest.importorskip("eeglabio")
raw = read_raw_fif(fname_raw, preload=True)
raw.apply_proj()
annotations = Annotations(
onset=[0.01, 0.05, 0.90, 1.05],
duration=[0, 1, 0, 0],
description=["test1", "test2", "test3", "test4"],
ch_names=[["MEG 0113"], ["MEG 0113", "MEG 0132"], [], ["MEG 0143"]],
)
raw.set_annotations(annotations)
raw.crop(tmin)

# export
temp_fname = tmp_path / "test.set"
raw.export(temp_fname)

# read in the file
with pytest.warns(RuntimeWarning, match="is above the 99th percentile"):
raw_read = read_raw_eeglab(temp_fname, preload=True, montage_units="m")
assert raw_read.first_time == 0

valid_annot = raw.annotations.onset >= tmin
assert_array_almost_equal(
raw.annotations.onset[valid_annot] - raw.first_time,
raw_read.annotations.onset - raw_read.first_time,
)
assert_array_equal(
raw.annotations.duration[valid_annot], raw_read.annotations.duration
)
assert_array_equal(
raw.annotations.description[valid_annot], raw_read.annotations.description
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually should this annotation test for EEGLAB been written before, the bug should be noticable because the test Raw has actually been cropped. The saved onset would not correspond to the original onset.

@qian-chu qian-chu changed the title Correct annotation onset for exportation to EDF and EEGLAB [BUG] Correct annotation onset for exportation to EDF and EEGLAB Jun 10, 2024
Comment on lines 277 to 327
if tmin % 1 == 0:
expectation = nullcontext()
else:
expectation = pytest.warns(
RuntimeWarning, match="EDF format requires equal-length data blocks"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this check doing? If you are checking for tmin to be an integer, you could also use tmin.is_integer(), but is this what is required to have "equal-length data blocks"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the constructed raw signal is 2 sec long and edfio can segment it into 2 data records of 1 sec. If a non-integer amount of time is cropped, then the signal is no longer a multiple of 1 sec and edfio will append zeroes and issue a RuntimeWarning. Maybe this should have been a test on its own but I'm adding it here since pytest wouldn't pass me otherwise.

As for the %1 == 0 condition, I was thinking to make space for more flexible use should I know how edfio determines data record length. For example if one can specify a data record length of .5 or 2 s, then the statement can be replaced with %data_length == 0. But I agree it looks uncessary in its current form.

@cbrnr
Copy link
Contributor

cbrnr commented Jun 11, 2024

When cropping the start of a recording, raw.first_time is corrected while annotations.onset is conveniently untouched.

Hmmm, really? Annotations are not affected by cropping? Why would that be convenient?

@qian-chu
Copy link
Contributor Author

When cropping the start of a recording, raw.first_time is corrected while annotations.onset is conveniently untouched.

Hmmm, really? Annotations are not affected by cropping? Why would that be convenient?

At least when annotations.orig_time is not None:

mne-python/mne/io/base.py

Lines 1560 to 1570 in e2c8010

annotations = self.annotations
# now call setter to filter out annotations outside of interval
if annotations.orig_time is None:
assert self.info["meas_date"] is None
# When self.info['meas_date'] is None (which is guaranteed if
# self.annotations.orig_time is None), when we do the
# self.set_annotations, it's assumed that the annotations onset
# are relative to first_time, so we have to subtract it, then
# set_annotations will put it back.
annotations.onset -= self.first_time
self.set_annotations(annotations, False)

So that when annotations have their own time reference, cropping the data wouldn't affect them.

Actually this is a good reminder that we might need to account for different annotations.orig_time. The onset correction should have been corrected when annotations.orig_time is None. Will do a fix later.

@cbrnr
Copy link
Contributor

cbrnr commented Jun 11, 2024

To be honest, I didn't even know that annotations work like that. I always thought that annotations.onset are onsets in seconds relative to the start of the data. So I expected that cropping should shift the onsets in general, not just for export. If that was the case, we would not have to deal with correcting onsets on export in the first place.

@qian-chu
Copy link
Contributor Author

To be honest, I didn't even know that annotations work like that. I always thought that annotations.onset are onsets in seconds relative to the start of the data. So I expected that cropping should shift the onsets in general, not just for export. If that was the case, we would not have to deal with correcting onsets on export in the first place.

I'm not too sure how that will work out. Do you mean that cropping should always reset annotations.onset and then force annotations.orig_time=None?

----------- meas_date=XX, orig_time=YY -----------------------------

     |              +------------------+
     |______________|     RAW          |
     |              |                  |
     |              +------------------+
 meas_date      first_samp
     .
     .         |         +------+
     .         |_________| ANOT |
     .         |         |      |
     .         |         +------+
     .     orig_time   onset[0]
     .
     |                   +------+
     |___________________|      |
     |                   |      |
     |                   +------+
 orig_time            onset[0]'

----------- meas_date=XX, orig_time=None ---------------------------

     |              +------------------+
     |______________|     RAW          |
     |              |                  |
     |              +------------------+
     .              N         +------+
     .              o_________| ANOT |
     .              n         |      |
     .              e         +------+
     .
     |                        +------+
     |________________________|      |
     |                        |      |
     |                        +------+
 orig_time                 onset[0]'

----------- meas_date=None, orig_time=YY ---------------------------

     N              +------------------+
     o______________|     RAW          |
     n              |                  |
     e              +------------------+
               |         +------+
               |_________| ANOT |
               |         |      |
               |         +------+

            [[[ CRASH ]]]

----------- meas_date=None, orig_time=None -------------------------

     N              +------------------+
     o______________|     RAW          |
     n              |                  |
     e              +------------------+
     .              N         +------+
     .              o_________| ANOT |
     .              n         |      |
     .              e         +------+
     .
     N                        +------+
     o________________________|      |
     n                        |      |
     e                        +------+
 orig_time                 onset[0]'

@cbrnr
Copy link
Contributor

cbrnr commented Jul 3, 2024

Maybe it's just me, but I gave up trying to understand how this works. The ASCII diagram is probably meant to be helpful, but for me it is the complete opposite, I have no idea how these different concepts (meas_date, orig_time, first_samp, and whatnot) actually work, sorry.

@hoechenberger
Copy link
Member

hoechenberger commented Jul 3, 2024

I agree, I've tried several times over the past couple of years to decipher what it's trying to tell me and at one point just gave up. It's just been trial and error for me regarding all things annotations ever since 😅

@qian-chu
Copy link
Contributor Author

qian-chu commented Jul 3, 2024

Maybe it's just me, but I gave up trying to understand how this works. The ASCII diagram is probably meant to be helpful, but for me it is the complete opposite, I have no idea how these different concepts (meas_date, orig_time, first_samp, and whatnot) actually work, sorry.

It's definitely OK! As I'm re-looking at this PR after some time I'm also struggling to wrap my head around this system. FYI this diagram was copied from https://mne.tools/dev/generated/mne.Annotations.html.

One potential conflict I found is, the diagram says when meas_date=None, orig_time=YY it should result in error, yet in crop() it asserts the following:

mne-python/mne/io/base.py

Lines 1562 to 1563 in 4954672

if annotations.orig_time is None:
assert self.info["meas_date"] is None

instead of

if self.info["meas_date"] is None:
     assert annotations.orig_time is None

If someone who's familiar with the design can clarify that would be great. But I do confirm that the EDF and EEGLAB export will malfunction without correcting for first_time so eventually we would want this fix.

@qian-chu
Copy link
Contributor Author

Coming back to this issue after some time, I re-confirmed the existence of the problem with a minimalist code:

import numpy as np
from mne import create_info, Annotations
from mne.io import RawArray, read_raw_brainvision, read_raw_edf, read_raw_eeglab
from mne.viz import set_browser_backend

set_browser_backend('qt')

# Create a raw object of SR 1000 Hz, all zero, except for 1s of 1e-6 from 2-3s
data = np.zeros((1, 5000))
data[0, 2000:3000] = 1
scalings = dict(eeg=1)

info = create_info(['CH1'], 1000, ['eeg'])
raw_orig = RawArray(data, info)

annot = Annotations(onset=[2], duration=[1], description=['stim'])
raw_orig.set_annotations(annot)

# Crop raw to 1-5s
raw_orig.crop(1)
fig_orig = raw_orig.plot(scalings=scalings)
fig_orig.grab().save('orig.png')

# Export to BrainVision and re-read
raw_orig.export('test.vhdr')
raw_brainvision = read_raw_brainvision('test.vhdr')
fig_brainvision = raw_brainvision.plot(scalings=scalings)
fig_brainvision.grab().save('brainvision.png')

# Export to EDF and re-read
raw_orig.export('test.edf')
raw_edf = read_raw_edf('test.edf')
fig_edf = raw_edf.plot(scalings=scalings)
fig_edf.grab().save('edf.png')

# Export to EEGLAB and re-read
raw_orig.export('test.set')
raw_eeglab = read_raw_eeglab('test.set')
fig_eeglab = raw_eeglab.plot(scalings=scalings)
fig_eeglab.grab().save('eeglab.png')

Outputs using the current main of MNE

Original raw array

orig

BrainVision (functions properly)

brainvision

EDF

edf

EEGLAB

eeglab

Outputs using the PR branch

Original raw array

orig

BrainVision

brainvision

EDF

edf

EEGLAB

eeglab

@cbrnr
Copy link
Contributor

cbrnr commented Dec 13, 2024

Very nice, thanks for this example, this makes the issue really easy to see! Could you take a look at the failing tests?

@qian-chu qian-chu requested a review from cbrnr December 16, 2024 18:23
@cbrnr
Copy link
Contributor

cbrnr commented Jan 3, 2025

This looks good to me, just one last comment: do you think it makes sense to update the export code for BrainVision so that all formats consistently use _sync_onset? I think it would be great to be as consistent as possible.

For merging, I'd appreciate if @larsoner and/or @drammock could have a final look.

@qian-chu
Copy link
Contributor Author

qian-chu commented Jan 3, 2025

This looks good to me, just one last comment: do you think it makes sense to update the export code for BrainVision so that all formats consistently use _sync_onset? I think it would be great to be as consistent as possible.

For merging, I'd appreciate if @larsoner and/or @drammock could have a final look.

It might be a bit tricky as _mne_annots2pybv_events() loops over annotations instead of taking the numpy arrays individually:

def _mne_annots2pybv_events(raw):
"""Convert mne Annotations to pybv events."""
events = []
for annot in raw.annotations:
# handle onset and duration: seconds to sample, relative to
# raw.first_samp / raw.first_time
onset = annot["onset"] - raw.first_time
onset = raw.time_as_index(onset).astype(int)[0]
duration = int(annot["duration"] * raw.info["sfreq"])

One would need to change the code structure if _sync_onset() is to be used. However, we can simply add assert raw.info["meas_date"] == raw.annotations.orig_time to impose the same requirement as on other formats. What do you think?

Also I can add a note to %(export_warning_note_raw)s to document the new requirement.

@cbrnr
Copy link
Contributor

cbrnr commented Jan 3, 2025

OK, if it's too complicated then don't bother trying to use _sync_onset(). But yes, good idea to add an assert and documentation (and maybe a comment explaining why we're not using _sync_onset() here)!

@drammock drammock enabled auto-merge (squash) January 8, 2025 17:51
auto-merge was automatically disabled January 8, 2025 18:57

Head branch was pushed to by a user without write access

Comment on lines 1526 to 1534
docdict["export_warning_annotations_raw"] = """\
.. warning::
When exporting ``Raw`` with annotations, ``raw.info["meas_date"]`` must be the same
as ``raw.annotations.orig_time``. This guarantees that the annotations are in the
same reference frame as the samples.
When `Raw.first_time` is not zero (e.g., after cropping), the onsets are
automatically corrected so that onsets are always relative to the first sample.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is necessary to add a docdict entry if this entry is only used exactly once. Please move that text directly into the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did. Let me know how it looks to you!

@qian-chu
Copy link
Contributor Author

Hi everyone, is there anything else needed for this PR to be completed? Thanks!

@cbrnr
Copy link
Contributor

cbrnr commented Jan 20, 2025

I think this is good to go, but can you rebase please so that all tests pass?

larsoner and others added 13 commits January 20, 2025 11:07
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Larson <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Larson <[email protected]>
…s#13069)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Larson <[email protected]>
… Principal Component Analysis - Optimal Basis Sets (PCA-OBS) algorithm (mne-tools#13037)

Co-authored-by: Emma Bailey <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Steinn Magnusson <[email protected]>
Co-authored-by: Eric Larson <[email protected]>
Co-authored-by: emma-bailey <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel McCloy <[email protected]>
@qian-chu
Copy link
Contributor Author

Oops, seems I've triggered a grand review request through a rookie rebase... Sorry for bugging everyone

Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @cbrnr feel free to merge if you're happy!

@cbrnr cbrnr merged commit 3c6a054 into mne-tools:main Jan 21, 2025
29 checks passed
@cbrnr
Copy link
Contributor

cbrnr commented Jan 21, 2025

Thanks @qian-chu, great contribution!

@qian-chu
Copy link
Contributor Author

Thanks everyone!

@qian-chu qian-chu deleted the correct_annot_export branch January 21, 2025 20:49
larsoner added a commit to larsoner/mne-python that referenced this pull request Jan 24, 2025
* upstream/main: (57 commits)
  Allow lasso selection sensors in a plot_evoked_topo (mne-tools#12071)
  MAINT: Fix doc build (mne-tools#13076)
  BUG: Improve sklearn compliance (mne-tools#13065)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13073)
  MAINT: Add Numba to 3.13 test (mne-tools#13075)
  Bump autofix-ci/action from ff86a557419858bb967097bfc916833f5647fa8c to 551dded8c6cc8a1054039c8bc0b8b48c51dfc6ef in the actions group (mne-tools#13071)
  [BUG] Correct annotation onset for exportation to EDF and EEGLAB (mne-tools#12656)
  New feature for removing heart artifacts from EEG or ESG data using a Principal Component Analysis - Optimal Basis Sets (PCA-OBS) algorithm (mne-tools#13037)
  [BUG] Fix taper weighting in computation of TFR multitaper power (mne-tools#13067)
  [FIX] Reading an EDF with preload=False and mixed frequency (mne-tools#13069)
  Fix evoked topomap colorbars, closes mne-tools#13050 (mne-tools#13063)
  [pre-commit.ci] pre-commit autoupdate (mne-tools#13060)
  BUG: Fix bug with interval calculation (mne-tools#13062)
  [DOC] extend documentation for add_channels (mne-tools#13051)
  Add `combine_tfr` to API (mne-tools#13054)
  Add `combine_spectrum()` function and allow `grand_average()` to support `Spectrum` data (mne-tools#13058)
  BUG: Fix bug with helium anon (mne-tools#13056)
  [ENH] Add option to store and return TFR taper weights (mne-tools#12910)
  BUG: viz plot window's 'title' argument showed no effect. (mne-tools#12828)
  MAINT: Ensure limited set of tests are skipped (mne-tools#13053)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.