Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Correct annotation onset for exportation to EDF and EEGLAB #12656

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

qian-chu
Copy link
Contributor

@qian-chu qian-chu commented Jun 10, 2024

When cropping the start of a recording, raw.first_time is updated while annotations.onset is conveniently untouched. However, when exporting to another format where times are reset (starting from zero), annotations.onset should be corrected so that they represent relative time from the first sample.

This correction has been performed when fmt=‘brainvision’:

if events is not None:
# subtract raw.first_samp because brainvision marks events starting from the
# first available data point and ignores the raw.first_samp
assert isinstance(events, np.ndarray), msg
assert events.ndim == 2, msg
assert events.shape[-1] == 3, msg
events[:, 0] -= raw.first_samp
events = events[:, [0, 2]] # reorder for pybv required order

But is curiously missing when fmt=‘edf’ or fmt=‘eeglab’:

annotations = []
for desc, onset, duration, ch_names in zip(
raw.annotations.description,
raw.annotations.onset,
raw.annotations.duration,
raw.annotations.ch_names,
):
if ch_names:
for ch_name in ch_names:
annotations.append(
EdfAnnotation(onset, duration, desc + f"@@{ch_name}")
)
else:
annotations.append(EdfAnnotation(onset, duration, desc))

annotations = [
raw.annotations.description,
raw.annotations.onset,
raw.annotations.duration,
]

This PR aims to fix this by performing the similar correction (annotations.onset - raw.first_time

Comment on lines 125 to 159
@pytest.mark.parametrize("tmin", (0, 1, 5, 10))
def test_export_raw_eeglab_annotations(tmp_path, tmin):
"""Test that exporting EEGLAB preserves annotations and corects for raw.first_time."""
pytest.importorskip("eeglabio")
raw = read_raw_fif(fname_raw, preload=True)
raw.apply_proj()
annotations = Annotations(
onset=[0.01, 0.05, 0.90, 1.05],
duration=[0, 1, 0, 0],
description=["test1", "test2", "test3", "test4"],
ch_names=[["MEG 0113"], ["MEG 0113", "MEG 0132"], [], ["MEG 0143"]],
)
raw.set_annotations(annotations)
raw.crop(tmin)

# export
temp_fname = tmp_path / "test.set"
raw.export(temp_fname)

# read in the file
with pytest.warns(RuntimeWarning, match="is above the 99th percentile"):
raw_read = read_raw_eeglab(temp_fname, preload=True, montage_units="m")
assert raw_read.first_time == 0

valid_annot = raw.annotations.onset >= tmin
assert_array_almost_equal(
raw.annotations.onset[valid_annot] - raw.first_time,
raw_read.annotations.onset - raw_read.first_time,
)
assert_array_equal(
raw.annotations.duration[valid_annot], raw_read.annotations.duration
)
assert_array_equal(
raw.annotations.description[valid_annot], raw_read.annotations.description
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually should this annotation test for EEGLAB been written before, the bug should be noticable because the test Raw has actually been cropped. The saved onset would not correspond to the original onset.

@qian-chu qian-chu changed the title Correct annotation onset for exportation to EDF and EEGLAB [BUG] Correct annotation onset for exportation to EDF and EEGLAB Jun 10, 2024
Comment on lines 277 to 282
if tmin % 1 == 0:
expectation = nullcontext()
else:
expectation = pytest.warns(
RuntimeWarning, match="EDF format requires equal-length data blocks"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this check doing? If you are checking for tmin to be an integer, you could also use tmin.is_integer(), but is this what is required to have "equal-length data blocks"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the constructed raw signal is 2 sec long and edfio can segment it into 2 data records of 1 sec. If a non-integer amount of time is cropped, then the signal is no longer a multiple of 1 sec and edfio will append zeroes and issue a RuntimeWarning. Maybe this should have been a test on its own but I'm adding it here since pytest wouldn't pass me otherwise.

As for the %1 == 0 condition, I was thinking to make space for more flexible use should I know how edfio determines data record length. For example if one can specify a data record length of .5 or 2 s, then the statement can be replaced with %data_length == 0. But I agree it looks uncessary in its current form.

@cbrnr
Copy link
Contributor

cbrnr commented Jun 11, 2024

When cropping the start of a recording, raw.first_time is corrected while annotations.onset is conveniently untouched.

Hmmm, really? Annotations are not affected by cropping? Why would that be convenient?

@qian-chu
Copy link
Contributor Author

When cropping the start of a recording, raw.first_time is corrected while annotations.onset is conveniently untouched.

Hmmm, really? Annotations are not affected by cropping? Why would that be convenient?

At least when annotations.orig_time is not None:

mne-python/mne/io/base.py

Lines 1560 to 1570 in e2c8010

annotations = self.annotations
# now call setter to filter out annotations outside of interval
if annotations.orig_time is None:
assert self.info["meas_date"] is None
# When self.info['meas_date'] is None (which is guaranteed if
# self.annotations.orig_time is None), when we do the
# self.set_annotations, it's assumed that the annotations onset
# are relative to first_time, so we have to subtract it, then
# set_annotations will put it back.
annotations.onset -= self.first_time
self.set_annotations(annotations, False)

So that when annotations have their own time reference, cropping the data wouldn't affect them.

Actually this is a good reminder that we might need to account for different annotations.orig_time. The onset correction should have been corrected when annotations.orig_time is None. Will do a fix later.

@cbrnr
Copy link
Contributor

cbrnr commented Jun 11, 2024

To be honest, I didn't even know that annotations work like that. I always thought that annotations.onset are onsets in seconds relative to the start of the data. So I expected that cropping should shift the onsets in general, not just for export. If that was the case, we would not have to deal with correcting onsets on export in the first place.

@qian-chu
Copy link
Contributor Author

To be honest, I didn't even know that annotations work like that. I always thought that annotations.onset are onsets in seconds relative to the start of the data. So I expected that cropping should shift the onsets in general, not just for export. If that was the case, we would not have to deal with correcting onsets on export in the first place.

I'm not too sure how that will work out. Do you mean that cropping should always reset annotations.onset and then force annotations.orig_time=None?

----------- meas_date=XX, orig_time=YY -----------------------------

     |              +------------------+
     |______________|     RAW          |
     |              |                  |
     |              +------------------+
 meas_date      first_samp
     .
     .         |         +------+
     .         |_________| ANOT |
     .         |         |      |
     .         |         +------+
     .     orig_time   onset[0]
     .
     |                   +------+
     |___________________|      |
     |                   |      |
     |                   +------+
 orig_time            onset[0]'

----------- meas_date=XX, orig_time=None ---------------------------

     |              +------------------+
     |______________|     RAW          |
     |              |                  |
     |              +------------------+
     .              N         +------+
     .              o_________| ANOT |
     .              n         |      |
     .              e         +------+
     .
     |                        +------+
     |________________________|      |
     |                        |      |
     |                        +------+
 orig_time                 onset[0]'

----------- meas_date=None, orig_time=YY ---------------------------

     N              +------------------+
     o______________|     RAW          |
     n              |                  |
     e              +------------------+
               |         +------+
               |_________| ANOT |
               |         |      |
               |         +------+

            [[[ CRASH ]]]

----------- meas_date=None, orig_time=None -------------------------

     N              +------------------+
     o______________|     RAW          |
     n              |                  |
     e              +------------------+
     .              N         +------+
     .              o_________| ANOT |
     .              n         |      |
     .              e         +------+
     .
     N                        +------+
     o________________________|      |
     n                        |      |
     e                        +------+
 orig_time                 onset[0]'

@cbrnr
Copy link
Contributor

cbrnr commented Jul 3, 2024

Maybe it's just me, but I gave up trying to understand how this works. The ASCII diagram is probably meant to be helpful, but for me it is the complete opposite, I have no idea how these different concepts (meas_date, orig_time, first_samp, and whatnot) actually work, sorry.

@hoechenberger
Copy link
Member

hoechenberger commented Jul 3, 2024

I agree, I've tried several times over the past couple of years to decipher what it's trying to tell me and at one point just gave up. It's just been trial and error for me regarding all things annotations ever since 😅

@qian-chu
Copy link
Contributor Author

qian-chu commented Jul 3, 2024

Maybe it's just me, but I gave up trying to understand how this works. The ASCII diagram is probably meant to be helpful, but for me it is the complete opposite, I have no idea how these different concepts (meas_date, orig_time, first_samp, and whatnot) actually work, sorry.

It's definitely OK! As I'm re-looking at this PR after some time I'm also struggling to wrap my head around this system. FYI this diagram was copied from https://mne.tools/dev/generated/mne.Annotations.html.

One potential conflict I found is, the diagram says when meas_date=None, orig_time=YY it should result in error, yet in crop() it asserts the following:

mne-python/mne/io/base.py

Lines 1562 to 1563 in 4954672

if annotations.orig_time is None:
assert self.info["meas_date"] is None

instead of

if self.info["meas_date"] is None:
     assert annotations.orig_time is None

If someone who's familiar with the design can clarify that would be great. But I do confirm that the EDF and EEGLAB export will malfunction without correcting for first_time so eventually we would want this fix.

@qian-chu
Copy link
Contributor Author

Coming back to this issue after some time, I re-confirmed the existence of the problem with a minimalist code:

import numpy as np
from mne import create_info, Annotations
from mne.io import RawArray, read_raw_brainvision, read_raw_edf, read_raw_eeglab
from mne.viz import set_browser_backend

set_browser_backend('qt')

# Create a raw object of SR 1000 Hz, all zero, except for 1s of 1e-6 from 2-3s
data = np.zeros((1, 5000))
data[0, 2000:3000] = 1
scalings = dict(eeg=1)

info = create_info(['CH1'], 1000, ['eeg'])
raw_orig = RawArray(data, info)

annot = Annotations(onset=[2], duration=[1], description=['stim'])
raw_orig.set_annotations(annot)

# Crop raw to 1-5s
raw_orig.crop(1)
fig_orig = raw_orig.plot(scalings=scalings)
fig_orig.grab().save('orig.png')

# Export to BrainVision and re-read
raw_orig.export('test.vhdr')
raw_brainvision = read_raw_brainvision('test.vhdr')
fig_brainvision = raw_brainvision.plot(scalings=scalings)
fig_brainvision.grab().save('brainvision.png')

# Export to EDF and re-read
raw_orig.export('test.edf')
raw_edf = read_raw_edf('test.edf')
fig_edf = raw_edf.plot(scalings=scalings)
fig_edf.grab().save('edf.png')

# Export to EEGLAB and re-read
raw_orig.export('test.set')
raw_eeglab = read_raw_eeglab('test.set')
fig_eeglab = raw_eeglab.plot(scalings=scalings)
fig_eeglab.grab().save('eeglab.png')

Outputs using the current main of MNE

Original raw array

orig

BrainVision (functions properly)

brainvision

EDF

edf

EEGLAB

eeglab

Outputs using the PR branch

Original raw array

orig

BrainVision

brainvision

EDF

edf

EEGLAB

eeglab

@cbrnr
Copy link
Contributor

cbrnr commented Dec 13, 2024

Very nice, thanks for this example, this makes the issue really easy to see! Could you take a look at the failing tests?

raw.annotations.onset,
# subtract raw.first_time because EDF marks events starting from the first
# available data point and ignores raw.first_time
_sync_onset(raw, raw.annotations.onset, inverse=False),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also re-wrote the tests and they are passed without an issue now :D From my side it's in principle good to go.

One final note though is I'm using _sync_onset, which in addition to performing annot_start = onset - raw._first_time also assert raw.info["meas_date"] == raw.annotations.orig_time. Therefore, this would enforce users to export only Raw that has identical meas_date and orig_time.

@qian-chu qian-chu requested a review from cbrnr December 16, 2024 18:23
@cbrnr
Copy link
Contributor

cbrnr commented Jan 3, 2025

This looks good to me, just one last comment: do you think it makes sense to update the export code for BrainVision so that all formats consistently use _sync_onset? I think it would be great to be as consistent as possible.

For merging, I'd appreciate if @larsoner and/or @drammock could have a final look.

@qian-chu
Copy link
Contributor Author

qian-chu commented Jan 3, 2025

This looks good to me, just one last comment: do you think it makes sense to update the export code for BrainVision so that all formats consistently use _sync_onset? I think it would be great to be as consistent as possible.

For merging, I'd appreciate if @larsoner and/or @drammock could have a final look.

It might be a bit tricky as _mne_annots2pybv_events() loops over annotations instead of taking the numpy arrays individually:

def _mne_annots2pybv_events(raw):
"""Convert mne Annotations to pybv events."""
events = []
for annot in raw.annotations:
# handle onset and duration: seconds to sample, relative to
# raw.first_samp / raw.first_time
onset = annot["onset"] - raw.first_time
onset = raw.time_as_index(onset).astype(int)[0]
duration = int(annot["duration"] * raw.info["sfreq"])

One would need to change the code structure if _sync_onset() is to be used. However, we can simply add assert raw.info["meas_date"] == raw.annotations.orig_time to impose the same requirement as on other formats. What do you think?

Also I can add a note to %(export_warning_note_raw)s to document the new requirement.

@cbrnr
Copy link
Contributor

cbrnr commented Jan 3, 2025

OK, if it's too complicated then don't bother trying to use _sync_onset(). But yes, good idea to add an assert and documentation (and maybe a comment explaining why we're not using _sync_onset() here)!

@drammock drammock enabled auto-merge (squash) January 8, 2025 17:51
auto-merge was automatically disabled January 8, 2025 18:57

Head branch was pushed to by a user without write access

Comment on lines 1526 to 1534
docdict["export_warning_annotations_raw"] = """\
.. warning::
When exporting ``Raw`` with annotations, ``raw.info["meas_date"]`` must be the same
as ``raw.annotations.orig_time``. This guarantees that the annotations are in the
same reference frame as the samples.
When `Raw.first_time` is not zero (e.g., after cropping), the onsets are
automatically corrected so that onsets are always relative to the first sample.
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it is necessary to add a docdict entry if this entry is only used exactly once. Please move that text directly into the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did. Let me know how it looks to you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants