
Within Session splitter #664

Open
wants to merge 60 commits into develop

Conversation

brunaafl (Collaborator)

This PR is a follow-up to PR #624 and is related to issue #612. It includes only the implementation of the WithinSessionSplitter data splitter.

brunaafl and others added 30 commits June 5, 2024 21:56

Deleting unified_eval so it can be addressed in another PR. Working on tests.
Signed-off-by: Bruna Junqueira Lopes <[email protected]>
# Conflicts:
#	moabb/evaluations/metasplitters.py
#	moabb/tests/metasplits.py

Add shuffle and random_state parameters to WithinSession
Collaborator:

better represent only one session @brunaafl

PierreGtch (Collaborator) left a comment:

Hi @brunaafl,
Thanks for this PR, it looks good!
I left one comment regarding a test I think you should add.

```python
@pytest.mark.parametrize("shuffle", [True, False])
@pytest.mark.parametrize("random_state", [0, 42])
def test_within_session(shuffle, random_state):
    X, y, metadata = paradigm.get_data(dataset=dataset)
```
Collaborator:

I think it is important to check that the split is the same when we load the data of only one or a few subjects, i.e. paradigm.get_data(dataset=dataset, subjects=[m, n...]).
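For example, something like this (a sketch of the suggested check; it assumes `paradigm`, `dataset`, and a `WithinSessionSplitter` with a split(y, metadata) signature are available in the test module, and all names are illustrative, not the final API):

```python
import numpy as np


def test_split_consistent_on_subject_subset():
    _, y_all, meta_all = paradigm.get_data(dataset=dataset)
    _, y_sub, meta_sub = paradigm.get_data(dataset=dataset, subjects=[1])

    splitter = WithinSessionSplitter(n_folds=5)

    # Map full-data indices of subject 1 to positions within that subject,
    # so the full-data folds become comparable to the subset-only folds.
    mask = (meta_all.subject == 1).to_numpy()
    pos = {ix: p for p, ix in enumerate(np.flatnonzero(mask))}

    folds_all = [
        sorted(pos[i] for i in test if mask[i])
        for _, test in splitter.split(y_all, meta_all)
    ]
    folds_sub = [sorted(test) for _, test in splitter.split(y_sub, meta_sub)]

    # Folds belonging to other subjects reduce to empty lists; drop them.
    assert [f for f in folds_all if f] == folds_sub
```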

tomMoral (Collaborator) left a comment:

A first batch of comments. Overall it looks good, but I think it could be improved by adding the possibility to pass in a cv object, which would allow controlling the intra-session splits (for instance doing TimeSeriesSplit, which makes sense in an online setting).
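As a hypothetical illustration of that suggestion (the constructor does not accept a cv yet; the names here are assumptions):

```python
from sklearn.model_selection import TimeSeriesSplit

# Chronological folds within each session, suited to online settings.
splitter = WithinSessionSplitter(cv=TimeSeriesSplit(n_splits=5))
for train_ix, test_ix in splitter.split(y, metadata):
    ...
```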

moabb/evaluations/splitters.py (outdated, resolved)
Collaborator:

Why not use only the other version?

```python
Parameters
----------
n_folds : int
    Number of folds. Must be at least 2.
```
Collaborator:

Suggested change:
```diff
-    Number of folds. Must be at least 2.
+    Number of folds for the within-session k-fold split. Must be at least 2.
```

Comment on lines 66 to 68
```python
        random_state: int = 42,
        shuffle_subjects: bool = False,
        shuffle_session: bool = True,
```
Collaborator:

Default random_state should be None. Also, the convention is to put it at the end of the argument list.

Suggested change:
```diff
-        random_state: int = 42,
-        shuffle_subjects: bool = False,
-        shuffle_session: bool = True,
+        shuffle_subjects: bool = False,
+        shuffle_session: bool = True,
+        random_state: int = None,
```

moabb/evaluations/splitters.py (outdated, resolved)
Comment on lines 24 to 29
```python
shuffle_session : bool, default=True
    Whether to shuffle each class's samples before splitting into batches.
    Note that the samples within each split will not be shuffled.
shuffle_subjects : bool, default=False
    Whether to shuffle when mixing subjects and sessions; this parameter
    allows sampling different iterations of the splitter.
```
Collaborator:

Do you think it is necessary to have both? I don't really see any use case where I would use only one of them. I would have a single shuffle parameter.

```python
        self.n_folds = n_folds
        self.shuffle_subjects = shuffle_subjects
        self.shuffle_session = shuffle_session
        self.random_state = check_random_state(random_state)
```
Collaborator:

If you use it like that, it is not a random_state anymore but an RNG.
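For reference, the scikit-learn convention this refers to is to store the user's random_state untouched and derive the RNG only where randomness is consumed. A minimal sketch, with illustrative names:

```python
from sklearn.utils import check_random_state


class WithinSessionSplitter:
    def __init__(self, n_folds=5, shuffle=True, random_state=None):
        self.n_folds = n_folds
        self.shuffle = shuffle
        self.random_state = random_state  # kept as-is, still a random_state

    def split(self, y, metadata):
        rng = check_random_state(self.random_state)  # the RNG lives here
        ...
```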

moabb/evaluations/splitters.py (outdated, resolved)
Comment on lines 87 to 109
```python
for subject in subjects:
    subject_mask = metadata.subject == subject
    subject_indices = all_index[subject_mask]
    subject_metadata = metadata[subject_mask]
    sessions = subject_metadata.session.unique()

    # Shuffle sessions if required
    if self.shuffle_session:
        self.random_state.shuffle(sessions)

    for session in sessions:
        session_mask = subject_metadata.session == session
        indices = subject_indices[session_mask]
        group_y = y[indices]

        # Use StratifiedKFold with the group-specific random state
        cv = StratifiedKFold(
            n_splits=self.n_folds,
            shuffle=self.shuffle_session,
            random_state=self.random_state,
        )
        for ix_train, ix_test in cv.split(indices, group_y):
            yield indices[ix_train], indices[ix_test]
```
Collaborator:

We talked a bit with @sylvchev, and I think the best would be to modify this to take a cv object in the constructor (default would be StratifiedKFold), clone it with a different random seed for each (subject, session) group, and then yield the right indices.

That way, we can do a real shuffle, shuffling the groups from which we retrieve the next split.
Would this make sense?
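A minimal sketch of that design, assuming scikit-learn-style cv objects and a pandas metadata frame with a default positional index (deepcopy stands in for cloning, since sklearn splitters do not implement get_params; group-order shuffling is omitted for brevity, and all internals shown are illustrative):

```python
import copy

from sklearn.utils import check_random_state


def split(self, y, metadata):
    rng = check_random_state(self.random_state)
    for (subject, session), group in metadata.groupby(["subject", "session"]):
        indices = group.index.to_numpy()
        # Re-seed a copy of the user-provided cv for each (subject, session).
        cv = copy.deepcopy(self.cv)
        if getattr(cv, "shuffle", False):
            cv.random_state = rng.randint(2**31)
        for ix_train, ix_test in cv.split(indices, y[indices]):
            yield indices[ix_train], indices[ix_test]
```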

Collaborator (author):

Makes sense, I'm working on that, thanks!

Collaborator:

Great! Thank you so much! And thanks for your patience :)

brunaafl (Collaborator, author)

Hi @tomMoral, @bruAristimunha, @PierreGtch!
I'm so sorry for the delay; I had some other problems that made it difficult to give the PR more attention.

I added the functionality to pass a metasplitter, such as TimeSeries/PseudoOnline. The way I designed this object, the metasplitter returns indices for the calibration and test sets. To ensure the splitter also returns indices for a train set, if needed, I was wondering if we could always have StratifiedKFold split train/test, and allow passing PseudoOnline as an inner_cv to further split the test set into calibration and test if wanted.
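Roughly, the idea would look like this (a sketch only; inner_cv and the shape of the yielded tuples are not settled API):

```python
from sklearn.model_selection import StratifiedKFold


def split(self, y, metadata):
    outer = StratifiedKFold(n_splits=self.n_folds)
    for ix_train, ix_test in outer.split(metadata, y):
        if self.inner_cv is None:
            yield ix_train, ix_test
        else:
            # e.g. PseudoOnline: split the test block into calibration/test
            for ix_calib, ix_eval in self.inner_cv.split(ix_test, y[ix_test]):
                yield ix_train, ix_test[ix_calib], ix_test[ix_eval]
```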

PierreGtch (Collaborator) left a comment:

Hi @brunaafl, thanks for all the work you put in!! About the delay, this is all voluntary work, so no need to apologise :)

> I added the functionality to pass a metasplitter, such as TimeSeries/PseudoOnline. The way I designed this object, the metasplitter returns indices for the calibration and test sets. To ensure the splitter also returns indices for a train set, if needed, I was wondering if we could always have StratifiedKFold split train/test, and allow passing PseudoOnline as an inner_cv to further split the test set into calibration and test if wanted.

I’m not sure I understand your question. What is the difference between the train and the calibration sets?

I also left a few comments on the code

moabb/evaluations/metasplitters.py (3 resolved comments)
moabb/evaluations/splitters.py (outdated, resolved)

```python
    def __init__(
        self,
        cv=StratifiedKFold,
```
Collaborator:

In the scikit-learn framework, cv is a cross-validator object, not a class. I think it would be best to stick to that. This would avoid instantiating it during the split call. You can have cv=None by default and instantiate the StratifiedKFold() class in the __init__ method.
Also, you can check the cv argument with sklearn's check_cv.
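Something along these lines (a sketch with illustrative names):

```python
from sklearn.model_selection import StratifiedKFold, check_cv


class WithinSessionSplitter:
    def __init__(self, cv=None, n_folds=5):
        # Instantiate the default cross-validator here, not in split().
        self.cv = StratifiedKFold(n_splits=n_folds) if cv is None else cv

    def split(self, y, metadata):
        cv = check_cv(self.cv, y, classifier=True)
        ...
```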

Collaborator (author):

I understand the concern, but I'm a bit unsure how to implement it in the case shuffle=True, since I'm defining a different seed for each (subject, session). Is the suggestion to instantiate cv in the __init__ method in case split is not needed, and to keep how it is being done otherwise?

Collaborator:


Indeed, it's not easy. But you could, for example, make a wrapper around StratifiedKFold which would instantiate a different cv with a different seed for each subject/session.

Also, I just noticed that at the moment the seeds for each subject/session are chosen at random. We will not be able to have reproducible results this way. Instead, you could add a parameter global_seed to your wrapper and use, for each cv, random_state = global_seed + 10000 * subject_number + session_number (it's safe to say we will never have 10000 sessions) if global_seed is an integer, and None otherwise.
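For example, a sketch of such a wrapper (the class and parameter names are illustrative):

```python
from sklearn.model_selection import StratifiedKFold


class GroupSeededStratifiedKFold:
    def __init__(self, n_splits=5, shuffle=True, global_seed=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.global_seed = global_seed

    def for_group(self, subject_number, session_number):
        # Deterministic, group-specific seed (assumes fewer than 10000 sessions).
        if isinstance(self.global_seed, int):
            seed = self.global_seed + 10000 * subject_number + session_number
        else:
            seed = None
        return StratifiedKFold(
            n_splits=self.n_splits,
            shuffle=self.shuffle,
            random_state=seed if self.shuffle else None,
        )
```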
