Creating data splitters for moabb evaluation #624

Open · wants to merge 29 commits into base: develop

Conversation

brunaafl (Collaborator) commented:

Based on Issue #612 (comment), I've created three data splitters, one for each of the three types of moabb evaluation: WithinSubjectSplitter, CrossSessionSplitter, and CrossSubjectSplitter, defined in splitters.py; two evaluation splits, OfflineSplit and TimeSeriesSplit; and one meta-split, SamplerSplit, defined in the meta_splitters.py file.

For the intra-subject splitters (the Within and CrossSession splitters), I assumed that the data and metadata from all subjects were already known and loaded, which may conflict with the lazy loading used in these cases. Therefore, I based them on 'Individual' versions (IndividualWithin and IndividualCrossSession) that only assume metadata from a specific subject.

I also ended up creating two draft versions (Group and LazyEvaluation, in unified_eval.py) of an evaluation integrating all modalities, with LazyEvaluation trying to load data for the intra-subject evaluations only when needed. However, after looking at #481 and #486, this may not be the best or easiest solution, so I stopped working on it since it may not be that useful.

I'm now working on building the tests and refining and fixing the code.
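
To make the intended interface concrete, here is a minimal sketch of what a within-subject splitter could look like, assuming a scikit-learn-style generator that yields train/test indices from a metadata DataFrame with subject and session columns. The class name follows the PR description, but the signature and internals are illustrative assumptions, not the code in this PR.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


class WithinSubjectSplitter:
    """Yield k-fold train/test indices computed within each (subject, session)."""

    def __init__(self, n_folds=5):
        self.n_folds = n_folds

    def split(self, y, metadata, **kwargs):
        # metadata is assumed to be a pandas DataFrame with 'subject' and
        # 'session' columns aligned with the label array y.
        for subject in metadata.subject.unique():
            subject_mask = metadata.subject == subject
            for session in metadata.session[subject_mask].unique():
                mask = (subject_mask & (metadata.session == session)).to_numpy()
                indices = np.where(mask)[0]
                cv = StratifiedKFold(n_splits=self.n_folds, shuffle=True, **kwargs)
                for train, test in cv.split(indices, y[indices]):
                    yield indices[train], indices[test]
```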

@bruAristimunha (Collaborator) commented on Jun 10, 2024:


Let's add this file in another PR

@bruAristimunha (Collaborator) left a comment:


More documentation, please.

@PierreGtch (Collaborator) left a comment:


Good job already :)
I left a few comments, but didn't look at unified_eval since you said you dropped it.

(Four resolved review threads on moabb/evaluations/metasplitters.py, now outdated.)
sessions = metadata.session.unique()
subjects = metadata.subject.unique()

if len(runs) > 1:
A collaborator commented:

So if len(runs) > 1, then calib_size is ignored.

Is this the desired behaviour, @brunaafl @bruAristimunha?

If yes, this must be made very clear in the doc.

brunaafl (Collaborator, Author) replied:

It was what I intended, but I don't know if it is the best solution, though.
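
For readers of the thread, a hypothetical sketch of the branch being discussed, under the assumption (not confirmed here) that with several runs the first run serves as calibration data, while with a single run the first calib_size trials are held out; the helper name and fallback logic are purely illustrative.

```python
import numpy as np


def calibration_mask(session_metadata, calib_size):
    # Illustrative assumption of the intent discussed above: with more than
    # one run, the first run is used for calibration and calib_size is
    # ignored; with a single run, the first calib_size trials are held out.
    runs = session_metadata.run.unique()
    if len(runs) > 1:
        return (session_metadata.run == runs[0]).to_numpy()
    mask = np.zeros(len(session_metadata), dtype=bool)
    mask[:calib_size] = True
    return mask
```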

(Additional resolved review threads on moabb/evaluations/metasplitters.py and moabb/evaluations/splitters.py.)
@tomMoral (Collaborator) left a comment:


I only reviewed the WithinSessionSplitter. I think the PR is a little too big for an easy review.
What I propose is that we do a first PR with only this WithinSessionSplitter, fix everything for that one, and then move on to the extra ones. This requires a little bit of copy-paste but should not be too hard to do.

From what I see, I think we need to improve the management of random_state to get reproducible splits. Another thing that would be nice is to support a random order for the splits (random over patients and sessions), but maybe that is more work and can be done after the first PR with the reproducible splitter.

Adding a test on the reproducible order would be nice.


"""

def __init__(self, n_folds=5):
A collaborator commented:

It would be nice to add a shuffle option to yield these splits in a random order (random over patients/sessions and folds).

This would allow subsampling a number of folds with diverse patients/sessions if we don't want to run the full CV procedure.
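
A small sketch of the suggested option, assuming the splitter grows shuffle and random_state arguments; the class name, grouping logic, and attribute names are illustrative rather than the final API.

```python
from sklearn.utils import check_random_state


class WithinSessionSplitter:
    def __init__(self, n_folds=5, shuffle=True, random_state=None):
        self.n_folds = n_folds
        self.shuffle = shuffle
        self.random_state = random_state

    def _ordered_groups(self, metadata):
        # Enumerate (subject, session) pairs, optionally in a reproducible
        # random order, so a subset of folds already covers diverse
        # subjects/sessions without running the full CV procedure.
        groups = list(metadata.groupby(["subject", "session"]).groups.keys())
        if self.shuffle:
            check_random_state(self.random_state).shuffle(groups)
        return groups
```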

assert isinstance(self.n_folds, int)

subjects = metadata.subject.values
cv = StratifiedKFold(n_splits=self.n_folds, shuffle=True, **kwargs)
A collaborator commented:

We need to be able to set a random_state for each of these CVs in order to have reproducible splits.
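
One way this could be addressed, sketched under the assumption that the splitter carries a random_state attribute and seeds each inner CV from a shared generator (the seeding scheme is illustrative):

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import check_random_state

# random_state would come from the splitter (e.g. self.random_state); it is a
# plain variable here so the snippet stands alone.
random_state = 42
rng = check_random_state(random_state)

# Each per-subject/per-session StratifiedKFold gets its own seed drawn from
# the shared generator, so the splits are reproducible for a fixed seed.
cv = StratifiedKFold(
    n_splits=5,
    shuffle=True,
    random_state=rng.randint(2**31 - 1),
)
```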

@bruAristimunha (Collaborator) commented:

I will help you @brunaafl
