Creating data splitters for moabb evaluation #624
base: develop
Conversation
moabb/evaluations/unified_eval.py
Outdated
Let's add this file in another PR
more documentation
Good job already :)
I left a few comments but didn’t look at unified_eval as you said you dropped it
moabb/evaluations/metasplitters.py
Outdated
sessions = metadata.session.unique()
subjects = metadata.subject.unique()

if len(runs) > 1:
So if len(runs) > 1, then calib_size is ignored.
Is this the desired behaviour @brunaafl @bruAristimunha? If yes, this must be made very clear in the doc.
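For reference, a hypothetical sketch of the behaviour under discussion (the helper name and the exact rule in the multi-run branch are assumptions made for illustration, not the code in this PR):

    def _split_session(runs, calib_size):
        # hypothetical helper: illustrates "calib_size is ignored when there
        # is more than one run"; it is not the implementation in this PR
        if len(runs) > 1:
            # multiple runs: split by whole runs, e.g. the first run for
            # calibration and the remaining runs for evaluation
            return runs[:1], runs[1:]
        # single run: fall back to an epoch-level split driven by calib_size
        epochs = runs[0]
        return epochs[:calib_size], epochs[calib_size:]

Whatever the final rule is, the docstring should state explicitly that calib_size only applies in the single-run case.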
It was what I intended, but I don't know if it is the best solution, though.
Deleting unified_eval so it can be addressed in another PR. Working on tests.
Adding: figures for documentation
Signed-off-by: Bruna Junqueira Lopes <[email protected]>
# Conflicts:
#   moabb/evaluations/metasplitters.py
#   moabb/tests/metasplits.py
I only reviewed the WithinSessionSplitter. I think the PR is a little too big for easy review.
What I propose is that we do a first PR with only this WithinSessionSplitter, fix everything for this one, and then move on to the extra ones. This requires a little bit of copy-paste but should not be too hard to do.
From what I see, I think we need to improve the management of random_state to get reproducible splits. Another thing that would be nice is to support a random order for the splits (random over patients and sessions), but maybe that is more work that can be done after the first PR with the reproducible splitter.
Adding a test on the reproducible order would be nice.
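A minimal sketch of such a test, assuming the splitter eventually accepts a random_state argument and exposes a sklearn-style split(X, y, metadata) generator (both are assumptions at this stage):

    import numpy as np

    def test_reproducible_splits(X, y, metadata):
        # two splitters seeded identically should yield identical folds
        splitter_a = WithinSessionSplitter(n_folds=5, random_state=42)
        splitter_b = WithinSessionSplitter(n_folds=5, random_state=42)
        for (tr_a, te_a), (tr_b, te_b) in zip(
            splitter_a.split(X, y, metadata), splitter_b.split(X, y, metadata)
        ):
            assert np.array_equal(tr_a, tr_b)
            assert np.array_equal(te_a, te_b)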
"""

def __init__(self, n_folds=5):
The thing that would be nice is to add a shuffle option to yield these splits in a random order (random over the patients/sessions and folds).
This would allow subsampling a number of folds with diverse patients/sessions if we don't want to do the full CV procedure.
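A sketch of what that could look like (function and parameter names are placeholders, not a proposed API): precompute the (subject, session, fold) splits, then optionally yield them in a shuffled order so that a subsampled prefix still covers diverse patients/sessions.

    from sklearn.utils import check_random_state

    def iter_splits(all_splits, shuffle=False, random_state=None):
        # all_splits: precomputed list of (train_idx, test_idx) pairs
        rng = check_random_state(random_state)
        order = list(range(len(all_splits)))
        if shuffle:
            rng.shuffle(order)  # in-place shuffle of the fold order
        for idx in order:
            yield all_splits[idx]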
assert isinstance(self.n_folds, int)

subjects = metadata.subject.values
cv = StratifiedKFold(n_splits=self.n_folds, shuffle=True, **kwargs)
We need to be able to set a random_state for each of these CVs to have reproducible splits.
I will help you @brunaafl
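For instance, a sketch only, assuming the splitter grows a random_state parameter: resolve the seed once and forward it to StratifiedKFold so that shuffle=True becomes deterministic.

    from sklearn.model_selection import StratifiedKFold
    from sklearn.utils import check_random_state

    def make_cv(n_folds, random_state=None):
        # an int seed or RandomState instance makes the shuffled folds reproducible
        rng = check_random_state(random_state)
        return StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=rng)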
Based on Issue #612 (comment), I've created 3 data splitters related to each of the three types of moabb evaluation: WithinSubjectSplitter, CrossSessionSplitter, and CrossSubjectSplitter, defined in the file splitters.py; two evaluation splits, OfflineSplit and TimeSeriesSplit; and one meta-split, SamplerSplit, defined in the meta_splitters.py file.

For the intra-subject splitters (Within and CrossSession splitters), I assumed that all data and metadata from all subjects were already known and loaded, which can maybe create a problem with the lazy loading done in these cases. Therefore, I based them on 'Individual' versions (IndividualWithin and IndividualCrossSession) that assume metadata from a specific subject.
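A hedged usage sketch of what calling one of these splitters could look like (the import path, constructor, and split(X, y, metadata) signature are assumptions taken from the description above, not a confirmed API):

    import numpy as np
    import pandas as pd

    from moabb.evaluations.splitters import CrossSessionSplitter  # hypothetical path

    # toy data: 2 subjects x 2 sessions x 4 epochs each
    metadata = pd.DataFrame(
        {
            "subject": np.repeat([1, 2], 8),
            "session": np.tile(np.repeat(["0", "1"], 4), 2),
        }
    )
    X = np.random.randn(len(metadata), 8, 128)  # epochs x channels x times
    y = np.random.randint(0, 2, len(metadata))

    splitter = CrossSessionSplitter()
    for train_idx, test_idx in splitter.split(X, y, metadata):
        # each fold is expected to hold out one session of one subject
        print(len(train_idx), len(test_idx))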
I ended up creating two draft versions (Group and LazyEvaluation in unified_eval.py) of an evaluation integrating all modalities, with LazyEvaluation trying to load data for the intra-subject evaluations only when needed. However, taking a look at #481 and #486, maybe it was not the best and easiest solution, so I stopped working on that since it may not be that useful.

I'm working now on building the tests and refining and fixing the code a bit.