
--datalad option should treat output folder as a datalad superdataset, and each subject as a git submodule #237

Open
ted-strauss-K1 opened this issue Aug 1, 2018 · 4 comments


@ted-strauss-K1

Currently, when I use the --datalad option and run the heuristic on a series of subject DICOM folders, the heudiconv output is a single DataLad dataset (i.e. one git repo). I would like to take advantage of datalad create's superdataset feature, so that each subject becomes a distinct DataLad dataset registered as a submodule of the parent dataset. This would match the structure used by datasets.datalad.org.
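For context, a DataLad subdataset is a git submodule underneath, so the layout I'm after can be sketched with plain git (the names `superds` and `sub-01` are made up; user.name/email are set inline only so the commits run anywhere):

```shell
# Minimal plain-git sketch of a superdataset with one per-subject
# subdataset, registered as a git submodule (hypothetical names).
set -e
git init -q sub-01                                        # per-subject dataset
git -C sub-01 -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "subject data"
git init -q superds                                       # parent superdataset
git -C superds -c protocol.file.allow=always \
    submodule add -q ../sub-01 sub-01                     # register the subject
git -C superds -c user.name=me -c user.email=me@example.com \
    commit -q -m "add sub-01 as a subdataset"
git -C superds submodule status                           # shows pinned commit
```

The parent only records a pinned commit of each subject repo, which is what lets consumers clone the superdataset without pulling every subject's data.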

Am I doing it wrong?

@yarikoptic
Member

Not "wrong" at all, and it might be the only way in some cases! So we should keep this one open to provide such functionality (eventually).
But it might not be needed or desired unless you are planning a dataset with a very large number (thousands?) of subjects. Working with git submodules is getting better, and DataLad already smooths many corners, but still, unless you have a very good use case for it, it is better to keep all subjects of a BIDS dataset within the same dataset/git repo. So what would your use case be?

@ted-strauss-K1
Author

ted-strauss-K1 commented Aug 1, 2018

The use case is to copy the structure of https://github.com/datalad/datasets.datalad.org (as I understand it):

  1. treat individual BIDS datasets as submodules within the superdataset;
  2. users can datalad search the aggregated metadata of the superdataset (which includes the metadata of its constituents via the -r option) to find the scans they are looking for, then datalad install & datalad get only those that match their search query;
  3. we're definitely talking about hundreds of subjects, eventually thousands.
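The search-then-fetch workflow in step 2 would look roughly like this (the URL and subject ID are placeholders, and it assumes metadata has already been aggregated into the superdataset recursively):

```shell
# Hypothetical consumer workflow against such a superdataset.
datalad install http://example.org/superds   # lightweight clone, no data yet
cd superds
datalad search T1w                           # query aggregated metadata
datalad install sub-0123                     # install only the matching subject
datalad get -r sub-0123                      # retrieve that subject's content
```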

@yarikoptic
Member

Yep -- but there (at datasets.datalad.org) we do not have BIDS datasets with per-subject subdatasets. So far the only dataset for which we (will) need to do that is HCP, where there is simply too much data under each subject directory to keep all of it in a single repository.
So, currently heudiconv creates one DataLad dataset per BIDS dataset, organizing them into a hierarchy based on an (optional) "locator" provided on the command line or by the heuristic, as we do in reproin.
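For reference, the current per-BIDS-dataset behavior can be invoked along these lines (all paths and the locator value are placeholders; the locator nests the resulting dataset under the output directory):

```shell
# One DataLad dataset per BIDS study, placed under the "locator" path
# inside the output directory (paths here are hypothetical).
heudiconv --files /incoming/dicoms/study1 \
    -f reproin -c dcm2niix -b \
    -o /data/bids --datalad --locator myproject/study1
```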

@mattcieslak

Just to be sure - the recommended approach is to keep all subjects in a single repository unless there are thousands of subjects, in which case each subject should be a submodule? I skipped the --datalad option when running heudiconv and was wondering how to initialize a repository for the combined outputs.
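For outputs that already exist, one option (a sketch, not an official recommendation) is to turn the combined output directory into a DataLad dataset after the fact: datalad create with --force initializes a dataset in a non-empty directory, and datalad save commits the existing files (the path is hypothetical):

```shell
cd /data/bids                # existing heudiconv output (hypothetical path)
datalad create --force .     # initialize a dataset over the existing tree
datalad save -m "add converted BIDS outputs"
```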
