
--datalad option should treat output folder as a datalad superdataset, and each subject as a git submodule #237

Open
ted-strauss-K1 opened this issue Aug 1, 2018 · 4 comments


@ted-strauss-K1

Currently, when I use the --datalad option and run the heuristic on a series of subject DICOM folders, the heudiconv output is a single DataLad dataset (i.e. one git repo). I would like to take advantage of datalad create's superdataset feature, so that each subject becomes a distinct DataLad dataset registered as a submodule of the parent dataset. This would match the structure used by datasets.datalad.org.
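For context, a DataLad subdataset is a git submodule underneath, so the layout I'm after can be sketched with plain git (the names `superds` and `sub-01` are made up; user.name/email are set inline only so the commits run anywhere):

```shell
# Minimal plain-git sketch of a superdataset with one per-subject
# subdataset, registered as a git submodule (hypothetical names).
set -e
git init -q sub-01                                        # per-subject dataset
git -C sub-01 -c user.name=me -c user.email=me@example.com \
    commit -q --allow-empty -m "subject data"
git init -q superds                                       # parent superdataset
git -C superds -c protocol.file.allow=always \
    submodule add -q ../sub-01 sub-01                     # register the subject
git -C superds -c user.name=me -c user.email=me@example.com \
    commit -q -m "add sub-01 as a subdataset"
git -C superds submodule status                           # shows pinned commit
```

The parent only records a pinned commit of each subject repo, which is what lets consumers clone the superdataset without pulling every subject's data.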

Am I doing it wrong?

@yarikoptic
Member

Not "wrong" at all, and it might be the only way in some cases! So we should keep this one open to provide such functionality (eventually).
But it might not be needed or desired unless you are planning a dataset with a very large number (thousands?) of subjects. Working with git submodules is getting better, and DataLad already smooths many corners, but still, unless you have a very good use case for it, it is better to keep all subjects of a BIDS dataset within the same dataset/git repo. So what would your use case be?

@ted-strauss-K1
Author

ted-strauss-K1 commented Aug 1, 2018

The use case is to copy the structure of https://github.com/datalad/datasets.datalad.org (as I understand it):

  1. treat individual BIDS datasets as submodules within the superdataset;
  2. users can datalad search the aggregated metadata of the superdataset (which includes the metadata of its constituents via the -r option) to find the scans they are looking for, then datalad install & datalad get only those that match their search query;
  3. we're definitely talking about hundreds of subjects, eventually thousands.
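The search-then-fetch workflow in step 2 would look roughly like this (the URL and subject ID are placeholders, and it assumes metadata has already been aggregated into the superdataset recursively):

```shell
# Hypothetical consumer workflow against such a superdataset.
datalad install http://example.org/superds   # lightweight clone, no data yet
cd superds
datalad search T1w                           # query aggregated metadata
datalad install sub-0123                     # install only the matching subject
datalad get -r sub-0123                      # retrieve that subject's content
```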

@yarikoptic
Member

Yep -- but there (at datasets.datalad.org) we do not have BIDS datasets with per-subject subdatasets. So far the only dataset for which we (will) need to do that is HCP, where there is simply too much data under each subject directory to keep all of it in a single repository.
So, currently heudiconv creates one DataLad dataset per BIDS dataset, organizing them into a hierarchy based on an (optional) "locator" provided on the command line or by the heuristic, as we do in reproin.
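For reference, the current per-BIDS-dataset behavior can be invoked along these lines (all paths and the locator value are placeholders; the locator nests the resulting dataset under the output directory):

```shell
# One DataLad dataset per BIDS study, placed under the "locator" path
# inside the output directory (paths here are hypothetical).
heudiconv --files /incoming/dicoms/study1 \
    -f reproin -c dcm2niix -b \
    -o /data/bids --datalad --locator myproject/study1
```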

@mattcieslak

Just to be sure - the recommended approach is to keep all subjects in a single repository unless there are thousands of subjects, in which case each subject should be a submodule? I skipped the --datalad option when running heudiconv and was wondering how to initialize a repository for the combined outputs.
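For outputs that already exist, one option (a sketch, not an official recommendation) is to turn the combined output directory into a DataLad dataset after the fact: datalad create with --force initializes a dataset in a non-empty directory, and datalad save commits the existing files (the path is hypothetical):

```shell
cd /data/bids                # existing heudiconv output (hypothetical path)
datalad create --force .     # initialize a dataset over the existing tree
datalad save -m "add converted BIDS outputs"
```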
