Dataset management module contents #10

Closed
mslw opened this issue Oct 21, 2021 · 6 comments
Labels
content discussion (Discussion regarding course content)

Comments

@mslw
Contributor

mslw commented Oct 21, 2021

Summary:

  • This would be on day 2 (so 3rd or 4th module of the workshop).
  • The original idea was to include dataset nesting and datalad run, but run fits better in the 1st module (List of modules #9)
  • Within this module we could (rough idea, I don't have a clear vision yet):
    • talk about YODA
    • walk through an example of dataset nesting (what would be the example?)
    • maybe use the time to introduce datalad containers-run?
    • maybe limit the contents to allow some time for questions?

Questions:

  • Should this come before or after the data publishing / consumption module?
  • What should be the contents?
@mslw added the "content discussion" label Oct 21, 2021
@adswa
Contributor

adswa commented Oct 26, 2021

> walk through an example of dataset nesting (what would be the example?)

I like to use actual published analyses that follow the YODA principles, e.g., https://github.com/lnnrtwttkhn/highspeed-analysis

> Should this come before or after the data publishing / consumption module?

I have a slight preference for after, because consumption is a very immediate, easy benefit, and having installed a dataset primes participants for installing datasets as subdatasets and for the differences when it's a dataset hierarchy.

@mslw
Contributor Author

mslw commented Dec 14, 2021

I think I can write part of the module around exploring the highspeed-analysis dataset and installing subdatasets with datalad get --no-data. I'd also like to include a toy example of building such a nested dataset from the ground up, similar to what's done in the first module (probably with create rather than clone), but I don't have a good idea yet. Can you suggest something?
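
A minimal sketch of that exploration step, assuming DataLad is installed and that the published dataset registers its inputs as subdatasets. The function name `explore_nested_dataset` and the local target directory are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper: clone a published dataset and install its
# first-level subdatasets without downloading any file content.
# The body runs in a subshell, so the caller's working directory is untouched.
explore_nested_dataset() (
    url="$1"      # dataset URL, e.g. the highspeed-analysis repository
    target="$2"   # local directory name (an assumption, pick any)

    # Clone the superdataset; subdatasets are registered but not yet installed
    datalad clone "$url" "$target" || exit 1
    cd "$target" || exit 1

    # List the subdatasets registered in the superdataset
    datalad subdatasets

    # Install first-level subdatasets, skipping file content (--no-data)
    datalad get --no-data --recursive --recursion-limit 1 .
)
```

It could be called as `explore_nested_dataset https://github.com/lnnrtwttkhn/highspeed-analysis highspeed-analysis`. Limiting the recursion to one level keeps the demonstration short; dropping `--recursion-limit` would install the whole hierarchy.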

Other than that, I think I could use some help with the remainder of this module. Perhaps some space for more general concepts?

@adswa
Contributor

adswa commented Dec 14, 2021

Maybe it could use https://github.com/datalad-handbook/repro-paper-sketch/, a template that @m-wierzba and I once created. In addition to nesting a dataset for analyses, it's also about reproducible manuscripts. It could be rewritten to not rely on make, or to use containers-run in addition. But that's just quick brainstorming, so feel free to ignore it if that's out of scope.

@mslw
Contributor Author

mslw commented Dec 14, 2021

> Maybe it could use https://github.com/datalad-handbook/repro-paper-sketch/, a template that @m-wierzba and I once created. In addition to nesting a dataset for analyses, it's also about reproducible manuscripts.

Just to make sure: do you mean building something like this from scratch? I'll take a closer look.

> It could be rewritten to not rely on make, or to use containers-run in addition

Good idea. I'll need to work through that to see how long it may take. I'm also tempted by containers-run, but sometimes less is more.

> But that's just a quick brainstorming - feel free to ignore if that's out of scope.

That's what we need - I feel there's some space left for something other than datalad create -d . something and datalad get --no-data.
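
For reference, the baseline those two commands cover could look roughly like the sketch below. It assumes DataLad is installed; the function name `build_nested_dataset` and the subdataset path `inputs/rawdata` are hypothetical placeholders, not part of any existing module:

```shell
# Hypothetical helper: build a small nested dataset from scratch.
# The body runs in a subshell, so the caller's working directory is untouched.
build_nested_dataset() (
    name="$1"   # name of the new superdataset (an assumption, pick any)

    # Create the superdataset and apply the YODA configuration procedure
    datalad create -c yoda "$name" || exit 1
    cd "$name" || exit 1

    # Create a new dataset and register it as a subdataset of the current one
    datalad create -d . inputs/rawdata || exit 1

    # Confirm that the subdataset is now registered in the superdataset
    datalad subdatasets
)
```

Calling `build_nested_dataset my-analysis` would leave a superdataset with one empty subdataset, which is about as far as `create -d` alone takes a participant.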

@mslw
Contributor Author

mslw commented Dec 14, 2021

As an alternative, we could just present the basics of subdatasets and include some of the more general dataset management themes posted by @jsheunis in #9.

I wonder which would be more helpful, assuming an audience of beginners.

@mslw
Contributor Author

mslw commented Jan 14, 2022

Thanks for the suggestions. The module is now complete, but the issue can be reopened for future tweaks.

@mslw mslw closed this as completed Jan 14, 2022