using a dataset in an analysis #188

alecristia · 2021-04-23T10:42:54Z

alecristia
Apr 23, 2021
Maintainer

What is the best practice when doing an analysis that relies on a dataset, say EL1000?
Right now, I have the dataset inside my GitHub repo, but the EL1000 folder is not tracked. Should I add it to .gitignore, or declare this dataset some other way?
Thanks in advance for any pointers to reading on this!

Answered by lucasgautheron

lucasgautheron · 2021-04-23T11:02:53Z

lucasgautheron
Apr 23, 2021
Maintainer

The best practice is dataset nesting.
In other words, you should install EL1000 as a subdataset of your analysis.

You can do this from your analysis subfolder:

datalad install -d .  --source='https://gin.g-node.org/EL1000/EL1000' -r

The -r flag is needed because EL1000 (EL1k) itself contains subdatasets. See the documentation of datalad install here: http://docs.datalad.org/en/stable/generated/man/datalad-install.html

You can then cd into EL1000 and do what you need (e.g. download the files required by your analysis):

cd EL1000
datalad run-procedure setup
datalad get */annotations/*/converted

If EL1000 is updated, you will need to pull the changes in your analysis:

git pull origin main --recurse-submodules

(replace main with the branch you need)
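To confirm that the nesting worked, one option is to look at the .gitmodules file of the analysis repository, since DataLad registers subdatasets as git submodules. The sketch below is only an illustration: the function names list_subdatasets and is_subdataset are made up for this example, and it assumes submodule names without spaces.

```bash
# List the paths registered as subdatasets (git submodules) of a superdataset.
# Only reads .gitmodules; it does not touch the network or the data.
list_subdatasets() {
    # $1: path to the analysis (super)dataset, defaults to the current directory
    gitmodules="${1:-.}/.gitmodules"
    [ -f "$gitmodules" ] || return 0
    # Output lines look like "submodule.<name>.path <path>"; keep the path part.
    git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' | cut -d' ' -f2-
}

# Succeed if the given path is registered as a subdataset.
is_subdataset() {
    # $1: superdataset path, $2: candidate subdataset path (e.g. EL1000)
    list_subdatasets "$1" | grep -Fxq -- "$2"
}
```

From the analysis folder, `is_subdataset . EL1000 && echo registered` should print "registered" once the install above has been saved. An untracked or ignored EL1000 folder would not show up here.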

DataLad's handbook dedicates a few sections to dataset nesting:

1 reply

alecristia Apr 27, 2021
Maintainer Author

Okay, in full disclosure, I had already started from code that seemed to have an EL1000 folder, so I had cd'd into that and correctly set up the datasets I needed from EL1000, including downloading all the annotations.

So as not to lose that work, I moved my copy of EL1000 (which was a plain folder, not a proper nested dataset) outside of my work directory. I then attempted the setup, thinking that if it worked, I would move the annotations over from my local copy instead of getting them again. However, run-procedure gave me an error:

source ~/ChildProjectVenv/bin/activate
mv EL1000 ..
datalad install -d .  --source='https://gin.g-node.org/EL1000/EL1000' -r
cd EL1000
datalad run-procedure setup

[ERROR ] No idea how to execute procedure /Users/acristia/Documents/gitrepos/EL1000-CR/EL1000/.datalad/procedures/setup.py. Missing 'execute' permissions? [run_procedure.py:call:435] (ValueError)

I don't see anything wrong with the installation output:

$ datalad install -d . --source='https://gin.g-node.org/EL1000/EL1000' -r

[INFO ] Scanning for unlocked files (this may take some time)
install(ok): EL1000 (dataset)
[INFO ] Installing Dataset(/Users/acristia/Documents/gitrepos/EL1000-CR/EL1000) to get /Users/acristia/Documents/gitrepos/EL1000-CR/EL1000 recursively
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Reset branch 'main' to bff10484 (from 5cc84948) to avoid a detached HEAD
install(ok): EL1000/bergelson (dataset)
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
[INFO ] Scanning for unlocked files (this may take some time)
install(ok): EL1000/kidd (dataset)
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Reset branch 'main' to 41fb635e (from 93004e6b) to avoid a detached HEAD
install(ok): EL1000/lucid (dataset)
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Reset branch 'main' to 1289fb37 (from d6935ebd) to avoid a detached HEAD
install(ok): EL1000/warlaumont (dataset)
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
[INFO ] Scanning for unlocked files (this may take some time)
[INFO ] Reset branch 'main' to ae94c585 (from 64880a92) to avoid a detached HEAD
install(ok): EL1000/winnipeg (dataset)
action summary:
install (ok: 6)
save (notneeded: 1)

Another interesting observation is that the annotations already seem to be there in spirit but not in body:

$ ls bergelson/annotations/vtc/converted/

(gives me a long list of files, including the one I used in the next command:)

$ more bergelson/annotations/vtc/converted/123972-9997_1_0_0.csv

bergelson/annotations/vtc/converted/123972-9997_1_0_0.csv: No such file or directory

lucasgautheron · 2021-04-27T12:57:41Z

lucasgautheron
Apr 27, 2021
Maintainer

Hello, datalad install does not download the data by default. You still need to run:

```bash
datalad get */annotations/*/converted
```

Does it work for you?
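The "there in spirit but not in body" symptom is what one would expect from git-annex when a file's content has not been fetched yet: the file name exists as a symlink, but the object it points to is missing. Here is a small sketch to tell the cases apart. It assumes the default symlink-based mode (not unlocked files), and content_status is a name made up for this example.

```bash
# Report whether the content behind a path is available locally.
# In a symlink-based git-annex repository, a file that has not been fetched
# is a symlink pointing at an object that does not exist yet.
content_status() {
    # $1: path to check
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "not fetched"     # dangling symlink: listed by ls, unreadable by more
    elif [ -e "$1" ]; then
        echo "present"
    else
        echo "no such path"
    fi
}
```

For instance, `content_status bergelson/annotations/vtc/converted/123972-9997_1_0_0.csv` would be expected to print "not fetched" before datalad get and "present" afterwards.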

1 reply

alecristia Apr 29, 2021
Maintainer Author

Sorry, I didn't see this, but yes, that does work!

lucasgautheron · 2021-04-27T13:14:51Z

lucasgautheron
Apr 27, 2021
Maintainer

By the way, the issue with the procedure script was my bad. I just fixed it; you'll need to update EL1k (git pull origin main --recurse-submodules).
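The thread does not say what the fix was. The error message itself hints at execute permissions, so one quick diagnostic, offered only as a sketch, is to check the script's state after pulling. The function name check_procedure is hypothetical; the path in the usage note comes from the error message above.

```bash
# Describe the state of a procedure script as seen by the shell.
check_procedure() {
    # $1: path to the script, e.g. .datalad/procedures/setup.py
    if [ ! -e "$1" ]; then
        echo "missing"            # absent, or a symlink whose target is not there
    elif [ -x "$1" ]; then
        echo "executable"
    else
        echo "not executable"
    fi
}
```

Run from inside EL1000, `check_procedure .datalad/procedures/setup.py` gives a first hint about whether the "Missing 'execute' permissions?" message applies to your copy.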

6 replies

lucasgautheron Apr 27, 2021
Maintainer

I failed to reproduce this.

Can you run ls -lA in your EL1000 folder, and also datalad status?

alecristia Apr 29, 2021
Maintainer Author

$ ls -lA
total 0
drwxr-xr-x 3 acristia staff 96 Apr 27 16:15 bergelson
drwxr-xr-x 3 acristia staff 96 Apr 27 16:15 lucid
drwxr-xr-x 3 acristia staff 96 Apr 27 16:15 warlaumont
drwxr-xr-x 3 acristia staff 96 Apr 27 16:15 winnipeg

alecristia Apr 29, 2021
Maintainer Author

$ datalad status
[ERROR ] No dataset found at '/Users/acristia/Documents/gitrepos/EL1000'. Specify a dataset to work with by providing its path via the dataset option, or change the current working directory to be in a dataset. [dataset.py:require_dataset:573] (NoDatasetFound)
usage: datalad status [-h] [-d DATASET] [--annex [MODE]] [--untracked MODE]
[-r] [-R LEVELS] [-e {no|commit|full}] [-t {raw|eval}]
[PATH [PATH ...]]
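The ls -lA listing above shows neither a .git nor a .datalad entry in that folder, which matches the "No dataset found" error. A rough heuristic for checking this before running DataLad commands is sketched below. It assumes a dataset root carries a .git entry and a .datalad/config file, and is_datalad_dataset is a name made up for this example.

```bash
# Heuristic: does this directory look like the root of a DataLad dataset?
# It only inspects the directory itself, not its parents.
is_datalad_dataset() {
    # $1: directory to inspect, defaults to the current directory
    dir="${1:-.}"
    # .git may be a directory or a file, hence -e rather than -d
    [ -e "$dir/.git" ] && [ -f "$dir/.datalad/config" ]
}
```

Usage: `is_datalad_dataset . || echo "plain folder, not a dataset"`.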

lucasgautheron Apr 29, 2021
Maintainer

If running datalad remove -d on these again does not work, you can still chmod -R 777 them and then rm them. I am a bit puzzled by this!

alecristia Apr 30, 2021
Maintainer Author

datalad remove -d PATH didn't help (I had tried it before, and trying again made no difference), but changing the permissions allowed me to remove them.

Just to clarify, I was able to remove another dataset I had mistakenly installed, so clearly this has to do with how I had set up and/or installed these four, though I'm still not sure what I did wrong!
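A likely reason a plain rm fails on such leftovers is that git-annex write-protects its object store. A slightly narrower variant of the chmod -R 777 workaround is sketched here. It is an illustration only: force_remove is a made-up name, and it assumes the folder is a stale copy that no dataset tracks anymore and whose content you no longer need.

```bash
# Delete a leftover directory whose contents are write-protected.
# Only restores write permission for the owner instead of opening it to everyone.
force_remove() {
    # $1: directory to delete
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    chmod -R u+w -- "$1" && rm -rf -- "$1"
}
```

Usage: `force_remove path/to/stale/copy`. This is irreversible, so it is worth checking first that the data can be fetched again with datalad get.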

alecristia · 2021-04-30T08:52:53Z

alecristia
Apr 30, 2021
Maintainer Author

Just noting here that the only thing left open in this discussion is what to do with other copies of datasets in your local folders that you'd like to get rid of. (Or perhaps my question is more specific than that, and relates to having poorly installed and/or uninstalled the datasets.)

0 replies
