Ability to operate solely on already extracted metadata of DANDI dandisets #7

yarikoptic · 2024-04-15T14:38:03Z

Metadata for all dandiset assets is already extracted and available in two places:

- datalad dandisets on GitHub (https://github.com/dandisets): each has .dandi/assets.json, e.g. https://github.com/dandisets/000960/blob/draft/.dandi/assets.json
- S3 manifests, which are slightly different (JSON-LD versions), e.g. http://dandiarchive.s3.amazonaws.com/dandisets/000960/draft/assets.jsonld or the YAML version: http://dandiarchive.s3.amazonaws.com/dandisets/000960/draft/assets.yaml

Ideally the tool should be able to operate just on the metadata records (perhaps as a mode option of some kind) and output e.g. a JSON/TSV list of records with the target filename for each asset.
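To make the idea concrete, here is a rough sketch of what a metadata-only mode could produce. It is not existing nwb2bids code: the record keys (`path`, `subject_id`, `session_id`, `datatype`) are a simplified, hypothetical shape rather than the actual layout of assets.json or the dandi-schema asset model, and the naming rule in `target_filename` is only an illustration.

```python
import csv
import io
import re


def sanitize(label: str) -> str:
    """Drop non-alphanumeric characters, since BIDS entity labels are alphanumeric."""
    return re.sub(r"[^A-Za-z0-9]", "", label)


def target_filename(record: dict) -> str:
    """Derive a BIDS-like relative path from one simplified metadata record.

    The keys used here are hypothetical, not the real DANDI asset schema.
    """
    sub = f"sub-{sanitize(record['subject_id'])}"
    dirs = [sub]
    entities = [sub]
    session = record.get("session_id")
    if session:
        ses = f"ses-{sanitize(session)}"
        dirs.append(ses)
        entities.append(ses)
    # Hypothetical: the same value serves as datatype directory and filename suffix.
    datatype = record.get("datatype", "ecephys")
    dirs.append(datatype)
    filename = "_".join(entities + [datatype]) + ".nwb"
    return "/".join(dirs + [filename])


def records_to_tsv(records: list) -> str:
    """Return a TSV table mapping each asset path to its target filename."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["asset_path", "target_filename"])
    for record in records:
        writer.writerow([record["path"], target_filename(record)])
    return buffer.getvalue()
```

Nothing here touches the NWB files themselves, so the same function could be fed records loaded from either manifest source once they are mapped to this simplified shape.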

If metadata is lacking, we should extend it at the https://github.com/dandi/dandi-schema/ level and in https://github.com/dandi/dandi-cli to support extraction/harmonization where needed.

Even before doing that, the internal code should be aware of this target use case and have a clear separation of steps:

- metadata extraction/harmonization, e.g. get_metadata_from_files(files: list[Path]) -> list[AssetMetadata]
- analytics for constructing BIDS files based on metadata, e.g. get_bids_filenames(list[AssetMetadata]) -> list[BIDSFile]
  - the tricky part is that some files would be "generated" and would not correspond to a specific asset but rather be a "summary" over assets, e.g. dataset_description.json, participants.tsv
    - could be done by creating a ConcreteBIDSFile subclass of BIDSFile which would just store the content of the target file
- BIDS dataset instantiation, e.g. populate_bids_files(list[BIDSFile]) -> None, which given the list of files from above would instantiate them. It could have options of various kinds or different implementations (e.g. one creating a datalad dataset via https://docs.datalad.org/en/stable/generated/man/datalad-addurls.html if originally operating on a list of URLs, or another one which downloads, etc.)
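The last two steps could be sketched roughly as follows. This is only an illustration under stated assumptions: the `AssetMetadata` fields, the `source` attribute, the extra `root` parameter of `populate_bids_files`, the naming rule, and the placeholder values in dataset_description.json are all hypothetical. The extraction step (get_metadata_from_files) is omitted because it depends on how the NWB files are read.

```python
from __future__ import annotations

import json
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AssetMetadata:
    # Simplified, hypothetical record; not the dandi-schema asset model.
    path: str
    subject_id: str


@dataclass
class BIDSFile:
    path: str                   # target path relative to the BIDS root
    source: str | None = None   # local path of the originating asset, if any


@dataclass
class ConcreteBIDSFile(BIDSFile):
    content: str = ""           # full text of a generated ("summary") file


def get_bids_filenames(assets: list[AssetMetadata]) -> list[BIDSFile]:
    """Plan the target files from metadata alone; no data files are opened."""
    files: list[BIDSFile] = []
    for asset in assets:
        sub = f"sub-{asset.subject_id}"
        # Hypothetical naming rule; a real one would also need session etc.
        files.append(BIDSFile(path=f"{sub}/{sub}_ecephys.nwb", source=asset.path))
    # Generated files summarize all assets rather than mapping to one of them.
    subjects = sorted({asset.subject_id for asset in assets})
    participants = "participant_id\n" + "".join(f"sub-{s}\n" for s in subjects)
    files.append(ConcreteBIDSFile(path="participants.tsv", content=participants))
    # Placeholder values, not taken from any real dataset.
    description = json.dumps({"Name": "example", "BIDSVersion": "1.9.0"}, indent=2)
    files.append(ConcreteBIDSFile(path="dataset_description.json", content=description))
    return files


def populate_bids_files(files: list[BIDSFile], root: Path) -> None:
    """Instantiate the planned files under root by writing or copying."""
    for planned in files:
        target = root / planned.path
        target.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(planned, ConcreteBIDSFile):
            target.write_text(planned.content)
        elif planned.source is not None:
            shutil.copyfile(planned.source, target)
```

Because the plan is plain data, a different implementation of the last step (for example one driving datalad addurls from a list of URLs, or one that downloads) could consume the same list without changes to the first two steps.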


TheChymera · 2024-04-23T16:08:49Z

Do I understand correctly that this would require nwb2bids to depend on DANDI?
If so, I think that would be a problem, because neuroconv should depend on this.

yarikoptic assigned TheChymera Apr 15, 2024
