Assets summary is faulty for a BIDS dandiset with raw and derived data #1844

Closed
yarikoptic opened this issue Jan 30, 2024 · 1 comment

Comments

@yarikoptic
Member

This happens on the embargoed dandiset 000874, for which we show

[image: screenshot of the assets summary]

which has one participant (according to metadata):

❯ grep Participant -A1 response_1706639141849.json | grep identifier | uniq -c
     29             "identifier": "SP002"

with the following folders for that subject:

❯ grep path response_1706639141849.json | awk '/sub-SP002/' | sed -e 's,.*"\(.*sub-SP002/\).*,\1,g' | sort | uniq -c
     40 derivatives/MRI-pipeline/sub-SP002/
      8 derivatives/OCT-pipeline/sub-SP002/
     58 rawdata/sub-SP002/
❯ grep path response_1706639141849.json | awk '/sub-SP002/' | sed -e 's,.*"\(.*sub-SP002/[^/]\+/\).*,\1,g' | sort | uniq -c
     40 derivatives/MRI-pipeline/sub-SP002/anat/
      8 derivatives/OCT-pipeline/sub-SP002/micr/
     56 rawdata/sub-SP002/ses-MRI/
      2 rawdata/sub-SP002/ses-OCT/

and some files do have _sample- entities with 4 different values. But overall it is not clear whether 4 is really what we want there.
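For anyone who prefers Python over the grep/sed pipeline above, here is a rough equivalent. It is only a sketch: the helper names `count_prefixes` and `entity_labels` are made up for illustration, and the example paths are hypothetical, not the real contents of dandiset 000874.

```python
import re
from collections import Counter


def count_prefixes(paths, subject="SP002"):
    """Count assets per leading folder ending in sub-<subject>/ (like the sed + uniq -c above)."""
    pattern = re.compile(rf"^(.*sub-{re.escape(subject)}/)")
    counts = Counter()
    for path in paths:
        match = pattern.match(path)
        if match:
            counts[match.group(1)] += 1
    return counts


def entity_labels(paths, entity="sample"):
    """Collect the distinct labels of one _<entity>-<label> entity across all paths."""
    pattern = re.compile(rf"_{re.escape(entity)}-([^_/.]+)")
    return {label for path in paths for label in pattern.findall(path)}


# Hypothetical paths, only to show the shape of the output
example_paths = [
    "rawdata/sub-SP002/ses-MRI/anat/sub-SP002_ses-MRI_T1w.nii.gz",
    "rawdata/sub-SP002/ses-OCT/micr/sub-SP002_ses-OCT_sample-A_OCT.tif",
    "derivatives/OCT-pipeline/sub-SP002/micr/sub-SP002_sample-B_OCT.tif",
]

prefix_counts = count_prefixes(example_paths)
sample_labels = entity_labels(example_paths)
print(prefix_counts)
print(sorted(sample_labels))
```

Comparing the number of distinct `sample` labels against the number shown in the summary would tell whether the 4 comes from samples or from something else.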

@yarikoptic
Member Author

yarikoptic commented Jan 30, 2024

OK, I traced this to a data bug: one file has - instead of _ separating the subject and session entities (sub-SP002-ses-MRI), so that file ends up defining the subject id as SP002-ses-MRI. We really should look into using a proper bids-validator here :-/
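To illustrate why the stray - matters, here is a minimal sketch of splitting a BIDS-style file name into entities. It is not the parser dandischema or bids-validator actually uses; `parse_entities` and `suspicious_entities` are hypothetical helpers, and the file names (suffix and extension included) are made up.

```python
def parse_entities(filename: str) -> dict[str, str]:
    """Split a BIDS-style name into key-value entities.

    Entities are separated by '_' and the key is separated from its
    label by the first '-'.
    """
    stem = filename.split(".", 1)[0]
    entities = {}
    for chunk in stem.split("_"):
        key, sep, value = chunk.partition("-")
        if sep:  # the trailing suffix (e.g. "T1w") has no '-', so skip it
            entities[key] = value
    return entities


def suspicious_entities(entities: dict[str, str]) -> list[str]:
    """Return keys whose label contains '-', a hint that a '_' separator is missing."""
    return [key for key, value in entities.items() if "-" in value]


good = parse_entities("sub-SP002_ses-MRI_T1w.nii.gz")
bad = parse_entities("sub-SP002-ses-MRI_T1w.nii.gz")

print(good)  # subject and session are separate entities
print(bad)   # the session got absorbed into the subject label
print(suspicious_entities(bad))
```

With the correct name the subject is SP002; with the - variant the only entity found is a subject labelled SP002-ses-MRI, which then counts as an extra subject.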

Note to my future self: the underlying function actually lives in dandischema. Here is a code snippet that loads the result dumped from Swagger and runs the summary on it to see what is going on:

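❯ cat try_stats.py
import json

# The summary function lives in dandischema
from dandischema.metadata import aggregate_assets_summary

# Response saved from the Swagger UI
with open('/tmp/response_1706639141849.json') as f:
    resp = json.load(f)

# Summarize the metadata record of every asset in the response
print(aggregate_assets_summary(r['metadata'] for r in resp['results']))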
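To find which asset introduces an unexpected subject, the same dump can be grouped by participant identifier. This is a sketch under assumptions: it presumes each metadata record has a `path` and a `wasAttributedTo` list whose participant entries carry `schemaKey` and `identifier` (consistent with the grep output earlier, but not verified against the full schema here), and `paths_by_participant` is a hypothetical helper. The example response below is made up.

```python
from collections import defaultdict


def paths_by_participant(resp: dict) -> dict[str, list[str]]:
    """Map each participant identifier to the asset paths that reference it."""
    found = defaultdict(list)
    for result in resp["results"]:
        meta = result["metadata"]
        # Assumed layout: participants are listed under "wasAttributedTo"
        for agent in meta.get("wasAttributedTo", []):
            if agent.get("schemaKey") == "Participant":
                found[agent.get("identifier")].append(meta.get("path"))
    return dict(found)


# Made-up response with the same top-level shape as the Swagger dump
example_resp = {
    "results": [
        {"metadata": {
            "path": "rawdata/sub-SP002/ses-MRI/file1",
            "wasAttributedTo": [{"schemaKey": "Participant", "identifier": "SP002"}],
        }},
        {"metadata": {
            "path": "rawdata/sub-SP002/ses-MRI/file2",
            "wasAttributedTo": [{"schemaKey": "Participant", "identifier": "SP002-ses-MRI"}],
        }},
    ]
}

grouped = paths_by_participant(example_resp)
for identifier, paths in grouped.items():
    print(identifier, len(paths), paths[:3])
```

Any identifier with only a handful of paths is a good candidate for a misnamed file.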
