
First data type schemas #524

Draft: wants to merge 27 commits into base: v5
Conversation

UlrikeS91 (Collaborator)

adjusted draft of #523

UlrikeS91 added the "request" and "major update" labels on Dec 3, 2024
UlrikeS91 self-assigned this on Dec 3, 2024
UlrikeS91 (Collaborator, Author) commented Dec 3, 2024

@Peyman-N, @apdavison and @lzehl I will keep this as a draft for now, but the content is ready for review. Except for one new ControlledTerms terminology*, everything should be functional in the existing framework, meaning the coordinateSpace still points to the old setup. This will need to be updated when we do the SANDS updates.

I restructured the content from the issue since it seems like several image types share properties (which makes sense).

The schemas are structured as follows (a hypothetical sketch of the hierarchy follows after the list):

  • 'Image':

    • concept schema
    • contains info about coordinate space and file compression
    • has a lookup label and additional remarks
  • 'RasterBasedImage' and 'VectorBasedImage':

    • concept schemas
    • extend 'Image'
    • share some properties: data location & dimension
    • but had to be split due to their inherent differences (vectors are scalable)
    • 'RasterBasedImage' additionally contains the device used to capture the image, the color depth, and the resolution
    • 'VectorBasedImage' additionally contains the software used to create the image
  • specific raster-based images restrict some of the inherited properties and define additional ones:

    • 'RasterGraphic'
    • 'Volume'
    • 'ImageStack'
  • specific vector-based images restrict some of the inherited properties and define additional ones:

    • 'VectorGraphic'
    • 'PolygonalMesh'
  • the new terminology sits on 'Image' and is called 'ImageCompressionType' (e.g., LZW, RLE, JPEG); this could also be solved by defining the value as 'core:Technique' (or 'core:AnalysisTechnique', not sure), since that is basically what is happening: a method applied to the file to reduce its size
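
For orientation only, here is a minimal sketch of the hierarchy described above, written as Python dataclasses purely for illustration; the class and attribute names are placeholders and do not reflect the actual openMINDS schema template syntax or property names.

```python
# Illustrative sketch only -- NOT openMINDS schema syntax; names are placeholders.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Image:
    """Concept schema shared by all image types."""
    lookup_label: str
    coordinate_space: str                    # the coordinate space of the file itself
    compression: Optional[str] = None        # e.g. "LZW", "RLE", "JPEG" (ImageCompressionType)
    additional_remarks: Optional[str] = None


@dataclass
class RasterBasedImage(Image):
    """Concept schema extending Image for pixel/voxel data."""
    data_location: str = ""                  # file (bundle) holding the raster data
    dimension: List[int] = field(default_factory=list)
    capture_device: Optional[str] = None     # device used to capture the image
    color_depth: Optional[int] = None
    resolution: Optional[List[float]] = None


@dataclass
class VectorBasedImage(Image):
    """Concept schema extending Image for scalable vector data."""
    data_location: str = ""
    dimension: List[int] = field(default_factory=list)
    creation_software: Optional[str] = None  # software used to create the image
```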

Looking forward to discussing this draft 🙂

lzehl (Member) commented Jan 20, 2025

to be further discussed:

2D raster
3D raster
2D vector
3D vector

compare to:
https://en.wikipedia.org/wiki/Scanning_probe_microscopy
https://fsl.fmrib.ox.ac.uk/fsl/oldwiki/FSLVBM.html

xgui3783 (Contributor) commented Feb 12, 2025

Thanks @lzehl for pointing me to this PR. Apologies that I only have a chance to be involved now, and not sooner. I do have some questions for everyone involved:

  1. Is there a way in the current proposal to include anchoring information?

We currently work with the outputs of voluba (3-dimensional raster-based images) and quicknii (2-dimensional raster-based images). In both cases, through user interaction, a transform file is often produced (in the case of voluba a 4x4 affine; in the case of quicknii, if I understand correctly, effectively a 3x3 matrix).

In the current proposal, schemas/dataTypes/rasterBasedImage.schema.tpl.json does not seem to allow the input of such a transform (see the sketch below).
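
For illustration only (not part of the proposal): a minimal numpy sketch, with made-up numbers, of what such a voluba-style 4x4 affine encodes, i.e. the mapping from voxel indices of the source volume to positions in the target space.

```python
# Hypothetical example values; a real transform would come from the voluba export.
import numpy as np

affine = np.array([
    [0.025, 0.0,   0.0,   -5.0],   # per-axis scale (mm/voxel) and translation (mm)
    [0.0,   0.025, 0.0,   -7.5],
    [0.0,   0.0,   0.025, -4.0],
    [0.0,   0.0,   0.0,    1.0],
])

voxel = np.array([120, 340, 80, 1])   # homogeneous voxel coordinate in the source image
physical = affine @ voxel             # corresponding position in the target space (mm)
print(physical[:3])
```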

  2. I want to clarify whether it is possible (or whether it is the intention of the proposal) to capture nonlinear transformations.

In such a case, the relationship between the source (e.g. rasterBasedImage) and the target (CommonCoordinateSpaceVersion|CustomCoordinateSpace) is not defined by a resolution but by a deformation field.

I want to point out the effort made by ngff v0.5 [1] (which is an extension of zarr v3 [2]) on handling the pixel/voxel --> physical unit problem [3]. They allow the definition of a list of transformation steps, which at the moment include scaling (not dissimilar to this proposal) and translation. The array form lends itself to extension, should the initial transform types not cover edge cases (e.g. rotation can be introduced via an additional transform type, as can nonlinear transforms).

Whilst ngff (and by extension, zarr) is primarily a data format for storing raster 2D/3D image data, its approach to informing the client about how to transform from one space to another seems flexible (see the abridged sketch below).
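
For reference, an abridged sketch (as a Python dict, with made-up values) of the ngff-style transformation list described above; the exact metadata layout is defined in the ngff specification linked at [1] and [3].

```python
# Abridged, illustrative ngff/OME-Zarr "multiscales"-style metadata; values are invented.
multiscale_metadata = {
    "axes": [
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ],
    "datasets": [
        {
            "path": "0",
            # Ordered list of transformation steps from voxel indices to physical units.
            "coordinateTransformations": [
                {"type": "scale", "scale": [20.0, 20.0, 20.0]},
                {"type": "translation", "translation": [0.0, -5600.0, -7200.0]},
            ],
        }
    ],
}
# The list form leaves room for additional transform types (e.g. rotation or a
# nonlinear deformation field) without changing the surrounding structure.
```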

edit:

ping @UlrikeS91 @Peyman-N @lzehl

[1] https://ngff.openmicroscopy.org/latest/

[2] https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html

[3] https://ngff.openmicroscopy.org/latest/#multiscale-md

lzehl (Member) commented Feb 12, 2025

@xgui3783 thanks for getting involved here. Your input is needed and greatly appreciated. I will respond in more detail later, but here is already something that might clarify our approach (so far).

The data type schemas suggested here would indeed not include anchoring information; that is not their purpose. Anchoring information would be stored in a separate schema (because a single file could be anchored to multiple different coordinate spaces), which we should discuss further. The coordinate space in the schemas provided here will always refer to the coordinate space of the file (i.e. the saved transformation result), not to a coordinate space the file could be transformed to (with the given transformation information).

I would suggest a dedicated meeting on how we can extend SANDS with transformation activities and their respective results. We have discussed this multiple times in the past but always dropped it for time and complexity reasons.

@openMetadataInitiative/openminds-developers and @xgui3783 please provide additional comments/input to further shape this PR and related issues.

xgui3783 (Contributor)

"The coordinate space in the schemas provided here will always refer to the coordinate space of the file"

Do I understand correctly, then, that the following examples should have the corresponding commonCoordinateSpaceVersion set as their .coordinateSpace linked attribute:

  • template images (e.g. T1/T2 MRI, the 20um BigBrain reconstruction)
  • parcellations (e.g. Waxholm v1-4 in Waxholm space, Julich Brain in colin/icbm152*, Allen Mouse 2015/2017 in Allen CCF v3)
  • other activation maps/images/geometries (e.g. Julich Brain probability maps, outputs of fmriprep, nutil point clouds, etc.)

But the following should not (a hypothetical sketch contrasting the two cases follows below):

  • volumes/images that needed more than "scaling" to reach a common space (e.g. Julich Brain delineations in BigBrain, a 2D image anchored with QuickNii)
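
To make the distinction concrete, here is a hypothetical sketch of the two cases; the property names and values are placeholders, not the actual openMINDS properties.

```python
# Illustrative only; "@type" and "coordinateSpace" are placeholder keys.

# Case 1: the file's data already lives in a common coordinate space, so its
# coordinateSpace can link directly to that CommonCoordinateSpaceVersion.
julich_brain_probability_map = {
    "@type": "RasterBasedImage",
    "coordinateSpace": "ICBM 2009c Nonlinear Asymmetric",  # CommonCoordinateSpaceVersion
}

# Case 2: the file needed more than scaling to reach the target space (e.g. a
# QuickNii-anchored section), so its coordinateSpace refers to its own/custom
# space; the anchoring to a common space would live in a separate schema.
quicknii_anchored_section = {
    "@type": "RasterBasedImage",
    "coordinateSpace": "custom section space",             # CustomCoordinateSpace
}
```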
