Using pure transforms for image and geometry data #7403

atbenmurray · 2024-01-18T13:58:13Z

An approach to implement #4027 that uses pure transforms to minimise the need for additional transforms.

NOTE: the implementation is untested in its current form and is missing components. There should be enough here to demostrate the approach; please let me know if anything is unclear.
NOTE: although this PR talks about pure transforms, we don't actually need to reimplement the transforms; we could instead modify them to have geometry paths, although this adds to the already high-complexity of the spatial transform implementations.

Introduction

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform, and does so depending on the nature of the data that is provided.

Initially, we should first go over the design principles that this PR adheres to

Geometry preprocessing design principles

Geometry data should be handled in the same way as Image data (as much as is possible)
- tensor with metadata including affine transform
- There may be special parameters analogous to "mode"
- Some primitives must be represented as more than one tensor
  - Our design / public API should be robust to this even if we don't implement it now
We should avoid Image and Geometry versions of each transform if possible
- We shouldn't modify the APIs of any transforms if we can avoid it
We should support pure Geometry applications
- Geometry is not subordinate to Images
  - Should not require the presence of Images
Compose and geometry:
- Not all pipelines using Array transforms will be expressible via Compose
  - All pipelines using Arrays will be expressible
- All pipelines using Dictionaries will be expressible via Compose

Critical geometry design decisions

The principle design decision for geometry is what unit space looks like. Given the desire that geometry is not subordinate to image data, we should tie unit space to physical units (i.e. millimetres) rather than pixel / voxel space. This has the following consequences:

images
- images generally have the concept of resolution, i.e. pixel/voxel extents in physical coordinates. We often resample images in a way that preserves the overall image extents by changing the resolution of the voxels as well as the number of voxels
  - e.g. 256 * 256 pixels @ (0.5 * 0.5) mm^2 = 128 * 128 mm^2 == 128 * 128 pixels @ (1.0 * 1.0) mm^2 = 128 * 128 mm^2
  - for the image being resampled, this can be done with the following affine transforms:
    - the first transform scales the image by 0.5mm / 1.0mm = 0.5. This also has the effect of doubling the viewport extents
    - the second transform scales the viewport extents by 0.5, which reduces the viewport back to the image border
  - the resulting transformation is defined here as being a raster-space transform
  - although the resolution of the image has changed, the geometric transformation is identity
- geometry has no concept of resolution and so is only affected by world-space transformations

Why does this matter? We need to separate the notion of transforms for overall geometry and transforms for raster space.

images are affected by both raster-space and world-space transformations
geometry is only affected by world-space transformations

Most transforms express pure world-space operations

e.g. flip, orientation, rotate(keep_size=True), rotate90, zoom etc.
The transforms that are image specific express raster-space operations that may or may not also have identity world-space transforms
- e.g. resample, resample_to_match, rotate(keep_size=False), spatial_resample, spacing

Note that rotate is an interesting case. Rotate generally expresses pure world-space operations, but when keep_size is False, it specifies a raster-space transform that modifies the image viewport.

Categorising transforms

Not image-specific

transforms that, in principle, work the same for image and geometry
- Affine(d), Flip(d), Orientation(d), Resize(d), Rotate(d), Rotate90(d), Zoom(d)
- RandAffine(d), RandAxisFlip(d), RandFlip(d), RandRotate(d), RandRotate90(d), RandZoom(d)
the APIs for these functions should not need to change

Image-specific transforms

need a strategy to support geometry
- RandSimulateLowResolution(d), Resize(d), Spacing(d), SpatialResample(d)
- this might be different at the dictionary, array and functional levels
Special case
- ResampleToMatch(d)
  - Can this be co-opted to update geometry to match images?

Transforms using grids

grid-based: what does a grid mean to geometry? does geometry just bypass the grid?
- AffineGrid(d), RandAffineGrid(d)
distortion: how do grid-based distorions get applied to geometry?
- GridPatch(d), GridSplit(d)
- Rand2DElastic(d), Rand3DElastic(d), RandDeformGrid(d), GridDistortion(d), RandGridDistortion(d), RandGridPatch(d)

Array and Dictionary

Array transforms that are image-specific may need extra arguments
Dictionary transforms that are image-specific can have "geom" arguments
- specify geometry tensor entries by name

Open questions

What does a grid mean to geometry?
- Can grids just be applied to points?

Geometry specific transforms

The following transforms have been identified as needing to exist in some form:

LoadGeometry
TransformToImage
Translate

`LoadGeometry`

This transform loads geometric data. It can in theory wrap all identified geometric types but at a minimum it needs to support arrays of points.

The user can specify how to interpret those points
- point cloud, open polygon, closed polygon (including winding), triangle soup
  Later we might also support loading of indexed meshes and higher order geometric primitives
  We might also want to support loading of other metadata such affine matrices

`TransformToImage`

This transform takes geometry data and maps it to a given image. This is useful when the geometry data doesn't have a defined space and must be mapped to an image.

Take on the image orientation
Transform the coordinate system
Centre relative to image center / specified corners

`TransformLike`

TransformLike is a way of aligning point clouds tensors with image tensors following transforms that only really work on images. Transforms such as Resize/Resized calculate a scaling matrix that can only be determined from the physical characteristics of an image. However, corresponding point cloud data still needs updating to match the transformed image. so TransformLike is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

Note that TransformLike seems close in functionality to ResampleToMatch and extending the latter may be better than implementing the former.

Modifications to existing transforms

Array transforms

Array transforms specific to images might be modified so that geometry data can be updated. This can be done via additional operation parameters that take a tensor or tuple of tensors:

class Spacing(InvertibleTransform, LazyTransform):
    def __call__(
        self, data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
        inputs_to_update # New
    ):

Another way to achieve this would be to add an extra key field that indicates fields that should have the recent transform applied to them, but this would have to be added to all spatial dictionary transforms.

Dictionary transforms

Dictionary transforms specific to images can refer to geometry by name rather than requiring to pass tensors in directly

class Spacingd(MapTransform, InvertibleTransform, LazyTransform):
    def __init__(
        self, keys, pixdim, diagonal, mode, padding_mode, align_corners, dtype, scale_extent,
        recompute_affine, min_pixdim, max_pixdim, ensure_same_shape, allow_missing_keys,
        input_keys_to_update # New
    ):

An example pure transform

All pure transforms are implemented as pure functions. The array and dictionary, class-based transforms simply wrap the functional transform and pass the necessary parameters to it. This is true for both deterministic and random transforms.

def rotate(
        img: torch.Tensor,
        angle: Sequence[float] | float,
        keep_size: bool = True,
        mode: InterpolateMode | str = InterpolateMode.AREA,
        padding_mode: NumpyPadMode | GridSamplePadMode | str = NumpyPadMode.EDGE,
        align_corners: bool = False,
        dtype: DtypeLike | torch.dtype = None,
        shape_override: Sequence[int] | None = None,
        dtype_override: DtypeLike | torch.dtype = None,
        lazy: bool = False
):
    # get the various arguments into the expected form
    img_ = convert_to_tensor(img, track_meta=get_track_meta())
    mode_ = look_up_option(mode, GridSampleMode)
    padding_mode_ = look_up_option(padding_mode, GridSamplePadMode)
    dtype_ = get_equivalent_dtype(dtype or img_.dtype, torch.Tensor)

    # get the input shape and input dtype, either from the image itself or from the pending transforms
    input_shape, input_dtype = get_input_shape_and_dtype(shape_override, dtype_override, img_)

    # get the angles; ignore all but the first value if the data is 2D
    input_ndim = len(input_shape) - 1
    if input_ndim not in (2, 3):
        raise ValueError(f"Unsupported image dimension: {input_ndim}, available options are [2, 3].")
    angle_ = ensure_tuple_rep(angle, 1 if input_ndim == 2 else 3)

    # calculate the transform and the output shape for the image
    rotate_tx = create_rotate(input_ndim, angle_).astype(np.float64)
    output_shape = input_shape if keep_size is True else transform_shape(input_shape, rotate_tx)

    # create the full transform description
    metadata = {
        "transform": transform,
        "op": "rotate",
        "angle": angle,
        "keep_size": keep_size,
        LazyAttr.INTERP_MODE: mode_,
        LazyAttr.PADDING_MODE: padding_mode_,
        LazyAttr.ALIGN_CORNERS: align_corners,
        LazyAttr.IN_SHAPE: input_shape,
        LazyAttr.IN_DTYPE: input_dtype,
        LazyAttr.OUT_SHAPE: output_shape,
        LazyAttr.OUT_DTYPE: dtype_,
    }

    # depending on the state of the 'lazy' argument, either apply the transform now or add it to the pending list
    return lazily_apply_op(img_, metadata, lazy)

Applying transform descriptions

Applying transform descriptions is simple matter of writing a resampler that knows how to handle point data. We add a kind property to MetaTensor (currently "pixel" and "point") to indicate what kind of tensor data a given tensor contains and when we arrive at the point of resampling we check the tensor kind and call the appropriate function, resample_image or resample_points. This will update the tensor accordingly when passed either an affine matrix or a deformation field.

The call stack is

lazily_apply_op -> apply_pending -> resample_image | resample_points

An example pipeline

im_keys = ('image', 'label')
pt_keys = ('rater 1', 'rater 2')
[
    LoadImaged(keys='im_keys'),
    LoadPointsd(keys='pt_keys'), #not currently in this PR, should be the only points specific tx needed
    Resized(keys='im_keys', ...),
    TransformLiked(keys='pt_keys', reference_key='image'),
    Rotate(keys='im_keys' + 'pt_keys', ...),
]

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Ben Murray <[email protected]>

atbenmurray · 2024-01-18T13:59:00Z

@ericspod @KumoLiu @Nic-Ma @vikashg please take a look and let me know if anything is unclear!

KumoLiu · 2024-01-19T09:10:08Z

Hi Ben, first thank you for the detailed PR and explanation; I have a few questions according to your proposal:

so 'TransformLike' is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

Did you add TransformLike in this PR? I didn't find it. I'm curious how this transform is implemented. Does it mean we need to add TransformLike after each transform we apply to the image? I didn't fully understand the example you posted.

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform and does so depending on the nature of the data that is provided

If I understand correctly, it means we apply the resample all the time no matter lazy or eager, right? All transforms will end up with lazily_apply_op, will that make the transform much slower? Some transforms like Flip can be much faster if not using resampling.
This proposal may mainly be focused on spatial transforms, what about crop or pad？
What does this MetaMatrix refer to here, meta info? If so, then why not directly using meta dict?

MONAI/monai/transforms/lazy/functional.py

Line 101 in 9de7dbf

If passed a Tensor, it returns a tuple of Tensor, MetaMatrix:

atbenmurray · 2024-01-19T13:10:40Z

Hi Ben, first thank you for the detailed PR and explanation; I have a few questions according to your proposal:

Thanks for the detailed feedback!

so 'TransformLike' is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

Did you add TransformLike in this PR? I didn't find it. I'm curious how this transform is implemented. Does it mean we need to add TransformLike after each transform we apply to the image? I didn't fully understand the example you posted.

No, I haven't implemented TransformLike yet, but I can sketch it out and add it to the PR. I think it is something that all of the approaches would need in any case, because there are transforms that require image dimensions to correctly calculate, so the approaches in #7385 and #7386 would likely also need such a transform.

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform and does so depending on the nature of the data that is provided

If I understand correctly, it means we apply the resample all the time no matter lazy or eager, right? All transforms will end up with lazily_apply_op, will that make the transform much slower? Some transforms like Flip can be much faster if not using resampling.

Ahh, no. We pass the lazy flag to lazily_apply_op. If lazy is false, a resample will happen right away. If lazy is true, the transform description is added to the list of pending transforms. I'll make that clearer in the text and documentation.

The resampling operation analyses the matrix that is passed to it and uses the cheapest operation it can use, and will always attempt to do so in a lossless fashion if possible. The logic is as follows:

If the matrix permits it, use a tensor operation (e.g. Flip)
Otherwise, if the matrix permits it, use an interpolate operation (e.g. in place scale / translate)
Otherwise, if the matrix permits it, use a gridless resample (this should be most operations but we don't have the backend code for such an operation at present)
Otherwise, use a grid-based resample

The benefit of such a 'universal' resampler is that the logic for efficiently resampling data can be taken out of the transforms themselves and put somewhere where every transform can benefit from improvements in resampling operations. This is a key part of what allows the pure transforms without any downsides. When operating lazily, it also allows for cases where the cumulative transform is an interpolate or even a tensor operation, even if the individual transforms wouldn't do so.

This proposal may mainly be focused on spatial transforms, what about crop or pad?

I have examples of this for crop and pad that I can add to the PR.

What does this MetaMatrix refer to here, meta info? If so, then why not directly using meta dict?

MONAI/monai/transforms/lazy/functional.py

Line 101 in 9de7dbf

If passed a Tensor, it returns a tuple of Tensor, MetaMatrix:

Ooops! This is adapted from the lazy resampling implementation that the paper was based on. MetaMatrix should be gone, and it is my mistake if there are some left in there. I'll tidy it up.

atbenmurray · 2024-10-25T14:03:33Z

@KumoLiu @Nic-Ma @ericspod I think we can close this PR. I have a branch on my fork of MONAI that significantly updates this work.

atbenmurray added 4 commits January 17, 2024 15:50

Initial commit of pure transforms

4ed2683

Signed-off-by: Ben Murray <[email protected]>

Array spatial transforms ported over from lazy

b504bc7

Signed-off-by: Ben Murray <[email protected]>

Adding pure dictionary spatial transforms

0a1df5f

Signed-off-by: Ben Murray <[email protected]>

Adding a placeholder for resampling of point clouds

9de7dbf

Signed-off-by: Ben Murray <[email protected]>

atbenmurray requested review from vikashg, ericspod, Nic-Ma and KumoLiu January 18, 2024 13:58

Addition of experimental TransformLike/TransformLikeD functionality

f3f858b

atbenmurray self-assigned this Feb 16, 2024

KumoLiu added this to the Geometric Transform API milestone Mar 1, 2024

ericspod closed this Oct 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using pure transforms for image and geometry data #7403

Using pure transforms for image and geometry data #7403

atbenmurray commented Jan 18, 2024 •

edited

Loading

atbenmurray commented Jan 18, 2024

KumoLiu commented Jan 19, 2024 •

edited

Loading

atbenmurray commented Jan 19, 2024 •

edited

Loading

atbenmurray commented Oct 25, 2024

Using pure transforms for image and geometry data #7403

Using pure transforms for image and geometry data #7403

Conversation

atbenmurray commented Jan 18, 2024 • edited Loading

Introduction

Geometry preprocessing design principles

Critical geometry design decisions

Categorising transforms

Not image-specific

Image-specific transforms

Transforms using grids

Array and Dictionary

Open questions

Geometry specific transforms

LoadGeometry

TransformToImage

TransformLike

Modifications to existing transforms

Array transforms

Dictionary transforms

An example pure transform

Applying transform descriptions

An example pipeline

Types of changes

atbenmurray commented Jan 18, 2024

KumoLiu commented Jan 19, 2024 • edited Loading

atbenmurray commented Jan 19, 2024 • edited Loading

atbenmurray commented Oct 25, 2024

atbenmurray commented Jan 18, 2024 •

edited

Loading

`LoadGeometry`

`TransformToImage`

`TransformLike`

KumoLiu commented Jan 19, 2024 •

edited

Loading

atbenmurray commented Jan 19, 2024 •

edited

Loading