Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using pure transforms for image and geometry data #7403

Closed
wants to merge 5 commits into from

Conversation

atbenmurray
Copy link
Contributor

@atbenmurray atbenmurray commented Jan 18, 2024

An approach to implement #4027 that uses pure transforms to minimise the need for additional transforms.

NOTE: the implementation is untested in its current form and is missing components. There should be enough here to demostrate the approach; please let me know if anything is unclear.
NOTE: although this PR talks about pure transforms, we don't actually need to reimplement the transforms; we could instead modify them to have geometry paths, although this adds to the already high-complexity of the spatial transform implementations.

Introduction

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform, and does so depending on the nature of the data that is provided.

Initially, we should first go over the design principles that this PR adheres to

Geometry preprocessing design principles

  • Geometry data should be handled in the same way as Image data (as much as is possible)

    • tensor with metadata including affine transform
    • There may be special parameters analogous to "mode"
    • Some primitives must be represented as more than one tensor
      • Our design / public API should be robust to this even if we don't implement it now
  • We should avoid Image and Geometry versions of each transform if possible

    • We shouldn't modify the APIs of any transforms if we can avoid it
  • We should support pure Geometry applications

    • Geometry is not subordinate to Images
      • Should not require the presence of Images
  • Compose and geometry:

    • Not all pipelines using Array transforms will be expressible via Compose
      • All pipelines using Arrays will be expressible
    • All pipelines using Dictionaries will be expressible via Compose

Critical geometry design decisions

The principle design decision for geometry is what unit space looks like. Given the desire that geometry is not subordinate to image data, we should tie unit space to physical units (i.e. millimetres) rather than pixel / voxel space. This has the following consequences:

  • images
    • images generally have the concept of resolution, i.e. pixel/voxel extents in physical coordinates. We often resample images in a way that preserves the overall image extents by changing the resolution of the voxels as well as the number of voxels
      • e.g. 256 * 256 pixels @ (0.5 * 0.5) mm^2 = 128 * 128 mm^2 == 128 * 128 pixels @ (1.0 * 1.0) mm^2 = 128 * 128 mm^2
      • for the image being resampled, this can be done with the following affine transforms:
        • the first transform scales the image by 0.5mm / 1.0mm = 0.5. This also has the effect of doubling the viewport extents
        • the second transform scales the viewport extents by 0.5, which reduces the viewport back to the image border
      • the resulting transformation is defined here as being a raster-space transform
      • although the resolution of the image has changed, the geometric transformation is identity
    • geometry has no concept of resolution and so is only affected by world-space transformations

Why does this matter? We need to separate the notion of transforms for overall geometry and transforms for raster space.

  • images are affected by both raster-space and world-space transformations
  • geometry is only affected by world-space transformations

Most transforms express pure world-space operations

  • e.g. flip, orientation, rotate(keep_size=True), rotate90, zoom etc.
    The transforms that are image specific express raster-space operations that may or may not also have identity world-space transforms
    • e.g. resample, resample_to_match, rotate(keep_size=False), spatial_resample, spacing

Note that rotate is an interesting case. Rotate generally expresses pure world-space operations, but when keep_size is False, it specifies a raster-space transform that modifies the image viewport.

Categorising transforms

Not image-specific

  • transforms that, in principle, work the same for image and geometry
    • Affine(d), Flip(d), Orientation(d), Resize(d), Rotate(d), Rotate90(d), Zoom(d)
    • RandAffine(d), RandAxisFlip(d), RandFlip(d), RandRotate(d), RandRotate90(d), RandZoom(d)
  • the APIs for these functions should not need to change

Image-specific transforms

  • need a strategy to support geometry
    • RandSimulateLowResolution(d), Resize(d), Spacing(d), SpatialResample(d)
    • this might be different at the dictionary, array and functional levels
  • Special case
    • ResampleToMatch(d)
      • Can this be co-opted to update geometry to match images?

Transforms using grids

  • grid-based: what does a grid mean to geometry? does geometry just bypass the grid?
    • AffineGrid(d), RandAffineGrid(d)
  • distortion: how do grid-based distorions get applied to geometry?
    • GridPatch(d), GridSplit(d)
    • Rand2DElastic(d), Rand3DElastic(d), RandDeformGrid(d), GridDistortion(d), RandGridDistortion(d), RandGridPatch(d)

Array and Dictionary

  • Array transforms that are image-specific may need extra arguments
  • Dictionary transforms that are image-specific can have "geom" arguments
    • specify geometry tensor entries by name

Open questions

  • What does a grid mean to geometry?
    • Can grids just be applied to points?

Geometry specific transforms

The following transforms have been identified as needing to exist in some form:

  • LoadGeometry
  • TransformToImage
  • Translate

LoadGeometry

This transform loads geometric data. It can in theory wrap all identified geometric types but at a minimum it needs to support arrays of points.

  • The user can specify how to interpret those points
    • point cloud, open polygon, closed polygon (including winding), triangle soup
      Later we might also support loading of indexed meshes and higher order geometric primitives
      We might also want to support loading of other metadata such affine matrices

TransformToImage

This transform takes geometry data and maps it to a given image. This is useful when the geometry data doesn't have a defined space and must be mapped to an image.

  • Take on the image orientation
  • Transform the coordinate system
  • Centre relative to image center / specified corners

TransformLike

TransformLike is a way of aligning point clouds tensors with image tensors following transforms that only really work on images. Transforms such as Resize/Resized calculate a scaling matrix that can only be determined from the physical characteristics of an image. However, corresponding point cloud data still needs updating to match the transformed image. so TransformLike is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

Note that TransformLike seems close in functionality to ResampleToMatch and extending the latter may be better than implementing the former.

Modifications to existing transforms

Array transforms

Array transforms specific to images might be modified so that geometry data can be updated. This can be done via additional operation parameters that take a tensor or tuple of tensors:

class Spacing(InvertibleTransform, LazyTransform):
    def __call__(
        self, data_array, mode, padding_mode, align_corners, dtype, scale_extent, output_spatial_shape, lazy,
        inputs_to_update # New
    ):

Another way to achieve this would be to add an extra key field that indicates fields that should have the recent transform applied to them, but this would have to be added to all spatial dictionary transforms.

Dictionary transforms

Dictionary transforms specific to images can refer to geometry by name rather than requiring to pass tensors in directly

class Spacingd(MapTransform, InvertibleTransform, LazyTransform):
    def __init__(
        self, keys, pixdim, diagonal, mode, padding_mode, align_corners, dtype, scale_extent,
        recompute_affine, min_pixdim, max_pixdim, ensure_same_shape, allow_missing_keys,
        input_keys_to_update # New
    ):

An example pure transform

All pure transforms are implemented as pure functions. The array and dictionary, class-based transforms simply wrap the functional transform and pass the necessary parameters to it. This is true for both deterministic and random transforms.

def rotate(
        img: torch.Tensor,
        angle: Sequence[float] | float,
        keep_size: bool = True,
        mode: InterpolateMode | str = InterpolateMode.AREA,
        padding_mode: NumpyPadMode | GridSamplePadMode | str = NumpyPadMode.EDGE,
        align_corners: bool = False,
        dtype: DtypeLike | torch.dtype = None,
        shape_override: Sequence[int] | None = None,
        dtype_override: DtypeLike | torch.dtype = None,
        lazy: bool = False
):
    # get the various arguments into the expected form
    img_ = convert_to_tensor(img, track_meta=get_track_meta())
    mode_ = look_up_option(mode, GridSampleMode)
    padding_mode_ = look_up_option(padding_mode, GridSamplePadMode)
    dtype_ = get_equivalent_dtype(dtype or img_.dtype, torch.Tensor)

    # get the input shape and input dtype, either from the image itself or from the pending transforms
    input_shape, input_dtype = get_input_shape_and_dtype(shape_override, dtype_override, img_)

    # get the angles; ignore all but the first value if the data is 2D
    input_ndim = len(input_shape) - 1
    if input_ndim not in (2, 3):
        raise ValueError(f"Unsupported image dimension: {input_ndim}, available options are [2, 3].")
    angle_ = ensure_tuple_rep(angle, 1 if input_ndim == 2 else 3)

    # calculate the transform and the output shape for the image
    rotate_tx = create_rotate(input_ndim, angle_).astype(np.float64)
    output_shape = input_shape if keep_size is True else transform_shape(input_shape, rotate_tx)

    # create the full transform description
    metadata = {
        "transform": transform,
        "op": "rotate",
        "angle": angle,
        "keep_size": keep_size,
        LazyAttr.INTERP_MODE: mode_,
        LazyAttr.PADDING_MODE: padding_mode_,
        LazyAttr.ALIGN_CORNERS: align_corners,
        LazyAttr.IN_SHAPE: input_shape,
        LazyAttr.IN_DTYPE: input_dtype,
        LazyAttr.OUT_SHAPE: output_shape,
        LazyAttr.OUT_DTYPE: dtype_,
    }

    # depending on the state of the 'lazy' argument, either apply the transform now or add it to the pending list
    return lazily_apply_op(img_, metadata, lazy)

Applying transform descriptions

Applying transform descriptions is simple matter of writing a resampler that knows how to handle point data. We add a kind property to MetaTensor (currently "pixel" and "point") to indicate what kind of tensor data a given tensor contains and when we arrive at the point of resampling we check the tensor kind and call the appropriate function, resample_image or resample_points. This will update the tensor accordingly when passed either an affine matrix or a deformation field.

The call stack is

lazily_apply_op -> apply_pending -> resample_image | resample_points

An example pipeline

im_keys = ('image', 'label')
pt_keys = ('rater 1', 'rater 2')
[
    LoadImaged(keys='im_keys'),
    LoadPointsd(keys='pt_keys'), #not currently in this PR, should be the only points specific tx needed
    Resized(keys='im_keys', ...),
    TransformLiked(keys='pt_keys', reference_key='image'),
    Rotate(keys='im_keys' + 'pt_keys', ...),
]

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@atbenmurray
Copy link
Contributor Author

@ericspod @KumoLiu @Nic-Ma @vikashg please take a look and let me know if anything is unclear!

@KumoLiu
Copy link
Contributor

KumoLiu commented Jan 19, 2024

Hi Ben, first thank you for the detailed PR and explanation; I have a few questions according to your proposal:

so 'TransformLike' is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

  • Did you add TransformLike in this PR? I didn't find it. I'm curious how this transform is implemented. Does it mean we need to add TransformLike after each transform we apply to the image? I didn't fully understand the example you posted.

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform and does so depending on the nature of the data that is provided

  • If I understand correctly, it means we apply the resample all the time no matter lazy or eager, right? All transforms will end up with lazily_apply_op, will that make the transform much slower? Some transforms like Flip can be much faster if not using resampling.

  • This proposal may mainly be focused on spatial transforms, what about crop or pad?

  • What does this MetaMatrix refer to here, meta info? If so, then why not directly using meta dict?

    If passed a Tensor, it returns a tuple of Tensor, MetaMatrix:

@atbenmurray
Copy link
Contributor Author

atbenmurray commented Jan 19, 2024

Hi Ben, first thank you for the detailed PR and explanation; I have a few questions according to your proposal:

Thanks for the detailed feedback!

so 'TransformLike' is provided to take the latest transform from applied or pending and either apply it or add it to pending on the point cloud tensors.

  • Did you add TransformLike in this PR? I didn't find it. I'm curious how this transform is implemented. Does it mean we need to add TransformLike after each transform we apply to the image? I didn't fully understand the example you posted.

No, I haven't implemented TransformLike yet, but I can sketch it out and add it to the PR. I think it is something that all of the approaches would need in any case, because there are transforms that require image dimensions to correctly calculate, so the approaches in #7385 and #7386 would likely also need such a transform.

This approach uses the concept of pure transforms. These are transforms that don't directly manipulate and resample data, but rather simply specify the operation that is to be carried out in terms of either an affine matrix or a deformation grid. This description is then given to a function that executes the transform and does so depending on the nature of the data that is provided

  • If I understand correctly, it means we apply the resample all the time no matter lazy or eager, right? All transforms will end up with lazily_apply_op, will that make the transform much slower? Some transforms like Flip can be much faster if not using resampling.

Ahh, no. We pass the lazy flag to lazily_apply_op. If lazy is false, a resample will happen right away. If lazy is true, the transform description is added to the list of pending transforms. I'll make that clearer in the text and documentation.

The resampling operation analyses the matrix that is passed to it and uses the cheapest operation it can use, and will always attempt to do so in a lossless fashion if possible. The logic is as follows:

  1. If the matrix permits it, use a tensor operation (e.g. Flip)
  2. Otherwise, if the matrix permits it, use an interpolate operation (e.g. in place scale / translate)
  3. Otherwise, if the matrix permits it, use a gridless resample (this should be most operations but we don't have the backend code for such an operation at present)
  4. Otherwise, use a grid-based resample

The benefit of such a 'universal' resampler is that the logic for efficiently resampling data can be taken out of the transforms themselves and put somewhere where every transform can benefit from improvements in resampling operations. This is a key part of what allows the pure transforms without any downsides. When operating lazily, it also allows for cases where the cumulative transform is an interpolate or even a tensor operation, even if the individual transforms wouldn't do so.

  • This proposal may mainly be focused on spatial transforms, what about crop or pad?

I have examples of this for crop and pad that I can add to the PR.

Ooops! This is adapted from the lazy resampling implementation that the paper was based on. MetaMatrix should be gone, and it is my mistake if there are some left in there. I'll tidy it up.

@atbenmurray atbenmurray self-assigned this Feb 16, 2024
@KumoLiu KumoLiu added this to the Geometric Transform API milestone Mar 1, 2024
@atbenmurray
Copy link
Contributor Author

@KumoLiu @Nic-Ma @ericspod I think we can close this PR. I have a branch on my fork of MONAI that significantly updates this work.

@ericspod ericspod closed this Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants