[FEA] cuCIM as a scikit-image backend #829

Schefflera-Arboricola · 2025-02-11T17:27:56Z

Is your feature request related to a problem? Please describe.
Currently, in scikit-image we are developing a dispatching mechanism that would allow function calls to be rerouted to different backend packages (like cuCIM). This means users would be able to seamlessly use cuCIM as a backend for scikit-image, getting significant speed improvements without rewriting much of their code.

So, cuCIM as a backend would look something like:

import os
from skimage.metrics import mean_squared_error
import cupy as cp

os.environ["SKIMAGE_BACKENDS"] = "cucim"

img0 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)
img1 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)

print(mean_squared_error(img0, img1))  # Uses cuCIM's implementation of mean_squared_error, not scikit-image's

This setup allows users to benefit from GPU acceleration while keeping their familiar scikit-image API. We also plan on providing documentation and testing support for backends.

You can also run the above code yourself if you have a GPU setup. To do so:

1. Create a local development branch of cuCIM and add the entry points, the interface, and the info, as described in the next section.
2. Install the scikit-image dispatching development branch with pip install git+https://github.com/Schefflera-Arboricola/scikit-image.git@patch-1 (make sure to uninstall any other scikit-image versions first).
3. Run the above code.

I don't have a GPU, and I tried building cuCIM from source in Google Colab (with the runtime type set to GPU), but I couldn't.

Describe the solution you'd like
To make cuCIM into a backend package, I think we only need to add two entry points to the project's pyproject.toml, along with the objects those entry points refer to:

[project.entry-points.skimage_backends]
cucim = "cucim.skimage:backend_interface"

[project.entry-points.skimage_backend_infos]
cucim = "cucim.skimage:info"

info: Currently, it tells scikit-image which functions are supported by cuCIM. It is a function that returns a BackendInformation object, which has an attribute named supported_functions whose value is a list of the function names (as strings in the format "public_module_name:function_name") supported by cuCIM (the backend). BackendInformation is defined in scikit-image. In the future, we plan to use BackendInformation to let a backend provide additional information about itself and its supported functions.
backend_interface: a namespace containing two functions:
- can_has: Quickly checks whether cuCIM can handle a given function call. It takes the function name and the args and kwargs passed in by the user and does an inexpensive initial check of whether cuCIM can handle them (for example, checking the types of the args), and returns True or False accordingly.
- get_implementation: Returns the actual cuCIM function to execute. If can_has returns True, get_implementation is called; it returns the callable, which is then invoked to run the backend implementation.
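
As a rough illustration of the two objects described above, here is a minimal sketch of what cucim.skimage might expose. The BackendInformation class below is only a stand-in for the one defined in scikit-image's dispatching branch, and the hook signatures, the SUPPORTED list, and the name-to-module mapping are all assumptions made for illustration, not the actual API.

```python
import importlib
from dataclasses import dataclass, field


@dataclass
class BackendInformation:
    # Stand-in for the class defined in scikit-image; only the
    # supported_functions attribute described above is modelled.
    supported_functions: list = field(default_factory=list)


# Hypothetical list, in "public_module_name:function_name" format.
SUPPORTED = ["skimage.metrics:mean_squared_error"]


def info():
    """Tell scikit-image which functions this backend supports."""
    return BackendInformation(supported_functions=list(SUPPORTED))


class backend_interface:
    """Namespace holding the two hooks (assumed signatures)."""

    @staticmethod
    def can_has(name, *args, **kwargs):
        # Cheap checks only: a known function name and CuPy array inputs.
        if name not in SUPPORTED:
            return False
        arrays = [a for a in args if hasattr(a, "shape")]
        return bool(arrays) and all(
            type(a).__module__.split(".")[0] == "cupy" for a in arrays
        )

    @staticmethod
    def get_implementation(name):
        # "skimage.metrics:mean_squared_error" -> cucim.skimage.metrics
        module_name, func_name = name.split(":")
        module = importlib.import_module("cucim." + module_name)
        return getattr(module, func_name)
```

Importing the cuCIM module only inside get_implementation keeps can_has cheap, which matches the intent that it runs before the backend implementation is loaded.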

Here is a dummy backend for your reference: https://github.com/Schefflera-Arboricola/skimage-j4f

Describe alternatives you've considered
This backend mechanism is still a work in progress. We would really like to gauge the cuCIM community's interest in it and hear any feedback you have on how this dispatching machinery could be improved to better accommodate the needs of backends such as cuCIM.

If you are interested, you can also consider joining scikit-image's dispatching meetings:

Meeting link: https://meet.jit.si/scikit-image-dispatching (Wednesdays 8-9 am UTC)
Calendar invite link: https://calendar.app.google/HnMpJCUP591xzTAn7
Meeting notes: https://hackmd.io/@betatim/SJlpIwQgyl/edit

Additional context
initial implementation - PR Basic infrastructure for dispatching to a backend scikit-image/scikit-image#7520
more dispatching developments going on at - PR Enabling backend priority betatim/scikit-image#1
scikit-image dispatching summary diagram:

https://drive.google.com/file/d/1xHLs6rK1P1XGt83ueL-DUbPO-dF0ZKFQ/view?usp=drive_link

inspired by NetworkX's entry-point based dispatching mechanism : https://networkx.org/documentation/latest/reference/backends.html

Looking forward to your feedback!

Thank you :)

grlee77 · 2025-02-11T18:56:17Z

Hi @Schefflera-Arboricola, it's great to see this making progress on the scikit-image side! I remember we started looking at it at EuroSciPy, and I had seen that there had been some updates from you and Tim in the scikit-image repo, but I was a bit out of date on the latest progress.

I am interested in helping implement a backend for cuCIM. From March I may be able to have a bit more dedicated time for it, but we can try to make some initial progress before then. Let me review the information you have provided and post any follow-up questions here.

The current meeting time is not feasible for me (3 AM on US east coast), but I am fine to collaborate asynchronously here initially and we can schedule a separate meeting if needed later on to discuss in person.

grlee77 · 2025-02-14T19:00:51Z

I posted a question asking whether the plan on the scikit-image side is to initially mark only a couple of functions as dispatchable:
https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Also, FYI @Schefflera-Arboricola, this project has a directory layout that isn't very common. The Python pyproject.toml relevant to defining the entry points is here:
https://github.com/rapidsai/cucim/blob/branch-25.04/python/cucim/pyproject.toml

The following subfolder has a layout equivalent to that of scikit-image itself:
https://github.com/rapidsai/cucim/tree/branch-25.04/python/cucim/src/cucim/skimage

Because we track the upstream scikit-image API, it should be straightforward to use cuCIM as a backend. We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array)
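
A minimal sketch of such an input policy, written as a cheap per-argument check that can_has could call. The function name wants_input and the MIN_NBYTES constant are hypothetical, the 500 kB figure is just the example value mentioned above, and the check is duck-typed so that it needs neither NumPy nor CuPy to be imported.

```python
MIN_NBYTES = 500_000  # hypothetical threshold (~500 kB), from the example above


def wants_input(arr, min_nbytes=MIN_NBYTES):
    """Cheap can_has-style check on a single image argument."""
    module = type(arr).__module__.split(".")[0]
    if module == "cupy":
        return True  # already on the device: always accept
    if module == "numpy" and hasattr(arr, "nbytes"):
        # Host array: only worth a host/device round trip if it is large.
        return arr.nbytes >= min_nbytes
    return False  # lists, tuples, scalars, other array types
```

Only the object's type and its nbytes attribute are inspected, so no data is copied or converted during the check.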

Schefflera-Arboricola · 2025-02-14T20:00:04Z

> I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable?
> https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Answered here: https://github.com/scikit-image/scikit-image/pull/7520/files#r1956593052

> We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array)

In the dispatching discussions we have had so far in scikit-image, we have been assuming that the user passes the array type that the backend supports, and that neither scikit-image nor the backend(s) will do any array conversions (and they should not, because conversion is expensive and/or not feasible for some array types). can_has should be used by backends to do an initial type check or any other inexpensive check(s) on the passed-in args. But if you think array conversion (from NumPy to CuPy and back) would be useful for users, then we can start talking more about it.

And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks!

grlee77 · 2025-02-18T20:05:32Z

> And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks

Right, I would not want to put any conversion in can_has. The question is if on the cuCIM side we would return False on any NumPy input or if we want to provide a backend that would allow round-trip host/device transfer as needed. Currently for the cuCIM functions as-is they only accept CuPy array inputs.

Schefflera-Arboricola · 2025-02-19T04:27:10Z

> Right, I would not want to put any conversion in can_has.

OK, but do you think having array conversions as part of the dispatching mechanism would be helpful?

When a NumPy array is passed, we could convert it and then cache the converted CuPy array. Then the next function dispatched in the same runtime would not need to convert that image (NumPy array) again; it could use the cached CuPy array. But would that be a good thing to do? Also, we would have to perform the conversion again at the end for the returned array (from CuPy to NumPy) if we want the input and output array types to be the same (and I think @stefanv in a meeting said that we want that, i.e. to have input and output arrays of the same array type). We could also make this conversion step optional if you think this kind of "conversion and caching" would be beneficial for some types of arrays. LMKWYT.
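
To make the "conversion and caching" idea concrete, here is a minimal sketch of one way it could work. Nothing like this exists in scikit-image's dispatching code as far as this thread shows; the ConversionCache class and its API are purely hypothetical, and the convert callable would be something like cupy.asarray for a CuPy backend.

```python
import weakref


class ConversionCache:
    """Cache converted arrays by the identity of the source array."""

    def __init__(self, convert):
        self._convert = convert  # e.g. cupy.asarray for a CuPy backend
        self._converted = {}     # id(source) -> converted array

    def get(self, source):
        key = id(source)
        if key not in self._converted:
            self._converted[key] = self._convert(source)
            # Forget the entry once the source is garbage collected, so a
            # recycled id() can never return a stale conversion.
            weakref.finalize(source, self._converted.pop, key, None)
        return self._converted[key]

    def __len__(self):
        return len(self._converted)
```

One caveat with this design: the cache is keyed on object identity, so an in-place modification of the source array after the first conversion would go unnoticed and the cached copy would be stale.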

> The question is if on the cuCIM side we would return False on any numpy input or if we want to provide a backend that would allow round-trip host/device transfer as needed.

I'm not sure I understand the second part of your question (i.e. "...or if we want to provide a backend that would allow round-trip host/device transfer as needed") correctly. I think you mean that if a NumPy array is passed, then cuCIM would transfer the call back to scikit-image's native implementation, right?

But if cuCIM's can_has returns False, then the call is handed back to scikit-image and we move on to the next backend in the backend priority list. If none of the backends accept the dispatched call, it falls back to scikit-image's own implementation (with a warning message). (FYI, the backend priority and this fallback are implemented in PR betatim/scikit-image#1, not in PR scikit-image/scikit-image#7520.)
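
The control flow described above can be sketched roughly as follows. This is not the code from either PR; the dispatch function, its parameters, and the warning text are hypothetical, and each backend interface is assumed to expose can_has(name, *args, **kwargs) and get_implementation(name).

```python
import warnings


def dispatch(name, native_func, backends, args, kwargs):
    """Try each backend in priority order, then fall back to native_func.

    `backends` is an ordered list of (backend_name, interface) pairs.
    """
    for backend_name, interface in backends:
        if interface.can_has(name, *args, **kwargs):
            impl = interface.get_implementation(name)
            return impl(*args, **kwargs)
    if backends:
        # Backends were configured, but none accepted this call.
        warnings.warn(
            f"No backend accepted the call to {name}; "
            "falling back to the scikit-image implementation.",
            stacklevel=2,
        )
    return native_func(*args, **kwargs)
```

In this sketch the warning is only emitted when at least one backend was configured, so plain scikit-image usage stays silent.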

> Currently for the cuCIM functions as-is they only accept CuPy array inputs.

I think a check like this in can_has would be good then -- hasattr(arr, "__module__") and arr.__module__.startswith("cupy")

grlee77 · 2025-02-19T22:20:42Z

> and I think @stefanv in a meeting said that we want that-- i.e. to have input and output arrays of the same array type

Definitely agree with @stefanv that it is a much cleaner user experience if the returned array type matches the user's provided array type.

For example, it would be easy to quickly accept any array implementing Numba's CUDA array interface via an inexpensive zero-copy conversion (i.e. the existing data pointer is reused without making a copy):

import cupy as cp

# zero-copy conversion of a GPU array to a CuPy array
if hasattr(arr, "__cuda_array_interface__"):
    arr = cp.asarray(arr)

but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

I do think it would make sense to potentially support NumPy arrays specifically, though, as that is the native array type used by scikit-image.

> But, if cucim's can_has will return False then the call will be transferred back to scikit-image and we will move on to the next backend in the backend priority list. And if none of the backends accept the dispatched call, then it will fall back to the scikit-image's own implementation(with a warning msg).

Yes, I understood that part. The question is how to handle logic to allow can_has to return True for NumPy inputs. Here are a couple of options:

- cuCIM could implement a decorator that could be applied to all cucim.skimage functions, so that they would call cupy.asarray on NumPy inputs before calling the wrapped function and then cupy.asnumpy on the output array (if the inputs were NumPy arrays). This does not seem like a very elegant approach, though.
- Perhaps the scikit-image backend mechanism could be extended to provide a way for a backend to register an optional pair of functions, numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called on any array outputs of func.

On the cuCIM side, those two functions would just be (minus potentially adding some error checking):

import cupy as cp

def numpy_to_native_array(arr):
    # host -> device copy
    return cp.asarray(arr)

def native_array_to_numpy(arr):
    # device -> host copy
    return cp.asnumpy(arr)
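
For the second option, the scikit-image side could use such a pair of functions roughly as sketched below. The wrapper name with_numpy_round_trip and the helper _is_numpy_array are hypothetical, and this is only one possible shape for the idea: it handles a single return value and does not look inside tuples or other containers.

```python
import functools


def _is_numpy_array(obj):
    # Duck-typed check so the sketch does not need to import NumPy.
    t = type(obj)
    return t.__module__ == "numpy" and t.__name__ == "ndarray"


def with_numpy_round_trip(impl, numpy_to_native_array, native_array_to_numpy):
    """Wrap a backend implementation so NumPy in means NumPy out."""

    @functools.wraps(impl)
    def wrapper(*args, **kwargs):
        converted = False

        def to_native(value):
            nonlocal converted
            if _is_numpy_array(value):
                converted = True
                return numpy_to_native_array(value)
            return value

        args = tuple(to_native(a) for a in args)
        kwargs = {k: to_native(v) for k, v in kwargs.items()}
        result = impl(*args, **kwargs)
        # Only convert back if something was converted on the way in, so
        # native inputs still produce native outputs.
        return native_array_to_numpy(result) if converted else result

    return wrapper
```

Because the output is converted back only when an input was converted, callers who already pass the backend's native arrays pay no transfer cost and get native arrays back.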

stefanv · 2025-02-19T22:36:26Z

> but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

Y'all have thought much more about this than I have, so feel free to disregard my 0.01c.

It sounds like forcing conversion onto the backend may make it perform some non-optimal decisions. E.g., it may be better for a certain backend to provide sparse results, instead of dense, but now it'd be forced to produce NumPy arrays.

If you do not return the same type as the input, or a type that can be converted to a NumPy array via asarray, it prohibits building pipelines, since the subsequent function invocation will not be able to handle the output of the previous step. This makes switching the backend, and comparing implementations, impossible.

I think the concepts we want to enforce, therefore, may be:

- The result of any operation must be the same across backends (within pragmatic bounds), regardless of which containers are used.
- We must be able to build pipelines, like we currently can, by chaining functions, even when switching backends.

stefanv · 2025-02-19T22:40:02Z

> Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side, the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called again on any array outputs of func.

Does it make sense to give the backend selector parameters? Here, it sounds like you may have a highly efficient backend selector: "only accept for dispatch if dealing with cupy arrays", or a more aggressive, but less efficient backend selector: "accept anything that can be converted to a cupy array".

Schefflera-Arboricola added the "feature request" label on Feb 11, 2025
