[FEA] cuCIM as a scikit-image backend #829

Open
Schefflera-Arboricola opened this issue Feb 11, 2025 · 8 comments
Labels
feature request New feature or request

Comments

@Schefflera-Arboricola

Schefflera-Arboricola commented Feb 11, 2025

Is your feature request related to a problem? Please describe.

Currently, in scikit-image we are developing a dispatching mechanism that would allow function calls to be rerouted to different backend packages (like cuCIM). This means users would be able to use cuCIM as a backend for scikit-image and get significant speedups without rewriting much of their code.

So, cuCIM as a backend would look something like:

import os
from skimage.metrics import mean_squared_error
import cupy as cp

os.environ["SKIMAGE_BACKENDS"] = "cucim"

img0 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)
img1 = cp.random.randint(0, 256, (256, 256, 3), dtype=cp.uint8)

print(mean_squared_error(img0, img1))  # Uses cuCIM's implementation of mean_squared_error, not scikit-image's

This setup allows users to benefit from GPU acceleration while keeping their familiar scikit-image API. We also plan on providing documentation and testing support for backends.

You can run the above code yourself if you have a GPU setup. To do so:

  • create a local development branch of cuCIM and add the entry points, the interface, and the info function, as described in the next section
  • install the scikit-image dispatching development branch: pip install git+https://github.com/Schefflera-Arboricola/scikit-image.git@patch-1 (make sure to uninstall any other scikit-image versions first)
  • run the above code

I don't have a GPU. I tried building cuCIM from source in Google Colab (with the runtime type set to GPU), but I couldn't.


Describe the solution you'd like

To make cuCIM into a backend package, I think we only need to add two entry points to the project's pyproject.toml (here), along with the objects those entry points refer to:

[project.entry-points.skimage_backends]
cucim = "cucim.skimage:backend_interface"

[project.entry-points.skimage_backend_infos]
cucim = "cucim.skimage:info"
  • info: A function that returns a BackendInformation object (BackendInformation is defined in scikit-image). Currently, it tells scikit-image which functions cuCIM supports: the object's supported_functions attribute is a list of function names as strings, in the format "public_module_name:function_name". In the future, we plan to use BackendInformation to let a backend provide more information about itself and its supported functions.

  • backend_interface: it's a namespace containing two functions:

    • can_has: Quickly checks whether cuCIM can handle a given function call. It takes the function name and the args and kwargs passed by the user, does an inexpensive initial check on whether cuCIM can handle them (e.g. checking the types of the args), and returns True or False.
    • get_implementation: Returns the actual cuCIM function to execute. It is called only if can_has returns True; the callable it returns is then invoked, which runs the backend implementation.
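
As a rough illustration of the two entry-point objects described above, here is a minimal sketch. It rests on several assumptions: the BackendInformation class below is only a stand-in for the one scikit-image defines, the can_has(name, *args, **kwargs) signature is inferred from the description rather than taken from the dispatching PR, and the mapping from "skimage.metrics:..." to a "cucim.skimage.metrics" module is a guess based on cuCIM mirroring the scikit-image layout.

```python
import importlib
from dataclasses import dataclass, field


@dataclass
class BackendInformation:
    """Stand-in for the class scikit-image defines (shape assumed)."""
    supported_functions: list = field(default_factory=list)


def info():
    # Names follow the "public_module_name:function_name" format.
    return BackendInformation(
        supported_functions=["skimage.metrics:mean_squared_error"]
    )


def _is_cupy_array(obj):
    # Cheap check: look at the type's module, never at the data.
    return type(obj).__module__.split(".")[0] == "cupy"


class backend_interface:
    """Namespace holding the two hooks (signatures are assumptions)."""

    @staticmethod
    def can_has(name, *args, **kwargs):
        if name not in info().supported_functions:
            return False
        # Only array-like positional args are inspected in this sketch.
        arrays = [a for a in args if hasattr(a, "shape")]
        return bool(arrays) and all(_is_cupy_array(a) for a in arrays)

    @staticmethod
    def get_implementation(name):
        # Assumed mapping: "skimage.metrics:f" -> cucim.skimage.metrics.f
        module_name, func_name = name.split(":")
        module = importlib.import_module("cucim." + module_name)
        return getattr(module, func_name)
```

Keeping can_has limited to a name lookup and a type check means it stays cheap; the backend module is only imported once get_implementation is called.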

Here is a dummy backend for your reference: https://github.com/Schefflera-Arboricola/skimage-j4f


Describe alternatives you've considered

This backend mechanism is still a work in progress and we would really like to know the interest of the cucim community in this and any feedback you all might have on how this dispatching machinery could be improved to better accommodate the needs of backends, such as cucim.

If you are interested, you can also consider joining scikit-image's dispatching meetings:


Additional context

https://drive.google.com/file/d/1xHLs6rK1P1XGt83ueL-DUbPO-dF0ZKFQ/view?usp=drive_link


Looking forward to your feedback!

Thank you :)

@Schefflera-Arboricola Schefflera-Arboricola added the feature request New feature or request label Feb 11, 2025
@grlee77
Contributor

grlee77 commented Feb 11, 2025

Hi @Schefflera-Arboricola, it's great to see this making progress on the scikit-image side! I remember we started looking at it at EuroSciPy and had seen that there had been some updates from you and Tim in the scikit-image repo, but was a bit out of date on the latest progress.

I am interested in helping implement a backend for cuCIM. From March I may be able to have a bit more dedicated time for it, but we can try to make some initial progress before then. Let me review the information you have provided and post any follow-up questions here.

The current meeting time is not feasible for me (3 AM on US east coast), but I am fine to collaborate asynchronously here initially and we can schedule a separate meeting if needed later on to discuss in person.

@grlee77
Contributor

grlee77 commented Feb 14, 2025

I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable?
https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Also, FYI @Schefflera-Arboricola, this project has a directory layout that isn't very common. The Python pyproject.toml relevant to defining the entry points is here:
https://github.com/rapidsai/cucim/blob/branch-25.04/python/cucim/pyproject.toml

And the following subfolder has an equivalent layout to scikit-image itself
https://github.com/rapidsai/cucim/tree/branch-25.04/python/cucim/src/cucim/skimage

Because we track the upstream scikit-image API, it should be straightforward to use cuCIM as a backend. We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array).
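
The size-threshold idea could be sketched roughly as follows. This is only an illustration: the helper name accepts and the constant MIN_NBYTES are hypothetical, and the 500 kB figure is just the example value mentioned above. The check reads the type's module and the nbytes attribute, so it never touches or copies the data.

```python
MIN_NBYTES = 500_000  # hypothetical threshold, roughly the 500 kB example


def _root_module(obj):
    # Top-level package that defines the object's type, e.g. "numpy".
    return type(obj).__module__.split(".")[0]


def accepts(arr, min_nbytes=MIN_NBYTES):
    """Cheap pre-check suitable for use inside can_has."""
    root = _root_module(arr)
    if root == "cupy":
        return True  # already on the device
    if root == "numpy" and hasattr(arr, "nbytes"):
        # Only take host arrays large enough to justify a transfer.
        return arr.nbytes >= min_nbytes
    return False  # lists, tuples, scalars: never promoted automatically
```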

@Schefflera-Arboricola
Author

I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable?
https://github.com/scikit-image/scikit-image/pull/7520/files#r1956590515

Answered here: https://github.com/scikit-image/scikit-image/pull/7520/files#r1956593052

We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500kB in size, for example). We can also return False for can_has on any image inputs that are not already an array (e.g. we don't want to automatically promote a list to a CuPy array)

In the dispatching discussions we have had in scikit-image so far, we have assumed that the user passes the array type that the backend supports, and that neither scikit-image nor the backend(s) will do any array conversions (or at least should not, because conversion is expensive and/or not feasible for some array types). can_has should be used by backends to do an initial type check or any other inexpensive check(s) on the passed-in args. But if you think array conversion (from NumPy to CuPy and back) would be useful for users, then we can start talking more about it.

And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks!

@grlee77
Contributor

grlee77 commented Feb 18, 2025

And if you want, you can use can_has to check the size and convert the NumPy array into a CuPy array but you should not, because can_has is meant to be an inexpensive check before we load and call the backend implementation. But please give your feedback on if/how this should/can be improved from a perspective of a backend user and a backend developer. Thanks

Right, I would not want to put any conversion in can_has. The question is if on the cuCIM side we would return False on any numpy input or if we want to provide a backend that would allow round-trip host/device transfer as needed. Currently for the cuCIM functions as-is they only accept CuPy array inputs.

@Schefflera-Arboricola
Author

Right, I would not want to put any conversion in can_has.

ok, but, do you think having array conversions as part of the dispatching mechanism would be a helpful thing to have?

When a NumPy array is passed, we could convert it and cache the converted CuPy array. Then the next function dispatched in the same runtime would not need to convert that image (NumPy array) again; it could use the cached CuPy array. But would that be a good thing to do? We would also have to convert the returned array back (from CuPy to NumPy) at the end if we want the input and output array types to match (and I think @stefanv in a meeting said that we want that, i.e. to have input and output arrays of the same array type). We could also make this conversion step optional, if you think this kind of "conversion and caching" would be beneficial for some types of arrays. LMKWYT.

The question is if on the cuCIM side we would return False on any numpy input or if we want to provide a backend that would allow round-trip host/device transfer as needed.

I'm not sure I understand the second part of your question (i.e. "...or if we want to provide a backend that would allow round-trip host/device transfer as needed") correctly. I think you mean that if a NumPy array is passed, cuCIM would transfer the call back to scikit-image's native implementation, right?

But, if cucim's can_has will return False then the call will be transferred back to scikit-image and we will move on to the next backend in the backend priority list. And if none of the backends accept the dispatched call, then it will fall back to the scikit-image's own implementation(with a warning msg). (fyi, the backend priority and this falling-back is implemented in PR betatim/scikit-image#1 and not in PR scikit-image/scikit-image#7520)
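
The fallback behaviour described here could look roughly like the following sketch. It is not the code from either PR: the function name dispatch, its arguments, and the warning text are all hypothetical, and each backend is assumed to expose can_has and get_implementation as described earlier in this issue.

```python
import warnings


def dispatch(name, native_impl, backends, *args, **kwargs):
    """Try backends in priority order, else use the native implementation.

    `backends` is a list of backend interface objects, highest priority
    first (a hypothetical representation of the priority list).
    """
    for backend in backends:
        if backend.can_has(name, *args, **kwargs):
            impl = backend.get_implementation(name)
            return impl(*args, **kwargs)
    if backends:
        # Backends were configured, but none accepted this call.
        warnings.warn(
            f"No backend accepted {name!r}; falling back to scikit-image.",
            stacklevel=2,
        )
    return native_impl(*args, **kwargs)
```

With this shape, a backend that returns False from can_has costs only the check itself, since its implementation is never loaded.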

Currently for the cuCIM functions as-is they only accept CuPy array inputs.

I think a check like this in can_has would be good then -- hasattr(arr, "__module__") and arr.__module__.startswith("cupy")

@grlee77
Contributor

grlee77 commented Feb 19, 2025

| and I think @stefanv in a meeting said that we want that-- i.e. to have input and output arrays of the same array type

Definitely agree with @stefanv that it is a much cleaner user experience if the returned array type matches the user's provided array type.

For example, it would be easy to quickly accept any array implementing Numba's CUDA array interface via an inexpensive zero-copy conversion (i.e. the existing data pointer is reused without making a copy)

import cupy as cp

# zero-copy conversion of a GPU array to a CuPy array
if hasattr(arr, "__cuda_array_interface__"):
    arr = cp.asarray(arr)

but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

I do think it would make sense to potentially support NumPy arrays specifically, though, as that is the native array type used by scikit-image.

But, if cucim's can_has will return False then the call will be transferred back to scikit-image and we will move on to the next backend in the backend priority list. And if none of the backends accept the dispatched call, then it will fall back to the scikit-image's own implementation(with a warning msg).

Yes, I understood that part. The question is how to handle logic to allow can_has to return True for NumPy inputs. Here are a couple of options:

  • cuCIM could implement some decorator that could be applied to all cucim.skimage functions so they would call cupy.asarray on NumPy inputs before calling the wrapped function and then cupy.asnumpy on the output array (if the inputs were numpy arrays). This seems like a not very elegant approach, though.
  • Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side, the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called again on any array outputs of func.

On the cuCIM side, those two functions would just be (minus some potential error checking):

import cupy as cp

def numpy_to_native_array(arr):
    # host -> device copy
    return cp.asarray(arr)

def native_array_to_numpy(arr):
    # device -> host copy
    return cp.asnumpy(arr)
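
To make the second option more concrete, here is a minimal sketch of how a dispatching layer might use such a registered pair. Everything in it is hypothetical: the helper name with_conversion and its parameters are not part of scikit-image, only positional arguments are converted, and a single return value is assumed. The converters are passed in as plain callables so the sketch runs without a GPU.

```python
import functools


def with_conversion(func, is_host_array, to_native, to_host):
    """Wrap a backend function so host arrays go in and come back out.

    is_host_array: predicate identifying inputs that need conversion
    to_native:     e.g. numpy_to_native_array
    to_host:       e.g. native_array_to_numpy
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        converted = any(is_host_array(a) for a in args)
        native_args = [to_native(a) if is_host_array(a) else a for a in args]
        result = func(*native_args, **kwargs)
        # Convert back only when the caller passed host arrays, so the
        # output type matches the input type.
        return to_host(result) if converted else result

    return wrapper
```

For cuCIM the call might be with_conversion(func, lambda a: isinstance(a, np.ndarray), numpy_to_native_array, native_array_to_numpy). A fuller version would also need to handle keyword arguments, tuples of outputs, and non-array return values.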

@stefanv

stefanv commented Feb 19, 2025

but I don't know how the conversion of the CuPy array output back to the original type could be handled as that mechanism would be library-specific. It would be low cost since there is no copy, but would not comply with the requirement for having the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions.

Y'all have thought much more about this than I have, so feel free to disregard my 0.01c.

It sounds like forcing conversion onto the backend may make it perform some non-optimal decisions. E.g., it may be better for a certain backend to provide sparse results, instead of dense, but now it'd be forced to produce NumPy arrays.

If you do not return the same type as the input, or a type that can be converted to a NumPy array via asarray, it prevents building pipelines, since the subsequent function invocation will not be able to handle the output of the previous step. This makes switching backends, and comparing implementations, impossible.

I think the concepts we want to enforce, therefore, may be:

  • The result of any operation must be the same across backends (within pragmatic bounds), regardless of which containers are used.
  • We must be able to build pipelines, like we currently can, by chaining functions, even when switching backends.

@stefanv

stefanv commented Feb 19, 2025

  • Perhaps the scikit-image backend can be extended to provide a way for a backend to register an optional pair of functions for numpy_to_native_array and native_array_to_numpy. If those functions were provided, then on the scikit-image side, the dispatchable decorator could call numpy_to_native_array on each array input to the wrapped func. Similarly, native_array_to_numpy would be called again on any array outputs of func.

Does it make sense to give the backend selector parameters? Here, it sounds like you may have a highly efficient backend selector: "only accept for dispatch if dealing with cupy arrays", or a more aggressive, but less efficient backend selector: "accept anything that can be converted to a cupy array".
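
One way to picture a parameterised selector is a factory that builds can_has for a given mode. This is purely a sketch of the idea: make_can_has and the mode names "strict" and "convertible" are hypothetical, and nothing like this exists in the dispatching PRs as far as this thread shows.

```python
def make_can_has(mode="strict"):
    """Build a can_has check for the given (hypothetical) selector mode.

    "strict":      accept only CuPy arrays
    "convertible": also accept NumPy arrays and any object exposing
                   __cuda_array_interface__
    """
    if mode not in {"strict", "convertible"}:
        raise ValueError(f"unknown mode: {mode!r}")

    def can_has(name, *args, **kwargs):
        arrays = [a for a in args if hasattr(a, "shape")]
        if not arrays:
            return False  # e.g. plain lists are never promoted
        for a in arrays:
            root = type(a).__module__.split(".")[0]
            if root == "cupy":
                continue
            if mode == "convertible" and (
                root == "numpy" or hasattr(a, "__cuda_array_interface__")
            ):
                continue
            return False
        return True

    return can_has
```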


3 participants