[FEA] cuCIM as a scikit-image backend #829
Comments
Hi @Schefflera-Arboricola, it's great to see this making progress on the scikit-image side! I remember we started looking at it at EuroSciPy, and I had seen that there had been some updates from you and Tim in the scikit-image repo, but I was a bit out of date on the latest progress. I am interested in helping implement a backend for cuCIM. From March I may be able to have a bit more dedicated time for it, but we can try to make some initial progress before then. Let me review the information you have provided and post any follow-up questions here. The current meeting time is not feasible for me (3 AM on the US east coast), but I am fine to collaborate asynchronously here initially, and we can schedule a separate meeting later on if we need to discuss in person.
I posted one question about whether the plan on the scikit-image side is to initially only mark a couple functions as dispatchable? Also, FYI @Schefflera-Arboricola , for this project we have a directory layout that isn't very common. The python And the following subfolder has an equivalent layout to scikit-image itself Because we track the upstream scikit-image API, it should be straightforward to use cuCIM as a backend. We only need to handle copying data to/from the host if a NumPy array was provided (possibly with some size threshold where we say we don't want it if it is less than 500kB in size, for example). We can also return False for |
Answered here: https://github.com/scikit-image/scikit-image/pull/7520/files#r1956593052
In the dispatching discussions so far that we have been having in scikit-image, we have been assuming that the user would be passing the array type that the backend supports and we(scikit-image) or the backend(s) will not be(or should not -- because it's expensive and/or not feasible for some array types) doing any array conversions. And And if you want, you can use |
Right, I would not want to put any conversion in |
OK, but do you think having array conversions as part of the dispatching mechanism would be a helpful thing to have? When a NumPy array is passed, we can convert it and then cache the converted CuPy array. Then, for the next function dispatched in the same runtime, that image (the NumPy array) would not need to be converted again, because we can use the cached CuPy array. But would that be a good thing to do? Also, we would have to perform the conversion again at the end for the returned array (from CuPy to NumPy) if we want the input array type and the output array type to be the same (and I think @stefanv in a meeting said that we want that, i.e. to have input and output arrays of the same array type). We can also make this conversion step optional if you think this kind of "conversion and caching" will be beneficial only for some types of arrays. Let me know what you think.
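To make the "convert once, then reuse" idea concrete, here is a minimal sketch of such a cache. It is not part of scikit-image or cuCIM: `ConversionCache`, `HostArray`, and `fake_to_device` are hypothetical names, and the converter is a stand-in for something like `cupy.asarray` so the example runs without a GPU. It keys on the identity of the input object and evicts the entry when that object is garbage collected. Note that it would not notice in-place modification of the input after the first conversion, which is one reason such caching might need to be optional.

```python
import weakref


class ConversionCache:
    """Cache converted arrays, keyed by the identity of the input object."""

    def __init__(self, convert):
        self._convert = convert   # in a real backend this might be cupy.asarray
        self._converted = {}      # id(input) -> converted array

    def get(self, arr):
        key = id(arr)
        if key not in self._converted:
            self._converted[key] = self._convert(arr)
            # Evict the entry once the input is garbage collected, so that a
            # recycled id() cannot return a stale conversion.
            weakref.finalize(arr, self._converted.pop, key, None)
        return self._converted[key]

    def __len__(self):
        return len(self._converted)


class HostArray:
    """Hypothetical stand-in for a NumPy array."""

    def __init__(self, data):
        self.data = list(data)


calls = []


def fake_to_device(arr):
    """Hypothetical stand-in for a host-to-GPU copy; records each call."""
    calls.append(arr)
    return tuple(arr.data)


cache = ConversionCache(fake_to_device)
image = HostArray([1, 2, 3])
first = cache.get(image)    # converts
second = cache.get(image)   # reuses the cached conversion
```

The second lookup returns the same converted object without calling the converter again.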
I'm not sure if I understand the second part of your question(i.e. But, if cucim's
I think a check like this in |
> and I think @stefanv in a meeting said that we want that, i.e. to have input and output arrays of the same array type

Definitely agree with @stefanv that it is a much cleaner user experience if the returned array type matches the user's provided array type. For example, it would be easy to quickly accept any array implementing Numba's CUDA array interface via an inexpensive zero-copy conversion (i.e. the existing data pointer is reused without making a copy):

    # zero-copy conversion of a GPU array to a CuPy array
    if hasattr(arr, "__cuda_array_interface__"):
        arr = cp.asarray(arr)

but I don't know how the conversion of the CuPy array output back to the original type could be handled, as that mechanism would be library-specific. Returning the CuPy array directly would be low cost since there is no copy, but would not comply with the requirement that the output type match the input type. Given that, I don't think we should try to support arbitrary array conversions. I do think it would make sense to potentially support NumPy arrays specifically, though, as that is the native array type used by scikit-image.
Yes, I understood that part. The question is how to handle logic to allow
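On the cuCIM side, those two functions would just be (minus some potential error checking):

    import cupy as cp

    def numpy_to_native_array(arr):
        # copy a NumPy array from host memory to the GPU
        return cp.asarray(arr)

    def native_array_to_numpy(arr):
        # copy a CuPy array from the GPU back to host memory
        return cp.asnumpy(arr)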
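As a rough illustration of how a dispatcher might use such a pair of functions so that the output type matches the input type, here is a sketch under stated assumptions. `with_numpy_round_trip` is a hypothetical helper, not an existing scikit-image or cuCIM API. To keep it runnable without a GPU, lists play the role of NumPy arrays and tuples play the role of native (CuPy) arrays.

```python
import functools


def with_numpy_round_trip(impl, is_numpy, numpy_to_native_array,
                          native_array_to_numpy):
    """Wrap a backend function so NumPy inputs yield NumPy outputs.

    Native inputs are passed through untouched, so no copy is made for them.
    """
    @functools.wraps(impl)
    def wrapper(image, *args, **kwargs):
        if not is_numpy(image):
            return impl(image, *args, **kwargs)
        result = impl(numpy_to_native_array(image), *args, **kwargs)
        return native_array_to_numpy(result)

    return wrapper


def double(image):
    """Hypothetical backend function operating on 'native' arrays (tuples)."""
    return tuple(2 * value for value in image)


wrapped = with_numpy_round_trip(
    double,
    is_numpy=lambda a: isinstance(a, list),  # stand-in for a NumPy check
    numpy_to_native_array=tuple,             # stand-in for cp.asarray
    native_array_to_numpy=list,              # stand-in for cp.asnumpy
)

host_result = wrapped([1, 2])     # "NumPy" in, "NumPy" out
native_result = wrapped((1, 2))   # native in, native out, no conversion
```

This only handles a single array argument and a single array return value; functions with several array inputs or tuple outputs would need more care.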
Y'all have thought much more about this than I have, so feel free to disregard my 0.01c. It sounds like forcing conversion onto the backend may make it perform some non-optimal decisions. E.g., it may be better for a certain backend to provide sparse results, instead of dense, but now it'd be forced to produce NumPy arrays. If you do not return the same type as input, or a type that can be converted to NumPy Array via asarray, it prohibits building pipelines, since the subsequent function invocation will not be able to handle the output of the previous step. This makes switching the backend, and comparing implementations, impossible. I think the concepts we want to enforce, therefore, may be:
Does it make sense to give the backend selector parameters? Here, it sounds like you may have a highly efficient backend selector: "only accept for dispatch if dealing with cupy arrays", or a more aggressive, but less efficient backend selector: "accept anything that can be converted to a cupy array". |
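One way to picture a parameterized selector is a small factory that builds the check from a couple of options. The sketch below is only an illustration: `make_can_has`, `accept_numpy`, and `min_nbytes` are hypothetical names, the function-name string is an assumed example of the `"public_module_name:function_name"` format, and the 500 kB default simply mirrors the threshold floated earlier in this thread. Fake array classes are used so that it runs without NumPy or a GPU.

```python
def make_can_has(supported, accept_numpy=False, min_nbytes=500_000):
    """Build a selector; all parameter names here are illustrative."""

    def is_device_array(obj):
        return hasattr(obj, "__cuda_array_interface__")

    def is_numpy_array(obj):
        return type(obj).__module__.split(".")[0] == "numpy"

    def can_has(name, *args, **kwargs):
        if name not in supported:
            return False
        arrays = [a for a in (*args, *kwargs.values())
                  if is_device_array(a) or is_numpy_array(a)]
        if not arrays:
            return False
        for arr in arrays:
            if is_device_array(arr):
                continue  # already on the GPU, nothing to copy
            # Host array: accept only if conversion is enabled and the array
            # is large enough for the copy to be worthwhile.
            if not accept_numpy or getattr(arr, "nbytes", 0) < min_nbytes:
                return False
        return True

    return can_has


class FakeDeviceArray:
    """Stand-in for a GPU array exposing the CUDA array interface."""
    __cuda_array_interface__ = {}


# Stand-ins for NumPy arrays of different sizes (the module name is faked).
FakeBigNumpy = type("ndarray", (), {"__module__": "numpy", "nbytes": 1_000_000})
FakeSmallNumpy = type("ndarray", (), {"__module__": "numpy", "nbytes": 1_000})

NAME = "skimage.filters:gaussian"  # assumed example name
strict = make_can_has({NAME})                     # GPU arrays only
eager = make_can_has({NAME}, accept_numpy=True)   # also large NumPy arrays
```

With these two selectors, `strict` declines every host array, while `eager` accepts host arrays above the size threshold.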
Is your feature request related to a problem? Please describe.
Currently, in scikit-image we are developing a dispatching mechanism that would allow function calls to be rerouted to different backend packages (like cucim). This means users would be able to seamlessly use cuCIM as a backend for scikit-image, getting significant speed improvements without rewriting much of their code. So, cuCIM as a backend would look something like:
This setup allows users to benefit from GPU acceleration while keeping their familiar scikit-image API. We also plan on providing documentation and testing support for backends.
Also, you can actually run the above code if you have a GPU setup. You would then have to:
- interface and info, as described in the next section
- pip install git+https://github.com/Schefflera-Arboricola/scikit-image.git@patch-1 (make sure to uninstall any other scikit-image versions)

I don't have GPUs. I tried building cucim from source in Google Colab (with the runtime type set to GPU), but I couldn't.
Describe the solution you'd like
To make cuCIM into a backend package, I think we will only need to add two entry-points in the project's pyproject.toml (here), and the objects those entry-points refer to:
- info: Currently, it tells scikit-image which functions are supported by cuCIM. It is a function that returns a BackendInformation object, which has an attribute named supported_functions whose value is a list of the function names supported by cuCIM (the backend), as strings in the format "public_module_name:function_name". BackendInformation is defined in scikit-image. In the future we plan to use BackendInformation to let a backend provide additional information about itself and its supported functions.
- backend_interface: a namespace containing two functions:
  - can_has: Quickly checks whether cuCIM can handle a given function call. It takes the function name and the args and kwargs passed in by the user, does an inexpensive initial check of whether cuCIM can handle these args (such as checking the types of the args), and returns True or False accordingly.
  - get_implementation: Returns the actual cuCIM function to execute. If can_has returns True, get_implementation is called and returns the function callable, which is then called to run the backend implementation.

Here is a dummy backend for your reference: https://github.com/Schefflera-Arboricola/skimage-j4f
Describe alternatives you've considered
This backend mechanism is still a work in progress and we would really like to know the interest of the cucim community in this and any feedback you all might have on how this dispatching machinery could be improved to better accommodate the needs of backends, such as cucim.
If you are interested, you can also consider joining scikit-image's dispatching meetings:
Additional context
initial implementation - PR Basic infrastructure for dispatching to a backend scikit-image/scikit-image#7520
more dispatching developments going on at - PR Enabling backend priority betatim/scikit-image#1
scikit-image dispatching summary diagram:
https://drive.google.com/file/d/1xHLs6rK1P1XGt83ueL-DUbPO-dF0ZKFQ/view?usp=drive_link
Looking forward to your feedback!
Thank you :)