parse gpu array from python to Julia #93

Open · jakubMitura14 opened this issue Jan 8, 2022 · 6 comments


jakubMitura14 commented Jan 8, 2022

Hello, I have a cupy CUDA array and I want to pass it into Julia as is. CUDA arrays are essentially just device pointers, so it should be possible. From the CUDA.jl side I know it is possible, as I got this comment:
"""
for passing data the other way around you can use unsafe_wrap(CuArray, ...) to create a CUDA.jl array from a device pointer you get from Python
"""
Still I cannot make it work - does anybody have a working example?
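
For reference, a minimal sketch (untested) of what the Julia side of that unsafe_wrap hint could look like when the device pointer arrives as a plain integer; the helper name wrap_devptr and the UInt8 element type are placeholders, not an established API:

from juliacall import Main as jl

jl.seval("using CUDA")
jl.seval("""
function wrap_devptr(ptr::Integer, dims)
    # treat the integer as a raw device pointer and wrap it as a CuArray
    # without copying; own=false means CUDA.jl will not try to free memory
    # it does not own, so the Python side must keep the source array alive
    unsafe_wrap(CuArray, CuPtr{UInt8}(UInt(ptr)), Tuple(dims); own=false)
end
""")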

What I was trying

import cupy
import numba
import numpy as np
import torch
import torch.utils.dlpack
from statistics import median
import timeit
from juliacall import Main as jl
from numba import cuda
jl.seval("using Pkg")
jl.seval("""Pkg.add("CUDA")""")
jl.seval("""Pkg.add("PythonCall")""")
jl.seval("""using CUDA""")
jl.seval("""using PythonCall""")
jl.seval("""CUDA.allowscalar(true)""")
jl.seval("""print(sum(CUDA.ones(3,3,3)))""")# working good
jl.seval("""function bb(arrGold)
    # print(CUDA.unsafe_wrap(CuArray{UInt8,3},arrGold, (2,2,2)))
    print( pyconvert(CuArray{UInt8} ,arrGold ))
end""")


def print_hi(name):
    t1 = torch.cuda.ByteTensor(np.ones((2,2,2)))
    c1 = cupy.asarray(t1)
   
    jl.bb(c1)

    def forBenchPymia():
        numba.cuda.synchronize()
        jl.bb(c1)
        numba.cuda.synchronize()
    
    num_runs = 1
    num_repetitions = 1  # 2
    ex_time = timeit.Timer(forBenchPymia).repeat(
        repeat=num_repetitions,
        number=num_runs)
    res = median(ex_time) * 1000
    print("bench")
    print(res)



if __name__ == '__main__':
    print_hi('PyCharm')

cjdoris (Collaborator) commented Jan 8, 2022

I'm no GPU expert, but you should be able to use the CUDA Array Interface (https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html) to get the pointer to the data.

PythonCall does something similar (https://github.com/cjdoris/PythonCall.jl/blob/main/src/pywrap/PyArray.jl) to wrap Python objects that have the NumPy array interface.
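
To make that concrete, a hedged sketch (untested, assuming a contiguous array): the pointer and shape can be read straight off __cuda_array_interface__ and handed to a small Julia helper like the wrap_devptr placeholder sketched above. Note that cupy arrays are row-major while CUDA.jl arrays are column-major, so either reverse the shape or expect permuted axes.

import cupy
from juliacall import Main as jl

jl.seval("using CUDA")
# same placeholder helper as above: integer pointer + dims -> CuArray view
jl.seval("""
wrap_devptr(ptr, dims) = unsafe_wrap(CuArray, CuPtr{Float32}(UInt(ptr)), Tuple(dims))
""")

a = cupy.ones((2, 3, 4), dtype=cupy.float32)
iface = a.__cuda_array_interface__
ptr, _read_only = iface["data"]           # raw device pointer as a Python int
dims = tuple(reversed(iface["shape"]))    # cupy is row-major, CUDA.jl is column-major
jl_view = jl.wrap_devptr(ptr, dims)       # no copy; keep `a` alive while using the view
print(jl.sum(jl_view))                    # 24 ones -> 24.0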

jakubMitura14 (Author) commented Jan 8, 2022

Thanks, I made progress thanks to your hints: I changed it into a numba CUDA array, but it is still very slow even though the function does nothing. Just passing the argument takes 3633 milliseconds. (I am working on CUDA-accelerated segmentation metrics, and on data of the same size the whole calculation takes around 50 ms.)

import numba
import numpy as np
import torch
from statistics import median
import timeit
from juliacall import Main as jl
from numba import cuda
jl.seval("using Pkg")
jl.seval("""Pkg.add("CUDA")""")
jl.seval("""Pkg.add("PythonCall")""")
jl.seval("""using CUDA""")
jl.seval("""using PythonCall""")
jl.seval("""CUDA.allowscalar(true)""")
jl.seval("""print(sum(CUDA.ones(3,3,3)))""") #works

jl.seval("""function bb(arrGold)
    # intentionally empty: measure only the cost of passing the argument
end""")




def print_hi(name):
    t1 = torch.tensor(np.ones((512,512,800))).to(torch.device("cuda"))
    numbaArray = cuda.as_cuda_array(t1)
 
    jl.bb(numbaArray)
    def forBenchPymia():
        numba.cuda.synchronize()
        jl.bb(numbaArray)
        numba.cuda.synchronize()
      
    num_runs = 1
    num_repetitions = 1  # 2
    ex_time = timeit.Timer(forBenchPymia).repeat(
        repeat=num_repetitions,
        number=num_runs)
    res = median(ex_time) * 1000
    print("bench")
    print(res)

# Press the green button in the gutter to run the script.
if __name__ == '__main__':
    print_hi('PyCharm')
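
One hedged guess is that most of the 3633 ms goes into per-call handling of the argument on the Python/Julia boundary rather than anything GPU-side. A small sketch to check that (the no-op bb and the numba device array mirror the script above; each case gets a warm-up call so Julia's one-time compilation of bb is not included in the timing):

import timeit
import numpy as np
import torch
from numba import cuda
from juliacall import Main as jl

jl.seval("""function bb(arrGold)
end""")

t1 = torch.tensor(np.ones((512, 512, 800))).to(torch.device("cuda"))
numba_array = cuda.as_cuda_array(t1)

def call_with_array():
    jl.bb(numba_array)

def call_with_int():
    jl.bb(0)

for label, fn in [("numba device array", call_with_array), ("plain int", call_with_int)]:
    fn()  # warm-up: the first call includes Julia compilation of bb
    times = timeit.Timer(fn).repeat(repeat=3, number=10)
    print(label, min(times) / 10 * 1000, "ms per call")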


rejuvyesh commented

https://github.com/pabloferz/DLPack.jl might be of interest!
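
For later readers, a minimal sketch of what the DLPack.jl route can look like from juliacall. This follows the DLPack.wrap(tensor, to_dlpack) usage shown in the DLPack.jl README and assumes DLPack.jl, CUDA.jl, and PythonCall.jl are already installed in the Julia environment (untested here):

from juliacall import Main as jl

jl.seval("""
begin
    using DLPack, PythonCall, CUDA
    torch = pyimport("torch")
    dlpack = pyimport("torch.utils.dlpack")
    t = torch.ones((2, 3), device = "cuda")
    # share the tensor's memory with Julia via the DLPack protocol (no copy)
    jt = DLPack.wrap(t, dlpack.to_dlpack)
    println(sum(jt))   # 6 ones -> 6.0
end
""")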

jakubMitura14 (Author) commented

Thanks !!

github-actions bot commented

This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.

github-actions bot added the stale label on Sep 13, 2023
github-actions bot commented

This issue has been closed because it has been stale for 7 days. If it is still relevant, please re-open it.

github-actions bot closed this as not planned on Sep 21, 2023
cjdoris reopened this on Sep 22, 2023
cjdoris removed the stale label on Sep 22, 2023