Performance bug when creating SparseMatrixCSC #735
Duplicate of #204. PyCall doesn't know about the SciPy sparse formats. I'm not sure that a dependence on SciPy belongs in PyCall, but you could certainly create a "SciPySparse" package on top of PyCall that knows how to convert between Julia and SciPy sparse formats.
In particular, PyCall automatically tries to convert the return value back to a Python object, which is why you are running into #204: since it doesn't know about SciPy sparse formats, it does so by converting to a dense matrix. You can fix this by just doing:

    return PyCall.pyjlwrap_new(A)

which will return a thin Python wrapper around the native Julia object.

By the way, these lines look wrong to me:
I don't understand why you need any conversion here at all; the arrays have already been converted to Julia arrays when the function is called. (You can use
This is type-unstable:
since you are changing the type of
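To see why the dense fallback described above can exhaust memory, here is a rough back-of-the-envelope sketch in Python. The matrix sizes are hypothetical (the original report does not state them), and the helper names `dense_bytes` and `csc_bytes` are made up for illustration. It assumes 8-byte values and 8-byte indices.

```python
def dense_bytes(m, n, value_bytes=8):
    """Bytes needed to store every entry of an m-by-n matrix."""
    return m * n * value_bytes


def csc_bytes(n, nnz, value_bytes=8, index_bytes=8):
    """Bytes for CSC storage: n+1 column pointers, nnz row indices, nnz values."""
    return (n + 1) * index_bytes + nnz * (index_bytes + value_bytes)


# Hypothetical sizes, chosen only for illustration.
m = n = 1_000_000
nnz = 10_000_000

print("dense:", dense_bytes(m, n), "bytes")   # 8 * 10**12 bytes
print("CSC:  ", csc_bytes(n, nnz), "bytes")   # 168_000_008 bytes
```

Under these assumptions the sparse representation fits in well under a gigabyte, while a dense copy of the same matrix would need terabytes, which is consistent with the symptom of RAM filling up on return.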
It seems like you can do something like

    function transformToJulia(m, n, colPtr, rowVal, nzVal)
        # shift the 0-based indices coming from Python to Julia's 1-based indices
        A = SparseMatrixCSC{Float64, Int}(m, n, Int[i+1 for i in colPtr], Int[i+1 for i in rowVal], Vector{Float64}(nzVal))
        # wrap the Julia object so PyCall does not convert it to a dense matrix
        return PyCall.pyjlwrap_new(A)
    end

To make it even more efficient, you can tell it to do copy-free passing of the array arguments on the Python side:

    transformToJulia = j.pyfunction(j.eval("PerformanceBug.transformToJulia"), j.Int, j.Int, j.PyArray, j.PyArray, j.PyArray)

and then call
Here is an even simpler function allowing you to pass the SciPy sparse array directly:

    using PyCall, SparseArrays

    function scipyCSC_to_julia(A)
        m, n = A.shape
        # PyArray wraps the NumPy buffers without copying; +1 shifts to 1-based indices
        colPtr = Int[i+1 for i in PyArray(A."indptr")]
        rowVal = Int[i+1 for i in PyArray(A."indices")]
        nzVal = Vector{Float64}(PyArray(A."data"))
        B = SparseMatrixCSC{Float64,Int}(m, n, colPtr, rowVal, nzVal)
        # return a thin wrapper instead of letting PyCall convert B to a dense matrix
        return PyCall.pyjlwrap_new(B)
    end
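For readers unfamiliar with the three arrays used above, here is a small pure-Python sketch of the CSC layout and of the index shift. It does not use SciPy or Julia; `dense_to_csc` and `to_one_based` are hypothetical helpers written only for this illustration, and the names `indptr`, `indices`, and `data` mirror the attributes read in the Julia function.

```python
def dense_to_csc(rows):
    """Return 0-based (indptr, indices, data) for a dense list-of-rows matrix."""
    m = len(rows)
    n = len(rows[0]) if m else 0
    indptr, indices, data = [0], [], []
    for j in range(n):              # walk column by column
        for i in range(m):
            v = rows[i][j]
            if v != 0:
                indices.append(i)   # row index of this stored entry
                data.append(float(v))
        indptr.append(len(indices)) # where the next column starts
    return indptr, indices, data


def to_one_based(indptr, indices):
    """Shift 0-based pointers and row indices to the 1-based form Julia expects."""
    return [p + 1 for p in indptr], [i + 1 for i in indices]


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 0, 0]]

indptr, indices, data = dense_to_csc(dense)
col_ptr, row_val = to_one_based(indptr, indices)

print(indptr, indices, data)  # [0, 2, 2, 4] [0, 2, 0, 1] [1.0, 4.0, 2.0, 3.0]
print(col_ptr, row_val)       # [1, 3, 3, 5] [1, 3, 1, 2]
```

The shifted `col_ptr` and `row_val` lists correspond to the `colPtr` and `rowVal` arguments passed to `SparseMatrixCSC` in the function above, while `data` is passed through unchanged as `nzVal`.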
Hi Steven, thanks for your answer! I didn't realize PyCall was trying to convert every function output to Python. Adding the PyCall.pyjlwrap_new(A) statement fixed the performance problem! I am confused as to how you communicate with Julia without using global Julia variables. Surely, calling (in Python)
returns an error because Julia has no variable A in its workspace.
EDIT: Using j.PerformanceBug.scipyCSC_to_julia(A) works. The whole global variable stuff is (fortunately) avoided with this syntax. I guess this is all contained in the documentation, but maybe a small demo on how to get PyCall and Python working together could prevent others from bumping into similar problems? Thanks for the help anyway!
Dear all,
I'm trying to import a very large scipy.sparse.csc_matrix into Julia but I'm running into a performance problem. I have tried to create a minimal working example to show what I mean.
First of all, I made the following module in Julia (file PerformanceBug.jl):
Returning the matrix A in the last statement of the function consumes all my RAM and takes ages.
I use the above Julia module from within Python via the following script (executed via PyCharm):
Does anybody have a clue as to why Julia suddenly has trouble creating a simple sparse matrix? Working purely in Julia poses no performance issues at all.
Kind regards,
Tom