Prototype cupy backend #4952
Conversation
This is now running the
All this looks sensible to me, though I don't feel I can approve yet.
The bits I'd got to look to be the same as what I'd implemented (though I was much slower and hadn't got to certain parts).
Main points I wanted to ask about:
- put your own name down where you have done stuff (even if adding to others')
- I've looked at where bits have been adapted from and have noticed minor discrepancies that I wasn't sure about, so am asking questions.
```python
_backend_dict = {'cupy': 'cupyfft'}
_backend_list = ['cupy']

_alist, _adict = _list_available(_backend_list, _backend_dict)
```
The `backend_cuda` version of this has the `if pycbc.HAVE_CUDA` statement and this doesn't. This makes me think: should this backend work when not on a GPU?
I think that's used to stop the tests failing here when no GPU is present by not loading any CUDA module ... I'll probably need this, but it might be possible to have the documentation and help text run for the cupy backend, even if the code isn't going to work.
(Clearly the tests do need to pass before merging).
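A minimal sketch of the guard under discussion, assuming the `backend_cuda` pattern carries over and reusing `_list_available` from the snippet above (the empty-fallback branch is an assumption, not this PR's code):

```python
import pycbc

# Only register the cupy FFT backend when a CUDA-capable stack is
# present; otherwise fall back to empty lists so that documentation
# and help text can still be generated without a GPU.
if pycbc.HAVE_CUDA:
    _backend_dict = {'cupy': 'cupyfft'}
    _backend_list = ['cupy']
else:
    _backend_dict = {}
    _backend_list = []

_alist, _adict = _list_available(_backend_list, _backend_dict)
```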
```python
else:
    raise ValueError(_INV_FFT_MSG.format("IFFT", itype, otype))
```
It would be good to have something similar to the numpy warning, i.e. "The cupy backend is a prototype, and performance may not be as expected".
This isn't at the same level. The numpy FFT backend is really bad. It's not clear that's the same for cupy. I've not really seen things limited by the memory allocation. One might want a warning in the scheme initialization that things are not great yet, but I think it doesn't belong here.
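A rough sketch of what a warning at scheme initialization could look like; the class name `CupyScheme` and its base class are assumptions about this PR's structure, not confirmed code:

```python
import warnings

from pycbc.scheme import Scheme  # base class assumed for illustration


class CupyScheme(Scheme):
    def __init__(self):
        super().__init__()
        # Warn once at scheme setup rather than on every FFT call.
        warnings.warn("The cupy scheme is a prototype; performance and "
                      "feature coverage are not yet on par with the "
                      "other backends.")
```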
```python
if self.dtype == _xp.float32 or self.dtype == _xp.float64:
    return _xp.argmax(abs(self.data))
else:
    return abs_arg_max_complex(self._data)
```
I don't see where this is defined?
It's probably not ... This is still a prototype.
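Until the helper lands, a stand-in could be as simple as the following; the name matches the call above, but the implementation is a guess (it avoids the square root by comparing |z|² instead of |z|):

```python
import cupy as cp


def abs_arg_max_complex(data):
    # argmax of |z| equals argmax of |z|^2, so skip the sqrt entirely.
    return cp.argmax(data.real * data.real + data.imag * data.imag)
```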
```python
if cdtype.kind == 'c':
    return _xp.sum(self.data.conj() * other, dtype=complex128)
else:
    return inner_real(self.data, other)
```
Same here - I don't see where this is defined.
Same as above.
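Likewise, a plausible stand-in for `inner_real`, mirroring the complex branch above; accumulating in double precision is an assumption by analogy with the `complex128` accumulator:

```python
import cupy as cp


def inner_real(a, b):
    # Real inner product; accumulate in float64, matching how the
    # complex branch accumulates in complex128.
    return cp.sum(a * b, dtype=cp.float64)
```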
Thanks @GarethCabournDavies. I'll respond to some of the things above, but in terms of some of the big-picture things:
```python
    return htilde


def fstimeshift(freqseries, phi, kmin, kmax):
```
`kmin` and `kmax` don't appear to be used.
I added a FIXME for that ... this function should be converted to an ElementwiseKernel, I think, for performance.
I've added a FIXME here that this block should be changed to a proper ElementwiseKernel using these parameters.
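A sketch of what that conversion might look like; the kernel name, phase convention, and calling pattern are assumptions, not this PR's code:

```python
import cupy as cp

# Apply a per-bin phase rotation exp(i * phi * k) to a frequency series.
# The raw input plus the element index `i` replace a Python-level loop.
_fstimeshift_kernel = cp.ElementwiseKernel(
    "raw complex128 htilde, float64 phi",
    "complex128 out",
    """
    double phase = phi * (double)i;  // i is the frequency-bin index
    out = htilde[i] * complex<double>(cos(phase), sin(phase));
    """,
    "fstimeshift_kernel",
)


def fstimeshift(freqseries, phi, kmin, kmax):
    # FIXME: kmin/kmax are still unused here; they could bound the
    # launch to the populated bins once the kernel supports offsets.
    out = cp.empty_like(freqseries)
    _fstimeshift_kernel(freqseries, phi, out)
    return out
```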
This reverts commit 62a59e5.
@GarethCabournDavies Is there anything else that should be added at this stage? From my perspective, I would prefer to merge this now. I've added a warning when the scheme is loaded to make it clear that this is still a prototype scheme.
It looks good for merging to me - one minor question that I'm not 100% on.
The other thing, which would be nice to see but isn't necessary, is a note in `docs/install_cuda.rst` to say that this is available but not mature.
```diff
              delta_f=self.delta_f, epoch=self.epoch,
              copy=False)
     tmp[:len(self)] = self[:]

     f = TimeSeries(zeros(tlen,
                          dtype=real_same_precision_as(self)),
-                   delta_t=delta_t)
+                   delta_t=delta_t, copy=False)
```
Do these changes affect running under other schemes/backends?
This is a general optimization improvement.
In the previous version here, we run `zeros` to generate an array, and then in the `FrequencySeries` initialization we create another array of zeros and copy across. There's no reason to copy here, as the initial zeros array is not being stored anyway and is otherwise freed. So we should only assign the memory for this new array once, not twice, in all cases.
`copy=True` is also only partially working on Cupy arrays (in some cases it will fail).
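To illustrate the allocation difference (values and variable names here are placeholders, not this PR's code):

```python
import numpy as np

from pycbc.types import FrequencySeries, zeros

tlen, df = 4096, 1.0 / 64  # placeholder length and frequency spacing

buf = zeros(tlen, dtype=np.complex128)  # first (and only necessary) allocation

# copy=True: FrequencySeries allocates a second buffer and copies
# `buf` into it; `buf` is then discarded having done no useful work.
a = FrequencySeries(buf, delta_f=df, copy=True)

# copy=False: FrequencySeries wraps `buf` directly, so the memory
# is assigned once rather than twice.
b = FrequencySeries(buf, delta_f=df, copy=False)
```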
Does "Y Ddraig Goch" mean there is a dragon? 😄 |
I'm merging this now then. I encourage interested folks to propose PRs to improve this backend!
This adds a prototype CUPY backend to PyCBC.
Our current CUDA GPU backend is not working. There are also many more tools for interacting with CUDA now than there were in 2011. CUPY is really nice, and I think it will reduce the complexity of our CUDA backend quite a bit, while still allowing us to use the custom CUDA kernels that exist (as demonstrated in this PR).
This backend will:
I'm posting this now, although I would like to have pycbc_inspiral running before proposing to merge ... but I did promise on Wednesday that I would post this.
Others have suggested moving to torch instead. I would like to see a demonstration of this if we want to consider going that route or this one.
ACTIONS