
Prototype cupy backend #4952

Merged: 31 commits, Dec 3, 2024
Conversation

@spxiwh (Contributor) commented Nov 22, 2024

This adds a prototype CUPY backend to PyCBC.

Our current CUDA GPU backend is not working. There are also many more tools for interacting with CUDA now than in 2011. CuPy is really nice, and I think it will considerably reduce the complexity of our CUDA backend, while still allowing us to use the custom CUDA kernels that exist (as demonstrated in this PR).

This backend will:

  • Run the premerger likelihood through PyCBC inference (with MPI over multiple cores, but not with openmp).
  • Mostly run pycbc_inspiral. There's some issue in the chisq module, but I've run out of time to debug it.

I'm posting this now, although I would have liked to have pycbc_inspiral running before proposing a merge. But I did promise on Wednesday that I would post this.

Others have suggested moving to torch instead. I would like to see a demonstration of this if we want to consider going that route or this one.

ACTIONS

  • I need to make sure that types are consistent in RawKernel calls (if not expensive, an explicit check before calling would avoid potentially strange errors!)
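
A minimal sketch of such an explicit check (the helper name is hypothetical; numpy is shown for illustration, but cupy arrays expose the same `.dtype` attribute):

```python
import numpy as np  # illustration only; cupy mirrors this API on device arrays

def check_kernel_dtypes(expected, *arrays):
    """Hypothetical guard run before launching a RawKernel.

    A dtype mismatch passed to raw CUDA code reinterprets the buffer
    bytes and produces garbage rather than a clean error, so a cheap
    host-side check is worthwhile.
    """
    for arr in arrays:
        if arr.dtype != np.dtype(expected):
            raise TypeError(f"kernel expects {expected}, got {arr.dtype}")
```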

@spxiwh (Contributor, Author) commented Nov 25, 2024

>> [Mon 25 Nov 07:45:27 CST 2024] Running pycbc inspiral cupy:openmp with 1 threads


>> [Mon 25 Nov 07:45:48 CST 2024] test for GW150914
Pass: 2 GW150914-like triggers

This is now running the pycbc_inspiral unit test (examples/inspiral). It's still probably missing lots of things, and probably isn't well optimized (for inspiral), but I'm happy to get feedback (and potentially merge this) at this point.

@GarethCabournDavies (Contributor) left a comment

All this looks sensible to me, though I don't feel I can approve yet.

The parts I had got to look the same as what I'd implemented (though I was much slower and hadn't reached certain parts).

The main points I wanted to ask about:

  • Put your own name down where you have done work (even when adding to others').
  • I've looked at where bits were adapted from and noticed minor discrepancies I wasn't sure about, so I'm asking questions.

_backend_dict = {'cupy' : 'cupyfft'}
_backend_list = ['cupy']

_alist, _adict = _list_available(_backend_list, _backend_dict)
Contributor:
The backend_cuda version of this has the if pycbc.HAVE_CUDA statement and this doesn't. This makes me think, should this backend work when not on a GPU?

Contributor Author:

I think that's used to stop the tests failing when no GPU is present, by not loading any CUDA module. I'll probably need this, but it might be possible to have the documentation and help text work for the cupy backend even if the code isn't going to run.
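
A guard along those lines might look like the following (`HAVE_CUPY` is a hypothetical flag, mirroring the existing `pycbc.HAVE_CUDA`):

```python
# Hypothetical module-level guard, mirroring pycbc.HAVE_CUDA.
try:
    import cupy  # noqa: F401
    HAVE_CUPY = True
except ImportError:
    # No GPU stack installed: help text and docs for the backend can
    # still be generated, but the compute paths must not be loaded.
    HAVE_CUPY = False
```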

Contributor Author:

(Clearly the tests do need to pass before merging).

pycbc/fft/backend_cupy.py (comment resolved)
pycbc/fft/cupyfft.py (comment resolved)
else:
raise ValueError(_INV_FFT_MSG.format("IFFT", itype, otype))


Contributor:

It would be good to have something similar to the numpy warning, i.e "The cupy backend is a prototype, and performance may not be as expected"

Contributor Author:

This isn't at the same level. The numpy FFT backend is really bad; it's not clear the same is true for cupy, and I haven't really seen things limited by the memory allocation. One might want a warning in the scheme initialization that things are not great yet, but I don't think it belongs here.

pycbc/fft/cupyfft.py (comment resolved)
if self.dtype == _xp.float32 or self.dtype == _xp.float64:
return _xp.argmax(abs(self.data))
else:
return abs_arg_max_complex(self._data)
Contributor:

I don't see where this is defined?

Contributor Author:

It's probably not ... This is still a prototype.
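
For reference, a plain-numpy sketch of what a missing `abs_arg_max_complex` could do (cupy mirrors the numpy API, so the same lines would run on a device array; a real cupy version would likely use an elementwise or fused kernel to avoid the temporary):

```python
import numpy as np  # cupy mirrors this API on device arrays

def abs_arg_max_complex(data):
    # |z|**2 = re**2 + im**2 has the same argmax as |z| and skips the sqrt
    return int(np.argmax(data.real**2 + data.imag**2))
```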

if cdtype.kind == 'c':
return _xp.sum(self.data.conj() * other, dtype=complex128)
else:
return inner_real(self.data, other)
Contributor:

Same here: I don't see where this is defined.

Contributor Author:

Same as above.
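
Similarly, a minimal numpy sketch of an `inner_real` helper (assumed semantics: a real-typed inner product accumulated at double precision, matching the complex branch above):

```python
import numpy as np  # substitute cupy for device arrays

def inner_real(a, b):
    # real inner product, accumulated in float64 like the complex branch
    return np.sum(a * b, dtype=np.float64)
```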

pycbc/types/array_cupy.py (comment resolved)
pycbc/vetoes/chisq_cupy.py (comment resolved)
pycbc/waveform/utils_cupy.py (comment resolved)
@spxiwh (Contributor, Author) commented Nov 25, 2024

Thanks @GarethCabournDavies. I'll respond to some of the things above, but in terms of the big-picture items:

  • I don't like the named copyrights in PyCBC; they do not accurately reflect contribution. I would prefer that these be removed everywhere in favor of a blanket copyright for "The PyCBC Team", but that's a bigger change.
  • In a number of places you highlight non-existent functions, and there are quite a few more! This is, deliberately, a prototype backend, so things are expected not to exist yet. Hopefully having it merged will encourage others (i.e. you) to fill the gaps.

return htilde


def fstimeshift(freqseries, phi, kmin, kmax):
Contributor:

kmin and kmax don't appear to be used

Contributor Author:

I added a FIXME for that; this function should be converted to an ElementwiseKernel, I think, for performance.

Contributor Author:

I've added a FIXME here that this block should be changed to a proper ElementwiseKernel using these parameters.
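
For context, the operation such a kernel would implement is a per-bin phase rotation; a plain-numpy sketch follows (the signature and the `delta_f` handling are assumptions for illustration, not the PR's actual API):

```python
import numpy as np

def fstimeshift(freqseries, phi, kmin, kmax, delta_f=1.0):
    # Shift a frequency series in time by phi seconds: each bin k at
    # frequency k*delta_f picks up a phase exp(-2*pi*i*k*delta_f*phi).
    out = freqseries.copy()
    k = np.arange(kmin, kmax)
    out[kmin:kmax] *= np.exp(-2j * np.pi * k * delta_f * phi)
    return out
```

An ElementwiseKernel version would fuse the phase computation and multiply into one pass over the array, avoiding the `k` and `exp` temporaries.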

@spxiwh (Contributor, Author) commented Dec 2, 2024

@GarethCabournDavies Is there anything else that should be added at this stage? From my perspective, I would prefer to merge this now. I've added a warning when the scheme is loaded to make it clear that this is still a prototype scheme.

@GarethCabournDavies (Contributor) left a comment

It looks good for merging to me; one minor question that I'm not 100% sure about.

The other thing, which would be nice to see but not necessary, is a note in docs/install_cuda.rst saying that this is available but not mature.

Comment on lines +475 to +481
delta_f=self.delta_f, epoch=self.epoch,
copy=False)
tmp[:len(self)] = self[:]

f = TimeSeries(zeros(tlen,
dtype=real_same_precision_as(self)),
delta_t=delta_t)
delta_t=delta_t, copy=False)
Contributor:

Do these changes affect other schemes/backends running?

Contributor Author:

This is a general optimization improvement.

In the previous version, we ran zeros to generate an array, and then the FrequencySeries initialization created another array of zeros and copied across. There's no reason to copy here, as the initial zeros array is not stored anyway and is otherwise freed, so we should assign the memory for the new array once, not twice, in all cases.

copy=True is also only partially working on cupy arrays (in some cases it will fail).
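
The difference can be sketched in plain numpy (toy functions for illustration; the real code paths are the TimeSeries/FrequencySeries constructors):

```python
import numpy as np

def resize_with_copy(data, tlen):
    # Old pattern: fill a zeros buffer, then the series constructor
    # allocates a second buffer and copies into it (two allocations).
    buf = np.zeros(tlen, dtype=data.dtype)
    buf[:len(data)] = data
    return np.array(buf, copy=True)

def resize_without_copy(data, tlen):
    # New pattern (copy=False): the freshly allocated buffer is handed
    # to the constructor directly, so memory is assigned only once.
    buf = np.zeros(tlen, dtype=data.dtype)
    buf[:len(data)] = data
    return buf
```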

@WuShichao (Contributor): Does "Y Ddraig Goch" mean there is a dragon? 😄

@spxiwh (Contributor, Author) commented Dec 3, 2024

I'm merging this now then. I encourage interested folks to propose PRs to improve this backend!

@spxiwh spxiwh merged commit c06f60a into gwastro:master Dec 3, 2024
29 checks passed