Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid limited memory adaptor issue in balanced KMeans #2570

Conversation

csadorf
Copy link
Contributor

@csadorf csadorf commented Feb 4, 2025

  • Switch to the use of get_large_workspace_resource instead of get_workspace_resource
  • Do not use explicit managed memory allocation.

Based on and merge after #2541 (diff)

Copy link

copy-pr-bot bot commented Feb 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Feb 4, 2025
@cjnolet cjnolet requested a review from achirkin February 4, 2025 21:29
@cjnolet
Copy link
Member

cjnolet commented Feb 4, 2025

@csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have couple problem with kmeans_balanced here

rmm::mr::managed_memory_resource managed_memory;
rmm::device_async_resource_ref device_memory = resource::get_workspace_resource(handle);
rmm::device_async_resource_ref current_device_resource = rmm::mr::get_current_device_resource();
rmm::device_async_resource_ref workspace_resource =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those rare cases where we do indeed need to explicitly allocate rmm::mr::managed_memory_resource (the removed TODO comment is actually incorrect).
The need to use managed memory here has nothing to do with the memory limit and the user choice, but rather is a part of the algorithm. We use the managed_memory variable across this file for not-so-big allocations that are accessed by both device and host (see, for example, the build_fine_clusters function above). Hence, using the device-only memory simply breaks the algorithm here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adressed.

However, I'm wondering whether we should be generally using make_managed_vector instead then. @achirkin Was there a specific motivation for the use of rmm::uvectors instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, only historic reasons: the balanced kmeans code arrived earlier than these managed helpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying.

@cjnolet Considering that this code has moved to cuVS anyways I assume there is no point in refactoring this, is there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to refactor, just need to fix any issues that is blocking cuML UMAP ATM

@csadorf csadorf changed the base branch from branch-25.04 to pull-request/2541 February 5, 2025 14:31
Copy link

copy-pr-bot bot commented Feb 5, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra comment, LGTM

@csadorf csadorf changed the title [WIP] Avoid limited memory adaptor issue in balanced KMeans Avoid limited memory adaptor issue in balanced KMeans Feb 5, 2025
@csadorf
Copy link
Contributor Author

csadorf commented Feb 5, 2025

Keeping this in draft mode until #2541 is merged.

@csadorf
Copy link
Contributor Author

csadorf commented Feb 5, 2025

@csadorf we will also want to make sure we port this over to cuVS, since the kmeans in raft will be getting ported shortly after GTC.

Prepared in rapidsai/cuvs#659 .

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, as this code is deprecated anyways (and will be removed once cuML is using cuVS for this).

@cjnolet cjnolet changed the base branch from pull-request/2541 to branch-25.04 February 5, 2025 19:26
@cjnolet
Copy link
Member

cjnolet commented Feb 5, 2025

Prepared in rapidsai/cuvs#659 .

Thanks so much @csadorf!

@cjnolet
Copy link
Member

cjnolet commented Feb 12, 2025

@csadorf if you haven't already, can you please open an issue in cuVS just to outline the challenge here? I understand we aren't planning on merging this diff into cuVS, but I want to make sure we have an issue opened so that we don't lose sight of this change. The improvement is going to need to be made, and soon.

@csadorf csadorf force-pushed the sadorf/address-limited-memory-adaptor-issue branch from 847e21e to 02be3c3 Compare February 12, 2025 17:34
@csadorf csadorf marked this pull request as ready for review February 12, 2025 17:35
@csadorf csadorf requested a review from a team as a code owner February 12, 2025 17:35
@csadorf
Copy link
Contributor Author

csadorf commented Feb 12, 2025

@csadorf if you haven't already, can you please open an issue in cuVS just to outline the challenge here? I understand we aren't planning on merging this diff into cuVS, but I want to make sure we have an issue opened so that we don't lose sight of this change. The improvement is going to need to be made, and soon.

Created rapidsai/cuvs#682 .

@csadorf
Copy link
Contributor Author

csadorf commented Feb 12, 2025

@cjnolet I believe this should be ready to merge.

@wphicks wphicks added enhancement New feature or request non-breaking Non-breaking change labels Feb 13, 2025
@jcrist jcrist added the improvement Improvement / enhancement to an existing function label Feb 13, 2025
@dantegd dantegd removed the enhancement New feature or request label Feb 13, 2025
@cjnolet
Copy link
Member

cjnolet commented Feb 13, 2025

/merge

@rapids-bot rapids-bot bot merged commit 842afd7 into rapidsai:branch-25.04 Feb 13, 2025
82 checks passed
@csadorf csadorf deleted the sadorf/address-limited-memory-adaptor-issue branch February 13, 2025 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants