AutoTuner: Set recommendation for spark.task.resource.gpu.amount to a very low value #1514

Merged 1 commit on Jan 29, 2025

Conversation

parthosa
Collaborator

Fixes #1401

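This PR sets the recommended value of spark.task.resource.gpu.amount to a very low number (0.001). With the per-task GPU fraction this small, the GPU resource no longer limits how many tasks run concurrently on an executor, so Spark honors the spark.executor.cores setting instead.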

Changes

  • Updated the recommendedCoresPerExec method in Platform class to use ProfilingAutoTunerConfigsProvider.DEF_CORES_PER_EXECUTOR.
  • Added new default configuration values in AutoTunerConfigsProvider, including DEF_CORES_PER_EXECUTOR and DEF_TASK_GPU_RESOURCE_AMT.
  • Modified AutoTuner to use DEF_TASK_GPU_RESOURCE_AMT for spark.task.resource.gpu.amount instead of calculating it dynamically.
  • Updated unit tests with the new recommendation for the GPU amount.
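
The reasoning behind the fixed low value can be sketched with a simplified model of how Spark bounds task concurrency per executor: the number of concurrent tasks is limited by both the CPU slots and the GPU slots, whichever is smaller. This is an illustration only, not the AutoTuner code. The function name `max_concurrent_tasks`, the 16-core executor, and the 0.25 comparison value are assumptions made for the example.

```python
from fractions import Fraction
import math


def max_concurrent_tasks(executor_cores, task_cpus, executor_gpus, task_gpu_amount):
    """Simplified model (an assumption, not Spark's scheduler code):
    concurrency is the smaller of the CPU-based and GPU-based slot counts."""
    # Slots allowed by spark.executor.cores / spark.task.cpus
    cpu_slots = executor_cores // task_cpus
    # Slots allowed by the executor's GPU amount / spark.task.resource.gpu.amount.
    # Fractions avoid floating-point rounding surprises with values like 0.001.
    gpu_slots = math.floor(
        Fraction(str(executor_gpus)) / Fraction(str(task_gpu_amount))
    )
    return min(cpu_slots, gpu_slots)


# Hypothetical executor: 16 cores, 1 GPU, 1 CPU per task.
gpu_bound = max_concurrent_tasks(16, 1, 1, 0.25)    # GPU fraction caps concurrency
core_bound = max_concurrent_tasks(16, 1, 1, 0.001)  # cores cap concurrency
print(gpu_bound, core_bound)
```

Under this model, a GPU amount of 0.25 caps the executor at 4 concurrent tasks even with 16 cores, while 0.001 leaves the core count as the only effective limit.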

@parthosa parthosa added the bug (Something isn't working) and core_tools (Scope the core module (scala)) labels Jan 27, 2025
@parthosa parthosa self-assigned this Jan 27, 2025
@parthosa parthosa marked this pull request as ready for review January 27, 2025 19:06
Collaborator

@amahussein amahussein left a comment

LGTME!
Thanks @parthosa

@parthosa parthosa merged commit 7ac3a32 into NVIDIA:dev Jan 29, 2025
14 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1401 branch January 29, 2025 15:50
Development

Successfully merging this pull request may close these issues.

[BUG] spark.task.resource.gpu.amount improvements