Handle invalid cluster recommendation for Dataproc #1537
Fixes #1521.
Currently, AutoTuner/Bootstrapper recommends a 1 x n1-standard-16 instance for the input CPU job, which used 8 cores and 2 instances. However, Dataproc does not support clusters with only one worker node.

This PR introduces validateRecommendedCluster, a validation mechanism for recommended cluster configurations. Platform-specific classes can override this method to enforce platform-specific constraints.

Changes
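As a rough illustration of the override-based validation hook this PR describes, the sketch below uses simplified stand-in classes. The ClusterRecommendation case class, the Boolean return type, and the minWorkerNodes value of 2 are assumptions made for the example and are not taken from the actual Platform.scala.

```scala
// Hypothetical, simplified stand-ins for illustration only;
// these are not the real classes or signatures from Platform.scala.
final case class ClusterRecommendation(instanceType: String, numWorkerNodes: Int)

class Platform {
  // Default behaviour: accept any recommendation.
  // Platform-specific subclasses override this to add their own constraints.
  def validateRecommendedCluster(rec: ClusterRecommendation): Boolean = true
}

class DataprocPlatform extends Platform {
  // Assumed minimum, inferred from the statement that Dataproc does not
  // support clusters with only one worker node.
  protected val minWorkerNodes: Int = 2

  override def validateRecommendedCluster(rec: ClusterRecommendation): Boolean =
    rec.numWorkerNodes >= minWorkerNodes
}

object ValidationSketch {
  def main(args: Array[String]): Unit = {
    // The recommendation from the issue: 1 x n1-standard-16.
    val rec = ClusterRecommendation("n1-standard-16", numWorkerNodes = 1)
    println(new Platform().validateRecommendedCluster(rec))         // true
    println(new DataprocPlatform().validateRecommendedCluster(rec)) // false
  }
}
```

Keeping the default implementation permissive means platforms without extra constraints need no changes, while a platform such as Dataproc can reject a recommendation that it could not actually provision.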
Enhancements to cluster recommendation validation:
core/src/main/scala/com/nvidia/spark/rapids/tool/Platform.scala: Introduced the validateRecommendedCluster method to validate the recommended cluster configuration, allowing subclasses to provide platform-specific validation.
core/src/main/scala/com/nvidia/spark/rapids/tool/Platform.scala: Implemented platform-specific validation in DataprocPlatform to ensure the number of worker nodes meets the minimum required by the platform.

Improvements to test coverage:
core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala: Modified tests to compare actual cluster information against expected values and added a new test to validate the recommended cluster information for invalid configurations. [1] [2] [3]
core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala: Refactored the runQualificationAndTestClusterInfo method to return the cluster summary, improving test readability and maintainability.

Test