AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements #1539

Open: wants to merge 4 commits into base dev
Changes from 1 commit
@@ -590,6 +590,11 @@ class DataprocPlatform(gpuDevice: Option[GpuDevice],
clusterProperties: Option[ClusterProperties]) extends Platform(gpuDevice, clusterProperties) {
override val platformName: String = PlatformNames.DATAPROC
override val defaultGpuDevice: GpuDevice = T4Gpu
override val recommendationsToInclude: Seq[(String, String)] = Seq(
"spark.dataproc.enhanced.optimizer.enabled" -> "true",
"spark.dataproc.enhanced.execution.enabled" -> "true"
)

Collaborator
Should we add these to the YAML file as well?

Collaborator Author
Added them to the tuning YAML file.

override def isPlatformCSP: Boolean = true
override def maxGpusSupported: Int = 4
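recommendationsToInclude declares properties that the AutoTuner should always recommend on Dataproc. How the tuner consumes this Seq is not shown in the diff; the sketch below assumes it overlays the entries onto a map of recommendations keyed by property name, so each property is emitted once and the platform value wins over any earlier one. PlatformRecommendationsSketch, applyPlatformRecommendations, and render are hypothetical names, not the project's API.

```scala
import scala.collection.mutable

// Hypothetical sketch: the AutoTuner's real merge logic is not part of this diff.
object PlatformRecommendationsSketch {
  // Same entries as DataprocPlatform.recommendationsToInclude above.
  val recommendationsToInclude: Seq[(String, String)] = Seq(
    "spark.dataproc.enhanced.optimizer.enabled" -> "true",
    "spark.dataproc.enhanced.execution.enabled" -> "true"
  )

  // Assumption: recommendations live in a map keyed by property name.
  // Writing each platform entry into the map overwrites any earlier value for
  // that key (keeping its position), so a property is never emitted twice.
  def applyPlatformRecommendations(
      recommended: mutable.LinkedHashMap[String, String],
      platformEntries: Seq[(String, String)]): mutable.LinkedHashMap[String, String] = {
    platformEntries.foreach { case (key, value) => recommended(key) = value }
    recommended
  }

  // Render entries in the "--conf key=value" form used by the test's expected results.
  def render(props: scala.collection.Map[String, String]): Seq[String] =
    props.toSeq.map { case (key, value) => s"--conf $key=$value" }
}
```

Keying by property name is what makes "exactly once" hold by construction; a plain Seq of recommendations would need an explicit de-duplication step instead.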

@@ -2989,4 +2989,32 @@ We recommend using nodes/workers with more memory. Need at least 17496MB memory.
// scalastyle:on line.size.limit
compareOutput(expectedResults, autoTunerOutput)
}

test("test AutoTuner sets Dataproc Spark performance enhancements") {
// mock the properties loaded from eventLog
val logEventsProps: mutable.Map[String, String] =
mutable.LinkedHashMap[String, String](
"spark.executor.cores" -> "16",
"spark.executor.instances" -> "1",
"spark.executor.memory" -> "80g",
"spark.executor.resource.gpu.amount" -> "1"
)
val dataprocWorkerInfo = buildGpuWorkerInfoAsString(None, Some(32),
Some("212992MiB"), Some(5), Some(4), Some(T4Gpu.getMemory), Some(T4Gpu.toString))
val infoProvider = getMockInfoProvider(0, Seq(0), Seq(0.0),
logEventsProps, Some(testSparkVersion))
val clusterPropsOpt = ProfilingAutoTunerConfigsProvider
.loadClusterPropertiesFromContent(dataprocWorkerInfo)
val platform = PlatformFactory.createInstance(PlatformNames.DATAPROC, clusterPropsOpt)
val autoTuner: AutoTuner = ProfilingAutoTunerConfigsProvider
.buildAutoTunerFromProps(dataprocWorkerInfo, infoProvider, platform)
val (properties, comments) = autoTuner.getRecommendedProperties()
val autoTunerOutput = Profiler.getAutoTunerResultsAsString(properties, comments)
val expectedResults = Seq(
"--conf spark.dataproc.enhanced.optimizer.enabled=true",
"--conf spark.dataproc.enhanced.execution.enabled=true"
)
assert(expectedResults.forall(autoTunerOutput.contains))
Collaborator
The assertion is not sufficient, because AutoTuner could emit two different entries for the same property, for example --conf spark.dataproc.enhanced.optimizer.enabled=true and another one with false, and the contains check would still pass. We need to check that each property appears exactly once.

Collaborator Author
Added a check that compares the complete AutoTuner output.

}
}
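The reviewer's point is that expectedResults.forall(autoTunerOutput.contains) still passes when a property is emitted twice with conflicting values. A minimal sketch of an exactly-once check, assuming the AutoTuner output lists one --conf key=value entry per line (as the expected strings in the test suggest). ConfOccurrenceCheck and its methods are hypothetical helpers, not part of the project's test utilities; the author's fix of comparing the complete output (presumably via compareOutput, as in the preceding test) also catches duplicates.

```scala
// Hypothetical helper, not part of the project's test utilities.
object ConfOccurrenceCheck {
  // Matches a whole line such as "--conf spark.foo.bar=true", capturing key and value.
  private val ConfLine = """--conf\s+([^=\s]+)=(.*)""".r

  // Assumption: the AutoTuner output has one "--conf key=value" entry per line.
  def confEntries(output: String): Seq[(String, String)] =
    output.split("\n").toSeq.map(_.trim).collect { case ConfLine(key, value) => (key, value) }

  // Returns one message per problem; an empty result means every expected key
  // appears exactly once and carries the expected value.
  def problems(output: String, expected: Map[String, String]): Seq[String] = {
    val valuesByKey: Map[String, Seq[String]] =
      confEntries(output).groupBy(_._1).map { case (key, pairs) => key -> pairs.map(_._2) }
    expected.toSeq.flatMap { case (key, want) =>
      valuesByKey.getOrElse(key, Seq.empty) match {
        case Seq(got) if got == want => Seq.empty[String]
        case Seq(got) => Seq(s"$key: expected $want, found $got")
        case Seq() => Seq(s"$key: missing")
        case many => Seq(s"$key: appears ${many.size} times (${many.mkString(", ")})")
      }
    }
  }
}
```

In the test, this would replace the contains-based assertion with assert(ConfOccurrenceCheck.problems(autoTunerOutput, expected).isEmpty), where expected maps both Dataproc properties to "true". Returning messages rather than a Boolean makes a failing assertion explain which property was duplicated or missing.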