
AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements #1539

Open: wants to merge 4 commits into base: dev

@parthosa (Collaborator) commented Feb 7, 2025

Fixes #1538.

This PR updates the AutoTuner/Bootstrapper to recommend the following Dataproc Spark performance-enhancement configs:

spark.dataproc.enhanced.optimizer.enabled=true
spark.dataproc.enhanced.execution.enabled=true

Reference: https://cloud.google.com/dataproc/docs/guides/performance-enhancements
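A minimal Scala sketch of what the two recommendations amount to: a fixed set of Dataproc-only properties rendered as spark-submit style --conf arguments. The object DataprocEnhancements and the method toConfArgs are hypothetical illustrations, not the AutoTuner's actual code.

```scala
// Hypothetical sketch: names are illustrative, not the AutoTuner's API.
object DataprocEnhancements {
  // The two Dataproc performance-enhancement properties recommended by this PR.
  val recommendedConfs: Map[String, String] = Map(
    "spark.dataproc.enhanced.optimizer.enabled" -> "true",
    "spark.dataproc.enhanced.execution.enabled" -> "true"
  )

  // Render each property as a "--conf key=value" argument,
  // sorted by key so the output order is deterministic.
  def toConfArgs(confs: Map[String, String]): Seq[String] =
    confs.toSeq.sortBy(_._1).map { case (k, v) => s"--conf $k=$v" }
}
```

Sorting by key keeps the rendered output stable across runs, which makes exact-match assertions on the output practical.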

@parthosa added the feature request (New feature or request) and core_tools (Scope the core module (scala)) labels Feb 7, 2025
@parthosa self-assigned this Feb 7, 2025
"spark.dataproc.enhanced.optimizer.enabled" -> "true",
"spark.dataproc.enhanced.execution.enabled" -> "true"
)

Collaborator:

Should we add these to the YAML file?

Collaborator (Author):

Added them to the tuning YAML file.

"--conf spark.dataproc.enhanced.optimizer.enabled=true",
"--conf spark.dataproc.enhanced.execution.enabled=true"
)
assert(expectedResults.forall(autoTunerOutput.contains))
Collaborator:

This assertion is not sufficient, because the AutoTuner could emit two different entries for the same property, for example --conf spark.dataproc.enhanced.optimizer.enabled=true and another one with false. We also need to check that each property appears exactly once.
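One way to implement the check described above is to extract the property key from every --conf line and count its occurrences. This Scala sketch assumes the output contains one "--conf key=value" entry per line, as in the expected strings of the test; the object ConfOutputChecks and its methods are hypothetical.

```scala
// Hypothetical helper; assumes one "--conf key=value" entry per output line.
object ConfOutputChecks {
  // Matches a "--conf key=value" line and captures the property key.
  private val ConfLine = """^\s*--conf\s+([^=\s]+)=.*$""".r

  // Property keys of all --conf lines, in output order (other lines are ignored).
  def confKeys(output: String): List[String] =
    output.linesIterator.collect { case ConfLine(key) => key }.toList

  // Keys emitted more than once, e.g. the same property set to both true and false.
  def duplicateKeys(output: String): Set[String] =
    confKeys(output).groupBy(identity).filter { case (_, occ) => occ.size > 1 }.keySet

  // True only if every expected key appears exactly once.
  def eachExactlyOnce(output: String, expectedKeys: Seq[String]): Boolean = {
    val keys = confKeys(output)
    expectedKeys.forall(k => keys.count(_ == k) == 1)
  }
}
```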

Collaborator (Author):

Added a check that compares the complete AutoTuner output.
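Comparing the complete output subsumes the duplicate check: an extra "--conf ...=false" line makes the actual output differ from the expected one. A hypothetical Scala helper (FullOutputComparison.firstDifference, not part of the PR) that reports the first mismatching line can make such failures easier to read:

```scala
// Hypothetical helper for exact, line-by-line output comparison.
object FullOutputComparison {
  // Split into lines and strip trailing whitespace from each line.
  private def normalize(s: String): List[String] =
    s.linesIterator.map(_.replaceAll("\\s+$", "")).toList

  // None if both outputs match line for line, in order;
  // otherwise a message describing the first mismatching line.
  def firstDifference(expected: String, actual: String): Option[String] = {
    val e = normalize(expected)
    val a = normalize(actual)
    // zipAll pads the shorter side so missing or extra lines are reported too.
    e.zipAll(a, "<missing>", "<missing>").zipWithIndex.collectFirst {
      case ((exp, act), i) if exp != act => s"line ${i + 1}: expected '$exp' but got '$act'"
    }
  }
}
```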

Signed-off-by: Partho Sarthi <[email protected]>
@parthosa marked this pull request as ready for review February 8, 2025 02:25
Labels: core_tools (Scope the core module (scala)), feature request (New feature or request)
Projects: None yet
Development: successfully merging this pull request may close these issues:
[FEA] AutoTuner/Bootstrapper should recommend Dataproc Spark performance enhancements configs
2 participants