[BUG] Reduce noise from Qualx logs #1535

Open
Tracked by #1416
amahussein opened this issue Feb 7, 2025 · 0 comments
Labels
? - Needs Triage user_tools Scope the wrapper module running CSP, QualX, and reports (python)

Comments

@amahussein
Collaborator

Describe the bug

QualX generates log warnings that make it harder to spot real errors and warnings in the tool's output.
These warnings are emitted for every app, which makes the log noisy.

For example, the message below does not really help, because it has no information about

2025-02-07 10:26:38,281 INFO spark_rapids_tools.tools.qualx.qualx_main: Loading dataset: qual_20250207102620_19Cbf90F
2025-02-07 10:26:38,637 WARNING spark_rapids_tools.tools.qualx.preprocess: Imputing missing features: ['platform_databricks-aws', 'platform_databricks-azure', 'platform_dataproc', 'platform_emr', 'sqlOp_AQEShuffleRead', 'sqlOp_BatchEvalPython', 'sqlOp_CommandResult', 'sqlOp_CustomShuffleReader', 'sqlOp_DeserializeToObject', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand csv', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand json', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand orc', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand parquet', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand text', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand unknown', 'sqlOp_GenerateBloomFilter', 'sqlOp_GlobalLimit', 'sqlOp_HashAggregatePrefixGroupingSets', 'sqlOp_LocalLimit', 'sqlOp_LocalTableScan', 'sqlOp_MapElements', 'sqlOp_ObjectHashAggregate', 'sqlOp_OutputAdapter', 'sqlOp_PartialWindow', 'sqlOp_ReusedSort', 'sqlOp_RunningWindowFunction', 'sqlOp_Scan ExistingRDD', 'sqlOp_Scan ExistingRDD Delta Table Checkpoint', 'sqlOp_Scan ExistingRDD Delta Table State', 'sqlOp_Scan OneRowRelation', 'sqlOp_Scan csv', 'sqlOp_Scan jdbc', 'sqlOp_Scan json', 'sqlOp_Scan orc', 'sqlOp_Scan text', 'sqlOp_Scan unknown', 'sqlOp_SerializeFromObject', 'sqlOp_SortAggregate', 'sqlOp_SubqueryOutputBroadcast', 'sqlOp_TakeOrderedAndProject', 'sqlOp_Window', 'sqlOp_WindowGroupLimit', 'sqlOp_WindowSort']
2025-02-07 10:26:38,643 WARNING spark_rapids_tools.tools.qualx.preprocess: Removing extra features: ['hasSqlID', 'resourceProfileId', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand', 'sqlOp_Scan csv ']
2025-02-07 10:26:38,643 WARNING spark_rapids_tools.tools.qualx.preprocess: Removing extra features: ['hasSqlID', 'resourceProfileId', 'sqlOp_Execute InsertIntoHadoopFsRelationCommand', 'sqlOp_Scan csv ']
2025-02-07 10:26:38,647 INFO spark_rapids_tools.tools.qualx.qualx_main: Loading model from: ~/src_directory/spark_rapids_pytools/resources/qualx/models/xgboost/onprem.json
2025-02-07 10:26:38,687 INFO spark_rapids_tools.tools.qualx.qualx_main: Predicting dataset (with stage filtering): qual_20250207102620_19Cbf90F
2025-02-07 10:26:38,754 INFO spark_rapids_tools.tools.qualx.qualx_main: Writing features to: output_folder/qual_20250207102620_19Cbf90F/xgboost_predictions/features.csv
2025-02-07 10:26:38,878 INFO spark_rapids_tools.tools.qualx.qualx_main: Writing shapley feature importances to: output_folder/qual_20250207102620_19Cbf90F/xgboost_predictions/feature_importance.csv
2025-02-07 10:26:38,879 INFO spark_rapids_tools.tools.qualx.qualx_main: Writing shapley values to: output_folder/qual_20250207102620_19Cbf90F/xgboost_predictions/shap_values.csv
2025-02-07 10:26:38,901 INFO spark_rapids_tools.tools.qualx.util: Writing per-SQL predictions to: output_folder/qual_20250207102620_19Cbf90F/xgboost_predictions/per_sql.csv
2025-02-07 10:26:38,903 INFO spark_rapids_tools.tools.qualx.util: Writing per-application predictions to: output_folder/qual_20250207102620_19Cbf90F/xgboost_predictions/per_app.csv
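
One possible way to shorten the "Imputing missing features" line above is to log a count plus a few sample names at WARNING level and keep the full list at DEBUG level. The following is only a minimal sketch of that idea, not the actual QualX code: the logger name and the functions `summarize_features` and `log_missing_features` are hypothetical.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("qualx_noise_sketch")


def summarize_features(features, max_shown=3):
    """Return a short summary: the total count plus the first few names."""
    names = sorted(features)
    shown = ", ".join(names[:max_shown])
    suffix = ", ..." if len(names) > max_shown else ""
    return f"{len(names)} total ({shown}{suffix})"


def log_missing_features(features):
    """Log a concise WARNING and keep the full list available at DEBUG."""
    logger.warning("Imputing missing features: %s", summarize_features(features))
    logger.debug("Full list of imputed features: %s", sorted(features))
```

With this approach the per-app warning stays on one short line, and the complete feature list is still recoverable by enabling DEBUG logging.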

Steps/Code to reproduce bug

We would like to improve the readability of the logging, either by making those messages more concise or by revisiting them and deciding exactly what to report.
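
As one illustration of "deciding what to report", repeated warnings such as the duplicated "Removing extra features" line in the log above could be suppressed with a standard `logging.Filter`. This is a sketch under assumptions, not the project's implementation: the class name `OncePerMessageFilter` is hypothetical, and dropping every repeat (rather than, say, counting them) is just one possible policy.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct WARNING message through only once (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        # Only deduplicate warnings; other levels pass through untouched.
        if record.levelno != logging.WARNING:
            return True
        key = (record.name, record.getMessage())
        if key in self._seen:
            return False  # identical warning already reported
        self._seen.add(key)
        return True
```

The filter would typically be attached to a handler with `handler.addFilter(OncePerMessageFilter())`, because filters attached to a logger are not applied to records coming from its descendant loggers.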

@amahussein amahussein added ? - Needs Triage user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Feb 7, 2025