Add support for configurable qualx label column #1528
Changes from all commits
c122dfa
398ec13
5bda4a3
bbb9847
ed22793
@@ -0,0 +1,37 @@
# Copyright (c) 2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Config module for Qualx, controlled by environment variables.

Environment variables:
- QUALX_CACHE_DIR: cache directory for saving Profiler output.
- QUALX_DATA_DIR: data directory containing eventlogs, primarily used in dataset JSON files.
- QUALX_DIR: root directory for Qualx execution, primarily used in dataset JSON files to locate
  dataset-specific plugins.
- QUALX_LABEL: targeted label column for XGBoost model.
- SPARK_RAPIDS_TOOLS_JAR: path to Spark RAPIDS Tools JAR file.
"""
import os


def get_cache_dir() -> str:
    """Get cache directory to save Profiler output."""
    return os.environ.get('QUALX_CACHE_DIR', 'qualx_cache')
Comment on lines +28 to +30

We can use the utility methods to get/set the env variables:
spark-rapids-tools/user_tools/src/spark_rapids_pytools/common/utilities.py, lines 103 to 118 in 14255f4

Same comment as above.
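To make the suggestion about shared get/set utilities concrete, here is a minimal sketch of what routing the config getters through common helpers could look like. The helper names `get_env_var` and `set_env_var` are hypothetical stand-ins written for this illustration; they are not the actual methods in `utilities.py`, whose signatures are not shown in this thread.

```python
import os
from typing import Optional


def get_env_var(key: str, default: Optional[str] = None) -> Optional[str]:
    """Hypothetical shared getter: read an environment variable with a default."""
    return os.environ.get(key, default)


def set_env_var(key: str, value) -> None:
    """Hypothetical shared setter: environment values are always stored as strings."""
    os.environ[key] = str(value)


def get_cache_dir() -> str:
    """Same behavior as get_cache_dir() in the diff, routed through the helper."""
    return get_env_var('QUALX_CACHE_DIR', 'qualx_cache')
```

With helpers like these, every environment lookup goes through a single code path, which would be the natural place to add prefix handling or logging later.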


def get_label() -> str:
    """Get targeted label column for XGBoost model."""
    label = os.environ.get('QUALX_LABEL', 'Duration')
    assert label in ['Duration', 'duration_sum']
    return label
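One point worth noting about `get_label()`: Python removes `assert` statements when run with the `-O` flag, so the check on `QUALX_LABEL` would be skipped in optimized mode. The following is a sketch of an alternative that validates explicitly, not the code in this PR. The constant name `SUPPORTED_LABELS` and the error message are assumptions for illustration; the two label values come from the diff.

```python
import os

# Label names taken from the assert in the diff; the constant name is hypothetical.
SUPPORTED_LABELS = ('Duration', 'duration_sum')


def get_label() -> str:
    """Get targeted label column, raising ValueError on an unsupported value."""
    label = os.environ.get('QUALX_LABEL', 'Duration')
    if label not in SUPPORTED_LABELS:
        # Unlike assert, this check still runs under `python -O`.
        raise ValueError(
            f'QUALX_LABEL must be one of {SUPPORTED_LABELS}, got {label!r}'
        )
    return label
```

A `ValueError` also gives the user a message naming the offending variable, rather than a bare `AssertionError`.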
RAPIDS_USER_TOOLS_*. Shall we apply the same concept for the QualX-related ones?
QUALX_CACHE_DIR: there is a cache directory used by the tools wrapper. Can we use the same value for both to reduce the number of variables needed by the tools? The tools use the env variable RAPIDS_USER_TOOLS_CACHE_FOLDER, which defaults to /var/tmp/spark_rapids_user_tools_cache.

@amahussein I think there are a lot of scripts/tools that use these at the moment, so I'd leave renaming for another time. My hope is that this new config.py file will make it easier to refactor/rename in the future (while keeping changes minimal for now).
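
As an illustration of how localized such a follow-up could be, here is a sketch of `get_cache_dir()` falling back to the tools wrapper's cache variable, as raised in the question above. The precedence order (Qualx-specific variable first, then `RAPIDS_USER_TOOLS_CACHE_FOLDER`, then the existing `qualx_cache` default) is an assumption made for this example and was not agreed on in the thread.

```python
import os


def get_cache_dir() -> str:
    """Resolve the cache directory, preferring the Qualx-specific variable.

    Assumed lookup order (illustrative only):
      1. QUALX_CACHE_DIR
      2. RAPIDS_USER_TOOLS_CACHE_FOLDER (used by the tools wrapper)
      3. 'qualx_cache' (the default in the diff)
    """
    for key in ('QUALX_CACHE_DIR', 'RAPIDS_USER_TOOLS_CACHE_FOLDER'):
        value = os.environ.get(key)
        if value:  # skip variables that are unset or set to an empty string
            return value
    return 'qualx_cache'
```

Because callers only ever use `get_cache_dir()`, a change like this would stay inside config.py and leave existing scripts that set `QUALX_CACHE_DIR` working as before.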