Config validation #3788
Replies: 8 comments 1 reply
-
Tangentially related: #2481
-
So we actually have a solution for config inheritance and mixins in some of our internal tooling (hopefully open sourcing in 2024 🤞) for assembling Kedro projects. It has proven useful for keeping configuration DRY. It is powered by OmegaConf, and Python devs find it easy to grok since it follows the MRO to the letter. For QB contributors I can link to the docs and the specific implementation here. Please find a screenshot of the docs of the
As well as a
-
Regarding validation, I would actually apply the Pydantic models to the functions bound to the nodes, rather than couple validation to the configuration system.
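As a rough illustration of validating at the node boundary instead of in the config loader, here is a minimal sketch. It uses a standard-library dataclass as a stand-in for a Pydantic model so that it runs without extra dependencies. The `Sensor` fields mirror the examples in this thread, and `summarise_sensors` is a hypothetical node function, not something from Kedro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    """Hypothetical parameter schema (stand-in for a Pydantic model)."""

    name: str
    type: str = "<unknown>"
    stderr: float = 0.1

    def __post_init__(self) -> None:
        # Reject values that would silently break downstream computations.
        if isinstance(self.stderr, bool) or not isinstance(self.stderr, (int, float)):
            raise TypeError(f"stderr must be a number, got {self.stderr!r}")
        if self.stderr < 0:
            raise ValueError(f"stderr must be non-negative, got {self.stderr}")


def summarise_sensors(sensors: dict) -> dict:
    """Hypothetical node function: validate the raw parameters on entry."""
    # Unknown keys raise TypeError; missing keys fall back to the defaults.
    validated = {key: Sensor(**raw) for key, raw in sensors.items()}
    return {key: sensor.stderr for key, sensor in validated.items()}


raw_params = {
    "sensor1": {"name": "Sensor 1"},  # defaults are filled in
    "sensor2": {"name": "Sensor 2", "stderr": 0.2},
}
result = summarise_sensors(raw_params)
```

With this shape the schema and its defaults live next to the code that consumes them, at the cost of errors surfacing only when the node runs.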
-
This is indeed a question that has been asked a couple of times. Apart from this specific use case (where it may eventually make sense to have a specific configuration file as you suggest), my usual go-to solution is to create a custom resolver, which would look like this:

    _sensor:
      type: "<unknown>"
      stderr: 0.1
    _sensor2:
      stderr: 0.2
    sensors:
      sensor1: ${_sensor}  # All defaults are taken
      sensor2: ${merge_dict:${_sensor},${_sensor2}}
    # in settings.py: register_new_resolver("merge_dict", lambda d1, d2: {**d1, **d2})

We could probably make the syntax easier with a more complex resolution logic (e.g. find if the key exists in a parent node to "automatically" check for a parent key with the same name (look for
-
Apple rolls their own DSL for configuration
-
So I find the niche space of "configuration languages" like Jsonnet, Dhall, CUE and now Pkl super interesting. I've posted this before but it's genuinely one of the best articles on how all the lessons learned in the K8s world apply here. I think #891 is the bible on this and lays out why we ultimately settled on
I'm more convinced than ever that:
-
Today, after showing dataset factories to a user:
-
BTW the first problem can be trivially solved with YAML merge keys:

    _default: &default
      name: "Sensor 1"
      type: "temperature"
      stderr: 0.1
    sensors:
      sensor1:
        <<: *default
      sensor2:
        <<: *default
        name: "Sensor 2"
        stderr: 0.2
-
Description
Today I spoke to a user that had a very long and treacherous parameters.yml like this:

And so forth. So, there are several problems:

Which was cool! Because the defaults were filled from the Sensor model.

However, (2a) it's not clear how to keep the defaults in the YAML, which was desirable (although there's maybe a way to achieve that in Pydantic), (2b) it's not clear if this should be in parameters.yml or rather a custom sensors.yml, and most importantly, (2c) it's not clear how or where to perform such validation. There's no after_config_loaded hook.

I think the closest might be what kedro-mlflow does using after_context_created https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L78-L79 but then it's trying to inject the config in the KedroContext https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L129-L134, with all the problems discussed in #3214.

How can we better support this use case?
Paging @datajoely, @Galileo-Galilei
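To make option (2c) concrete, here is a minimal sketch of what validating parameters from an after_context_created hook might look like. It is an assumption-laden illustration, not kedro-mlflow's implementation. `ParamsValidationHooks` and `validate_sensors` are hypothetical names, the `hook_impl` decorator is left out so the snippet runs without Kedro installed, and a `SimpleNamespace` stands in for the KedroContext.

```python
from types import SimpleNamespace


def validate_sensors(params: dict) -> None:
    """Hypothetical check: every sensor needs a non-negative numeric stderr."""
    for key, sensor in params.get("sensors", {}).items():
        stderr = sensor.get("stderr")
        if not isinstance(stderr, (int, float)) or stderr < 0:
            raise ValueError(f"sensors.{key}.stderr is invalid: {stderr!r}")


class ParamsValidationHooks:
    """Hypothetical hook class. In a real project the method would carry the
    kedro.framework.hooks.hook_impl decorator and an instance would be
    listed in HOOKS in settings.py."""

    def after_context_created(self, context) -> None:
        # Fail early, before any pipeline runs.
        validate_sensors(context.params)


# Stand-in for KedroContext, exposing only the `params` attribute used above.
fake_context = SimpleNamespace(params={"sensors": {"sensor1": {"stderr": 0.1}}})
ParamsValidationHooks().after_context_created(fake_context)
```

Note that this only validates. It does not write defaults back into the context, which is the part that runs into the problems discussed in #3214.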