Best practices for sharing configuration across multiple catalog files #3625
Comments
Hi @sbrugman, I am having the same issue you mention here: I want more flexibility and want to avoid copying common YAML configs over and over, but I can't find an easy way to do it in the current Kedro ecosystem. Did you find an elegant solution? I think playing with OmegaConf resolvers might make it work, but it makes things quite unreadable and complicated.
I am adding an interesting Slack discussion here about the interaction between dataset factories and OmegaConf resolvers, so that we have it as a reference on this topic. Initial message:
|
If |
It is unclear to me whether YAML anchors can be shared, but variables and blocks of YAML definitely can. The way of using them deviates from normal YAML syntax, though, and I don't think the "inheritance" provided by YAML anchors is supported in OmegaConf.
There are at least three dynamic techniques to define the Data Catalog currently in Kedro:
There is a common usage pattern for which even a combination of these three is not expressive enough to specify the configuration satisfactorily.
Scenario
Each of the pipelines consists of a set of nodes of a limited number of distinct types, e.g. source (read only), intermediate (write local) and final (write to some database).
Multiple people are working on different pipelines, so the catalog is split into one file per pipeline:
The split files provide an overview and prevent conflicts when working in parallel.
So far so good.
Rapidly, each of these files begins to look like:
Each file contains more or less the same YAML anchors, which according to the YAML spec cannot be shared across files. Is there another way to store this common information in a single file, instead of in every single one of them, while keeping a single file per pipeline?
The variable interpolation perhaps?
Even though the OmegaConfigLoader can interpolate a dict, this (afaik) does not allow passing a dict partially, as we can do in Python (or with the anchors above):
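As a minimal sketch of the plain-Python idiom this refers to: shared defaults are unpacked into each entry, and individual keys are added or overridden. The dataset names, file paths and argument values are hypothetical and only serve as illustration.

```python
# Hypothetical shared defaults, playing the role of a YAML anchor.
csv_defaults = {
    "type": "pandas.CSVDataset",
    "load_args": {"sep": ","},
    "save_args": {"index": False},
}

# Reuse the defaults and add the one key that differs per dataset.
companies = {**csv_defaults, "filepath": "data/01_raw/companies.csv"}

# Reuse the defaults and override an individual key as well.
reviews = {
    **csv_defaults,
    "filepath": "data/01_raw/reviews.csv",
    "load_args": {"sep": ";"},  # replaces the whole nested dict
}
```

Note that the unpacking is shallow: overriding `load_args` replaces the nested dict as a whole rather than merging into it, which matches how the YAML merge key (`<<`) behaves with anchors.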
The dataset factories also do not support this pattern.
The following would be close, but is too restrictive:
Restrictions:
Desiderata
The user needs to be able to template catalog entries across multiple files. It must be possible to overwrite individual entries, and it should be possible for the name to be an alias.
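To make the desired semantics concrete, here is a rough sketch in plain Python. It is not an existing Kedro or OmegaConf feature: the `template` key, the `resolve_entry` and `resolve_catalog` helpers, and all dataset names and values are assumptions made up for illustration. The shared templates live in one place, and each per-pipeline catalog refers to a template by name and overrides individual keys.

```python
from copy import deepcopy

# Hypothetical shared templates, as they might live in one common file.
TEMPLATES = {
    "_source": {"type": "pandas.CSVDataset", "load_args": {"sep": ","}},
    "_final": {"type": "pandas.SQLTableDataset", "credentials": "db"},
}


def resolve_entry(entry, templates):
    """Expand an entry that names a template under the hypothetical `template` key."""
    overrides = dict(entry)  # copy so the caller's entry is not mutated
    template_name = overrides.pop("template", None)
    resolved = deepcopy(templates[template_name]) if template_name else {}
    resolved.update(overrides)  # keys in the entry win over the template
    return resolved


def resolve_catalog(catalog, templates):
    """Resolve every entry of one per-pipeline catalog."""
    return {name: resolve_entry(entry, templates) for name, entry in catalog.items()}


# Hypothetical catalog for a single pipeline.
pipeline_a = {
    "companies": {"template": "_source", "filepath": "data/01_raw/companies.csv"},
    "scores": {"template": "_final", "table_name": "scores"},
}

resolved = resolve_catalog(pipeline_a, TEMPLATES)
```

Here `resolved["companies"]` carries the `type` and `load_args` of `_source` plus its own `filepath`, while the templates themselves stay untouched.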