Background
The extract step allows specifying configuration / secrets per source section / name. For example, the parallelism settings, buffer sizes, etc. may be different for each processed source (see https://dlthub.com/docs/reference/performance#controlling-in-memory-buffers).
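As a rough illustration of that current behavior (using the zendesk source section from the examples below; buffer_max_items is the in-memory buffer setting from the linked performance page), per-source tuning today looks something like:

```toml
# config.toml - per-source buffer tuning; today this is implicitly scoped to the extract step
[sources.zendesk.data_writer]
buffer_max_items = 100
```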
The same rules do not apply to the normalize and load steps, i.e. the loader file format (#815) may be set only globally (or scoped to a particular pipeline); you cannot declare that a certain source prefers a given file format.

In this ticket we'll unify that behavior. We introduce the following default section layout to configure pipeline steps:
<pipeline_name>.sources.<source_section>.<source_name>.<step_name>
where <pipeline_name> and <step_name> are sticky and the other sections are eliminated in the standard way (). For example: if I want to set the number of workers in the normalize stage, all of the following settings will work, while the ones shown after them won't.
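A minimal sketch of such layouts, assuming an illustrative pipeline name zendesk_pipeline, source section zendesk, and source name zendesk_support (these names are placeholders; workers is the normalize worker count setting):

```toml
# resolves for the normalize step under the proposed layout, least to most specific
[normalize]
workers = 2

[sources.zendesk.normalize]
workers = 2

[zendesk_pipeline.sources.zendesk.zendesk_support.normalize]
workers = 2

# would NOT resolve any more: the obligatory <step_name> section is missing
# workers = 2
# [sources.zendesk]
# workers = 2
```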
a few more examples:
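For instance, settings could be scoped per step and per source at the same time. A sketch with the same placeholder names; buffer_max_items and the [load] workers setting come from the existing performance options, and their exact placement under the new sections is an assumption here:

```toml
# smaller extract buffers for a chatty source, bigger ones for normalize
[sources.zendesk.extract.data_writer]
buffer_max_items = 100

[sources.zendesk.normalize.data_writer]
buffer_max_items = 10000

# load concurrency pinned to a single pipeline only
[zendesk_pipeline.load]
workers = 4
```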
Breaking changes to 0.3.x:
The [sources.zendesk.data_writer] section will stop working. It was implicitly applied to the extract step and must be replaced with [sources.zendesk.extract.data_writer] or [sources.zendesk.normalize.data_writer].
[data_writer], which was applied to all steps (which, by the way, did not make much sense), now requires an exact scope: [extract.data_writer] or [normalize.data_writer].
normalize and load settings could be defined at the top level (which also didn't make sense!), i.e. workers=1 applied to both the load and normalize steps. Now an explicit step name must be used (which was the official way anyway): normalize.workers=1, as sketched below.
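A before/after sketch of the migration, reusing the settings shown above (old forms commented out):

```toml
# 0.3.x - implicitly scoped to the extract step:
# [sources.zendesk.data_writer]
# buffer_max_items = 100

# new - the step scope is explicit:
[sources.zendesk.extract.data_writer]
buffer_max_items = 100

# 0.3.x - top-level setting applied to both normalize and load:
# workers = 1

# new - the step name is required (normalize.workers=1 in dotted form):
[normalize]
workers = 1
```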
Tasks
Allow injecting an obligatory config section with the step name (i.e. normalize) that will always be present at the end of the section list when resolving a config value. This will allow for things like normalize.workers.

Inject the source name and section in the normalize and load steps, the same way we do it in extract. That will require a code refactor where we first read a load package, take the source name from the schema, and then instantiate the Load and Normalize objects, so the code becomes similar to the extract code.

The schema should also store the section from the source so that we have the full injection section, like the source.

Existing Normalize and Load configurations (and their corresponding storages) should still use the same sections in with_config, but they should prefer existing ones (i.e. yield to the sections set by the pipeline with_section decorator).

These changes should not impact destination settings. They