local dlt pipeline cli runner #800
Comments
Nice!
Hi @rudolfix, and thanks for considering our feedback from Slack. Like @mehd-io, I'm already using your second scenario with a custom CLI wrapper based on Click, so I'd support that option as well. I also agree with @mehd-io's concerns about the first option. Furthermore, my own concern is that creating a CLI runner for a source/resource might lead to confusion, especially for those less familiar with the tool. In this approach, a pipeline is created behind the scenes, but on the surface it might blur the distinction between a pipeline and a source/resource, as the latter might also function as a pipeline in practice, given the option to run it as such. Just my 2 cents!
@sultaniman please read the code in |
Background
We are looking for a convenient way to execute dlt pipelines from the command line, possibly with minimal or no additional code.
There are two options to investigate (not mutually exclusive, we just need to start somewhere):
1. a CLI runner that instantiates and runs a source/resource directly, creating a pipeline behind the scenes
2. a CLI runner that executes an existing pipeline script, overriding the destination/dataset for the run
In case of (1) the user would specify the module and the name of the source(s), and then the parameters needed to instantiate the source (we can use the fire library to create CLI interfaces automatically for source/resource functions: https://github.com/google/python-fire). The command would create an instance of a dlt pipeline, attach a destination and dataset to it, import the desired source, create an instance with the passed parameters and then run it (see the sketch below).
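A minimal sketch of what option (1) could look like, assuming python-fire for the CLI layer. The module path `my_sources.github`, the `github_source` function and all default values are made up for illustration; a real runner would import the source dynamically rather than hard-code it.

```python
# Hypothetical sketch of option (1): expose a source function as a CLI with
# python-fire and run it on a pipeline created behind the scenes.
import dlt
import fire

from my_sources.github import github_source  # hypothetical @dlt.source function


def run_source(destination: str = "duckdb", dataset_name: str = "github_data", **source_kwargs):
    """Instantiate the source with CLI-provided kwargs and run it on an ad-hoc pipeline."""
    pipeline = dlt.pipeline(
        pipeline_name="github_source_runner",
        destination=destination,
        dataset_name=dataset_name,
    )
    load_info = pipeline.run(github_source(**source_kwargs))
    print(load_info)


if __name__ == "__main__":
    # fire maps the function arguments to CLI flags, e.g.
    #   python run_source.py --destination bigquery --dataset_name prod --api_token ...
    fire.Fire(run_source)
```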
In case of (2) the user would write a pipeline script where the source(s) and pipeline are instantiated, and then pass the script name plus the pipeline and source names to the runner, which would run them (overriding the destination/dataset etc. for the actual run, so it is possible to switch from a dev destination to the production one); a sketch of this follows.
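A rough sketch of one possible reading of option (2), not the actual proposal: the runner imports the user's script, picks up the pipeline and source objects declared at module level by name, and overrides destination/dataset for the run. The module, pipeline and source names in the usage example are invented.

```python
# Hypothetical sketch of option (2): run a pipeline declared in a user script
# with destination/dataset overridden from the command line.
import importlib
import sys

import dlt


def run_from_script(script_module: str, pipeline_name: str, source_name: str,
                    destination: str, dataset_name: str) -> None:
    # importing the script assumes it only declares objects at module level
    # (or guards its own run under `if __name__ == "__main__"`)
    module = importlib.import_module(script_module)

    pipeline: dlt.Pipeline = getattr(module, pipeline_name)
    source = getattr(module, source_name)

    # destination/dataset are overridden for this run only, so the same script
    # can be switched from a dev destination to the production one
    load_info = pipeline.run(source, destination=destination, dataset_name=dataset_name)
    print(load_info)


if __name__ == "__main__":
    # e.g. python runner.py my_project.github_pipeline pipeline github_source bigquery prod_data
    run_from_script(*sys.argv[1:6])
```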
Both (1) and (2) have a few common features that we already have in our Airflow helper (https://github.com/dlt-hub/dlt/blob/master/dlt/helpers/airflow_helper.py#L39 and runner example: ).
An option to backfill could be available for resources that use the Incremental class for incremental loading and are aware of external schedulers. In that case a start and end value could be passed from the CLI (not only dates but also timestamps or integers, whatever is used as the incremental cursor); see the sketch below.
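A minimal sketch of how the backfill option could map CLI-provided start/end values onto dlt's incremental loading, assuming the `initial_value`/`end_value` arguments of `dlt.sources.incremental`. The `events` resource, its cursor field and the placeholder data are invented for illustration.

```python
# Hypothetical backfill sketch: the CLI runner passes --start/--end and the
# runner rebinds the resource's Incremental instance to that range.
import dlt


@dlt.resource
def events(updated_at=dlt.sources.incremental("updated_at", initial_value=0)):
    # placeholder rows; a real resource would query its API/database for the
    # range between updated_at.last_value and updated_at.end_value
    end = updated_at.end_value if updated_at.end_value is not None else 100
    yield from ({"id": i, "updated_at": i} for i in range(updated_at.last_value, end))


def backfill(start: int, end: int) -> None:
    # the cursor can be anything comparable: dates, timestamps or plain integers
    pipeline = dlt.pipeline(
        pipeline_name="events_backfill", destination="duckdb", dataset_name="events"
    )
    info = pipeline.run(
        events(updated_at=dlt.sources.incremental("updated_at", initial_value=start, end_value=end))
    )
    print(info)
```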