Step Launchers supersession announcement #25685
This will imply major breaking changes in the future. From the release notes, the step launcher:
Meaning that from v2 it could be removed entirely. Yes, the step launcher has its drawbacks, but in my opinion it is easily customizable (like Dagster Pipes) by extending its class. And the biggest drawback of them all is the deprecation of io_manager with Dagster Pipes. For example, many Spark jobs rely on intermediate outputs (instead of cache) to optimize the job. We, for example, have intermediate outputs in all our jobs, because we have complex queries which require this optimization. Having to configure each op with a hardcoded "io_manager" at the beginning or end, just to make things work, adds friction for developers.
I think the idea behind Dagster Pipes is good, but it feels like we had a battle-tested step launcher and are now "obligated" to use a new, less-featured, almost blank-canvas tool. For newcomers to Dagster this might be okay, but for existing users it's a barrier. If the step launcher had its flaws, the answer should have been to improve the existing solution instead of building a new one from the ground up and deprecating existing workflows. Maybe this opinion is due to my lack of experience and experimentation with Pipes, but just the thought of having to refactor everything demotivates me from even trying it.
StepLauncher supersession announcement
TLDR
The StepLauncher is being superseded by Dagster Pipes. We recommend users start exploring Dagster Pipes and consider migrating (see the Spark example) existing StepLauncher usages to Pipes. While StepLauncher will remain available, this feature will no longer be receiving active development.
Users who prefer Step Launchers to Pipes because of some features like IOManager integration are welcome to keep using them, provided they accept the increased development and deployment complexity.
Context
StepLauncher was an experimental Dagster resource which could be used to run Dagster steps in remote environments such as Spark. The following step launchers were implemented:
- dagster_databricks.databricks_pyspark_step_launcher
- dagster_aws.emr.emr_pyspark_step_launcher
The goal of step launchers was to seamlessly execute Dagster code in Spark or another remote environment. In particular, a StepLauncher is responsible for, among other things, executing the op/asset remotely.
However, this power came at the cost of significant implementation complexity and feature coupling. If you've been using step launchers, especially in Spark, you might wonder why we're moving away from this approach. Historically, we tried to integrate business logic and orchestration in external runtimes like Spark using framework-level abstractions. While StepLauncher aimed to make remote execution easier, it faced several challenges that made adoption difficult:
- StepLauncher managed code deployment at runtime, which often conflicted with DevOps processes. Many users preferred managing deployments during push time instead.
- StepLauncher required dagster to be installed in the remote environment, which could introduce version conflicts or increase the complexity of the deployment process.
- StepLauncher could not be used with programming languages other than Python, which is a blocker for popular Scala/Java Spark workloads.
For those who do not want to structure their Spark business logic around Dagster definitions, we believe that Dagster Pipes, a more composable and lightweight solution, is the right path forward.
Dagster Pipes
Dagster Pipes is a wire protocol that handles parameter/context passing to the remote process, and log/metadata gathering from the remote process. This approach aligns better with Dagster's philosophy of modularity and extensibility, enabling users to create more flexible and powerful remote execution solutions, albeit with some additional setup responsibility.
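To make the "wire protocol" idea concrete, here is a minimal standard-library sketch of the general pattern: the orchestrating process passes context to an external process through an environment variable, and the external process reports logs and metadata back as JSON lines in a file. This is an illustration only, not the actual Dagster Pipes implementation. The environment variable names, the message shape, and the helper functions are all hypothetical.

```python
import base64
import json
import os
import subprocess
import sys
import tempfile
from typing import Dict, List

# Hypothetical names chosen for this sketch; real Dagster Pipes uses its own.
CONTEXT_ENV = "SKETCH_PIPES_CONTEXT"
MESSAGES_ENV = "SKETCH_PIPES_MESSAGES"

# The "external" script. It has no dependency on the orchestrator: it only
# reads the context from one env var and appends messages to the file named
# by the other.
EXTERNAL_SCRIPT = r'''
import base64, json, os

context = json.loads(base64.b64decode(os.environ["SKETCH_PIPES_CONTEXT"]))
row_count = len(context["extras"]["rows"])

with open(os.environ["SKETCH_PIPES_MESSAGES"], "a") as f:
    f.write(json.dumps({"method": "log", "params": {"message": "counting rows"}}) + "\n")
    f.write(json.dumps({"method": "report_metadata", "params": {"row_count": row_count}}) + "\n")
'''


def encode_context(context: Dict) -> str:
    """Serialize the context so it can travel in an environment variable."""
    return base64.b64encode(json.dumps(context).encode("utf-8")).decode("ascii")


def run_external(context: Dict) -> List[Dict]:
    """Launch the external script and collect the messages it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        messages_path = os.path.join(tmp, "messages.jsonl")
        env = dict(os.environ)
        env[CONTEXT_ENV] = encode_context(context)
        env[MESSAGES_ENV] = messages_path
        subprocess.run([sys.executable, "-c", EXTERNAL_SCRIPT], env=env, check=True)
        with open(messages_path) as f:
            return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    for message in run_external({"asset_key": "my_asset", "extras": {"rows": [1, 2, 3]}}):
        print(message["method"], message["params"])
```

The point of the design is that the external side needs nothing from the orchestrator except two strings, which is what makes this style of integration portable across runtimes and languages.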
Some of the improvements over StepLauncher are:
➕ Increased composability: Pipes components can be mixed and matched, allowing for use in a wide variety of environments.
➕ Decreased complexity: individual Pipes components can be implemented and tested in isolation, making it easier to develop and maintain custom solutions. This modular approach allows for greater flexibility and adaptability to specific use cases.
➕ Improved extensibility: Users can easily extend the existing family of Pipes components to meet their unique requirements, fostering a more diverse ecosystem of integrations.
➕ Lightweight: Pipes can execute unmodified scripts. In order to send additional Dagster events (such as Dagster metadata or asset check results) back to the orchestration process, a zero-dependency (and single-file) dagster-pipes Python package can be installed in the remote environment.
➕ Language-agnostic: Implementing Dagster Pipes in additional programming languages is very tractable. Dagster customers have already done this. Right now we only have an official implementation for Python, but we will be adding support for more languages in the near future (support for JVM languages is in progress).
As Pipes are more lightweight and give you greater control, they also come with some responsibilities:
➖ Pipes do not automatically set up the remote environment. This responsibility now falls to the user, typically handled through CI/CD processes.
➖ Pipes do not automatically execute the op/asset body. Instead, users are typically expected to provide an external script which will be launched by Pipes.
Pipes Clients
We have implemented a set of opinionated Pipes clients on top of the Pipes framework for some popular services.