Create one or more blog posts about Kedro & "dynamic pipelines" that sets out the various requirements, solutions and links to docs #7

Open
Tracked by #2627
stichbury opened this issue Jan 31, 2023 · 13 comments
Labels
Blog post creation Blog posts (ideas and execution)

Comments

@stichbury
Contributor

It would be great if we had a page explaining Kedro and dynamic pipelines and what the best solutions are if you can't use Kedro.

We are asked about it regularly:

Hello, good afternoon. I was adding a catalog item from a database that has time-series data, and I wanted to have a dynamic param so that when I run the code it retrieves the data for a certain day.
I ran into this post:
kedro-org/kedro#1089
But I am failing to understand the reasoning behind it. I would like to understand what the approach would be in Kedro. Is it to load the whole table from SQL and then run the transformations in code? That would be quite inefficient when running in production.

@stichbury added the Blog post creation label Jan 31, 2023
@stichbury
Contributor Author

We had a technical design meeting to discuss this topic. I'm going to revisit the recording to see what, if anything, could be used in a blog post.

@stichbury
Contributor Author

When we had the technical design meeting last month, we didn't get a final summary together because the session ended abruptly when the room was needed for the next meeting. From reviewing the recording, I think the general consensus was that we need to better understand the user problems, and that we need to clarify the advice we give on Slack about whether we recommend the approach (this is taken from @deepyaman's comment near the end).

Am I correct in thinking that we still need to communicate dynamic pipelines in terms of our stance now and future plans? @merelcht ?

@merelcht
Member

Yes. AFAIK @noklam was diving deeper into this topic to get a better understanding of the user problems and what our users really mean when talking about dynamic pipelines. What are your thoughts on this, @noklam? Do we need another meeting to discuss?

@noklam
Contributor

noklam commented May 23, 2023

Thank you @stichbury for asking this! Yes! There is some work due, I will try to finish it and share the summary. I prefer to share it in a separate meeting since the topic is quite complicated and not everyone is interested.

Follow-up actions:

  1. Continue the research and summarise https://miro.com/app/board/uXjVMSX0s6s=/ (by @noklam, end of May)
  2. I may discuss this with a smaller group first, then hold a broader meeting to share with the team.
  3. Clarify what a dynamic pipeline is, which problems we are going to solve, and which problems are unlikely to be solved. We may even provide an example of the most common problem, like a time-series forecast (short-term), or a feature that solves these problems at a higher level (dataset factory, multi-runner, custom resolver or something else). Even if we don't have the feature yet, we could still talk about what's coming.

@noklam
Contributor

noklam commented Sep 26, 2023

@stichbury
Contributor Author

Is this something we should (re)publish on the Kedro blog @noklam ?

@noklam
Contributor

noklam commented Sep 28, 2023

Does it fit into Kedro's blog? So far most of the posts are very use-case specific, while this is more of a quick walkthrough.

I can also re-purpose this into a time-series forecast example, maybe add the datasets factory feature to show how this can be done.

@stichbury
Contributor Author

Thanks @noklam! I'd like to publish it, but whether it is fine as a quick walkthrough or needs some changes to turn it into a time-series forecast example is probably not something I can answer.

Maybe something that Joel or @astrojuanlu could guide on. I'll ask for some feedback.

@stichbury
Contributor Author

So Joel's commented that this is ideal as a post for Kedro, and agrees that having the datasets factory feature would be great. I think we should factor this in as a post in the next sprint, so I've made a ticket #112 and I'll help you convert the text to publish on Contentful for blog.kedro.org.

@astrojuanlu
Member

TODO: Replace Discord links in #7 (comment) with Linen (and maybe copy them to kedro-org/kedro#2627)

@astrojuanlu
Member

Note that @marrrcin wrote https://getindata.com/blog/kedro-dynamic-pipelines/

@stichbury
Contributor Author

stichbury commented Nov 7, 2023

I think this ticket should be kept open for theoretical discussion but we can acknowledge that there's a need to create documentation that lists the various ways pipelines are considered "dynamic". We ultimately need to list the ways we recommend users to implement what they need to meet their particular "dynamic pipeline" requirements. That documentation takes priority and then we look at whether there's a blog series about each "archetype" to come. First step in the sequence of docs is described here: kedro-org/kedro#3282

There is also an outstanding issue for a time-series blog post, #112, which I'm hoping we can deliver from @noklam's existing blog. It's in the current sprint, but I'm not sure if we'll get it published this time or next. WDYT @noklam?

@stichbury moved this to To Do in Kedro Framework Nov 7, 2023
@stichbury changed the title Nov 7, 2023: Create a blog post about Kedro & dynamic pipelines → Create one or more blog posts about Kedro & "dynamic pipelines" that sets out the various requirements, solutions and links to docs