How can we manage internal data lineage in DPDS? #36
andrea-gioia
started this conversation in
Ideas
Let's brainstorm here on how to expose internal data lineage through observability ports in DPDS.
The ideas collected in this thread could become the foundation for a new RFC proposal.
What is internal data lineage anyway?
Internal data lineage tracks the journey of data within a data product from input ports to output ports. In general, there are several ways to categorize data lineage, depending on the focus and the method used to track it. Here are some of the most common types:
Technical Data Lineage: This focuses on the technical aspects of the data flow, such as the specific databases, tables, and transformations the data goes through. It is also commonly referred to as static data lineage. A data product should expose information related to this type of lineage through its discoverability port.
Operational Data Lineage: This tracks the movement and processing of data from a system perspective. It shows how data is ingested, loaded, and moved around within the organization's infrastructure. It is also commonly referred to as dynamic data lineage or runtime data lineage. A data product should expose information related to this type of lineage through its observability port.
Business Data Lineage: This looks at the data from a business standpoint. It focuses on how data elements relate to business processes and what they represent in the bigger picture. It is also commonly referred to as vertical data lineage. This type of lineage is managed through semantic linking and is out of scope for this brainstorm.
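To make the discussion more concrete, here is a minimal sketch of what a single operational (runtime) lineage observation could look like if an observability port emitted it. Nothing in it comes from DPDS: the class names `LineageEdge` and `LineageRunEvent`, their fields, the `input:` / `output:` naming of ports, and the dataset names are all hypothetical and only meant as a starting point for the brainstorm.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEdge:
    """One hop of data flow inside the data product (hypothetical shape)."""
    source: str          # an input port or an internal dataset
    target: str          # an internal dataset or an output port
    transformation: str  # name of the step that moved the data


@dataclass
class LineageRunEvent:
    """A single runtime observation (hypothetical, not defined by DPDS)."""
    run_id: str
    edges: list[LineageEdge]
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def upstream_of(self, node: str) -> set[str]:
        """Return every node that directly or transitively feeds `node`."""
        # Index edges by target so each node's direct parents are one lookup.
        parents: dict[str, set[str]] = {}
        for edge in self.edges:
            parents.setdefault(edge.target, set()).add(edge.source)

        # Walk the graph backwards; `seen` also guards against cycles.
        seen: set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative run: two input ports feeding one output port.
event = LineageRunEvent(
    run_id="run-001",
    edges=[
        LineageEdge("input:orders", "staging.orders_clean", "clean_orders"),
        LineageEdge("staging.orders_clean", "staging.orders_enriched", "enrich"),
        LineageEdge("input:customers", "staging.orders_enriched", "enrich"),
        LineageEdge("staging.orders_enriched", "output:orders_by_customer", "aggregate"),
    ],
)

upstream = event.upstream_of("output:orders_by_customer")
input_ports = {n for n in upstream if n.startswith("input:")}
```

With a structure along these lines, a consumer could ask which input ports contributed to a given output port during a specific run, which is the "journey from input ports to output ports" described above. Whether such events should be modelled as edges, as full graphs per run, or by reusing an existing lineage standard is exactly the kind of question this thread could settle.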
Why does it matter?
TODO
Some useful resources