Spin off pipeline inspection to separate package #2163

Open
1 task
astrojuanlu opened this issue Oct 30, 2024 · 2 comments

Comments

@astrojuanlu
Member

Description

Kedro-Viz has done a lot of work to statically derive the structure of the pipeline, and can now do so even when not all the imports are in place (see the discussion in #1742 and #1966).

The idea here is to split that functionality as a separate Python package that the Kedro Viz backend would depend on.

Context

There's growing evidence that this functionality could be useful for other use cases, for example to facilitate translating Kedro pipelines into other formats. Plugin authors probably know better, but I'm sure every translator plugin (think kedro-vertexai, kedro-mlrun, kedro-databricks) needs some form of pipeline inspection. [1]

There's a proof of concept of what that could look like in https://github.com/AlpAribal/kedro-inspect, which was created as part of this research: https://github.com/kedro-org/kedro/wiki/Synthesis-of-research-related-to-deployment-of-Kedro-to-modern-MLOps-platforms .

Possible Implementation

There's a strong indication that this process could use OpenLineage, specifically the concept of Static Lineage (admittedly not very well documented). Some earlier thoughts are in kedro-org/kedro#4054.
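As a rough illustration of the static-lineage idea, the sketch below turns one node into a dictionary shaped like an OpenLineage `JobEvent`. This is not an existing Kedro or OpenLineage API. The field names (`eventTime`, `producer`, `schemaURL`, `job`, `inputs`, `outputs`) and the spec version in `schemaURL` are assumptions that should be checked against the OpenLineage spec; the helper name `node_to_job_event`, the `kedro` namespace and the producer URL are hypothetical placeholders. A stand-in object replaces a real Kedro node so the snippet runs on its own.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def node_to_job_event(node, namespace="kedro"):
    """Describe one node as a JobEvent-like dict (hypothetical helper).

    Only relies on the node having `.name`, `.inputs` and `.outputs`.
    """

    def dataset(name):
        # Assumed minimal dataset shape: a namespace plus a name.
        return {"namespace": namespace, "name": name}

    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder URL identifying whatever tool emits the event.
        "producer": "https://example.com/hypothetical-kedro-inspector",
        # Assumption: spec version and anchor; verify against the spec.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent",
        "job": {"namespace": namespace, "name": node.name},
        "inputs": [dataset(name) for name in node.inputs],
        "outputs": [dataset(name) for name in node.outputs],
    }


# Stand-in for a real node, so the sketch runs without Kedro installed.
example_node = SimpleNamespace(name="train", inputs=["clean"], outputs=["model"])
event = node_to_job_event(example_node)
print(event["job"], event["inputs"], event["outputs"])
```

No run is involved: the event describes a job and its datasets as declared, which is what makes the lineage static.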

Possible Alternatives

Use a more ad-hoc format, more similar to whatever Kedro Viz is currently using, maybe even the output of --save-file (although at the moment it's not very clear what's expected there, see #1681).

Since there has been reluctance in the past towards spinning off packages, another solution could be that the functionality stays in Kedro Viz, and plugins depend on it. With the amount of dependencies Kedro Viz has, I really hope this isn't the preferred solution.

Another solution is to put this in Kedro itself (pypi.org/p/kedro). I don't even think it would be too bad, since we're talking about exporting or serializing kedro.pipeline.pipeline.Pipeline objects in the end.
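To make "exporting or serializing" a bit more concrete, here is a minimal sketch rather than a proposed API. It assumes only that a pipeline exposes `.nodes` and that each node has `.name`, `.inputs`, `.outputs` and `.tags`; the function name `pipeline_to_dict` and the output layout are hypothetical. Stand-in objects are used so the snippet runs without Kedro installed.

```python
import json
from types import SimpleNamespace


def pipeline_to_dict(pipeline):
    """Serialize a pipeline-like object into plain, JSON-friendly data.

    Hypothetical helper: relies only on `.nodes` and on each node having
    `.name`, `.inputs`, `.outputs` and `.tags`.
    """
    nodes = [
        {
            "name": node.name,
            "inputs": list(node.inputs),
            "outputs": list(node.outputs),
            "tags": sorted(node.tags),
        }
        for node in pipeline.nodes
    ]
    # Every dataset name that appears as an input or output of some node.
    datasets = sorted({d for n in nodes for d in n["inputs"] + n["outputs"]})
    return {"nodes": nodes, "datasets": datasets}


# Stand-ins for a real pipeline, so the sketch runs without Kedro installed.
example_pipeline = SimpleNamespace(
    nodes=[
        SimpleNamespace(
            name="preprocess", inputs=["raw"], outputs=["clean"], tags={"prep"}
        ),
        SimpleNamespace(
            name="train", inputs=["clean", "params:model"], outputs=["model"], tags=set()
        ),
    ]
)

print(json.dumps(pipeline_to_dict(example_pipeline), indent=2))
```

Because the result is plain dictionaries and lists, a translator plugin could consume it without importing the project's node functions.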

Very likely there are other possible solutions here, ideas welcome.

Since this is in the Kedro-Viz tracker, cc @merelcht for visibility.

Checklist

  • Include labels so that we can categorise your feature request

Footnotes

  1. This is a mere hypothesis. I think it would be good to sweep the different translator plugins and see if there is overlapping code among them. cc @DimedS

@astrojuanlu
Member Author

(It seems like I'm solutionising here, please don't take it as such - it's more of a brain dump after an exceedingly long yak shaving session)

@astrojuanlu
Member Author

Earlier, related, idea at #1857.
