Create massive pipeline to test with flowchart on Kedro-viz #1064
Comments
@jmholzer recently did this in kedro-org/kedro#1795 (comment), where he tested the runner with 1000 nodes. I am wondering if we can create a JSON file from that 1000-node pipeline and use it for the above. |
Great idea. Let's try to build this into the demo project so we don't have to maintain two data sources. Thoughts from backlog grooming.
|
Another idea: find a team that has a massive pipeline and get it from them. |
I know a few of them 😄 |
Please let us know where we can get one! |
We will use the insurex (QB vertical team) sanitized pipeline for this. |
Hi Team, Update: I reached out to Shubham from CommercialX and got one of their pipelines. He also shared a box link to go over the setup. I have set it up in my local and Observations:
I would like to get some help from the framework team (@SajidAlamQB , @ankatiyar if anyone has some time), to speed the process of Spark setup locally and successfully execute Thank you |
CommercialX Kedro Viz Testing - Observations:
Size of the data -
RUN 1 - Starting Kedro Viz ...
Time taken to create a kedro context:: 0.12806415557861328
Time taken to create pipeline dictionary:: 23.553779125213623
RUN 2 - Starting Kedro Viz ...
Time taken to create a kedro context:: 0.12883210182189941
Time taken to create pipeline dictionary:: 21.121844053268433
Immediate RUN 3 - Starting Kedro Viz ...
Time taken to create a kedro context:: 0.12455415725708008
Time taken to create pipeline dictionary:: 9.573238134384155 |
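For anyone who wants to collect comparable numbers, here is a minimal sketch of a wall-clock timing helper that prints in the same format as the log above. It is not the instrumentation used for these runs; `timed`, `timings`, and `create_context` are hypothetical names, and `create_context` is only a stand-in for the real startup step.

```python
import time
from contextlib import contextmanager

# Collected timings, keyed by label (hypothetical helper, not part of Kedro-Viz).
timings = {}


@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and print it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"Time taken to {label}:: {timings[label]}")


def create_context():
    """Hypothetical stand-in for the real, slow startup step."""
    time.sleep(0.01)


with timed("create a kedro context"):
    create_context()
```

Wrapping each startup phase in its own `timed(...)` block would give one line per phase, which makes repeated runs easier to compare.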
Good to know. What are the next steps? The logs are a bit difficult to read. Maybe it would help to see a flamegraph, like this kedro-org/kedro#3033 (comment) |
Also notice that, while testing with internal projects is useful, for us to confidently move forward with this we will probably have to generate some open source synthetic projects to test. See kedro-org/kedro#3790 for past discussion about this |
Hi @astrojuanlu, Thank you for the suggestions. I tested with the tools you mentioned and also prepared some rough notes on the next steps here. To summarize, as a first step, loading the Kedro data asynchronously (async loading test branch) would help improve the Kedro-Viz load time for larger pipelines. If there are any new findings on the internal implementation of Kedro, I would be happy to discuss them in the next Tech design. Thank you |
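As a rough illustration of the asynchronous-loading idea (a sketch only, not the code in the async loading test branch), the blocking load can be pushed to a worker thread so the event loop stays free to respond while the data is prepared. `load_pipeline_data` is a hypothetical stand-in for the slow step, and `asyncio.to_thread` assumes Python 3.9 or later.

```python
import asyncio
import time


def load_pipeline_data():
    """Hypothetical stand-in for a slow, blocking pipeline load."""
    time.sleep(0.05)
    return {"nodes": 1000}


async def main():
    # Run the blocking load in a worker thread so the event loop is not blocked.
    task = asyncio.create_task(asyncio.to_thread(load_pipeline_data))

    # At this point the load has not finished; a server could already
    # answer requests (for example with a "loading" state).
    served_while_loading = not task.done()

    data = await task
    return served_while_loading, data


served_while_loading, data = asyncio.run(main())
```

This does not make the load itself faster; it only avoids blocking startup on it, which is the trade-off to weigh for larger pipelines.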
Thanks @ravi-kumar-pilla. To summarize from the internal document: Insights
Next steps
And if I may add, I think
|
Adding a bit more context after a quick discussion:
|
I'm not sure there's anything else for us to do here.
Let's close this issue as completed until we have more concrete actions. |
Description
Create a massive kedro-viz pipeline to stress-test flowchart features.
Context
The fluidity of flowchart interactions depends on the size of the pipeline. Currently we don't have any massive pipelines, so we cannot stress-test many kedro-viz features. We know a lot of data science projects have huge pipelines. This issue is to make sure we build kedro-viz to handle massive pipelines as well.
Possible Implementation
Maybe we can just create a big JSON file with multiple large pipelines
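A minimal sketch of how such a file could be generated synthetically, using only the standard library. The `nodes`/`edges` layout here is a simplified, hypothetical schema for illustration, not the actual Kedro-Viz JSON format, and `build_synthetic_pipeline` is an invented helper name.

```python
import json
import random


def build_synthetic_pipeline(n_tasks=1000, max_inputs=3, seed=0):
    """Return a random acyclic graph of task and data nodes (hypothetical schema)."""
    rng = random.Random(seed)
    nodes = [{"id": "data_0", "type": "data"}]
    edges = []
    datasets = ["data_0"]

    for i in range(n_tasks):
        task_id = f"task_{i}"
        output_id = f"data_{i + 1}"

        # Each task reads only from datasets created earlier, so no cycles can form.
        k = min(len(datasets), rng.randint(1, max_inputs))
        inputs = rng.sample(datasets, k)

        nodes.append({"id": task_id, "type": "task"})
        nodes.append({"id": output_id, "type": "data"})
        edges.extend({"source": src, "target": task_id} for src in inputs)
        edges.append({"source": task_id, "target": output_id})
        datasets.append(output_id)

    return {"nodes": nodes, "edges": edges}


graph = build_synthetic_pipeline()
payload = json.dumps(graph)
```

The fixed `seed` keeps the output reproducible between runs, and `n_tasks` can be raised to produce larger graphs. The result would still need to be mapped onto whatever format the flowchart actually consumes.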
Checklist