Improve resume pipeline suggestion for SequentialRunner #1795
Signed-off-by: Jannic Holzer <[email protected]>
122839f to af405ed
Great work tackling a tough issue! I don't have much concern, since this feature wasn't doing much before; even if it doesn't work for all cases, it will still be an improvement. But like Antony said, it would be good to check whether it works for other Runners.
Thanks for the re-review! You're right, I dropped this one somewhere, I'm sorry. I've been looking into this and will give an update when I've finished.
Cool, no worries. As @noklam says, if it doesn't work then it's not a showstopper. I'm happy to merge with it just working on the sequential runner, and we can fall back on using the previous inferior
Alright, I finished my investigation into It is possible to implement the new scheme proposed in this PR for Unfortunately, it isn't of much use, since the sequence in which nodes are run (and the resulting exception is reached) is not deterministic for ParallelRunner. This causes problems for both the new and the existing logic for generating suggestions. For example, with the existing logic a run with
Another identical run will (stochastically) produce the message:
Similar results are seen for the new logic implemented in this PR. One message is correct while the other isn't. Since these conflicting messages occur with roughly the same frequency, I don't think we should be suggesting a resume command at all at the moment for @noklam @AntonyMilneQB it would be good to hear your thoughts on this. If you agree with me, I will turn off this feature for
I am happy that this is added just for Note that there may be 2 sources of non-deterministic behavior:
It's impossible to have a deterministic node execution order for
Thanks for the feedback @noklam and @AntonyMilneQB! It's much appreciated. @noklam, thanks for the hint in 1. Regarding 2, you're right: the execution order is inherently indeterminate. Nonetheless, I think we can at least reach a deterministic 'solution' (in this case, the correct warning) using join(s). I will open an issue and explain my thinking.
…kedro-org/kedro into feat/improve-resume-scenario-suggestion Signed-off-by: Jannic Holzer <[email protected]>
* Add _find_first_persistent_ancestors and stubs for supporting functions.
* Add body to _enumerate_parents.
* Add function to check persistence of node outputs.
* Modify _suggest_resume_scenario to use _find_first_persistent_ancestors
* Pass catalog to self._suggest_resume_scenario
* Track and return all ancestor nodes that must be re-run during DFS.
* Integrate DFS with original _suggest_resume_scenario.
* Implement backwards-DFS strategy on all boundary nodes.
* Switch to multi-node start BFS approach to finding persistent ancestors.
* Add a useful error message if no nodes ran.
* Add docstrings to new functions.
* Add catalog argument to self._suggest_resume_scenario
* Modify exception_fn to allow it to take multiple arguments
* Add test for AbstractRunner._suggest_resume_scenario
* Add docstring for _suggest_resume_scenario
* Improve formatting
* Move new functions out of AbstractRunner
* Remove bare except
* Fix broad except clause
* Access datasets __dict__ using vars()
* Sort imports
* Improve resume message
* Add a space to resume suggestion message
* Modify DFS logic to eliminate possible queue duplicates
* Modify catalog.datasets to catalog._data_sets w/ disabled linter warning
* Move all pytest fixtures to conftest.py
* Modify all instances of Pipeline to pipeline
* Fix typo in the name of TestSequentialRunnerBranchedPipeline
* Remove spurious assert in save of persistent_dataset_catalog
* Replace instantiations of Pipeline with pipeline
* Modify test_suggest_resume_scenario fixture to use node names
* Add disable=unused-argument to _save
* Remove resume suggestion for ParallelRunner
* Remove spurious try / except

Signed-off-by: Jannic Holzer <[email protected]>
Signed-off-by: nickolasrm <[email protected]>
Description
Resolves #1477
Development notes
After a failed run, Kedro suggests a command to the user:
You can resume the pipeline run by adding the following argument to your previous command: --from-nodes "node4_B"
Before this PR, the suggested command resumes from the last nodes to be executed, regardless of whether their inputs were persisted. If any input to the listed nodes is not persisted, the resumed run immediately fails again.
After this PR, the suggested command will run from the closest successfully executed nodes with persisted inputs:
You can resume the pipeline run from the nearest nodes with persisted inputs by adding the following argument to your previous command: --from-nodes "node1_B,node1_A"
This is achieved by performing a breadth-first search, starting at the last successfully executed nodes. This backward search yields a set of the nearest nodes that have persisted inputs.
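The backward search can be illustrated with a small standalone sketch. This is not the code from this PR and does not use Kedro's classes: the function name `nearest_persisted_ancestors`, the plain-dict pipeline representation, and the example node and dataset names are all assumptions made for illustration. The sketch also assumes that only the producers of non-persisted inputs need to be revisited, and that inputs with no producer are always available.

```python
from collections import deque


def nearest_persisted_ancestors(start, inputs_of, producer_of, persisted):
    """Breadth-first search backwards from `start` (hypothetical helper).

    Returns the nearest nodes whose inputs can all be loaded in a fresh run.
    """
    resume_from = set()
    visited = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        # Inputs that a fresh run could not load: produced by another node
        # and never written to persistent storage.
        missing = [
            ds for ds in inputs_of[node]
            if ds not in persisted and ds in producer_of
        ]
        if not missing:
            resume_from.add(node)
            continue
        for ds in missing:
            parent = producer_of[ds]
            if parent not in visited:  # avoid queue duplicates
                visited.add(parent)
                queue.append(parent)
    return resume_from


# Hypothetical X-shaped pipeline: two branches join at node2, then split again.
inputs_of = {
    "node1_A": ["ds0_A"],
    "node1_B": ["ds0_B"],
    "node2": ["ds1_A", "ds1_B"],
    "node3_A": ["ds2_A"],
    "node3_B": ["ds2_B"],
    "node4_A": ["ds3_A"],
    "node4_B": ["ds3_B"],
}
producer_of = {
    "ds1_A": "node1_A",
    "ds1_B": "node1_B",
    "ds2_A": "node2",
    "ds2_B": "node2",
    "ds3_A": "node3_A",
    "ds3_B": "node3_B",
}

# Only the raw inputs are persisted, so the search walks back to the start.
only_raw = nearest_persisted_ancestors(
    ["node4_B"], inputs_of, producer_of, persisted={"ds0_A", "ds0_B"}
)
print(",".join(sorted(only_raw)))  # node1_A,node1_B

# If node2's outputs are persisted too, the search stops one step back.
with_mid = nearest_persisted_ancestors(
    ["node4_B"], inputs_of, producer_of,
    persisted={"ds0_A", "ds0_B", "ds2_A", "ds2_B"},
)
print(",".join(sorted(with_mid)))  # node3_B
```

Tracking visited nodes keeps each node in the queue at most once, which matters at the join (node2) where both branches lead back to the same parents.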
Six tests are added to the test_sequential_runner test suite to cover different cases on an X-shaped pipeline.
Limitations
This change is a significant improvement, but there are still two important limitations:
MemoryDataSets. This definition has limitations; it does not account for custom datasets that are not persisted.
In the future, I think it would be a good idea to add a method to the API of AbstractDataSet that checks for persistence. I would love to hear thoughts on this.
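To make the proposal concrete, here is a rough sketch of how a persistence check on the dataset API could differ from a type-based check. The classes below are standalone stand-ins, not Kedro's implementations; the method name `is_persistent` and the example classes `FileBackedDataSet` and `InMemoryCacheDataSet` are hypothetical.

```python
class AbstractDataSet:
    """Stand-in base class for illustration only (not Kedro's class)."""

    def is_persistent(self) -> bool:  # hypothetical method name
        # Assumed default: a dataset writes to durable storage.
        return True


class MemoryDataSet(AbstractDataSet):
    """Stand-in for an in-memory dataset."""

    def is_persistent(self) -> bool:
        return False


class FileBackedDataSet(AbstractDataSet):
    """Hypothetical dataset that writes to disk; inherits the default."""


class InMemoryCacheDataSet(AbstractDataSet):
    """Hypothetical custom dataset that never writes to disk."""

    def is_persistent(self) -> bool:
        return False


def persisted_by_type(dataset: AbstractDataSet) -> bool:
    # Type-based definition: anything that is not a MemoryDataSet counts.
    return not isinstance(dataset, MemoryDataSet)


def persisted_by_method(dataset: AbstractDataSet) -> bool:
    # Proposed alternative: let each dataset answer for itself.
    return dataset.is_persistent()


custom = InMemoryCacheDataSet()
print(persisted_by_type(custom))    # True  (misclassified)
print(persisted_by_method(custom))  # False
```

With a method on the base class, custom datasets can opt out of being treated as persisted without the runner needing to know their types.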
Checklist
RELEASE.md file