Major text revision.
hjoliver committed Jul 31, 2024
1 parent 26d96cf commit 97e3d3a
Showing 1 changed file with 59 additions and 69 deletions.
128 changes: 59 additions & 69 deletions docs/proposal-group-trigger.md
@@ -2,42 +2,75 @@

## Background: re-running a sub-graph in Cylc 7 vs Cylc 8

For typical use cases where all affected tasks and their parents remain in the
Cylc 7 task pool, it may be easier to rerun a past sub-graph in Cylc 7 than in
Cylc 8.
**If** all sub-graph tasks **and** their parents remain in the Cylc 7 task pool
it may be easier to rerun a past sub-graph in Cylc 7 than in (current) Cylc 8.

This is because it may require more graph knowledge to identify the
initial task(s) and off-flow prerequisites of the sub-graph (Cylc 8) than to
identify all of the sub-graph member tasks as a group (by family name or glob)
and reset them to waiting (Cylc 7).
That's because it arguably requires more graph knowledge to identify initial
task(s) and off-flow prerequisites (Cylc 8) than to identify all member tasks
as a group (by family name or glob) and reset them to waiting (Cylc 7).

## Proposal

Currently, `cylc trigger TASKS` makes all the target tasks trigger immediately.
That's almost certainly not the desired behaviour if there is any dependence
between them.
Currently, `cylc trigger TASKS` makes all of the target tasks trigger immediately,
which is likely not the desired behaviour if there is any dependence between them
(so, currently, we just avoid doing that).

I propose that we change the
default triggering behaviour to:
1. respect any internal dependencies between the target tasks
2. automatically satisfy any off-group (i.e., off-flow) prerequisites
I propose that we change default manual triggering behaviour to:
1. respect any internal dependencies between the target tasks, and
2. automatically satisfy any off-group (i.e., off-flow) prerequisites

If there are no dependencies between the target tasks, this replicates
current behaviour because (2) automatically satisfies all the prerequisites.
If there are no dependencies between the target tasks, this replicates current
behaviour because (2) automatically satisfies all the prerequisites.

If there are dependencies between the targeted tasks, this makes rerunning
any sub-graph in Cylc 8 very easy.
If there are dependencies between the targeted tasks, this makes rerunning any
sub-graph in Cylc 8 very easy.
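
The two cases above can be illustrated with a small, self-contained sketch. It is not Cylc code: the function name `trigger_waves`, the toy task names, and the dict-of-sets graph representation are all assumptions made for illustration. It treats every off-group prerequisite as already satisfied and reports the order in which the targeted tasks would run if only in-group dependencies were respected.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def trigger_waves(group, prereqs):
    """Return the targeted tasks grouped into successive "waves".

    group:   set of targeted task names (hypothetical IDs)
    prereqs: dict mapping task name -> set of upstream task names

    Off-group prerequisites are dropped, i.e. treated as satisfied,
    so only dependencies between group members constrain the order.
    """
    internal = {task: prereqs.get(task, set()) & group for task in group}
    sorter = TopologicalSorter(internal)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Toy graph: x => a => b, and a & y => c (x and y are outside the group).
toy_prereqs = {"a": {"x"}, "b": {"a"}, "c": {"a", "y"}}

# Dependent targets: "a" runs first, then "b" and "c".
dependent = trigger_waves({"a", "b", "c"}, toy_prereqs)

# Independent targets: a single wave, as with the current trigger behaviour.
independent = trigger_waves({"b", "c"}, toy_prereqs)
```

In this toy model, a group with no internal dependencies collapses to one wave in which everything triggers at once, which is the sense in which the proposal replicates current behaviour.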

### Implementation
## Implementation

For any group of tasks we can examine all prerequisites to find those that point
outside of the group. Setting those prerequisites satisfied will:
For the selected group of tasks, examine all prerequisites to find any that
point outside of the group. Satisfying these external prerequisites will:
- immediately trigger the "initial tasks" of the group, and
- satisfy any off-flow prerequisites that would cause a stall
- satisfy any off-flow prerequisites that would cause a stall

### Details

1. match n=0 and future tasks to command inputs and record all the task IDs
1. (if not triggering with `--flow=new` then erase the previous flow from
each task in the group, or perhaps require separate use of
`cylc remove` for that)
1. examine the prerequisites of each task in the group
- for n=0 tasks, just query their prerequisites
- for future tasks, use taskdef methods to predict their prerequisites
1. unsatisfy any in-flow prerequisites in existing (n=0) tasks
- ensures that dependencies get waited on when the tasks run
- (not needed for future tasks)
1. satisfy any off-flow prerequisites
- spawns the owner tasks into n=0, and avoids a future stall
1. `cylc set --pre=all` any parentless in-group tasks
(i.e. promote them to the task pool)
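
The prerequisite-handling steps above can be sketched as a function that turns a task group into a list of actions. This is a simplified model rather than the scheduler's implementation: `plan_group_trigger`, the action labels, and the plain sets standing in for the n=0 task pool and for taskdef-derived prerequisites are all hypothetical, and the flow-erasing step is left out.

```python
def plan_group_trigger(group, pool, prereqs):
    """List the prerequisite changes implied by a group trigger.

    group:   set of targeted task names (hypothetical IDs)
    pool:    set of task names that already exist in the n=0 pool
    prereqs: dict mapping task name -> set of upstream task names

    Returns (action, task, upstream) tuples in task-name order.
    """
    actions = []
    for task in sorted(group):
        deps = prereqs.get(task, set())
        if task in pool:
            # Existing n=0 tasks may hold stale satisfied prerequisites:
            # unsatisfy the in-group ones so they get waited on again.
            for dep in sorted(deps & group):
                actions.append(("unsatisfy", task, dep))
        # Off-group prerequisites are satisfied for every group member.
        for dep in sorted(deps - group):
            actions.append(("satisfy", task, dep))
        if not deps:
            # Parentless in-group task: promote it to the task pool.
            actions.append(("set-all", task, None))
    return actions


# Toy graph: x => a => b, a & y => c, and d has no parents.
toy_prereqs = {"a": {"x"}, "b": {"a"}, "c": {"a", "y"}, "d": set()}
plan = plan_group_trigger(
    group={"a", "b", "c", "d"},
    pool={"b"},  # only "b" is currently in n=0
    prereqs=toy_prereqs,
)
```

Future (non-pool) tasks get no "unsatisfy" actions here because, per the list above, their in-group prerequisites have not been satisfied yet.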

### CLI

Even though the implementation just sets prerequisites within the target
task group, the trigger command is the appropriate home for this because
it targets multiple tasks at once, and the intention is always to trigger
(the initial tasks of) the group right away - whereas
`cylc set` just sets individual prerequisites or outputs on a single task.

### Outputs?

In principle we could also remove or set out-of-group outputs to prevent
downstream flow-on.

However, we would normally want the flow to continue if the rerun is successful,
and if `cylc remove` is used first (for a past sub-graph) then it won't re-run
downstream tasks beyond the group anyway.

But we could make it an option not to flow beyond the bounds of the task group.

-----

## Comparison of Cylc 7 and 8 (current) for sub-graph rerun
## Appendix: Comparison of Cylc 7 and 8 (current) sub-graph rerun

### Cylc 7 sub-graph rerun

@@ -51,7 +84,7 @@ outside of the group. Setting those prerequisites satisfied will:
2. insert all off-flow parents as waiting
3. reset all the parents to succeeded

Dependency matching will then cause the sub-graph to run correctly.
Dependency matching will then cause the sub-graph to run correctly.

### Cylc 8 sub-graph rerun

@@ -77,56 +110,13 @@ the flow should continue as normal if the sub-graph re-run was successful.

C7_a is the simplest. It relies on the group and its parents still being in the task
pool, but that is often the case when dealing with same-cycle problems. If so,
users can get away without understanding the task pool.
users can get away without understanding the task pool.

C7_b is the least intuitive of all - it requires some understanding of the task pool,
C7_b is the least intuitive of all - it requires understanding the task pool,
task insertion, and the graph (e.g. to identify off-flow parent tasks).

C8_a is conceptually clean, and general, but it does require an understanding of the
graph structure to identify initial tasks and off-flow prerequisites, and it might
result in unwanted downstream activity.

C8_b is like C8_a, but trades off unwanted downstream activity for `cylc remove`.

## Cylc 8 group trigger proposal

This makes re-running any Cylc 8 sub-graph as easy as the C7_a case.

#### C8_c: Cylc 8 (general, proposal):
1. trigger the group (e.g. by family name, or as a list of task IDs)

### Implementation

For each group member, the trigger method should:
- remove the previous flow (if not `--flow=new`)
- set all off-flow (i.e., out-of-group) prerequisites
- unset all in-flow (i.e., in-group) prerequisites (existing n=0 only)
- "set all" any parentless in-group tasks (i.e., promote them to the task pool)

To achieve this, on matching incoming arguments to n=0 and future tasks:
- record all task IDs in the group
- examine the prerequisites of each task
- for n=0 task proxies, just query their prerequisite objects
- for future tasks, use taskdef methods to see what their prerequisites would be
- unset any group-internal prerequisites in n=0 task proxies
- set any off-flow prerequisites
- this will spawn the associated tasks into n=0

### CLI

Even though the implementation just "sets prerequisites" the trigger
command is the more appropriate home for this functionality.
- `cylc set` just sets prerequisites or outputs on individual tasks
- the intention here is always to trigger (the group) right away,
which is not the case with `cylc set`

### Outputs?

In principle we could also remove or set out-of-group outputs to prevent
downstream flow-on.

However, we would normally want the flow to continue as normal if the rerun is
successful, and if `cylc remove` is used (for re-flow as opposed to new-flow)
then it won't re-run downstream tasks that already succeeded in the same flow.

So managing off-flow outputs is not necessary.
