trigger --flow=new doesn't trigger downstream tasks with off-flow dependencies #6265

Closed
ColemanTom opened this issue Jul 30, 2024 · 27 comments

@ColemanTom
Contributor

ColemanTom commented Jul 30, 2024

Description

This is only tested with CYLC_VERSION=8.3.3

UPDATE: Also tested in 8.2.3 and 8.2.7; a similar (perhaps more dangerous) version of the bug was found there too.

With a graph like the one shown below, if I trigger a new flow for task a at some cycle, task b never gets rerun. I expected it to run, because it should be part of the new flow through that graph. Instead, it gets stuck as waiting. I'm not sure, but this may lead to longer-term issues with subsequent cycles.

Reproducible Example

flow.cylc for a workflow I've called basic-workflow:

[scheduling]
    initial cycle point = 20240724
    [[graph]]
        P1D = """
            @wall_clock => a => b & c
            a[-P1D] => a
            b[-P2D] => b
        """


[runtime]
    [[a,b,c]]
        script = "sleep 1"
  1. Run this workflow.
  2. Wait a while until it has caught up to real time, so it's not doing anything (this just makes it easier to monitor).
  3. cylc trigger --flow=new basic-workflow/run3 //20240729/a
  4. Monitor the suite or look at the log:
2024-07-30T01:56:41Z INFO - Command "force_trigger_tasks" received. ID=2fd72720-b2f9-47e2-b054-334d130a4387
    force_trigger_tasks(flow=['new'], flow_wait=False, tasks=['20240729/a'])
2024-07-30T01:56:41Z INFO - New flow: 2 (no description) 2024-07-30T01:56:41+00:00
2024-07-30T01:56:41Z INFO - [20240729T0000Z/a(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:41Z INFO - [20240729T0000Z/a(flows=2):waiting] => waiting(queued)
2024-07-30T01:56:41Z INFO - Command "force_trigger_tasks" actioned. ID=2fd72720-b2f9-47e2-b054-334d130a4387
2024-07-30T01:56:41Z INFO - [20240729T0000Z/a(flows=2):waiting(queued)] => waiting
2024-07-30T01:56:41Z INFO - [20240729T0000Z/a(flows=2):waiting] => preparing
2024-07-30T01:56:42Z INFO - [20240729T0000Z/a/02(flows=2):preparing] submitted to localhost:background[3272823]
2024-07-30T01:56:42Z INFO - [20240729T0000Z/a/02(flows=2):preparing] => submitted
2024-07-30T01:56:44Z INFO - [20240729T0000Z/a/02(flows=2):submitted] => running
2024-07-30T01:56:47Z INFO - [20240729T0000Z/a/02(flows=2):running] => succeeded
2024-07-30T01:56:47Z INFO - [20240729T0000Z/b(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:47Z INFO - [20240729T0000Z/c(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:47Z INFO - [20240729T0000Z/c(flows=2):waiting] => waiting(queued)
2024-07-30T01:56:47Z INFO - [20240730T0000Z/a(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:48Z INFO - [20240730T0000Z/a(flows=2):waiting] => waiting(queued)
2024-07-30T01:56:48Z INFO - [20240729T0000Z/c(flows=2):waiting(queued)] => waiting
2024-07-30T01:56:48Z INFO - [20240730T0000Z/a(flows=2):waiting(queued)] => waiting
2024-07-30T01:56:48Z INFO - [20240730T0000Z/a(flows=2):waiting] => preparing
2024-07-30T01:56:48Z INFO - [20240729T0000Z/c(flows=2):waiting] => preparing
2024-07-30T01:56:49Z INFO - [20240729T0000Z/c/02(flows=2):preparing] submitted to localhost:background[3272923]
2024-07-30T01:56:49Z INFO - [20240729T0000Z/c/02(flows=2):preparing] => submitted
2024-07-30T01:56:49Z INFO - [20240730T0000Z/a/02(flows=2):preparing] submitted to localhost:background[3272924]
2024-07-30T01:56:49Z INFO - [20240730T0000Z/a/02(flows=2):preparing] => submitted
2024-07-30T01:56:52Z INFO - [20240729T0000Z/c/02(flows=2):submitted] => running
2024-07-30T01:56:52Z INFO - [20240730T0000Z/a/02(flows=2):submitted] => running
2024-07-30T01:56:54Z INFO - [20240729T0000Z/c/02(flows=2):running] => succeeded
2024-07-30T01:56:54Z INFO - [20240730T0000Z/a/02(flows=2):running] => succeeded
2024-07-30T01:56:54Z INFO - [20240730T0000Z/b(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:54Z INFO - [20240730T0000Z/c(flows=2):waiting(runahead)] => waiting
2024-07-30T01:56:54Z INFO - [20240730T0000Z/c(flows=2):waiting] => waiting(queued)
2024-07-30T01:56:54Z INFO - [20240731T0000Z/a(flows=1,2):waiting] merged in flow(s) 2
2024-07-30T01:56:55Z INFO - [20240730T0000Z/c(flows=2):waiting(queued)] => waiting
2024-07-30T01:56:55Z INFO - [20240730T0000Z/c(flows=2):waiting] => preparing
2024-07-30T01:56:56Z INFO - [20240730T0000Z/c/02(flows=2):preparing] submitted to localhost:background[3273094]
2024-07-30T01:56:56Z INFO - [20240730T0000Z/c/02(flows=2):preparing] => submitted
2024-07-30T01:56:58Z INFO - [20240730T0000Z/c/02(flows=2):submitted] => running
2024-07-30T01:57:01Z INFO - [20240730T0000Z/c/02(flows=2):running] => succeeded
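One way to spot the stuck tasks in a log like this is to track the last state change logged for each task in the new flow. The following is a minimal sketch, assuming the Cylc 8.3 log line format shown above (`[<cycle>/<task>[/<job>](flows=<n,...>):<state>] => <state>`); the function name is hypothetical and none of this is part of Cylc.

```python
import re

# Matches Cylc 8.3-style state-change lines such as
# "[20240729T0000Z/b(flows=2):waiting(runahead)] => waiting".
# An optional "/02" job number after the task name is skipped.
STATE_CHANGE = re.compile(
    r"\[(?P<cycle>[^/\]]+)/(?P<task>[^/(\]]+)(?:/\d+)?"
    r"\(flows=(?P<flows>[\d,]+)\):[^\]]+\] => (?P<new>\S+)"
)


def stuck_tasks(log_lines, flow):
    """Return cycle/task IDs in `flow` whose last logged state is waiting."""
    last_state = {}
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match is None:
            continue  # not a state-change line (submissions, commands, ...)
        flows = {int(f) for f in match["flows"].split(",")}
        if flow not in flows:
            continue
        last_state[f"{match['cycle']}/{match['task']}"] = match["new"]
    # "waiting" with no later transition: the task went no further in the log.
    return sorted(t for t, s in last_state.items() if s == "waiting")
```

Run over the log above with `flow=2`, this should report `20240729T0000Z/b` and `20240730T0000Z/b`. Note that it cannot tell why a task is waiting. For example, a task held by the wall clock xtrigger would be reported too.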

Expected Behaviour

Task b should run, as it is part of a brand new flow. I don't think it should matter that the other tasks it depends on were not rerun in the new flow.

@ColemanTom added the bug label Jul 30, 2024
@ColemanTom changed the title from "trigger --flow=new for task with a cross-cycle dependency" to "trigger --flow=new doesn't trigger downstream tasks with cross-cycle dependencies" Jul 30, 2024
@ColemanTom
Contributor Author

NOTE: I'm only talking about cross-cycle dependencies here, but I imagine that if you have a=>b; c=>b and only trigger a, then b will not be run either.

@ColemanTom
Contributor Author

For me, unless there is a workaround, this is a big problem, as I can't see how to trigger a new flow correctly when tasks have dependencies other than the task you have triggered.

@ColemanTom
Contributor Author

ColemanTom commented Jul 30, 2024

Trying with 8.2.3 and 8.2.7, there was an oddity back then too, but it didn't stall the workflow.

2024-07-30T02:22:26Z INFO - New flow: 3 (no description) 2024-07-30 02:22:26
2024-07-30T02:22:26Z INFO - [20240727T0000Z/a waiting(runahead) job:01 flows:3] => waiting
2024-07-30T02:22:26Z INFO - [20240727T0000Z/a waiting job:01 flows:3] => waiting(queued)
2024-07-30T02:22:26Z INFO - Command actioned: force_trigger_tasks(['20240727/a'], flow=['new'], flow_wait=False, flow_descr=None)
2024-07-30T02:22:26Z INFO - [20240727T0000Z/a waiting(queued) job:01 flows:3] => waiting
2024-07-30T02:22:26Z INFO - [20240727T0000Z/a waiting job:02 flows:3] => preparing
2024-07-30T02:22:27Z INFO - [20240727T0000Z/a preparing job:02 flows:3] submitted to localhost:background[3285875]
2024-07-30T02:22:27Z INFO - [20240727T0000Z/a preparing job:02 flows:3] => submitted
2024-07-30T02:22:27Z INFO - [20240727T0000Z/a submitted job:02 flows:3] health: submission timeout=None, polling intervals=PT15M,...
2024-07-30T02:22:29Z INFO - [20240727T0000Z/a submitted job:02 flows:3] => running
2024-07-30T02:22:29Z INFO - [20240727T0000Z/a running job:02 flows:3] health: execution timeout=None, polling intervals=PT15M,...
2024-07-30T02:22:32Z INFO - [20240727T0000Z/a running job:02 flows:3] => succeeded
2024-07-30T02:22:32Z INFO - [20240728T0000Z/a waiting(runahead) job:01 flows:3] => waiting
2024-07-30T02:22:32Z INFO - [20240727T0000Z/b waiting(runahead) job:01 flows:3] => waiting
2024-07-30T02:22:32Z INFO - [20240727T0000Z/c waiting(runahead) job:01 flows:3] => waiting
2024-07-30T02:22:32Z INFO - [20240727T0000Z/c waiting job:01 flows:3] => waiting(queued)
2024-07-30T02:22:33Z INFO - xtrigger satisfied: wall_clock = wall_clock(trigger_time=1722124800)
2024-07-30T02:22:33Z INFO - [20240728T0000Z/a waiting job:01 flows:3] => waiting(queued)
2024-07-30T02:22:33Z INFO - [20240727T0000Z/c waiting(queued) job:01 flows:3] => waiting
2024-07-30T02:22:33Z INFO - [20240728T0000Z/a waiting(queued) job:01 flows:3] => waiting
2024-07-30T02:22:33Z INFO - [20240727T0000Z/c waiting job:02 flows:3] => preparing
2024-07-30T02:22:33Z INFO - [20240728T0000Z/a waiting job:02 flows:3] => preparing
2024-07-30T02:22:35Z INFO - [20240727T0000Z/c preparing job:02 flows:3] submitted to localhost:background[3285960]
2024-07-30T02:22:35Z INFO - [20240727T0000Z/c preparing job:02 flows:3] => submitted
2024-07-30T02:22:35Z INFO - [20240727T0000Z/c submitted job:02 flows:3] health: submission timeout=None, polling intervals=PT15M,...

Task b never reran; it just got removed from the graph. So I think that was a bug too; it has just evolved and become more visible. The old version was probably more dangerous, as it gave false confidence that tasks had run.

@hjoliver
Member

hjoliver commented Jul 30, 2024

Trying with 8.2.3 and 8.2.7, ... The b task didn't ever rerun, it just got removed from the graph. So, that was a bug too I think. It has just evolved and become more visible. The old version was probably more dangerous as it gave false confidence tasks were run.

Prior to 8.3.0, tasks with partially satisfied prerequisites were "hidden" to make it look as if they got spawned (into the active task pool) once all their prerequisites were satisfied, rather than when the first prerequisite was satisfied. This was a decision made early in the implementation of the Cylc 8 scheduling algorithm that we later decided was a mistake.

@hjoliver
Member

hjoliver commented Jul 30, 2024

This is not a bug, but you've highlighted something that we really need to hammer home in documenting manual interventions in Cylc 8.

A flow is a self-consistent, self-sustaining run through the graph, and as such the prerequisites of all affected tasks must be satisfied within the flow. It has to be that way because if you start a new flow in the past then technically ALL of the prerequisites (in the past graph) have already been satisfied by the original flow - so if we allowed old-flow outputs to satisfy new-flow prerequisites then all of the new-flow tasks could run at once without waiting on any dependencies.
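The rule can be illustrated with a toy model in which every completed output is recorded along with the flows it was completed in, and a prerequisite only counts as satisfied for a flow that produced the matching output. This is a simplified sketch of the idea, not Cylc's actual implementation; the class and function names are hypothetical, and the task names follow the a/b/c example above.

```python
from dataclasses import dataclass, field


@dataclass
class Outputs:
    """Completed outputs keyed by (cycle, task, output), with their flows."""
    done: dict = field(default_factory=dict)

    def complete(self, cycle, task, output, flows):
        self.done.setdefault((cycle, task, output), set()).update(flows)

    def satisfied(self, prereq, flow):
        # Only an output completed in the same flow satisfies a prerequisite.
        return flow in self.done.get(prereq, set())


def unsatisfied(prereqs, outputs, flow):
    """Return the prerequisites that are not yet satisfied in `flow`."""
    return [p for p in prereqs if not outputs.satisfied(p, flow)]


out = Outputs()
out.complete("20240727", "b", "succeeded", {1})  # old flow only
out.complete("20240729", "a", "succeeded", {1})
out.complete("20240729", "a", "succeeded", {2})  # the triggered new flow
prereqs_of_b = [("20240729", "a", "succeeded"), ("20240727", "b", "succeeded")]
```

Here `unsatisfied(prereqs_of_b, out, 2)` leaves only the off-flow `20240727/b` prerequisite. If `satisfied` ignored the flow instead, every new-flow task in the past graph would look ready at once, which is the problem described above.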

Also, given that graph edges typically represent real (file) dependencies, it's perfectly possible that you would not want the new-flow task b to run until its off-flow dependence on the earlier b was marked (somehow) as satisfied in the context of the new flow (e.g. by manually copying a file back into the run directory, or whatever).

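All you need to do is manually set b's off-flow prerequisite to satisfied (in flow 2) with cylc set. (In your graph this is only needed until flow 2 produces its own b outputs: with the -P2D offset that means the first two cycles, i.e. the 20240729/b and 20240730/b tasks left waiting in flow 2 in your log.)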
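In the example graph the off-flow dependence is `b[-P2D] => b`, so the `b` tasks in the first two cycles of the new flow depend on `b` tasks from before it. A minimal sketch that works out the corresponding commands, assuming the `--flow=N` and `--pre=<cycle>/<task>:<output>` options of `cylc set` in Cylc 8.3 (check `cylc set --help` for your version). The helper name is hypothetical, and it only builds the command strings rather than running them.

```python
from datetime import date, timedelta


def off_flow_set_commands(workflow, trigger_day, flow, offset_days=2):
    """Build `cylc set` commands for b's off-flow b[-P<offset>D] prerequisites.

    Assumes daily cycling and a new flow started at `trigger_day`: each b in
    the first `offset_days` cycles depends on a b from before the new flow.
    """
    commands = []
    for i in range(offset_days):
        point = trigger_day + timedelta(days=i)
        parent = point - timedelta(days=offset_days)
        # Full ISO 8601 points, matching the task IDs in the scheduler log.
        pre = f"{parent:%Y%m%d}T0000Z/b:succeeded"
        target = f"{workflow}//{point:%Y%m%d}T0000Z/b"
        commands.append(f"cylc set --flow={flow} --pre={pre} {target}")
    return commands
```

For the run in this issue, `off_flow_set_commands("basic-workflow/run3", date(2024, 7, 29), 2)` should give one command for `20240729T0000Z/b` (pointing at `20240727T0000Z/b`) and one for `20240730T0000Z/b` (pointing at `20240728T0000Z/b`).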

Note we have recently discussed an easier way to automatically detect off-flow prerequisites, for cases when you can easily identify all the tasks that should run in the new flow, but I can't find the issue right now (maybe it was in Element chat...) ... I'll make sure that's captured in an issue.

@hjoliver
Member

@ColemanTom - if you're happy with my explanation, I'll change the title of this issue (the cross-cycle dependence is not relevant) and close it.

@ColemanTom
Contributor Author

ColemanTom commented Jul 30, 2024

Yes, happy for the title to be changed, but I think this is a major flaw. For complex graphs, that could mean many cylc set commands are required. It becomes very unwieldy and requires much more detailed knowledge of the graph. If I want to rerun a whole cycle again, I had thought I could just trigger a new flow from the first task. Now it seems I need to know about other prerequisites across cycles and do a dozen cylc set commands. If there is no CLI access, that is a lot of clicks. And this is just a very simple model I'm currently looking at; it just has N cross-cycle dependencies (minimal graph example below):

            @wall_clock => start => generate_tide_<DAY_OFFSET>day
{% for day in range(-PAST_DAYS, FUTURE_DAYS) %}
            generate_tide_{{ day+1 }}day[-PT{{ CYCLE_FREQUENCY }}H] => generate_tide_{{ day }}day
{% endfor %}
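To see how many manual operations this implies, the Jinja2 loop can be expanded by hand. This minimal sketch mimics the loop in plain Python (it does not use Jinja2), with hypothetical values for PAST_DAYS, FUTURE_DAYS and CYCLE_FREQUENCY.

```python
def expand_tide_graph(past_days, future_days, cycle_frequency):
    """Mimic the Jinja2 for-loop: one cross-cycle graph line per day."""
    lines = []
    for day in range(-past_days, future_days):
        lines.append(
            f"generate_tide_{day + 1}day[-PT{cycle_frequency}H]"
            f" => generate_tide_{day}day"
        )
    return lines


# Hypothetical settings: PAST_DAYS=2, FUTURE_DAYS=3, CYCLE_FREQUENCY=6.
tide_lines = expand_tide_graph(2, 3, 6)
```

Each rendered line is a dependency on the previous cycle. So when one cycle is rerun in a new flow, each line leaves one off-flow prerequisite to set: PAST_DAYS + FUTURE_DAYS of them (five with these values).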

@hjoliver changed the title from "trigger --flow=new doesn't trigger downstream tasks with cross-cycle dependencies" to "trigger --flow=new doesn't trigger downstream tasks with off-flow dependencies" Jul 30, 2024
@hjoliver
Member

hjoliver commented Jul 30, 2024

I disagree that it's a flaw, let alone a major flaw.

If you want to run arbitrary new flows within the same graph then dealing with off-flow prerequisites is unavoidable - when the new flow spawns a new task, how should we determine which of its other prerequisites must wait on parents within the new flow and which to take as magically satisfied by the old flow?

This applies to Cylc 7 too, although in simple cases you could get away with not really understanding that, if all the affected tasks remained in the task pool and you knew which tasks they were, so you could manually reset them all to waiting to set up the flow. (Arbitrary new flows from any point in the graph were basically impossible before Cylc 8.)

The good news is that those cases (the ones where you could fake a limited new flow with a bunch of manual state resetting to set it up) will soon be much easier in Cylc 8; that is the discussion/issue I referred to just above. Not by means of fixing a flaw, but by making it easy to say "here are all the tasks that can participate in the new flow, and any dependence on tasks outside of that group should be ignored" (where it is appropriate to ignore off-flow dependence, which as I said is not always the case).
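The "ignore dependence on tasks outside the group" idea can be illustrated as a simple set computation over graph edges: any prerequisite whose parent is outside the chosen group is off-flow for a new flow confined to that group. This is a hypothetical sketch of the concept only, not the proposed Cylc implementation. Edges are `(parent, child)` task IDs.

```python
def off_group_prerequisites(edges, group):
    """Return (parent, child) edges whose child is in the group but parent isn't.

    A new flow confined to `group` can never satisfy these by itself, so they
    would have to be set manually or deliberately ignored.
    """
    group = set(group)
    return sorted(
        (parent, child)
        for parent, child in edges
        if child in group and parent not in group
    )


# The a/b/c example expanded for cycles 20240729 and 20240730.
edges = [
    ("20240728/a", "20240729/a"), ("20240729/a", "20240730/a"),
    ("20240729/a", "20240729/b"), ("20240729/a", "20240729/c"),
    ("20240730/a", "20240730/b"), ("20240730/a", "20240730/c"),
    ("20240727/b", "20240729/b"), ("20240728/b", "20240730/b"),
]
group = ["20240729/a", "20240729/b", "20240729/c",
         "20240730/a", "20240730/b", "20240730/c"]
```

Here the result includes the triggered task's own `20240728/a` prerequisite, which a manual trigger bypasses anyway, plus the two off-flow `b` prerequisites discussed above.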

@hjoliver removed the bug label Jul 30, 2024
@hjoliver
Member

If I want to rerun a whole cycle again, I had thought I could just trigger a new flow from the first task.

In Cylc 7 that required a warm start, which automatically ignores off-flow prerequisites (which in this case means previous-cycle dependence). But if you want to do that in a running Cylc 7 workflow, good luck!

@ColemanTom
Contributor Author

ColemanTom commented Jul 30, 2024

In Cylc 7 that required a warm start, which automatically ignores off-flow prerequisites (which in this case means previous-cycle dependence). But if you want to do that in a running Cylc 7 workflow, good luck!

In Cylc 7 I could just reset the whole cycle to waiting and it worked. I know you have said in the past that this is just coincidence, but with how the graphs were set up, it was a safe assumption that it would work for the models I'm talking about. I only work in ops; I don't reset past the current cycle. I'm never discussing rewinding more than the current cycle.

@hjoliver
Member

Oh, OK, understood - I'll partially retract my previous "good luck!" comment then! However, for that to work, ALL the tasks in the entire cycle point that you want to rerun must remain in the Cylc 7 task pool.

Rerunning an entire cycle actually tends to be simpler than the general case - for full-cycles we can be confident that off-flow prerequisites are (a) easily identified, and (b) the user definitely wants to ignore them.

In Cylc 8 you'll soon be able to do that too, with something like "cylc trigger --cycle=x". But for the moment, you have to do it the general way:

  • manually trigger the first task(s) in the graph for that cycle
  • manually set any off-flow prerequisites that you want to be ignored
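The two steps can be sketched for a single daily cycle point, given the graph's dependencies as `(parent, cycle offset, child)` tuples. This is a hypothetical helper, assuming the same `cylc set --flow=N --pre=...` options as above. It uses `--flow=new` only once, on the trigger, and takes the flow number that the scheduler then logs ("New flow: N") for the `cylc set` commands.

```python
from datetime import date, timedelta


def rerun_cycle_commands(workflow, day, deps, new_flow_num):
    """Sketch the manual steps for re-running one daily cycle in a new flow.

    `deps` holds (parent, offset, child) tuples, where `offset` is the
    parent's cycle offset in days (0 = same cycle, -1 = previous cycle).
    """
    children_in_cycle = {child for _, offset, child in deps if offset == 0}
    tasks = {t for parent, _, child in deps for t in (parent, child)}
    # "First" tasks have no parent in the same cycle: trigger these.
    first = sorted(tasks - children_in_cycle)
    cycle = f"{day:%Y%m%d}T0000Z"
    commands = [f"cylc trigger --flow=new {workflow}//{cycle}/{t}" for t in first]
    for parent, offset, child in sorted(deps):
        # Triggered tasks run regardless of prerequisites, so only the
        # inter-cycle prerequisites of the other tasks need setting.
        if offset < 0 and child not in first:
            parent_cycle = f"{day + timedelta(days=offset):%Y%m%d}T0000Z"
            commands.append(
                f"cylc set --flow={new_flow_num} "
                f"--pre={parent_cycle}/{parent}:succeeded "
                f"{workflow}//{cycle}/{child}"
            )
    return commands


# The a/b/c example: a[-P1D] => a, a => b & c, b[-P2D] => b.
abc_deps = [("a", -1, "a"), ("a", 0, "b"), ("a", 0, "c"), ("b", -2, "b")]
```

This only looks at the cycle being rerun. If the new flow carries on into later cycles, longer offsets such as `-P2D` can reach back past the rerun cycle there too.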

@ColemanTom
Contributor Author

If you want to run arbitrary new flows within the same graph then dealing with off-flow prerequisites is unavoidable - when the new flow spawns a new task, how should we determine which of its other prerequisites must wait on parents within the new flow and which to take as magically satisfied by the old flow?

That is a very fair comment that I had not considered. It is just making my job very difficult right now, as I'm trying to figure out how to adjust support instructions for people who will not be able to understand Cylc graphs and have limited time to do lots of set commands (they do not have CLI access, so I can't provide them with a simple command to run, and they can't run cylc show to check on blockages).

If you remember the rewind discussions, this also potentially makes those more complicated; I'll need to dive into that again to check.

@ColemanTom
Contributor Author

ColemanTom commented Jul 30, 2024

Quick question on partial rewind of a cycle:

# hour = 0,1,2,3,4,5
wait<hour-1> => wait<hour>
wait<hour> => task1<hour>
                   => task2<hour>
                   => archive

Say I have processed up to hour=4 and want to rewind to hour=2. If I trigger a new flow from wait<hour=2>, will that cause the archive task to get stuck, because task2<hour .lt. 2> don't exist in the new flow? Would logic be required to wait until task2<hour .lt. 2> finish, then do a set on them?
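Under one reading of this graph (the continuation lines meaning `wait<hour> => task1<hour> => task2<hour> => archive`, which is an assumption), the new flow's reach and archive's off-flow prerequisites can be worked out mechanically. A hypothetical sketch using the expanded task names that appear in the run DB output later in this thread:

```python
def rewind_graph(hours):
    """Expand the parameterised example graph into (parent, child) edges."""
    edges = []
    for h in hours:
        if h != hours[0]:
            # wait<hour-1> => wait<hour>; there is no wait<hour-1> for hour 0.
            edges.append((f"wait_hour{h - 1}", f"wait_hour{h}"))
        edges.append((f"wait_hour{h}", f"task1_hour{h}"))
        edges.append((f"task1_hour{h}", f"task2_hour{h}"))
        edges.append((f"task2_hour{h}", "archive"))
    return edges


def downstream(edges, start):
    """All tasks reachable from `start` (inclusive): the new flow's tasks."""
    reached, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in reached:
                reached.add(child)
                stack.append(child)
    return reached


def off_flow_prereqs(edges, start):
    """Edges into the new flow from outside it, excluding the triggered task."""
    in_flow = downstream(edges, start)
    return sorted(
        (parent, child)
        for parent, child in edges
        if child in in_flow and parent not in in_flow and child != start
    )


rewind_edges = rewind_graph(range(6))
```

Triggering `wait_hour2` in a new flow leaves `archive` with two off-flow prerequisites, on `task2_hour0` and `task2_hour1`.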

@hjoliver
Member

It just is making my job very difficult right now as I'm trying to figure out how to adjust support instructions to people who will not be able to understand Cylc graphs and have limited time to do

That's a fair comment too - we've had exactly the same from the UK support people. Initially my opinion was it's just a matter of getting used to a better - albeit new and different! - way to do things, but I've grudgingly conceded that it may genuinely be easier to do some kinds of restricted runtime interventions in Cylc 7 compared to Cylc 8 as it is right now. Power and generality will win out in the end, but unfortunately that doesn't help you much in the present.

@hjoliver
Member

and they can't run cylc show to check on blockages).

The upcoming task metadata UI view is a very easy win at this point (the data is available to the UI and @oliver-sanders already mocked up an example), but we've not had time to prioritize it just yet. Maybe something your site could look at?

@ColemanTom
Contributor Author

and they can't run cylc show to check on blockages).

The upcoming task metadata UI view is a very easy win at this point (the data is available to the UI and @oliver-sanders already mocked up an example), but we've not had time to prioritize it just yet. Maybe something your site could look at?

@ScottWales I'm guessing you wouldn't have time to do this?

@hjoliver
Member

Having seen @ScottWales's multi-user GUI, he would find this super-easy 😁

@hjoliver
Member

Quick question with partial rewind of a cycle

Can you just confirm that the two different parameterized forms of wait are not a typo?

@ColemanTom
Contributor Author

ColemanTom commented Jul 30, 2024

Quick question with partial rewind of a cycle

Can you just confirm that the two different parameterized forms of wait are not a typo?

Updated. They were mistakes. Initially I had member in there, but then I realised it doesn't matter for this example - I had just been copying an existing task combination.

@hjoliver
Member

hjoliver commented Jul 30, 2024

If I have processed up to hour=4 and want to rewind to hour=2. If I trigger a new flow from wait<hour=2>, will that cause the archive task to get stuck because task2<hour .lt. 2> don't exist in the new flow?

Yes.

Logic would be required to wait until task2<hour .lt. 2> finish, then do a set on them?

No (good news!) - you can set those prerequisites ahead of time, at the same time you trigger the new flow (note: for multiple operations like this you can only use --flow=new once; after that, use the explicit flow number).

Equivalently, in this case, you can set the upstream outputs of the task2<hour> tasks instead of the prerequisites of archive, also ahead of time; there is no need to wait for a stall.
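Both options can be written out as commands once the new flow number is known. A minimal sketch, assuming the `--flow=N`, `--pre=<cycle>/<task>:<output>` and `--out=<output>` options of `cylc set` in Cylc 8.3, integer cycle point 1 as in the log below, and a hypothetical workflow ID. To avoid relying on repeated `--pre` options, it issues one command per prerequisite.

```python
def archive_fix_commands(workflow, cycle, flow, rewind_from, use_outputs=False):
    """Build commands to pre-satisfy archive's off-flow dependencies.

    Hypothetical helper: the task2 hours below `rewind_from` are outside the
    new flow. `flow` is the explicit number of the flow created by the trigger.
    """
    early = [f"task2_hour{h}" for h in range(rewind_from)]
    if use_outputs:
        # Option 2: mark the upstream outputs as completed in the new flow.
        return [
            f"cylc set --flow={flow} --out=succeeded {workflow}//{cycle}/{t}"
            for t in early
        ]
    # Option 1: satisfy archive's prerequisites on those tasks directly.
    return [
        f"cylc set --flow={flow} --pre={cycle}/{t}:succeeded {workflow}//{cycle}/archive"
        for t in early
    ]
```

For example, `archive_fix_commands("my-workflow", 1, 2, 2)` should yield two `cylc set --flow=2 --pre=...` commands targeting `my-workflow//1/archive`, one each for `task2_hour0` and `task2_hour1`.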


@ScottWales
Contributor

@ScottWales I'm guessing you wouldn't have time to do this?

I can have a bit of a look, can't put that much time into it though

No (good news!) - you can set those prerequisites ahead of time, at the same time you trigger the new flow (note for multiple ops like this you can only use --flow=new once - after that use the explicit flow number.

Is there a scriptable way to get the flow number that --flow=new creates and feed it to further operations? Or otherwise list what flows currently exist?

@hjoliver
Member

I can have a bit of a look, can't put that much time into it though

Sometimes managers just need a bit of subtle persuasion :-)

Is there a scriptable way to get the flow number that --flow=new creates and feed it to further operations? Or otherwise list what flows currently exist?

We plan to expose this in the UI soon. For the moment:

  • flow numbers > 1 are always logged with task IDs in the scheduler log
  • the run DB records flow numbers in relevant tables

Log:

...
INFO - [1/wait_hour4/01:preparing] => submitted
INFO - [1/task2_hour2/01:submitted] => running
INFO - [1/task2_hour2/01:running] => succeeded
INFO - Command "force_trigger_tasks" received. ID=b169ce6c-766d-47f3-8e95-f32eca031d4a
   force_trigger_tasks(flow=['new'], flow_wait=False, tasks=['1/wait_hour2'])
INFO - New flow: 2 (no description) 2024-07-30T23:15:06
INFO - [1/wait_hour2(flows=2):waiting(runahead)] => waiting
INFO - [1/wait_hour2(flows=2):waiting] => waiting(queued)
INFO - Command "force_trigger_tasks" actioned. ID=b169ce6c-766d-47f3-8e95-f32eca031d4a
INFO - [1/wait_hour2(flows=2):waiting(queued)] => waiting
INFO - [1/wait_hour2(flows=2):waiting] => preparing
INFO - [1/task1_hour3/01:submitted] => running
INFO - [1/task1_hour3/01:running] => succeeded
...
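Since each new flow is announced with a "New flow: N" line, a script can pick the number out of the log text. This is a minimal sketch, assuming the message format shown above. The input lines could come from the scheduler log, for example the output of `cylc cat-log <workflow>`. The function name is hypothetical.

```python
import re

# Assumes the "New flow: 2 (no description) ..." message format shown above.
NEW_FLOW = re.compile(r"New flow: (\d+)")


def new_flow_numbers(log_lines):
    """Return the flow numbers announced in the log, in order of appearance."""
    numbers = []
    for line in log_lines:
        match = NEW_FLOW.search(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers
```

The last element is the most recently created flow. A log that has been rotated or restarted may not contain every announcement, so the run DB is a useful cross-check.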

DB:

$ sqlite3 -header ~/cylc-run/tom/runN/log/db 'select cycle, name, flow_nums, status from task_states'
cycle|name|flow_nums|status
1|wait_hour0|[1]|succeeded
1|wait_hour1|[1]|succeeded
1|task1_hour0|[1]|succeeded
1|task1_hour1|[1]|succeeded
1|wait_hour2|[1]|succeeded
1|task2_hour0|[1]|succeeded
1|task2_hour1|[1]|succeeded
...
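The same information can be read from the run DB with Python's standard `sqlite3` module. A minimal sketch, assuming `flow_nums` is stored as a JSON list such as `[1]` or `[1, 2]`, as in the query output above. The function name is hypothetical.

```python
import json
import sqlite3


def flows_in_db(conn):
    """Return the set of flow numbers recorded in the task_states table."""
    flows = set()
    for (flow_nums,) in conn.execute("SELECT flow_nums FROM task_states"):
        flows.update(json.loads(flow_nums))
    return flows
```

To avoid interfering with a running scheduler, the DB can be opened read-only, e.g. `sqlite3.connect("file:/path/to/log/db?mode=ro", uri=True)` with the real run directory path substituted. `max(flows_in_db(conn))` then gives the highest flow number that has reached the DB so far.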

@hjoliver
Member

hjoliver commented Jul 30, 2024

I'll close this issue. Summary:

  • off-flow prerequisites are not automatically satisfied, by design (not a bug)
  • there is a valid question of how to automatically identify and satisfy them if that is what's desired, but that can be covered by other issues ...

@hjoliver
Member

Rerunning an entire cycle actually tends to be simpler than the general case - for full-cycles we can be confident that off-flow prerequisites are (a) easily identified, and (b) the user definitely wants to ignore them.

In Cylc 8 you'll soon be able to do that too, with something like "cylc trigger --cycle=x". But for the moment, you have to do it the general way:

See #5416

@hjoliver
Member

FYI @ColemanTom -

The good news is (the discussion/issue I referred to just above) those cases (the ones where you could fake a limited new flow with a bunch of manual state resetting to set it up) will soon be much easier in Cylc 8. Not by means of fixing a flaw, but by making it easy to say "here's all the tasks that can participate in the new flow, and any dependence on tasks outside of that group should be ignored" (where it is appropriate to ignore off-flow dependence, which as I said is not always the case).

See Group Trigger Proposal

@ColemanTom
Contributor Author

No (good news!) - you can set those prerequisites ahead of time, at the same time you trigger the new flow (note for multiple ops like this you can only use --flow=new once - after that use the explicit flow number.

How does one pre-set the prereqs?

@hjoliver
Member

hjoliver commented Jul 31, 2024

Just use cylc set --pre=PREREQ (or the same via the GUI) on the target task. (Or have I misunderstood the question?)

@MetRonnie closed this as not planned Sep 2, 2024
4 participants