Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow 'cylc reload' to reload global configuration #6509

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open
22 changes: 21 additions & 1 deletion cylc/flow/commands.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@
RunMode,
StopMode,
)
from cylc.flow.cfgspec.glbl_cfg import glbl_cfg


if TYPE_CHECKING:
Expand Down Expand Up @@ -329,7 +330,7 @@


@_command('reload_workflow')
async def reload_workflow(schd: 'Scheduler'):
async def reload_workflow(schd: 'Scheduler', reload_global: bool = False):
"""Reload workflow configuration."""
yield
# pause the workflow if not already
Expand Down Expand Up @@ -363,6 +364,25 @@
# give commands time to complete
sleep(1) # give any remove-init's time to complete

if reload_global:
# Reload global config if requested
LOG.info("Reloading the global configuration.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will update the workflow status message (shown in the GUI toolbar) to reflect what Cylc is doing:

Suggested change
LOG.info("Reloading the global configuration.")
LOG.info("Reloading the global configuration.")
schd.reload_pending = 'reloading the global configuration'
schd.update_data_store() # update workflow status msg
await schd.update_data_structure()

It's useful info incase the operation gets stuck with a filesystem issue, or incase loading the config takes a long time for whatever reason.

try:
glbl_cfg(reload=True)
except (ParsecError, CylcConfigError) as exc:
if cylc.flow.flags.verbosity > 1:
# log full traceback in debug mode
LOG.exception(exc)

Check warning on line 375 in cylc/flow/commands.py

View check run for this annotation

Codecov / codecov/patch

cylc/flow/commands.py#L375

Added line #L375 was not covered by tests
LOG.critical(
f'Reload failed - {exc.__class__.__name__}: {exc}'
'\nThis is probably due to an issue with the new'
' configuration.'
'\nTo continue with the pre-reload config, un-pause the'
' workflow.'
'\nOtherwise, fix the configuration and attempt to reload'
' again.'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is implemented, if the global config fails to reload, we log the error, but continue and reload the workflow config.

I see cylc reload --global as a transaction that reloads both the global config and the workflow config. Ideally, if either of these operations fail, we would rollback the transaction. I.e, either both operations happen or neither happen, the workflow is always left in an intended state.

So I would expect it to bail at this point. I.E, If the global reload fails, I would expect the logic to skip straight to this block of logic at the end of the routine:

    # resume the workflow if previously paused
    schd.reload_pending = False
    schd.update_data_store()  # update workflow status msg
    schd._update_workflow_state()
    if not was_paused_before_reload:
        schd.resume_workflow()
        schd.process_workflow_db_queue()  # see #5593

That does leave us with the situation where the global config does reload, but the workflow config doesn't (e.g. a validation error). Currently, the global config change will not roll back in this situation, but we could get it to do that something along the lines of:

if reload_global:
    ...
    try:
        cfg = glbl_cfg(cached=False)
    except ...:
        ...

...
# reload the workflow definition
...

glbl_cfg().reload(cfg)

(would require a reload method on the GlobalConfig object)


# reload the workflow definition
schd.reload_pending = 'loading the workflow definition'
schd.update_data_store() # update workflow status msg
Expand Down
4 changes: 4 additions & 0 deletions cylc/flow/network/schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -1886,6 +1886,10 @@ class Meta:
class Arguments:
workflows = graphene.List(WorkflowID, required=True)

reload_global = Boolean(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is a breaking change that will cause errors when cylc reload is run against workflows using a different version both on the CLI and in the GUI.

Somewhat surprisingly, we don't yet have a working pattern for adding new fields to mutations. It's one of the things on the TODO list that we haven't found time for yet!

  1. If the new field appears in the mutation and the scheduler doesn't know about it -> error.
  2. If the new field doesn't appear in the mutation but the scheduler is expecting it -> error.

I think we can get around both issues by marking this field as optional (required=False) and omitting the new field from the cylc reload mutation unless the --global option is used.

This way the mutation should remain forward and backward compatible so long as the --global option isn't used. And should only fail for older schedulers where the --global option is used (which is correct).

Sadly, we don't have any automated inter-version testing as yet (another one for the TODO list) so this will need to be tested manually for now.

default_value=False,
description="Also reload global config")

Comment on lines +1922 to +1925
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, all arguments added here are automatically listed in the GUI. The description text goes into the help bubble.

Here's how this option will appear:

Screenshot from 2025-01-28 14-49-23

result = GenericScalar()


Expand Down
11 changes: 10 additions & 1 deletion cylc/flow/scripts/reload.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,12 @@

MUTATION = '''
mutation (
$wFlows: [WorkflowID]!
$wFlows: [WorkflowID]!,
$reloadGlobal: Boolean,
) {
reload (
workflows: $wFlows
reloadGlobal: $reloadGlobal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[In relation to above comment]

I think we can avoid inter-version interoperability issues providing we omit the new reloadGlobal field when --global was not specified.

) {
result
}
Expand All @@ -87,6 +89,12 @@
multiworkflow=True,
argdoc=[WORKFLOW_ID_MULTI_ARG_DOC],
)

parser.add_option(

Check warning on line 93 in cylc/flow/scripts/reload.py

View check run for this annotation

Codecov / codecov/patch

cylc/flow/scripts/reload.py#L93

Added line #L93 was not covered by tests
"-g", "--global",
help="also reload global configuration.",
action="store_true", default=False, dest="reload_global")

return parser


Expand All @@ -97,6 +105,7 @@
'request_string': MUTATION,
'variables': {
'wFlows': [workflow_id],
'reloadGlobal': options.reload_global,
}
}

Expand Down
1 change: 0 additions & 1 deletion cylc/flow/task_proxy.py
Original file line number Diff line number Diff line change
Expand Up @@ -344,7 +344,6 @@ def copy_to_reload_successor(
reload_successor.summary = self.summary
reload_successor.local_job_file_path = self.local_job_file_path
reload_successor.try_timers = self.try_timers
reload_successor.platform = self.platform
ScottWales marked this conversation as resolved.
Show resolved Hide resolved
reload_successor.job_vacated = self.job_vacated
reload_successor.poll_timer = self.poll_timer
reload_successor.timeout = self.timeout
Expand Down
70 changes: 70 additions & 0 deletions tests/integration/test_reload.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
TASK_STATUS_PREPARING,
TASK_STATUS_SUBMITTED,
)
from cylc.flow.cfgspec.glbl_cfg import glbl_cfg


async def test_reload_waits_for_pending_tasks(
Expand Down Expand Up @@ -146,3 +147,72 @@ async def test_reload_failure(

# the config should be unchanged
assert schd.config.cfg['scheduling']['graph']['R1'] == 'one'


async def test_reload_global(
flow,
one_conf,
scheduler,
start,
log_filter,
tmp_path,
monkeypatch,
):

global_config_path = tmp_path / 'global.cylc'
monkeypatch.setenv("CYLC_CONF_PATH", str(global_config_path.parent))

# Original global config file
global_config_path.write_text("""
[platforms]
[[localhost]]
[[[meta]]]
x = 1
""")
assert glbl_cfg(reload=True).get(['platforms','localhost','meta','x']) == '1'

id_ = flow(one_conf)
schd = scheduler(id_)
async with start(schd):

# Modify the global config file
global_config_path.write_text("""
[platforms]
[[localhost]]
[[[meta]]]
x = 2
""")

# reload the workflow and global config
await commands.run_cmd(commands.reload_workflow(schd, reload_global=True))

# Global config should have been reloaded
assert log_filter(
contains=(
'Reloading the global configuration.'
)
)

# Task platforms reflect the new config
assert schd.pool.get_tasks()[0].platform['meta']['x'] == '2'

# Modify the global config file with an error
global_config_path.write_text("""
[ERROR]
[[localhost]]
[[[meta]]]
x = 3
""")

# reload the workflow and global config
await commands.run_cmd(commands.reload_workflow(schd, reload_global=True))

# Error is noted in the log
assert log_filter(
contains=(
'This is probably due to an issue with the new configuration.'
)
)

# Task platforms should be the last valid value
assert schd.pool.get_tasks()[0].platform['meta']['x'] == '2'
Loading