Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle job log directory deleted for active task #6425

Open
oliver-sanders opened this issue Oct 17, 2024 · 3 comments · May be fixed by #6577
Open

handle job log directory deleted for active task #6425

oliver-sanders opened this issue Oct 17, 2024 · 3 comments · May be fixed by #6577
Assignees
Labels
bug Something is wrong :(
Milestone

Comments

@oliver-sanders
Copy link
Member

Spotted in the wild!

If you delete the job log directory for an active task, Cylc will preserve its last known status indefinitely. I.e, Cylc will consider the job to be submitted/running forever.

In this case it was caused by a housekeep task being triggered whilst other tasks in the cycle were still running. The housekeep task tarred up the log/job/<cycle> dir removing the job status files in the process.

This situation should be handled similarly to the job no longer appearing in the queue, i.e, the job is dead, long live the job. Stick it into the failed/submit-failed state as appropriate.

@oliver-sanders oliver-sanders added the bug Something is wrong :( label Oct 17, 2024
@oliver-sanders oliver-sanders added this to the 8.3.x milestone Oct 17, 2024
@MetRonnie MetRonnie modified the milestones: 8.4.x, 8.4.1 Jan 9, 2025
@wxtim wxtim self-assigned this Jan 23, 2025
@wxtim
Copy link
Member

wxtim commented Jan 24, 2025

@oliver-sanders - do you have a recipe for reproduction of this bug. From the description I tried

[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = task:started => housekeep

[runtime]
    [[task]]
        script = sleep 12
        platform = remote  # Tried this in case it were necessary
    [[housekeep]]
        script = """
            RMTHIS=${CYLC_WORKFLOW_RUN_DIR}/log/job/1/task
            echo "Housekeeping ${RMTHIS}"
            rm -fr "${RMTHIS}"
        """

@oliver-sanders
Copy link
Member Author

No id don't, your example seems lonjg the right lines, you might want to try using a batch system rather than background as the polling is different.

@wxtim
Copy link
Member

wxtim commented Jan 28, 2025

Finally have a replicable example (Thank you @oliver-sanders)

[scheduling]
    cycling mode = integer
    [[graph]]
        R1 = task

[runtime]
    [[task]]
        script = """
            rm ${CYLC_WORKFLOW_RUN_DIR}/.service/contact
            rm -r "${CYLC_WORKFLOW_RUN_DIR}/log/job/${CYLC_TASK_CYCLE_POINT}/${CYLC_TASK_NAME}"
        """
        platform = _remote_pbs

@wxtim wxtim linked a pull request Jan 28, 2025 that will close this issue
8 tasks
@wxtim wxtim linked a pull request Jan 29, 2025 that will close this issue
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants