crab kill broken in v3.250109 #8900

Closed
belforte opened this issue Jan 29, 2025 · 0 comments

I only noticed while testing new (extensive) changes to DagmanCreator and DagmanSubmitter that crab kill fails, as indicated in #8893 (comment) (BUG 5).

The bug is independent of those changes and was already present in v3.250109, which we have had in preprod since then and, luckily, have not deployed in production yet.

We need to add crab kill to our validation.

The problem is these lines:

if not self.task['tw_name'] or not self.task['clusterid']:
    self.logger.info("Task %s was not submitted to HTCondor scheduler yet", self.workflow)
    return

introduced in 8ed2195 to fix #8874

The problem is that the task dictionary at this point in the code is what MasterWorker retrieved via a GET to workflowdb REST API in

def getWork(self, limit, getstatus, ignoreTWName=False):

and that API as per
def get(self, workername, getstatus, limit):
    """ Retrieve all columns for a specified task or
        tasks which are in a particular status with
        particular conditions """

uses the SQL from Task.GetReadyTasks_sql, which retrieves only a subset of the DB Task table columns. clusterid is not one of them, in spite of what the docstring in the API above states. 😠

Of course this must be because initially the tm_* columns were "all the information"; then developers added more columns, and not all SQL statements were updated. I do not know if it is possible to have code which retrieves all columns and properly names them, to avoid the need to list column names in so many places...
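On the question of retrieving all columns with proper names: a minimal sketch of one way this could work, relying on `cursor.description` from the Python DB-API (PEP 249). It uses the standard-library `sqlite3` as a stand-in for the real database, and the table name, column set, values and the helper `fetch_rows_as_dicts` are illustrative assumptions, not CRABServer code.

```python
import sqlite3


def fetch_rows_as_dicts(cursor, sql, params=()):
    """Run a query and return each row as a {column_name: value} dict.

    Column names come from cursor.description, so the caller does not
    need to keep a hand-written list of columns in sync with the table.
    """
    cursor.execute(sql, params)
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# Hypothetical table standing in for the Task table (assumed columns).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE tasks ("
    "tm_taskname TEXT, tm_task_status TEXT, tw_name TEXT, clusterid INTEGER)"
)
cur.execute(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    ("task_A", "KILL", "crab-tw01", 1234),
)

# SELECT * picks up every column, including ones added after tm_*.
rows = fetch_rows_as_dicts(
    cur, "SELECT * FROM tasks WHERE tm_task_status = ?", ("KILL",)
)
print(rows[0]["clusterid"])
```

Other DB-API drivers expose `cursor.description` too, although the case of the returned column names may differ from what the SQL used, so a real implementation would probably need to normalize the keys.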

Anyhow, I do not feel like changing the REST code now. I will add a call in DagmanKiller to retrieve the full information about the task in order to check clusterid.
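The workaround could look roughly like the sketch below. It is only an illustration of the idea: `ensure_full_task`, `is_submitted`, `fake_fetch` and the sample data are hypothetical names and values, and the actual call that DagmanKiller would use to fetch the full task record is not shown in this issue.

```python
def ensure_full_task(task, fetch_full_task):
    """Return a task dict that contains 'clusterid'.

    If the dict handed over by the caller lacks the key, fetch the full
    record with the supplied callable (hypothetical) and merge it in.
    """
    if "clusterid" in task:
        return task
    full = dict(task)
    full.update(fetch_full_task(task["tm_taskname"]))
    return full


def is_submitted(task):
    """Same check as the guard in the issue, tolerant of missing keys."""
    return bool(task.get("tw_name")) and bool(task.get("clusterid"))


# Stand-in for the full task record; the values are made up.
FAKE_DB = {
    "task_A": {"tm_taskname": "task_A", "tw_name": "crab-tw01", "clusterid": 1234},
}


def fake_fetch(taskname):
    """Hypothetical replacement for the call that returns all task columns."""
    return FAKE_DB[taskname]


# What the worker receives: a subset of the columns, without clusterid.
partial = {"tm_taskname": "task_A", "tw_name": "crab-tw01"}
full = ensure_full_task(partial, fake_fetch)
print(is_submitted(partial), is_submitted(full))
```

Passing the fetch function as an argument keeps the check testable without a running REST service.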

belforte added a commit to belforte/CRABServer that referenced this issue Jan 29, 2025
belforte changed the title from "crab kill brokenb in v3.250109" to "crab kill broken in v3.250109" Jan 29, 2025
belforte self-assigned this Jan 29, 2025