Updating HTCondor metrics #23

sebastian-luna-valero · 2024-11-22T15:59:24Z

We are currently doing the metascheduling using the total number of unclaimed CPUs and the total amount of unclaimed RAM across the entire HTCondor cluster. However, @abdulrahmanazab pointed out that this can be problematic and suggested reporting metrics that specify the type of unclaimed slots.
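As a rough illustration of why cluster-wide totals can mislead, here is a small sketch. The slot inventories, the job size, and the helper names `totals` and `fits` are all made up for the example, and it assumes slots cannot be combined to serve a single job; it is my reading of the concern, not necessarily the exact case that was raised.

```python
# Hypothetical inventories of unclaimed slots as (cpus, memory_mb) tuples.
fragmented = [(1, 2048)] * 8      # eight small slots
consolidated = [(8, 16384)]       # one large slot


def totals(slots):
    """Cluster-wide totals, which is what we report today."""
    return sum(c for c, _ in slots), sum(m for _, m in slots)


def fits(slots, cpus, memory_mb):
    """True if at least one single slot can host the whole job."""
    return any(c >= cpus and m >= memory_mb for c, m in slots)


job = (8, 16384)  # hypothetical job request: 8 CPUs, 16384 MB
for name, slots in (("fragmented", fragmented), ("consolidated", consolidated)):
    print(name, "totals:", totals(slots), "job fits:", fits(slots, *job))
```

Both inventories report the same totals, but only the second one can actually run the job, which is the information the metascheduler is missing.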

To make the switch, I suggest we move from the current bash script to the HTCondor Python bindings.

Here is a quick and dirty script for you to test:

import htcondor
import json

collector = htcondor.Collector()

# get CPUs and Mem from unclaimed slots and group them in a dictionary
unclaimed_slots_summary = {}
for slot in collector.query(
    htcondor.AdTypes.Startd,
    constraint='State == "Unclaimed"',
    projection=['Cpus', 'Memory'],
):
    # the JSON string of the projected ad (Cpus + Memory) is the grouping key
    cpu_mem_combo = slot.printJson()
    if cpu_mem_combo not in unclaimed_slots_summary:
        unclaimed_slots_summary[cpu_mem_combo] = 1
    else:
        unclaimed_slots_summary[cpu_mem_combo] += 1

for unclaimed_slot_type in unclaimed_slots_summary:
    classad = json.loads(unclaimed_slot_type)
    print("There are {} slots with {} CPUs and {} memory".format(
        unclaimed_slots_summary[unclaimed_slot_type],
        classad['Cpus'],
        classad['Memory']
    ))

You need to install the htcondor Python package first (it is available on PyPI as htcondor). Could you please run it on your end and let me know the results?

Here is a sample output on the EGI pulsar endpoint:

$ python mon.py
There are 5 slots with 8 CPUs and 15736 memory
There are 1 slots with 6 CPUs and 7800 memory

If it works as expected, we just need to decide how to represent this properly in the InfluxDB line protocol format.
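One possible representation, as a starting point for discussion: emit one line per slot type, with the CPU and memory values as tags and the number of slots as an integer field. The measurement name `htcondor_unclaimed_slots`, the tag names `cpus` and `memory`, the field name `count`, and the helper `to_line_protocol` are assumptions of mine, not something we have agreed on. The sketch takes a plain dictionary keyed by (Cpus, Memory) so it can be tried without a running collector.

```python
def to_line_protocol(summary, measurement="htcondor_unclaimed_slots"):
    """Format {(cpus, memory): slot_count} as InfluxDB line protocol lines.

    Measurement, tag, and field names are placeholders, not an agreed schema.
    """
    lines = []
    for (cpus, memory), count in sorted(summary.items()):
        # tags identify the slot type; the "i" suffix marks an integer field
        lines.append(f"{measurement},cpus={cpus},memory={memory} count={count}i")
    return lines


# Values taken from the sample output on the EGI pulsar endpoint.
summary = {(8, 15736): 5, (6, 7800): 1}
for line in to_line_protocol(summary):
    print(line)
```

No timestamp is appended here, so the receiving side would assign one on arrival. Using tags for the slot type keeps each CPU/memory combination as its own series, which may or may not be what we want if the number of distinct slot types grows.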
