Hi @pauldg and @sanjaysrikakulam,
We currently do the metascheduling based on the total number of unclaimed CPUs and the total amount of unclaimed RAM across the entire HTCondor cluster. However, @abdulrahmanazab pointed out that this can be problematic and suggested reporting metrics that specify the type of unclaimed slots.
To make the switch, I suggest we move from the current bash script to the Python bindings.
Here is a quick and dirty script for you to test:
    import htcondor
    import json

    collector = htcondor.Collector()

    # get CPUs and Mem from unclaimed slots and group them in a dictionary
    unclaimed_slots_summary = {}
    for slot in collector.query(htcondor.AdTypes.Startd, constraint='State == "Unclaimed"', projection=['Cpus', 'Memory']):
        # the JSON string of the projected ad serves as the grouping key
        cpu_mem_combo = slot.printJson()
        if cpu_mem_combo not in unclaimed_slots_summary:
            unclaimed_slots_summary[cpu_mem_combo] = 1
        else:
            unclaimed_slots_summary[cpu_mem_combo] += 1

    # print one line per distinct CPU/memory combination
    for unclaimed_slot_type in unclaimed_slots_summary:
        classad = json.loads(unclaimed_slot_type)
        print("There are {} slots with {} CPUs and {} memory".format(
            unclaimed_slots_summary[unclaimed_slot_type],
            classad['Cpus'],
            classad['Memory']
        ))
You need to install the htcondor package. Could you please run it on your end and let me know the results?
Here is a sample output on the EGI pulsar endpoint:
$ python mon.py
There are 5 slots with 8 CPUs and 15736 memory
There are 1 slots with 6 CPUs and 7800 memory
If it works as expected, we just need to decide how to represent this properly in the InfluxDB line protocol format.
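As a starting point for that discussion, here is a minimal sketch of one possible representation. It is only a suggestion: the measurement name `htcondor_unclaimed_slots`, the tag keys `cpus` and `memory`, the field key `count`, and the helper `to_line_protocol` are all hypothetical names chosen for illustration. It assumes the slot counts have already been collected into a dictionary keyed by `(cpus, memory)`, so it runs without the htcondor package.

```python
def to_line_protocol(slot_counts, measurement="htcondor_unclaimed_slots"):
    """Render {(cpus, memory): count} as InfluxDB line protocol lines.

    The measurement name and the tag/field keys are assumptions,
    not an agreed-upon schema.
    """
    lines = []
    for (cpus, memory), count in sorted(slot_counts.items()):
        # Tags identify the slot type; the integer field (suffix "i")
        # holds the number of unclaimed slots of that type.
        lines.append(
            "{},cpus={},memory={} count={}i".format(measurement, cpus, memory, count)
        )
    return lines


# Sample data taken from the EGI pulsar endpoint output above.
sample = {(8, 15736): 5, (6, 7800): 1}
for line in to_line_protocol(sample):
    print(line)
```

No timestamp is appended, so the receiving side would assign one on write. Putting the slot shape in tags keeps one series per slot type, which may or may not be what we want if the number of distinct slot shapes grows.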