Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call zesInit() instead of setting ZES_ENABLE_SYSMAN from a constructor function #687

Closed
al45tair opened this issue Sep 2, 2024 · 26 comments

Comments

@al45tair
Copy link

al45tair commented Sep 2, 2024

It looks like hwloc changes the environment from a constructor function; this seems unpleasant — it means that very likely the first thing your program will do is allocate memory and copy the environment.

Setting ZES_ENABLE_SYSMAN=1 explicitly as a workaround will stop this behaviour, but it would be better if hwloc called zesInit(0) rather than changing the environment.

@al45tair
Copy link
Author

al45tair commented Sep 2, 2024

I note intel/compute-runtime#650, which might be the reason hwloc was not using zesInit().

(I'm not a huge fan of calling zesInit() from a constructor function either, actually, but that seems better than changing the environment.)

@bgoglin
Copy link
Contributor

bgoglin commented Sep 2, 2024

Hello. Yes this is unpleasant, and it caused many complains, veeeery long discussion with ZE developers, but we had no way to do otherwise :/

The compute-runtime issue is indeed the main reason while we didn't use zesInit() yet. Supporting both zesInit() and no-zesInit() in the same code looked like a big mess. Hence the plan was to just wait, assume everybody switched to a modern compute-runtime with a working zesInit(), and then just drop the constructor entirely.

Then came the issue with ZES and ZE using difference device handles, and conversion routines were still not implemented recently (oneapi-src/level-zero#153).

Also there's a new "unified runtime" in the works, which may make some/most of this work obsolete at some point. Honestly, I was planning to wait until we know where we're going.

Do you need a fix? Or can you live with a workaround (disabling levelzero as build-time, setting ZES_ENABLE_SYSMAN before hwloc does, ...)?

Regarding calling zesInit() inside the constructor, I don't remember if it was required in some cases. My PR was just calling it from the normal hwloc code but it wasn't much tested since zesInit() wasn't implemented at that time.

@al45tair
Copy link
Author

al45tair commented Sep 2, 2024

:-) I think we're both in a similar bind; the Swift runtime does the nasty thing it does to work around a problem with Rosetta and Docker, but that relies on things not changing the environment from constructors (which is a pretty mad thing to be doing, so it seemed a reasonable compromise; we detect the case when the environment has moved and cope with it, so the worst that happens is you end up with no command line arguments). Trouble is, someone linked OpenCV and pulled in hwloc, which then did exactly that :-D

I think setting ZES_ENABLE_SYSMAN=1 in the environment before running their program is a reasonable workaround for people using OpenCV with Swift for now. This probably only affects a relatively small number of people and from what you say, I don't think either we (Swift) or you are in a position to do much to fix it in the short term.

@TApplencourt
Copy link

TApplencourt commented Oct 22, 2024

Hi,

Intel recently updated their documentation about SYSMAN initialization ( https://github.com/intel/compute-runtime/blob/master/programmers-guide/SYSMAN.md )

I'm as happy as you and we (Argonne / Aurora people) try to push Intel to fix some of their hum interesting behavior and implementation choices,

@saik-intel
Copy link

as we documented about Sysman initialization with https://github.com/intel/compute-runtime/blob/master/programmers-guide/SYSMAN.md , we recommend users to switch using zesInit instead of zeInit + ZES_ENABLE_SYSMAN. please let us know what is the issue for HWLOC switching to zesInit

@bgoglin
Copy link
Contributor

bgoglin commented Oct 23, 2024

@saik-intel The doc isn't very clear. The way to find the sysman device handle corresponding to a core device handle is to get the uuid from the core device, call zesDriverGetDeviceByUuidExp() to get the index, and then use that index in the device array returned by zesGetDevice? Is there an easy way to translate between core and sysman driver handles too? Or should I just iterate over each driver until I find the right UUID?

@TApplencourt
Copy link

A nice complexity also arrisse due to the fact that "zesInit + (Legacy mode) Or (Legacy mode) + zesInit" is not supported.

A user code who is using the "Legacy Mode" (aka calling zeInit + setting ZES_ENABLE_SYSMAN=1) and use the C API of hwlock, will need hwlock to not call "zesInit" to avoid a crash...

@AshwinKumarKulkarni
Copy link

AshwinKumarKulkarni commented Oct 23, 2024

@saik-intel The doc isn't very clear. The way to find the sysman device handle corresponding to a core device handle is to get the uuid from the core device, call zesDriverGetDeviceByUuidExp() to get the index, and then use that index in the device array returned by zesGetDevice? Is there an easy way to translate between core and sysman driver handles too? Or should I just iterate over each driver until I find the right UUID?

please use below sample code snippet to map core device handle to sysman device handle and adapt/optimize for your environment. For simplicity sample assumes driver count is 1.
`zes_device_handle_t getSysmandeviceHandleFromCoreHandle(ze_device_handle_t hDevice){

//step 1: Get the UUID of the core device handle
ze_device_properties_t deviceProperties = { ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES };
ze_result_t result = zeDeviceGetProperties(hDevice, &deviceProperties);
if (result != ZE_RESULT_SUCCESS) {
    printf("Error: zeDeviceGetProperties failed, result = %d\n", result);
    return nullptr;
}

//Step 2: copy uuid from ze_device_uuid_t to zes_uuid_t 
zes_uuid_t uuid;
memcpy(uuid.id, deviceProperties.uuid.id, ZE_MAX_DEVICE_UUID_SIZE);

//query sysman device handle using uuid

//Step 3: Query driverHandle for use if not already queried
uint32_t driverCount = 0;
result = zesDriverGet(&driverCount, nullptr);
if (driverCount == 0) {
    printf("Error could not retrieve driver\n");
    exit(-1);
}
ze_driver_handle_t sysmanDriverHandle;
driverCount = 1;
result = zesDriverGet(&driverCount, &sysmanDriverHandle);
if (result != ZE_RESULT_SUCCESS) {
    printf("Error:  zesDriverGet failed, result = %d\n", result);
    return nullptr;
}

//Step 4: Query equivalent sysman device handle using UUID
zes_device_handle_t phSysmanDevice;
ze_bool_t onSubdevice = false;
uint32_t subdeviceId = 0;
result = zesDriverGetDeviceByUuidExp(sysmanDriverHandle, uuid, &phSysmanDevice, &onSubdevice, &subdeviceId);
if (result != ZE_RESULT_SUCCESS) {
    printf("Error: zesDriverGetDeviceByUuidExp failed, result = %d\n", result);
    return nullptr;
}

//Step 5: Return the equivalent sysman device handle
return phSysmanDevice;

}`
The sample utility function should be used after successful zesInit() call.

@bgoglin
Copy link
Contributor

bgoglin commented Oct 24, 2024

A nice complexity also arrisse due to the fact that "zesInit + (Legacy mode) Or (Legacy mode) + zesInit" is not supported.

A user code who is using the "Legacy Mode" (aka calling zeInit + setting ZES_ENABLE_SYSMAN=1) and use the C API of hwlock, will need hwlock to not call "zesInit" to avoid a crash...

I don't get a crash in this case, I get ZE_RESULT_ERROR_UNINITIALIZED from zesInit() (tested with latest compute-runtime on my laptop, and with 24.26.30049.10-950 on the endeavour test cluster). However, I am thinking of just disabling the hwloc level-zero backend whenever I find ZES_ENABLE_SYSMAN=1 in the environment just in case.

@bgoglin
Copy link
Contributor

bgoglin commented Oct 24, 2024

please use below sample code snippet to map core device handle to sysman device handle and adapt/optimize for your environment. For simplicity sample assumes driver count is 1.

Is driver count always going to be 1 in practice? Intel CPU + GPU seem to use the same driver. As long as other GPU vendors don't implement their own ZE driver, we're good?

I don't see any way to match ZE and ZES drivers (I don't see any driver name, ZE reports a driver UUID, ZES doesn't). Does the device UUID somehow contain a driver-specific part that would prevent two drivers from giving the same UUID to two different devices?

@saik-intel
Copy link

A nice complexity also arrisse due to the fact that "zesInit + (Legacy mode) Or (Legacy mode) + zesInit" is not supported.
A user code who is using the "Legacy Mode" (aka calling zeInit + setting ZES_ENABLE_SYSMAN=1) and use the C API of hwlock, will need hwlock to not call "zesInit" to avoid a crash...

I don't get a crash in this case, I get ZE_RESULT_ERROR_UNINITIALIZED from zesInit() (tested with latest compute-runtime on my laptop, and with 24.26.30049.10-950 on the endeavour test cluster). However, I am thinking of just disabling the hwloc level-zero backend whenever I find ZES_ENABLE_SYSMAN=1 in the environment just in case.

if you use ZES_ENABLE_Sysman in environment variable and call zesInit(), it will return error as we want to support only one

@saik-intel
Copy link

A nice complexity also arrisse due to the fact that "zesInit + (Legacy mode) Or (Legacy mode) + zesInit" is not supported.

A user code who is using the "Legacy Mode" (aka calling zeInit + setting ZES_ENABLE_SYSMAN=1) and use the C API of hwlock, will need hwlock to not call "zesInit" to avoid a crash...

As two combination is not supported , use either one mode and i recommend to use only one and if we use LEGACY method, i recommend dont call zesInit()

@saik-intel
Copy link

please use below sample code snippet to map core device handle to sysman device handle and adapt/optimize for your environment. For simplicity sample assumes driver count is 1.

Is driver count always going to be 1 in practice? Intel CPU + GPU seem to use the same driver. As long as other GPU vendors don't implement their own ZE driver, we're good?
[Saikishore] Practically we are seeing 1 driver now but don't hardcode to 1 as given in example

I don't see any way to match ZE and ZES drivers (I don't see any driver name, ZE reports a driver UUID, ZES doesn't). Does the device UUID somehow contain a driver-specific part that would prevent two drivers from giving the same UUID to two different devices?
[SaiKishore] Both ze and zes both Methods reports UUID which is unique ID for device identifying and as per given in example from Ashwin of above will help to map core handle -> Sysman Handle based on UUID of device. At the end of the day you need casting core handle to sysman handle, why do you want to match ze and zes driver, i didn;t get your query ? @bgoglin

@TApplencourt
Copy link

if you use ZES_ENABLE_Sysman in environment variable and call zesInit(), it will return error as we want to support only one
I don't get a crash in this case, I get ZE_RESULT_ERROR_UNINITIALIZED from zesInit() (

You are right. My bad. (Error: zesInit failed, result = 2013265921) It's the cast of of ze / zes handle who lead to segfault.
But this is expected, and as library developers, we should just not rely on that.

Sorry for the noise!

@bgoglin
Copy link
Contributor

bgoglin commented Oct 24, 2024

[SaiKishore] Both ze and zes both Methods reports UUID which is unique ID for device identifying and as per given in example from Ashwin of above will help to map core handle -> Sysman Handle based on UUID of device. At the end of the day you need casting core handle to sysman handle, why do you want to match ze and zes driver, i didn;t get your query ?

@saik-intel The question is how to expand your sample code with multiple drivers. Let's say I have 2 drivers reporting 2 devices each. Am I sure that UUID of these 2*2 devices will all be different so that I can rely on only the device UUID to match ZES device with ZE device?

This really depends how device UUID are built internally. If all drivers build the device UUID as ++, no problem. If some dumb drivers just report UUID=0 for the first device and UUID=1 for the second, we'll get some UUID collisition, and UUID matching won't be unique unless we can also match the ZE driver with the ZES driver.

This question only stands if multiple drivers are going to exist in practice. If some academics start playing with the compute runtime and add another driver with a simple UUID build, the problem may arise. If both your driver and a custom driver build the UUID from the same PCI BDF, they may report the same UUID, problem too.

@saik-intel
Copy link

saik-intel commented Oct 25, 2024

[SaiKishore] Both ze and zes both Methods reports UUID which is unique ID for device identifying and as per given in example from Ashwin of above will help to map core handle -> Sysman Handle based on UUID of device. At the end of the day you need casting core handle to sysman handle, why do you want to match ze and zes driver, i didn;t get your query ?
[saik-Intel] : The L0 Driver supports for multiple drivers and we tested internally, i will ask @AshwinKumarKulkarni to update sample code for multiple driver. Irrespective of how many drivers exists ,you could always rely on UUID which is unique to device.

@saik-intel The question is how to expand your sample code with multiple drivers. Let's say I have 2 drivers reporting 2 devices each. Am I sure that UUID of these 2*2 devices will all be different so that I can rely on only the device UUID to match ZES device with ZE device?

This really depends how device UUID are built internally. If all drivers build the device UUID as ++, no problem. If some dumb drivers just report UUID=0 for the first device and UUID=1 for the second, we'll get some UUID collisition, and UUID matching won't be unique unless we can also match the ZE driver with the ZES driver.
[saik-intel]: UUID is not simple enum to do ++, it is internally HW generated and dumped into register which is exposed to UMD via PMT KMD (telemetry) driver. You can trust UUID which is unique across the devices on the system.

This question only stands if multiple drivers are going to exist in practice. If some academics start playing with the compute runtime and add another driver with a simple UUID build, the problem may arise. If both your driver and a custom driver build the UUID from the same PCI BDF, they may report the same UUID, problem too.
[saik-Intel]: You can use any number of drivers and Device UUID is unique across multiple devices of drivers.

@bgoglin
Copy link
Contributor

bgoglin commented Oct 25, 2024

@saik-intel Thanks, I have the clarification I need then, no need for @AshwinKumarKulkarni to update the code, just add a comment saying that UUID is guaranteed unique across multiple devices and drivers.

@saik-intel
Copy link

saik-intel commented Oct 25, 2024

@bgoglin yes UUID is guaranteed unique across multiple devices and drivers. Can you please close this issue, if it is clarified. Thanks

@bgoglin
Copy link
Contributor

bgoglin commented Oct 25, 2024

This issue will be closed once implemented in hwloc, which will take some days. The plan is to drop support for ZES_ENABLE_SYSMAN entirely and only support on zesInit(). Will publish that in hwloc 2.12.

@bgoglin
Copy link
Contributor

bgoglin commented Nov 8, 2024

FYI there's a candidate implementation in #695 and https://ci.inria.fr/hwloc/job/basic/view/change-requests/job/PR-695/ but I need to find platforms with PVC and a recent enough L0 runtime to really test things.

@TApplencourt
Copy link

TApplencourt commented Nov 8, 2024

I can have a try! Do you have a test-suite for me to run?

@bgoglin
Copy link
Contributor

bgoglin commented Nov 8, 2024

I can have a try! Do you have a test-suite for me to run?

That'd be great. Take the tarball from the ci.inria.fr link above, run ./configure && make && make check, and also run lstopo foo.xml and post the XML here so that I can verify if GPUs and subdevices were properly directed.

@TApplencourt
Copy link

TApplencourt commented Nov 8, 2024

I confirm I worked! Thanks a lot!

**** OpenCL configuration
checking for CL/cl_ext.h... yes
checking for clGetDeviceIDs in -lOpenCL... yes
**** end of OpenCL configuration

**** LevelZero configuration
checking for LEVELZERO... yes
checking for level_zero/zes_api.h... yes
checking for zesDriverGetDeviceByUuidExp... yes # tada
checking for final LEVELZERO support... yes
checking for zeDevicePciGetPropertiesExt in -lze_loader... yes
**** end of LevelZero configuration

make check give an error with infiniband (sorry, was not smart enough to find how to disable it during configure)

../../../tests/hwloc/openfabrics-verbs.c:11:10: fatal error: 'infiniband/verbs.h' file not found
   11 | #include <infiniband/verbs.h>
      |          ^~~~~~~~~~~~~~~~~~~~

But the xml output (renamed as .txt to please github) look good: aurora.xml.txt

The L0 device are present! Success.

It's nice that now the OpenCL are merged in the same PCI ( I think before they where in a separate box). In fact they are the same physical device (they should may / report the same UUID), I guess one possibility will be for the boxes to merged and named ze Coproc 5 / cl coproc . But not that important, and not at all related to this issue

@bgoglin
Copy link
Contributor

bgoglin commented Nov 8, 2024

Thanks! I think everything looks good. OpenCL works better than earlier because there's a new OpenCL extension that works for Intel GPUs too. Regarding InfiniBand, can you look the configure output if verbs.h was found? It's supposed to disable that specific test otherwise.

@TApplencourt
Copy link

Didn't found it indeed:

checking for infiniband/verbs.h... no

Strangely enough, I have it on my path

applenco@aurora-uan-0011:~/hwloc-PR-695-20241108.1320.git3cdfece6> whichinclude "infiniband/verbs.h"
/usr/include/infiniband/verbs.h

config.log:

configure:33558: checking for infiniband/verbs.h
configure:33558: icx -c -O2  conftest.c >&5
configure:33558: $? = 0
configure:33558: result: yes

config.h:

/* Define to 1 if you have the <infiniband/verbs.h> header file. */
#define HAVE_INFINIBAND_VERBS_H 1

@bgoglin
Copy link
Contributor

bgoglin commented Nov 13, 2024

I just merged #695 to close this. I'll release hwloc 2.12 soon with all these changes.

I tried to reproduce the unrelated IB issue from @TApplencourt above, without any success. Feel free to open another issue about it (I guess we'd want the entire configure output and config.log).

@bgoglin bgoglin closed this as completed Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants