Inconsistent GPU Memory Usage Reporting Between dpctl and xpu-smi #1761
Comments
@avimanyu786 Please provide information about your GPU driver, e.g., the output of:
Thank you for the assistance, @oleksandr-pavlyk! I will post the output as soon as possible.
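For reference, the requested driver details can also be read directly through dpctl; a minimal sketch, assuming a default GPU device is visible to the runtime (not necessarily the exact command requested above):

# Sketch: query device name and driver version via dpctl.
import dpctl

dev = dpctl.SyclDevice("gpu")
print("Device:", dev.name)
print("Driver:", dev.driver_version)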
Actually, you have already provided the information, @avimanyu786:
While attempting to reproduce the reported behavior, I used
So I assume an explanation for the discrepancy is that:
I see. In that case, perhaps we could use a different approach: we could try to use the watch command with xpu-smi and then recheck with dpctl from a different terminal?
Just to confirm, in one terminal I started:
In another terminal I executed:
and:
These figures are roughly consistent, accounting for some GPU global memory used by the driver.
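For completeness, the two-terminal check can be scripted roughly as follows (a sketch, assuming dpctl is installed, ZES_ENABLE_SYSMAN=1 is exported before starting Python, and xpu-smi is being watched from a second terminal, e.g. with watch -n 1 xpu-smi stats -d 0):

# Allocate ~1 GiB with dpctl and keep the process alive so the allocation
# can be observed from another terminal running xpu-smi.
import time

import dpctl.tensor as dpt
import dpctl.utils as du

x = dpt.empty(2**26, dtype="i8")  # 512 MiB
y = dpt.empty(2**26, dtype="i8")  # 512 MiB

info = du.intel_device_info(x.sycl_device)
used = x.sycl_device.global_mem_size - info["free_memory"]
print(f"dpctl-implied used memory: {used / 2**20:.1f} MiB")

time.sleep(60)  # observation window for xpu-smi in the other terminal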
@oleksandr-pavlyk Thanks so much for these confirmations! I will test from my end when I have access to my machine and report back if I face any issues.
I have conducted the tests (this time directly on the host) as suggested and observed the following results.

Terminal 1:
(dpctl_env) root@Rig3073250:/home/user# export ZES_ENABLE_SYSMAN=1
(dpctl_env) root@Rig3073250:/home/user# python3.10
Python 3.10.14 (main, Apr 6 2024, 18:45:05) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dpctl.tensor as dpt
>>> x = dpt.empty(2**26, dtype="i8")
>>> y = dpt.empty(2**26, dtype="i8")
>>> (y.nbytes + x.nbytes) / (1024 * 1024 * 1024)
1.0
>>> import dpctl.utils as du
>>> device_info = du.intel_device_info(x.sycl_device)
>>> print(device_info)
{'device_id': 22176, 'gpu_eu_count': 512, 'gpu_hw_threads_per_eu': 8, 'gpu_eu_simd_width': 8, 'gpu_slices': 1, 'gpu_subslices_per_slice': 32, 'gpu_eu_count_per_subslice': 16, 'free_memory': 16225243136, 'memory_bus_width': 64}
>>> used_memory = x.sycl_device.global_mem_size - device_info['free_memory']
>>> print(used_memory)
0
>>> (y.nbytes + x.nbytes)
1073741824

Terminal 2:
root@Rig3073250:/home/user# xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | N/A |
| EU Array Active (%) | N/A |
| EU Array Stall (%) | N/A |
| EU Array Idle (%) | N/A |
| | |
| Compute Engine Util (%) | Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0 |
| Render Engine Util (%) | Engine 0: 0 |
| Media Engine Util (%) | N/A |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | Engine 0: 0 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | N/A |
+-----------------------------+--------------------------------------------------------------------+
| Reset | N/A |
| Programming Errors | N/A |
| Driver Errors | N/A |
| Cache Errors Correctable | N/A |
| Cache Errors Uncorrectable | N/A |
| Mem Errors Correctable | N/A |
| Mem Errors Uncorrectable | N/A |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 38 |
| GPU Frequency (MHz) | 1000 |
| Media Engine Freq (MHz) | N/A |
| GPU Core Temperature (C) | N/A |
| GPU Memory Temperature (C) | N/A |
| GPU Memory Read (kB/s) | N/A |
| GPU Memory Write (kB/s) | N/A |
| GPU Memory Bandwidth (%) | N/A |
| GPU Memory Used (MiB) | 18 |
| GPU Memory Util (%) | 0 |
| Xe Link Throughput (kB/s) | N/A |
+-----------------------------+--------------------------------------------------------------------+
root@Rig3073250:/home/user# xpu-smi ps
PID Command DeviceID SHR MEM Summary
Additional Observation

Upon exiting the Python console after the test, the xpu-smi readings looked the same. Despite allocating memory in the Python script using dpctl, xpu-smi does not report a corresponding increase in used memory.

UPDATE

Further investigated with PyOpenCL:

import pyopencl as cl
import numpy as np
import time
# Create OpenCL context and queue
platforms = cl.get_platforms()
gpu_devices = [d for p in platforms for d in p.get_devices(device_type=cl.device_type.GPU)]
if not gpu_devices:
print("No GPU devices found.")
exit()
device = gpu_devices[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)
# Allocate memory on the GPU
buffer_size = 2**26 # 64 MiB per buffer
mf = cl.mem_flags
buffer1 = cl.Buffer(context, mf.READ_WRITE, size=buffer_size)
buffer2 = cl.Buffer(context, mf.READ_WRITE, size=buffer_size)
allocated_memory_mib = (buffer_size * 2) / (1024 * 1024)
print(f"Allocated memory: {allocated_memory_mib:.2f} MiB")
# Initialize data to write to buffers
host_data = np.random.rand(buffer_size // 4).astype(np.float32)
# Write data to the buffers
cl.enqueue_copy(queue, buffer1, host_data)
cl.enqueue_copy(queue, buffer2, host_data)
queue.finish()
# Pause for 30 seconds to allow xpu-smi observation
print("Memory allocated. Pausing for 30 seconds for observation with xpu-smi...")
time.sleep(30)
# Perform a simple computation to ensure buffers are used
program_src = """
__kernel void add(__global const float *a, __global const float *b, __global float *c) {
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
}
"""
program = cl.Program(context, program_src).build()
result_buffer = cl.Buffer(context, mf.WRITE_ONLY, size=buffer_size)
program.add(queue, host_data.shape, None, buffer1, buffer2, result_buffer)
queue.finish()
print("Performed computation on the GPU.") Output:
So at this point, xpu-smi seems to be working correctly. When I tried to check with dpctl while the PyOpenCL program was running, I was still getting the same output:
Environment
Please let me know if any further information or testing is required.
From the overall testing on my machine so far, it looks like both on the host and in Docker, the
@avimanyu786 Could you compile the following C++ executable and check whether its output is consistent with the output of dpctl?

// icpx -fsycl mem.cpp -o mem.x
#include <iostream>
#include <vector>
#include <string>
#include <sycl/sycl.hpp>
int main(void) {
sycl::queue q{sycl::default_selector_v};
const sycl::device &dev = q.get_device();
const std::string &dev_name = dev.get_info<sycl::info::device::name>();
const std::string &driver_ver = dev.get_info<sycl::info::device::driver_version>();
std::cout << "Device: " << dev_name << " [" << driver_ver << "]" << std::endl;
auto global_mem_size = dev.get_info<sycl::info::device::global_mem_size>();
std::cout << "Global device memory size: " << global_mem_size << " bytes" << std::endl;
if (dev.has(sycl::aspect::ext_intel_free_memory)) {
auto free_memory = dev.get_info<sycl::ext::intel::info::device::free_memory>();
std::cout << "Free memory: " << free_memory << " bytes" << std::endl;
std::cout << "Implied memory in use: " << global_mem_size - free_memory << " bytes" << std::endl;
} else {
std::cout << "Free memory descriptor is not available" << std::endl;
}
return 0;
}

Once compiled, execute it with ZES_ENABLE_SYSMAN=1 set, as shown below.

This is what I observe while the Python script allocating memory is running:

$ ZES_ENABLE_SYSMAN=1 ./mem
Device: Intel(R) Data Center GPU Max 1100 [7.66.28691]
Global device memory size: 51539607552 bytes
Free memory: 50416521216 bytes
Implied memory in use: 1123086336 bytes

If the native application also reports the same value as dpctl, then the behavior is not specific to dpctl or Python.
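For a side-by-side comparison, the dpctl counterpart of this C++ check can be sketched as follows (assumptions: the same environment variable is set, and the free_memory key may be absent when the descriptor is not available, mirroring the aspect check above):

# Sketch: dpctl equivalent of mem.cpp, printing the same quantities.
import dpctl
import dpctl.utils as du

dev = dpctl.SyclDevice("gpu")
print("Device:", dev.name, "[", dev.driver_version, "]")
print("Global device memory size:", dev.global_mem_size, "bytes")

free = du.intel_device_info(dev).get("free_memory")
if free is None:
    print("Free memory descriptor is not available")
else:
    print("Free memory:", free, "bytes")
    print("Implied memory in use:", dev.global_mem_size - free, "bytes")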
Hi @oleksandr-pavlyk, based on the above suggestion, I faced the same issue on the host machine:

Device: Intel(R) Arc(TM) A770 Graphics [1.3.27642]
Global device memory size: 16225243136 bytes
Free memory: 16225243136 bytes
Implied memory in use: 0 bytes

After switching to HiveOS (based on Ubuntu 22.04), I'm facing the same issue, even after upgrading the driver from 1.3.27642 to 1.3.29735:

Device: Intel(R) Arc(TM) A770 Graphics [1.3.29735]
Global device memory size: 16225243136 bytes
Free memory: 16225243136 bytes
Implied memory in use: 0 bytes

To update the driver on the host, I followed the client GPU documentation for Intel Arc.

Summary
The output for the Intel Data Center GPU Max 1100 shows a different driver version, 7.66.28691. This driver might include features or fixes that are not present in the driver versions available for the Intel Arc A770, which could explain the discrepancy in reported free memory. I'll wait for your further feedback. Thanks.
It may be that the discrepancy is indeed explained by the driver. In that case, one should file an issue with https://github.com/intel/compute-runtime and provide this C++ reproducer, the driver version, the OS version, and the compiler version. I do not think the behavior you are witnessing is caused by an issue with Python, as you have confirmed by running a stand-alone executable compiled from C++ code.
Many thanks, @oleksandr-pavlyk, for following up on this issue! I have filed the corresponding issue in the compute-runtime repository: intel/compute-runtime#750
For added context, there is a Python file called:
Description

When using dpctl to report GPU memory usage on Intel GPUs, the reported free and total memory values appear to be incorrect when compared to the output from xpu-smi. Specifically, dpctl reports 0 bytes of used memory, while xpu-smi correctly reports the used memory as 17 MiB.

Steps to Reproduce
1. Ensure dpctl and xpu-smi are installed.
2. Check the GPU memory usage with dpctl (a reconstruction of this check is sketched after this list):
3. Run xpu-smi stats -d 0:
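A sketch of the dpctl check referenced in step 2 (reconstructed from the interactive session shown earlier in this thread; sizes and dtype are assumptions):

# Reconstruction of the dpctl-side check; sizes and dtype follow the session above.
import dpctl.tensor as dpt
import dpctl.utils as du

x = dpt.empty(2**26, dtype="i8")
y = dpt.empty(2**26, dtype="i8")

info = du.intel_device_info(x.sycl_device)
print("free_memory:", info["free_memory"])
print("implied used:", x.sycl_device.global_mem_size - info["free_memory"], "bytes")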
Observed Behavior

dpctl:

Also,
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
shows the following output which matches the total memory:

xpu-smi:

Expected Behavior
The used memory reported by dpctl should match the used memory reported by xpu-smi.

Environment

Additional Information
Setting the environment variable ZES_ENABLE_SYSMAN to 1 was necessary, as mentioned in the documentation, to report the free_memory (a sketch of setting it from Python follows below). The discrepancy in reported values suggests a potential issue within the dpctl library or its interaction with the GPU drivers.
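A sketch of making sure the variable is set when working from Python (assumption on my part: it needs to be set before the first dpctl/Level Zero call in the process; exporting it in the shell beforehand, as done elsewhere in this thread, works equally well):

# Sketch: set ZES_ENABLE_SYSMAN before dpctl initializes the Level Zero runtime.
import os
os.environ["ZES_ENABLE_SYSMAN"] = "1"  # assumption: must precede first dpctl use

import dpctl
import dpctl.utils as du

print(du.intel_device_info(dpctl.SyclDevice("gpu")).get("free_memory"))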
Further information on OS:

The container was launched through the following command:
docker run -ti --cap-add=PERFMON --device /dev/dri intel/intel-extension-for-pytorch:2.1.30-xpu bash
The intel-basekit (provides the necessary SYCL runtime and development tools for dpctl) and xpu-smi packages were installed with the following commands before testing the issue inside the container:

Proposed Solution
Investigate and resolve the inconsistency in GPU memory reporting between dpctl and xpu-smi. Ensure that dpctl accurately reflects the actual GPU memory usage.

Thank you for looking into this issue. Please let me know if further information or testing is required.