
Inconsistent GPU Memory Usage Reporting Between dpctl and xpu-smi #1761

Open
avimanyu786 opened this issue Jul 26, 2024 · 14 comments

@avimanyu786

avimanyu786 commented Jul 26, 2024

Description

When using dpctl to report GPU memory usage on Intel GPUs, the reported free and total memory values appear to be incorrect when compared to the output from xpu-smi. Specifically, dpctl reports 0 bytes of used memory, while xpu-smi correctly reports the used memory as 17 MiB.

Steps to Reproduce

  1. Set up an environment with dpctl and xpu-smi installed.
  2. Use the following Python script to get GPU memory information using dpctl:
import os
import dpctl
from dpctl.utils import intel_device_info

def get_intel_gpu_memory_info():
    try:
        # Set the environment variable ZES_ENABLE_SYSMAN to 1
        os.environ["ZES_ENABLE_SYSMAN"] = "1"
        
        # Get the list of GPU devices
        devices = dpctl.get_devices(device_type=dpctl.device_type.gpu)
        for device in devices:
            # Get Intel GPU device info
            device_info = intel_device_info(device)
            if device_info:
                free_memory = device_info.get('free_memory', None)
                if free_memory is not None:
                    free_memory_mib = free_memory / (1024 * 1024)
                    print(f"Free Memory: {free_memory_mib:.2f} MiB")

                # Get the total global memory size
                try:
                    global_mem_size = device.get_info(dpctl.device_info.global_mem_size)
                except AttributeError:
                    global_mem_size = device.global_mem_size

                global_mem_size_mib = global_mem_size / (1024 * 1024)
                print(f"Total Memory: {global_mem_size_mib:.2f} MiB")

                # Calculate and display used memory
                if free_memory is not None and global_mem_size is not None:
                    used_memory = global_mem_size - free_memory
                    used_memory_mib = used_memory / (1024 * 1024)
                    print(f"Used Memory: {used_memory_mib:.2f} MiB")
                else:
                    print("Unable to calculate used memory due to missing information.")

                return
        print("No Intel GPU devices found or no information available.")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    get_intel_gpu_memory_info()
  3. Compare the output with the results of running xpu-smi stats -d 0:
xpu-smi stats -d 0

Observed Behavior

  • Output from the Python script using dpctl:
Free Memory: 15473.60 MiB
Total Memory: 15473.60 MiB
Used Memory: 0.00 MiB

Also, python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];" shows the following output, which matches the total memory:

2.1.0.post2+cxx11.abi
2.1.30+xpu
[0]: _DeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu', driver_version='1.3.27642', has_fp64=0, total_memory=15473MB, max_compute_units=512, gpu_eu_count=512)
  • Output from xpu-smi:
+-----------------------------+--------------------------------------------------------------------+
| Device ID                   | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+
| GPU Memory Used (MiB)       | 17                                                                 |
| GPU Memory Util (%)         | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+

Expected Behavior

The used memory reported by dpctl should match the used memory reported by xpu-smi.

Environment

  • dpctl version: 0.17.0
  • xpu-smi version: 1.2.38.20240718
  • OS: HiveOS [Based on Ubuntu 20.04]
  • Docker version: 24.0.7, build 24.0.7-0ubuntu2~20.04.1
  • Docker image: intel/intel-extension-for-pytorch:2.1.30-xpu
  • Python version: 3.10.12
  • GPU: Intel(R) Arc(TM) A770 Graphics

Additional Information

Setting the environment variable ZES_ENABLE_SYSMAN to 1 was necessary, as mentioned in the documentation, for free_memory to be reported. The discrepancy in reported values suggests a potential issue within the dpctl library or in its interaction with the GPU drivers.
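For anyone reproducing this, a minimal sketch of the check (exporting the variable before the interpreter starts removes any question of whether the runtime was initialized before the script assigned it):

# Run as: ZES_ENABLE_SYSMAN=1 python free_mem_check.py
import dpctl
from dpctl.utils import intel_device_info

dev = dpctl.SyclDevice("gpu")
info = intel_device_info(dev)
free = info.get("free_memory")  # key is absent if the driver does not report it
total = dev.global_mem_size
print(f"total={total} bytes, free={free} bytes")
if free is not None:
    print(f"implied used={total - free} bytes")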

Further information on OS:

# uname -r
6.1.0-hiveos
# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.6 LTS
Release:	20.04
Codename:	focal

The container was launched through the following command:

docker run -ti --cap-add=PERFMON --device /dev/dri intel/intel-extension-for-pytorch:2.1.30-xpu bash

The intel-basekit package (which provides the necessary SYCL runtime and development tools for dpctl) and xpu-smi were installed with the following commands before testing the issue inside the container:

wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
wget https://github.com/intel/xpumanager/releases/download/V1.2.38/xpu-smi_1.2.38_20240718.060204.0db09695+deb10u1_amd64.deb
apt update && apt install -y ./xpu-smi_1.2.38_20240718.060204.0db09695+deb10u1_amd64.deb intel-basekit
source /opt/intel/oneapi/setvars.sh

Proposed Solution

Investigate and resolve the inconsistency in GPU memory reporting between dpctl and xpu-smi. Ensure that dpctl accurately reflects the actual GPU memory usage.


Thank you for looking into this issue. Please let me know if further information or testing is required.

@oleksandr-pavlyk
Collaborator

@avimanyu786 Please provide information about your GPU driver, e.g., output of python -m dpctl -f.

@avimanyu786
Author

> @avimanyu786 Please provide information about your GPU driver, e.g., output of python -m dpctl -f.

Thank you for the assistance, @oleksandr-pavlyk! I will post the output as soon as possible.

@oleksandr-pavlyk
Collaborator

Actually, you have already provided the information @avimanyu786 : driver_version='1.3.27642'

@oleksandr-pavlyk
Collaborator

While attempting to reproduce the reported behavior, I used xpu-smi on a machine where the GPU is not utilized (I used xpu-smi ps to verify that only xpu-smi was using the GPU), and xpu-smi reported a non-zero GPU memory footprint:

$ xpu-smi ps
PID       Command             DeviceID       SHR            MEM
883105    xpu-smi             0              0              2228

$ sudo xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID                   | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%)         | 0                                                                  |
| EU Array Active (%)         | N/A                                                                |
| EU Array Stall (%)          | N/A                                                                |
| EU Array Idle (%)           | N/A                                                                |
|                             |                                                                    |
| Compute Engine Util (%)     | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0              |
| Render Engine Util (%)      | N/A                                                                |
| Media Engine Util (%)       | N/A                                                                |
| Decoder Engine Util (%)     | N/A                                                                |
| Encoder Engine Util (%)     | N/A                                                                |
| Copy Engine Util (%)        | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0              |
|                             | Engine 4: 0, Engine 5: 0                                           |
| Media EM Engine Util (%)    | N/A                                                                |
| 3D Engine Util (%)          | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| Reset                       | N/A                                                                |
| Programming Errors          | N/A                                                                |
| Driver Errors               | N/A                                                                |
| Cache Errors Correctable    | N/A                                                                |
| Cache Errors Uncorrectable  | N/A                                                                |
| Mem Errors Correctable      | N/A                                                                |
| Mem Errors Uncorrectable    | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W)               | 31                                                                 |
| GPU Frequency (MHz)         | 1550                                                               |
| Media Engine Freq (MHz)     | N/A                                                                |
| GPU Core Temperature (C)    | N/A                                                                |
| GPU Memory Temperature (C)  | N/A                                                                |
| GPU Memory Read (kB/s)      | N/A                                                                |
| GPU Memory Write (kB/s)     | N/A                                                                |
| GPU Memory Bandwidth (%)    | N/A                                                                |
| GPU Memory Used (MiB)       | 28                                                                 |
| GPU Memory Util (%)         | 0                                                                  |
| Xe Link Throughput (kB/s)   | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+

So I assume an explanation for the discrepancy is that xpu-smi itself uses some amount of GPU global memory.

@avimanyu786
Author

I see. In that case, perhaps we could use a different approach: run the watch command with xpu-smi in one terminal and then recheck with dpctl from a different terminal?
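A minimal sketch of the allocation side of that cross-check (assuming ZES_ENABLE_SYSMAN=1 is exported before Python starts; the pause only leaves time to read watch -n 1 xpu-smi stats -d 0 in the other terminal):

# Terminal 1: ZES_ENABLE_SYSMAN=1 python allocate_and_hold.py
# Terminal 2: watch -n 1 xpu-smi stats -d 0
import time

import dpctl.tensor as dpt
import dpctl.utils as du

x = dpt.empty(2**26, dtype="i8")  # 512 MiB of int64
y = dpt.empty(2**26, dtype="i8")  # another 512 MiB
print(f"Allocated {(x.nbytes + y.nbytes) / 2**20:.0f} MiB on {x.sycl_device}")

info = du.intel_device_info(x.sycl_device)
total = x.sycl_device.global_mem_size
free = info.get("free_memory")
used = total - free if free is not None else None
print(f"dpctl: total={total} bytes, free={free} bytes, implied used={used} bytes")

time.sleep(60)  # keep the allocations alive while xpu-smi is being watched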

@oleksandr-pavlyk
Collaborator

Just to confirm, in one terminal I started dpctl (and set ZES_ENABLE_SYSMAN=1) and executed:

In [1]: import dpctl.tensor as dpt

In [2]: x = dpt.empty(2**26, dtype="i8")

In [3]: y = dpt.empty(2**26, dtype="i8")

In [4]: (y.nbytes + x.nbytes) / (1024 * 1024 * 1024)
Out[4]: 1.0

In [5]: import dpctl.utils as du

In [6]: du.intel_device_info(x.sycl_device)
Out[6]:
{'device_id': 3034,
 'gpu_eu_count': 448,
 'gpu_hw_threads_per_eu': 8,
 'gpu_eu_simd_width': 16,
 'gpu_slices': 1,
 'gpu_subslices_per_slice': 56,
 'gpu_eu_count_per_subslice': 8,
 'free_memory': 50417606656,
 'memory_clock_rate': 3200,
 'memory_bus_width': 64}

In [7]: x.sycl_device.global_mem_size - Out[6]['free_memory']
Out[7]: 1122000896

In [8]: (y.nbytes + x.nbytes)
Out[8]: 1073741824

In another terminal I executed sudo xpu-smi stats -d 0 which showed:

| GPU Memory Used (MiB)       | 1055                                                               |
| GPU Memory Util (%)         | 2                                                                  |

and xpu-smi ps showed:

$ xpu-smi ps
PID       Command             DeviceID       SHR            MEM
885596    ipython             0              0              1081212
885843    xpu-smi             0              0              2228

These figures are roughly consistent once some GPU global memory used by the driver is accounted for.
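For reference, the arithmetic roughly lines up (treating the MEM column of xpu-smi ps as KiB, which is what the numbers suggest):

allocated = 2 * (2**26) * 8   # two int64 tensors of 2**26 elements = 1073741824 bytes
print(allocated / 2**20)      # 1024.0 MiB allocated by dpctl.tensor
print(1122000896 / 2**20)     # ~1070.0 MiB implied used by dpctl
print(1081212 / 2**10)        # ~1055.9 MiB for the ipython process per xpu-smi ps
# The remaining ~30-46 MiB spread is in the same ballpark as the ~28 MiB idle footprint shown above.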

@avimanyu786
Author

@oleksandr-pavlyk Thanks so much for these confirmations! I will test when I have access to my machine and report back if I face any issues.

@avimanyu786
Author

avimanyu786 commented Jul 29, 2024

Hi @oleksandr-pavlyk,

I have conducted the tests (this time directly on the host) as suggested and observed the following results.

Terminal 1

(dpctl_env) root@Rig3073250:/home/user# export ZES_ENABLE_SYSMAN=1
(dpctl_env) root@Rig3073250:/home/user# python3.10
Python 3.10.14 (main, Apr  6 2024, 18:45:05) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dpctl.tensor as dpt
>>> x = dpt.empty(2**26, dtype="i8")
>>> y = dpt.empty(2**26, dtype="i8")
>>> (y.nbytes + x.nbytes) / (1024 * 1024 * 1024)
1.0
>>> import dpctl.utils as du
>>> device_info = du.intel_device_info(x.sycl_device)
>>> print(device_info)
{'device_id': 22176, 'gpu_eu_count': 512, 'gpu_hw_threads_per_eu': 8, 'gpu_eu_simd_width': 8, 'gpu_slices': 1, 'gpu_subslices_per_slice': 32, 'gpu_eu_count_per_subslice': 16, 'free_memory': 16225243136, 'memory_bus_width': 64}
>>> used_memory = x.sycl_device.global_mem_size - device_info['free_memory']
>>> print(used_memory)
0
>>> (y.nbytes + x.nbytes)
1073741824

Terminal 2

root@Rig3073250:/home/user# xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID                   | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%)         | N/A                                                                |
| EU Array Active (%)         | N/A                                                                |
| EU Array Stall (%)          | N/A                                                                |
| EU Array Idle (%)           | N/A                                                                |
|                             |                                                                    |
| Compute Engine Util (%)     | Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0                 |
| Render Engine Util (%)      | Engine 0: 0                                                        |
| Media Engine Util (%)       | N/A                                                                |
| Decoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Encoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Copy Engine Util (%)        | Engine 0: 0                                                        |
| Media EM Engine Util (%)    | Engine 0: 0, Engine 1: 0                                           |
| 3D Engine Util (%)          | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| Reset                       | N/A                                                                |
| Programming Errors          | N/A                                                                |
| Driver Errors               | N/A                                                                |
| Cache Errors Correctable    | N/A                                                                |
| Cache Errors Uncorrectable  | N/A                                                                |
| Mem Errors Correctable      | N/A                                                                |
| Mem Errors Uncorrectable    | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W)               | 38                                                                 |
| GPU Frequency (MHz)         | 1000                                                               |
| Media Engine Freq (MHz)     | N/A                                                                |
| GPU Core Temperature (C)    | N/A                                                                |
| GPU Memory Temperature (C)  | N/A                                                                |
| GPU Memory Read (kB/s)      | N/A                                                                |
| GPU Memory Write (kB/s)     | N/A                                                                |
| GPU Memory Bandwidth (%)    | N/A                                                                |
| GPU Memory Used (MiB)       | 18                                                                 |
| GPU Memory Util (%)         | 0                                                                  |
| Xe Link Throughput (kB/s)   | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
root@Rig3073250:/home/user# xpu-smi ps
PID       Command             DeviceID       SHR            MEM            

Summary

  • dpctl Output:

    • free_memory: 16225243136 bytes (15.1 GiB)
    • used_memory: 0 bytes
    • total_memory: 15.1 GiB (implied from global_mem_size)
  • xpu-smi stats -d 0 Output:

    • GPU Memory Used (MiB): 18 MiB
  • xpu-smi ps Output: Empty

Additional Observation

Upon exiting the Python console after the (y.nbytes + x.nbytes) step, the xpu-smi GPU memory usage drops to 17 MiB.

Despite allocating 1 GiB in the Python session using dpctl.tensor, the used_memory reported by dpctl is 0 bytes. The xpu-smi output is also questionable: it shows only 18 MiB of GPU memory usage, just 1 MiB more than idle, so it seems there are issues with xpu-smi on my machine as well.

UPDATE

Further investigated with PyOpenCL:

import pyopencl as cl
import numpy as np
import time

# Create OpenCL context and queue
platforms = cl.get_platforms()
gpu_devices = [d for p in platforms for d in p.get_devices(device_type=cl.device_type.GPU)]
if not gpu_devices:
    print("No GPU devices found.")
    exit()

device = gpu_devices[0]
context = cl.Context([device])
queue = cl.CommandQueue(context)

# Allocate memory on the GPU
buffer_size = 2**26  # 64 MiB per buffer

mf = cl.mem_flags
buffer1 = cl.Buffer(context, mf.READ_WRITE, size=buffer_size)
buffer2 = cl.Buffer(context, mf.READ_WRITE, size=buffer_size)

allocated_memory_mib = (buffer_size * 2) / (1024 * 1024)
print(f"Allocated memory: {allocated_memory_mib:.2f} MiB")

# Initialize data to write to buffers
host_data = np.random.rand(buffer_size // 4).astype(np.float32)

# Write data to the buffers
cl.enqueue_copy(queue, buffer1, host_data)
cl.enqueue_copy(queue, buffer2, host_data)
queue.finish()

# Pause for 30 seconds to allow xpu-smi observation
print("Memory allocated. Pausing for 30 seconds for observation with xpu-smi...")
time.sleep(30)

# Perform a simple computation to ensure buffers are used
program_src = """
__kernel void add(__global const float *a, __global const float *b, __global float *c) {
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
"""
program = cl.Program(context, program_src).build()
result_buffer = cl.Buffer(context, mf.WRITE_ONLY, size=buffer_size)
program.add(queue, host_data.shape, None, buffer1, buffer2, result_buffer)
queue.finish()

print("Performed computation on the GPU.")

Output:

Allocated memory: 128.00 MiB
Memory allocated. Pausing for 30 seconds for observation with xpu-smi...
Performed computation on the GPU.

xpu-smi stats -d 0 output:

| GPU Memory Used (MiB)       | 146                                                                |
| GPU Memory Util (%)         | 1   

So at this point xpu-smi seems to be working correctly. When I checked with dpctl while the PyOpenCL program was running, I still got the same output:

Free Memory: 16225243136 bytes
Total Memory: 16225243136 bytes
Used Memory: 0 bytes

Environment

  • dpctl version: 0.17.0
  • xpu-smi version: 1.2.38.20240718
  • OS: HiveOS [Based on Ubuntu 20.04]
  • Python version: 3.10.14
  • GPU: Intel(R) Arc(TM) A770 Graphics
  • GPU driver version: 1.3.27642

Please let me know if any further information or testing is required.

@avimanyu786
Author

avimanyu786 commented Jul 29, 2024

From the overall testing on my machine so far, it looks like both on the host and in Docker, the free_memory key from the dpctl.utils.intel_device_info(sycl_device) dictionary reports the same value as dpctl.device_info.global_mem_size, even though memory is being consumed on the Intel Arc GPU.

@oleksandr-pavlyk
Collaborator

@avimanyu786 dpctl uses this DPC++ feature: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md#free-global-memory

Could you compile the following C++ executable and check whether its output is consistent with output of dpctl:

// icpx -fsycl mem.cpp -o mem.x
#include <iostream>
#include <vector>
#include <string>
#include <sycl/sycl.hpp>

int main(void) {
    sycl::queue q{sycl::default_selector_v};

    const sycl::device &dev = q.get_device();
    const std::string &dev_name = dev.get_info<sycl::info::device::name>();
    const std::string &driver_ver = dev.get_info<sycl::info::device::driver_version>();

    std::cout << "Device: " << dev_name << " ["  << driver_ver << "]" << std::endl;

    auto global_mem_size = dev.get_info<sycl::info::device::global_mem_size>();

    std::cout << "Global device memory size: " << global_mem_size << " bytes" << std::endl;

    if (dev.has(sycl::aspect::ext_intel_free_memory)) {
         auto free_memory = dev.get_info<sycl::ext::intel::info::device::free_memory>();
         std::cout << "Free memory: " << free_memory << " bytes" << std::endl;
         std::cout << "Implied memory in use: " << global_mem_size - free_memory << " bytes" << std::endl;
    } else {
        std::cout << "Free memory descriptor is not available" << std::endl;
    }

    return 0;
}

Once compiled, execute it as $ ZES_ENABLE_SYSMAN=1 ./mem.x. This is the output I observe when no processes other than mem access the GPU:

$ ZES_ENABLE_SYSMAN=1 ./mem
Device: Intel(R) Data Center GPU Max 1100 [7.66.28691]
Global device memory size: 51539607552 bytes
Free memory: 51492134912 bytes
Implied memory in use: 47472640 bytes

This is what I observe while the Python session from the earlier discussion holds the x and y allocations (1 GiB total):

$ ZES_ENABLE_SYSMAN=1 ./mem
Device: Intel(R) Data Center GPU Max 1100 [7.66.28691]
Global device memory size: 51539607552 bytes
Free memory: 50416521216 bytes
Implied memory in use: 1123086336 bytes

If the native application also reports the same value as dpctl does, try upgrading the GPU drivers (https://dgpu-docs.intel.com/driver/installation.html).

@avimanyu786
Author

avimanyu786 commented Jul 30, 2024

Hi @oleksandr-pavlyk ,

Based on the above suggestion, I faced the same issue on the host machine with icpx (both when idle and after tensor allocation in Python):

Device: Intel(R) Arc(TM) A770 Graphics [1.3.27642]
Global device memory size: 16225243136 bytes
Free memory: 16225243136 bytes
Implied memory in use: 0 bytes

After switching to HiveOS based on Ubuntu 22.04, I'm facing the same issue, even after upgrading the driver from 1.3.27642 to 1.3.29735:

Device: Intel(R) Arc(TM) A770 Graphics [1.3.29735]
Global device memory size: 16225243136 bytes
Free memory: 16225243136 bytes
Implied memory in use: 0 bytes

To update the driver on the host, I followed the client GPU documentation for Intel Arc.

Summary

  • Operating System: HiveOS (Ubuntu 20.04 and Ubuntu 22.04)
  • GPU: Intel(R) Arc(TM) A770 Graphics
  • Driver Versions Tested:
    • 1.3.27642
    • 1.3.29735
  • Intel BaseKit: Installed and used icpx based on Intel BaseKit documentation for apt.
  • Python: 3.10.14
  • dpctl: 0.17.0

The output for the Intel Data Center GPU Max 1100 shows a different driver version, 7.66.28691. Possibly this driver includes features or fixes that are not present in the driver versions available for the Intel Arc A770, which could explain the discrepancy in reported free memory. I'll wait for your further feedback. Thanks.

@oleksandr-pavlyk
Collaborator

It may be that the discrepancy is indeed explained by the driver. In that case one should file an issue with https://github.com/intel/compute-runtime and provide this C++ reproducer, the driver version, the OS version, and the compiler version.

I do not think the behavior you are witnessing is caused by an issue with Python, as you have confirmed by running a stand-alone executable compiled from C++ code.

@avimanyu786
Author

Many many thanks @oleksandr-pavlyk for following up on this issue! I have filed the corresponding issue in the compute runtime repository: intel/compute-runtime#750

@avimanyu786
Author

For added context, there is a Python file called check_xpu_smi.py in the https://github.com/intel/xpumanager repository that fetches the value of XPUM_STATS_MEMORY_USED to report the used GPU memory. I found this when I searched for "GPU Memory Used" in that repository.
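For a quick side-by-side comparison, a small sketch like the following can parse the "GPU Memory Used (MiB)" row from the xpu-smi stats table shown above and print it next to the dpctl figure (the parsing is tied to that exact table layout, so it may need adjusting):

# Run as: ZES_ENABLE_SYSMAN=1 python compare_used_memory.py
import re
import subprocess

import dpctl
from dpctl.utils import intel_device_info

dev = dpctl.SyclDevice("gpu")
info = intel_device_info(dev)
free = info.get("free_memory")
dpctl_used_mib = (dev.global_mem_size - free) / 2**20 if free is not None else None

out = subprocess.run(["xpu-smi", "stats", "-d", "0"],
                     capture_output=True, text=True).stdout
m = re.search(r"GPU Memory Used \(MiB\)\s*\|\s*(\d+)", out)
xpu_smi_used_mib = int(m.group(1)) if m else None

print(f"dpctl implied used:    {dpctl_used_mib} MiB")
print(f"xpu-smi reported used: {xpu_smi_used_mib} MiB")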
