New RHEL 9.2 based GPU drivers are available to provision Intel GPU Flex and Max Series. Good news: the new drivers are no longer incompatible with the ast driver. On RHEL 8.6 based OCP 4.12, the ast driver had to be unloaded or blacklisted (via a machine config, which triggers a reboot) before the out-of-tree GPU drivers could be loaded.
Challenges:
The in-tree i915 and intel_vsec drivers have to be unloaded before the out-of-tree drivers are loaded. KMM can currently unload only one in-tree driver, but we now have a use case for unloading more than one. Potential short-term solution: unload intel_vsec outside of KMM, most likely using a machine config.
Once the out-of-tree drivers are loaded, they are difficult to unload because they are always reported as in use, apparently by a GUI subcomponent, i.e. the framebuffer. The exact root cause has not been determined, but once the out-of-tree drivers are loaded, some component in the system actively uses the GPU and prevents the drivers from being unloaded. Finding the root cause needs more exploration. The lsof command was used to check what was using the driver, but it did not provide any additional information.
Details:
2 components have changed:
New GPU drivers/FW for RHEL 9.2
New kernel for RHEL 9.2
KMM version 1.1.1 has a feature that can be used to unload one in-tree driver.
We can use this feature to unload the in-tree i915 driver, but it cannot unload more than one kmod. We now have a use case for unloading more than one in-tree driver: i915 and intel_vsec for now, and potentially CSE in the future.
3 Main Drivers for GPU: i915, intel_vsec (this is a prerequisite for i915), CSE (MEI)
Out of tree drivers behavior: Loading i915 driver will load the intel_vsec driver. Unloading i915 will unload intel_vsec.
In-tree driver behavior: Loading i915 does not load intel_vsec. Unloading i915 does not unload intel_vsec.
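Because unloading the in-tree i915 driver does not unload intel_vsec, each in-tree module has to be removed explicitly. A minimal sketch of that step follows. The function name `unload_in_tree` and the `DRY_RUN` variable are hypothetical, and removing i915 before intel_vsec is an assumption rather than a verified requirement. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: remove each named in-tree module, in the order given.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (needs root).
unload_in_tree() {
    for mod in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "modprobe -rv $mod"
        else
            # Stop at the first failure, e.g. "Module ... is in use".
            modprobe -rv "$mod" || return 1
        fi
    done
}

# Assumed order: i915 first, then intel_vsec.
unload_in_tree i915 intel_vsec
```

On a node this would have to run before the out-of-tree drivers are loaded, for example from whatever mechanism ends up handling intel_vsec outside of KMM.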
RHEL 9.2 based OCP 4.13 has a new kernel based on the 5.14.z upstream kernel. This is a large jump from RHEL 8.6 based OCP 4.12, which used the 4.18.z upstream kernel.
Initial smoke test analysis and Observed Impact:
RHEL 9.2 includes in-tree i915 and intel_vsec drivers (not loaded by default; the kernel loads them only when it detects the GPU card via its PCI device ID). These two in-tree drivers do not support the Intel GPU Flex or Max Series. The in-tree i915 driver provides display support for Intel client Arc GPUs. As a result, customers will see the following message in dmesg:
sh-5.1# dmesg | grep graphics
[ 12.385679] i915 0000:33:00.0: Your graphics device 56c0 is not properly supported by the driver in this
[ 478.732896] i915 0000:33:00.0: Your graphics device 56c0 is not properly supported by the driver in this
Intel® Data Center GPU Flex 170 -> PCI ID is 56c0.
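To see which cards on a node trigger this warning, the dmesg lines above can be reduced to a list of PCI addresses and device IDs. This is a rough sketch: the function name `unsupported_gpus` is hypothetical, and the pattern is taken only from the two lines quoted above, so other kernel versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and print
# "<pci-address> <device-id>" once per card that logged the i915 warning.
unsupported_gpus() {
    sed -n 's/.*i915 \([0-9a-f:.]*\): Your graphics device \([0-9a-f]*\) is not properly supported.*/\1 \2/p' |
        sort -u
}

# Usage on a node:
#   dmesg | unsupported_gpus
```

Repeated warnings for the same card, like the two lines above, collapse into a single entry.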
Observation 1:
If the in-tree intel_vsec driver is not unloaded before the out-of-tree i915 driver is loaded, unknown symbol errors are observed in dmesg.
When we unload the in-tree intel_vsec driver first and change nothing else, the issue is not observed.
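A quick way to check for this condition is to filter dmesg for the kernel's unknown symbol messages. The sketch below assumes the usual module loader format, `<module>: Unknown symbol <name> (err <n>)`. The function name `unknown_symbols` is hypothetical, and the actual symbol names from this test were not captured in this issue.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and print
# "<module> <symbol>" once per unresolved symbol.
unknown_symbols() {
    awk '/Unknown symbol/ {
        for (i = 2; i <= NF - 2; i++) {
            if ($i == "Unknown" && $(i + 1) == "symbol") {
                mod = $(i - 1)
                sub(/:$/, "", mod)     # strip the trailing colon after the module name
                print mod, $(i + 2)
            }
        }
    }' | sort -u
}

# Usage on a node:
#   dmesg | unknown_symbols
```

Empty output after loading the out-of-tree i915 driver would be consistent with the in-tree intel_vsec driver having been removed first.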
Observation 2:
When you delete the KMM Module CR, KMM attempts to unload the out-of-tree i915 driver via a PreStop hook, but it does not reload the in-tree i915 driver. This is by KMM design. Essentially, the kernel is tainted. When KMM tries to clean up, it fails to unload the out-of-tree i915 driver because the module is reported as in use.
We are also unable to manually unload the out of tree i915 or intel_vsec driver.
sh-5.1# modprobe -rv intel_vsec
modprobe: FATAL: Module intel_vsec is in use.
sh-5.1# modprobe -rv i915
modprobe: FATAL: Module i915 is in use.
In the lsmod output taken after the out-of-tree drivers are loaded, keep an eye on the use counts, which are in the 3rd column.
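The use count and the list of dependent modules can be pulled out of lsmod for just the drivers of interest. This is a small sketch with a hypothetical function name, `module_refs`. It relies only on the standard lsmod column layout (module, size, use count, then the modules using it).

```shell
#!/bin/sh
# Hypothetical helper: read lsmod output on stdin and print
# "<module> <use-count> <used-by>" for each module named as an argument.
# "-" is printed when lsmod lists no dependent modules.
module_refs() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " "); for (i = 1; i <= n; i++) keep[names[i]] = 1 }
        ($1 in keep) { print $1, $3, ($4 == "" ? "-" : $4) }
    '
}

# Usage on a node:
#   lsmod | module_refs i915 intel_vsec
```

A non-zero use count with no module listed in the last column would suggest the reference is held by something other than another kernel module, which may be worth checking alongside `/sys/module/<name>/holders` when chasing the "in use" failures above.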
Documenting a dependency diagram for the out-of-tree GPU drivers has been noted as a future exercise.