
Tasks hanging on 6.12.0-4.el9ueknext.x86_64 #32

Open
danielnorberg opened this issue Jan 23, 2025 · 2 comments

@danielnorberg

Thank you for UEK-next, it is very useful.

We recently tried 6.12.0-4.el9ueknext.x86_64 and had issues with processes hanging. Some example kernel logs:

2025-01-16T17:24:58.365Z INFO: task khugepaged:931 blocked for more than 1228 seconds.
2025-01-16T17:24:58.365Z Tainted: P OE 6.12.0-4.el9ueknext.x86_64 #1
2025-01-16T17:24:58.374Z ""echo 0 > /proc/sys/kernel/hung_task_timeout_secs"" disables this message.
2025-01-16T17:24:58.384Z task:khugepaged state:D stack:0 pid:931 tgid:931 ppid:2 flags:0x00004002
2025-01-16T17:24:58.387Z Call Trace:
2025-01-16T17:24:58.390Z <TASK>
2025-01-16T17:24:58.394Z __schedule+0x266/0x720
2025-01-16T17:24:58.397Z schedule+0x27/0xa0
2025-01-16T17:24:58.403Z schedule_preempt_disabled+0x15/0x30
2025-01-16T17:24:58.408Z rwsem_down_write_slowpath+0x1d3/0x4e0
2025-01-16T17:24:58.412Z down_write+0x6a/0x70
2025-01-16T17:24:58.417Z collapse_huge_page+0x26d/0x7d0
2025-01-16T17:24:58.422Z hpage_collapse_scan_pmd+0x62b/0x750
2025-01-16T17:24:58.429Z khugepaged_scan_mm_slot.constprop.0+0x3c6/0x580
2025-01-16T17:24:58.432Z khugepaged+0xce/0x200
2025-01-16T17:24:58.437Z ? __pfx_khugepaged+0x10/0x10
2025-01-16T17:24:58.441Z kthread+0xcf/0x100
2025-01-16T17:24:58.445Z ? __pfx_kthread+0x10/0x10
2025-01-16T17:24:58.449Z ret_from_fork+0x31/0x50
2025-01-16T17:24:58.454Z ? __pfx_kthread+0x10/0x10
2025-01-16T17:24:58.458Z ret_from_fork_asm+0x1a/0x30
2025-01-16T17:24:58.461Z </TASK>
2025-01-16T17:38:19.036Z INFO: task tokio-runtime-w:27320 blocked for more than 1228 seconds.
2025-01-16T17:38:19.036Z Tainted: P OE 6.12.0-4.el9ueknext.x86_64 #1
2025-01-16T17:38:19.045Z ""echo 0 > /proc/sys/kernel/hung_task_timeout_secs"" disables this message.
2025-01-16T17:38:19.055Z task:tokio-runtime-w state:D stack:0 pid:27320 tgid:7967 ppid:7880 flags:0x00000002
2025-01-16T17:38:19.058Z Call Trace:
2025-01-16T17:38:19.061Z <TASK>
2025-01-16T17:38:19.065Z __schedule+0x266/0x720
2025-01-16T17:38:19.069Z schedule+0x27/0xa0
2025-01-16T17:38:19.074Z schedule_preempt_disabled+0x15/0x30
2025-01-16T17:38:19.080Z rwsem_down_write_slowpath+0x1d3/0x4e0
2025-01-16T17:38:19.085Z ? srso_return_thunk+0x5/0x5f
2025-01-16T17:38:19.089Z down_write+0x6a/0x70
2025-01-16T17:38:19.093Z vfs_unlink+0x48/0x2c0
2025-01-16T17:38:19.097Z do_unlinkat+0x2bc/0x340
2025-01-16T17:38:19.102Z __x64_sys_unlinkat+0x56/0xc0
2025-01-16T17:38:19.106Z do_syscall_64+0x8c/0x1b0
2025-01-16T17:38:19.113Z ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
2025-01-16T17:38:19.118Z ? srso_return_thunk+0x5/0x5f
2025-01-16T17:38:19.123Z ? syscall_exit_to_user_mode+0x36/0x190
2025-01-16T17:38:19.128Z ? srso_return_thunk+0x5/0x5f
2025-01-16T17:38:19.133Z ? do_syscall_64+0xb9/0x1b0
2025-01-16T17:38:19.138Z ? syscall_exit_to_user_mode+0x36/0x190
2025-01-16T17:38:19.143Z ? srso_return_thunk+0x5/0x5f
2025-01-16T17:38:19.148Z ? do_syscall_64+0xb9/0x1b0
2025-01-16T17:38:19.154Z ? arch_exit_to_user_mode_prepare.isra.0+0xc0/0xd0
2025-01-16T17:38:19.160Z entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-16T17:38:19.165Z RIP: 0033:0x7f8b9beff42b
2025-01-16T17:38:19.174Z RSP: 002b:00007f8a8fdfc808 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
2025-01-16T17:38:19.182Z RAX: ffffffffffffffda RBX: 00007f8a8fdfc828 RCX: 00007f8b9beff42b
2025-01-16T17:38:19.191Z RDX: 0000000000000000 RSI: 00007f8b99e54470 RDI: 000000000000004b
2025-01-16T17:38:19.199Z RBP: 00007f8a8fdfc8b0 R08: 00007f8b71b30608 R09: 0000000000000000
2025-01-16T17:38:19.208Z R10: 000000000a463408 R11: 0000000000000246 R12: ffffffff00000003
2025-01-16T17:38:19.216Z R13: 00007f8b72014ce0 R14: 0000000000000048 R15: 000000000000004b
2025-01-16T17:38:19.219Z </TASK>
2025-01-16T17:02:18.849Z INFO: task exe:3502400 blocked for more than 124 seconds.
2025-01-16T17:02:18.863Z Tainted: P OE 6.12.0-4.el9ueknext.x86_64 #1
2025-01-16T17:02:18.872Z ""echo 0 > /proc/sys/kernel/hung_task_timeout_secs"" disables this message.
2025-01-16T17:02:18.884Z task:exe state:D stack:0 pid:3502400 tgid:3481472 ppid:3481460 flags:0x00000000
2025-01-16T17:02:18.887Z Call Trace:
2025-01-16T17:02:18.889Z <TASK>
2025-01-16T17:02:18.893Z __schedule+0x266/0x720
2025-01-16T17:02:18.897Z schedule+0x27/0xa0
2025-01-16T17:02:18.903Z schedule_preempt_disabled+0x15/0x30
2025-01-16T17:02:18.908Z rwsem_down_read_slowpath+0x25c/0x490
2025-01-16T17:02:18.912Z down_read+0x48/0xb0
2025-01-16T17:02:18.916Z do_madvise+0xdd/0x4e9
2025-01-16T17:02:18.921Z __x64_sys_madvise+0x2b/0x40
2025-01-16T17:02:18.925Z do_syscall_64+0x8c/0x1b0
2025-01-16T17:02:18.932Z ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
2025-01-16T17:02:18.937Z ? syscall_exit_to_user_mode+0x36/0x190
2025-01-16T17:02:18.942Z ? do_syscall_64+0xb9/0x1b0
2025-01-16T17:02:18.947Z ? flush_tlb_func+0x1dd/0x220
2025-01-16T17:02:18.951Z ? sched_clock+0x10/0x30
2025-01-16T17:02:18.956Z ? sched_clock_cpu+0xf/0x1e0
2025-01-16T17:02:18.961Z ? irqtime_account_irq+0x46/0xd0
2025-01-16T17:02:18.965Z ? clear_bhb_loop+0x45/0xa0
2025-01-16T17:02:18.971Z entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-16T17:02:18.975Z RIP: 0033:0x48250e
2025-01-16T17:02:18.984Z RSP: 002b:000000c001a6d5c0 EFLAGS: 00000212 ORIG_RAX: 000000000000001c
2025-01-16T17:02:18.992Z RAX: ffffffffffffffda RBX: 00003f1b62c6d000 RCX: 000000000048250e
2025-01-16T17:02:19Z     RDX: 0000000000000017 RSI: 0000000000002000 RDI: 00003f1b62c6d000
2025-01-16T17:02:19.008Z RBP: 000000c001a6d600 R08: 0000000000000000 R09: 0000000000000000
2025-01-16T17:02:19.017Z R10: 0000000000000000 R11: 0000000000000212 R12: 00000001ffc6d000
2025-01-16T17:02:19.025Z R13: ffffffffffffffff R14: 000000c005821340 R15: 0000000000000050
2025-01-16T17:02:19.028Z </TASK>

We could also see ps, top, and similar tools hanging on these hosts.

We have not seen similar issues with 6.10.0-2.el9ueknext.x86_64.

Is this a known issue with 6.12.0-4.el9ueknext.x86_64?

@darrenkenny
Member

@danielnorberg No, we have not seen this before. Can you provide more information about the system you're testing this on?

@danielnorberg
Author

@darrenkenny Sure. The messages above are from Oracle Cloud bare metal instances, mostly bm.gpu4.8 and bm.gpu.a10.4 shapes, although we saw the same on VMs from other large cloud vendors.

The hosts run gVisor containers with various GPU workloads, often ML inference jobs. To fully utilize the hosts, we typically schedule multiple containers for separate workloads per host. The workloads are often bursty, at times saturating network or disk I/O, or CPU and memory, depending on the workload and its phase of execution.

Happy to provide more specific details if that would be useful.

oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet with a shorter length can eventually cause a system panic further
down in the code path. Avoid this by validating the length and dropping
such packets as early as possible.
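
For illustration, here is a minimal sketch of the kind of early length
check the commit message describes. mlx5e_xmit() and its signature are
real; the ETH_HLEN lower bound and the mlx5e_do_xmit() stand-in are
assumptions for this sketch, not the actual UEK patch:

#include <linux/if_ether.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/*
	 * Hypothetical early validation: a corrupted skb with skb->len
	 * shorter than a minimal Ethernet frame could otherwise underflow
	 * a copy length further down the transmit path (note the
	 * negative-looking RDX/RCX at the memcpy_erms fault in the
	 * backtrace below).
	 */
	if (unlikely(skb->len < ETH_HLEN)) {
		dev_kfree_skb_any(skb);	/* consume and drop the runt frame */
		return NETDEV_TX_OK;
	}

	/* Hypothetical stand-in for the real body: select the SQ,
	 * then build and post the WQE. */
	return mlx5e_do_xmit(skb, dev);
}

Freeing the skb and returning NETDEV_TX_OK follows the usual
ndo_start_xmit convention for dropped packets: the skb is consumed, and
the core networking stack does not requeue it.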

The following is seen in our environment with a shorter skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia, and they are in agreement.

Orabug: 36660755

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <[email protected]>
Signed-off-by: Manjunath Patil <[email protected]>
Reviewed-by: Si-Wei Liu <[email protected]>
Reviewed-by: Jack Vogel <[email protected]>