kernel NULL pointer dereference occurs randomly while manipulating Docker images when using ZFS storage driver #17002

Open
taisph opened this issue Jan 28, 2025 · 2 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


taisph commented Jan 28, 2025

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  24.04.1 LTS
Kernel Version        6.8.0-52-generic
Architecture          amd64
OpenZFS Version       zfs-2.2.2-0ubuntu9.1

Describe the problem you're observing

The kernel periodically oopses while loading, starting, or stopping Docker containers that use the ZFS storage driver.

Describe how to reproduce the problem

It happens randomly, a few times a week, when loading images and/or creating or destroying containers.

Include any warning/errors/backtraces from the system logs

[27202.657779] [    T832] BUG: kernel NULL pointer dereference, address: 0000000000000000
[27202.657788] [    T832] #PF: supervisor instruction fetch in kernel mode
[27202.657791] [    T832] #PF: error_code(0x0010) - not-present page
[27202.657793] [    T832] PGD 0 P4D 0 
[27202.657797] [    T832] Oops: 0010 [#1] PREEMPT SMP NOPTI
[27202.657801] [    T832] CPU: 12 PID: 832 Comm: arc_prune Kdump: loaded Tainted: P           O       6.8.0-52-generic #53-Ubuntu
[27202.657804] [    T832] Hardware name: LENOVO 20YQCTO1WW/20YQCTO1WW, BIOS N37ET55W (1.36 ) 07/09/2024
[27202.657807] [    T832] RIP: 0010:0x0
[27202.657840] [    T832] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[27202.657842] [    T832] RSP: 0018:ffffb7ddc15f7d40 EFLAGS: 00010246
[27202.657845] [    T832] RAX: 0000000000000000 RBX: ffffb7ddc15f7dac RCX: 0000000000000000
[27202.657847] [    T832] RDX: 0000000000000000 RSI: ffffb7ddc15f7d48 RDI: ffff88fdb3f3e680
[27202.657849] [    T832] RBP: ffffb7ddc15f7d98 R08: 0000000000000000 R09: 0000000000000000
[27202.657851] [    T832] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000043a27
[27202.657853] [    T832] R13: 0000000000000000 R14: ffff88ff1e3d0000 R15: ffff88fdb3f3e680
[27202.657855] [    T832] FS:  0000000000000000(0000) GS:ffff89039f600000(0000) knlGS:0000000000000000
[27202.657857] [    T832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27202.657859] [    T832] CR2: ffffffffffffffd6 CR3: 000000065eaae005 CR4: 0000000000f70ef0
[27202.657862] [    T832] PKRU: 55555554
[27202.657863] [    T832] Call Trace:
[27202.657866] [    T832]  <TASK>
[27202.657868] [    T832]  ? show_regs+0x6d/0x80
[27202.657874] [    T832]  ? __die+0x24/0x80
[27202.657876] [    T832]  ? page_fault_oops+0x99/0x1b0
[27202.657881] [    T832]  ? do_user_addr_fault+0x2e9/0x670
[27202.657885] [    T832]  ? exc_page_fault+0x83/0x1b0
[27202.657889] [    T832]  ? asm_exc_page_fault+0x27/0x30
[27202.657898] [    T832]  zfs_prune+0x8d/0x130 [zfs]
[27202.658189] [    T832]  zpl_prune_sb+0x35/0x60 [zfs]
[27202.658427] [    T832]  arc_prune_task+0x1f/0x40 [zfs]
[27202.658672] [    T832]  taskq_thread+0x1f3/0x3c0 [spl]
[27202.658702] [    T832]  ? __pfx_default_wake_function+0x10/0x10
[27202.658709] [    T832]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[27202.658725] [    T832]  kthread+0xef/0x120
[27202.658729] [    T832]  ? __pfx_kthread+0x10/0x10
[27202.658732] [    T832]  ret_from_fork+0x44/0x70
[27202.658736] [    T832]  ? __pfx_kthread+0x10/0x10
[27202.658739] [    T832]  ret_from_fork_asm+0x1b/0x30
[27202.658744] [    T832]  </TASK>

Also reported on Launchpad.

taisph added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 28, 2025
Contributor

chrisrd commented Jan 30, 2025

Possibly #16770? The fix is in zfs-2.2.7.

Author

taisph commented Jan 31, 2025

I am currently testing zfs-2.2.7, built from the Ubuntu plucky source package for noble. I have not hit the NULL pointer dereference so far, but I have run into a new problem instead:

[16958.589146] INFO: task dockerd:34576 blocked for more than 983 seconds.
[16958.589152]       Tainted: P           O       6.8.0-52-generic #53-Ubuntu
[16958.589153] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[16958.589154] task:dockerd         state:D stack:0     pid:34576 tgid:31566 ppid:1      flags:0x00000002
[16958.589157] Call Trace:
[16958.589158]  <TASK>
[16958.589160]  __schedule+0x27c/0x6b0
[16958.589165]  schedule+0x33/0x110
[16958.589166]  grab_super+0x144/0x180
[16958.589169]  ? __pfx_var_wake_function+0x10/0x10
[16958.589173]  ? __pfx_zpl_test_super+0x10/0x10 [zfs]
[16958.589285]  sget+0x1cf/0x350
[16958.589286]  ? __pfx_set_anon_super+0x10/0x10
[16958.589289]  zpl_mount+0xee/0x300 [zfs]
[16958.589372]  legacy_get_tree+0x28/0x60
[16958.589375]  vfs_get_tree+0x27/0x100
[16958.589377]  do_new_mount+0x1a0/0x340
[16958.589379]  path_mount+0x1e0/0x830
[16958.589380]  ? putname+0x5b/0x80
[16958.589382]  __x64_sys_mount+0x127/0x160
[16958.589383]  x64_sys_call+0x1e57/0x25a0
[16958.589385]  do_syscall_64+0x7f/0x180
[16958.589387]  ? __do_sys_newfstatat+0x53/0x90
[16958.589389]  ? syscall_exit_to_user_mode+0x86/0x260
[16958.589392]  ? do_syscall_64+0x8c/0x180
[16958.589393]  ? syscall_exit_to_user_mode+0x86/0x260
[16958.589394]  ? do_syscall_64+0x8c/0x180
[16958.589396]  ? __do_sys_newfstatat+0x53/0x90
[16958.589398]  ? syscall_exit_to_user_mode+0x86/0x260
[16958.589400]  ? do_syscall_64+0x8c/0x180
[16958.589401]  ? clear_bhb_loop+0x15/0x70
[16958.589402]  ? clear_bhb_loop+0x15/0x70
[16958.589403]  ? clear_bhb_loop+0x15/0x70
[16958.589404]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[16958.589407] RIP: 0033:0x5630bfb6700e
[16958.589429] RSP: 002b:000000c000926928 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[16958.589431] RAX: ffffffffffffffda RBX: 000000c015800c90 RCX: 00005630bfb6700e
[16958.589432] RDX: 000000c011e9a430 RSI: 000000c0199dc480 RDI: 000000c015800c90
[16958.589432] RBP: 000000c000926968 R08: 0000000000000000 R09: 0000000000000000
[16958.589433] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000043
[16958.589434] R13: 000000c000d8c000 R14: 000000c001390820 R15: 0000000000000012
[16958.589435]  </TASK>
