-
Notifications
You must be signed in to change notification settings - Fork 28
kernel_debug
== kernel debug ==
Dealing with panics.
Boot target VM with "-v keepsyms=y debug=0x144"
Make it panic.
On your dev machine, you will need the kernelDebugKit from apple. Download it.
gdb /Volumes/Kernelit/mach_kernel
source /Volumes/KernelDebugKit/kgmacros
target remote-kdp
kdp-reattach 192.168.30.133 # obviously use the IP of your target / crashed VM
showallkmods
(find "address" for zfs and spl modules)
^Z # suspend gdb, or, use another terminal
kextutil -s /tmp -n -k /Volumes/KernelDebugKit/mach_kernel -e -r /Volumes/KernelDebugKit module/zfs/zfs.kext/ ../spl/module/spl/spl.kext/
fg # resume gdb, or go back to gdb terminal
set kext-symbol-file-path /tmp
add-kext /tmp/spl.kext
add-kext /tmp/zfs.kext
bt
=== Non-panic ===
You can always panic a kernel with
sudo dtrace -w -n "BEGIN{ panic();}"
If you prefer to work in gdb.
But this was revealing;
# /usr/libexec/stackshot -i -f /tmp/stackshot.log
# symstacks.rb -f /tmp/stackshot.log -s -w /tmp/trace.txt
# less /tmp/trace.txt
Note that my hang is here;
PID: 156
Process: zpool
Thread ID: 0x4e2
Thread state: 0x9 == TH_WAIT |TH_UNINT
Thread wait_event: 0xffffff8006608a6c
Kernel stack:
machine_switch_context (in mach_kernel) + 366 (0xffffff80002b3d3e)
0xffffff800022e711 (in mach_kernel) + 1281 (0xffffff800022e711)
thread_block_reason (in mach_kernel) + 300 (0xffffff800022d9dc)
lck_mtx_sleep (in mach_kernel) + 78 (0xffffff80002265ce)
0xffffff8000569ef6 (in mach_kernel) + 246 (0xffffff8000569ef6)
msleep (in mach_kernel) + 116 (0xffffff800056a2e4)
0xffffff7f80e52a76 (0xffffff7f80e52a76)
0xffffff7f80e53fae (0xffffff7f80e53fae)
0xffffff7f80e54173 (0xffffff7f80e54173)
0xffffff7f80f1a870 (0xffffff7f80f1a870)
0xffffff7f80f2bb4e (0xffffff7f80f2bb4e)
0xffffff7f80f1a9b7 (0xffffff7f80f1a9b7)
0xffffff7f80f1b65f (0xffffff7f80f1b65f)
0xffffff7f80f042ee (0xffffff7f80f042ee)
0xffffff7f80f45c5b (0xffffff7f80f45c5b)
0xffffff7f80f4ce92 (0xffffff7f80f4ce92)
spec_ioctl (in mach_kernel) + 157 (0xffffff8000320bfd)
VNOP_IOCTL (in mach_kernel) + 244 (0xffffff8000311e84)
It is a little shame that it only shows the kernel symbols, and not inside SPL and ZFS. But we can ask it to load another sym file (alas, it can not handle multiple symbols files, fix this Apple)
# kextstat
# Grab the addresses of spl and zfs again
# kextutil -s /tmp -n -k /Volumes/KernelDebugKit/mach_kernel -e -r /Volumes/KernelDebugKit module/zfs/zfs.kext/ ../spl/module/spl/spl.kext/
# symstacks.rb -f /tmp/stackshot.log -s -k /tmp/net.lundman.spl.sym
0xffffff800056a2e4 (0xffffff800056a2e4)
spl_cv_wait (in net.lundman.spl.sym) + 54 (0xffffff7f80e52a76)
taskq_wait (in net.lundman.spl.sym) + 78 (0xffffff7f80e53fae)
taskq_destroy (in net.lundman.spl.sym) + 35 (0xffffff7f80e54173)
0xffffff7f80f1a870 (0xffffff7f80f1a870)
# symstacks.rb -f /tmp/stackshot.log -s -k /tmp/net.lundman.zfs.sym
0xffffff7f80e54173 (0xffffff7f80e54173)
vdev_open_children (in net.lundman.zfs.sym) + 336 (0xffffff7f80f1a870)
vdev_root_open (in net.lundman.zfs.sym) + 94 (0xffffff7f80f2bb4e)
vdev_open (in net.lundman.zfs.sym) + 311 (0xffffff7f80f1a9b7)
vdev_create (in net.lundman.zfs.sym) + 31 (0xffffff7f80f1b65f)
spa_create (in net.lundman.zfs.sym) + 878 (0xffffff7f80f042ee)
Voila!
=== Memory Leaks ===
If you suspect memory issues, for example from a panic like:
panic(cpu 1 caller 0xffffff80002438d8): "zalloc: \"kalloc.1024\" (100535 elements) retry fail 3, kfree_nop_count: 0"@/SourceCache/xnu/xnu-2050.7.9/osfmk/kern/zalloc.c:1826
Attach gdb and use thezprint
command;
(gdb) zprint
ZONE COUNT TOT_SZ MAX_SZ ELT_SZ ALLOC_SZ TOT_ALLOC TOT_FREE NAME
0xffffff8002a89250 1620133 18c1000 22a3599 16 1000 125203838 123583705 kalloc.16 CX
0xffffff8006306c50 110335 35f000 4ce300 32 1000 13634985 13524650 kalloc.32 CX
0xffffff8006306a00 133584 82a000 e6a900 64 1000 26510120 26376536 kalloc.64 CX
0xffffff80063067b0 610090 4a84000 614f4c0 128 1000 50524515 49914425 kalloc.128 CX
0xffffff8006306560 1070398 121a2000 1b5e4d60 256 1000 72534632 71464234 kalloc.256 CX
0xffffff8006306310 399302 d423000 daf26b0 512 1000 39231204 38831902 kalloc.512 CX
0xffffff80063060c0 100404 6231000 c29e980 1024 1000 22949693 22849289 kalloc.1024 CX
0xffffff8006305e70 292 9a000 200000 2048 1000 77633725 77633433 kalloc.2048 CX
In this case, kalloc.256 is suspect.
Reboot kernel with zlog=kalloc.256
on the command line, then we can use
(gdb) findoldest
oldest record is at log index 393:
--------------- ALLOC 0xffffff803276ec00 : index 393 : ztime 21643824 -------------
0xffffff800024352e <zalloc_canblock+78>: mov %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>: jmpq 0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>: mov %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov %rax,%r15
0xffffff7f80e90649 <arc_buf_alloc+41>: mov %rax,-0x28(%rbp)
and indeed, list any index
(gdb) zstack 394
--------------- ALLOC 0xffffff8032d60700 : index 394 : ztime 21648810 -------------
0xffffff800024352e <zalloc_canblock+78>: mov %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>: jmpq 0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>: mov %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov %rax,%r15
0xffffff7f80e90649 <arc_buf_alloc+41>: mov %rax,-0x28(%rbp)
How many times was zfs_kmem_alloc
involved in the leaked allocs?
(gdb) countpcs 0xffffff7f80e847df
occurred 3999 times in log (100% of records)
At least we know it is our fault.
How many times is it arc_buf_alloc
?
(gdb) countpcs 0xffffff7f80e90649
occurred 2390 times in log (59% of records)
== lldb kernel debug ==
echo "settings set target.load-script-from-symbol-file true" >> ~/.lldbinit
lldb /Volumes/KernelDebugKit/mach_kernel
kdp-remote 192.168.30.146
showallkmods
addkext -F /tmp/spl.kext/Contents/MacOS/spl 0xffffff7f8ebb0000 (Address from showallkmods)
addkext -F /tmp/zfs.kext/Contents/MacOS/zfs 0xffffff7f8ebbf000
Then follow guide for GDB above