tegra-se-nvhost incompatible with LUKS #114

Open
j-baker opened this issue Jul 25, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@j-baker

j-baker commented Jul 25, 2023

Reproduced on an Orin Nano using a disk in the built-in M.2 slot, though that is probably not a requirement.

#!/usr/bin/env bash
set -e
ROOT_PARTITION="/dev/nvme0n1p1" # replace this with the device of your choice if you don't want to nuke your boot volume!
ROOT_LABEL="luks-repro"
ENCRYPTED_LABEL="$ROOT_LABEL-enc"
# Create a LUKS volume on the partition (destroys its contents) and open it.
cryptsetup luksFormat --label "$ENCRYPTED_LABEL" "$ROOT_PARTITION"
cryptsetup luksOpen "$ROOT_PARTITION" "$ENCRYPTED_LABEL"
# Put an ext4 filesystem on the decrypted mapping.
mkfs.ext4 -L "$ROOT_LABEL" "/dev/mapper/$ENCRYPTED_LABEL"

mkdir -p /mnt/luks
mount "/dev/mapper/$ENCRYPTED_LABEL" /mnt/luks

Kernel errors along the lines of 'Couldn't get free cmdbuf' will start appearing. If you then do a lot of disk I/O (e.g. installing Nix onto that volume), you will eventually get ext4 errors and the filesystem will be remounted read-only due to corruption. My best guess is that some writes were simply never applied. The same behaviour reproduces with btrfs instead of ext4, though btrfs reports the corruption differently.
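To tell whether a run has hit the problem, it can help to count these symptoms in the kernel log. This is a minimal sketch, assuming the messages appear verbatim as quoted in this issue and that ext4 uses its usual 'EXT4-fs error' and 'Remounting filesystem read-only' wording; the function name count_luks_symptoms is hypothetical.

```bash
# Hypothetical helper: count the symptom signatures from this issue in kernel
# log text read from stdin, e.g.  dmesg | count_luks_symptoms
count_luks_symptoms() {
  local log pattern count
  log="$(cat)"
  # Patterns are matched as fixed strings. The two ext4 strings are an
  # assumption about ext4's usual log wording, not taken from this issue.
  for pattern in \
    "Couldn't get free cmdbuf" \
    "scheduling while atomic" \
    "EXT4-fs error" \
    "Remounting filesystem read-only"; do
    # grep -c prints 0 but exits 1 when nothing matches, so ignore its status.
    count="$(printf '%s\n' "$log" | grep -cF -- "$pattern" || true)"
    printf '%s: %s\n' "$pattern" "$count"
  done
}
```

A non-zero count on the first line after running the repro is the earliest sign; the ext4 lines only show up once corruption has been detected.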

other reproductions

  • If you luksFormat with --sector-size 4096, you may see 'BUG: scheduling while atomic' errors and possible lockups.
  • If you luksOpen with --perf-no_read_workqueue --perf-no_write_workqueue you will very likely observe a kernel panic.
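Both variants change how the dm-crypt mapping is set up, so it can be useful to confirm which options an open volume actually uses. This is a sketch under the assumption that cryptsetup passes these choices through as the dm-crypt optional table parameters no_read_workqueue, no_write_workqueue and sector_size:4096, and that dmsetup table is given the mapping name so the line starts with the start sector; the helper name flag_risky_crypt_options is hypothetical.

```bash
# Hypothetical helper: read dm-crypt table lines on stdin, for example from
#   dmsetup table luks-repro-enc
# and print any of the optional parameters used by the variants above.
# Assumed crypt table layout:
#   <start> <size> crypt <cipher> <key> <iv_offset> <device> <offset> [<#opts> <opts>...]
flag_risky_crypt_options() {
  awk '$3 == "crypt" {
    nopt = $9 + 0   # number of optional parameters (0 if absent)
    for (i = 10; i < 10 + nopt && i <= NF; i++) {
      if ($i == "no_read_workqueue" || $i == "no_write_workqueue" || $i == "sector_size:4096")
        print $i
    }
  }'
}
```

Matching on the optional-parameter count rather than grepping the whole line avoids false hits inside the cipher or key fields.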

mitigation

'Couldn't get free cmdbuf' is only reported from within a single source file of the tegra-se-nvhost driver, which provides "Tegra Crypto algorithm support using Host1x interface". A mitigation is to disable this kernel module at boot by adding initcall_blacklist=tegra_se_module_init to the kernel command line; with that in place the bug no longer reproduces, although presumably you lose hardware crypto acceleration. However, fast but sometimes broken is something I want from my food deliveries, not my crypto, so it's probably no major loss.
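After rebooting with the mitigation, a quick check that the parameter actually reached the running kernel avoids testing against the wrong configuration. A minimal sketch, relying on initcall_blacklist taking a comma-separated list of initcall names; the function name se_initcall_blacklisted is hypothetical.

```bash
# Hypothetical helper: succeed if tegra_se_module_init appears in an
# initcall_blacklist= parameter of the given kernel command line. With no
# argument it inspects the running kernel's /proc/cmdline.
se_initcall_blacklisted() {
  local cmdline param entry
  local -a params entries
  cmdline="${1-$(cat /proc/cmdline)}"
  # Split on whitespace without glob expansion.
  read -r -a params <<<"$cmdline"
  for param in "${params[@]}"; do
    case "$param" in
      initcall_blacklist=*)
        # The value is a comma-separated list of initcall function names.
        IFS=',' read -r -a entries <<<"${param#initcall_blacklist=}"
        for entry in "${entries[@]}"; do
          if [ "$entry" = tegra_se_module_init ]; then
            return 0
          fi
        done
        ;;
    esac
  done
  return 1
}
```

Running "se_initcall_blacklisted && echo blacklisted" on the device should print "blacklisted" before re-running the repro script.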

@danielfullmer
Collaborator

Thanks for the detailed report!

Merged a workaround in #122, but I'll leave this issue open until we're able to get Tegra-SE + LUKS working reliably.

@danielfullmer
Collaborator

Some additional details, since I recently took another look at this:

One issue is:

[ 1058.429722] BUG: scheduling while atomic: swapper/6/0/0x00000100
[ 1058.429920] Modules linked in: 8021q garp mrp tcp_bbr rtl8822ce atkbd libps2 nvidia(O) snd_soc_tegra186_asrc snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc aes_ce_blk snd_soc_tegra210_admaif snd_soc_tegra210_dmic snd_soc_tegra210_afc crypto_simd cryptd snd_soc_tegra210_adx aes_ce_cipher snd_soc_tegra_pcm snd_soc_tegra210_amx snd_soc_tegra210_i2s snd_soc_tegra210_mixer rtw88_8822ce snd_soc_tegra210_sfc ghash_ce sha2_ce rtw88_8822c rtw88_pci snd_hda_codec_hdmi rtw88_core sha256_arm64 rtk_btusb snd_soc_spdif_tx sha1_ce mttcan snd_soc_tegra_machine_driver pwm_fan snd_soc_tegra210_adsp snd_hda_tegra mac80211 ina3221 btusb snd_hda_codec snd_soc_tegra_utils snd_soc_simple_card_utils fusb301 snd_soc_tegra210_ahub btrtl can_dev userspace_alert nvadsp btbcm btintel nv_imx219 tegra_bpmp_thermal tegra210_adma snd_hda_core cfg80211 spi_tegra114 r8168 loop tap macvlan nvgpu nvmap fuse ip_tables x_tables ahci libahci libata overlay r8169 realtek
[ 1058.434983] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G           O      5.10.104 #1-NixOS
[ 1058.440068] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin NX Developer Kit, BIOS 35.3.1 01/01/1980
[ 1058.450830] Call trace:
[ 1058.453461]  dump_backtrace+0x0/0x200
[ 1058.457131]  show_stack+0x30/0x40
[ 1058.460547]  dump_stack+0xd8/0x138
[ 1058.463958]  __schedule_bug+0x78/0x88
[ 1058.467631]  __schedule+0x754/0x838
[ 1058.471305]  schedule+0x50/0xd0
[ 1058.474454]  schedule_preempt_disabled+0x18/0x20
[ 1058.479181]  __mutex_lock.isra.0+0x550/0x560
[ 1058.483642]  __mutex_lock_slowpath+0x28/0x38
[ 1058.488105]  mutex_lock+0x64/0x70
[ 1058.491520]  tegra_se_aes_queue_req+0x30/0x98
[ 1058.495981]  tegra_se_aes_xts_decrypt+0x44/0x68
[ 1058.500531]  crypto_skcipher_decrypt+0x38/0x50
[ 1058.504993]  crypt_convert+0x9bc/0xd30
[ 1058.508842]  kcryptd_crypt+0xb4/0x430
[ 1058.512517]  kcryptd_crypt_tasklet+0x24/0x30
[ 1058.516981]  tasklet_action_common.isra.0+0x15c/0x180
[ 1058.521880]  tasklet_action+0x30/0x38
[ 1058.525556]  __do_softirq+0x130/0x368
[ 1058.529142]  irq_exit+0x128/0x130
[ 1058.532559]  __handle_domain_irq+0x74/0xc8
[ 1058.536755]  gic_handle_irq+0x68/0x134
[ 1058.540517]  el1_irq+0xd0/0x180
[ 1058.543670]  cpuidle_enter_state+0xbc/0x3d0
[ 1058.547779]  cpuidle_enter+0x40/0x58
[ 1058.551283]  call_cpuidle+0x44/0x78
[ 1058.554779]  do_idle+0x208/0x270
[ 1058.558106]  cpu_startup_entry+0x30/0x98
[ 1058.561959]  secondary_start_kernel+0x154/0x178
[ 1058.566584] softirq: huh, entered softirq 6 TASKLET 000000002a8329de with preempt_count 00000100, exited with 00000000?

This appears to be caused by the tegra-se-nvhost driver calling mutex_lock (from tegra_se_aes_queue_req) in atomic context: dm-crypt calls into it from the kcryptd tasklet, which runs in softirq context, where sleeping is not allowed. Other drivers in the same directory use a spinlock instead, which appears to be the correct choice, so it's unclear why a mutex is used here.
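The reasoning above can be read straight off the call trace: frames are printed innermost first, so a mutex_lock frame that appears before the first __do_softirq frame means the mutex was taken in softirq context. A rough heuristic sketch of that check; the function name mutex_in_softirq is hypothetical, and it only looks at the first softirq entry it finds, so it should be fed one trace at a time.

```bash
# Hypothetical helper: read one kernel call trace (innermost frame first, as in
# the traces in this issue) on stdin and succeed if a mutex_lock frame appears
# above the first __do_softirq frame, i.e. a sleeping lock taken in softirq.
mutex_in_softirq() {
  awk '
    /[[:space:]]mutex_lock\+/ { seen_mutex = 1 }
    /[[:space:]]__do_softirq\+/ { if (seen_mutex) found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}
```

Helpers such as __mutex_lock_slowpath are deliberately not matched; the plain mutex_lock frame is enough to identify the sleeping call.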

===

[  329.008351] tegra-se-nvhost 15820000.se: Couldn't get free cmdbuf
[  329.008556] Unable to handle kernel paging request at virtual address 000000000058d148
[  329.008791] Mem abort info:
[  329.008866]   ESR = 0x96000004
[  329.008959]   EC = 0x25: DABT (current EL), IL = 32 bits
[  329.009114]   SET = 0, FnV = 0
[  329.009203]   EA = 0, S1PTW = 0
[  329.009296] Data abort info:
[  329.009382]   ISV = 0, ISS = 0x00000004
[  329.009496]   CM = 0, WnR = 0
[  329.009595] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000115a42000
[  329.009784] [000000000058d148] pgd=0000000000000000, p4d=0000000000000000
[  329.009972] Internal error: Oops: 96000004 [#1] SMP
[  329.010099] Modules linked in: twofish_generic twofish_common 8021q garp mrp rtl8822ce tcp_bbr nvidia(O) snd_soc_tegra186_asrc snd_soc_tegra186_dspk snd_soc_tegra210_ope snd_soc_tegra186_arad snd_soc_tegra210_iqc snd_soc_tegra210_mvc aes_ce_blk snd_soc_tegra210_afc crypto_simd rtw88_8822ce atkbd snd_soc_tegra210_admaif snd_soc_tegra210_dmic snd_soc_tegra210_adx cryptd libps2 snd_soc_tegra210_amx rtw88_8822c aes_ce_cipher snd_soc_tegra210_i2s snd_soc_tegra210_adsp ghash_ce snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_sfc rtw88_pci sha2_ce snd_soc_tegra_machine_driver snd_hda_codec_hdmi sha256_arm64 rtw88_core rtk_btusb sha1_ce snd_soc_tegra_utils snd_soc_spdif_tx snd_soc_simple_card_utils btusb pwm_fan mac80211 nvadsp btrtl snd_hda_tegra btbcm snd_soc_tegra210_ahub mttcan snd_hda_codec btintel fusb301 ina3221 snd_hda_core tegra_bpmp_thermal cfg80211 can_dev userspace_alert tegra210_adma nv_imx219 spi_tegra114 r8168 loop tap macvlan nvgpu nvmap fuse ip_tables x_tables ahci
[  329.010168]  libahci libata overlay r8169 realtek
[  329.068136] CPU: 0 PID: 591 Comm: kworker/u17:2 Tainted: G           O      5.10.104 #1-NixOS
[  329.076796] Hardware name: Unknown NVIDIA Orin NX Developer Kit/NVIDIA Orin NX Developer Kit, BIOS 35.3.1 01/01/1980
[  329.087571] Workqueue: se_work_q tegra_se_work_handler
[  329.092809] pstate: a0c00009 (NzCv daif +PAN +UAO -TCO BTYPE=--)
[  329.098849] pc : kfree+0x7c/0x330
[  329.102257] lr : tegra_se_work_handler+0x684/0x948
[  329.107244] sp : ffff800010debc70
[  329.110657] x29: ffff800010debc70 x28: 00000000fffffff4
[  329.116170] x27: ffff7583c2461180 x26: ffff80001c0a1000
[  329.121681] x25: 0000000000000040 x24: ffffce4f83448000
[  329.127194] x23: 0000000000000013 x22: ffffce4f817d6ed4
[  329.132532] x21: ffff7583d5ec9b80 x20: ffff80001e345000
[  329.138045] x19: 000000000058d140 x18: 0000000000000010
[  329.143468] x17: 0000000000018021 x16: 0000000000018020
[  329.148981] x15: ffff7583d5ec9fe8 x14: ffffffffffffffff
[  329.154320] x13: ffff800090deb8c7 x12: ffff800010deb8d1
[  329.159744] x11: 0000000000000003 x10: 0101010101010101
[  329.165257] x9 : 00000000fffffffe x8 : ffffce4f810af9b0
[  329.170682] x7 : c0000000ffffefff x6 : 000000000000000a
[  329.176020] x5 : 0000000000000000 x4 : ffff7583cb763798
[  329.181446] x3 : 0000000000000000 x2 : ffffce4f80a7d680
[  329.186782] x1 : 0000000000000030 x0 : fffffdffffe00000
[  329.192123] Call trace:
[  329.194570]  kfree+0x7c/0x330
[  329.197719]  tegra_se_work_handler+0x684/0x948
[  329.202009]  process_one_work+0x1bc/0x480
[  329.206032]  worker_thread+0x158/0x4b8
[  329.209796]  kthread+0x104/0x130
[  329.213210]  ret_from_fork+0x10/0x18
[  329.216708] Code: b26babe0 d34cfe73 f2dfbfe0 8b131813 (f9400660)
[  329.222838] ---[ end trace d8fc7bc266803d06 ]---
[  329.227382] Kernel panic - not syncing: Oops: Fatal exception
[  329.232984] SMP: stopping secondary CPUs
[  329.236750] Kernel Offset: 0x4e4f708a0000 from 0xffff800010000000
[  329.242869] PHYS_OFFSET: 0xffff8a7d40000000
[  329.246895] CPU features: 0x0040006,4a80aa38
[  329.251357] Memory Limit: none
[  329.254333] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

From reading the code, these parameters may be related to running out of cmdbufs: https://github.com/OE4T/linux-tegra-5.10/blob/5921377f5ffb5b1fbca9e40a187d1059743ef631/nvidia/drivers/crypto/tegra-se-nvhost.h#L252-L254
