feat(1808): migrate to linux 6.8.9 #2124

kingluo · 2024-05-21T09:30:31Z

Problems:

1. In the new kernel, assembly functions uniformly return through `__x86_return_thunk`. Our assembly code still uses the plain `ret` instruction, so objtool reports a naked return during the kernel build.
2. `SYM_FUNC_START` in the new kernel adds `endbr64` at the head of each assembly function, and with IBT enabled every indirect jump must land on an ENDBR instruction. Our assembly functions use jump tables to jump indirectly to code snippets within the same function. Those targets have no ENDBR, so the jumps fail with a CET exception (see https://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_Intel_CET).

Solutions:

1. Replace `ret` with `RET`, a macro in the new kernel that ensures the correct return.
2. Use `notrack jmp` and enable notrack in the CPU settings: `wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN | CET_NO_TRACK_EN)`.

As an aside, interestingly, if a user-mode C program uses a switch statement that meets the conditions for generating a jump table and is built with `-fcf-protection=full` (which some distribution gcc builds enable by default), the generated jump-table dispatch uses a `jmp` with the `notrack` prefix, and IBT is marked as `true` in the `.note.gnu.property` section of the compiled ELF file, so that `NO_TRACK_EN` in the MSR is set for user mode when the kernel loads the program. So user mode can use `notrack` to bypass CET without caring whether `NO_TRACK_EN` is set.
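The user-mode observation in the aside can be checked with a small experiment. The following is a minimal sketch, not code from this PR: the function name `dispatch` and the case bodies are made up for illustration, and whether the compiler actually emits a jump table depends on the GCC version, target, and optimization level.

```c
/*
 * Illustrative only: a dense switch over consecutive case values.
 * With optimization enabled, GCC typically lowers such a switch to a
 * jump table plus one indirect jmp. `dispatch` is a hypothetical name.
 */
int dispatch(int op, int x)
{
    switch (op) {
    case 0:  return x + 1;
    case 1:  return x - 1;
    case 2:  return x * 2;
    case 3:  return x / 2;
    case 4:  return -x;
    case 5:  return x * x;
    case 6:  return x ^ 0x55;
    case 7:  return x | 1;
    default: return 0;   /* out-of-range op: no table entry is used */
    }
}
```

Assuming the file is saved as `dispatch.c` (a hypothetical name), compiling it with `gcc -O2 -fcf-protection=full -S dispatch.c` and searching the generated assembly for `notrack` should show whether the indirect jump carries the prefix on a given toolchain.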

biathlon3 · 2024-08-13T06:25:43Z

This crash sometimes occurs during tests.
It is not tied to one particular test; this instance happened in forwarding.test_match_host_forwarded_regex.TestMatchLocationsH2.test_host_WorkShop_uri_testwiki from PR#649.

[  509.314542] ------------[ cut here ]------------
[  509.321182] [tdb] Close table 'sessions0.tdb'
[  569.337733] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  569.337736] rcu:     0-...!: (0 ticks this GP) idle=01c4/1/0x4000000000000000 softirq=8073/8073 fqs=0
[  569.337740] rcu:     2-...!: (0 ticks this GP) idle=a294/1/0x4000000000000000 softirq=8721/8721 fqs=0
[  569.337741] rcu:     (detected by 3, t=15005 jiffies, g=9857, q=37 ncpus=4)
[  569.337743] Sending NMI from CPU 3 to CPUs 0:
[  566.596664] NMI backtrace for cpu 0
[  566.596664] CPU: 0 PID: 4226 Comm: sysctl Tainted: G           OE      6.8.9+ #1
[  566.596664] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[  566.596664] RIP: 0010:vprintk_emit+0x258/0x320
[  566.596664] Code: 4d 85 ed 74 57 65 48 8b 04 25 40 41 03 00 49 39 c5 74 49 48 c7 c7 98 dd 39 84 c6 05 c0 de 20 03 01 e8 9c a9 ef 00 eb 02 f3 90 <0f> b6 1d b0 de 20 03 80 fb 01 0f 87 73 6c e7 00 83 e3 01 75 e9 e8
[  566.596664] RSP: 0018:ffffc90001377ac0 EFLAGS: 00000002
[  566.596664] RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff83380788
[  566.596664] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffffff8439dd98
[  566.596664] RBP: ffffc90001377b00 R08: 0000000000000021 R09: 00000000843b22d4
[  566.596664] R10: ffffffffffffffff R11: 0000000000000025 R12: 0000000000000246
[  566.596664] R13: ffff888171b119c0 R14: 0000000000000021 R15: ffffffffc0ab915e
[  566.596664] FS:  00007f0ed78e6740(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[  566.596664] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  566.596664] CR2: 00007f0d779a1548 CR3: 0000000171b3e000 CR4: 0000000000750ef0
[  566.596664] PKRU: 55555554
[  566.596664] Call Trace:
[  566.596664]  <NMI>
[  566.596664]  ? show_regs+0x6e/0x80
[  566.596664]  ? nmi_cpu_backtrace+0xb1/0x120
[  566.596664]  ? nmi_cpu_backtrace_handler+0x15/0x20
[  566.596664]  ? nmi_handle+0x68/0x180
[  566.596664]  ? default_do_nmi+0x45/0x120
[  566.596664]  ? exc_nmi+0x12e/0x1b0
[  566.596664]  ? end_repeat_nmi+0xf/0x60
[  566.596664]  ? vprintk_emit+0x258/0x320
[  566.596664]  ? vprintk_emit+0x258/0x320
[  566.596664]  ? vprintk_emit+0x258/0x320
[  566.596664]  </NMI>
[  566.596664]  <TASK>
[  566.596664]  ? pcpu_free_area+0x1fd/0x320
[  566.596664]  vprintk_default+0x21/0x30
[  566.596664]  vprintk+0x40/0x70
[  566.596664]  _printk+0x5c/0x80
[  566.596664]  tdb_close+0x4e/0x70 [tempesta_db]
[  566.596664]  tfw_http_sess_stop+0x31/0x40 [tempesta_fw]
[  566.596664]  tfw_mods_stop+0x35/0xc0 [tempesta_fw]
[  566.596664]  tfw_ctlfn_state_io+0x1c3/0x4e0 [tempesta_fw]
[  566.596664]  ? __pfx_tfw_ctlfn_state_io+0x10/0x10 [tempesta_fw]
[  566.596664]  ? kvmalloc_node+0x2a/0x100
[  566.596664]  proc_sys_call_handler+0x1b3/0x2d0
[  566.596664]  proc_sys_write+0x17/0x20
[  566.596664]  vfs_write+0x311/0x430
[  566.596664]  ksys_write+0x6b/0xf0
[  566.596664]  __x64_sys_write+0x1d/0x30
[  566.596664]  x64_sys_call+0x1681/0x20c0
[  566.596664]  do_syscall_64+0x72/0x120
[  566.596664]  ? __count_memcg_events+0x6f/0x110
[  566.596664]  ? count_memcg_events.constprop.0+0x1e/0x40
[  566.596664]  ? handle_mm_fault+0x192/0x2f0
[  566.596664]  ? do_user_addr_fault+0x33f/0x6c0
[  566.596664]  ? irqentry_exit_to_user_mode+0x65/0x180
[  566.596664]  ? irqentry_exit+0x3f/0x50
[  566.596664]  ? clear_bhb_loop+0x25/0x80
[  566.596664]  ? clear_bhb_loop+0x25/0x80
[  566.596664]  ? clear_bhb_loop+0x25/0x80
[  566.596664]  ? clear_bhb_loop+0x25/0x80
[  566.596664]  ? clear_bhb_loop+0x25/0x80
[  566.596664]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  566.596664] RIP: 0033:0x7f0ed7714887
[  566.596664] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  566.596664] RSP: 002b:00007fff0a433748 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  566.596664] RAX: ffffffffffffffda RBX: 000055d0974204a0 RCX: 00007f0ed7714887
[  566.596664] RDX: 0000000000000005 RSI: 000055d0974204e0 RDI: 0000000000000004
[  566.596664] RBP: 000055d097422610 R08: 0000000000000010 R09: 000055d097422610
[  566.596664] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000005
[  566.596664] R13: 0000000000000005 R14: 00007f0ed7816b80 R15: 00007f0ed7816a00
[  566.596664]  </TASK>
[  569.338739] Sending NMI from CPU 3 to CPUs 2:
[  569.337736] NMI backtrace for cpu 2
[  569.337736] CPU: 2 PID: 994 Comm: kworker/u16:1 Tainted: G           OE      6.8.9+ #1
[  569.337736] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014

Full log
crush2.txt

kingluo · 2024-08-13T06:31:20Z

@biathlon3 Please describe which commit of this branch the crash happens at.

biathlon3 · 2024-08-13T06:38:00Z

There is also a very rare case of OOM during compilation (VM with 4 CPUs, `make -j4`):

[ 7715.787707] process 'tempesta/tls/t/tgen_ec256' started with executable stack
[ 7770.509446] cc1 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[ 7770.510470] CPU: 2 PID: 26858 Comm: cc1 Tainted: G        W  OE      6.8.9+ #1
[ 7770.511212] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[ 7770.512200] Call Trace:
[ 7770.512677]  <TASK>
[ 7770.512923]  dump_stack_lvl+0x70/0x90
[ 7770.512923]  dump_stack+0x14/0x20
[ 7770.512923]  dump_header+0x47/0x1c0
[ 7770.512923]  out_of_memory+0x461/0x570
[ 7770.512923]  __alloc_pages+0x101a/0x1230
[ 7770.512923]  alloc_pages_mpol+0x95/0x210
[ 7770.512923]  ? filemap_alloc_folio+0xf9/0x100
[ 7770.512923]  alloc_pages+0x62/0xd0
[ 7770.512923]  folio_alloc+0x1c/0x50
[ 7770.512923]  filemap_alloc_folio+0xf9/0x100
[ 7770.512923]  __filemap_get_folio+0x116/0x2f0
[ 7770.512923]  filemap_fault+0x170/0xcd0
[ 7770.512923]  __do_fault+0x38/0x130
[ 7770.512923]  do_fault+0x279/0x4a0
[ 7770.512923]  __handle_mm_fault+0x8b0/0xed0
[ 7770.512923]  handle_mm_fault+0xc7/0x2f0
[ 7770.512923]  do_user_addr_fault+0x168/0x6c0
[ 7770.512923]  exc_page_fault+0x7d/0x190
[ 7770.512923]  asm_exc_page_fault+0x2b/0x30
[ 7770.512923] RIP: 0033:0x5fed26
[ 7770.512923] Code: Unable to access opcode bytes at 0x5fecfc.
[ 7770.512923] RSP: 002b:00007ffcb2ee4c80 EFLAGS: 00010206
[ 7770.512923] RAX: 00007fe6f2fc2c40 RBX: 0000000000000001 RCX: 0000000000000001
[ 7770.512923] RDX: 00007fe6f2fc3900 RSI: 0000012000000003 RDI: 00007fe6f6100dc8
[ 7770.512923] RBP: 00007fe6f2fc2be0 R08: 000000000000002f R09: 000000000000007f
[ 7770.512923] R10: 00007fe6f2fc2c40 R11: 0000000000000000 R12: 00007fe6f2fc1300
[ 7770.512923] R13: 0000000000000060 R14: 00007fe6f861b930 R15: 00007fe6f2fc2c40
[ 7770.512923]  </TASK>

Full log
crash2.txt

biathlon3 · 2024-08-13T06:40:31Z

> @biathlon3 Please describe which commit of this branch the crash happens at.

#2161

kingluo · 2024-08-13T06:48:43Z

@biathlon3 Have you made changes beyond this PR to adapt #2131? If so, please record the error in #2131 instead of this one. I cannot reproduce your error.

kingluo added 8 commits May 25, 2024 17:29

feat(1808): migrate to Linux 6.8.9

21e7fff

close #1808

apply stats fix in advanced

8e0e942

remove duplicated sk->sk_cgrp_data.cgroup init

dc6d26d

disable AVX2 temporarily before fpu works

bca16f5

bignum_x86-64.S: replace ret with RET, to use __x86_return_thunk

1040a54

#1808 (comment)

re-enable AVX2, replace ret with RET

95d8481

disable AVX2 for memcpy/memcmp/bzero temporarily to avoid userspace segfault

86e3043

disable skb->head_frag assertion temporarily to avoid panic

e6256ea

kingluo force-pushed the jinhua/feat-1808-migrate-to-linux-6.8.9 branch from 669c09b to e6256ea Compare May 25, 2024 09:30

kingluo added 8 commits May 26, 2024 02:08

fix TfwGState->curr type: ensure correct frang index

7c10e6a

filter out SO_EE_ORIGIN_TIMESTAMPING in sk->sk_error_queue

3d5730e

enable fpu in the whole softirq ctx

89d2f30

continue sk->sk_receive_queue processing in case of SO_EE_ORIGIN_TIMESTAMPING

2e4a18d

Merge remote-tracking branch 'origin/master' into jinhua/feat-1808-migrate-to-linux-6.8.9

356c594

clean up temporary changes

8afb1d6

add linux-6.8.9.patch

e95951a

krizhanovsky requested review from krizhanovsky, const-t, EvgeniiMekhanik and biathlon3 June 17, 2024 16:26

patch update: change for_each_possible_cpu to for_each_online_cpu

99bb027

kingluo marked this pull request as ready for review June 26, 2024 08:17

kingluo added 7 commits July 4, 2024 15:06

handle SKBFL_SHARED_FRAG in flags, not tx_flags

c1c068b

merge master branch: 0dee025

96628ac

update linux-6.8.9.patch

f8de856

Revert "enable fpu in the whole softirq ctx"

094940e

This reverts commit 89d2f30.

try endbr64 on each switch label

eb1e9ec

remove notrack, prefix endbr64 at each jump table entry

2406021

use struct_group to avoid __write_overflow_field

b73a31a

Remove kernel_fpu_begin() added by mistake

09bd329
