add function metadata support
Currently there is no way to set and get per-function metadata with low
overhead, which is inconvenient in some situations. Take the BPF
trampoline for example: we need to create a trampoline for each kernel
function, as we have to store some per-function information in the
trampoline, such as the attached BPF progs, the function arg count, etc.
The performance overhead and memory consumption of creating all these
trampolines can be significant. With per-function metadata storage
support, we can store this information in the metadata instead and
create one global BPF trampoline for all kernel functions. In the global
trampoline, we get the information we need from the function metadata
through the ip (function address) with almost no overhead.
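
To make the idea concrete, here is a minimal sketch (not part of this
patch) of how such a global trampoline could resolve the metadata from
the traced address; only kfunc_md_find() and struct kfunc_md below are
the API actually added by this series, the handler itself is
hypothetical:

    #include <linux/kfunc_md.h>

    /* Hypothetical handler of a single global BPF trampoline: the traced
     * function's address (ip) is all we need to reach its metadata; the
     * per-function data (progs, arg count, ...) would then hang off
     * struct kfunc_md.
     */
    static void global_trampoline_handler(void *ip)
    {
            struct kfunc_md *md = kfunc_md_find(ip);

            if (!md)
                    return;
            /* ... run the BPF progs recorded for this function ... */
    }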

Another beneficiary can be ftrace. Currently, all kernel functions that
are enabled by dynamic ftrace are added to a filter hash if more than
one callback is registered, and a hash lookup happens every time a
traced function is called, which hurts performance; see
__ftrace_ops_list_func() -> ftrace_ops_test(). With per-function
metadata support, we can instead record in the metadata whether the
callback is enabled for a given kernel function.
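
As a rough illustration, the filter-hash lookup could then be replaced
by a direct metadata check. Whether "callback enabled" ends up as a flag
in struct kfunc_md or is simply implied by the metadata existing at all
is not decided by this patch; the sketch below assumes the latter:

    #include <linux/kfunc_md.h>

    /* Illustration only: treat "metadata exists for this ip" as
     * "callback enabled", avoiding the hash lookup in ftrace_ops_test().
     */
    static bool callback_enabled(void *ip)
    {
            return kfunc_md_find(ip) != NULL;
    }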

This patch adds per-function metadata storage in the function padding;
previous discussion can be found in [1]. Generally speaking, there are
two ways to implement this feature:

1. Create a function metadata array, and prepend an instruction that
holds the index of the function's metadata in that array, storing the
instruction in the function padding.

2. Allocate the function metadata with kmalloc(), and prepend an
instruction that holds the pointer to the metadata, storing the
instruction in the function padding.

Compared with way 2, way 1 consumes less space, but needs more work to
manage the global function metadata array. This patch implements way 1.
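
Roughly, way 1 resolves the metadata like this (a simplified sketch
using the helpers and macros added below, not the exact lookup code in
kfunc_md.c):

    #include <linux/kfunc_md.h>
    #include <asm/ftrace.h>

    static struct kfunc_md *md_lookup_sketch(void *ip)
    {
            u32 index;

            /* the padding still holds NOPs if no metadata was set */
            if (!kfunc_md_arch_exist(ip))
                    return NULL;
            /* read the 4-byte index out of the prepended instruction and
             * use it to index the global metadata array
             */
            index = *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
            return &kfunc_mds[index];
    }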

For x86, the implementation works as follows:

With CONFIG_CALL_PADDING enabled, there are 16 bytes (or more) of
padding space before every kernel function, and several kernel features
can use it, such as MITIGATION_CALL_DEPTH_TRACKING, CFI_CLANG, FINEIBT,
etc.

From my research, MITIGATION_CALL_DEPTH_TRACKING consumes the last
9 bytes of the function padding, and FINEIBT + CFI_CLANG consume the
first 7 bytes. So there is no space left for us if
MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are both enabled.

On x86, we need 5 bytes to prepend a "mov %eax, xxx" instruction, which
can hold a 4-byte index. So we use the following logic:

1. use the first 5 bytes if CFI_CLANG is not enabled
2. use the last 5 bytes if MITIGATION_CALL_DEPTH_TRACKING is not enabled
3. compile the kernel with 5 extra bytes of padding if
   MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are both enabled

In the third case, we compile the kernel with 21 bytes of function
padding, which means the functions themselves are no longer 16-byte
aligned. In [2] I tested the performance of the kernel with different
padding sizes, and the extra 5 bytes seem to have no impact on
performance. However, un-aligning kernel functions from 16 bytes is a
big change, and I'm not sure whether it has other consequences. Another
option is to compile the kernel with 32-byte function alignment when
there is no space available for us in the function padding, but this
increases the text size by ~5%. (I'm not sure which method to use.)
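
The padding usage described above, summarized as a comment-style sketch
(byte counts are taken from the text and depend on the exact config;
this is not code from the patch):

    /*
     * x86 function padding (16 bytes by default):
     *
     *   | 7 bytes: FINEIBT + CFI_CLANG | ... | 9 bytes: CALL_DEPTH_TRACKING | func:
     *
     * case 1, !CFI_CLANG:             "mov %eax, <index>" in the first 5 bytes
     * case 2, !CALL_DEPTH_TRACKING:   "mov %eax, <index>" in the last 5 bytes
     * case 3, both enabled:           padding grows to 21 bytes, and the
     *                                 extra 5 bytes hold the instruction
     */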

For arm64, the implementation works as follows:

Per-function metadata storage is already used by ftrace when
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS is enabled: it stores the pointer to
the callback directly in the function padding, consuming 8 bytes, since
commit
baaf553 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS") [3].
So we can likewise store the index directly in the function padding,
without prepending an instruction. With
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS enabled, functions are 8-byte
aligned, and we compile the kernel with 8 extra bytes (2 NOPs) of
padding space; otherwise, functions are 4-byte aligned, and only 4 extra
bytes (1 NOP) are needed.
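
In other words, the two architectures encode the index differently. A
sketch of my reading of the kfunc_md_arch_*() helpers below (the
function names here are illustrative, and the real ip/offset bookkeeping
is handled by ftrace_call_adjust() and the KFUNC_MD_* macros):

    #include <linux/types.h>
    #include <asm/ftrace.h>

    /* x86: a real instruction has to be written into the padding */
    static void set_index_x86(u8 *insn, u32 index)
    {
            insn[0] = 0xB8;                 /* "mov %eax, imm32": 5 bytes */
            *(u32 *)(insn + 1) = index;
    }

    /* arm64: the raw 4-byte index goes straight into a padding slot;
     * "metadata present" is detected by that word not being the NOP
     * encoding.
     */
    static void set_index_arm64(void *func_entry, u32 index)
    {
            *(u32 *)(func_entry - KFUNC_MD_DATA_OFFSET) = index;
    }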

However, we have the same problem Mark described in [3]: we can't use
the function padding together with CFI_CLANG, as that makes clang
compute a wrong offset to the pre-function type hash. He said he was
working on this problem with others two years ago. Hi Mark, is there any
progress on this?

I tested this feature by setting the metadata for all kernel functions,
and it takes 0.7s for 70k+ functions, not bad :/

Maybe we should split this patch into 3 patches :/

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/bpf/CADxym3af+CU5Mx8myB8UowdXSc3wJOqWyH4oyq+eXKahXBTXyg@mail.gmail.com/ [2]
Signed-off-by: Menglong Dong <[email protected]>
image-dragon authored and Kernel Patches Daemon committed Feb 26, 2025
1 parent 16566af commit dbc51df
Showing 11 changed files with 425 additions and 10 deletions.
15 changes: 15 additions & 0 deletions arch/arm64/Kconfig
@@ -1536,6 +1536,21 @@ config NODES_SHIFT
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accommodate various tables.

config FUNCTION_METADATA
	bool "Per-function metadata storage support"
	default y
	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE if !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
	depends on !CFI_CLANG
	help
	  Support per-function metadata storage for kernel functions. The
	  metadata of a function can be looked up by its address with
	  almost no overhead.

	  The index of the metadata is stored in the function padding and
	  consumes 4 bytes. If FUNCTION_ALIGNMENT_8B is enabled, 8 extra
	  bytes of function padding are reserved at compile time;
	  otherwise, only 4 extra bytes are needed.

source "kernel/Kconfig.hz"

config ARCH_SPARSEMEM_ENABLE
23 changes: 21 additions & 2 deletions arch/arm64/Makefile
@@ -144,12 +144,31 @@ endif

CHECKFLAGS += -D__aarch64__

ifeq ($(CONFIG_FUNCTION_METADATA),y)
ifeq ($(CONFIG_FUNCTION_ALIGNMENT_8B),y)
__padding_nops := 2
else
__padding_nops := 1
endif
else
__padding_nops := 0
endif

ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
__padding_nops := $(shell echo $(__padding_nops) + 2 | bc)
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
else ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
CC_FLAGS_FTRACE := -fpatchable-function-entry=2
else ifeq ($(CONFIG_FUNCTION_METADATA),y)
CC_FLAGS_FTRACE += -fpatchable-function-entry=$(__padding_nops),$(__padding_nops)
ifneq ($(CONFIG_FUNCTION_TRACER),y)
KBUILD_CFLAGS += $(CC_FLAGS_FTRACE)
# some files need to remove this cflag even when CONFIG_FUNCTION_TRACER
# is not enabled, so we need to export it here
export CC_FLAGS_FTRACE
endif
endif

ifeq ($(CONFIG_KASAN_SW_TAGS), y)
34 changes: 34 additions & 0 deletions arch/arm64/include/asm/ftrace.h
@@ -24,6 +24,16 @@
#define FTRACE_PLT_IDX 0
#define NR_FTRACE_PLTS 1

#ifdef CONFIG_FUNCTION_METADATA
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
#define KFUNC_MD_DATA_OFFSET (AARCH64_INSN_SIZE * 3)
#else
#define KFUNC_MD_DATA_OFFSET AARCH64_INSN_SIZE
#endif
#define KFUNC_MD_INSN_SIZE AARCH64_INSN_SIZE
#define KFUNC_MD_INSN_OFFSET KFUNC_MD_DATA_OFFSET
#endif

/*
* Currently, gcc tends to save the link register after the local variables
* on the stack. This causes the max stack tracer to report the function
@@ -216,6 +226,30 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
*/
return !strcmp(sym + 8, name);
}

#ifdef CONFIG_FUNCTION_METADATA
#include <asm/text-patching.h>

static inline bool kfunc_md_arch_exist(void *ip)
{
	return !aarch64_insn_is_nop(*(u32 *)(ip - KFUNC_MD_INSN_OFFSET));
}

static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
{
	*(u32 *)insn = index;
}

static inline void kfunc_md_arch_nops(u8 *insn)
{
	*(u32 *)insn = aarch64_insn_gen_nop();
}

static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
{
	return aarch64_insn_patch_text_nosync(ip, *(u32 *)insn);
}
#endif
#endif /* ifndef __ASSEMBLY__ */

#ifndef __ASSEMBLY__
13 changes: 11 additions & 2 deletions arch/arm64/kernel/ftrace.c
@@ -88,8 +88,10 @@ unsigned long ftrace_call_adjust(unsigned long addr)
* to `BL <caller>`, which is at `addr + 4` bytes in either case.
*
*/
if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
return addr + AARCH64_INSN_SIZE;
if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) {
addr += AARCH64_INSN_SIZE;
goto out;
}

/*
* When using patchable-function-entry with pre-function NOPs, addr is
@@ -139,6 +141,13 @@ unsigned long ftrace_call_adjust(unsigned long addr)

	/* Skip the first NOP after function entry */
	addr += AARCH64_INSN_SIZE;
out:
	if (IS_ENABLED(CONFIG_FUNCTION_METADATA)) {
		if (IS_ENABLED(CONFIG_FUNCTION_ALIGNMENT_8B))
			addr += 2 * AARCH64_INSN_SIZE;
		else
			addr += AARCH64_INSN_SIZE;
	}

	return addr;
}
Expand Down
15 changes: 15 additions & 0 deletions arch/x86/Kconfig
@@ -2509,6 +2509,21 @@ config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG

config FUNCTION_METADATA
	bool "Per-function metadata storage support"
	default y
	select CALL_PADDING
	help
	  Support per-function metadata storage for kernel functions. The
	  metadata of a function can be looked up by its address with
	  almost no overhead.

	  The index of the metadata is stored in the function padding and
	  consumes 5 bytes. With CALL_PADDING and FUNCTION_ALIGNMENT_16B,
	  the spare padding space is enough if CALL_THUNKS or CFI_CLANG is
	  not enabled. Otherwise, 5 extra bytes of function padding are
	  needed, which increases the text size by ~1%.

menuconfig CPU_MITIGATIONS
bool "Mitigations for CPU vulnerabilities"
default y
17 changes: 11 additions & 6 deletions arch/x86/Makefile
@@ -240,13 +240,18 @@ ifdef CONFIG_MITIGATION_SLS
endif

ifdef CONFIG_CALL_PADDING
PADDING_CFLAGS := -fpatchable-function-entry=$(CONFIG_FUNCTION_PADDING_BYTES),$(CONFIG_FUNCTION_PADDING_BYTES)
KBUILD_CFLAGS += $(PADDING_CFLAGS)
export PADDING_CFLAGS
__padding_nops := $(CONFIG_FUNCTION_PADDING_BYTES)
ifneq ($(and $(CONFIG_FUNCTION_METADATA),$(CONFIG_CALL_THUNKS),$(CONFIG_CFI_CLANG)),)
__padding_nops := $(shell echo $(__padding_nops) + 5 | bc)
endif

PADDING_CFLAGS := -fpatchable-function-entry=$(__padding_nops),$(__padding_nops)
KBUILD_CFLAGS += $(PADDING_CFLAGS)
export PADDING_CFLAGS

PADDING_RUSTFLAGS := -Zpatchable-function-entry=$(CONFIG_FUNCTION_PADDING_BYTES),$(CONFIG_FUNCTION_PADDING_BYTES)
KBUILD_RUSTFLAGS += $(PADDING_RUSTFLAGS)
export PADDING_RUSTFLAGS
PADDING_RUSTFLAGS := -Zpatchable-function-entry=$(__padding_nops),$(__padding_nops)
KBUILD_RUSTFLAGS += $(PADDING_RUSTFLAGS)
export PADDING_RUSTFLAGS
endif

KBUILD_LDFLAGS += -m elf_$(UTS_MACHINE)
52 changes: 52 additions & 0 deletions arch/x86/include/asm/ftrace.h
@@ -4,6 +4,26 @@

#include <asm/ptrace.h>

#ifdef CONFIG_FUNCTION_METADATA
#ifdef CONFIG_CFI_CLANG
#ifdef CONFIG_CALL_THUNKS
/* use the extra 5-bytes that we reserve */
#define KFUNC_MD_INSN_OFFSET (CONFIG_FUNCTION_PADDING_BYTES + 5)
#define KFUNC_MD_DATA_OFFSET (CONFIG_FUNCTION_PADDING_BYTES + 4)
#else
/* use the space that CALL_THUNKS is supposed to use */
#define KFUNC_MD_INSN_OFFSET (5)
#define KFUNC_MD_DATA_OFFSET (4)
#endif
#else
/* use the space that CFI_CLANG is supposed to use */
#define KFUNC_MD_INSN_OFFSET (CONFIG_FUNCTION_PADDING_BYTES)
#define KFUNC_MD_DATA_OFFSET (CONFIG_FUNCTION_PADDING_BYTES - 1)
#endif

#define KFUNC_MD_INSN_SIZE (5)
#endif

#ifdef CONFIG_FUNCTION_TRACER
#ifndef CC_USING_FENTRY
# error Compiler does not support fentry?
@@ -168,4 +188,36 @@ static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
#endif /* !COMPILE_OFFSETS */
#endif /* !__ASSEMBLY__ */

#if !defined(__ASSEMBLY__) && defined(CONFIG_FUNCTION_METADATA)
#include <asm/text-patching.h>

static inline bool kfunc_md_arch_exist(void *ip)
{
	return *(u8 *)(ip - KFUNC_MD_INSN_OFFSET) == 0xB8;
}

static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
{
	*insn = 0xB8;
	*(u32 *)(insn + 1) = index;
}

static inline void kfunc_md_arch_nops(u8 *insn)
{
	*(insn++) = BYTES_NOP1;
	*(insn++) = BYTES_NOP1;
	*(insn++) = BYTES_NOP1;
	*(insn++) = BYTES_NOP1;
	*(insn++) = BYTES_NOP1;
}

static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
{
	text_poke(ip, insn, KFUNC_MD_INSN_SIZE);
	text_poke_sync();
	return 0;
}

#endif

#endif /* _ASM_X86_FTRACE_H */
25 changes: 25 additions & 0 deletions include/linux/kfunc_md.h
@@ -0,0 +1,25 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KFUNC_MD_H
#define _LINUX_KFUNC_MD_H

#include <linux/kernel.h>

struct kfunc_md {
	int users;
	/* we can use this field later; for now it just keeps the struct
	 * 8-byte aligned.
	 */
	int pad0;
	void *func;
};

extern struct kfunc_md *kfunc_mds;

struct kfunc_md *kfunc_md_find(void *ip);
struct kfunc_md *kfunc_md_get(void *ip);
void kfunc_md_put(struct kfunc_md *meta);
void kfunc_md_put_by_ip(void *ip);
void kfunc_md_lock(void);
void kfunc_md_unlock(void);

#endif
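
Based only on the declarations above, a caller would use this API
roughly as follows. This is a hedged sketch: the function and the
locking discipline around get/put are assumptions for illustration, not
something the header spells out.

    #include <linux/kfunc_md.h>

    static void kfunc_md_usage_example(void *func)
    {
            struct kfunc_md *md;

            kfunc_md_lock();
            md = kfunc_md_get(func);        /* create/refcount the metadata */
            kfunc_md_unlock();

            if (!md)
                    return;

            /* ... attach per-function data, read md->func, ... */

            kfunc_md_lock();
            kfunc_md_put_by_ip(func);       /* drop the reference again */
            kfunc_md_unlock();
    }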
1 change: 1 addition & 0 deletions kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_TRACE_CLOCK) += trace/
obj-$(CONFIG_RING_BUFFER) += trace/
obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_RETHOOK) += trace/
obj-$(CONFIG_FUNCTION_METADATA) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
1 change: 1 addition & 0 deletions kernel/trace/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o
obj-$(CONFIG_FUNCTION_METADATA) += kfunc_md.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
obj-$(CONFIG_RV) += rv/
