-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory: Swap probe_read to kernel or user version #2213
base: main
Are you sure you want to change the base?
Conversation
8c89a01
to
321abe4
Compare
In general this seems the right direction. Does BTF report if its a user ptr or not? The kernel annotates these so we could learn it from the BTF type I think. Might need some BTF extensions to do I'm not sure off-hand. Looks like it would require some BTF updates, but likely is the right solution IMO. |
321abe4
to
885041d
Compare
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
a1da1b6
to
6484d96
Compare
aafefc2
to
c297931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general idea makes sense to me but there are lot of changes in this PR and it's difficult to review it. Can you please split it into multiple patches and add commit messages on why each change is needed?
I would also like to avoid introducing yet another variant of our object files if we can avoid it.
2b81a3c
to
78ea803
Compare
When we swap all probe_read()s to probe_read_kernel() or probe_read_user() (ditto for _str) users will need to be able to specify if an argument is userspace data or not. Signed-off-by: Kevin Sheldrake <[email protected]>
Add the UserspaceData flag to the meta value, taking into account the default value if it is missing from the arg in the policy. Modify how tracepoints use the meta value to permit adding this flag. Signed-off-by: Kevin Sheldrake <[email protected]>
78ea803
to
b6c6b94
Compare
We should always use the probe_read_kernel or probe_read_user helpers over the probe_read helper (ditto for _str versions). This commit changes most probe_read to either probe_read_kernel or probe_read_user (ditto for _str versions). Signed-off-by: Kevin Sheldrake <[email protected]>
Some calls to probe_read could be reading either kernel or user memory. Add some helpers that take a userspace boolean to determine which type of memory to read. Also introduce a masked version that masks the size variable with a given mask immediately before (using inline asm) the probe_read_*. This will prevent the test for the userspace boolean spilling our registers after the masking and so will retain their bounds. Signed-off-by: Kevin Sheldrake <[email protected]>
We copy bpf_core_read.h from libbpf. We now need to copy it again to take in updates that let us use probe_read helpers with CORE. This will fail our static tests because they warn about register reuse. While this is indeed a potential problem, it is unlikely to affect us in reality. It is better to keep bpf_core_read.h the same as the upstream version than to fix it locally (which is tricky). We could potentially attempt to fix the warnings and upstream the changes, but this is outside the scope of this PR. Hence, accept the warnings. Changes introduced by the updated bpf_core_read.h require macros for bpf_probe_read_kernel and bpf_probe_read_user. These were introduced in the previous commit in the new probe_read_kernel_or_user.h. This commit adds that include to source files that require access to these macros. Signed-off-by: Kevin Sheldrake <[email protected]>
The argument meta value indicates (among others) whether an argument is in userspace memory or in kernel memory. This generally works for kprobes, tracepoints and uprobes, but raw_syscall tracepoints are slightly different. The raw_syscall argument (an array of uint64) exists in kernel memory but the buffers the pointers reference might be userspace. This commit adds a raw_syscall flag to indicate to the BPF programs that the pointers are in kernel memory even if the buffers they point to are in userspace. Signed-off-by: Kevin Sheldrake <[email protected]>
Having introduced helpers that can read kernel or user memory depending on a boolean, and having also provided information over location of buffers, including in raw_syscalls, to the BPF programs, this commit uses the probe_read_kernel_or_user helpers to read the memory from the correct location. Signed-off-by: Kevin Sheldrake <[email protected]>
The key for the retprobe map includes the pid_tgid, but in cases where this is not available, it uses the FP register in its place. The retprobe_map_get_key function is used to get this ID. This commit changes the filter_args_reject function to use retprobe_map_get_key instead of just using the pid_tgid. Signed-off-by: Kevin Sheldrake <[email protected]>
The return arguments can be buffers and these can reside in kernel or userspace. This commit adds meta information to the return arguments (both function return arg and any params that should be read on return) so that the BPF programs know whether the data resides in kernel or user memory. Signed-off-by: Kevin Sheldrake <[email protected]>
The addition of code to optionally read memory from kernel or user addresses caused the retkprobe program to exceed the 4096 instruction limit required by some older kernels. This commit divides one retkprobe program into two and connects them together with a tailcal. Signed-off-by: Kevin Sheldrake <[email protected]>
In some raw_syscalls policies the selector indices are set to the list index of the relevant argument instead of the actual index of the relevant argument. This usually doesn't break anything because these are the actual indices that are used in the config with the BPF programs. There is a potential for error, however, as the code tries to match the selector index number with an argument's index number; on a match it rewrites the selector index with the argument's list index. Therefore if the first argument to a function isn't listed in the args section then an explicit list index in the selectors section could reference the wrong argument. In order to discourage the direct use of list indices, the examples and test code have been changed so the selectors use the arg index and not its list index. Signed-off-by: Kevin Sheldrake <[email protected]>
b6c6b94
to
179404a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for splitting the PR. FWIW, I still find it hard to navigate, but it's definitely better.
I'm adding some first comments. I did not go over the full PR and plan to do another pass soon.
static inline __attribute__((always_inline)) bool | ||
is_userspace_data(unsigned long argm) | ||
{ | ||
return (argm & ARGM_USERSPACE_DATA) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, just use argm & ARGM_USERSPACE_DATA
. No need for != 0
|
||
if arg.SizeArgIndex > 0 { | ||
if arg.SizeArgIndex > 15 { | ||
return 0, fmt.Errorf("invalid SizeArgIndex value (>15): %v", arg.SizeArgIndex) | ||
} | ||
meta = int(arg.SizeArgIndex) | ||
meta = meta | int(arg.SizeArgIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: meta |= ...
if arg.IsUserspaceData == nil { | ||
// If not set in policy, use the default. | ||
if userspaceDataDefault { | ||
meta = meta | argUserspaceDataBit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: meta |= ...
} else { | ||
// Otherwise, use the provided value. | ||
if *arg.IsUserspaceData { | ||
meta = meta | argUserspaceDataBit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: meta |= ...
} | ||
// set MetaArg's upper half-word equal to the number of bytes we need to copy | ||
out.MetaArg = out.MetaArg & 0xffff | ||
out.MetaArg = out.MetaArg | (nbytes << 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a check and return an error if nbytes is cannot fit in the number of bits we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another pass of the patches I had not reviewed before. Thanks!
@@ -34,6 +34,7 @@ const ( | |||
argReturnCopyBit = 1 << 4 | |||
argMaxDataBit = 1 << 5 | |||
argUserspaceDataBit = 1 << 6 | |||
argRawSyscallsBit = 1 << 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a different flag?
From what I can tell from the code the flag is used in
tetragon/bpf/process/types/basic.h
Lines 2697 to 2702 in 179404a
// the const_buf_type that represents an array of arguments for raw_syscalls | |
// is special, as the array contents are in kernel memory, but they point | |
// to userspace memory. In this case, they will be marked as userspace, but | |
// we actually want to read kernel memory. The raw_syscalls bit of the meta | |
// value indicates when the arg is special in this way. | |
probe_read_kernel_or_user_masked(args, size, 0x3ff, (char *)arg, userspace && !raw_syscalls); |
And set in
metaTp, err := getMetaValue(specArg, syscall, rawSyscalls && specArg.Index == 5 && (specArg.Type == "")) |
Cant' we just do
userspaceDataDefault = syscall && !(rawSyscalls && specArg.Index == 5 && (specArg.Type == ""))
getMetaValue(arg, userSpaceDataDefault)
And use only one flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to prepare for being able to filter on arguments in raw_syscalls, and therefore arrays of arguments which could potentially be pointers. I thought it was a bit much to include that in this PR, but I was already thinking ahead for a following PR. Can do as you suggest if you prefer though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do as you suggest if you prefer though.
Thanks! I would prefer to keep the code as simple as possible, even if we end up extending this for filtering.
I've also thought about filtering raw_syscalls
.
One way would be to do as you suggest, and use the arguments but this seems tricky because we need to pass types for everything.
I think a better approach would be to do two-level filtering where we in raw_syscall
event we to filter the type of the syscall and then we enable hooks on syscalls/ for filtering on the arguments. This seems simpler to me and it re-uses the arg filtering we already have. We would need to have a way to pass information from the raw_syscall
hook to the specific syscall hook, but we already have something similar Jiri's notify enforcer action.
In any case, that's for another PR.
We should always use the probe_read_kernel or probe_read_user helpers over the probe_read helper (ditto for _str versions).
This series changes all probe_read to either probe_read_kernel or probe_read_user (ditto for _str versions).
Note: bpf/libbpf/bpf_core_read.h was updated. It fails some of our static tests which I think can be ignored (the last one did too – it stops checking if the file isn't in the PR).