Error handling improvements #159
Comments
See https://github.com/libbpf/libbpf/wiki/Libbpf-1.0-migration-guide for context. Related to aquasecurity#159
This PR is based off [@kakkoyun's work](parca-dev#326) to use libbpf(go)'s batch APIs. **Context** The main issue we found while working with this API was that they were erroring with `EPERM`. After some debugging, we realised that libbpf wasn't handling errors in the way we expected. The debugging write-up and more context can be found [here](aquasecurity/libbpfgo#159), and the fix is in [this PR](aquasecurity/libbpfgo#157). After landing these changes upstream, we pointed to the updated libbpfgo version, as well as added some [regression tests](parca-dev#381) to ensure that libbpfgo behaves as expected, and to make it easier in the future to write further compatibility tests. Note: the rest of the batch APIs error handling is still unfixed. Tracking in aquasecurity/libbpfgo#162.
@javierhonduco on my last efforts I've faced that when using Lines 418 to 426 in 99b7ef7
As it was not the focus, I left it to analyse later. @josedonizetti I think we need to standardize all cgo returns by getting the |
Actually the Regarding the error handling, we're based on libbpf 1.0 API already and it enforced the strict mode since libbpf/libbpf@c261131. The current behaviour is: The err number is always being returned as the main CGO return but from I've been trying to standardize that by multiple dependent PRs:
If they get merged, let's close this.
On a GetMapInfoByFD() failure, the comment was wrong. See: aquasecurity#159 (comment)
👋 . Been using libbpfgo recently and found some problems with the way errors are being handled. Would love to hear your opinion and comments on this subject 😄 .
Context
While helping @kakkoyun debug #147 to make our map retrieval and deletion path more efficient, we were puzzled, as the kernel did not seem to return `-EPERM` in the code path we were executing.
This can be seen with this bpftrace script:
As we can see here, the "raw" BPF syscall (without any massaging of the return code by libc) is returning `-2` (`ENOENT`), while libbpf is seeing `-1` (`EPERM`). Why is this happening?
libc's `syscall(2)`, which libbpf calls here, changes the return code to `-1` on error, and sets `errno` to the error number reported by the "raw" syscall (its negated return value).
The `syscall(2)` manpage:
🤔 So it seems we thought we were getting `EPERM` when in reality it was "just" `-1`, indicating that the function failed (*) and that we had to check `errno` to see what the actual error is. Interpreting the return value as `errno` is not correct here.
As far as I understand, cgo calls have two return values, the return code and `errno` (https://pkg.go.dev/cmd/cgo). In several of libbpfgo's calls to libbpf, only one return value is checked, which, if I am not mistaken, is the return code of the function.
How this is affecting libbpf's APIs right now
Besides the batch APIs not returning the data correctly, even though we might have processed it just fine, this is also affecting some other APIs.
Let's take a look at this fragment extracted from our codebase. We want to delete keys from a map, but ignore the error if the key is non-existent. After taking a look at the bpf manpage we saw that `ENOENT` means exactly this, so let's ignore it and fail otherwise.
Unfortunately, this code isn't correct: it returns an error on every failure, including when the key doesn't exist. As `DeleteKey` is just returning one value, the return value is `-1`, which indicates a generic failure. We are interpreting it as if it were an `errno` value (`EPERM`), which is not correct.
By reading `errno` and passing it to the caller, we would know which error this is and be able to handle it accordingly. A possible fix for `DeleteKey` could be:
The road to Libbpf 1.0
Another possibility to fix this is to turn on `LIBBPF_STRICT_DIRECT_ERRS` (https://github.com/libbpf/libbpf/blob/b69f8ee93ef6aa3518f8fbfd9d1df6c2c84fd08f/src/libbpf_internal.h#L492) with `libbpf_set_strict_mode`, as described in https://github.com/libbpf/libbpf/wiki/Libbpf-1.0-migration-guide.
While this might fix the problems we are experiencing, perhaps it would be a bit of a big change that could potentially break things, so careful testing should be done. For example, if we are currently checking for `-1` for errors, we should check for negative values instead.
At the same time, at some point, the v1.0 migration will happen, so maybe it could be a good excuse to start migrating to the new behavior?
Related issues / PRs
Thanks!! Looking forward to your thoughts!
cc/ @kakkoyun @v-thakkar @Sylfrena
(*) There's one exception that I am aware of, batch APIs, which might return `-1`, but depending on the `errno` value we might have achieved a partial success. More context on #157.