-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aya: fix tc name limit #728
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for aya-rs-docs ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash these commits.
use aya_bpf::{macros::classifier, programs::TcContext}; | ||
|
||
// This macro generates a function with arbitrary name | ||
macro_rules! generate_ebpf_function { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this macro needed? is this something forward-looking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the name of that function is the name of the ebpf program and we are trying to use a 256-byte-long name, the macro is just a way to avoid writing this:
#[classifier]
pub fn a..[256 times]..a(_ctx: TcContext) -> i32 {
0
}
I wonder if there is a way to dynamically generate that name inside the macro in this #![no_std] #![no_main]
environment, like what we would do in userspace side: "a".repeat(256)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you still write it =/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found it cleaner to pass 256 'a's to a macro than writing a function with that name directly. If that's an issue, I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird Macro was replaced by just a function.
// The buffer for attributes should be sized to hold at least 256 bytes, | ||
// based on `CLS_BPF_NAME_LEN = 256` from the kernel: | ||
// https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 | ||
attrs: [u8; 512], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least 256, but then why 512?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we currently use around ~30 bytes of attributes in addition to name.
Rather than picking a "right sized buffer" for the payload (which is of varying length anyway) we use the next largest power of 2.
I've got netlink changes planned soon, so will look at making the buffer size variable then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the comment say that please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird Updated the comment.
@@ -13,6 +15,27 @@ use aya::{ | |||
const MAX_RETRIES: usize = 100; | |||
const RETRY_DURATION: time::Duration = time::Duration::from_millis(10); | |||
|
|||
#[test] | |||
fn tc_name_limit() { | |||
let _ = qdisc_add_clsact("lo"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it ok to drop this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the clsact
qdisc attached to the interface before we can proceed, Actually a better approach would be a function that checks that so we can call qdisc_add_clsact("lo")
only when that interface doesn't have that qdisc.
I can add that and update the code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a different type of probe we can use that doesn't have this requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird I have added a function to check if clsact
qdisc exists before trying to actually add it.
Now the pipeline is failing due to the change to the aya's public api.
I'd really appreciate it if you review the code and help me on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a different type of probe we can use that doesn't have this requirement?
By requirement, Do you mean the need to call qdisc_add_clsact("lo")
?
If yes, I don't think so.
As far as I know we can either use ingress
or clsact
if attaching to ingress path and to attach an ebpf program to egress path, the only option is to use clsact
:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f211a1b929c804100e138c5d3d656992cfd5622
256 is the maximum length allowed by the kernel: | ||
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 | ||
*/ | ||
generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if the program name is 257 bytes long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the exact error right now but the kernel rejects that ebpf program and returns an error.
I can send the exact error in a few hours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird Sorry for the delay
This is the error you get when the name is more that 256 bytes:
Error: netlink error while attaching ebpf program to tc
Caused by:
Invalid argument (os error 22)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we test that as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, Macro is probably more useful now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird An integration test was added to make sure programs with a name longer than 256 bytes will fail to attach.
I'm happy with the change to up the buffer to the next nearest power of 2. |
@@ -13,6 +15,27 @@ use aya::{ | |||
const MAX_RETRIES: usize = 100; | |||
const RETRY_DURATION: time::Duration = time::Duration::from_millis(10); | |||
|
|||
#[test] | |||
fn tc_name_limit() { | |||
let _ = qdisc_add_clsact("lo"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a different type of probe we can use that doesn't have this requirement?
256 is the maximum length allowed by the kernel: | ||
https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 | ||
*/ | ||
generate_ebpf_function!(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we test that as well?
aya/src/sys/netlink.rs
Outdated
collections::HashMap, | ||
ffi::CStr, | ||
io, | ||
mem::{self}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{self} is redundant
// The buffer for attributes should be sized to hold at least 256 bytes, | ||
// based on `CLS_BPF_NAME_LEN = 256` from the kernel: | ||
// https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 | ||
attrs: [u8; 512], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the comment say that please?
use aya_bpf::{macros::classifier, programs::TcContext}; | ||
|
||
// This macro generates a function with arbitrary name | ||
macro_rules! generate_ebpf_function { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you still write it =/
|
||
let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_TEST).unwrap(); | ||
|
||
// This magic number (256) is derived from the fact that the kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is somewhat misplaced; this 256 is just defined by the fact that this is the name used in the probe code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.
@pooladkhay, this pull request is now in conflict and requires a rebase. |
@pooladkhay ping? |
@alessandrod pong! 😃 Life hasn't been easy lately, I should have notified earlier. |
hugs ❤️ and sorry if I made you feel rushed it wasn't my intention |
No worries at all 🙋♂️ |
Hey @alessandrod, this pull request changes the Aya Public API and requires your review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r3, 1 of 2 files at r4, 6 of 6 files at r5, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)
-- commits
line 2 at r5:
can you please squash these commits? I don't understand why 12 commits are needed.
aya/src/sys/netlink.rs
line 83 at r5 (raw file):
} pub(crate) unsafe fn netlink_clsact_qdisc_exists(if_index: i32) -> Result<bool, io::Error> {
i'm not familiar enough with netlink, @dave-tucker should review this.
test/integration-ebpf/src/tc_name_limit.rs
line 6 at r5 (raw file):
use aya_bpf::{macros::classifier, programs::TcContext}; /*
I don't think we use this comment style in this project. Can you please use // ....
style instead?
test/integration-ebpf/src/tc_name_limit_exceeded.rs
line 6 at r5 (raw file):
use aya_bpf::{macros::classifier, programs::TcContext}; /*
ditto
test/integration-test/src/tests/load.rs
line 27 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…
The idea is to ensure that Aya does not impose a limit and allows names up to the size allowed by the kernel.
I'll add this sentence as well to be more specific about the goal.
I understand. I'm just saying that this repeats the comment that is present in the probe code. Here we just need to match the name of the probe. I think this comment can just be removed?
test/integration-test/src/tests/load.rs
line 65 at r5 (raw file):
let mut bpf = Bpf::load(crate::TC_NAME_LIMIT_EXCEEDED_TEST).unwrap(); /*
ditto about this comment
test/integration-test/src/tests/load.rs
line 87 at r5 (raw file):
// as a result, all arms except the desired Err() should be unreachable!(). match program.attach("lo", TcAttachType::Ingress) { Ok(_) => unreachable!(),
unreachable? this is just a test failure, isn't it? can you instead write:
assert_matches!(
program.attach("lo", TcAttachType::Ingress),
Err(ProgramError::TcError(TcError::NetlinkError { io_error })) => {
assert_eq!(io_error.raw_or_error(), Some(22))
}
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)
test/integration-test/src/tests/load.rs
line 19 at r2 (raw file):
#[test] fn tc_name_limit() {
Looks like these tests are failing:
failures:
---- tests::load::tc_name_limit stdout ----
thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:31:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- tests::load::tc_name_limit_exceeded stdout ----
thread 'tests::load::tc_name_limit_exceeded' panicked at test/integration-test/src/tests/load.rs:60:32:
called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }
It's the qdisc_add_clsact("lo").unwrap();
calls that seem to be failing.
The buffer for attributes should be sized to hold at least 256 bytes, based on `CLS_BPF_NAME_LEN = 256` from the kernel: https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28 Fixes: aya-rs#610 aya: add netlink_clsact_qdisc_exists function public-api: add clsact_qdisc_exists integration-test: add tc_name_limit integration-test: add tc_name_limit_exceeded Signed-off-by: Mohammad Javad Pooladkhay <[email protected]>
Hey @alessandrod, this pull request changes the Aya Public API and requires your review. |
That's true, Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)
test/integration-test/src/tests/load.rs
line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…
That's true,
Two commands are being used to run the tests:
cargo xtask integration-test local
which succeeds, andcargo xtask integration-test vm <KERNEL_IMAGE>
which is the faulty one. Running it on my local machine fails too.Is it possible that the loop back interface which I'm trying to attach an eBPF program to, is not available on that image? (hence the
Os { code: 2, kind: NotFound, message: "No such file or directory" }
error)
Yep, seems likely that it is not available. Can we create a network namespace so that we aren't reliant on ambient resources?
Could you please give me more hints, and maybe an example? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)
test/integration-test/src/tests/load.rs
line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…
Could you please give me more hints, and maybe an example?
See
aya/test/integration-test/src/tests/xdp.rs
Line 137 in b176967
let _netns = NetNsGuard::new(); |
I tried that. Namespace is created but the test still fails with the same error:
I wonder what is the purpose of this step in testing? I took a look and the CI file and realised Or maybe I'm missing something here... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod, @dave-tucker, and @pooladkhay)
test/integration-test/src/tests/load.rs
line 19 at r2 (raw file):
Previously, pooladkhay (Mohammad Javad Pooladkhay) wrote…
I tried that. Namespace is created but the test still fails with the same error:
test tests::load::tc_name_limit ... Entered network namespace aya-test-168-0 thread 'tests::load::tc_name_limit' panicked at test/integration-test/src/tests/load.rs:623:32: called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" } note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Exited network namespace aya-test-168-0 FAILED
I wonder what is the purpose of this step in testing?
What is the main difference betweencargo xtask integration-test vm
andcargo xtask integration-test local
if the tests are running on a linux machine?
Is this related to kernel versions? maybe being able to test on a specific version?I took a look and the CI file and realised
runs-on: ubuntu-22.04
is set at the top but there are also checks like this one:if: runner.os == 'macOS'
.
Considering those checks checkingrunner.os
, maybecargo xtask integration-test vm
is used in a scenario where the runner is not a linux image?
If that's the case, I'm unable to make sense of it because ofruns-on: ubuntu-22.04
.Or maybe I'm missing something here...
Yes, this is both for testing on non-Linux and for future testing on different kernel versions.
As for the error you're seeing, I don't know how to help you. It seems we explicitly set up a loopback device in the network namespace:
aya/test/integration-test/src/utils.rs
Lines 41 to 53 in 41351ec
let lo = CString::new("lo").unwrap(); | |
unsafe { | |
let idx = if_nametoindex(lo.as_ptr()); | |
if idx == 0 { | |
panic!( | |
"Interface `lo` not found in netns {}: {}", | |
netns.name, | |
io::Error::last_os_error() | |
); | |
} | |
netlink_set_link_up(idx as i32) | |
.unwrap_or_else(|e| panic!("Failed to set `lo` up in netns {}: {e}", netns.name)); | |
} |
Hey @alessandrod, this pull request changes the Aya Public API and requires your review. |
The buffer for attributes should be sized to hold at least 256 bytes, based on
CLS_BPF_NAME_LEN = 256
from the kernel:https://github.com/torvalds/linux/blob/02aee814/net/sched/cls_bpf.c#L28
This pull request also includes integration tests with an eBPF program of type classifier with a 256-byte-long name.
Fixes: #610