Replace scapy as the default choice in the PTF framework with an Apache-compatible alternative #36

Open
fruffy opened this issue Feb 7, 2025 · 12 comments
Labels
mediumtask A task that appears to require a medium level of work

Comments

@fruffy
Collaborator

fruffy commented Feb 7, 2025

Some context why this is desired: p4lang/ptf#120

Effectively, scapy is GPL-licensed, which is incompatible with the Apache 2.0 license P4 is using. Ideally, we should use an Apache-compatible packet sniffing framework as the default. The PTF framework has already been extended to support different kinds of packet manipulation frameworks. We just need to add one.

https://github.com/open-traffic-generator/snappi could be a viable alternative.

@chrispsommers how much of a drop-in replacement is snappi for scapy? What are the trade-offs? Do you know by any chance?

@fruffy fruffy added the mediumtask A task that appears to require a medium level of work label Feb 7, 2025
@jafingerhut
Collaborator

jafingerhut commented Feb 7, 2025

I am not sure I agree with the statement "we should use an Apache-compatible packet sniffing framework as the default", given how little p4lang projects use scapy right now. The benefit to the project seems small to me, so if the time required is even "only" 16 person-hours of work, imagine what those 16 person-hours might achieve if spent on a project that actually adds new features.

But, if it is the case that people would strongly prefer that and want to spend time on it, I will at least mention the Python package 'bf-pktpy' that Intel created in 2021, as an attempt to avoid using Scapy.

It is not a drop-in replacement for all of Scapy. If I recall correctly, it can be a drop-in replacement for the small subset of Scapy that was used by Intel Tofino's P4Studio test suite.

I do not know if the subset of Scapy functionality that it implements is large enough for the way p4lang projects use scapy.

https://github.com/p4lang/open-p4studio/tree/main/pkgsrc/ptf-modules/bf-pktpy

@fruffy
Collaborator Author

fruffy commented Feb 9, 2025

> I am not sure I agree with the statement "we should use an Apache-compatible packet sniffing framework as the default", given how little p4lang projects use scapy right now. The benefit to the project seems small to me, so if the time required is even "only" 16 person-hours of work, imagine what those 16 person-hours might achieve if spent on a project that actually adds new features.

Well, we would stop tainting all PTF scripts, as happens now. And it would reduce the number of GPL notices we have to include. I consider that a significant win.

> But, if it is the case that people would strongly prefer that and want to spend time on it, I will at least mention the Python package 'bf-pktpy' that Intel created in 2021, as an attempt to avoid using Scapy.

I am not sure whether we should use a homegrown packet library like that. We could consider adding it to the PTF framework now that it is open-source.

@jafingerhut
Collaborator

jafingerhut commented Feb 9, 2025

> I am not sure I agree with the statement "we should use an Apache-compatible packet sniffing framework as the default", given how little p4lang projects use scapy right now. The benefit to the project seems small to me, so if the time required is even "only" 16 person-hours of work, imagine what those 16 person-hours might achieve if spent on a project that actually adds new features.

> Well, we would stop tainting all PTF scripts, as happens now. And it would reduce the number of GPL notices we have to include. I consider that a significant win.

There are many PTF scripts in the open-p4studio repository. None of them need to be GPL, because they do not import scapy. They indirectly import bf_pktpy, via ptf. Or a user can choose to use scapy instead of bf_pktpy, but the effort that Intel went to means that they can release their PTF scripts as Apache-2.0. They do not have access to the full scapy functionality, only the subset implemented in bf_pktpy.

@fruffy
Collaborator Author

fruffy commented Feb 9, 2025

> There are many PTF scripts in the open-p4studio repository. None of them need to be GPL, because they do not import scapy. They indirectly import bf_pktpy, via ptf. Or a user can choose to use scapy instead of bf_pktpy, but the effort that Intel went to means that they can release their PTF scripts as Apache-2.0. They do not have access to the full scapy functionality, only the subset implemented in bf_pktpy.

Yes, but, as is, PTF is set to use scapy by default which means all the PTF scripts in P4C need to be GPL-licensed. This also applies to anyone else. We could get rid of this headache by making scapy non-default.

@jafingerhut
Collaborator

FYI, as part of my worry about this proposal, and the depth of the rabbit hole one could end up going down:

Some statistics about the Scapy repository, as of a recent release version 2.6.0 (I'm just using wc here to get line counts, not something that ignores comments and blank lines like sloc, to give quick estimates):

  • 6111 commits, spanning from 2003 to 2024
  • 226,039 lines of Python source files with .py suffix
  • 81,590 lines of what I think are test files written in Python, with .uts suffix
  • 10,929 lines of .rst documentation files
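For anyone who wants to reproduce counts like these, here is a minimal Python sketch of the same raw `wc -l`-style tally, grouped by file suffix. It counts newline bytes, so blank and comment lines are included, just as with `wc`. The function name `count_lines_by_suffix` is hypothetical, and the sketch assumes you point it at a local checkout (for example of Scapy 2.6.0).

```python
from collections import Counter
from pathlib import Path


def count_lines_by_suffix(root, suffixes):
    """Tally raw line counts per file suffix under `root`, similar to `wc -l`.

    Blank lines and comments are counted, unlike tools such as sloc.
    """
    wanted = set(suffixes)
    totals = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in wanted:
            # wc -l counts newline bytes, so do the same here.
            totals[path.suffix] += path.read_bytes().count(b"\n")
    # Report every requested suffix, including ones with no matching files.
    return {suffix: totals[suffix] for suffix in suffixes}
```

Usage would look like `count_lines_by_suffix("scapy", [".py", ".uts", ".rst"])` from the directory containing the checkout.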

Some corresponding statistics about the p4c repository, as of today:

  • 3,667 commits, spanning from 2016 to 2025
  • 536,543 C/C++ lines (358,887 .cpp + 31,073 .c + 146,583 .h)
  • 390,133 lines of test programs and packets (385,342 .p4 + 4,791 .stf), with a lot of copy and paste between .p4 test programs
  • 41,411 lines of .py source files, but I'd guess at least 50% of that was copied from one back end test driver to another

So, as a very rough estimate, reproducing Scapy from scratch, without any reference to its code or any learning from it, seems to be on the order of 32% as many lines of code as the current p4c. Even if you cut that in half, because you can refer to the current Scapy documentation and implement things by observing a working system's behavior, that is still about 1/6 of the effort that went into creating today's p4c repository.

That is not a summer project for one intern. It is something like 5 person-years of effort, minimum.
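The rough arithmetic behind the 32% and 1/6 figures can be checked in a few lines of Python. The comment does not say exactly which Scapy counts went into the ratio; this sketch assumes the .py and .uts totals (code plus tests) against the sum of all three p4c totals, which gives about 31.8%. Including the .rst documentation as well raises it to about 32.9%. The helper name is hypothetical.

```python
# Raw line counts quoted above (wc-style totals, not sloc).
SCAPY = {".py": 226_039, ".uts": 81_590, ".rst": 10_929}
P4C = {"C/C++": 536_543, "test programs and packets": 390_133, ".py": 41_411}


def scapy_to_p4c_ratio(include_docs=False):
    """Scapy line count as a fraction of the total p4c line count."""
    scapy_lines = SCAPY[".py"] + SCAPY[".uts"]
    if include_docs:
        scapy_lines += SCAPY[".rst"]
    return scapy_lines / sum(P4C.values())


ratio = scapy_to_p4c_ratio()
print(f"from scratch:   {ratio:.1%}")  # 31.8%
print(f"halved:         {ratio / 2:.1%}")  # 15.9%, roughly 1/6
print(f"with .rst docs: {scapy_to_p4c_ratio(include_docs=True):.1%}")  # 32.9%
```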

If I go to the directory open-p4studio/pkgsrc/ptf-modules/bf-pktpy in the open-p4studio repository, I can't get a commit log history, since it was added in its latest state without that history.

  • 10,066 lines of .py source files

I am guessing that is something like 5% of Scapy's functionality, but it is probably 80-90% of the functionality that most P4 developers typically use. I suspect the main reason Scapy is so big is that its kitchen sink of already-implemented header formats is very wide and deep.

@jafingerhut
Collaborator

> There are many PTF scripts in the open-p4studio repository. None of them need to be GPL, because they do not import scapy. They indirectly import bf_pktpy, via ptf. Or a user can choose to use scapy instead of bf_pktpy, but the effort that Intel went to means that they can release their PTF scripts as Apache-2.0. They do not have access to the full scapy functionality, only the subset implemented in bf_pktpy.

> Yes, but, as is, PTF is set to use scapy by default which means all the PTF scripts in P4C need to be GPL-licensed. This also applies to anyone else. We could get rid of this headache by making scapy non-default.

If someone wants to try replacing scapy with bf_pktpy in the p4c repository, find out what bf_pktpy is missing today, and estimate how much work it would take to add the missing pieces that p4c tests need, that sounds like a summer intern project to me. Maybe even taking a stab at implementing some of that missing functionality, or even completing that work. As of now, I don't have a good estimate of how much of what p4c tests might use is missing from bf-pktpy.

@jafingerhut
Collaborator

Sorry for all the noise on this issue, but note that there are 3 Python tests using the ptf library in p4c/testdata/p4_16_samples today:

p4_16_samples/pna-dpdk-add_on_miss0.py
p4_16_samples/pna-dpdk-small_sample.py
p4_16_samples/ternary2-bmv2.py

They all have these things in common:

  • They import the ptf package
  • They do not import the scapy package
  • Today, if p4c's CI is running these PTF tests (I have not confirmed that), it is probably doing that by having ptf import the scapy package.
  • I have not tried this, but if you run these PTF tests with ptf importing bf_pktpy instead of scapy, they might just work. None of these tests construct packets by any means other than (a) raw sequences of bytes in the Python source code, probably created by p4testgen, and (b) calls to PTF functions like simple_tcp_packet, which Intel implemented using their bf_pktpy package.
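One way to confirm the first two points for each test file is to parse it and list the top-level packages it imports. Below is a minimal sketch using Python's standard `ast` module; the function names are hypothetical. It only sees import statements written in the file itself, so it will not report a scapy import that ptf performs internally at runtime, nor imports done dynamically through `importlib`.

```python
import ast
from pathlib import Path


def top_level_imports(source):
    """Return the top-level package names that a Python source string imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports such as "from . import x" are skipped.
            names.add(node.module.split(".")[0])
    return names


def imports_scapy_directly(path):
    """True if the file itself contains an import of scapy (static imports only)."""
    return "scapy" in top_level_imports(Path(path).read_text(encoding="utf-8"))
```

For example, calling `imports_scapy_directly("p4_16_samples/ternary2-bmv2.py")` from the testdata directory should return `False` if the observation above holds.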

@jafingerhut
Collaborator

jafingerhut commented Feb 9, 2025

Wow. I was looking through the code of the ptf package itself, which was copied from a library called oftest in 2015 to start off, and one of the files copied in 2015, and still in ptf today, is this one:

So even though ptf has, since 2021, been able to use either bf_pktpy (Apache license) or scapy (GPLv2), at the user's choice, and claims to be under the Apache 2.0 license, until we replace at least the code linked above, and any other parts of ptf that are derived from GPL'd code, ptf as a whole must be GPLv2 as well.

In looking further, this source file netutils.py is pretty short, and no other files in the ptf library are licensed under the GPL, so replacing this one file with a fresh implementation should not be difficult. Ideally it should be done by someone who has not looked at the code.

@chrispsommers

chrispsommers commented Feb 10, 2025 via email

@jafingerhut
Collaborator

jafingerhut commented Feb 12, 2025

Some early experimental results on using PTF without using Scapy:

Install the Python package bf_pktpy, included as part of the new open-p4studio repo. It can be installed by itself, without installing the entire open-p4studio repository code. In my case, I installed it into a Python virtual environment that I had created using my install-p4dev-v8.sh script for Python packages, rather than installing it in system-wide directories.

# I omit steps required to create a Python venv, but in my experiment it was already created and activated at this time
git clone https://github.com/p4lang/open-p4studio
cd open-p4studio/pkgsrc/ptf-modules/bf-pktpy
pip install .
# There are at least several source files in bf-pktpy module that are licensed Apache-2.0.  Hopefully all of them are.
# Also install two Python packages that bf-pktpy depends upon.
# getmac is released under MIT license, which is perfectly acceptable to import from an
# Apache-2.0 or BSD-3-Clause project.
# https://pypi.org/project/getmac
pip install getmac
# scapy_helper is also released under MIT license.  It _can_ be used with Scapy,
# but it can also be used with bf-pktpy instead, I believe.
# https://github.com/NexSabre/scapy_helper
pip install scapy_helper
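Before switching ptf over, it may help to confirm what the virtual environment can actually import. This hedged sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it. It assumes the import names `bf_pktpy`, `getmac`, and `scapy_helper` match the pip packages installed above, and it simply reports whether `scapy` is importable, since the goal here is a setup that works without it. The constant and function names are hypothetical.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED = ("bf_pktpy", "getmac", "scapy_helper")
# Flag Scapy if it is importable, since the goal is to run without it.
UNWANTED = ("scapy",)


def check_environment(required=REQUIRED, unwanted=UNWANTED):
    """Return (missing required modules, unwanted modules that are importable)."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    present = [name for name in unwanted if importlib.util.find_spec(name) is not None]
    return missing, present
```

Running `print(check_environment())` inside the activated venv should print `([], [])` for a Scapy-free setup with all three packages installed.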

After those changes, I modified several runptf.sh scripts in my repo https://github.com/jafingerhut/p4-guide to add this command-line option to their ptf commands:

-pmm bf_pktpy.ptf.packet_pktpy
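If a runner builds the ptf command line from Python rather than from a shell script, the same option can be added programmatically. This is a small sketch that assumes only the `-pmm` option shown above. The helper name is hypothetical, and `ARG1`/`ARG2` in the usage comment are placeholders, not real ptf options.

```python
PKTPY_MODULE = "bf_pktpy.ptf.packet_pktpy"


def with_packet_module(ptf_argv, module=PKTPY_MODULE):
    """Return a copy of a ptf command line that selects `module` via -pmm.

    An existing -pmm choice is replaced instead of passing the option twice.
    """
    argv = list(ptf_argv)
    if not argv:
        raise ValueError("ptf_argv must start with the ptf program name")
    if "-pmm" in argv:
        i = argv.index("-pmm")
        return argv[: i + 1] + [module] + argv[i + 2 :]
    return [argv[0], "-pmm", module] + argv[1:]


# Usage (ARG1/ARG2 are placeholders for your usual ptf arguments):
# subprocess.run(with_packet_module(["ptf", "ARG1", "ARG2"]), check=True)
```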

BEFORE I made the changes above, when I run my runptf.sh scripts, I see this output near the beginning of the ptf run while it is initializing:

Using packet manipulation module: ptf.packet_scapy

That is the sign that the ptf command is importing the scapy module.

AFTER I made the changes above, when I run the runptf.sh script and it starts ptf, that line of output is instead this:

Using packet manipulation module: bf_pktpy.ptf.packet_pktpy

That is the sign that the ptf command is importing the bf_pktpy module.
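Since that startup line is the visible signal of which module ptf loaded, a test runner could check it automatically, for example to fail a CI job that unexpectedly falls back to Scapy. A minimal sketch that scans captured ptf output for the line; the function names are hypothetical, and it assumes the message format shown above stays stable.

```python
import re
from typing import Optional

# Matches the line ptf prints at startup, e.g.
# "Using packet manipulation module: bf_pktpy.ptf.packet_pktpy"
_PMM_LINE = re.compile(r"^Using packet manipulation module:\s*(\S+)\s*$", re.MULTILINE)


def detect_packet_module(ptf_output: str) -> Optional[str]:
    """Return the packet manipulation module named in ptf's output, or None."""
    match = _PMM_LINE.search(ptf_output)
    return match.group(1) if match else None


def used_scapy(ptf_output: str) -> bool:
    """True if ptf reported loading its Scapy-based module."""
    return detect_packet_module(ptf_output) == "ptf.packet_scapy"
```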

With this change, all of these runptf.sh tests in my p4-guide repo pass:

demo1/runptf.sh
demo2/runptf.sh
demo7/runptf.sh
idletimeout/runptf.sh
matchkinds/runptf.sh
packetinout/runptf.sh
registeraccess/runptf.sh

If you are NOT running the ptf command from a shell, but are instead writing Python code that imports the ptf module, here is how you can choose whether to use scapy or bf_pktpy, demonstrated below in an interactive python3 session.

To use bf_pktpy:

$ python3
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ptf
>>> ptf.config["packet_manipulation_module"] = "bf_pktpy.ptf.packet_pktpy"
>>> import ptf.packet
Using packet manipulation module: bf_pktpy.ptf.packet_pktpy
>>> 

To use scapy explicitly (it is currently the default if you just import ptf.packet):

$ python3
Python 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ptf
>>> ptf.config["packet_manipulation_module"] = "ptf.packet_scapy"
>>> import ptf.packet
Using packet manipulation module: ptf.packet_scapy
>>> 
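The two sessions above can be wrapped in a small helper. This sketch assumes, as the sessions suggest, that `ptf.config["packet_manipulation_module"]` must be set before `ptf.packet` is first imported (the "Using packet manipulation module" message appears at import time). Because Python caches imported modules, changing the config after `ptf.packet` is loaded would presumably have no effect, so the helper refuses in that case. The names `PACKET_MODULES`, `packet_module_name`, and `load_ptf_packet` are hypothetical, and `load_ptf_packet` requires the ptf package.

```python
import importlib
import sys

# Module names taken from the interactive sessions above.
PACKET_MODULES = {
    "bf_pktpy": "bf_pktpy.ptf.packet_pktpy",
    "scapy": "ptf.packet_scapy",
}


def packet_module_name(backend):
    """Map a short backend name to the ptf packet manipulation module path."""
    try:
        return PACKET_MODULES[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(PACKET_MODULES)}"
        ) from None


def load_ptf_packet(backend="bf_pktpy"):
    """Select a backend, then import ptf.packet (requires the ptf package)."""
    if "ptf.packet" in sys.modules:
        # The module choice is made when ptf.packet is first imported.
        raise RuntimeError("ptf.packet was already imported; select the backend earlier")
    import ptf

    ptf.config["packet_manipulation_module"] = packet_module_name(backend)
    return importlib.import_module("ptf.packet")
```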

@chrispsommers

Makes perfect sense to me Andy, nice work. I recall looking at some of this stuff when I was using P4Studio and Intel switched to bf_pktpy; they did a good job of making a clean-room implementation of many of the important features of Scapy. If there are P4.org PTF test cases that exceed bf_pktpy's capabilities, support for them could probably be added with some modest effort. SoC task?

@jafingerhut
Collaborator

A bit more progress. If this PR is approved and merged for ptf, which enables one to use ptf after setting an environment variable that selects between Scapy and bf-pktpy modules:

then the following small changes to the p4c repo:

pass all p4c CI tests. They do so without installing Scapy at all for most of the tests. The only tests where they do install Scapy are for the EBPF back end tests that send packets through the P4-EBPF data plane in the tests, because 6 of those tests use Scapy directly. It might be possible to replace Scapy with bf-pktpy for some of those tests, too, but I suspect for at least several of them it would require significant enhancements to bf-pktpy, which I am not personally planning to develop any time soon.

Still, it removes Scapy and GPLv2 code from the p4c repo, even from tests, for everything except the p4c EBPF back-end tests.


3 participants