Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression on v0.11.0 when re-enabling TracingPolicy with sensors API #1489

Closed
inliquid opened this issue Sep 18, 2023 · 4 comments · Fixed by #1562
Closed

Regression on v0.11.0 when re-enabling TracingPolicy with sensors API #1489

inliquid opened this issue Sep 18, 2023 · 4 comments · Fixed by #1562
Assignees
Labels
kind/bug Something isn't working

Comments

@inliquid
Copy link
Contributor

inliquid commented Sep 18, 2023

What happened?

I got a regression with Tetragon v0.10.0 -> v0.11.0, there is policy for tcp connect events:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: connect
spec:
  kprobes:
  - args:
    - index: 0
      returnCopy: false
      type: sock
    call: tcp_connect
    return: false
    syscall: false
  - args:
    - index: 0
      returnCopy: false
      type: sock
    call: tcp_close
    return: false
    syscall: false
  - args:
    - index: 0
      returnCopy: false
      type: sock
    - index: 2
      returnCopy: false
      type: int
    call: tcp_sendmsg
    return: false
    syscall: false

And if I disable it with tetra sensors disable connect and try to re-enable: tetra sensors enable connect it returns an error and stays disabled:

# tetra sensors enable connect
failed to enable sensor connect: rpc error: code = Unknown desc = sensor gkp-sensor-1 from collection connect failed to load: failed prog /var/lib/tetragon/bpf_generic_kprobe_v53.o kern_version 328939 loadInstance: getting entry from genericKprobeTable failed with: invalid id (ID=0/invalid entry)

Tetragon logs:

time="2023-09-18T18:40:00Z" level=warning msg="Server EnableSensor request failed" error="sensor gkp-sensor-1 from collection connect failed to load: failed prog /var/lib/tetragon/bpf_generic_kprobe_v53.o kern_version 328939 loadInstance: getting entry from genericKprobeTable failed with: invalid id (ID=0/invalid entry)" sensor.name=connect

This all worked well with Tetragon v0.10.0, never had an issue with re-enabling this policy.

Tetragon Version

# tetra version
CLI version: v0.11.0

Kernel Version

$ uname -a
Linux mint-vm 5.4.0-155-generic #172-Ubuntu SMP Fri Jul 7 16:10:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3", GitCommit:"9e644106593f3f4aa98f8a84b23db5fa378900bd", GitTreeState:"clean", BuildDate:"2023-03-30T06:34:50Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

Bugtool

tetragon-bugtool.tar.gz

Relevant log output

No response

Anything else?

No response

@inliquid inliquid added the kind/bug Something isn't working label Sep 18, 2023
@mtardy mtardy self-assigned this Sep 27, 2023
@mtardy
Copy link
Member

mtardy commented Oct 4, 2023

Hey, sorry I didn't reply immediately on this issue, we had a discussion in the Tetragon Slack channel.

I think this might be related to #1385, I overlooked the enable/disable interface I'm sorry, I'll try to fix it quickly.

mtardy added a commit that referenced this issue Oct 10, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy added a commit that referenced this issue Oct 13, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy added a commit that referenced this issue Oct 13, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy added a commit that referenced this issue Oct 13, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy added a commit that referenced this issue Oct 13, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
kkourt pushed a commit that referenced this issue Oct 16, 2023
This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
@mtardy
Copy link
Member

mtardy commented Oct 16, 2023

Hey @inliquid, this should be fixed now in latest (and next release) and we should backport this to 0.11. Just to mention that I added the tetra tracingpolicy disable/enable so you can avoid using the whole sensor interface. Everything is available under the tracingpolicy interface (and actually more than sensor, since you can add tracing policy, and not sensors directly).

Sorry for the delay and I hope this helps.

@inliquid
Copy link
Contributor Author

inliquid commented Oct 16, 2023

Thanks @mtardy! Sounds great, I will re-implement enable/disable logic using updated interface then.

and we should backport this to 0.11

So should helm upgrade do the thing?

mtardy added a commit that referenced this issue Oct 17, 2023
[upstream commit 19babbb]

This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
@mtardy
Copy link
Member

mtardy commented Oct 17, 2023

Thanks @mtardy! Sounds great, I will re-implement enable/disable logic using updated interface then.

and we should backport this to 0.11

So should helm upgrade do the thing?

we don't have a release yet but this should be part of next release, and also of next 0.11.x release, see backport here #1604. To have the fix without waiting you need to run latest, so basically use ci images or built it yourself.

mtardy added a commit that referenced this issue Oct 17, 2023
[upstream commit 19babbb]

This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
mtardy added a commit that referenced this issue Oct 18, 2023
[upstream commit 19babbb]

This should make sure the regression #1489 doesn't appear
anymore: disabling/enabling broke because we cleaned up the generic
kprobe table on unloading, which should have been done only on
destroying the probe.

Signed-off-by: Mahe Tardy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants