tetragon: improve how we handle TIDs and GetProcessCopy() #1256
Conversation
Force-pushed 367e603 to 0f6a5c5
So the argument here is that exit should always be associated with the same TID as the execve?
Yes, that is the intention. To sum up: at execve time TID == PID, because at our current (late) hook point there is only a single thread. Then:

We improved how we track the last thread (https://github.com/cilium/tetragon/blob/main/bpf/process/bpf_exit.c#L10), so technically we could report TID != PID (TGID) during exit. But we still report TID == PID: https://github.com/cilium/tetragon/blob/main/bpf/process/bpf_exit.h#L59

So if you see some value in reporting TID != PID at exit, we can do that and drop the last patch of this PR. TID != PID at exit has some value, but maybe only to expert users, and the tid field is new! On the other hand, if users start to correlate execve and exit events by PID and TID, it is better to have them equal, and that's it (simple).

For the related GetProcessCopy(): the long-term goal is that during a kprobe, if TID != PID, we internally copy only the shared fields in GetProcessCopy(), and set all the rest in real time from the BPF side. This should reduce the data copied during GetProcessCopy(), but may add more CPU cycles for decoding the non-shared fields, which we have to do anyway... and we already have an answer for that by masking some fields with the --enable-process-cred and --enable-process-ns flags so that we don't decode or report everything.

So here I will start incrementally isolating fields, making it clear in the code which fields we share and which we don't, and asserting that execve and exit events have the expected equal fields; for other events we go case by case... to improve process cache handling and make it easy for us to remember :-D
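The invariant described above can be sketched in Go. This is a minimal illustration, not Tetragon's actual API: the event-kind names and `validTID` helper are hypothetical. The rule is that fork/exec/exit events must carry the thread group leader's ID (TID == PID), while kprobe/tracepoint events may legitimately carry any per-thread TID.

```go
package main

import "fmt"

// EventKind is a hypothetical tag for where an event was generated.
type EventKind int

const (
	EventFork EventKind = iota
	EventExec
	EventExit
	EventKprobe
)

// validTID encodes the rule discussed above: fork/exec/exit events must
// report the thread group leader (TID == PID); kprobe and tracepoint
// events may legitimately report any thread of the process.
func validTID(kind EventKind, pid, tid uint32) bool {
	switch kind {
	case EventFork, EventExec, EventExit:
		return tid == pid
	default:
		return true
	}
}

func main() {
	fmt.Println(validTID(EventExec, 100, 100))   // true
	fmt.Println(validTID(EventExit, 100, 101))   // false: would indicate a bug
	fmt.Println(validTID(EventKprobe, 100, 101)) // true: per-thread TID is fine
}
```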
It's not clear to me how we can create "shared" fields until we rip the protobuf structs out of the core code. I think we want to do this regardless, though.
So I guess our model is that ProcessInternal is immutable, and if it needs to be modified, we do a copy?
@kkourt ProcessInternal can be mutable, but *process cannot be changed while it is being handled by the JSON/gRPC writers, or else we risk corrupting the output. And (I'm not sure we do this now) ProcessInternal really shouldn't be written or read without some atomic, read-once, or locking primitive, because multiple goroutines might be working on it. If it's a new object not yet pushed into a cache, it is safe, because only a single goroutine has access to it.
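The copy-before-mutate pattern described here can be sketched as follows. This is a simplified illustration, not Tetragon's real types: `Process` stands in for the protobuf process message, and the struct layout is invented for the example. Once a `*Process` pointer has been handed to the JSON/gRPC writers, any update goes through a private copy so the writers' view stays intact.

```go
package main

import "fmt"

// Process stands in for the protobuf process message (simplified).
type Process struct {
	Pid uint32
	Tid uint32
}

// ProcessInternal wraps the cached process. The embedded *Process must not
// be mutated once shared with JSON/gRPC writers, so updates use a copy.
type ProcessInternal struct {
	proc *Process
}

// GetProcessCopy returns a private copy that is safe to mutate,
// mirroring the copy-on-write pattern discussed in this thread.
func (pi *ProcessInternal) GetProcessCopy() *Process {
	c := *pi.proc
	return &c
}

func main() {
	pi := &ProcessInternal{proc: &Process{Pid: 42, Tid: 42}}
	shared := pi.proc // imagine this pointer is held by a gRPC writer
	cp := pi.GetProcessCopy()
	cp.Tid = 43 // safe: the writer's view is untouched
	fmt.Println(shared.Tid, cp.Tid) // 42 43
}
```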
I merged patch 1 from the other PR while we discuss this.
Makes sense. Might be worth reviewing the code and verifying that things work as above. ProcessInternal does have a
Force-pushed 49cb54d to 53b9644
@jrfastab @kkourt so that mutex is used to serialize:
So it's not for ProcessInternal but for the embedded one. We have to do several things here:
We may end up with:
This could make ProcessInternal somewhat immutable, as pointed out by @kkourt, until we start to hook the sensor call sites that may change those fields...
Issue here: #1290
@jrfastab what are the scenarios where ProcessInternal can be mutable? Just making sure I didn't miss any. IIRC matchCapabilitiesChange updates the BPF execve_map but not the userspace process cache, and that's part of the kprobe sensor, not a main sensor. I think if we want to reflect this, we first have to hook the setuid() family by default as a main sensor too, update the credentials in the BPF execve_map, and then reflect this in the userspace process cache as well... We do it step by step, so we don't get tricked by some exploits, etc.
I'm probably missing something, but I'm getting a bit lost on how the TID works now ;-) IIRC we have the following event flow:
Could we have a comment like that somewhere explaining the current situation? Also, maybe we could remove some of the tid fields from the kernel->user events.
Indeed, will do.
It's not clear to me what the created issue is about? In general, IMO we rely too heavily on protobufs in the lower layers of the stack, but that is about optimizing performance and the JSON encoder. The mutex in ProcessInternal is there to serialize writers/readers of ProcessInternal. Somehow the 3 bullets above were lost on me.
Force-pushed 53b9644 to a4fa3fc
@olsajiri I updated the last patch with an extensive commit log, please have a look.
I agree that we should remove it at some point, but for now it is useful: it helps debug things and ensure our model is right, and we are still not finished with handling all threads. Please take a look at the last patch's commit log and at this comment: #1256 (comment)
@jrfastab fair enough, I updated the last patch with an extensive commit log about what needs to be fixed, and when doing those fixes we will address this too if we have to.
Force-pushed 52277cc to 26d09a1
OK @olsajiri, updated based on all comments. And yes, it turned into more work documenting all the stuff that is still missing, as you suggested. So basically we are not finished with the full per-thread transition, hence some extra debug and warning messages here and there.
Force-pushed 26d09a1 to e44b014
Force-pushed e44b014 to 58a58ef
I find this commit message to be confusing.
My sort-of understanding is something like:
```go
	"event.process.binary":  binary,
	"event.process.exec_id": execID,
	"event.parent.exec_id":  parentExecID,
}).Warn("ExecveEvent: process PID and TID mismatch")
```
These warnings need to be metric error counters so we can actually find them; no one will notice a rare and random warning in the logs. (Note: I wouldn't block this PR on the metric implementation, but I do think we should get it in soon to catch this.)
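The idea of counting these mismatches instead of only logging them can be sketched with a plain atomic counter (Tetragon's real metrics use Prometheus collectors; the `tidMismatchTotal` and `reportTIDMismatch` names here are hypothetical):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tidMismatchTotal counts TID != PID occurrences so that a metrics
// endpoint can export it; a rare log line alone would go unnoticed.
var tidMismatchTotal atomic.Uint64

// reportTIDMismatch bumps the counter whenever the unexpected
// TID != PID condition is observed on an exec/exit event.
func reportTIDMismatch(pid, tid uint32) {
	if tid != pid {
		tidMismatchTotal.Add(1)
		// a log line can stay here, but the counter is what monitoring sees
	}
}

func main() {
	reportTIDMismatch(10, 10) // expected case, no increment
	reportTIDMismatch(10, 11) // mismatch, counted
	fmt.Println(tidMismatchTotal.Load()) // 1
}
```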
So all fixed except for adding the metrics; I will do that in a separate PR so I don't mess up this one ;-)
My comments aside, looks good to me. A few small cleanups and I think we can get this merged.
Exit events should have TID == PID. At the same time, we want to correlate the {TID, PID} pair of the exit event with the {TID, PID} pair from the exec event: they must match, because that is how we link the exit event to the exec one. If TID != PID, or the exit {TID, PID} pair does not match the exec {TID, PID} pair, then this is a bug. This patch adds extra logic to warn on conditions where TID != PID, to aid debugging and catch this unexpected case. Typically this indicates a bug either in BPF or in the userspace caching logic. When this condition is encountered, we attempt to correct it by setting TID = PID on clone and exec events. Signed-off-by: Djalal Harouni <[email protected]>
Remove the unnecessary GetProcessCopy() and use the previous reference to the process. That reference, if found in the process cache, already has TID set, since it was pushed during exec or clone. Signed-off-by: Djalal Harouni <[email protected]>
Our current rules for how we collect thread IDs are as follows.

On the BPF side we send both TID and PID, where:
- fork: TID == PID
- exec: TID == PID
- [ku]probe or tracepoint: TID may differ, as it is the per-thread ID
- exit: TID == PID, to match the exec event

In user space we have one cache entry, the thread leader, collected either during clone or exec:
- fork: TID == PID (asserts TID == PID as received from the BPF side)
- exec: TID == PID, ditto
- exit: TID == PID, to match the exec event
- [ku]probe or tracepoint: we make a copy of the thread leader's process from the process cache, then update its TID with the one recorded on the BPF side. The copy is needed so we don't corrupt gRPC handling.

This is far from complete; future fixes should include:
- Record the capabilities and namespaces per thread on the BPF side for [ku]probes and tracepoints, and ensure we do not overwrite the thread leader's fields in the execve_map (BPF side) or in the userspace process cache with the fields of another thread, as those caches contain only one thread, the leader. [Need to recheck sources]
- Ensure that [ku]probe and tracepoint events use the per-thread capabilities and namespaces fields collected on the BPF side, instead of the fields of the leader in the process cache, which were collected during exec or during match filters... and not at current time.
- Ensure that we always collect the thread leader's fields and cache them in our shadow state in the execve_map and the userspace process cache, regardless of --enable-process-creds and similar flags.
- When all of this is fixed and the model is clear, we can perhaps remove the extra recording of TIDs on the BPF side during fork/clone, exec, and exit, as by then our model will have been asserted.

Note that sending a TID that equals PID on clone, exec, and exit from the BPF side, together with the userspace assertions, helps catch errors in other Tetragon variants that use the OSS version as a base with custom sensors. The downside is that we send an extra 4 bytes from BPF, which is also fine.

Signed-off-by: Djalal Harouni <[email protected]>
Force-pushed 58a58ef to 9c43b9d
Thanks @jrfastab, I changed the commit log to what you suggested, and also inside the code ;-)
Goes on top of #1254