-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support the capturing of environment variables #327
base: main
Are you sure you want to change the base?
Conversation
Store Environment variables in ProcessManager and make it available in all vertical layers Support capturing env variables
type ProcessManagerConfig struct { | ||
extractEnvVars []string | ||
} | ||
|
||
var pm_cfg = ProcessManagerConfig{extractEnvVars: []string{ | ||
"PIPELINE_PPOID", | ||
"PIPELINE_SOFTWARENAME", | ||
"PIPELINE_JOBID"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest using a map for faster (O(1)) lookup.
Also, this needs to be configurable, as you already said.
This map of env vars could be part of tracer/Config
and be passed to processmanager.New().
Then it needs a configuration option exposed to the user, see cli_flags.go
.
The scanning of the environ
file should only happen if that map of env variables is not empty.
👍 Having the option of reporting a selected set of env variables per process/sample makes sense to me. It might blow up the size of profiling messages, but IMO that can be discussed outside this PR.
IMO fits into the current implementation (ProcessManager.updatePidInformation().
A configuration file is an option, but might be over-engineered. If it just a small set of variable names, it can be configured as a comma-separated list of strings.
Reviewers will suggest improvements during the following review process.
Reviewers will ask you for tests and possibly missing parts. |
While there is interest to collect and forward more information about a process, I think, the core functionality of the eBPF profiler should stay around unwinding and providing stack traces. To enrich process information, there are several options. Did you consider
Overall, I think, it should not be the eBPF profilers task to collect as much environmental information as possible to forward it. |
It seems that the OpenTelemetry collector supports reading environment variables directly in its configuration file, and that can be leveraged from an existing processor, such as attributesprocessor to insert environment variables as attributes. So I agree with @florianl that this is probably the way to go, as we're trying to remove extraneous functionality that doesn't strictly depend on profiling internals. In some cases it's important to collect information inside the profiling agent, an example would be process name or executable name, as delayed collection may result in missing that metadata when the process exits.
To summarize, I think we should discuss this in the next Profiling SIG and decide if per-PID environment variables are something we want to collect as part of the agent. We would then also need to decide how to document this functionality, since it's not part of existing OTel semantic conventions (one option would be to make it part of profiles specification, another to introduce a new semantic convention key for process environment). |
processmanager/processinfo.go
Outdated
splittedVars := strings.Split(string(envVars), "\000") | ||
fmt.Println("EnvVars for PID" + strconv.Itoa(int(pid))) | ||
for _, envVar := range splittedVars { | ||
keyValuePair := strings.Split(envVar, "=") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct as values can themselves contain "=".
keyValuePair := strings.Split(envVar, "=") | |
keyValuePair := strings.SplitN(envVar, "=", 2) |
Thanks for all your feedback. Sadly OpenTelemetry Collector did not seem like a viable option. The OpenTelemetry Collector does not have to run on the same machine as this eBPF profiler, making it impossible to get the needed data in the OpenTelemetry collector. The eBPF profiler seemed the correct layer to implement this feature because of this. In HPC environments we usually have multiple executable (processes, executing different binaries) spread over multiple machines / pilots for the same Job. Being able to group and filter those to properly understand code paths, hotspots and bottlenecks environment variables on a PID level are necessary. |
To clarify: The eBPF profiler will run as part of the OTel collector (as a receiver) in the near future and this will be the standard supported configuration. It's not certain at this point in time whether we'll continue to support a standalone profiling agent. However, this doesn't really affect the implementation of collecting per-PID metadata, as a similar approach to the one implemented in this PR will need to be used if reliability is required. |
Can you be more precise on this? I think, in the mid to long run, there will be no independent eBPF profiler binary as it will be part of the OTel collector.
Can you point out, why using a similar approach, like parca-agent, which uses the functionality of the eBPF profiler but with custom extensions, is not a viable option? |
Ah I did not know that. I thought the ebpf profiler and the Otel collector will be two seperate processes / infrastructure layers.
I have tried parca, however it seemed to require a paid license for most functionality and I was unable to get it to work with another backend / frontend (other than parca). |
I'm not asking to use parca (but also feel free to use it if it fits your need), but more like following the approach of the parca-agent. parca-agent uses eBPF profiler as Go module and builds for their custom needs around it. Before open sourcing, the agent used to also collect more information about the environment. But this functionality got removed, see #90, as this can be done with other approaches. |
We need to distinguish between per-agent-process environment collection and per-trace-event-PID (partial)environment collection. Per-agent-process environment collection can take place in a post-processing step in delayed fashion, e.g. in an OTel collector processor. Per-trace-event-PID (partial)environment collection is time-sensitive and, if reliability is required, needs to be synchronized in the same manner as the other per-trace-event-PID bits of metadata the agent collects. EDIT: Added a discussion point to next week's Profiling SIG agenda |
You only need PID level env variables if you run the identical executable as multiple processes on the same machine at the same time. That's how I understood the PR description, but here you tell a slightly different story ("processes, executing different binaries"). Because you can already distinguish between different hosts and between different executables from the profiling data. Can you clarify? Are you running the same executable as several processes in parallel on a machine? |
If there is a need for more and custom time critical collection of PID information, then package |
Yes, often the same executable will be started hundreds if not even thousands of times with different parameters for the same job, this can happen on the same machine or be distributed over several machines. It depends on the how the HPC cluster is setup. Maybe to provide a use case: The ESA Euclid telescope gathers 600MP photos often containing thousands of galaxies, for each of those galaxies a new process is then kicked off to do calculations (very simply put). This can end up with thousands of processes being started with the same executable. Also different processing pipelines running at the same time may use the identical executables (for different patches of the sky for example, each patch being a processing job, each again kicking off thousands of new processes). |
Do note that in some scenarios environment may contain "secret" tokens and it probably is not good to send full unfiltered environment. I believe some users will be very sensitive on what data is collected and sent out. So it probably would be better to keep this off by default, and enable it by giving a whitelist of the environment variables that will be sent out. That should also limit the amount of data that gets sent out. |
That is what the PR does. It filters for a given set of variable names. If that list is empty, no data would be sent. |
This ensures that only 2 substrings are returned (in case the value has a "=" too)
Happy to take this conversation elsewhere (eg. Parca Discord), but wanted to chime in and say that Parca is in no way proprietary or needs a license for anything (everything under the Parca umbrella is Apache2 license with the exception of some of the eBPF bits which are GPL). Would love to understand what makes it seem like that's not the case and resolve that! On a related note, Parca-Agent already does support adding a variety of metadata on a per-sample basis. See the metadata labels documentation, happy to help you put together a configuration that suits your needs! |
Use case behind this PR
This is part of my bachelor thesis at the University of Applied Sciences North-Western Switzerland (FHNW).
We are exploring the usage of the opentelemetry-ebpf-profiler for HPC environments (specifically for the ESA Euclid mission). Since there are many different jobs running in parallel on HPC clusters, admins need a way to differentiate profiling data between missions and pipelines to figure out who or what is causing a congestion of the system.
For this purpose Environment variables are passed to executables in HPC environments (for example by SLURM or custom pipeline processing management systems). Being able to group and filter by some selected environment variables is therefor crucial for HPC clusters and developers that are running their code on said clusters.
Implementation
The environment variables are read from
/proc/<pid>
. I noticed that there is a way to extend captured profiling data with custom metadata using theSampleAttrProducer.ExtraSampleAttrs
. However, since theProcessManager
already captures PID specific data from/proc/<pid>
I thought it might be a good idea to capture the environment variable directly in there.Currently the list of captured environment variables is static at compile time and defined in the
ProcessManager
Todo's / questions