Support the capturing of environment variables #327

RononDex · 2025-01-28T10:53:15Z

Use case behind this PR

This is part of my bachelor thesis at the University of Applied Sciences North-Western Switzerland (FHNW).

We are exploring the usage of the opentelemetry-ebpf-profiler for HPC environments (specifically for the ESA Euclid mission). Since there are many different jobs running in parallel on HPC clusters, admins need a way to differentiate profiling data between missions and pipelines to figure out who or what is causing a congestion of the system.

For this purpose Environment variables are passed to executables in HPC environments (for example by SLURM or custom pipeline processing management systems). Being able to group and filter by some selected environment variables is therefor crucial for HPC clusters and developers that are running their code on said clusters.

Implementation

The environment variables are read from /proc/<pid>. I noticed that there is a way to extend captured profiling data with custom metadata using the SampleAttrProducer.ExtraSampleAttrs. However, since the ProcessManager already captures PID specific data from /proc/<pid> I thought it might be a good idea to capture the environment variable directly in there.

Currently the list of captured environment variables is static at compile time and defined in the ProcessManager

Todo's / questions

How does this implementation hold up with the existing architecture / philosophy of opentelemetry-ebpf-profiler?
What would be the best way to make the list of captured environment variables configurable (using a configuration file for example?)
How can my code be improved? (I am new to the language Go)
Currently only the otlp reporter was adjusted to include the env vars as attributes on a Sample level. Other reporters would need to be adjusted too.

Store Environment variables in ProcessManager and make it available in all vertical layers Support capturing env variables

linux-foundation-easycla · 2025-01-28T10:53:21Z

The committers listed above are authorized under a signed CLA.

✅ login: RononDex / name: RononDex (a4ace39, e565f5b)

rockdaboot · 2025-01-28T12:36:54Z

processmanager/processinfo.go

+type ProcessManagerConfig struct {
+	extractEnvVars []string
+}
+
+var pm_cfg = ProcessManagerConfig{extractEnvVars: []string{
+	"PIPELINE_PPOID",
+	"PIPELINE_SOFTWARENAME",
+	"PIPELINE_JOBID"}}


I'd suggest using a map for faster (O(1)) lookup.
Also, this needs to be configurable, as you already said.
This map of env vars could be part of tracer/Config and be passed to processmanager.New().
Then it needs a configuration option exposed to the user, see cli_flags.go.

The scanning of the environ file should only happen if that map of env variables is not empty.

rockdaboot · 2025-01-28T12:54:56Z

👍 Having the option of reporting a selected set of env variables per process/sample makes sense to me. It might blow up the size of profiling messages, but IMO that can be discussed outside this PR.

How does this implementation hold up with the existing architecture / philosophy of opentelemetry-ebpf-profiler?

IMO fits into the current implementation (ProcessManager.updatePidInformation().

What would be the best way to make the list of captured environment variables configurable (using a configuration file for example?)

A configuration file is an option, but might be over-engineered. If it just a small set of variable names, it can be configured as a comma-separated list of strings.

How can my code be improved? (I am new to the language Go)

Reviewers will suggest improvements during the following review process.

Currently only the otlp reporter was adjusted to include the env vars as attributes on a Sample level. Other reporters would need to be adjusted too.

Reviewers will ask you for tests and possibly missing parts.

florianl · 2025-01-28T13:07:34Z

While there is interest to collect and forward more information about a process, I think, the core functionality of the eBPF profiler should stay around unwinding and providing stack traces.

To enrich process information, there are several options. Did you consider

a costum process attribute processor - like https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor
a costum implementation of package reporter. Check out parca-agent and how they implement their individual package reporter to fit their backend.
implement and configure your own ExtraSampleAttrProd for the package reporter

Overall, I think, it should not be the eBPF profilers task to collect as much environmental information as possible to forward it.

christos68k · 2025-01-28T13:19:04Z

While there is interest to collect and forward more information about a process, I think, the core functionality of the eBPF profiler should stay around unwinding and providing stack traces.

To enrich process information, there are several options. Did you consider

It seems that the OpenTelemetry collector supports reading environment variables directly in its configuration file, and that can be leveraged from an existing processor, such as attributesprocessor to insert environment variables as attributes.

So I agree with @florianl that this is probably the way to go, as we're trying to remove extraneous functionality that doesn't strictly depend on profiling internals. In some cases it's important to collect information inside the profiling agent, an example would be process name or executable name, as delayed collection may result in missing that metadata when the process exits.

~~For environment variables however this should not be a problem.~~ I looked at the PR after writing this and noticed you're doing per-PID collection which will not work with the OTel collector configuration file (as that's per profiling agent process) and is subject to the same concerns (timing, process exit) as the other per-PID bits of metadata we collect. For reliable collection, the approach you're using in this PR would then be the way to go.

To summarize, I think we should discuss this in the next Profiling SIG and decide if per-PID environment variables are something we want to collect as part of the agent. We would then also need to decide how to document this functionality, since it's not part of existing OTel semantic conventions (one option would be to make it part of profiles specification, another to introduce a new semantic convention key for process environment).

christos68k · 2025-01-28T13:53:56Z

processmanager/processinfo.go

+			splittedVars := strings.Split(string(envVars), "\000")
+			fmt.Println("EnvVars for PID" + strconv.Itoa(int(pid)))
+			for _, envVar := range splittedVars {
+				keyValuePair := strings.Split(envVar, "=")


This is not correct as values can themselves contain "=".

Suggested change

keyValuePair := strings.Split(envVar, "=")

keyValuePair := strings.SplitN(envVar, "=", 2)

RononDex · 2025-01-28T15:30:27Z

Thanks for all your feedback. Sadly OpenTelemetry Collector did not seem like a viable option. The OpenTelemetry Collector does not have to run on the same machine as this eBPF profiler, making it impossible to get the needed data in the OpenTelemetry collector.

The eBPF profiler seemed the correct layer to implement this feature because of this.

In HPC environments we usually have multiple executable (processes, executing different binaries) spread over multiple machines / pilots for the same Job. Being able to group and filter those to properly understand code paths, hotspots and bottlenecks environment variables on a PID level are necessary.

christos68k · 2025-01-28T15:42:47Z

Thanks for all your feedback. Sadly OpenTelemetry Collector did not seem like a viable option. The OpenTelemetry Collector does not have to run on the same machine as this eBPF profiler, making it impossible to get the needed data in the OpenTelemetry collector.

The eBPF profiler seemed the correct layer to implement this feature because of this.

To clarify: The eBPF profiler will run as part of the OTel collector (as a receiver) in the near future and this will be the standard supported configuration. It's not certain at this point in time whether we'll continue to support a standalone profiling agent. However, this doesn't really affect the implementation of collecting per-PID metadata, as a similar approach to the one implemented in this PR will need to be used if reliability is required.

florianl · 2025-01-28T15:46:36Z

Sadly OpenTelemetry Collector did not seem like a viable option. The OpenTelemetry Collector does not have to run on the same machine as this eBPF profiler, making it impossible to get the needed data in the OpenTelemetry collector.

Can you be more precise on this? I think, in the mid to long run, there will be no independent eBPF profiler binary as it will be part of the OTel collector.

a costum implementation of package reporter. Check out parca-agent and how they implement their individual package reporter to fit their backend.

Can you point out, why using a similar approach, like parca-agent, which uses the functionality of the eBPF profiler but with custom extensions, is not a viable option?

RononDex · 2025-01-28T15:49:35Z

Can you be more precise on this? I think, in the mid to long run, there will be no independent eBPF profiler binary as it will be part of the OTel collector.

Ah I did not know that. I thought the ebpf profiler and the Otel collector will be two seperate processes / infrastructure layers.

Can you point out, why using a similar approach, like parca-agent, which uses the functionality of the eBPF profiler but with custom extensions, is not a viable option?

I have tried parca, however it seemed to require a paid license for most functionality and I was unable to get it to work with another backend / frontend (other than parca).

florianl · 2025-01-28T15:59:04Z

Can you point out, why using a similar approach, like parca-agent, which uses the functionality of the eBPF profiler but with custom extensions, is not a viable option?

I have tried parca, however it seemed to require a paid license for most functionality and I was unable to get it to work with another backend / frontend (other than parca).

I'm not asking to use parca (but also feel free to use it if it fits your need), but more like following the approach of the parca-agent. parca-agent uses eBPF profiler as Go module and builds for their custom needs around it.

Before open sourcing, the agent used to also collect more information about the environment. But this functionality got removed, see #90, as this can be done with other approaches.

christos68k · 2025-01-28T16:10:12Z

Before open sourcing, the agent used to also collect more information about the environment. But this functionality got removed, see #90, as this can be done with other approaches.

We need to distinguish between per-agent-process environment collection and per-trace-event-PID (partial)environment collection. Per-agent-process environment collection can take place in a post-processing step in delayed fashion, e.g. in an OTel collector processor.

Per-trace-event-PID (partial)environment collection is time-sensitive and, if reliability is required, needs to be synchronized in the same manner as the other per-trace-event-PID bits of metadata the agent collects.

EDIT: Added a discussion point to next week's Profiling SIG agenda

rockdaboot · 2025-01-28T16:23:52Z

In HPC environments we usually have multiple executable (processes, executing different binaries) spread over multiple machines / pilots for the same Job. Being able to group and filter those to properly understand code paths, hotspots and bottlenecks environment variables on a PID level are necessary.

You only need PID level env variables if you run the identical executable as multiple processes on the same machine at the same time. That's how I understood the PR description, but here you tell a slightly different story ("processes, executing different binaries").

Because you can already distinguish between different hosts and between different executables from the profiling data.

Can you clarify? Are you running the same executable as several processes in parallel on a machine?

florianl · 2025-01-28T16:26:48Z

Per-trace-event-PID (partial)environment collection is time-sensitive and, if reliability is required, needs to be synchronized in the same manner as the other per-trace-event-PID bits of metadata the agent collects.

If there is a need for more and custom time critical collection of PID information, then package processmanager should provide a hook, that is triggered on func (*ProcessManager) SynchronizeProcess(process.Process). In combination with reporter.ExtraSampleAttrProd this would allow catching, updating and reporting such custom information.

RononDex · 2025-01-28T16:28:40Z

Yes, often the same executable will be started hundreds if not even thousands of times with different parameters for the same job, this can happen on the same machine or be distributed over several machines. It depends on the how the HPC cluster is setup.

Maybe to provide a use case: The ESA Euclid telescope gathers 600MP photos often containing thousands of galaxies, for each of those galaxies a new process is then kicked off to do calculations (very simply put). This can end up with thousands of processes being started with the same executable.

Also different processing pipelines running at the same time may use the identical executables (for different patches of the sky for example, each patch being a processing job, each again kicking off thousands of new processes).

fabled · 2025-01-28T16:29:05Z

Do note that in some scenarios environment may contain "secret" tokens and it probably is not good to send full unfiltered environment. I believe some users will be very sensitive on what data is collected and sent out. So it probably would be better to keep this off by default, and enable it by giving a whitelist of the environment variables that will be sent out. That should also limit the amount of data that gets sent out.

rockdaboot · 2025-01-28T16:30:30Z

Do note that in some scenarios environment may contain "secret" tokens and it probably is not good to send full unfiltered environment. I believe some users will be very sensitive on what data is collected and sent out. So it probably would be better to keep this off by default, and enable it by giving a whitelist of the environment variables that will be sent out. That should also limit the amount of data that gets sent out.

That is what the PR does. It filters for a given set of variable names. If that list is empty, no data would be sent.

This ensures that only 2 substrings are returned (in case the value has a "=" too)

brancz · 2025-01-29T09:41:06Z

Happy to take this conversation elsewhere (eg. Parca Discord), but wanted to chime in and say that Parca is in no way proprietary or needs a license for anything (everything under the Parca umbrella is Apache2 license with the exception of some of the eBPF bits which are GPL). Would love to understand what makes it seem like that's not the case and resolve that!

On a related note, Parca-Agent already does support adding a variety of metadata on a per-sample basis. See the metadata labels documentation, happy to help you put together a configuration that suits your needs!

Support capturing environment variables

a4ace39

Store Environment variables in ProcessManager and make it available in all vertical layers Support capturing env variables

RononDex requested review from a team as code owners January 28, 2025 10:53

rockdaboot reviewed Jan 28, 2025

View reviewed changes

christos68k assigned christos68k and unassigned christos68k Jan 28, 2025

christos68k reviewed Jan 28, 2025

View reviewed changes

Use SplitN instead of Split

e565f5b

This ensures that only 2 substrings are returned (in case the value has a "=" too)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support the capturing of environment variables #327

Support the capturing of environment variables #327

RononDex commented Jan 28, 2025

linux-foundation-easycla bot commented Jan 28, 2025 •

edited

Loading

rockdaboot Jan 28, 2025

rockdaboot commented Jan 28, 2025

florianl commented Jan 28, 2025

christos68k commented Jan 28, 2025 •

edited

Loading

christos68k Jan 28, 2025

RononDex commented Jan 28, 2025

christos68k commented Jan 28, 2025

florianl commented Jan 28, 2025

RononDex commented Jan 28, 2025

florianl commented Jan 28, 2025

christos68k commented Jan 28, 2025 •

edited

Loading

rockdaboot commented Jan 28, 2025

florianl commented Jan 28, 2025

RononDex commented Jan 28, 2025

fabled commented Jan 28, 2025

rockdaboot commented Jan 28, 2025

brancz commented Jan 29, 2025 •

edited

Loading

	keyValuePair := strings.Split(envVar, "=")
	keyValuePair := strings.SplitN(envVar, "=", 2)

Support the capturing of environment variables #327

Are you sure you want to change the base?

Support the capturing of environment variables #327

Conversation

RononDex commented Jan 28, 2025

Use case behind this PR

Implementation

Todo's / questions

linux-foundation-easycla bot commented Jan 28, 2025 • edited Loading

rockdaboot Jan 28, 2025

Choose a reason for hiding this comment

rockdaboot commented Jan 28, 2025

florianl commented Jan 28, 2025

christos68k commented Jan 28, 2025 • edited Loading

christos68k Jan 28, 2025

Choose a reason for hiding this comment

RononDex commented Jan 28, 2025

christos68k commented Jan 28, 2025

florianl commented Jan 28, 2025

RononDex commented Jan 28, 2025

florianl commented Jan 28, 2025

christos68k commented Jan 28, 2025 • edited Loading

rockdaboot commented Jan 28, 2025

florianl commented Jan 28, 2025

RononDex commented Jan 28, 2025

fabled commented Jan 28, 2025

rockdaboot commented Jan 28, 2025

brancz commented Jan 29, 2025 • edited Loading

linux-foundation-easycla bot commented Jan 28, 2025 •

edited

Loading

christos68k commented Jan 28, 2025 •

edited

Loading

christos68k commented Jan 28, 2025 •

edited

Loading

brancz commented Jan 29, 2025 •

edited

Loading