Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MeasurementProcessor specification to Metrics SDK #4318

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open
Changes from 16 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
3f3186a
Add MeasurementProcessor specification to Metrics SDK
Blinkuu Dec 3, 2024
a8f5de3
Update TODO
Blinkuu Dec 3, 2024
1ce9e4d
Update specification/metrics/sdk.md
Blinkuu Dec 3, 2024
4b0a58d
Update specification/metrics/sdk.md
pellared Dec 3, 2024
449d2fb
Update specification/metrics/sdk.md
pellared Dec 3, 2024
60adbd3
Remove Shutdown and ForceFlush from MeasurementProcessor spec
Blinkuu Dec 6, 2024
9d60b12
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Dec 6, 2024
17f650c
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Dec 19, 2024
4cbc0ec
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Jan 2, 2025
01bcc0a
feat: define built-in measurement processor
Blinkuu Jan 2, 2025
d28e993
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Jan 7, 2025
90d331d
chore: update the wording
Blinkuu Jan 7, 2025
ee72e3f
chore: rename simple processor to default processor
Blinkuu Jan 7, 2025
225000a
feat: specify how we reference the next processor in the chain
Blinkuu Jan 7, 2025
8fbc341
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Jan 22, 2025
f57fe00
Update specification of the `next` parameter and remove `NoopProcessor`
Blinkuu Jan 22, 2025
66c6926
Update specification/metrics/sdk.md
Blinkuu Jan 23, 2025
e585028
Update CHANGELOG.md
Blinkuu Jan 23, 2025
83a81a5
Update metrics/sdk.md
Blinkuu Jan 23, 2025
dbaffa1
Merge branch 'main' into add-measurement-processor-to-metrics-sdk-spec
Blinkuu Jan 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions specification/metrics/sdk.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,11 @@ linkTitle: SDK
* [Instrument advisory parameters](#instrument-advisory-parameters)
* [Instrument enabled](#instrument-enabled)
- [Attribute limits](#attribute-limits)
- [MeasurementProcessor](#measurementprocessor)
* [MeasurementProcessor operations](#measurementprocessor-operations)
+ [OnMeasure](#onmeasure)
* [Built-in processors](#built-in-processors)
+ [DefaultProcessor](#defaultprocessor)
- [Exemplar](#exemplar)
* [ExemplarFilter](#exemplarfilter)
+ [AlwaysOn](#alwayson)
Expand Down Expand Up @@ -984,6 +989,66 @@ Attributes which belong to Metrics are exempt from the
time. Attribute truncation or deletion could affect identity of metric time
series and the topic requires further analysis.

## MeasurementProcessor

Blinkuu marked this conversation as resolved.
Show resolved Hide resolved
**Status**: [Development](../document-status.md)

`MeasurementProcessor` is an interface which allows hooks when a `Measurement` is recorded by an `Instrument`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of tracing, a processor is somewhat of a different thing (it's not a filterer of data, it sits on top of exporters).

Should we name this differently than Processor to differenciate the two?
Or should we make both behaviors similar?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of tracing, a processor is somewhat of a different thing (it's not a filterer of data, it sits on top of exporters).

The MeasurementProcessor is more than a filter of data. It can modify the measurement, decide to drop it, or even emit it multiple times.

Should we name this differently than Processor to differentiate the two?
Or should we make both behaviors similar?

To me, they are semantically similar. The difference is in the architecture of the Metrics SDK, which makes it hard to fit and connect the new processor pipeline directly with exporters. We could revisit this in the future and try to align the behavior between Logs, Traces, and Metrics SDKs. To keep an open door for such ideas, I'm purposefully specifying that:

Built-in measurement processors are responsible for Measurement Processing.

Where Measurement Processing is currently vaguely defined in this spec by using the word SHOULD.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support @Blinkuu's reasoning above. In my thinking, we can use "processor" across OpenTelemetry when a component in a program receives and outputs the same type of data. Trace SDK processors input/output span data. Collector processors input/output pipeline data. Things that are not processors (e.g., Metric reader, Trace sampler, Exporters, Receivers) generally have different types of input and output.

Here, MeasurementProcessor is appropriate because the input and the output types are the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, MeasurementProcessor is appropriate because the input and the output types are the same.

+1


Built-in measurement processors are responsible for [Measurement Processing](#measurement-processing).

`MeasurementProcessors` can be registered directly on SDK `MeterProvider` and they are invoked in the same order as they were registered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are invoked in the same order as they were registered.

Unclear how is this part works with the later wording "The SDK MUST inject the OnMeasure function from the next processor in the chain into the current processor"

Who is invoking the MeasurementProcessors in the order? Is that MeterProvider/SDK ? Or is it the responsibility of each MeasurementProcessor to invoke the next one in the chain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear how is this part works with the later wording "The SDK MUST inject the OnMeasure function from the next processor in the chain into the current processor"

Who is invoking the MeasurementProcessors in the order? Is that MeterProvider/SDK ? Or is it the responsibility of each MeasurementProcessor to invoke the next one in the chain?

I have changed the wording around the next parameter, as the previous spec was confusing. The next parameter is a callback that MUST NOT require a reference to the next processor in the chain. On the implementation level, it will most likely end up as a closure. I see it as the SDK's responsibility to ensure processors are executed in the order they were configured on the MeterProvider.

That being said, every processor is able to decide whether it wants to invoke a chain of processors that follow it (by calling next(context, measurement)). Each processor can decide to call next either 0 (drop measurement), 1 (default flow), or N (custom use cases) times.


Each processor registered on the `MeterProvider` is part of a pipeline.

SDK MUST allow users to implement and configure custom processors.

The following diagram shows `MeasurementProcessor`'s relationship to other components in the SDK:

```plaintext
+------------------+
| MeterProvider | +----------------------+ +-----------------+
| Meter A | Measurements... | | Measurements... | |
| Instrument X |-----------------> MeasurementProcessor +-----------------> In-memory state |
| Instrument Y + | | | |
| Meter B | +----------------------+ +-----------------+
| Instrument Z |
| ... | +----------------------+ +-----------------+
| ... | Measurements... | | Measurements... | |
| ... |-----------------> MeasurementProcessor +-----------------> In-memory state |
| ... | | | | |
| ... | +----------------------+ +-----------------+
+------------------+
```

### MeasurementProcessor operations

#### OnMeasure

`OnMeasure` is called when a `Measurement` is recorded. This method is called synchronously on the thread that emitted the `Measurement`, therefore it SHOULD NOT block or throw exceptions.

**Parameters:**

* `context` - the resolved `Context` (the explicitly passed `Context` or the current `Context`)
reyang marked this conversation as resolved.
Show resolved Hide resolved
* `measurement` - a [Measurement](./api.md#measurement) that was recorded
* `next` - a callback to invoke `OnMeasure` on the next processor in the chain. It MUST be callable without a reference to the next processor.
Blinkuu marked this conversation as resolved.
Show resolved Hide resolved

**Returns:** Void

For a `MeasurementProcessor` registered directly on SDK `MeterProvider`, the `measurement` mutations MUST be visible in next registered processors.

Comment on lines +1038 to +1039
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow the processor to "drop" the measurement (e.g. the processor decided that it doesn't want the measurement) or other operations beyond modifications on the value and attributes?

Copy link
Member

@pellared pellared Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related question (thus decided to put it here).
Shouldn't the processor also be used when evaluating Enabled?
Shouldn't we also add an OnEnabled hook?

Related comment in other issue:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow processors to "drop" measurements, they must be somehow connected to the MetricsReader. I agree that it would be a cool feature to have, providing great flexibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Lightstep Metrics SDK implements a MeasurementProcessor interface which was narrowly scoped to allow modifying the set of attributes for a measurement. In that use-case, we would take the incoming gRPC metadata from the context, look up specific headers, and apply header values as attribute values.

I admit I am not sure what reasons a user would have to modify measured values. Are there well-known use-cases? I found @jack-berg mentioned "unit conversion" here, but I am not sure how that would work--the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

That SDK does not permit dropping measurements. Speaking also to @pellared's question about Enabled and whether measurement processors should intercept Enabled calls, I would recommend No. See my position on passing context to the metrics enabled method, #4256 (comment), which states the same. I am nervous about letting measurement processors change measurements and selectively enable/disable call sites because IMO it will make interpreting the resulting data very difficult.

As an example, suppose we have a measurement processor that is designed to redact sensitive attribute values. IMO it would be better to change attributes, not to drop events, because otherwise a user can be easily misled. Suppose we have a counter which counts requests with an attribute for success (boolean) and a client ID (string). We have a policy that says client IDs should not resemble e-mail addresses, otherwise they are invalid. The two options are to redact the client ID (e.g., give it a value like "redacted") or to drop the measurement. If we drop the measurement, all sorts of queries might be impacted. What's my success rate? I have no idea because an unknown number of redacted measurements were dropped.

Therefore, I would propose that measurement processors can only modify attributes, not values, and not drop events.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

Providing this feature without the ability to do unit conversion or drop measurements would be a miss. Can solve the lack of knowledge about unit by providing the processor access to instrument metadata. I think it could make sense to allow measurements processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll come around on this topic. I see how dropping metric events is a useful feature, despite the potential for difficult consequences. Dropping metric events is not very different than sampling traces at 0%. Just like 0% sampling (which we call "non probabilistic"), there is a loss of information, but that is intentional.

@jack-berg Given your statement, I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API. Tangentially, I see a potential for us to form new APIs (like OpenCensus) which accept a list of measurements atomically and apply a single timestamp (e.g., or process the dynamic context once for multiple events).

Let me pose a thought experiment. What does a MeasurementProcessor do better than you could achieve simply by wrapping a MeterProvider with a new instance containing the desired logic? I'm looking at the complexity trade-off here. I see how the desire to modify units comes about -- especially with the base-2 exponential histogram -- we see a desire to change seconds to/from milliseconds w/o loss of information as a compelling use-case. In the wrapped-MeterProvider scenario, the units-conversion wrapper would ("simply") register a new instrument with the delegate MeterProvider having different units and divide/multiply the value on its way through.

I thought of another case that I'm aware of, which calls for modifying the instrument kind, i.e., more than just a change of unit. I'm aware of use-cases for synchronous UpDownCounter instruments where the user would like to separate positive from negative values as two Counters. In this case, the two absolute value instruments convey the rate of ups and down as separate information. Still, the input-to-output mapping is 1:1.

I prefer to think of MeasurementProcessor as something like syntactic sugar for the example I described above, meaning that it can be defined abstractly as a wrapper of meter providers with a per-instrument event translation rule. There seems to be a potential -- do we know any use-cases? -- for one metric API event to translate into more than one metric API event on the wrapped meter provider. In this sense, we could define MeasurementProcessor as a per-instrument function that maps one input measurement into a list of zero or more output measurements, enabling both dropping and proliferation of events.

Copy link
Author

@Blinkuu Blinkuu Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API.

@jmacd I think this makes sense. Having access to an Instrument inside the processor makes it very powerful.

I think it could make sense to allow measurements processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.

@jack-berg I'm reading the View specification, which explicitly mentions that views work on the "metric" level. Therefore, configuring processors on the Views (instead of on MeterProvider) would require updating the View specification as well, unless I'm misunderstanding something.


Regarding dropping Measurements, changing instrument kinds, modifying the value, or even creating new Measurements on the fly (e.g., split UpDownCounter into two counters), we could make the proposed Measure() method return an array of Measurements instead of Void.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Blinkuu About view-attached processors: I am generally wary of making the metrics SDK more complex, and the idea of making measurement processors view-specific has me vaguely worried. One fear is that this will limit the potential for MeasurementProcessors to be optimized.

If there are multiple readers and multiple measurement processors, do we evaluate the chain of processors per reader or once? I would prefer once.


I fear we're letting implementation details into the specification, if we dictate the use of a "next" processor here. I don't think a next processor is required. A better API for the processor IMO would be to return a measurement, so a signature like

type Processor interface {
  Process(context.Context, Measurement) (_ Measurement, valid bool)
}

In other words, the return value is an optional Measurement. I suggest the specification use pseudo-code such as the following, to answer the question raised in https://github.com/open-telemetry/opentelemetry-specification/pull/4318/files#r1915514480 about the order of operations, and this makes the drop-behavior clear:

func (sdki *sdkInstrument) onEvent(ctx context.Context, m Measurement) {
  for _, processor in sdki.processors() {
    if mr, valid := processor.Process(ctx, m); valid {
      m = mr
    } else {
      // measurement was dropped
      return
    }
  }
  // instrument-specific logic for final processed measurement `m`
  // ...
}

If there's an argument in favor of view-specific measurement processors, I would suggest surveying the implementors of the various metrics SDKs view mechanisms for their opinion. Otherwise, I think it's much simpler to explain to users what's happening with measurement processors: they literally change the events that enter the SDK and are seen by all metric readers alike.


@Blinkuu regarding an array of measurements. This is another question that could impact performance. Unless we need it, I don't think one measurement should be allowed to become >1 measurement, in other words. In the example I gave of translating an UpDownCounter into two Counters, one input event becomes one output event.

A `MeasuremenetProcessor` MAY freely modify `measurement` for the duration of the `OnMeasure` call.

A `MeasurementProcessor` SHOULD invoke `next`. A `MeasurementProcessor` MAY decide to drop the `Measurement` by not invoking the next processor.

### Built-in processors

The standard OpenTelemetry SDK MUST implement default processor as described below.

#### DefaultProcessor

This is an implementation of `MeasurementProcessor` which calculates an in-memory state from incoming `Measurements`.
reyang marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the SDK ensure that this is the last processor in the chain?


## Exemplar

**Status**: [Stable](../document-status.md)
Expand Down
Loading