Define syntax for escaping declarative config env var substitution #4375

jack-berg · 2025-01-17T23:17:38Z

Resolves #3914.

An environment variable substitution reference which starts with $$ (i.e. $${API_KEY}) is escaped. Implementations should strip the leading $ character and not perform substitution (i.e. $${API_KEY} => ${API_KEY}).

See java implementation here: open-telemetry/opentelemetry-java#7033

tsloughter · 2025-01-22T12:14:24Z

Why $ and not \? The latter being what I'd expect and try first when using this since it is what I'd do in a shell script.

Also doesn't this break being able to do something like $${NUM_OF_DOLLARS}?

tsloughter

Approving despite my comments since $$ is what the collector uses and need for $${DOLLARS} seem unnecessary in an SDK configuration :)

marcalff

The escape sequence is not fully specified.

If I want ${API_KEY} in the environment variable value, this says it should be encoded as $${API_KEY}, ok.

If I want $${API_KEY} in the value, how should this be encoded ?

If '$' is used to as an escape character, then the spec must have a well defined way to print a $ non escaping character, so that any string containing $ can be represented.

marcalff · 2025-01-22T13:12:22Z

The escape sequence is not fully specified.

If I want ${API_KEY} in the environment variable value, this says it should be encoded as $${API_KEY}, ok.

If I want $${API_KEY} in the value, how should this be encoded ?

If '$' is used to as an escape character, then the spec must have a well defined way to print a $ non escaped character, so that any string containing $ can be represented.

Resolution could be to say that $${API_KEY} is encoded as $$${API_KEY}, and so forth, but this should be in the spec and not open to interpretation.

pellared

Maybe we should do it simply in the same way as docker compose. See https://docs.docker.com/reference/compose-file/interpolation/

You can use a $$ (double-dollar sign) when your configuration needs a literal dollar sign.

Is also looks that it is what OTel Collector is doing as well:

https://github.com/open-telemetry/opentelemetry-collector/blob/461a3558086a03ab13ea121d12e28e185a1c79b0/confmap/internal/e2e/testdata/expand-escaped-env.yaml

jack-berg · 2025-01-22T15:31:20Z

Maybe we should do it simply in the same way as docker compose. See https://docs.docker.com/reference/compose-file/interpolation/

You can use a $$ (double-dollar sign) when your configuration needs a literal dollar sign.

Is also looks that it is what OTel Collector is doing as well:

This is the syntax proposed in this PR.

pellared · 2025-01-22T16:12:25Z

This is the syntax proposed in this PR.

I disagree. According to this PR something like "a $$ b" is evaluated to a $$ b whereas for OTel Collector and Docker Compose it will be evaluated to a $ b. Meaning in both OTel Collector and Docker Compose $$ is always escaped to $ and not only when it is matching the env var substitution regex.

jack-berg · 2025-01-22T18:24:33Z

@tsloughter:

Approving despite my comments since $$ is what the collector uses

Yes exactly. Matching collector syntax is the motivation here.

@pellared:

I disagree. According to this PR something like "a $$ b" is evaluated to a $$ b whereas for OTel Collector and Docker Compose it will be evaluated to a $ b. Meaning in both OTel Collector and Docker Compose $$ is always escaped to $ and not only when it is matching the env var substitution regex.

I see. That's fair.

When I was writing this PR I did look at the collector's language, which is similar to docker's, and which I found to be overly simplistic:

Use $$ to indicate a literal $.

Its missing guidance on how $$ interacts with environment variable substitution - specifically, the ordering.

For example, suppose we have: $${FOO} and FOO=bar.

If we perform env env vars substitution after $$ substitution, we get $${FOO} => ${FOO} => bar, which is not what the user wants.

Alternatively, env var substitution might take place before $$ substitution, we get $${FOO} => $bar, which is not what the user wants.

So $$ subsitution can't happen before or after env var substitution. Instead, it must happen simultaneously as a part of env var substitution. Hence, my inclusion of it in the env var regular expression.

You also gave an example of "a $$ b" evaluating to "a $ b". What is the use of escaping $ outside of references to env var substitution? A user who wishes to resolve "a $ b" can just enter that value as is - no escape needed.

@marcalff

If I want $${API_KEY} in the value, how should this be encoded ?
Resolution could be to say that $${API_KEY} is encoded as $$${API_KEY}, and so forth, but this should be in the spec and not open to interpretation.

My understanding is that the regular expression / logic as proposed in this PR will indeed resolve $$${API_KEY} to be $${API_KEY}.

The reason is that the regular expression \${1,2}\{(?:env:)?(?<ENV_NAME>[a-zA-Z_][a-zA-Z0-9_]*)(:-(?<DEFAULT_VALUE>[^\n]*))?\} matches exactly 1 OR 2 $ characters. So for $$${API_KEY}, the first $ is unmatched and left alone. The following characters, $${API_KEY}, matches the expression, but since it starts with $$, it is escaped instead with the first $ dropped instead of replaced. So in total we have $ + ${API_KEY} = $${API_KEY}.

I added an additional test case to my java prototype implementation of this. Note - no changes to the implementation were required to make this test pass.

More examples in the specification might be helpful.

tsloughter · 2025-01-22T18:27:07Z

Put this comment in the wrong repo:

Oh, now I see the collector uses $$. Then I suppose this is the way to go. May want to also mention additional escaping like allowing $${DOLLARS} to be $10?

pellared · 2025-01-22T18:37:16Z

@jack-berg, I find the Collector and and Docker Compose docs clearer. $${FOO} is evaluated to ${FOO}.

You also gave an example of "a $$ b" evaluating to "a $ b". What is the use of escaping $ outside of references to env var substitution? A user who wishes to resolve "a $ b" can just enter that value as is - no escape needed.

This will provide different experience for users who have experience with Collector and Docker Compose who are used to always type $$ when they want to add a $ literal. For instance, $$$${FOO} should be evaluated to $${FOO}.

I agree with the intention to match with Collector (and Docker Compose) syntax 👍

jack-berg · 2025-01-22T19:30:35Z

I find the Collector and and Docker Compose docs clearer. $${FOO} is evaluated to ${FOO}.

Its seems clear on paper until you try to implement it. The docker and collector specs are incomplete.

Why does $${FOO} resolve to ${FOO} and not $${FOO} => $bar (env var before $$) or $${FOO} => ${FOO} => bar (env var after $$)?

If you try to implement this, and treat $$ => $ substitution separately from env var substitution, then you need to answer the question of ordering: which comes first? And as I noted above, both possible answers of ordering yield incorrect results.

I would prefer to match the algorithm of the collector, but looking for advice on how to describe this in such a way that 11 language implementations will get it right. Despite the seemingly simple syntax as described in docs, the collector's substitution logic is quite complex / nuanced. The current env var substitution allows for simpler implementations that can leverage regular expressions to do the heavy lifting.

I considered trying to get the both of both worlds (i.e. collector behavior and a regular expression based spec), by changing the regular expression to match:

\${1,2}(\{(?:env:)?(?<ENV_NAME>[a-zA-Z_][a-zA-Z0-9_]*)(:-(?<DEFAULT_VALUE>[^\n]*))?\})?

I.e. make the whole {API_KEY:-default} part optional so the expression still matches against just $$, but this still has ordering challenges. Consider @marcalff's example of wanting a resolved string of $${API_KEY}. How do you accomplish this? With the updated regular expression and assuming API_KEY=value, $$${API_KEY}, resolves to $value. This is because the expression matches twice:

the first $$ matches and resolves to $
the second ${API_KEY} matches and resolves to value

I'm actually not sure there's a way to accomplish the collector's behavior without requiring implementations to carefully recreate its complex recursive algorithm which statefully walks values, character by character. (Not sure how I would approach trying to to specify this.) And I think a regular expression based approach is essential to having a low enough complexity bar to implement.

pellared · 2025-01-22T20:07:32Z

Docker composer also does it recursively: https://github.com/compose-spec/compose-go/blob/main/interpolation/interpolation.go

looking for advice on how to describe this in such a way that 11 language implementations will get it right

I think "specification by example" will help achieving it. Meaning specifying expected results for given input.

regular expression based spec

Maybe this would help: https://github.com/compose-spec/compose-go/blob/main/template/template.go

collector's substitution logic is quite complex / nuanced

I'm actually not sure there's a way to accomplish the collector's behavior without requiring implementations to carefully recreate its complex recursive algorithm which statefully walks values, character by character.

As far as I remember, I was calling it out long ago during some SIG meeting that env var substitution is complex. I think that I even said that each language should have fuzz tests for such parsing because of its complexity.

I was proposing that the users could simply use envsubst so that each SDK could make the parsing of the files a lot simpler and less bug-prone.

However, the feedback was that the Configuration SIG wants to try (challenge) it given the assumption that users need it.

If this won't be working like in OTel Collector then I think we will be giving bad user experience.

Given that OTel Collector is still not stable, the alternative way could be simplifying the env var substitution in both Collector config and SDK config.

pellared · 2025-01-22T20:21:04Z

I was proposing that the users could simply use envsubst so that each SDK could make the parsing of the files a lot simpler and less bug-prone.

We could also provide a seperate OTel tool which does the env var substitution in the same way as the OTel Collector.
This way we would do not have to implement it in each language, but just reuse the OTel Collector code.
We could add env var interpolation as an opt-in feature in the SDKs (enabled via an env var of course 😉).
I think this would be a more pragmatic way of introducing YAML configs to the SDKs.

jack-berg · 2025-01-22T20:54:24Z

As far as I remember, I was calling it out long ago during some SIG meeting that env var substitution is complex.

It doesn't have to be complex. With a regex-based specification, the java implementation is only 28 lines long, plus a little more boiler plate to make sure that logic is invoked by the YAML parser.

We could also provide a seperate OTel tool which does the env var substitution in the same way as the OTel Collector.

We actually already have this. The configuration validator is a tool written in go, and available as a binary or docker container, which accepts a target YAML config file and:

performs env var substiution
validates the contents against the configuration JSON schema

The problem is, not all ecosystems will want or be able to bundle in a binary specify to a particular platform / architecture. So this tool is available to help implementations, but we cannot rely on it being used everywhere.

If this won't be working like in OTel Collector then I think we will be giving bad user experience.
Given that OTel Collector is still not stable, the alternative way could be simplifying the env var substitution in both Collector config and SDK config.

As of this PR, the differences I know of are:

collector substitutes $$ for $ even if not part of an env var substitution ref
collector supports reference values from other provides besides env vars, which configuration has punted on to reduce maintenance burden.

So there are already differences between collector and language implementations. Its probably safe to say that the env var substitution for declarative config is a subset of the collector. And this seems fine to me. Especially if its a strict subset (i.e. declarative config takes care not to introduce new things the collector will not add).

Not sure if @open-telemetry/collector-maintainers would be open to it, but it doesn't seem like much is lost by removing generic support for replacing $$ with $ outside of env var substitution references. WDYT?

I get the appeal "docker does this" argument, but docker env var substitution breaks from collector substitution and declarative configuration is other key ways. All try to have a syntax inspired by shell-style substitution, but each supports shell-style to varying extents, and the collector extends it with the provider concept which is distinctly different.

marcalff · 2025-01-23T09:09:14Z

So in total we have $ + ${API_KEY} = $${API_KEY}.

I see.

This is not trivial when reading the regexp that checks for 1 or 2 $ characters, as in \${1,2}, but I agree it works as intended then, but only with a twist missing in the spec:

The ${API_KEY} -> BAR substitution only happens in a specific context, where there are no preceding dollar characters.

Having examples so that every language implements the proper behavior is critical here.

marcalff · 2025-01-23T10:05:47Z

For example, suppose we have: $${FOO} and FOO=bar.

If we perform env env vars substitution after $$ substitution, we get $${FOO} => ${FOO} => bar, which is not what the user wants.

Alternatively, env var substitution might take place before $$ substitution, we get $${FOO} => $bar, which is not what the user wants.

So $$ subsitution can't happen before or after env var substitution. Instead, it must happen simultaneously as a part of env var substitution. Hence, my inclusion of it in the env var regular expression.

This is not how parser works, the second dollar character can not be used to reduce two different rules at the same time.

Either $$ is reduced first:

.$${FOO} -> $.${FOO} -> $$.{FOO} -> <$>.{FOO} -> <$>{.FOO} -> ... -> <$>{FOO}.

<$> represents an escaped $ character (a non terminal in grammar speak), which is different from a terminal $ character.

Or $ . ${FOO} is reduced to $ BAR .:

.$${FOO} -> $.${FOO} -> $$.{FOO} -> $${.FOO} -> ... -> $${FOO}. -> $BAR.

This is a typical shift/reduce conflict in the grammar that needs to be resolved by reformulating the grammar (so that $$ takes priority as intended).

The nasty part here is to convey sufficient information in the spec without having to produce a full parser grammar and reference litterature.

Proposal:

(1)

The dollar character is used as an escape sequence, so that $$ in the input yaml data is translated into a single $ character in the resolved output string.

(2)

Expression matching the regular expression:

${(?:env:)?(?<ENV_NAME>[a-zA-Z_][a-zA-Z0-9_])(:-(?<DEFAULT_VALUE>[^\n]))?}

are replaced with an environment variable substitution

(3)

The yaml input string is consumed from left to right, with rule (1) taking precedence other rule (2).

Data printed in the output string does not participates in further parsing.

Examples, with FOO=BAR

(a)

${FOO} -> BAR

(b)

$${FOO} -> ${FOO}, parsed as an escaped dollar sign followed by {FOO}.

(c)

$$${FOO} -> $BAR, parsed as an escaped dollar sign followed by a substitution.

(d)

$$$${FOO} -> $${FOO}, parsed as two escaped dollar signs followed by {FOO}.

pellared · 2025-01-23T15:31:16Z

The problem is, not all ecosystems will want or be able to bundle in a binary specify to a particular platform / architecture. So this tool is available to help implementations, but we cannot rely on it being used everywhere.

I created a separate issue: #4384
I drifted away too much from the original issue and PR. Sorry for that 🙏 (no bad intentions)

Regarding this PR I just expect the same env var substitution behavior from all OTel Components.

jack-berg · 2025-01-23T16:36:22Z

Nice proposal @marcalff - I implemented this in my prototype PR in this commit. Its only slightly more complex that a pure regex based approach.

The nasty part here is to convey sufficient information in the spec without having to produce a full parser grammar and reference litterature.

I agree, but I think your proposal balances it nicely. Let me take a shot at updating the text in this PR accordingly.

TylerHelmuth · 2025-01-23T17:09:45Z

I agree with @marcalff's proposal at #4375 (comment) as this is the collector's current behavior. Additionally, the collector will escape $$ anywhere in a string including the value return from expanding an env var.

See https://github.com/open-telemetry/opentelemetry-collector/blob/main/confmap/internal/e2e/expand_test.go as a good list of examples for how the collector handles expansion/escaping (input for the test can be found here). If you're wondering how the collector handles a specific scenario that is not already covered in those tests let me know and I'll add it.

Dissmissing as I will be OOO next week and I see that this in going into the right direction

Dismissing review to avoid blocking, will take a look at changes.

jack-berg · 2025-01-24T16:49:20Z

Ok I've pushed a commit to reflect @marcalff's proposal, including some pseudocode of what an implementation might look like. I've also converted the examples section to a markdown table and added more examples per this conversation. The re-org improves readability as the set of examples grows.

Let me know what you think.

jack-berg · 2025-01-24T17:01:11Z

specification/configuration/data-model.md

@@ -75,6 +75,29 @@ more non line break characters (i.e. any character except `\n`). If a referenced
 environment variable is not defined and does not have a `DEFAULT_VALUE`, it MUST
 be replaced with an empty value.

+The `$` character is an escape sequence, such that `$$` in the input is


TODO: add changelog entry before merging

Define syntax for escaping declarative config env var substitution

e7f6017

jack-berg requested review from a team as code owners January 17, 2025 23:17

jack-berg mentioned this pull request Jan 17, 2025

Add support for escaping declarative config env var substitution open-telemetry/opentelemetry-java#7033

Open

trask added the changelog.opentelemetry.io label Jan 21, 2025

carlosalberto approved these changes Jan 22, 2025

View reviewed changes

tsloughter approved these changes Jan 22, 2025

View reviewed changes

marcalff previously requested changes Jan 22, 2025

View reviewed changes

pellared previously requested changes Jan 22, 2025

View reviewed changes

marcalff self-requested a review January 23, 2025 09:10

mx-psi requested a review from TylerHelmuth January 23, 2025 11:15

pellared mentioned this pull request Jan 23, 2025

[Configuration] Environment variable substitution opt-in via OTEL_CONFIG_INTERPOLATE #4384

Closed

Convert examples to table, adopt marcalffs proposal

92391d9

jack-berg added 3 commits January 24, 2025 10:53

typo

760426a

Add double quotes to output

20121c4

original => input

e50dbb6

jack-berg commented Jan 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define syntax for escaping declarative config env var substitution #4375

Define syntax for escaping declarative config env var substitution #4375

jack-berg commented Jan 17, 2025 •

edited

Loading

tsloughter commented Jan 22, 2025

tsloughter left a comment

marcalff left a comment •

edited

Loading

marcalff commented Jan 22, 2025

pellared left a comment •

edited

Loading

jack-berg commented Jan 22, 2025

pellared commented Jan 22, 2025 •

edited

Loading

jack-berg commented Jan 22, 2025 •

edited

Loading

tsloughter commented Jan 22, 2025

pellared commented Jan 22, 2025

jack-berg commented Jan 22, 2025

pellared commented Jan 22, 2025

pellared commented Jan 22, 2025 •

edited

Loading

jack-berg commented Jan 22, 2025

marcalff commented Jan 23, 2025 •

edited

Loading

marcalff commented Jan 23, 2025 •

edited

Loading

pellared commented Jan 23, 2025 •

edited

Loading

jack-berg commented Jan 23, 2025

TylerHelmuth commented Jan 23, 2025 •

edited

Loading

jack-berg commented Jan 24, 2025

jack-berg Jan 24, 2025

Define syntax for escaping declarative config env var substitution #4375

Are you sure you want to change the base?

Define syntax for escaping declarative config env var substitution #4375

Conversation

jack-berg commented Jan 17, 2025 • edited Loading

tsloughter commented Jan 22, 2025

tsloughter left a comment

Choose a reason for hiding this comment

marcalff left a comment • edited Loading

Choose a reason for hiding this comment

marcalff commented Jan 22, 2025

pellared left a comment • edited Loading

Choose a reason for hiding this comment

jack-berg commented Jan 22, 2025

pellared commented Jan 22, 2025 • edited Loading

jack-berg commented Jan 22, 2025 • edited Loading

tsloughter commented Jan 22, 2025

pellared commented Jan 22, 2025

jack-berg commented Jan 22, 2025

pellared commented Jan 22, 2025

pellared commented Jan 22, 2025 • edited Loading

jack-berg commented Jan 22, 2025

marcalff commented Jan 23, 2025 • edited Loading

marcalff commented Jan 23, 2025 • edited Loading

pellared commented Jan 23, 2025 • edited Loading

jack-berg commented Jan 23, 2025

TylerHelmuth commented Jan 23, 2025 • edited Loading

jack-berg commented Jan 24, 2025

jack-berg Jan 24, 2025

Choose a reason for hiding this comment

jack-berg commented Jan 17, 2025 •

edited

Loading

marcalff left a comment •

edited

Loading

pellared left a comment •

edited

Loading

pellared commented Jan 22, 2025 •

edited

Loading

jack-berg commented Jan 22, 2025 •

edited

Loading

pellared commented Jan 22, 2025 •

edited

Loading

marcalff commented Jan 23, 2025 •

edited

Loading

marcalff commented Jan 23, 2025 •

edited

Loading

pellared commented Jan 23, 2025 •

edited

Loading

TylerHelmuth commented Jan 23, 2025 •

edited

Loading