[processor/transform] Add support for flat configuration style #37444

Open
wants to merge 3 commits into
base: main

edmocosta
Contributor

Description

This PR is part of #29017, and a split from #36888. It changes the transformprocessor, adding support for flat configuration styles.

Change log:

  • It now supports an additional configuration style in which statements are expressed as a flat list of strings. In this style, each path must be prefixed with its context, and the statements' context is inferred from those paths by the context inferrer and ottl.ParserCollection. For example:
    log_statements:
     - set(log.body, "bear") where log.attributes["http.path"] == "/animal"
     - set(resource.attributes["name"], "bear")
  • Flat and structured configuration styles can be mixed in the same configuration.
  • The context's cache values are shared only among flat statements.
  • Structured configuration cache values are still isolated: a cache written using the structured configuration style is available only to that configuration group's statements, and is not shared with flat statements or other structured configuration groups. For example:
    log_statements:
    - set(resource.cache["flat"], "value")

    - statements:
       - set(resource.cache["name"], "bear")
       - set(resource.attributes["name"], resource.cache["name"]) # OK
       - set(resource.attributes["name"], resource.cache["flat"]) # Fail (not set by this group of statements)

    - set(resource.attributes["name"], resource.cache["name"]) # Fail (not set by a flat statement)
    - set(resource.attributes["flat"], resource.cache["flat"]) # OK

    - statements:
       - set(resource.attributes["name"], resource.cache["name"]) # Fail (set by another group)
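
The cache visibility rules in the change log can be modeled with a small, self-contained Go sketch. It is only an illustration of the rules as described: the `group` type, `newGroups`, and `has` are hypothetical names and do not reflect the transformprocessor's actual data model. All flat statements are assumed to share one cache, while each structured group gets its own.

```go
package main

import "fmt"

// cache is a stand-in for a context cache such as resource.cache.
type cache map[string]string

// group models one cache scope under log_statements.
// Hypothetical type for illustration only.
type group struct {
	name  string
	cache cache
}

// newGroups creates one cache scope shared by all flat statements and one
// private cache scope per structured group.
func newGroups(structured int) (flat group, groups []group) {
	flat = group{name: "flat", cache: cache{}}
	for i := 0; i < structured; i++ {
		groups = append(groups, group{name: fmt.Sprintf("group-%d", i), cache: cache{}})
	}
	return flat, groups
}

// has reports whether key is visible from the given cache scope.
func has(g group, key string) bool {
	_, ok := g.cache[key]
	return ok
}

func main() {
	flat, groups := newGroups(2)

	flat.cache["flat"] = "value"     // flat: set(resource.cache["flat"], "value")
	groups[0].cache["name"] = "bear" // first group: set(resource.cache["name"], "bear")

	fmt.Println("first group sees name:", has(groups[0], "name"))
	fmt.Println("first group sees flat:", has(groups[0], "flat"))
	fmt.Println("flat sees name:", has(flat, "name"))
	fmt.Println("flat sees flat:", has(flat, "flat"))
	fmt.Println("second group sees name:", has(groups[1], "name"))
}
```

Each lookup mirrors one of the OK/Fail lines in the configuration example: a key is visible only from the scope that wrote it.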

Link to tracking issue

#29017

Testing

Unit tests

// Although it's configurable via `mapstructure`, users won't be able to set it on their
// configurations, as it's currently meant for internal use only, and it's validated by
// the transformprocessor Config unmarshaller function.
SharedCache bool `mapstructure:"shared_cache"`
Contributor Author

It's currently being set programmatically, and users cannot set it in their configurations (https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/37444/files#diff-1e527186a992bb04852a9e8cd6fe43ef611d0e071360c4e40a1432a30efc1d38R89).

That's a conservative approach to keep the behavior the same, but there's no technical reason not to allow it.
If you folks also think it might be useful, we could make this setting available, so users would be able to control which statement groups use the shared cache.

Member

Let's be opinionated and hide it for now. Config support can be added later.

Member

Can we unexport it and/or remove the mapstructure tags? That would mean the unmarshal function doesn't have to worry about users trying to set it.

Contributor Author

TBH, I couldn't find a clean solution for this field, so I ended up with this approach, considering there's a possibility of making this setting available to users in the future.

Given we're still relying on mapstructure to unmarshal the configuration, unexporting this field would require both a custom unmarshalling function for common.ContextStatements to set the field value and some mechanism to pass this information down from the transformprocessor.Config Unmarshal function (which is the one that knows its value). Unexported fields are ignored by mapstructure, as it's not possible to set their values using reflection.
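
The reflection limitation mentioned above can be checked with the standard library alone. This is a minimal sketch using a hypothetical `statementsGroup` struct (not the real common.ContextStatements) with one exported and one unexported field. It only shows what `reflect` allows, which is the constraint any reflection-based decoder runs into.

```go
package main

import (
	"fmt"
	"reflect"
)

// statementsGroup is a hypothetical stand-in with one exported and one
// unexported boolean field.
type statementsGroup struct {
	SharedCache bool
	sharedCache bool
}

// canSetField reports whether the named field of a *statementsGroup can be
// assigned through reflection.
func canSetField(name string) bool {
	v := reflect.ValueOf(&statementsGroup{}).Elem()
	return v.FieldByName(name).CanSet()
}

func main() {
	fmt.Println("SharedCache settable:", canSetField("SharedCache"))
	fmt.Println("sharedCache settable:", canSetField("sharedCache"))
}
```

The exported field is settable and the unexported one is not, which is why an unexported field has to be assigned by ordinary code in the same package.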

I was able to unexport it and make it work by passing the extra shared_cache key here (as it's currently doing), and an extra confmap.WithIgnoreUnused() option here (otherwise mapstructure returns an error), then with that key in the conf map, we just need to read it and set the field value on the common.ContextStatements unmarshaller function. The problem with this approach is that invalid keys are not validated anymore, and we would need to validate them manually, which IMO, is not ideal.

Finally, another option would be removing the mapstructure tag and keeping the field exported, so we wouldn't need to worry about users trying to set it in their configurations. To set it internally, we would need to use reflection, as I initially implemented in the draft (see 498f9b1).

Do you have any thoughts or ideas on how to work around it?

Member

@TylerHelmuth Jan 24, 2025

I was hoping that, since we're using a custom unmarshaller, it could be the definitive source of whether that value should be true or false. In my head, we'd be able to identify if the user is using the flat style and then set c.sharedCache ourselves in the Unmarshal func.

Member

Looking at the implementation again, really any time we're doing map[string]any manipulation in the Unmarshal function, it would be great to work directly on the c *Config if we can.

Contributor Author

@edmocosta Jan 24, 2025

I was hoping that, since we're using a custom unmarshaller, it could be the definitive source of whether that value should be true or false. In my head, we'd be able to identify if the user is using the flat style and then set c.sharedCache ourselves in the Unmarshal func.

We're still relying on mapstructure to unmarshal the configuration. The current logic only normalizes the flat configuration style YAML map so it can be unmarshalled as if it had been configured using the structured configuration style. That's why we're manipulating map[string]any values instead of the Config struct.

Here is an example of the otlpreceiver doing something similar: https://github.com/open-telemetry/opentelemetry-collector/blob/2447a81885fc580860860bd6a8768422a70c99f8/receiver/otlpreceiver/config.go#L63-L90

In that case there is a 1:1 relation: the YAML config map is compatible with the target structure, so it can call conf.Unmarshal at the very beginning, as it does. That doesn't apply to us, as the flat configuration style is not compatible with the Config struct.
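
The normalization step described in this comment can be sketched roughly as follows. This is not the PR's code: the `normalize` function is hypothetical, it assumes that consecutive flat statements are collected into one structured entry, and it skips context inference entirely. The `shared_cache` key is the extra key mentioned earlier in this thread.

```go
package main

import "fmt"

// normalize rewrites a mixed statements list so that every entry has the
// structured shape (a map with a "statements" key). Consecutive flat strings
// are collected into one entry marked with shared_cache (an assumption).
func normalize(raw []any) ([]map[string]any, error) {
	var out []map[string]any
	var flat []any // flat statements collected since the last structured entry

	flush := func() {
		if len(flat) > 0 {
			out = append(out, map[string]any{"statements": flat, "shared_cache": true})
			flat = nil
		}
	}

	for i, item := range raw {
		switch v := item.(type) {
		case string:
			flat = append(flat, v)
		case map[string]any:
			flush()
			out = append(out, v) // already structured, passed through unchanged
		default:
			return nil, fmt.Errorf("entry %d: unsupported type %T", i, item)
		}
	}
	flush()
	return out, nil
}

func main() {
	raw := []any{
		`set(resource.cache["flat"], "value")`,
		map[string]any{"statements": []any{`set(resource.cache["name"], "bear")`}},
		`set(resource.attributes["flat"], resource.cache["flat"])`,
	}
	out, err := normalize(raw)
	if err != nil {
		panic(err)
	}
	for _, entry := range out {
		fmt.Println(entry)
	}
}
```

After this step every entry is a map, so a generic decoder can treat the whole list as if it had been written in the structured style.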

Looking at the implementation again, really any time we're doing map[string]any manipulation in the Unmarshal function, it would be great to work directly on the c *Config if we can.

If we move some code around, we can unexport the field and have a hybrid approach without using reflection. After calling conf.Unmarshal, we can iterate over the context statements setting the sharedCache value. For that, we would need to put both transformprocessor.Config and common.ContextStatements into the same package, which I guess wouldn't be an issue.
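
A rough sketch of that hybrid approach, under stated assumptions: the `config` and `contextStatements` types below are simplified, hypothetical stand-ins living in one package, and `markShared` represents the pass that would run after conf.Unmarshal. How the flat group indexes are discovered is left out.

```go
package main

import "fmt"

// contextStatements is a simplified stand-in for one statements group.
type contextStatements struct {
	Statements  []string
	sharedCache bool // unexported: never decoded from user config, set only in code
}

// config is a simplified stand-in for the processor configuration.
type config struct {
	LogStatements []contextStatements
}

// markShared sets sharedCache on the groups at the given indexes. It is meant
// to run after decoding, with the indexes of groups built from flat statements.
func (c *config) markShared(flatIdx []int) error {
	for _, i := range flatIdx {
		if i < 0 || i >= len(c.LogStatements) {
			return fmt.Errorf("flat group index %d out of range", i)
		}
		c.LogStatements[i].sharedCache = true
	}
	return nil
}

func main() {
	c := &config{LogStatements: []contextStatements{
		{Statements: []string{`set(resource.cache["flat"], "value")`}},
		{Statements: []string{`set(resource.cache["name"], "bear")`}},
	}}
	if err := c.markShared([]int{0}); err != nil {
		panic(err)
	}
	for i, g := range c.LogStatements {
		fmt.Println(i, g.sharedCache)
	}
}
```

Because the field is assigned by ordinary code in the same package, neither a mapstructure tag nor reflection is involved.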

Member

If we move some code around, we can unexport the field and have a hybrid approach without using reflection. After calling conf.Unmarshal, we can iterate over the context statements setting the sharedCache value. For that, we would need to put both transformprocessor.Config and common.ContextStatements into the same package, which I guess wouldn't be an issue.

Yeah, something like this sounds like a good idea to try.

processor/transformprocessor/config.go (outdated, resolved)
processor/transformprocessor/config.go (resolved)
processor/transformprocessor/config.go (resolved)
processor/transformprocessor/internal/common/cache.go (outdated, resolved)
Labels
processor/transform Transform processor
3 participants