Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate config Go code from schema #10694

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ptodev
Copy link

@ptodev ptodev commented Jul 22, 2024

Description

This is an attempt to generate the config.go files from Json schema. The goal of the PR is to delete all the config.go files for each component and replace them with some sort of Yaml or Json schema, which will most likely be sourced from the metadata.yaml file for each component. This PR uses an earlier PR as a starting point.

There are many unknowns related to this project. I will start threads on various problems via file comments and line comments, so that we can keep the discussion organised.

Link to tracking issue

Fixes #9769
Related to #5223

SendBatchMaxSize int `json:"send_batch_max_size,omitempty" yaml:"send_batch_max_size,omitempty" mapstructure:"send_batch_max_size,omitempty"`

// SendBatchSize corresponds to the JSON schema field "send_batch_size".
SendBatchSize int `json:"send_batch_size,omitempty" yaml:"send_batch_size,omitempty" mapstructure:"send_batch_size,omitempty"`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go-jsonschema doesn't seem to use uint. In theory, the minimum: 0 in the schema would be a good enough workaround but it looks like it's not supported right now.

Would you be happy if instances of uint are replaced with int + a validation that the number is >= 0?

Copy link
Contributor

github-actions bot commented Aug 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Comment on lines 60 to 57
if v, ok := raw["timeout"]; !ok || v == nil {
var err error
plain.Timeout, err = time.ParseDuration("200ms")
if err != nil {
return err
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I cheated a little bit. The original code was:

	if v, ok := raw["timeout"]; !ok || v == nil {
		plain.Timeout = "200ms"
	}

I edited it a bit so that it compiles. I will have to update the upstream PR for time.Duration support to generate this kind of code correctly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose that we add files like this whenever we need to augment the auto-generated code with some additional behaviour. For example, in this file we added Validate() and createDefaultConfig() functions.

send_batch_size:
type: integer
minimum: 0
default: 8192
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting a default value is important, because otherwise g--jsonschema will autogenerate an *int. Similarly, for string properties which don't have a default value it'd generate a *string. If we do set a default value, we will get an int and a string, with no pointers.

Timeout time.Duration `mapstructure:"timeout"`
// MetadataCardinalityLimit corresponds to the JSON schema field
// "metadata_cardinality_limit".
MetadataCardinalityLimit int `mapstructure:"metadata_cardinality_limit,omitempty"`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check whether having the omitempty is a problem. OTel doesn't use it much right now, but it looks like go-jsonschema always puts it in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think omitempty is used by mapstructure at all.

// Default value is 0, that means no maximum size.
SendBatchMaxSize uint32 `mapstructure:"send_batch_max_size"`
// SendBatchMaxSize corresponds to the JSON schema field "send_batch_max_size".
SendBatchMaxSize int `mapstructure:"send_batch_max_size,omitempty"`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care much that the uint is now an int?
I suppose we don't, because we can add a check to make sure it's not negative. Also, I doubt that the integers people configure are so large that an int wouldn't fit them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do - at the very least we cannot change types like this. We need to deprecate and move over with a set of changes over multiple releases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could update my fork of go-jsonschema to use uint whenever there is a minimum: 0 or an exclusiveMinimum: 0? It seems like a reasonable feature - maybe it'll also get accepted upstream.

send_batch_max_size:
type: integer
default: 0
minimum: 0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, go-jsonschema's main branch doesn't do anything when it sees constraints such as minimum and maximum. However, I patched my own fork of go-jsonschema so that it supports it. You can see in the config.go that there are checks to make sure some integers are not negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can keep uint32 and we won't have to set such constraints?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Json schema itself doesn't seem to have an unsigned integer type. It only has integer and number (float) types, which can have various constraints.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @atoulme @ptodev , go-jsonschema maintainer here. I have just merged a pr that adds support for numeric validation (multipleOf, maximum, exclusiveMaximum, minimum, exclusiveMinimum) and string patterns in main. But there is still no support for uint types. I am open to discuss it if it is needed, maybe we can fit it in with a bit of work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's neat and appreciated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @omissis, thank you so much for taking a look! I just realised that go-jsonschema already has a goJSONSchema property which allows users to override the type. I think it should be sufficient for our purposes.

Hi @atoulme! I updated the PR to use things like squashing structs, using uint, and using opaque string types. You can do it with:

goJSONSchema:
  type: uint32

And:

goJSONSchema:
  type: configopaque.String
  imports:
  - go.opentelemetry.io/collector/config/configopaque

There are examples in the metadata.yaml files.

Are you happy with the way the uints and opaque strings are set up? If you are, I'll resolve this GitHub discussion.


// Config defines configuration for OTLP/HTTP exporter.
type Config struct {
confighttp.ClientConfig `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I can't find a good way to schematise squashed fields. This is my main blocker right now.

I was hoping that we can use something like anyOf, but judging by some examples, it also looks awkward.

For now, I just listed all the squashed properties explicitly in metadata.yaml. Does anyone know of a better way? I'm not sure what folks working with Go code normally do in such situations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @atoulme! Good news! 🥳 I was able to find a way to create squashed structs with go-jsonschema. I will update this PR next week, once I polish my code a bit. The new generated code will look a lot more like the current OTel code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@@ -100,3 +113,5 @@ replace go.opentelemetry.io/collector/internal/globalgates => ../../internal/glo
replace go.opentelemetry.io/collector/consumer/consumerprofiles => ../../consumer/consumerprofiles

replace go.opentelemetry.io/collector/consumer/consumertest => ../../consumer/consumertest

replace github.com/atombender/go-jsonschema => github.com/ptodev/go-jsonschema v0.0.0-20240813163654-5518ba93ee84
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to fork go-jsonschema, because it didn't contain some of the features which I needed. I hope we can upstream those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be best, especially if the maintainer of go-jsonschema is involved in this PR

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @ptodev now that the support for numeric comparison has landed, do you think we can look at the duration? so that would be one less blocker here ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to rewrite these shared configs in yaml.

@@ -65,7 +65,7 @@ func (e *baseExporter) start(ctx context.Context, host component.Host) (err erro
e.metricExporter = pmetricotlp.NewGRPCClient(e.clientConn)
e.logExporter = plogotlp.NewGRPCClient(e.clientConn)
headers := map[string]string{}
for k, v := range e.config.ClientConfig.Headers {
for k, v := range e.config.Headers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change

if cfg.SendBatchMaxSize > 0 && cfg.SendBatchMaxSize < cfg.SendBatchSize {
return errors.New("send_batch_max_size must be greater or equal to send_batch_size")
// UnmarshalJSON implements json.Unmarshaler.
func (j *Config) UnmarshalJSON(b []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use UnmarshalJSON at all. We use the Validate function to validate.

}

func (c *Config) sanitizedEndpoint() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this functionality achieved?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it is not. The migration of otlpexporter and otlphttpexporter to Json schema isn't complete, due to a blocker I hit with migrating squashed fields. I've been hoping we can find a solution to that blocker before we I fix the functionality of otlpexporter and otlphttpexporter.

But in general, we could put functions like this in .go helper files which are not generated by Json schema.

@atoulme
Copy link
Contributor

atoulme commented Aug 13, 2024

This PR is too big - please start with something much smaller.

Copy link
Author

@ptodev ptodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is too big - please start with something much smaller.

Hi, @atoulme, thank you for your review! I agree that the PR is very large. The main reason for this is to demonstrate how using shared configuration could work between components. This was raised by @yurishkuro in the previous PR. The HTTP server/client settings are the primary example of shared config that I could find in the codebase.

Unfortunately, while working on shared configs, I realised that I don't know how to represent squashed fields in a clean way. I've been hoping that some of the OTel maintainers may have experience with Json schema and that the solution may be obvious to those folks.

}

func (c *Config) sanitizedEndpoint() string {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it is not. The migration of otlpexporter and otlphttpexporter to Json schema isn't complete, due to a blocker I hit with migrating squashed fields. I've been hoping we can find a solution to that blocker before we I fix the functionality of otlpexporter and otlphttpexporter.

But in general, we could put functions like this in .go helper files which are not generated by Json schema.

// Default value is 0, that means no maximum size.
SendBatchMaxSize uint32 `mapstructure:"send_batch_max_size"`
// SendBatchMaxSize corresponds to the JSON schema field "send_batch_max_size".
SendBatchMaxSize int `mapstructure:"send_batch_max_size,omitempty"`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could update my fork of go-jsonschema to use uint whenever there is a minimum: 0 or an exclusiveMinimum: 0? It seems like a reasonable feature - maybe it'll also get accepted upstream.

send_batch_max_size:
type: integer
default: 0
minimum: 0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Json schema itself doesn't seem to have an unsigned integer type. It only has integer and number (float) types, which can have various constraints.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 14, 2024
@omissis
Copy link

omissis commented Sep 18, 2024

@ptodev @atoulme if this PR can still be considered, I am willing to help to fix go-jsonschema's shortcomings to drive this home. feel free to ping me if needed!

@github-actions github-actions bot removed the Stale label Sep 19, 2024
goJSONSchema is a feature which already exists upstream.
goType doesn't exist upstream.
I added it, because I didn't know goJSONSchema exists.
Copy link
Author

@ptodev ptodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @omissis, thank you very much for helping, and for maintaining such a useful project! I think the most important things from a Collector point of view are:

Apologies, I realise this is a lot 😅 There is no urgency, and I'd be happy to help! I'm happy to polish my go-jsonschema PRs, as long as we agree that they are going in the right direction :)

Hi @atoulme - I think the most important thing we need to decide right now is how we'd like to generate the SetDefaults and Validate functions. Currently, go-jsonschema doesn't generate such functions. Please feel free to opine on the thread.

send_batch_max_size:
type: integer
default: 0
minimum: 0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @omissis, thank you so much for taking a look! I just realised that go-jsonschema already has a goJSONSchema property which allows users to override the type. I think it should be sufficient for our purposes.

Hi @atoulme! I updated the PR to use things like squashing structs, using uint, and using opaque string types. You can do it with:

goJSONSchema:
  type: uint32

And:

goJSONSchema:
  type: configopaque.String
  imports:
  - go.opentelemetry.io/collector/config/configopaque

There are examples in the metadata.yaml files.

Are you happy with the way the uints and opaque strings are set up? If you are, I'll resolve this GitHub discussion.

Comment on lines +62 to +64
// SetDefaults sets the fields of Config to their defaults.
// Fields which do not have a default value are left untouched.
func (c *Config) SetDefaults() {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @atoulme, would you be happy with an autogenerated SetDefaults() function such as this one?

I think we will also need a func (c *Config) Validate() function. It could check for constraints such as whether a number is within a certain range. I'm not yet sure how (and if) it would check for required properties though.

cc @omissis

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 17, 2024
@ptodev
Copy link
Author

ptodev commented Oct 29, 2024

Hi @atoulme 👋 Would you be open to using a forked version of go-jsonschema? We haven't had traction on the upstream PRs yet. There are many things we need to upstream, and it will almost certainly take a while for that to happen. If we use a forked version, I could make a smaller PR which is mergable, and then gradually will try to expand it to more and more components.

Such a change would be low-risk - in case of issues with the code generation, commenting out the schema in metadata.yaml should disable it.

The main issue with a forked repo is that it is not clear who should host it and maintain it. I could host it on my own GitHub profile? Or we could use the OpenTelemetry one?

Alternatively, I could try to embed the generation code into the Collector, similarly to the earlier PR? I've been trying to avoid doing that though, since it makes it harder to upstream changes.

@yurishkuro
Copy link
Member

We haven't had traction on the upstream PRs yet.

What do you mean by that? I spot checked a couple PRs, they are in Draft state, sometimes with CI breaking / no tests. I can understand your comment if you're waiting on maintainers to approve the direction of those PRs before investing more time. But if the alternative is to use a fork, then those PR still need to be completed to pass CI and have the required quality before we would consider taking a dependency on such fork in the collector.

@ptodev
Copy link
Author

ptodev commented Oct 29, 2024

Hi @yurishkuro, yes, I agree that I'd have to polish these PRs by adding more tests, etc. I haven't done that yet because I'd prefer to firstly get feedback from the maintainers on whether I'm going in the right direction with those changes. I sent a reminder message to the upstream maintainer to ask about one of the small (but essential) features that we need.

I'm honestly not sure to what extent we can keep a "good quality" fork of go-jsonschema. Some of the features are complicated, and we need some fundamental things like support for subschemas.

Since we're just using go-jsonschema to autogenerate code which then has to pass manual review, I hope that might be ok. If the code generation doesn't run correctly, we could just disable it by commenting out the schema. Then we can fix the code generation issue in our own time without blocking Collector PRs. I don't know if this would be acceptable for you though? It could be a problem if a lot of tooling starts to depend on the Collector's schema files...

The alternative is to wait for go-jsonschema to support all the things we need. If we want to go with this approach, we could close this PR and then reopen it one day if go-jsonschema has the required features?

@yurishkuro
Copy link
Member

@ptodev agreed, less concerns about high quality of it's only code gen. Still, a fork of a reasonably complex project is something to avoid if possible.

@omissis
Copy link

omissis commented Oct 30, 2024

Hey all. I would like to avoid a fork if possible, and cooperate to incorporate the needed changes so that opentelemetry can use them. I do have some time Friday I can dedicate to a first round of reviews, so let's see what comes out of it. Thanks.

@ptodev
Copy link
Author

ptodev commented Oct 31, 2024

Thank you, @omissis! I will be on vacation for the next two weeks, but I will take a look at your feedback when I'm back.

@github-actions github-actions bot removed the Stale label Nov 1, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve otel collector configuration w/ JSON schema
4 participants