Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not possible to inject @metadata using fields section of config #142

Open
jgough opened this issue Sep 17, 2021 · 9 comments · Fixed by #143
Open

Not possible to inject @metadata using fields section of config #142

jgough opened this issue Sep 17, 2021 · 9 comments · Fixed by #143
Labels

Comments

@jgough
Copy link

jgough commented Sep 17, 2021

Related to #126

With the fix for above now merged into master I'm building from master and trying to inject nested data into @metadata that would normally be added by the s3 plugin (which adds [@metadata][s3][key])

If I try and inject @metadata then it does not get set correctly. For example a test with:

fields:
  "[a][b]": "foo"
  "[@metadata][c][d]": "bar"
export_metadata: true

outputs:

{
  "a": {
    "b": "foo"
  },
  "d": "bar"
}

But I expected

{
  "a": {
    "b": "foo"
  },
  "@metadata": {
    "c": {
      "d": "bar"
    }
  }
}

Maybe instead I should be looking to mock the input plugin instead?

@breml breml added the bug label Sep 20, 2021
@breml
Copy link
Collaborator

breml commented Sep 20, 2021

@jgough I will have a look at this. It looks like it is resolvable, because passing @metadata works for me in the fields in testcases (nested instead of Logstash syntax) like this:

{
  "export_metadata": true,
  "testcases": [
    {
      "fields": {
        "@metadata": {
          "key": "value"
        }
      }
    }
  ]
}

@breml
Copy link
Collaborator

breml commented Sep 20, 2021

I looked into this a little bit more and it looks like we have two issues:

  1. addressing @metadata fields with Logstash config syntax ([@metadata][field]) does not work.
  2. Nested objects in global fields and per test case fields are not merged.

For the first, I think I can come up with a fix.
For the second the question is, if we should support deep-merging or not. If we decided to not support deep-merging, then I think we just need to add this to the documentation. Otherwise an other fix is necessary to support deep-merging.

@magnusbaeck What do you think about deep-merging?

@jgough
Copy link
Author

jgough commented Sep 23, 2021

May I add my thoughts on deep merging? Interesting problem.

I kind of personally feel that there are use cases for setting [@metadata][field1] and [@metadata][field2] separately in global and test fields. But I feel that anything set on a per-test basis should overwrite global fields. I also feel that the top-level key in the testcase fields (be it a logstash format or a json/yaml format) should completely overwrite anything from the global fields, as the intent could be argued to set specifically that key (and anything underneath is data for that key)

So I feel the following would make most sense:

Case 1:

{
  "fields": {
    "[data][field1]": "foo"
  },
  testcases: {
    "fields": {
      "[data][field1]": "bar",
      "[data][field2]": "baz"
    }
  }
}

Should result in

"[data][field1]": "bar"
"[data][field2]": "baz"

Case 2:

{
  "fields": {
    "data": {
      "field1": "foo"
    }
  },
  testcases: {
    "fields": {
      "[data][field2]": "baz"
    }
  }
}

Result:

"[data][field1]": "foo"
"[data][field2]": "baz"

Case 3:

{
  "fields": {
    "[data][field1]": "foo"
  },
  testcases: {
    "fields": {
      "data": {
        "field2": "baz"
      }
    }
  }
}

Result:

"[data][field2]": "baz"

(data is overwritten entirely by testcase fields)

Case 4:

{
  "fields": {
    "data": {
      "field1": "foo"
    }
  },
  testcases: {
    "fields": {
      "data": {
        "field2": "baz"
      }
    }
  }
}

Result:

"[data][field2]": "baz"

(data is overwritten entirely by testcase fields)

@magnusbaeck
Copy link
Owner

There are legitimate use cases for both behaviors (deep merging and shallow merging) so we'll probably upset someone whatever we choose. The [field][subfield] syntax could result in deep merging while specifying a full object would overwrite the whole inherited value. That makes intuitive sense to me. Doing so should make everyone happy except that the two forms of specifying a field would no longer be equivalent. I suspect it would also be harder to implement. What do you think?

@breml
Copy link
Collaborator

breml commented Sep 24, 2021

OK, if I got this right, the proposal from @jgough is exactly what @magnusbaeck has sumarized in

The [field][subfield] syntax could result in deep merging while specifying a full object would overwrite the whole inherited value.

I will have a look at the current implementation in order to figure out the effort to achieve this. Of course, this distinction between json objects and Logstash syntax would need to be prominently highlighted in the documentation.

@breml breml reopened this Sep 24, 2021
@magnusbaeck
Copy link
Owner

Yes, you're right! I apparently didn't fully internalize the examples.

But yes, this behavior would indeed have to be well-documented. Technically it would also be a backwards-incompatible change, so if we don't get it in for v2 we'd have to wait for v3.

@breml
Copy link
Collaborator

breml commented Sep 24, 2021

I think the generalized algorithm for this would go like this:

  1. start with an empty target object
  2. for all "non Logstash syntax" keys in the global fields object, set the key in the target object to the respective value.
  3. for all "non Logstash syntax" keys in the global fields object, evaluate the target key and replace this key in the target object with the respective value.
  4. for all "non Logstash syntax" keys in the testcase fields object, set the key in the target object to the respective value.
  5. for all "non Logstash syntax" keys in the testcase fields object, evaluate the target key and replace this key in the target object with the respective value.

I think, this algorithm would handle all the test cases listed above by @jgough correctly. It is worth mentioning, that the value could be a "basic value" (e.g. string, number) or again an object (or an array). So it would be possible to do something like this:

{
  "fields": {
    "data": {
      "field1": "global1",
      "parent1": {
        "child1": "globalchild1"
      },
      "parent2": {
        "child2": "globalchild2"
      }
    }
    "[data][field2]": "global2"
  },
  testcases: {
    "fields": {
      "[data][parent1]": "parent1",
      "[data][parent2][child3]": "child3",
      "[data][parent3]": {
        "child4": "child4"
      },
      "[data][field2]": "field2",
    }
  }
}

which would result in

"[data][field1]": "global1"
"[data][parent1]": "parent1"
"[data][parent2][child2]": "globalchild2"
"[data][parent2][child3]": "child3"
"[data][parent3][child4]": "child4"
"[data][field2]": "field2"

or again in JSON, the result would look like this:

"data": {
  "field1": "global1",
  "parent1": "parent1",
  "parent2": {
    "child2": "globalchild2",
    "child3": "child3"
  },
  "parent3": {
    "child4": "child4"
  },
  "field2": "field2"
}

What do you think?

@breml
Copy link
Collaborator

breml commented Sep 24, 2021

I did a first iteration to implement the above specification, but it is not as straight forward as I initially though. There are a lot of edge cases which we did not yet discuss, e.g. what logstash config fields are nested in regular objects?

e.g.

{
  "fields": {
    "data": {
      "[field1]": "value",
      "[parent1][child]": "globalchild1"
    }
    "[data][field2]": "global2"
  },
...  ]
}

Currently I have the feeling, that this could end up in a bigger refactoring of the test case processing logic and I see the risk, that the we need to duplicate this code in order to preserve backwards compatibility with the standalone (V1) mode.

@jgough
Copy link
Author

jgough commented Sep 27, 2021

I've had some further thoughts on this. Maybe it would just be best to keep it simple and forbid mixing the two formats? Use one or the other - but not both. I was unaware of the non-logstash config way of doing it until recently. There's no reason I can see that anyone should need to mix them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants