-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[teleport] Update event-groups ingest pipeline to manage cloud fields if already present #12851
Conversation
1449981
to
69b1050
Compare
🚀 Benchmarks reportTo see the full report comment with |
# This could fail if `cloud.account.id` field already exists in the doc. Forced to override the value to avoid throw an exception. | ||
override: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloud.instance.id
could be set in three different processors using different source fields:
- teleport.audit.aws_host
- teleport.audit.db_gcp_instance_id
- teleport.audit.instance_id
- rename: | ||
field: teleport.audit.region | ||
target_field: cloud.region | ||
ignore_missing: true | ||
ignore_failure: true | ||
# This could fail if `cloud.region` field already exists in the doc. Forced to override the value to avoid throw an exception. | ||
override: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloud.region
could be set in three different processors using different source fields:
- teleport.audit.aws_region
- teleport.audit.db_aws_region
- teleport.audit.region
@@ -1426,11 +1438,14 @@ processors: | |||
field: teleport.audit.account_id | |||
target_field: cloud.account.id | |||
ignore_missing: true | |||
# This could fail if `cloud.account.id` field already exists in the doc. Forced to override the value to avoid throw an exception. | |||
override: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no other processor updating cloud.account.id
.
- rename: | ||
field: teleport.audit.aws_service | ||
target_field: cloud.service.name | ||
ignore_missing: true | ||
# This could fail if `cloud.service.name` field already exists in the doc. Forced to override the value to avoid throw an exception. | ||
override: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other processor updating cloud.service.name
.
- rename: | ||
field: teleport.audit.aws_host | ||
target_field: cloud.instance.id | ||
ignore_missing: true | ||
# This could fail if `cloud.instance.id` field already exists in the doc. Forced to override the value to avoid throw an exception. | ||
override: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding override: true
into these rename processors makes that the corresponding cloud field values will be set with the value of the latest rename processor executed.
Is that ok/expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct link to the discussion in the original PR: #12624 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we enabled an approach like the solution that was implemented for the qualys case (user can specify what they are interested in), or a hard "use the data from the data source" approach, then those two fields cease to be an issue. The remaining fields,
cloud.region
andcloud.instance.id
look like they would then just fall out (depending on how teleport handles theteleport.audit.{region,instance_id}
cases (are they disjoint with the explicitly named fields,teleport.audit.{aws,db_aws}
andteleport.audit.{aws_host,db_gcp_instance_id}
respectively. Unfortunately the test cases don't give any indication about this and I cannot find any reference for it in the teleport documentation; my feeling is that the more generic case should be overridden by the more specific case, but if an override occurs, then an error should be written in and the pipeline be continued. In the case that the we see apparently inconsistent fields being set (e.g.teleport.audit.aws_host
,teleport.audit.db_gcp_instance_id
) this would just be an error. I'm not sure what the behaviour ofteleport.audit.aws_region
v.teleport.audit.db_aws_region
should be; are these mutually exclusive?
So, I don't have neither the context about teleport
to answer that question nor to provide an alternative solution for the errors found in this ingest pipeline 😞
According to what you're suggesting (or at least I understand about it), do you mean to add an on_failure
action in each rename processor modified here?
However, how could it be distinguished that the value to be override is the one set by filebeat
or the one set by a previous processor?
Could I get some help here to move forward? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option, it could be done the opposite, add ignore_failure: true
instead of overwrite
. IIUC that would keep the values set by filebeat
instead, losing the values from teleport
data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The answer to the question is unfortunately a question. The issue is one of whether we know what the user wants from the data set. In the case of Qualys it was fairly clear that at least some users wanted the data source's values for these fields rather than the values that are added by the agent processor. ISTM that this is likely to be the case here too, but I don't know this for a fact.
go.mod
Outdated
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect | ||
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect | ||
sigs.k8s.io/kustomize/api v0.18.0 // indirect | ||
sigs.k8s.io/kustomize/kyaml v0.18.1 // indirect | ||
sigs.k8s.io/structured-merge-diff/v4 v4.4.2 // indirect | ||
sigs.k8s.io/yaml v1.4.0 // indirect | ||
) | ||
|
||
replace github.com/elastic/elastic-package => github.com/elastic/elastic-package v0.109.2-0.20250220090015-bbd8bcadb505 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be removed before merging.
The same for the changes in files under .buildkite
folder.
These are required to test with the validation based on mappings.
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, we know that no overrides are happening because that would result in an error.
Instead of adding override: true
, which could change the behavior of cloud
fields, how about adding a guard that checks if the target field exists to prevent the error (e.g., if: ctx.cloud?.x == null
)?
I think the ideal solution is the one that was discussed, where the user gets to make the choice (i.e., what was added to Qualys). If we can't get this solution in time to unblock @mrodm's work on enhancing elastic-package, then I propose we replace override: true with an if, and then open an issue for the ideal solution.
side-note: We need to audit every system test that uses canned logs to find the ones that are missing assert.hit_count
.
Ok, I'll replace those As a note, there would be some kind of change of behavior due to the fact that until now that pipeline was interrupted with an error if there ware |
Applied changes to use Created issue #12918 |
if: ctx.cloud?.region == null | ||
- remove: | ||
field: teleport.audit.region | ||
ignore_missing: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using if
conditions, as the source fields is not renamed, it must be added a new remove
processor for each one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be in a single remove processor block, but I think this is less prone to future damage.
Co-authored-by: Dan Kortschak <[email protected]>
💚 Build Succeeded
History
cc @mrodm |
|
@efd6 @andrewkroh could I get another review for this PR? 😊 Thanks in advance!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
if: ctx.cloud?.region == null | ||
- remove: | ||
field: teleport.audit.region | ||
ignore_missing: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be in a single remove processor block, but I think this is less prone to future damage.
Package teleport - 1.2.2 containing this change is available at https://epr.elastic.co/package/teleport/1.2.2/ |
Proposed commit message
Update event-groups ingest pipeline to manage cloud fields (like cloud.region, cloud.istance.id...) if they were already set by filebeat or other processors.
These errors were detected while testing the validation based on mappings: https://buildkite.com/elastic/integrations/builds/22313
But sometimes it was not detected, and it was due to
elastic-package
was not waiting to get all the docs defined in the test folder to be ingested. After being ingested a few documents,elastic-package
started the validation of those docs found.This PR has two goals:
_dev/deploy/docker/sample_logs/
.adding theoverride
setting into the rename processors to avoid throwing exceptions.IMPORTANT: The corresponding cloud fields will be set with the value of the last rename processor.if
setting into the rename processors to avoid throwing exceptions if those fields were already present in the document.Errors found when
elastic-package
waits for all 270 documents (buildkite link - pr link):In another test running just validation based on mappings, these were the fields undefined (buildkite link):
Related discussions from previous PR:
Checklist
changelog.yml
file.Author's Checklist
elastic-package
enabled mappings https://buildkite.com/elastic/integrations/builds/22587.buildkite
folder andgo.mod
/go.sum
files.How to test this PR locally
Related issues