Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[teleport] Update event-groups ingest pipeline to manage cloud fields if already present #12851

Merged
merged 12 commits into from
Mar 4, 2025

Conversation

mrodm
Copy link
Contributor

@mrodm mrodm commented Feb 20, 2025

Proposed commit message

Update event-groups ingest pipeline to manage cloud fields (like cloud.region, cloud.istance.id...) if they were already set by filebeat or other processors.

These errors were detected while testing the validation based on mappings: https://buildkite.com/elastic/integrations/builds/22313

But sometimes it was not detected, and it was due to elastic-package was not waiting to get all the docs defined in the test folder to be ingested. After being ingested a few documents, elastic-package started the validation of those docs found.

This PR has two goals:

  • adding the required configuration to change the behavior of the system tests to wait for the 270 documents defined in _dev/deploy/docker/sample_logs/.
  • adding the override setting into the rename processors to avoid throwing exceptions.
    • IMPORTANT: The corresponding cloud fields will be set with the value of the last rename processor.
  • adding the if setting into the rename processors to avoid throwing exceptions if those fields were already present in the document.

Errors found when elastic-package waits for all 270 documents (buildkite link - pr link):

[0] found error.message in event: [Processor rename with tag  in pipeline logs-teleport.audit-1.2.1-event-groups failed with message: field [cloud.instance.id] already exists]
[1] found error.message in event: [Processor rename with tag  in pipeline logs-teleport.audit-1.2.1-event-groups failed with message: field [cloud.region] already exists]

In another test running just validation based on mappings, these were the fields undefined (buildkite link):

[0] field "teleport.audit.account_id" is undefined: field definition not found
[1] field "teleport.audit.aws_host" is undefined: field definition not found
[2] field "teleport.audit.aws_region" is undefined: field definition not found
[3] field "teleport.audit.aws_service" is undefined: field definition not found
[4] field "teleport.audit.exit_code" is undefined: field definition not found
[5] field "teleport.audit.instance_id" is undefined: field definition not found
[6] field "teleport.audit.region" is undefined: field definition not found
[7] field "teleport.audit.status" is undefined: field definition not found
[8] field "teleport.audit.target" is undefined: field definition not found

Related discussions from previous PR:

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

How to test this PR locally

Related issues

@mrodm mrodm self-assigned this Feb 20, 2025
@mrodm mrodm force-pushed the update_teleport_pipeline branch from 1449981 to 69b1050 Compare February 20, 2025 11:22
mrodm added a commit to mrodm/integrations that referenced this pull request Feb 20, 2025
@elastic-vault-github-plugin-prod
Copy link

elastic-vault-github-plugin-prod bot commented Feb 20, 2025

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

Comment on lines 1420 to 1421
# This could fail if `cloud.account.id` field already exists in the doc. Forced to override the value to avoid throw an exception.
override: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloud.instance.id could be set in three different processors using different source fields:

  • teleport.audit.aws_host
  • teleport.audit.db_gcp_instance_id
  • teleport.audit.instance_id

Comment on lines 1443 to 1448
- rename:
field: teleport.audit.region
target_field: cloud.region
ignore_missing: true
ignore_failure: true
# This could fail if `cloud.region` field already exists in the doc. Forced to override the value to avoid throw an exception.
override: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloud.region could be set in three different processors using different source fields:

  • teleport.audit.aws_region
  • teleport.audit.db_aws_region
  • teleport.audit.region

@@ -1426,11 +1438,14 @@ processors:
field: teleport.audit.account_id
target_field: cloud.account.id
ignore_missing: true
# This could fail if `cloud.account.id` field already exists in the doc. Forced to override the value to avoid throw an exception.
override: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no other processor updating cloud.account.id.

- rename:
field: teleport.audit.aws_service
target_field: cloud.service.name
ignore_missing: true
# This could fail if `cloud.service.name` field already exists in the doc. Forced to override the value to avoid throw an exception.
override: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other processor updating cloud.service.name.

- rename:
field: teleport.audit.aws_host
target_field: cloud.instance.id
ignore_missing: true
# This could fail if `cloud.instance.id` field already exists in the doc. Forced to override the value to avoid throw an exception.
override: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding override: true into these rename processors makes that the corresponding cloud field values will be set with the value of the latest rename processor executed.

Is that ok/expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct link to the discussion in the original PR: #12624 (comment).

Copy link
Contributor Author

@mrodm mrodm Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we enabled an approach like the solution that was implemented for the qualys case (user can specify what they are interested in), or a hard "use the data from the data source" approach, then those two fields cease to be an issue. The remaining fields, cloud.region and cloud.instance.id look like they would then just fall out (depending on how teleport handles the teleport.audit.{region,instance_id} cases (are they disjoint with the explicitly named fields, teleport.audit.{aws,db_aws} and teleport.audit.{aws_host,db_gcp_instance_id} respectively. Unfortunately the test cases don't give any indication about this and I cannot find any reference for it in the teleport documentation; my feeling is that the more generic case should be overridden by the more specific case, but if an override occurs, then an error should be written in and the pipeline be continued. In the case that the we see apparently inconsistent fields being set (e.g. teleport.audit.aws_host, teleport.audit.db_gcp_instance_id) this would just be an error. I'm not sure what the behaviour of teleport.audit.aws_region v. teleport.audit.db_aws_region should be; are these mutually exclusive?

So, I don't have neither the context about teleport to answer that question nor to provide an alternative solution for the errors found in this ingest pipeline 😞

According to what you're suggesting (or at least I understand about it), do you mean to add an on_failure action in each rename processor modified here?
However, how could it be distinguished that the value to be override is the one set by filebeat or the one set by a previous processor?

Could I get some help here to move forward? 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option, it could be done the opposite, add ignore_failure: true instead of overwrite. IIUC that would keep the values set by filebeat instead, losing the values from teleport data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer to the question is unfortunately a question. The issue is one of whether we know what the user wants from the data set. In the case of Qualys it was fairly clear that at least some users wanted the data source's values for these fields rather than the values that are added by the agent processor. ISTM that this is likely to be the case here too, but I don't know this for a fact.

go.mod Outdated
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 // indirect
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/kustomize/api v0.18.0 // indirect
sigs.k8s.io/kustomize/kyaml v0.18.1 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.2 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)

replace github.com/elastic/elastic-package => github.com/elastic/elastic-package v0.109.2-0.20250220090015-bbd8bcadb505
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed before merging.
The same for the changes in files under .buildkite folder.

These are required to test with the validation based on mappings.

@mrodm mrodm marked this pull request as ready for review February 20, 2025 17:55
@mrodm mrodm requested review from a team as code owners February 20, 2025 17:55
@mrodm mrodm added Integration:teleport Teleport Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations] labels Feb 21, 2025
@elasticmachine
Copy link

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, we know that no overrides are happening because that would result in an error.

Instead of adding override: true, which could change the behavior of cloud fields, how about adding a guard that checks if the target field exists to prevent the error (e.g., if: ctx.cloud?.x == null)?

I think the ideal solution is the one that was discussed, where the user gets to make the choice (i.e., what was added to Qualys). If we can't get this solution in time to unblock @mrodm's work on enhancing elastic-package, then I propose we replace override: true with an if, and then open an issue for the ideal solution.


side-note: We need to audit every system test that uses canned logs to find the ones that are missing assert.hit_count.

@mrodm
Copy link
Contributor Author

mrodm commented Feb 27, 2025

Today, we know that no overrides are happening because that would result in an error.

Instead of adding override: true, which could change the behavior of cloud fields, how about adding a guard that checks if the target field exists to prevent the error (e.g., if: ctx.cloud?.x == null)?

Ok, I'll replace those overrides by if conditions, thanks!

As a note, there would be some kind of change of behavior due to the fact that until now that pipeline was interrupted with an error if there ware cloud.* fields defined.

@mrodm
Copy link
Contributor Author

mrodm commented Feb 27, 2025

I think the ideal solution is the one that was discussed, where the user gets to make the choice (i.e., what was added to Qualys). If we can't get this solution in time to unblock @mrodm's work on enhancing elastic-package, then I propose we replace override: true with an if, and then open an issue for the ideal solution.

Applied changes to use if in 3e84dbd

Created issue #12918
Feel free to update the description/title if you think it is missing any information.

Comment on lines +1462 to +1465
if: ctx.cloud?.region == null
- remove:
field: teleport.audit.region
ignore_missing: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using if conditions, as the source fields is not renamed, it must be added a new remove processor for each one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be in a single remove processor block, but I think this is less prone to future damage.

@mrodm mrodm requested review from efd6 and andrewkroh February 27, 2025 14:03
Co-authored-by: Dan Kortschak <[email protected]>
@elasticmachine
Copy link

💚 Build Succeeded

History

cc @mrodm

@mrodm mrodm requested a review from efd6 February 28, 2025 14:18
@mrodm
Copy link
Contributor Author

mrodm commented Mar 3, 2025

@efd6 @andrewkroh could I get another review for this PR? 😊

Thanks in advance!!

Copy link
Contributor

@efd6 efd6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Comment on lines +1462 to +1465
if: ctx.cloud?.region == null
- remove:
field: teleport.audit.region
ignore_missing: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be in a single remove processor block, but I think this is less prone to future damage.

@mrodm mrodm merged commit 7439dfd into elastic:main Mar 4, 2025
7 checks passed
@mrodm mrodm deleted the update_teleport_pipeline branch March 4, 2025 09:42
@elastic-vault-github-plugin-prod

Package teleport - 1.2.2 containing this change is available at https://epr.elastic.co/package/teleport/1.2.2/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Integration:teleport Teleport Team:Security-Service Integrations Security Service Integrations Team [elastic/security-service-integrations]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants