This is a re-open of #11245, which had to be closed because of a full commit history rewrite in the integration main.
Background
The Elastic Cloud Security Team has spent the past year focusing on Cloud Detection and Response (CDR). One of the first steps towards the CDR vision is to enhance investigation workflows for the Cloud Security use-case in SIEM.
As part of enhancing investigation workflows, it's necessary to be able to correlate events and entities. For example, if an alert is triggered on the EC2 instance i-000000000, it is of great value to be able to easily search all the events related to that entity, across multiple indices, with one query. Therefore, we are working on extracting entities and enabling them to be correlated.
What is an entity?
An "entity" in our context refers to any discrete component within an IT environment that can be uniquely identified and monitored. This broad term encompasses both managed and unmanaged elements.
The term "entity" is broader than the current set of fields available under related. Although ip, user and hosts can be entities, there is no place to represent messaging queues, load balancers, storage systems, databases and others. Hence the proposal to add a new field.
The proposed structure
There are two fields being added in this PR:
actor.entity.id captures the entities that started the event, the actors.
target.entity.id captures the entities that were affected by the event, whether they were created, updated or listed. We try to do as much as possible with the data present in the event.
Decisions made on the Painless Script
Structure
The Painless script has grown very large. There are essentially three parts to it:
[...] related, actor and target).
[...] requestParameters and responseElements, there is usually a somewhat coherent structure per AWS service. I believe this separation makes the script easier to read, creates a better headspace when working on a specific service, and also breaks down the huge if else chain present in the previous state of the code.
Why TreeSet as the data structure to hold related, actor and target
There are two properties that this script must have:
Previously I had ensured both properties on "post processing", at the end of the script. Now it's ensured by the data structure itself.
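To illustrate the difference, here is a minimal sketch in plain Java rather than the actual Painless script. It assumes the two properties in question are uniqueness and sorted order, since those are what a TreeSet guarantees. The class name, method names and sample IDs are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class EntityCollector {
    // Assumed "post processing" style: collect everything in a list,
    // then drop duplicates and sort once at the end.
    static List<String> postProcessed(List<String> ids) {
        List<String> out = new ArrayList<>(new LinkedHashSet<>(ids)); // remove duplicates
        Collections.sort(out);                                        // sort afterwards
        return out;
    }

    // TreeSet style: each add keeps the collection unique and sorted,
    // so no cleanup step is needed at the end.
    static List<String> viaTreeSet(List<String> ids) {
        TreeSet<String> set = new TreeSet<>();
        for (String id : ids) {
            set.add(id); // a duplicate id is silently ignored
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // Hypothetical entity ids, with one duplicate and out of order.
        List<String> ids = Arrays.asList("i-000000000", "bucket-a", "i-000000000");
        System.out.println(postProcessed(ids));
        System.out.println(viaTreeSet(ids));
    }
}
```

Both methods return the same list for the same input; the TreeSet version simply moves the guarantee from a final cleanup step into the collection itself.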
I have not performance tested this myself, but the use of TreeSet should not hurt the time complexity of the algorithm: we now sort data on add, whereas previously we had to sort afterwards. TreeSet.add costs O(log n) per insertion, so n insertions cost O(n log n), the same order as Collections.sort, and honestly the lists are so small that it might not even matter.
Amount of tests
The tests were essential for me to validate what I was doing and to verify each output. I would like to keep them for future reference and to ensure we are not changing anything by mistake. However, the tests are starting to get slow, especially compared with other integrations, such as okta.