
[FR] Generate investigation guides #4358

Draft · wants to merge 5 commits into base: main
Conversation

Contributor

@Mikaayenson Mikaayenson commented Jan 8, 2025

Pull Request

Issue link(s): https://github.com/elastic/ia-trade-team/issues/160

Summary - What I changed

  • Added a unit test that enforces investigation guides for Elastic prebuilt rules
  • Added a disclaimer to rules whose guides are GenAI-generated, per @approksiu
  • Note: This will add a large number of rule changes, which impacts the number of assets shipped cc. @banderror
  • Note: Does not relocate setup info that was originally in the note field, as our build process automatically migrates it to the setup field at build time.
  • Normalized the header Triage and analysis
  • Leveraged a bit of code to automate the process.
Details

import tomlkit
from tomlkit import string
from pathlib import Path

updated_date_field = "2025/01/08"

# `rules` is the loaded collection of prebuilt rules
for rule in rules:
    note = rule.contents.data.note
    if note is None or (
        "## Triage and analysis" not in note
        and "## Triage and Analysis" not in note
    ):
        # Add in the generated guide
        generated_guide_path = Path(f"results/guides/{rule.id}_guide.md")
        with generated_guide_path.open(encoding="utf-8") as f:
            new_guide = f.read()

        note = f"{new_guide}\n\n{note}" if note else new_guide

        with rule.path.open("r", encoding="utf-8") as g:
            toml_data = tomlkit.parse(g.read())

        # update the date
        toml_data["metadata"]["updated_date"] = updated_date_field

        # update the note field, keeping it as a multiline TOML string
        toml_data["rule"]["note"] = string(note, multiline=True)

        # save the toml_data back to the file
        with rule.path.open("w", encoding="utf-8") as g:
            g.write(tomlkit.dumps(toml_data))
    else:
        print(f"Skipping {rule.id} ({rule.path})")


# Updating the date (useful after multiple tweaks)

import subprocess
from pathlib import Path
import tomlkit

# Get the list of modified files using Git
modified_files = subprocess.check_output(
    ["git", "diff", "--name-only", "--relative"],
    text=True
).strip().split("\n")

# Filter only the relevant files (rules and building blocks)
rule_files = [
    Path(file) for file in modified_files
    if file.startswith(("rules/", "rules_building_block/")) and file.endswith(".toml")
]

updated_date_field = "2025/01/08"

for rule_file in rule_files:
    with rule_file.open("r", encoding="utf-8") as f:
        toml_data = tomlkit.parse(f.read())

    # Update the `metadata.updated_date` field
    if "metadata" in toml_data and isinstance(toml_data["metadata"], dict):
        toml_data["metadata"]["updated_date"] = updated_date_field

    with rule_file.open("w", encoding="utf-8") as f:
        f.write(tomlkit.dumps(toml_data))

    print(f"Updated: {rule_file}")
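The header normalization mentioned in the summary ("Triage and Analysis" vs. "Triage and analysis") could be sketched as a small helper (hypothetical, assuming the canonical casing is "## Triage and analysis"):

```python
import re

# Matches the triage header line in any casing, e.g. "## Triage and Analysis"
TRIAGE_HEADER = re.compile(r"^##\s*triage and analysis\s*$", re.IGNORECASE | re.MULTILINE)

def normalize_triage_header(note: str) -> str:
    """Rewrite any casing variant of the triage header to the canonical form."""
    return TRIAGE_HEADER.sub("## Triage and analysis", note)
```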

How To Test / Review

  • Review the content for general consistency
  • Unit tests should pass
  • Import the rules into the UI to confirm the guides are formatted properly: 8.18-consolidated-rules.ndjson.txt. Note: the number of rules is getting large, so a few were removed from the ndjson so that the UI could import them.
  • Would like someone from security-docs to review cc. @jmikell821
Sample UI: AWS IAM Password Recovery Requested


Important

Please make the changes and use the "suggest changes" feature in lieu of comments. That way the changes can be accepted as a batch change.


Checklist

  • Added a label for the type of pr: bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning so guidelines can be generated
  • Added the meta:rapid-merge label if planning to merge within 24 hours
  • Secret and sensitive material has been managed correctly
  • Automated testing was updated or added to match the most common scenarios
  • Documentation and comments were added for features that require explanation

@protectionsmachine
Collaborator

Enhancement - Guidelines

These guidelines serve as a reminder of considerations to keep in mind when adding a feature to the code.

Documentation and Context

  • Describe the feature enhancement in detail (alternative solutions, description of the solution, etc.) if not already documented in an issue.
  • Include additional context or screenshots.
  • Ensure the enhancement includes necessary updates to the documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Code changes do not introduce new warnings or errors.
  • Variables and functions are well-named and descriptive.
  • Any unnecessary / commented-out code is removed.
  • Ensure that the code is modular and reusable where applicable.
  • Check for proper exception handling and messaging.

Testing

  • New unit tests have been added to cover the enhancement.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and validating the enhancement (e.g., test logs, screenshots).
  • Validate that any rules affected by the enhancement are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.

Additional Checks

  • Ensure that the enhancement does not break existing functionality.
  • Review the enhancement with a peer or team member for additional insights.
  • Verify that the enhancement works across all relevant environments (e.g., different OS versions).
  • Confirm that all dependencies are up-to-date and compatible with the changes.
  • Confirm that the proper version label (patch, minor, major) is applied to the PR.

if 'query' in osquery_item and isinstance(osquery_item['query'], str):
    # Transform instances of \ to \\ as calling write will convert \\ to \.
    # This will ensure that the output file has the correct number of backslashes.
    osquery_item['query'] = osquery_item['query'].replace("\\", "\\\\")
Contributor Author
I noticed this when trying to run toml-lint on our rules. Without accounting for it, the loader breaks on formatting issues.


for rule in self.all_rules:
    if not rule.contents.data.is_elastic_rule:
        continue  # Don't enforce on non-Elastic rules
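The enforcement this test performs could be reduced to a standalone predicate (a sketch only; `has_investigation_guide` is a hypothetical name, not the repo's actual API):

```python
# The two header casings the PR checks for before normalization.
REQUIRED_HEADERS = ("## Triage and analysis", "## Triage and Analysis")

def has_investigation_guide(note):
    """Return True if a rule's note field contains a triage section header."""
    return note is not None and any(header in note for header in REQUIRED_HEADERS)
```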
Contributor Author

Question: should we go ahead and enforce this on all rules?

Comment on lines +215 to +216
def test_note_contains_triage_and_analysis(self):
"""Ensure the note field contains Triage and analysis content for Elastic rules."""
Contributor

Added a unit test that enforces investigation guides for Elastic prebuilt rules

Is this needed? If we are going to go all in for the generated guides, couldn't we have a weekly/monthly automation that generates guides for new rules and open a PR?


My preference is to have the complete rule at initial release, so users get the full picture and we do not introduce additional updates. I can understand there might be exceptions for urgent releases.

Contributor

IMO: That adds more complexity when reviewing, potentially distracting reviewers from what matters the most: rule logic

Plus, it adds one more step to get the rules done, as authors would need to fix the generated guide before pushing the PR

Contributor Author

@Mikaayenson Mikaayenson Jan 11, 2025


This question is related to #4358 (comment) FWIW, we now have a GitHub action job that can run on a detection-rules branch to generate a guide (given the rule uuid), so the idea would be to generate the guide assuming it follows the same general guidance as this PR.

On another note, an alternative idea would be to remove the unit test and make generating guides a release step, where guides are created in a single PR prior to shipping; ideally the original authors would still review those anyway. It may be easier to review in the context of the PR where the rule was created.

Contributor

@Aegrah Aegrah left a comment


This is very good, saving us lots of time while still providing some very basic guidance to junior SOC personnel. My few cents:

  • I think we should avoid providing osquery queries, or specifically mention somewhere that the generated osquery queries might be wrong. I understand this is part of the initial disclaimer, but I still think it's a bit silly to provide wrong queries if we state that "these have been reviewed to improve their accuracy and relevance". Another option would be to just refer to the osquery documentation and mention the prebuilt osquery packs + our hunting queries.
  • The AI sometimes provides duplicate information or useless/too-broad investigation steps. I'm curious whether a two-step model would work: feed the output of this AI into a second AI with stricter rules to filter out those mistakes/inconsistencies. This might also filter out cases where the AI writes about paths that do not exist, for example.
  • In general this is a great start. I am just curious whether the prompt could be altered to make the investigations more specific to certain activity, because it remains very broad. I understand the more specific you go, the more it will get wrong. Curious to see what options we could have here.
  • I also wonder the same thing as @w0rk3r does with regards to unit testing enforcements. Having CI/CD handle that for us would be convenient.

rules/linux/collection_linux_clipboard_activity.toml (outdated review comments, resolved)
- Review the alert details to identify the specific file change event, including the file name, file path, and process executable involved, to understand the context of the suspicious activity.
- Verify the legitimacy of the process executable by checking the hash of the binary at the path "/usr/sbin/sshd" or "/usr/bin/ssh" against known good hashes to detect any unauthorized modifications.
- Examine the file name and extension involved in the alert to determine if it matches any known patterns of backdoor logging files, such as unusual extensions like "in", "out", "ini", or temporary file patterns like "*~".
- Investigate the file path to determine if it matches any of the suspicious directories listed in the query, such as "/private/etc/ssh/.sshd_auth" or "/usr/lib/*.so.*", which may indicate unauthorized file placement.
Contributor

This right here is the main downside of GenAI: it's talking about /private/etc/ssh/.sshd_auth, which AFAIK either does not exist or is a macOS path.

Can we build in checks when prompting it to only provide information it is 100% sure is correct? Or will it hallucinate anyway?

Contributor Author

Yeah, the issue is that it takes the context from the query, which has that path included. =) Good tuning opportunity @Aegrah

rules/linux/defense_evasion_chattr_immutable_file.toml (outdated review comments, resolved)
rules/linux/persistence_kernel_driver_load.toml (outdated review comments, resolved)
@banderror

Note: This will add a large number of rule changes which impacts the number of assets shipped cc. @banderror

@Mikaayenson Thank you for the heads up. How many new rule versions are we adding -- 902? cc @xcrzx

It's awesome that we're adding investigation guides to the remaining prebuilt rules!

Contributor

@sodhikirti07 sodhikirti07 left a comment


Reviewed the ML investigation guides, and overall the content looks good. I've added a few minor comments regarding osquery and some descriptive inconsistencies in the investigation steps. I wonder if we could make the guides more concise by removing duplicate information or having the model summarize it further?

@approksiu

approksiu commented Jan 10, 2025

@Mikaayenson @w0rk3r Regarding osquery queries, how many are suggested for these new guides? Would it be possible to have the interactive osquery blocks, like here:

(screenshot of an interactive osquery example)

We'd potentially need manual review/fixes for them.
What do you think?

Member

@susan-shu-c susan-shu-c left a comment


Left some comments on SecML packages. Thanks for doing this!
For potential future updates, I'm wondering if we could generate fewer points in each section (such as "Response and remediation") so that the LLM produces only the most relevant ones? From my first impression, it feels like some less important points were brought up just to fulfill some sort of bullet-point count. But it's mostly a first impression.

Contributor

@sodhikirti07 sodhikirti07 left a comment


Suggesting some small changes.

@nastasha-solomon nastasha-solomon self-requested a review January 10, 2025 20:46
@Mikaayenson
Contributor Author

Update Jan 10 - Regenerated Guides

I took all the feedback and regenerated guides. Essentially making these changes:

  1. Removed osquery recommendations, because some were incorrect and were being manually removed via the suggested changes. Most of the feedback thus far was about osquery issues.
  2. Attempted to deduplicate investigative steps from response and remediation steps
  3. Modified the prompts to try to generate more specific and useful steps, rather than appearing to fill a bullet-count requirement
  4. Updated the disclaimer to match other notes used in the guide.

cc. folks who've already provided feedback so far @susan-shu-c @sodhikirti07 @Aegrah @w0rk3r

Labels: enhancement (New feature or request), Security Content
8 participants