Open Detection Engineering Framework - ODEF

Introduction

The Framework focuses on using business goals and outcomes to drive and guide cybersecurity activities: delivering detections, improving visibility, minimizing vendor dependencies and ultimately improving the organization's security posture. At its core, the Framework provides the principles that guide efficient and effective detection engineering practices, along with three maturity levels for measuring the organization's performance. Each phase of the Framework's core describes part of the detection lifecycle and uses phase functions to focus the effort of the detection engineer and guide them through the process. The three maturity levels give organizations a high-level mechanism to view and evaluate their approach to detection engineering and to focus on areas of improvement. The Framework enables organizations – regardless of size, degree of cybersecurity risk, or cybersecurity sophistication – to apply the principles and best practices of detection engineering and to improve their security posture.

High level goals

The framework's high-level goals are to:

  • Provide guidance on how to be systematic, repeatable and predictable when building hunts and detections
  • Ensure that high visibility is achieved throughout the organization
  • Convert insights into retainable and actionable knowledge and promote knowledge sharing
  • Introduce continuous vigilance
  • Introduce detection validation through testing
  • Facilitate a knowledge driven environment

Framework Core

The Framework Core provides a set of activities to achieve specific cybersecurity outcomes and references examples of how to achieve those outcomes. The Core comprises three lifecycle phases – Sunrise, Midday and Sunset – which describe the life of a detection together with the functions, guidelines and goals of each phase. The functions, goals and guidelines give the detection engineer a north-star focus and help them deliver a detection of exceptional quality.
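
To make that structure concrete, the Core can be thought of as three phases, each carrying its functions. Below is a minimal sketch of such a representation in Python; the data structure itself is purely illustrative, and only the phase and function names come from the framework.

```python
from dataclasses import dataclass, field

# Illustrative only: the phase and function names come from the framework;
# representing them as dataclasses is an assumption, not part of ODEF.
@dataclass
class Phase:
    name: str
    functions: list[str] = field(default_factory=list)

FRAMEWORK_CORE = [
    Phase("Sunrise", ["Research", "Prepare", "Build & Enrich",
                      "Validate", "Automate", "Share"]),
    Phase("Midday",  ["Monitor", "Measure", "Improve", "Review"]),
    Phase("Sunset",  ["Decommission"]),
]
```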

Phase 1️⃣ Sunrise 🌅

Sunrise is the first phase of the detection lifecycle. It marks the inception, development and deployment of the detection. During this phase there are six core functions that should be addressed:

  • Research
  • Prepare (Logging)
  • Build (Detection Content)
  • Validate
  • Automate
  • Share (Knowledge)

High level goals for the Sunrise phase

  • Build a high-fidelity detection
  • Ensure detection validation
  • Create documentation
  • Integrate and automate in the environment
  • Socialize the detection with the security organization

| Function | Goal | Description | Guidelines |
| --- | --- | --- | --- |
| Research | Opportunity Identification | Can be triggered by analyzing threat intelligence reports, OSINT, or internal knowledge of a particular security gap. | Document the use case and the goals of the detection as part of the opportunity identification process:<br>• Document the use case that you are building and set goals.<br>• Is the TTP already covered by an existing alert or detection?<br>• Is there sufficient knowledge to start building, or is additional research required?<br>• What sources of information will assist the research? |
| | Prioritize | Detection engineering work has to be prioritized and tracked. Work prioritization can be based on urgency and priority. A backlog of detections and security posture activities is desirable and recommended. | Prioritization criteria:<br>• Criticality of the system<br>• Highest level of threat to the organization<br>• Ease of exploitation<br>• Past incidents |
| | Develop Research Questions | Write research questions that, once answered, will give you an understanding of the topic. | Examples:<br>• Write down what you already know or don't know about the topic.<br>• Use that information to develop questions. Use probing questions (why? what if?).<br>• Avoid "yes" and "no" questions. |
| | Information Gathering | Research and collect sufficient information to start understanding the detection; this provides a good overview of the topic if you are unfamiliar with it. | • Identify important facts, dates, events, history, organizations, etc. (in case the detection is a response to a past incident).<br>• Find bibliographies which provide additional sources of information (include them in the Appendix section of the detection document). |
| | Technical Context | Create and understand the technical context around the detection. | • Start a technical write-up by summarizing the most important information from a technical perspective.<br>• Research the technology associated with the technique to help understand the use cases, related data sources, and detection opportunities.<br>• Note: defenders often create superficial detections because they lack an understanding of the technology involved. In case of uncertainties it is best to engage the team or engineer responsible for managing the technology. |
| Prepare | Identify Dataset | Identify the log source that will be used for the detection. | Know your environment:<br>• Understand the data source and document it by creating a data dictionary (a minimal sketch appears after this table).<br>• The data dictionary should grow to contain the sources of data and their corresponding schemas, so it can later be used as a quick reference. |
| | Visibility Check | Ensure there is sufficient logging, retention and visibility to successfully build the detection and satisfy the use case. | • Use the accumulated technical knowledge to identify the source and the events required to build the detection.<br>• Use any historical events to validate that there is sufficient visibility. |
| | Improve (optional) | Once the data is explored, opportunities for improvement can be identified, such as:<br>• Collecting additional logs or changing logging levels<br>• Creating additional attributes (parsing of raw logs)<br>• Consolidating distinct logs | Improvement initiatives and requests should be communicated to the team responsible for the dataset in question. For that purpose it makes sense to maintain a contact list that provides a quick reference to technologies, support/engineering teams and contact details. |
| Build & Enrich | Detection Creation | Create a detection query against the identified dataset. | With a good understanding of the technical context and the data source, begin building queries to narrow the data down to actionable insight. |
| | Manual Testing | Perform manual testing and ensure the query works from a syntax and logic perspective. | • Ensure the query does not have any syntax errors.<br>• If the detection is built in response to a past incident, ensure that the query is indeed catching the true positive events. |
| | Baseline Development | Develop a baseline (if needed) that will improve the detection fidelity. | • Baselines are sets of known and verified good behaviors and events present in the organization. Those events are normally excluded from the detection logic.<br>• Baseline decisions and considerations should be documented and clearly stated in the ADS.<br>• Baselines are included in the hunt.yml/tf/hcl or alert.yml/tf/hcl files. |
| | Unittest Development | Unit-test development depends on the type of DevOps pipeline; only simple goals are provided here. | Goals for the unit testing (see the test sketch after this table):<br>• Catch changed or missing data<br>• Catch syntax errors<br>• Confirm the detection logic by performing a true positive detection |
| | Enrich | Enrich with additional data sources if required. | Each hunt can have different enrichment requirements. In some cases an HR database could be used to understand whether a person is on vacation; other trivial cases could be a lookup of a hash, IP or domain in a threat intelligence repository, etc. |
| | Document | • Create the KB document<br>• Complete the ADS<br>• Update the MITRE minefield | • A central knowledge base repository is required in order to mature the detection engineering program. This can be a GitHub repository with controlled access that gives security team members access on a need-to-know basis.<br>• Each hunt should have a corresponding README.MD file that provides sufficient information and context. Consider an SOC analyst or Incident Responder responding to an event from your detection: by looking at the documentation they should be easily briefed on the premise and technicalities of the detection. |
| Validate | Confirm Unittests | Confirm the unit tests are working. | Confirmation of the unit tests can be done by inspecting the implemented DevOps pipeline and ensuring that the actions (in the case of GitHub) for unit tests are running. |
| | True Positive Validation | Validate a true positive event against the real dataset using the query developed earlier. | True positive validation can be achieved by:<br>• Using a historical event that exists in the central data repository<br>• Emulating the TTP by executing it in a controlled environment |
| | False Positive Validation | Ensure no false positives are produced by the query when run against the production dataset. | • False positive events are known good events which appear in the output of the detection/hunt query.<br>• If a baseline is used, validate that the baseline is catching those known good events. Splunk example: you can use the makeresults command to create fake results and test your baseline and how you handle false positives. |
| Automate | Automation & Deployment | This step is entirely dependent on the environment and should follow the standard CI/CD or automation practices of the organization. | Integrate with the DevOps pipeline and enable continuous deployment. |
| Share | Socialize the New Detection | A notification process is required and should be created. It can take the form of a newsletter or a Slack channel notification, preferably an automated one. | Follow a process to communicate the newly created detection to the security teams and inform them about it. |
| | Update Sec Dependency Tree | The dependency document is part of the repository and can be shared with data engineering and security teams. The goal of sharing it is to promote a "check before you change" mentality: if a data engineer is about to rename an index, they should first check whether the index is being used. Having the dependency document in the repository makes that check easy and seamless. | Update the organization-wide document showing dependencies for the detections. |
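
The Identify Dataset guidelines above ask for a data dictionary that records each data source and its schema. Below is a minimal sketch of such a dictionary in Python; the source names, owners, retention values and fields are illustrative assumptions, not part of the framework. In practice the dictionary can live alongside the detections in the repository so it is versioned with them.

```python
# Illustrative data dictionary: maps a log source to its schema and owner.
# The example sources, owners and fields are assumptions for demonstration only.
DATA_DICTIONARY = {
    "windows_security_events": {
        "description": "Windows Security event log forwarded to the SIEM",
        "owner": "endpoint-engineering",   # team to contact for logging changes
        "retention_days": 365,
        "schema": ["EventID", "Computer", "TargetUserName", "LogonType", "_time"],
    },
    "dns_queries": {
        "description": "Resolver query logs",
        "owner": "network-engineering",
        "retention_days": 90,
        "schema": ["timestamp", "client_ip", "query", "record_type", "response"],
    },
}

def fields_for(source: str) -> list[str]:
    """Quick reference used while writing a detection query."""
    return DATA_DICTIONARY[source]["schema"]
```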
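
The Unittest Development and Validate functions call for automated checks that the detection logic catches a known true positive and that the baseline suppresses known good events. Below is a minimal pytest-style sketch, assuming a simplified in-memory run_detection() helper; the helper, the event fields and the baselined account are illustrative only, since a real test would execute the query in your own pipeline.

```python
# Hypothetical, simplified detection logic used only to illustrate the tests:
# flag failed logons (EventID 4625) unless the account is in the baseline.
def run_detection(events, baseline_users=("svc_backup",)):
    return [
        e for e in events
        if e["EventID"] == 4625 and e["TargetUserName"] not in baseline_users
    ]

def test_true_positive_is_detected():
    # A known true positive, e.g. from a past incident or a controlled emulation.
    events = [{"EventID": 4625, "TargetUserName": "attacker"}]
    assert run_detection(events), "detection must catch the known true positive"

def test_baseline_suppresses_known_good():
    # A known good event documented in the baseline must not produce an alert.
    events = [{"EventID": 4625, "TargetUserName": "svc_backup"}]
    assert run_detection(events) == [], "baselined event must not produce a hit"
```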

Sunrise phase Process Flow

```mermaid
graph TD;
Research1(Opportunity Identification) -->Research2(Prioritize);
Research2 -->Research3(Develop Research Questions);
Research3 -->Research4(Information Gathering);
Research4 -->Research5(Collect Technical Context);
Research5 -->Prepare1(Identify Dataset);
Prepare1 -->Prepare2(Visibility Check);
Prepare2 -->Prepare3{Improve};
Prepare3 --> |yes| cis[Start security improvement initiative];
Prepare3 --> |no| Build1(Detection Query Creation);
Build1 --> Build2(Manual Testing);
Build2 --> Build3(Baseline development);
Build3 --> Build4(Automated Unittest Development);
Build4 -->Build5(Enrich);
Build5 --> Build6(Document);
Build6 -->  Validate1(Confirm unittests);
Validate1 -->val2(True/False Positive validation);
val2-->automate(Automation & deployment);
automate --> share(Socialize the new detection);
share -->share1(Update Sec Dependency Tree);
```

Phase 2️⃣ Midday ☀️

The “Midday” phase is normally the longest phase of the detection lifecycle; by this point the detection has been engineered and commissioned to production. The phase monitors the detection during its operation and aims to improve it if needed. High level goals for the Midday phase:

  • Operate and monitor the detection for FPs and TPs
  • Improve the detection logic in case of an influx of FPs
  • Perform systematic reviews to ensure relevancy
| Function | Goal | Description | Guidelines |
| --- | --- | --- | --- |
| Monitor | Run as per defined schedule | The detection is configured to run on a pre-defined schedule, or in real time if applicable. | Detections run based on the schedule set during the Sunrise phase. |
| | Confirm unittests passing | Monitoring is configured to notify the responsible team in case the automation for the detection is not running properly. | Suggested approach: GitHub Actions – ensure proper syntax before deployment. |
| | Work detections | Once the detection is running, it should be monitored for any TPs or a potential influx of FPs. | • TP events should be triaged, investigated and responded to by following an agreed IR process.<br>• FP events should be investigated, proven to be FPs and documented as part of the baseline. Once the baseline is changed in the documentation, the query can be updated and improved. |
| Measure | Measure detection efficacy | Enable metrics for the detection, based on which areas for improvement can be identified:<br>• MITRE ATT&CK weaknesses<br>• Success/failure of automating detections<br>• Services covered | • Each detection that covers a particular TTP can be marked in the MITRE ATT&CK Navigator. The percentage of covered tactics and techniques can be a metric (see the coverage sketch after this table).<br>• Success or failure in detection automation, or an influx of FPs, can be used to identify detections that require improvement.<br>• Detection runtime length is a metric which can identify poorly written queries – for example, a query that is too open, collects far too many events and churns through too much data, only to spend even more time filtering it with custom logic. |
| Improve (optional) | Improve detection fidelity | Once improvement opportunities have been identified during operations or a periodic review, an improvement is triggered. | The goal of this function is to take detections that are in poor health (slow runtime, causing errors) and improve them by revisiting the detection logic. |
| Review | Perform periodic review | Review detections to identify improvement opportunities or decommission requirements. | A detection can become irrelevant, and thus be decommissioned, when:<br>• The risk that it compensates for is far smaller than the cost of running the detection<br>• The technology used for the detection is no longer present in the company |
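
The Measure function above suggests tracking what share of ATT&CK techniques is covered by detections. Below is a minimal sketch of that metric, assuming each detection records the technique IDs it covers; the technique IDs and detection names are illustrative, not recommendations.

```python
# Illustrative ATT&CK coverage metric: the tracked technique IDs and the
# detection names below are examples only.
TRACKED_TECHNIQUES = {"T1059", "T1078", "T1110", "T1566"}

DETECTIONS = {
    "suspicious_powershell": {"techniques": {"T1059"}},
    "password_spray": {"techniques": {"T1110", "T1078"}},
}

# Union of techniques covered by at least one detection.
covered = set().union(*(d["techniques"] for d in DETECTIONS.values()))
coverage_pct = 100 * len(covered & TRACKED_TECHNIQUES) / len(TRACKED_TECHNIQUES)
print(f"ATT&CK coverage: {coverage_pct:.0f}%")  # 75% in this example
```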

Midday phase Process Flow

```mermaid
graph TD;
Monitor1(Run per schedule) --> Monitor2(Receive alerts);
Monitor2 --> Monitor2b(Respond to alerts);
Monitor2b --> Monitor3{False Positives?};
Monitor3 --> |no| Measure[Document TP];
Measure --> Review(Perform periodic review);
Monitor3 --> |yes| Improve(Improve);
Improve --> Monitor1;
```

Phase 3️⃣ Sunset 🌆

During the “Sunset” phase the detection is taken out of commission. The phase aims to ensure that resources are not spent on outdated detections that are no longer applicable, while leaving a sufficient trace of the detection's existence.

High level goals for the Sunset phase:

  • Decommission the detection and leave it in a state in which it can be resumed at any time
  • Preserve knowledge
| Function | Goal | Description | Guidelines |
| --- | --- | --- | --- |
| Decommission | Decommission the detection | The goal is to decommission the detection by following a process that provides visibility. | To decommission a detection, simply change the status field to "Sunset" in the .yml file. Assuming your DevOps pipeline is configured correctly, this should effectively disable the detection and prevent it from running (see the sketch after this table).<br>Note: do not remove anything from the repository, as detections can be reused in the future. |
| | Knowledge base update | Create an adequate indication in the KB document that the detection is no longer active and socialize the change with your security teams. | Update the MITRE coverage map by removing the coverage that the detection was providing. |
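
Below is a sketch of how a pipeline might honour that status field, assuming each detection lives in its own .yml file with a top-level status key; the directory layout and PyYAML usage are assumptions, and only the status: "Sunset" convention comes from the framework.

```python
# Illustrative pipeline step: deploy only detections whose status is not "Sunset".
# Assumes each detection is a .yml file with a top-level "status" field.
import glob
import yaml  # PyYAML

def active_detections(path_glob="detections/**/*.yml"):
    active = []
    for path in glob.glob(path_glob, recursive=True):
        with open(path) as fh:
            detection = yaml.safe_load(fh) or {}
        if str(detection.get("status", "")).lower() != "sunset":
            active.append(path)  # sunset detections stay in the repo but are skipped
    return active

if __name__ == "__main__":
    for path in active_detections():
        print(f"deploying {path}")
```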

Sunset phase Process Flow

```mermaid
graph TD;
Review1[Review completed] --> Review2;
Review2{detection ready to decom} -->|no| End[end];
Review2{detection ready to decom} -->|yes| Preserve(Preserve knowledge);
Preserve --> Decommission(Decommission the detection);
```

Detection Engineering Maturity Model (DEMM)

Maturity is strictly a self-evaluation process by the team; the framework merely provides guidance and structure and ensures that all of the relevant areas are covered. This review process gives a baseline, helps the teams create a common understanding of their way of working, and helps them figure out where to start improvement activities. The self-assessment is in no way calibrated.

Maturity Levels

Level 1 - Partial

  • Threat Detection Content
    • Organizational threat identification practices rely solely on external vendors to provide security content.
    • Assurance and context around alerts and detections are not provided or are insufficient.
    • Risk is managed in an ad hoc and often reactive manner by relying on third parties.
  • Assurance
    • There is some limited awareness of cybersecurity threat detection capabilities at the organizational level.
    • The organization implements threat validation and verification on an irregular, case-by-case basis due to varied experience or information gained from outside sources.
    • Assurance through continuous validation is not present.
  • Knowledge sharing
    • The organization may not have processes to enable cybersecurity information sharing.
    • Documentation is rarely written, shared only on an ad-hoc basis, and scattered across teams.

Level 2 - Adequate

  • Threat Detection Content
    • Organizational threat identification practices rely on internal teams and external vendors to provide security content.
    • Context around alerts and detections is provided. Specialized teams are able to introduce new detections and security content.
    • Some security teams have a better understanding of security posture than others.
  • Assurance
    • There is some awareness of cybersecurity threat detection capabilities as the organization is now building custom detections to compensate for gaps.
    • The custom detections are use case driven and validated during the detection development process.
    • Continuous validation is not enabled and the organization still relies on suppliers for most of the detection capabilities.
  • Knowledge sharing
    • The organization is starting to enable knowledge sharing and promotes documentation efforts.
    • There is a central detection information repository.

Level 3 - Enabled (Proactive)

  • Threat Detection Content
    • The organization maintains continuous practices that provide excellent internal insights and knowledge. Context around alerts and detections is provided.
    • Any team is encouraged and able to introduce new detection components and thus improve the security posture.
    • The security posture of the environment is well understood across the security teams.
  • Assurance
    • The organization possesses a detection coverage map and covers a large percentage of it with detections built in-house. The organization does not rely on vendors to provide security content.
    • Automation is provided to continuously validate and run the detection use cases.
    • Additional assurance is achieved by running red team exercises and automation frameworks.
  • Knowledge sharing
    • The organization possesses practices to create and maintain high-quality records and to appropriately control and manage access to the information.
    • Processes for socializing detections are automated and teams are informed of the development of new detections.

Operational Maturity

Maturity Review Process (MRP)

The process of evaluating the maturity:

  • Collect - Collect information about your processes, people and tools, and identify changes to any of them. The goal is to gain a holistic understanding of the organization's security teams, tools and processes. Based on that information, various posture improvements can be identified.
  • Analyze - Based on the data you have collected, find the corresponding maturity level. Knowing where the organization sits in the maturity model is important for understanding the impact and importance of each identified security improvement initiative, and thus for prioritizing accordingly.
  • Prioritize - Prioritize and decide which is the next low-hanging fruit that can be improved. Not all security issues are equally important; prioritization should focus on the initiatives that influence and change the security posture and introduce the most maturity.
  • Improve - Create an initiative or a project for closing the identified gap.

Security Improvement Initiative

Security improvement initiatives are likely outcomes of the MRP. The goal of a security improvement initiative is to address identified visibility gaps in the organization's security posture. For example, during the review process or during detection engineering we may identify that an application is not providing sufficient logging to detect a particular behavior or TTP of interest. That is a good candidate for a security improvement initiative: its goal would be to deliver the visibility needed and notify the Detection Engineer so that they can proceed with the detection creation. Depending on the size of the organization and its internal processes, this work might be driven by the Detection Engineer or by a completely separate team.

DEMM Cadence

Evaluating the maturity of the organization and striving to improve it is not a one-time effort or activity. The best results are achieved by the following:

  • Set a regular schedule for reevaluating and revisiting the DEMM.
  • Ensure that the security improvement initiatives are targeted with a timeline and aligned with the overall organizational security strategy.

Framework Mindmap