From bd9e565b34b57b0027e42ed387c7a5a1399bac86 Mon Sep 17 00:00:00 2001
From: Konrad Weiss
Date: Mon, 18 Nov 2024 14:36:12 +0100
Subject: [PATCH 1/3] Adding parts extracted from the wiki pages

---
 docs/docs/DesignPrinciples/index.md        |  34 +++++
 docs/docs/GettingStarted/query_examples.md | 150 +++++++++++++++++++++
 2 files changed, 184 insertions(+)
 create mode 100644 docs/docs/DesignPrinciples/index.md
 create mode 100644 docs/docs/GettingStarted/query_examples.md

diff --git a/docs/docs/DesignPrinciples/index.md b/docs/docs/DesignPrinciples/index.md
new file mode 100644
index 00000000000..32fc1cc060f
--- /dev/null
+++ b/docs/docs/DesignPrinciples/index.md
@@ -0,0 +1,34 @@
+# Design Principles
+
+## The CPG represents the code's ...
+
+* Structure/Syntax
+* Data Flows
+* Execution Order/Control Flow
+* Variable Usage
+* Calls
+* The Type System
+
+## The CPG should parse ...
+
+* Incomplete code
+* Code with missing toolchains
+* With resilience to incorrect code
+* Language heterogeneous projects
+
+
+## CPG-Library users should be able to ...
+
+* Load projects and single files
+* Visualize and analyze code
+* Implement and register new Language Frontends
+* Extend and modify existing components, e.g., passes
+* Parse code incrementally
+
+## The CPG-Transformation should be ...
+* Language independent: Allow for language independent and cross-language queries
+* Information-rich: contain language-specific information in generalized structures
+* Fast (enough).
+  * Small Projects/Development projects should be analyzable in real-time, within a few seconds at most.
+  * Large libraries should take no longer than a few hours.
+  * About 5 to 10 times as long as the compilation process.
diff --git a/docs/docs/GettingStarted/query_examples.md b/docs/docs/GettingStarted/query_examples.md
new file mode 100644
index 00000000000..96e0478a0bd
--- /dev/null
+++ b/docs/docs/GettingStarted/query_examples.md
@@ -0,0 +1,150 @@
+# Query Examples
+
+We want a way to define "rules" or "checks" that look for certain patterns in the code. Therefore, we need to decide whether we want an "algorithmic" or a "descriptive" way to declare such a check.
+
+Syntax explanation: `|x|` means that `x` should be "resolved", either through constant propagation or another analysis.
+
+Each of the following examples checks that the respective bug is not present.
+
+## Array out of bounds exception
+
+Part of: CWE 119
+```
+result.all(mustSatisfy = { max(it.subscriptExpression) < min(it.size) && min(it.subscriptExpression) >= 0 })
+```
+
+## Null pointer dereference (CWE 476)
+
+```
+result.all(mustSatisfy={it.base() != null})
+```
+
+## Memcpy too large source (Buffer Overflow)
+Part of CWE 120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow'). Open question: do we also need to find self-written copy functions for buffers?
+```
+result.all({ it.name == "memcpy" }, { sizeof(it.arguments[0]) >= min(it.arguments[2]) } )
+```
+
+## Memcpy too small source
+
+```
+result.all({ it.name == "memcpy" }, { sizeof(it.arguments[1]) <= max(it.arguments[2]) } )
+```
+
+## Division by 0 (CWE 369)
+
+```
+result.all({ it.operatorCode == "/" }, { !(it.rhs.evaluate(MultiValueEvaluator()) as NumberSet).maybe(0) } )
+```
+
+## Integer Overflow/Underflow (CWE 190, 191, 128)
+
+For assignments:
+```
+result.all({ it.target?.type?.isPrimitive == true }, { max(it.value) <= maxSizeOfType(it.target!!.type) && min(it.value) >= minSizeOfType(it.target!!.type)})
+```
+For other expressions, we need to compute the effect of the operator.
+
+## Use after free
+
+Intuition: no node that is reachable from a node `free(x)` may use `x`. Use the EOG for reachability, but I'm not sure how to say "don't use x". This is the most basic form.
+```
+result.all({ it.name == "free" }) { outer -> !executionPath(outer) { (it as? DeclaredReferenceExpression)?.refersTo == (outer.arguments[0] as? DeclaredReferenceExpression)?.refersTo }.value }
+```
+
+## Double Free
+
+```
+result.all({ it.name == "free" }) { outer -> !executionPath(outer) { (it as? CallExpression)?.name == "free" && ((it as? CallExpression)?.arguments?.getOrNull(0) as? DeclaredReferenceExpression)?.refersTo == (outer.arguments[0] as? DeclaredReferenceExpression)?.refersTo }.value }
+```
+
+## Format string attack
+
+arg0 of functions such as `printf` must not be user input. Since I'm not aware that we have a general model for "this is user input" (yet), we could say that every possible value of the argument must be a Literal (not sure if the proposed notation makes sense though).
+```
+vuln_fcs = ["fprintf", "printf", "sprintf", "snprintf", "vfprintf", "vprintf", "vsprintf", "vsnprintf"];
+forall (n: CallExpression): n.invokes.name in vuln_fcs => forall u in |backwards_DFG(n.arguments[0])|: u is Literal
+```
+
+Since many classical vulnerabilities (e.g., injection) are related to user input, we probably need a way to specify sources of user input (or "sources" in general). To reduce false positives, we probably also want to check some conditions over the path between the source and the sink (e.g. some checks are in place to check for critical characters/substrings, do escaping, etc.). Problem: There are tons of options.
+
+## Access of Uninitialized Pointer (CWE 824)
+
+## Access of Invalid Memory Address
+
+## Insecure Default Return Value
+
+Sounds like this always depends on the program? What is an insecure return value?
+
+E.g.:
+* Authorization: instead of assuming successful authorization (`authorized = true`) and checking for the contrary, start by assuming unauthorized (`authorized = false`) and check for authorization.
+
+## Missing Return Value Validation (Error checking)
+
+CWE 252
+
+I can't think of a simple query here which does not introduce too many findings because it often depends on "what happens afterwards". Example: logging an error value is typically not problematic. Also, the return values can have very different meanings, which makes it hard to find a solution for all issues.
+
+Simple idea 1: There has to be at least one check of the return value (probably for a given list of functions and their respective error-indicating return values).
+
+## Command Injection
+
+* Perform data flow analysis and check if unchecked user input reaches a function that calls system commands
+
+
+## Proper Null Termination of Strings (C specific)
+
+## Improper Certificate Validation (CWE 295)
+
+=> Use codyze?
+
+## Use of Hard-coded Credentials (CWE 798)
+
+Idea: when the crypto API is known, we could trace the input arguments for passwords / keys ...
+
+```
+relevant_args = {"function": "arg0"}
+forall (n: CallExpression): n.invokes.name in relevant_args.keys => forall u in |backwards_DFG(relevant_args(n.invokes.name))|: u !is Literal
+```
+
+## Scribbles
+
+### Test arguments of call expression
+```
+result.all({ it.name == "" }) { it.arguments[].value!! == const() }
+```
+
+### Track return value of call expression
+
+```
+forall (n1: CallExpression, n2: CallExpression): n1.invokes.name == "" && n2.invokes.name == "" => data_flow(n1.returnValue, n2.arguments[])
+```
+
+### Ensure path property
+```
+forall (n: CallExpression, v: Value) : n.invokes.name == "" && data_flow(v, n.arguments[]) => inferred_property(v, )
+```
+
+Example:
+```
+val algo = read_from_file(/* some file */);
+if (algo != "AES") {
+    throw Exception();
+}
+val cipher = initialize_cipher(algo); // at this point one can infer that algo must have the value "AES"
+```
+
+## https://cwe.mitre.org/data/definitions/1228.html
+
+This should be easy: maintain a list of the dangerous, inconsistent, obsolete, etc. functions and check all `CallExpression`s against it.
+
+
+# Which analyses do we need?
+
+* Integer range
+* Buffer size of constant sized arrays (memory size, number of elements)
+* Data flow analysis (intraproc: DFG edges, interproc: missing)
+* Reachability (intraproc: EOG edges, interproc: missing)
+* Points-to information
+* Taint analysis
+* Constant propagation
\ No newline at end of file

From 262a09d756b8b187022ae822ff03f2b249bd45ec Mon Sep 17 00:00:00 2001
From: Konrad Weiss
Date: Mon, 18 Nov 2024 15:00:46 +0100
Subject: [PATCH 2/3] Add new pages to index

---
 docs/mkdocs.yaml | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/mkdocs.yaml b/docs/mkdocs.yaml
index e73c12e7522..909995840d5 100755
--- a/docs/mkdocs.yaml
+++ b/docs/mkdocs.yaml
@@ -157,6 +157,7 @@ nav:
     - "Usage as library": GettingStarted/library.md
     - "Using the Interactive CLI": GettingStarted/cli.md
     - "Using the Query API": GettingStarted/query.md
+    - "Query Examples": GettingStarted/query_examples.md
     - "Shortcuts to Explore the Graph": GettingStarted/shortcuts.md
   - "Specifications":
     - CPG/specs/index.md
@@ -170,6 +171,8 @@ nav:
     - "Scopes and Symbols": CPG/impl/scopes.md
     - "Passes": CPG/impl/passes.md
     - "Symbol Resolution": CPG/impl/symbol-resolver.md
+  - "Design Principles":
+    - Design Principles/index.md
   - "Contributing":
     - "Contributing to the CPG library": Contributing/index.md
     # This assumes that the most recent dokka build was generated with the "main" tag!

From 98725ce723ced8e2b5fb26b6c51c000e0e3f1327 Mon Sep 17 00:00:00 2001
From: Konrad Weiss
Date: Mon, 18 Nov 2024 22:46:23 +0100
Subject: [PATCH 3/3] Removed the query example from navigation, moved principles to implementation

---
 .../index.md => CPG/impl/design_principles.md} | 0
 docs/mkdocs.yaml | 4 +---
 2 files changed, 1 insertion(+), 3 deletions(-)
 rename docs/docs/{DesignPrinciples/index.md => CPG/impl/design_principles.md} (100%)

diff --git a/docs/docs/DesignPrinciples/index.md b/docs/docs/CPG/impl/design_principles.md
similarity index 100%
rename from docs/docs/DesignPrinciples/index.md
rename to docs/docs/CPG/impl/design_principles.md
diff --git a/docs/mkdocs.yaml b/docs/mkdocs.yaml
index 909995840d5..2d3b11b3e1f 100755
--- a/docs/mkdocs.yaml
+++ b/docs/mkdocs.yaml
@@ -157,7 +157,6 @@ nav:
     - "Usage as library": GettingStarted/library.md
     - "Using the Interactive CLI": GettingStarted/cli.md
     - "Using the Query API": GettingStarted/query.md
-    - "Query Examples": GettingStarted/query_examples.md
     - "Shortcuts to Explore the Graph": GettingStarted/shortcuts.md
   - "Specifications":
     - CPG/specs/index.md
@@ -167,12 +166,11 @@ nav:
     - "Evaluation Order Graph (EOG)": CPG/specs/eog.md
   - "Implementation":
     - CPG/impl/index.md
+    - "Design Principles": CPG/impl/design_principles.md
     - "Language Frontends": CPG/impl/language.md
     - "Scopes and Symbols": CPG/impl/scopes.md
     - "Passes": CPG/impl/passes.md
     - "Symbol Resolution": CPG/impl/symbol-resolver.md
-  - "Design Principles":
-    - Design Principles/index.md
   - "Contributing":
     - "Contributing to the CPG library": Contributing/index.md
     # This assumes that the most recent dokka build was generated with the "main" tag!