Merge pull request #396 from nfejzic/add-thesis-nfejzic

cksystemsteaching · Sep 6, 2024 · 57ec1e6 · 57ec1e6
2 parents c7bc87e + c0165b9
commit 57ec1e6
Show file tree

Hide file tree

Showing 27 changed files with 2,552 additions and 6 deletions.
diff --git a/theses/README.md b/theses/README.md
@@ -1,5 +1,43 @@
 # Bachelor Theses on Selfie
 
+## Exploring Reasoning Performance of RISC-V Software Models in BTOR2 by Nadir Fejzić, University of Salzburg, Austria, 2024 ([PDF](https://github.com/cksystemsteaching/selfie/blob/main/theses/bachelor_thesis_fejzic.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_fejzic))
+
+When working with software programs, we want to ensure that they do not
+reach certain unsafe states. Various techniques exist for this purpose,
+such as testing. However, testing requires running the program with every
+possible input, which is often not practical because of the large input
+space. Most often we want to check whether an input exists such that our
+program reaches a specific state. This is known as the state reachability
+problem, which can be reduced to the boolean satisfiability problem and
+solved by SAT-solvers.
+
+The reduction can be done by encoding the sequence of machine instructions
+in the program as a satisfiability formula in the theory of bit-vectors,
+known as a Satisfiability Modulo Theory (SMT) formula. The resulting
+formula is the model of our program. SMT-formulae are then further reduced
+to boolean logic formulae by technique called bit-blasting. We can then run
+a SAT-solver on resulting boolean formulae, which will tell us whether the
+program satisfies the given constraints. Effectively, we reduced the
+problem of program correctness to the satisfiability problem. Important to
+notice is that only a finite number of instructions can be transformed this
+way, which makes the model checking bounded in number of executed
+instructions.
+
+As we reduced the program correctness problem to satisfiability problem,
+solving it is NP-Complete and computationally expensive. However, model
+generation is linear in the size of the input program. We took advantage of
+this property and generated many equivalent models to analyze impact of
+different configurations on solving performance of \texttt{btormc} model
+checker.
+
+In this thesis, we present a toolbox \texttt{peRISCope} for easy
+benchmarking of different model configurations. We also present and analyze
+results from benchmarking of solving time for models with different code
+and non-code memory granularity configurations. We found that models with
+smaller memory granularity were solved faster in models generated from
+binaries compiled with \texttt{starc} compiler. On models generated from
+binaries compiled with \texttt{gcc} compiler, models with larger memory
+granularity were solved faster.
 
 ## Automated Testing of Atomic Instructions (`LR`/`SC`) Implementations in Selfie by Luis Thiele, University of Salzburg, Austria, 2022 ([PDF](https://github.com/cksystemsteaching/selfie/blob/main/theses/bachelor_thesis_thiele.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_thiele))
 
@@ -9,16 +47,13 @@ This thesis is about extending the autograder to grade an assignment called `tre
 
 Additionally, this thesis also describes the correct implementations of the `LR` and `SC` instructions in the emulator and a correct implementation of the Treiber stack in order to pass the entirety of the autograded `treiber-stack` assignment. All prerequesite requirements are also explained, namely a correct implementation of support for threads (`threads` assignment), as well as any concepts and details about the emulator one must understand such as paging or system calls.
 
-
 ## Visualizing BTOR2 Models by Markus Diller, University of Salzburg, Austria, 2022 ([PDF](https://github.com/cksystemsteaching/selfie/blob/main/theses/bachelor_thesis_diller.pdf), [Repository](https://github.com/cksystemsgroup/beator-visualizer))
 
 The process of bounded model checking is very helpful in finding edge cases for invalid program inputs. To be able to do this, a format representing the model is necessary. Unfortunately, the model file itself does not provide any information about metrics or quality of a model. Its only purpose is to serve as an input to a model checker. In this thesis a tool called `beatle` for a subset of the BTOR2 modeling language is described. It helps to alleviate the aforementioned limitations of a standalone model file by visually displaying relevant information in the browser.
 
-
 ## RISC-U Binary Optimization for Selfie by David Pape, University of Salzburg, Austria, 2021 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_pape/bachelor_thesis_pape.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_pape))
-
-Optimizers are part of any modern compiler, and their practical importance in writing performant code cannot be understated. Still, they introduce major complexity, and its complement: bugs. In this work, we present a binary optimizer for RISC-U, a small subset of RISC-V and the target architecture of selfies[@selfie] educational compiler. To address the complexity issue, our optimizer is structured as a standalone binary, keeping the compiler simple. Since this comes at the cost of compile-time information, a binary analyzer is a prerequisite. With it, our optimizer is able to remove redundant instructions and apply several peephole optimizations, leading to a roughly five percent speedup.
 
+Optimizers are part of any modern compiler, and their practical importance in writing performant code cannot be understated. Still, they introduce major complexity, and its complement: bugs. In this work, we present a binary optimizer for RISC-U, a small subset of RISC-V and the target architecture of selfies[@selfie] educational compiler. To address the complexity issue, our optimizer is structured as a standalone binary, keeping the compiler simple. Since this comes at the cost of compile-time information, a binary analyzer is a prerequisite. With it, our optimizer is able to remove redundant instructions and apply several peephole optimizations, leading to a roughly five percent speedup.
 
 ## Linear-Time Static Analysis of RISC-V Binary Code by Thomas Wulz, University of Salzburg, Austria, 2021 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_wulz/bachelor_thesis_wulz.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_wulz))
 
@@ -27,7 +62,7 @@ We present algorithms for static analysis of such RISC-V binaries along with a C
 
 ## RISC-V Bare-Metal Library Operating System for Selfie by Marcell Haritopoulos, University of Salzburg, Austria, 2021 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_haritopoulos/bachelor_thesis_haritopoulos.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_haritopoulos))
 
-Despite its young age, the field of Computer Science is multifaceted and complex, at the expense of the entry barrier. Selfie, an educational project, aims to close the gap on the fundamentals. It runs on a hosted environment, but was not able to run on bare metal RISC-V hardware. We introduce a simple library operating system for RISC-V that implements the system call interface of Selfie and supports (nearly) all applications, that are written in Selfie's tiny C subset C* and are using its syscall interface, without any modifications, including Selfie itself. The OS is linked to the application as a static library and is responsible for maintaining a C environment. System calls are performed as mere function invocations instead of raising an exception using ecall. Files are stored in a static read-only file system. I/O system calls are implemented in a general way so that related projects may share the same semantics.
+Despite its young age, the field of Computer Science is multifaceted and complex, at the expense of the entry barrier. Selfie, an educational project, aims to close the gap on the fundamentals. It runs on a hosted environment, but was not able to run on bare metal RISC-V hardware. We introduce a simple library operating system for RISC-V that implements the system call interface of Selfie and supports (nearly) all applications, that are written in Selfie's tiny C subset C\* and are using its syscall interface, without any modifications, including Selfie itself. The OS is linked to the application as a static library and is responsible for maintaining a C environment. System calls are performed as mere function invocations instead of raising an exception using ecall. Files are stored in a static read-only file system. I/O system calls are implemented in a general way so that related projects may share the same semantics.
 
 ## RISC-V S-Mode-Hosted Bare-Metal Selfie by Martin Fischer, University of Salzburg, Austria, 2020 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_fischer/bachelor_thesis_fischer.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_fischer))
 
@@ -43,7 +78,7 @@ Given a 64-bit RISC-V binary we compute a single logical formula that is satisfi
 
 ## Implementation and Application of a Parser for Boolector's Witness Format by Christoph Siller, University of Salzburg, Austria, 2020 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_siller/bachelor_thesis_siller.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_siller))
 
-In this bachelor thesis, a parser for Boolector’s witness format is introduced, which enables automated input validation. The toolchain takes a C* file containing read calls as input and generates a Btor2 model of it, using Selfie’s model generator. The resulting model is then fed to the Boolector-based bounded model checker BtorMC. By using SMT-solving, BtorMC is able to find error states in the model within a maximum number of executed instructions. In case an error state is found, BtorMC generates a witness. This witness is parsed in order to extract the input string that causes the C* program to run into the error. To validate that the extracted string indeed triggers an error, it is fed to the C* program while running it on Mipster. If the input string was generated correctly, Mipster runs into the predicted error. In addition to the parser, this thesis provides background information on all technologies used and the concepts they are based on.
+In this bachelor thesis, a parser for Boolector’s witness format is introduced, which enables automated input validation. The toolchain takes a C* file containing read calls as input and generates a Btor2 model of it, using Selfie’s model generator. The resulting model is then fed to the Boolector-based bounded model checker BtorMC. By using SMT-solving, BtorMC is able to find error states in the model within a maximum number of executed instructions. In case an error state is found, BtorMC generates a witness. This witness is parsed in order to extract the input string that causes the C* program to run into the error. To validate that the extracted string indeed triggers an error, it is fed to the C\* program while running it on Mipster. If the input string was generated correctly, Mipster runs into the predicted error. In addition to the parser, this thesis provides background information on all technologies used and the concepts they are based on.
 
 ## Selfie - RISC-V to x86-64 Binary Translation by Alexander Kollert, University of Salzburg, Austria, 2020 ([PDF](https://github.com/cksystemsteaching/selfie/releases/download/bachelor_thesis_kollert/bachelor_thesis_kollert.pdf), [Release](https://github.com/cksystemsteaching/selfie/releases/tag/bachelor_thesis_kollert))
 

diff --git a/theses/bachelor_thesis_fejzic.pdf b/theses/bachelor_thesis_fejzic.pdf
diff --git a/tools/periscope/README.md b/tools/periscope/README.md
@@ -0,0 +1,17 @@
+# peRISCope
+
+This is a project for exploring the reasoning performance of RISC-V software
+models in BTOR2. The name `periscope` is derived from the idea of exploring
+PErformance of RISC models, and the idea of a periscope that can be used to
+explore the unknown surroundings.
+
+## Structure
+
+The project consists of two sub-projects:
+
+- [`periscope-rs`](./periscope-rs) - A binary written in Rust that can be used
+  to parse witness format producec by `btormc`, and for benchmarking of model
+  checking. Check the [README](./periscope-rs/README.md) for more information.
+- [`periscope-py`](./periscope-py) - A Python package that can be used to read
+  output of `periscope-rs` and produce plots. Check the
+  [README](./periscope-py/README.md) for more information.
diff --git a/tools/periscope/periscope-py/.gitignore b/tools/periscope/periscope-py/.gitignore
@@ -0,0 +1,3 @@
+.venv/
+
+src/__pycache__/
diff --git a/tools/periscope/periscope-py/README.md b/tools/periscope/periscope-py/README.md
@@ -0,0 +1,27 @@
+# PeRISCope - Python script for results visualisation
+
+This is a simple Python script for generating plots of benchmarking results
+produced by [`periscope-rs`](../periscope-rs).
+
+## Usage
+
+Make sure you installed all required dependencies:
+
+```sh
+pip install -r requirements.txt
+```
+
+After that, run the main script file with `--help` flag to get more information: 
+
+```sh
+python src/periscope.py --help
+```
+
+## Example usage: 
+
+```sh
+# Generate a plot with bars for each configuration, overall sort-by median time,
+# and scale the y-axis logarithmically. Plot results contained in `results_dir`
+# and save the plot to test.png file.
+python src/periscope.py --sort-by median --scale log --type cmp-bars -o test.png results_dir
+```
diff --git a/tools/periscope/periscope-py/pyrightconfig.json b/tools/periscope/periscope-py/pyrightconfig.json
@@ -0,0 +1,27 @@
+{
+  "include": [
+    "src",
+  ],
+
+  "exclude": [
+    "**/node_modules",
+    "**/__pycache__",
+    ".venv",
+  ],
+
+  "venvPath": ".",
+  "venv": ".venv",
+
+  "ignore": [
+    "src/oldstuff"
+  ],
+
+  "reportMissingImports": true,
+  "reportMissingTypeStubs": false,
+
+  "executionEnvironments": [
+    {
+      "root": "src"
+    }
+  ]
+}
diff --git a/tools/periscope/periscope-py/requirements.txt b/tools/periscope/periscope-py/requirements.txt
@@ -0,0 +1,2 @@
+jsonpickle==3.0.4
+matplotlib==3.8.4
diff --git a/tools/periscope/periscope-py/src/cmp_bars.py b/tools/periscope/periscope-py/src/cmp_bars.py
@@ -0,0 +1,134 @@
+import argparse
+import functools
+
+from matplotlib import figure
+import matplotlib.pyplot as plt
+import numpy as np
+
+from periscope_result import BenchResult
+
+
+def cmp_median(medians: list[float], labels: list[str]):
+    def compare(i, j):
+        med_cmp = medians[i] - medians[j]
+        if med_cmp == 0:
+            return labels[i] < labels[j]
+        else:
+            return med_cmp
+
+    return compare
+
+
+def cmp_median_multi():
+    def compare(
+        i: tuple[str, dict[str, BenchResult]], j: tuple[str, dict[str, BenchResult]]
+    ):
+        med_i_avg = sum([b.hyperfine_results()[0].median for b in i[1].values()]) / len(
+            i
+        )
+        med_j_avg = sum([b.hyperfine_results()[0].median for b in j[1].values()]) / len(
+            i
+        )
+
+        med_cmp = med_i_avg - med_j_avg
+        return int(med_cmp)
+
+    return compare
+
+
+def cmp_legend():
+    def compare(a: str, b: str):
+        all_numbers_a = list(
+            filter(lambda x: len(x) > 0 and x[0].isdigit(), a.split("-"))
+        )
+        all_numbers_b = list(
+            filter(lambda x: len(x) > 0 and x[0].isdigit(), b.split("-"))
+        )
+
+        for num_a, num_b in zip(all_numbers_a, all_numbers_b):
+            num_a = int(num_a.removesuffix("b"))
+            num_b = int(num_b.removesuffix("b"))
+
+            if num_a != num_b:
+                return num_a - num_b
+
+        return 0
+
+    return compare
+
+
+def plot_cmp_bars(
+    args: argparse.Namespace,
+    periscope_results: dict[str, list[BenchResult]],
+    figure: figure.Figure,
+):
+    legend = sorted(
+        list(periscope_results.keys()),
+        key=functools.cmp_to_key(cmp_legend()),
+        # key=lambda x: int(x.split("-")[0].removesuffix("b")),
+    )
+
+    bar_w = (1 - 0.4) / len(legend)
+    group_w = bar_w * len(legend)
+    offsets = np.arange(start=-(group_w / 2), stop=group_w / 2, step=bar_w)
+    cmap = plt.get_cmap("rainbow")
+    colors = [cmap(val / len(legend)) for val in range(len(legend))]
+
+    ax = figure.subplots(1, 1)
+    _, ax2 = plt.subplots(figsize=(20, 10), constrained_layout=True)
+
+    handles = []
+
+    per_file: dict[str, dict[str, BenchResult]] = {}
+
+    for bench_config in legend:
+        for bench_name in periscope_results[bench_config]:
+            b_name = (
+                bench_name.name.removesuffix("-rotorized.btor2")
+                .removesuffix("-35")
+                .removesuffix("-10")
+                .removesuffix("-1")
+                .removesuffix("-2")
+                .removesuffix("-3")
+            )
+            if b_name not in per_file:
+                per_file[b_name] = {}
+
+            per_file[b_name][bench_config] = bench_name
+
+    plt.title("Comparison of median times with different model configurations")
+
+    if args.sort_by == "median":
+        per_file = dict(
+            sorted(per_file.items(), key=functools.cmp_to_key(cmp_median_multi()))
+        )
+
+    if args.scale == "log":
+        ax2.set_yscale("log")
+        plt.title(
+            "Comparison of median times with different model configurations (Log scale)",
+        )
+
+    plt.ylabel("Time [s]")
+
+    labels = list(per_file.keys())
+    label_positions = np.arange(len(labels)) - 0.1
+    plt.tick_params(axis="x", length=20)
+    plt.xticks(list(label_positions), labels, rotation=55, ha="right")
+
+    # label_positions = list()
+
+    for idx, file in enumerate(per_file.values()):
+        for bench_idx, bench_name in enumerate(file.values()):
+            offs = offsets[bench_idx]
+            time = bench_name.hyperfine_results()[0].median
+            color = colors[bench_idx]
+            ax = plt.bar(x=idx + offs, height=time, width=bar_w, color=color)
+            handles.append(ax)
+            # label_positions.append(idx + offs)
+
+    # plt.xticks(label_positions, labels, rotation=0)
+
+    plt.ylim(0.1, 300)
+    legend = list(map(lambda x: x.removesuffix(".json").replace("-", " "), legend))
+    plt.legend(handles=handles, labels=legend)
diff --git a/tools/periscope/periscope-py/src/histogram.py b/tools/periscope/periscope-py/src/histogram.py
@@ -0,0 +1,62 @@
+import argparse
+
+from matplotlib import figure, pyplot as plt
+import numpy as np
+
+from periscope_result import BenchResult
+
+
+def cmp_wc(labels: list[str], periscope_results: list[BenchResult]):
+    def compare(i, j):
+        wc_i = periscope_results[i].wc
+        wc_j = periscope_results[j].wc
+        if wc_i == wc_j:
+            return labels[i] < labels[j]
+        else:
+            return wc_i - wc_j
+
+    return compare
+
+
+def plot_histogram(
+    args: argparse.Namespace,
+    periscope_results: list[BenchResult],
+    figure: figure.Figure,
+    of_dump: bool,
+):
+    labels = [b.name for b in periscope_results]
+    all_times = [b.times for pr in periscope_results for b in pr.hyperfine_results()]
+
+    t_min = np.min(list(map(np.min, all_times)))
+    t_max = np.max(list(map(np.max, all_times)))
+
+    if of_dump:
+        wcs = [pr.results.wc_btormc_dump for pr in periscope_results]
+    else:
+        wcs = [pr.results.wc_raw for pr in periscope_results]
+
+    times = list(map(np.min, all_times))
+
+    _ = figure.subplots(1, 1)
+    cmap = plt.get_cmap("rainbow")
+    colors = [cmap(val / len(times)) for val in range(len(times))]
+    plt.bar(x=times, height=wcs, label=labels, color=colors, width=5)
+
+    plt.title("Solving time and model size")
+    plt.xlabel("Time [s]")
+
+    if of_dump:
+        plt.ylabel("Model Size after btormc dump [#chars]")
+    else:
+        plt.ylabel("Model Size [#chars]")
+
+    plt.xticks(np.arange(np.floor(t_min), np.floor(t_max), step=10))
+    plt.xticks(
+        list(map(np.min, all_times)),
+        labels,
+        rotation=65,
+        minor=True,
+    )
+
+    for xtick, color in zip(plt.gca().get_xticklabels(which="minor"), colors):
+        xtick.set_color(color)