describe experimental setup

breandan · Dec 4, 2024 · 6aec385
1 parent ed933f6
commit 6aec385

Showing 4 changed files with 10 additions and 5 deletions.
diff --git a/latex/thesis/Thesis.pdf b/latex/thesis/Thesis.pdf
diff --git a/latex/thesis/content/Ch3_Deterministic_Repair.tex b/latex/thesis/content/Ch3_Deterministic_Repair.tex
@@ -19,7 +19,7 @@ \chapter{\rm\bfseries Deterministic Program Repair}
 
 \section{Levenshtein Automata}
 
-Levenshtein automata are finite automata that recognize all and only strings within a given edit distance of a reference string by permitting insertions, deletions, and substitutions. For example, suppose we have a string \texttt{( ) )}, and wish to find nearby repairs. To represent the language of small edits, there is an automaton, called the Levenshtein automaton, recognizing every single string that can be formed by inserting, substituting or deleting a parenthesis. We depict this automaton in Figure~\ref{fig:lev_automaton}.
+Levenshtein automata are finite automata that recognize all and only strings within a given edit distance of a reference string by permitting insertions, deletions, and substitutions. For example, suppose we have a string \texttt{( ) )}, and wish to find nearby repairs. To represent the language of nearby edits, there is an automaton, called the Levenshtein automaton, recognizing every single string that can be formed by inserting, substituting or deleting a parenthesis. We depict this automaton in Figure~\ref{fig:lev_automaton}.
 
 \begin{figure}[h!]
   \input{content/figures/lev1_simp}

diff --git a/latex/thesis/content/Ch4_Probabilistic_Repair.tex b/latex/thesis/content/Ch4_Probabilistic_Repair.tex
@@ -9,12 +9,14 @@ \chapter{\rm\bfseries Probabilistic Program Repair}
 
 We will consider two kinds of probabilistic models: a constrained Markov model and an unconstrained transformer-based neural network trained on program repair, then evaluate the performance of these models on a syntax repair benchmark consisting of pairwise program transformations. As we will show, the constrained Markov model is able to achieve state-of-the-art precision on blind prediction of the lexical sequence.
 
+Here we give each model 5k+ syntax repairs of varying lengths and Levenshtein distances and measure the precision at varying cutoffs. For example, if the ground truth syntax repair was contained in the top 10 results for half of the repair instances, the model's P@10 would be 50\%.
+
 \begin{figure}[H]
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_tidy}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_bifi_all}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_s2p}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_bifi}}
-  \caption{Tidyparse, Seq2Parse and BIFI repair precision across length and edits.}
+  \caption{Total repair precision across the entire test set.}
 \end{figure}
 
 If we give it an equivalent number of samples, the constrained Markov model attains an even wider margin.
@@ -24,18 +26,20 @@ \chapter{\rm\bfseries Probabilistic Program Repair}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_bifi_all}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_tidy200}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/len_dist_tidy20k}}
-  \caption{Tidyparse, Seq2Parse and BIFI repair precision across length and edits.}
+  \caption{Sample efficiency increases sharply at larger precision intervals.}
 \end{figure}
 
-Now, we measure latency.
+Next, we measure latency: state-of-the-art precision is attained at about 10 seconds, and additional time yields higher precision.
 
 \begin{figure}[H]
+  \begin{center}
 %    \resizebox{.19\textwidth}{!}{\input{bar_hillel_repair.tex}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/bar_hillel_repair_1}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/bar_hillel_repair_2}}
   \resizebox{.24\textwidth}{!}{\input{../popl2025/bar_hillel_repair_3}}
 %    \resizebox{.24\textwidth}{!}{\input{bar_hillel_repair_5}}
 %\resizebox{.3\textwidth}{!}{\input{repair1_plot.tex}}
 %\resizebox{.307\textwidth}{!}{\input{repair2_plot.tex}}
-  \caption{Latency benchmarks. Note the varying axis ranges. The red line marks Seq2Parse and the orange line marks BIFI's Precision@1 on the same repairs.}\label{fig:human}
+  \end{center}
+  \caption{Latency benchmarks. Note the varying axis ranges. The red line marks Seq2Parse and the orange line marks BIFI's Precision@1.}\label{fig:human}
 \end{figure}
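
Note on the metric: the paragraph added to Ch4_Probabilistic_Repair.tex in this commit describes precision at a cutoff (P@k) only through an example. A minimal sketch of the definition it appears to assume is given here. The symbols $N$, $\sigma_i^*$ and $r_{i,j}$ are illustrative notation introduced for this note and do not come from the thesis.

```latex
% Assumed notation (hypothetical, not from the thesis):
%   N              number of repair instances in the test set
%   \sigma_i^*     ground-truth repair for instance i
%   r_{i,1}, r_{i,2}, \ldots   the model's ranked candidate repairs for instance i
\[
  \mathrm{P@}k \;=\; \frac{1}{N} \sum_{i=1}^{N}
    \mathbf{1}\!\left[\, \sigma_i^* \in \{ r_{i,1}, \ldots, r_{i,k} \} \,\right]
\]
% Example from the added paragraph: the ground truth appears in the
% top 10 results for half of the instances, so N/2 indicators equal 1.
\[
  \mathrm{P@}10 \;=\; \frac{N/2}{N} \;=\; 50\%
\]
```

Under this reading, P@k is non-decreasing in k, since enlarging the cutoff can only add instances whose ground truth falls within the top k.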
diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/GRE.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/GRE.kt
@@ -5,6 +5,7 @@ import ai.hypergraph.kaliningraph.tensor.UTMatrix
 import ai.hypergraph.kaliningraph.types.*
 
 // Generalized regular expression: https://planetmath.org/generalizedregularexpression
+// Parsing with derivatives: https://matt.might.net/papers/might2011derivatives.pdf
 sealed class GRE(vararg val args: GRE) {
   companion object { operator fun invoke(s: Σᐩ) = ONE(s) }