Commit
tidy up some verbiage
breandan committed Dec 12, 2024
1 parent 1820ea3 commit 0220d53
Showing 6 changed files with 30 additions and 18 deletions.
11 changes: 11 additions & 0 deletions latex/bib/acmart.bib
@@ -2329,4 +2329,15 @@ @inproceedings{long2016automatic
booktitle={Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT symposium on principles of programming languages},
pages={298--312},
year={2016}
}

@article{floyd1967nondeterministic,
title={Nondeterministic algorithms},
author={Floyd, Robert W},
journal={Journal of the ACM (JACM)},
volume={14},
number={4},
pages={636--644},
year={1967},
publisher={ACM New York, NY, USA}
}
Binary file modified latex/thesis/Thesis.pdf
Binary file not shown.
8 changes: 4 additions & 4 deletions latex/thesis/content/Ch0_Literature_Review.tex
@@ -21,12 +21,12 @@ \section{Syntactic program repair}

\section{Semantic program repair}

Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation, based on syntactic validity. In practice, we might layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. These additional criteria let us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs.
Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation, based on syntactic validity. In practice, one might wish to layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. These additional criteria let us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs.

Semantically valid program representations are typically framed as a subset of the syntactically valid ones. In some cases, the syntax of a programming language is not even context-free, in which case syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the symbol table which is used for parsing. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building tooling for practical programming languages.
Semantically valid program representations are typically framed as a subset of the syntactically valid ones. In some cases, the syntax of a programming language is not even context-free, in which case syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the symbol table used for parsing. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building practical developer tools.

Methods such as angelic program repair have been developed to synthesize whole programs from a library of existing components. Shi et al.'s FrAngel~\cite{shi2019frangel} is one such example, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. The idea of angelic execution can be traced back to Bod\'ik et al.~\cite{bodik2010programming}, who attribute the original idea to Floyd's nondeterministic \texttt{choice} operator. In the context of semantic program repair, angelic execution has been successfully applied by Singh et al.~\cite{singh2013automated} to auto-grade and provide feedback on programming assignments.
One approach to handling more complex synthesis tasks uses \textit{angelic execution} to generate and optimistically evaluate execution paths. Shi et al.'s FrAngel~\cite{shi2019frangel} is a particular example of this approach, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. The idea of angelic execution can be traced back to Bod\'ik et al.~\cite{bodik2010programming}, who attribute the original idea to Floyd's nondeterministic \texttt{choice} operator~\cite{floyd1967nondeterministic}. In the context of semantic program repair, angelic execution has been successfully applied by Singh et al.~\cite{singh2013automated} to auto-grade and provide feedback on programming assignments.
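Floyd's nondeterministic \texttt{choice} operator can be emulated by exhaustive backtracking. The sketch below is purely illustrative: the name \texttt{amb} and the toy constraint are our own, not drawn from FrAngel or any system cited above. "Angelic" semantics simply means a choice is returned as if an oracle had guessed one that makes the rest of the computation succeed.

```python
# Minimal sketch of Floyd-style nondeterministic choice ("angelic" amb),
# emulated by exhaustive backtracking. The names and the toy constraint
# below are illustrative assumptions, not taken from any cited system.
def amb(options, succeeds):
    """Return some option under which the continuation succeeds, as if
    an angelic oracle had guessed it; fail only if every branch fails."""
    for option in options:
        if succeeds(option):
            return option
    raise ValueError("no angelic choice succeeds")

# Angelically pick (x, y) so that the constraint x * (y + 1) == 10 holds.
pair = amb([(x, y) for x in range(5) for y in range(5)],
           lambda p: p[0] * (p[1] + 1) == 10)
print(pair)  # (2, 4): the first assignment satisfying the constraint
```

In a real angelic-execution system, the predicate would be a test suite or specification rather than an arithmetic constraint, but the control structure is the same.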

The idea of angelic execution has also been employed to great effect to assist with automated program repair. In particular, Long \& Rinard~\cite{long2016automatic} introduce a tool called Prophet and use a very similar evaluation to the one we propose to generate and ranks candidate patches from a search space, then rank the generated patches according to a learned probabilistic model. Chandra et al.~\cite{chandra2011angelic} also use angelic execution to guide the search for semantic program repairs. Both systems bypass the syntactic checking stage and search directly for semantic repairs, using a set of test cases. They do not use an explicit search space, but run the tests to reject invalid candidates. Their approach is closely related to fault localization and mutation testing in the software engineering literature.
The idea of angelic execution has also been employed to great effect to assist with automated program repair. In particular, Long \& Rinard~\cite{long2016automatic} introduce a tool called Prophet, which uses an evaluation very similar to ours to generate candidate patches from a search space, then rank them according to a learned probabilistic model. Chandra et al.~\cite{chandra2011angelic} also use angelic execution to guide the search for semantic repairs. Both systems bypass the syntactic checking stage and search directly for semantic repairs, using a set of test cases as guard conditions to reject invalid candidates. Their approach is closely related to fault localization and mutation testing in the software engineering literature.

\clearpage
17 changes: 9 additions & 8 deletions latex/thesis/content/Ch1_Introduction.tex
@@ -5,7 +5,7 @@ \chapter{\rm\bfseries Introduction}

%\mcgillguidelines Clearly state the rationale and objectives of the research.

Computer programs are instructions for performing chores that we would rather avoid doing ourselves. In order to persuade the computer to do them for us, we must communicate our intention in a way that is plain and unambiguous. Programming languages are protocols for this dialogue, designed to enable programmers to conveniently express their intent and facilitate the exchange of information between programmers and computers.
Computer programs are instructions for performing chores that we would rather avoid doing ourselves. In order to persuade the computer to do them for us, we must communicate our intention in a way that is plain and unambiguous. Programming languages are protocols for this dialogue, designed to enable programmers to conveniently express their intent and facilitate the exchange of information between humans and computers.

%By observing programmers, one can learn a great deal about how this dialogue unfolds.
Programs are seldom written from left-to-right in one fell swoop. During the course of writing a program, the programmer often revisits and revises code as they write, sharing additional information and receiving feedback. Often, during this process, the programmer makes a mistake, causing the program to behave in an unexpected manner. These mistakes can manifest as a simple typographic or syntactic error, or a more subtle logical error.
@@ -22,7 +22,7 @@ \chapter{\rm\bfseries Introduction}

Program repair is a highly underdetermined problem, meaning that the validity constraints do not uniquely determine a solution. A proper theory of program repair must be able to resolve this ambiguity to infer the user's intent from an incomplete specification, and incrementally refine its guess as more information becomes available from the user.

The calculus we propose has a number of desirable properties. It is highly compositional, meaning that users can manipulate constraints on programs while retaining the algebraic closure properties, such as union, intersection, and differentiation. It is well-suited for probabilistic reasoning, meaning we can use any probabilistic model of language to guide the repair process. It is also amenable to incremental repair, meaning that we can repair programs in a streaming fashion, while the user is typing.
The calculus we propose has a number of desirable properties. It is highly compositional, meaning that users can manipulate constraints on programs while retaining the algebraic closure properties, such as union, intersection, and differentiation. It is well-suited for probabilistic reasoning, meaning we can use any autoregressive language model to guide the repair process. It is also amenable to incremental repair, meaning that we can repair programs in a streaming fashion, while the user is typing.

To explain the virtues of our approach, we need some background. Formal languages are not always closed under set-theoretic operations, e.g., CFL $\cap$ CFL is not CFL in general. Let $\cdot$ denote concatenation, $*$ be Kleene star, and $\complement$ be complementation:\\

@@ -33,14 +33,13 @@
Finite$^1$ & \cmark & \cmark & \cmark & \cmark & \cmark \\
Regular$^1$ & \cmark & \cmark & \cmark & \cmark & \cmark \\
\rowcolor{slightgray} Context-free$^{1, 2}$ & \cmark & \xmark$^\dagger$ & \cmark & \cmark & \xmark \\
Conjunctive$^{1,2}$ & \cmark & \cmark & \cmark & \cmark & ? \\
Context-sensitive$^2$ & \cmark & \cmark & \cmark & + & \cmark \\
Recursively Enumerable$^2$ & \cmark & \cmark & \cmark & \cmark & \xmark \\
\end{tabular}
\end{center}
\end{table}
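The non-closure of context-free languages under intersection follows from a classic counterexample, restated here for concreteness:

```latex
% Both L_1 and L_2 are context-free:
L_1 = \{\, a^m b^n c^n \mid m, n \geq 0 \,\}, \qquad
L_2 = \{\, a^n b^n c^m \mid m, n \geq 0 \,\},
% yet their intersection,
L_1 \cap L_2 = \{\, a^n b^n c^n \mid n \geq 0 \,\},
% is not context-free, as the pumping lemma for CFLs shows.
```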

We would like a language family that is (1) tractable, i.e., has polynomial recognition and search complexity and (2) reasonably expressive, i.e., can represent syntactic properties of real-world programming languages.\vspace{0.2cm}
For a background theory of program repair, we would like a language family that is (1) tractable, i.e., has polynomial recognition and search complexity and (2) reasonably expressive, i.e., can represent syntactic and some semantic properties of real-world programming languages.\vspace{0.2cm}

$^\dagger$ However, CFLs are closed under intersection with regular languages.

@@ -98,17 +97,19 @@ \section{Example}\label{sec:example}

With some semantic constraints, we could easily narrow the results, but even in their absence, one can probably rule out (2, 3, 6), given that \texttt{5[} and \texttt{2(} are rare bigrams in Python, and, knowing that \texttt{df.iloc} is often followed by \texttt{[}, determine that (5) is the most likely repair. This is the key insight behind our approach: we can usually locate the intended fix by exhaustively searching small repairs. As the set of small repairs is itself often small, if only we had some procedure to distinguish valid from invalid patches, the resulting solutions could simply be ranked by likelihood.
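To make the ranking step concrete, here is a toy sketch of bigram-based scoring over lexed repair candidates. The log-probabilities are invented for illustration, not estimated from any real Python corpus, and the two token sequences are stand-ins for repairs (2) and (5):

```python
# Toy bigram scoring of lexed repair candidates. The log-probabilities
# below are invented for illustration, not estimated from a real corpus.
BIGRAM_LOGP = {
    ("NUM", "["): -9.0,   # "5[" is a rare bigram in Python
    ("NUM", "("): -8.5,   # "2(" is likewise rare
    ("iloc", "["): -0.5,  # "df.iloc" is very often followed by "["
    ("iloc", "("): -6.0,
}
UNSEEN = -12.0  # back-off score for bigrams not in the table

def score(tokens):
    """Sum of bigram log-probabilities; higher means more plausible."""
    return sum(BIGRAM_LOGP.get(b, UNSEEN) for b in zip(tokens, tokens[1:]))

# Stand-ins for two of the candidate repairs discussed above.
candidates = {
    "repair (2)": ["NUM", "["],
    "repair (5)": ["iloc", "["],
}
best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # repair (5)
```

Any model that assigns likelihoods to token sequences could be substituted for the bigram table without changing the surrounding pipeline.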

The trouble is that any such procedure must be highly efficient. We cannot afford to sample the universe of possible $d$-token edits, then reject invalid samples -- assuming it takes just 10ms to generate and check each sample, (1-6) could take 24+ hours to find. The hardness of brute-force search grows exponentially with edit distance, sentence length and alphabet size. We will need a more efficient procedure for sampling all and only small valid repairs.
The trouble is that any such procedure must be highly efficient to be practically useful for developers. We cannot afford to sample the universe of possible $d$-token edits, then reject invalid samples -- assuming it takes just 10ms to generate and check each sample, (1-6) could take 24+ hours to find. The hardness of brute-force search grows exponentially with edit distance, sentence length and alphabet size. We will need a more efficient procedure for sampling all and only small valid repairs.
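A back-of-the-envelope bound conveys why sample-and-reject is hopeless. The function below is our own loose overapproximation, with invented parameters: choose up to d edit sites in an n-token string, then choose replacements from a vocabulary of v token types.

```python
from math import comb

def edit_universe_bound(n, d, v):
    """Loose upper bound on the number of strings reachable from an
    n-token string by at most d token-level edits over a vocabulary of
    v token types: choose k edit sites, then choose what goes there.
    A rough overapproximation, for illustration only."""
    return sum(comb(n + k, k) * (v + 1) ** k for k in range(d + 1))

# A modest 50-token fragment, 3 edits, and 100 token types already
# yield tens of billions of candidates -- far too many to check singly.
bound = edit_universe_bound(50, 3, 100)
print(f"{bound:.2e}")
```

At 10ms per generate-and-check cycle, exhausting such a space would take years, which is why the construction that follows never materializes it.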

We will now proceed to give an informal intuition behind our method, then formalize it in the following sections. At a high level, our approach is to construct a language that represents all syntactically valid patches within a certain edit distance of the invalid code fragment. To do so, we first lexicalize the invalid source code, which simply abstracts over numbers and named identifiers.
We will first proceed to give an informal intuition behind our method, then formalize it in the following sections. At a high level, our approach is to construct a language that represents all syntactically valid patches within a certain edit distance of the invalid code fragment. To do so, we first lexicalize the invalid source code, which simply abstracts over numbers and named identifiers, but retains all other keywords.

From the lexical string, we build an automaton that represents all possible strings within a certain edit distance. Then, we proceed to construct a synthetic grammar, recognizing all strings in the intersection of the programming language and the edit ball. Finally, this grammar is reduced to a normal form and decoded with the help of a statistical model to produce a list of suggested repairs.

\begin{figure}[h!]
\includegraphics[width=\textwidth]{content/figures/flow.pdf}\vspace{-1pt}
\caption{Simplified dataflow. Given a grammar and broken code fragment, we create an automaton generating the language of small edits, then construct a grammar representing the intersection of the two languages. This grammar can be converted to a finite automaton, determinized, then decoded to produce a list of repairs.}\label{fig:arch_simp}
\caption{Simplified dataflow. Given a grammar and broken code fragment, we create an automaton generating the language of small edits, then construct a grammar representing the intersection of the two languages. This grammar can be converted to a finite automaton, determinized, then decoded to produce an explicit list of repairs.}\label{fig:arch_simp}
\end{figure}
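For a toy illustration of what the intersection language contains, the following brute-force sketch enumerates every string over a two-symbol alphabet within Levenshtein distance 1 of the invalid fragment ()( and keeps those that are balanced. This rejection-based enumeration is exactly the approach our method avoids at scale; the helper functions are our own stand-ins for the Levenshtein automaton and the grammar.

```python
from itertools import product

def within_edit_distance(s, t, d):
    """Standard dynamic program for Levenshtein distance: t is accepted
    exactly when it lies inside the radius-d Levenshtein ball of s."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1] <= d

def balanced(w):
    """Membership in the Dyck language of balanced parentheses."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

# All balanced strings of length <= 4 within one edit of the broken "()(".
repairs = sorted("".join(w)
                 for k in range(5)
                 for w in product("()", repeat=k)
                 if within_edit_distance("()(", "".join(w), 1)
                 and balanced("".join(w)))
print(repairs)  # ['()', '()()']
```

The grammar-automaton intersection produces this same set of strings symbolically, without ever generating and rejecting the invalid neighbors.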

We will now proceed to give a more detailed background on the formal language theory, then proceed to give the full Bar-Hillel construction and our specialization to Levenshtein automata intersections.
The advantage of using this approach for syntax repair is that it performs the intersection not in the set domain, but in the domain of grammars and automata, thereby avoiding a great deal of useless work that would occur were we to compute the intersection via sampling and rejection.

To explain each component, we will first require a more detailed background on formal language theory, then we can describe the full Bar-Hillel construction and our specialization to Levenshtein automata intersections.

\clearpage
2 changes: 1 addition & 1 deletion latex/thesis/content/Ch2_Formal_Language_Theory.tex
@@ -74,7 +74,7 @@ \chapter{\rm\bfseries Formal Language Theory}

Similar to sets, it is possible to combine languages by manipulating their grammars, mirroring the setwise notions of union, intersection, complementation and difference over languages. These operations are convenient for combining, for example, syntactic and semantic constraints on programs. For instance, we might have two grammars, $G_a, G_b$ representing two properties that are desirable or necessary for a program to be considered valid. %We can treat valid programs $P$ as a subset of the language intersection $P \subseteq \mathcal{L}(G_a) \cap \mathcal{L}(G_b)$.

Like all representations, grammars are themselves a trade-off between expressiveness and efficiency. It is possible to represent the same finite set with multiple representations of varying complexity. For example, the set of strings containing ten or fewer balanced parentheses can be expressed as a finite automaton containing millions of states, or a simple conjunctive grammar containing a few productions, $\mathcal{L}\Big(S \rightarrow ( ) \mid (S) \mid S S \Big) \cap \Sigma^{[0,10]}$.
Like all representations, grammars are themselves a trade-off between expressiveness and efficiency. It is possible to represent the same finite set with multiple representations of varying complexity. For example, the set of strings containing ten or fewer balanced parentheses can be expressed as a finite automaton containing millions of states, or a simple language conjunction containing a few productions, e.g., $\mathcal{L}\Big(S \rightarrow ( ) \mid (S) \mid S S \Big) \cap \Sigma^{[0,10]}$.
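As a sanity check on the conjunction above, one can enumerate its finite language by brute force. The grammar $S \rightarrow () \mid (S) \mid SS$ generates exactly the nonempty balanced-parenthesis strings, so its conjunction with $\Sigma^{[0,10]}$ has $1 + 2 + 5 + 14 + 42 = 64$ members, the Catalan numbers $C_1$ through $C_5$; the enumeration code itself is our own illustration.

```python
from itertools import product

def balanced_nonempty(w):
    """Membership in L(S -> () | (S) | SS): nonempty balanced parens."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0 and len(w) > 0

# Enumerate the conjunction with Sigma^[0,10] by brute force.
members = ["".join(w)
           for k in range(0, 11)
           for w in product("()", repeat=k)
           if balanced_nonempty("".join(w))]
print(len(members))  # 64 = C_1 + C_2 + C_3 + C_4 + C_5
```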

The choice of representation is heavily usage-dependent. For example, if we are interested in recognition, we might favor a disjoint representation, allowing properties to be checked independently without merging, whereas if we are interested in generation or deciding non-emptiness, we might prefer a unified representation which can be efficiently sampled without rejection.

10 changes: 5 additions & 5 deletions latex/thesis/content/Ch4_Probabilistic_Repair.tex
@@ -44,16 +44,16 @@ \chapter{\rm\bfseries Probabilistic Program Repair}
\caption{Latency benchmarks. Note the varying axis ranges. The red line marks Seq2Parse and the orange line marks BIFI's Precision@1.}\label{fig:human}
\end{figure}

\noindent For Precision@k, we measure the precision of our model at top-k prediction out of all instances presented, regardless of outcome. Four outcomes are possible in each repair instance, each a strict subset of the successor.
\noindent For Precision@k, we measure the precision of our model at top-k prediction out of all instances presented, regardless of outcome. Four outcomes are possible in each repair instance, each a strict superset of the successor.

\begin{enumerate}
\item $\textsc{Rank}(\sigma') < K$: the top-K sorted results contain the true repair
\item $\textsc{Dec}(G_\cap) \rightsquigarrow \sigma'$: the true repair is sampled by the decoder
\item $\sigma' \in \mathcal{L}(G_\cap)$: the true repair is recognized by the intersection grammar
\item $|G_\cap| < \textsc{MaxHeap}:$ the intersection grammar fits in memory
\item $\sigma' \in \mathcal{L}(G_\cap)$: the true repair is recognized by the intersection grammar
\item $\textsc{Dec}(G_\cap) \rightsquigarrow \sigma'$: the true repair is sampled by the decoder
\item $\textsc{Rank}(\sigma') < K$: the top-K sorted results contain the true repair
\end{enumerate}

\noindent Repair cases that pass all four are the ideal, meaning the true repair was sampled and ranked highly, but (1) often fails. This indicates the decoder drew the true repair but was not discerning enough to realize its importance. Cases that fail (2) mean the decoder had the opportunity to, but did not actually draw the true repair, which occurs when the intersection language is too large to fully explore. In rare cases, the decoder was incapable of sampling the true repair, as the JVM ran out of memory. Below, we give a summary of distribution over outcomes across the test set.
\noindent Repair cases that pass all four are the ideal, meaning the true repair was sampled and ranked highly, but (4) often fails. This indicates the decoder drew the true repair but was not discerning enough to realize its importance. Cases that fail (3) mean the decoder had the opportunity to, but did not actually, draw the true repair, which occurs when the intersection language is too large to fully explore. In rare cases, the decoder was incapable of sampling the true repair, as the JVM ran out of memory. Below, we give a summary of the distribution over outcomes across the test set.

\begin{figure}[H]
\begin{center}
