Skip to content

Commit

Permalink
discuss angelic nondeterminism
Browse files Browse the repository at this point in the history
  • Loading branch information
breandan committed Dec 11, 2024
1 parent 4384259 commit 298244d
Show file tree
Hide file tree
Showing 3 changed files with 47 additions and 2 deletions.
36 changes: 36 additions & 0 deletions latex/bib/acmart.bib
Original file line number Diff line number Diff line change
Expand Up @@ -2108,6 +2108,17 @@ @article{eppstein2014k
year={2014}
}

@article{shi2019frangel,
title={Fr{A}ngel: component-based synthesis with control structures},
author={Shi, Kensen and Steinhardt, Jacob and Liang, Percy},
journal={Proceedings of the ACM on Programming Languages},
volume={3},
number={POPL},
pages={1--29},
year={2019},
publisher={ACM New York, NY, USA}
}

@inproceedings{shi2020incremental,
title={Incremental sampling without replacement for sequence models},
author={Shi, Kensen and Bieber, David and Sutton, Charles},
Expand Down Expand Up @@ -2277,4 +2288,29 @@ @article{bryant2023grammatical
pages={643--701},
year={2023},
publisher={MIT Press One Broadway, 12th Floor, Cambridge, Massachusetts 02142, USA~…}
}

@inproceedings{bodik2010programming,
title={Programming with angelic nondeterminism},
author={Bod\'ik, Rastislav and Chandra, Satish and Galenson, Joel and Kimelman, Doug and Tung, Nicholas and Barman, Shaon and Rodarmor, Casey},
booktitle={Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages},
pages={339--352},
year={2010}
}

@inproceedings{singh2013automated,
title={Automated feedback generation for introductory programming assignments},
author={Singh, Rishabh and Gulwani, Sumit and Solar-Lezama, Armando},
booktitle={Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation},
pages={15--26},
year={2013}
}

@inproceedings{mcpeak2004elkhound,
title={Elkhound: A fast, practical {GLR} parser generator},
author={McPeak, Scott and Necula, George C},
booktitle={International Conference on Compiler Construction},
pages={73--88},
year={2004},
organization={Springer}
}
Binary file modified latex/thesis/Thesis.pdf
Binary file not shown.
13 changes: 11 additions & 2 deletions latex/thesis/content/Ch0_Literature_Review.tex
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
\chapter{\rm\bfseries Related Literature}
\label{ch:litreview}

Translating ideas into computer programs demands a high degree of precision, as computers have strict criteria for admitting valid programs. These constraints act as a failsafe against faulty programs and runtime errors, but can be tedious to debug. During the editing process, these constraints are invariably violated by the hasty or inexperienced programmer, requiring manual repair. To assist with this task, automated program repair (APR) attempts to generate possible revisions from which the author may choose. This subject has been closely investigated by programming language research and treated in a number of existing literature reviews~\cite{monperrus2018living, le2021automatic}. We direct our attention primarily towards syntax repair, which attempts to fix parsing errors, the earliest stage in program analysis.
Translating ideas into computer programs demands a high degree of precision, as computers have strict criteria for admitting valid programs. These constraints act as a failsafe against faulty programs and runtime errors, but can be tedious to debug. During the editing process, constraints are invariably violated by the hasty or inexperienced programmer, requiring manual repair. To assist with this task, automated program repair (APR) attempts to generate possible revisions from which the author may choose. This subject has been closely investigated by programming language research and treated in a number of existing literature reviews~\cite{monperrus2018living, le2021automatic}. We direct our attention primarily towards syntax repair, which attempts to fix parsing errors, the earliest stage in program analysis.

\section{Syntax Repair}
\section{Syntactic program repair}

Spellchecking is an early precursor to syntax repair that was originally developed for word processing and seeks to find, among a finite dictionary, the most likely intended revision of a misspelled word~\cite{kernighan1990spelling}. Similarly, syntax repair considers the case where this dictionary is not necessarily finite, but rather generated by a grammar representing a potentially infinite collection of words called a \textit{language}. This has applications in natural language processing~\cite{bryant2023grammatical}, although we are primarily interested in programming languages. In the case of programming language syntax, the language and corresponding grammar is typically context-free~\cite{chomsky1959algebraic}.

Expand All @@ -18,4 +18,13 @@ \section{Syntax Repair}
Neural language models are adept at learning statistical patterns, but often sacrifice validity, precision or latency. Existing neural repair models are prone to misgeneralize and hallucinate syntactically invalid repairs and do not attempt to sample from the space of all and only valid repairs. As a consequence, they have difficulty with inference scaling, where additional test time compute does not translate to improved precision on the target domain. Furthermore, the generated samples may not even be syntactically valid, as we observe in practice.

Our work aims to address all of these concerns. We try to generate every nearby valid program and prioritize the solutions by naturalness, while ensuring response time is tolerable. In other words, we attempt to satisfy soundness, completeness, naturalness and latency simultaneously.

\section{Semantic program repair}

Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation, based on syntactic validity. In practice, we might layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. This additional criteria lets us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs.

Semantically valid program representations are typically framed as a subset of the syntactically valid ones. In some cases, the syntax of a programming language is not even context-free, in which case syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the symbol table which is used for parsing. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building tooling for practical programming languages.

Methods such as angelic program repair have been developed to synthesize whole programs from a library of existing components. Shi et al.'s FrAngel~\cite{shi2019frangel} is one such example, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. The idea of angelic execution can be retraced to Bod\'ik et al.~\cite{bodik2010programming} who attributes the original idea to Floyd's nondeterministic \texttt{choice} operator. In the context of semantic program repair, angelic execution has been successfully developed for program synthesis by Singh et al.~\cite{singh2013automated} for autograding and providing automated program feedback in introductory programming assignments.

\clearpage

0 comments on commit 298244d

Please sign in to comment.