Commit 3922bac

implement alternative DP-based LBH algorithm

breandan committed Jan 13, 2025
1 parent 1a93f5c commit 3922bac
Showing 10 changed files with 204 additions and 118 deletions.
Binary file modified latex/thesis/Thesis.pdf
Binary file not shown.
4 changes: 2 additions & 2 deletions latex/thesis/content/Ch0_Literature_Review.tex
@@ -23,9 +23,9 @@ \section{Semantic program repair}

Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation based on syntactic validity. In practice, one might wish to layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. These additional criteria let us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs.

Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically implicates some degree of \textit{context-sensitivity}. A rough analogy follows: syntax is to semantics as parsers are to compilers. Compilers accept fewer programs than parsers -- like compilers, semantics is somehow more expressive than syntax -- beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, we gain expressive power. By increasing freedom, we lose expressive power.
Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically incorporates some degree of \textit{context-sensitivity}. A rough analogy holds between syntax and semantics as between parsing and compilation: just as compilers accept fewer programs than parsers, semantic validity is a stricter criterion than syntactic validity, one that lies beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, one gains expressive power. By increasing freedom, one loses expressive power.

This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the parser's symbol table. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building practical developer tools.
This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer hack, introducing type names into the parser's symbol table. Though the lexer hack is generally considered in poor taste from a language design perspective, handling these kinds of scenarios becomes important for building practical developer tools.

One approach to handling more complex synthesis tasks uses \textit{angelic execution} to generate and optimistically evaluate execution paths. Shi et al.'s FrAngel~\cite{shi2019frangel} is a particular example of this approach, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. The idea of angelic execution can be retraced to Bod\'ik et al.~\cite{bodik2010programming} who attribute the original idea to Floyd's nondeterministic \texttt{choice} operator~\cite{floyd1967nondeterministic}. In the context of semantic program repair, angelic execution has been successfully developed for program synthesis by Singh et al.~\cite{singh2013automated} for auto-grading and providing feedback on programming assignments.

78 changes: 67 additions & 11 deletions src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt
@@ -4,6 +4,7 @@ import ai.hypergraph.kaliningraph.*
import ai.hypergraph.kaliningraph.graphs.*
import ai.hypergraph.kaliningraph.parsing.*
import ai.hypergraph.kaliningraph.types.*
import kotlin.lazy
import kotlin.random.Random

typealias Arc = Π3A<Σᐩ>
@@ -18,10 +19,42 @@ fun STC.coords() = π2 to π3
open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<Σᐩ>) {
open val alphabet by lazy { Q.map { it.π2 }.toSet() }
val isNominalizable by lazy { alphabet.any { it.startsWith("[!=]") } }
val nominalForm: NOM by lazy { nominalize() }
val states by lazy { Q.states }
val stateLst by lazy { states.toList() }
val stateMap by lazy { states.withIndex().associate { it.value to it.index } }
val nominalForm: NOM by lazy { nominalize() } // Converts FSA to nominal form

val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } }
}
val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } }
}
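// e.g., an arc (q, σ, q') appears as (σ, q') in transit[q]
// and as (σ, q) in revtransit[q']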

val states: Set<Σᐩ> by lazy { Q.states }
// States, in a topological order (reverse postorder of a depth-first search
// from the initial states; assumes the transition graph is acyclic, as in a
// Levenshtein automaton -- a DFS preorder would not be a valid topological order)
val stateLst: List<Σᐩ> by lazy {
  val visited = mutableSetOf<Σᐩ>()
  val postOrder = mutableListOf<Σᐩ>()

  fun dfs(state: Σᐩ) {
    if (state !in visited) {
      visited.add(state)
      transit[state]?.forEach { (_, nextState) -> dfs(nextState) }
      postOrder.add(state)
    }
  }

  init.forEach { dfs(it) }
  postOrder.reversed()
}

// Seeds the DP base case: for every arc (q0 -σ-> q1) in the FSA and every unit
// production A -> σ in the CFG, emits the indexed triple (q0, A, q1)
fun allIndexedTxs(cfg: CFG): List<Π3A<Int>> =
  (cfg.unitProductions * nominalForm.flattenedTriples).filter { (_, σ: Σᐩ, arc) -> (arc.π2)(σ) }
    .map { (A, _, arc) -> Triple(stateMap[arc.π1]!!, cfg.ntMap[A]!!, stateMap[arc.π3]!!) }

val numStates: Int by lazy { states.size }

val stateMap: Map<Σᐩ, Int> by lazy { stateLst.withIndex().associate { it.value to it.index } }
// Maps every pair of states in the FSA to the shortest-path distance between them
val APSP: Map<Pair<Int, Int>, Int> by lazy {
graph.APSP.map { (k, v) ->
@@ -30,13 +63,14 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<
}.toMap()
}

val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } }
}
val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } }
val allPairs: Map<Pair<Int, Int>, Set<Int>> by lazy {
graph.allPairs.entries.associate { (a, b) ->
Pair(Pair(stateMap[a.first.label]!!, stateMap[a.second.label]!!), b.map { stateMap[it.label]!! }.toSet())
}
}

val finalIdxs by lazy { final.map { stateMap[it]!! } }

val stateCoords: Sequence<STC> by lazy { states.map { it.coords().let { (i, j) -> Triple(stateMap[it]!!, i, j) } }.asSequence() }
var height = 0
var width = 0
@@ -53,9 +87,9 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<

fun Π3A<STC>.isValidStateTriple(): Boolean =
first.coords().dominates(second.coords()) &&
second.coords().dominates(third.coords())
second.coords().dominates(third.coords())

val edgeLabels by lazy {
val edgeLabels: Map<Pair<Σᐩ, Σᐩ>, Σᐩ> by lazy {
Q.groupBy { (a, b, c) -> a to c }
.mapValues { (_, v) -> v.map { it.π2 }.toSet().joinToString(",") }
}
@@ -95,6 +129,28 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<
// }
// }

companion object {
  // Decides whether the Levenshtein ball of the given radius around str
  // intersects L(cfg), without materializing the Bar-Hillel intersection
  // grammar itself.
  fun levIntersect(str: Σᐩ, cfg: CFG, radius: Int): Boolean {
    val levFSA = makeLevFSA(str, radius)

    // dp[p][q][A] == true iff nonterminal A derives the labels along some
    // path from state p to state q (entries default to false)
    val dp = Array(levFSA.numStates) { Array(levFSA.numStates) { BooleanArray(cfg.nonterminals.size) } }

    // Base case: unit productions A -> σ matching individual arcs (p -σ-> q)
    levFSA.allIndexedTxs(cfg).forEach { (q0, nt, q1) -> dp[q0][q1][nt] = true }

    // Inductive case: w -> xz with x spanning (p, r) and z spanning (r, q).
    // Since stateLst is topologically ordered, p < r < q, so iterating p in
    // descending and q in ascending order guarantees both subproblems are
    // already solved when (p, q) is visited.
    for (p in levFSA.numStates - 2 downTo 0)
      for (q in p + 1 until levFSA.numStates)
        for ((w, x, z) in cfg.tripIntProd) // w -> xz
          if (!dp[p][q][w])
            for (r in levFSA.allPairs[p to q] ?: emptySet())
              if (dp[p][r][x] && dp[r][q][z]) {
                dp[p][q][w] = true
                break
              }

    // Accept iff START spans a path from the initial state (index 0) to some final state
    return levFSA.finalIdxs.any { f -> dp[0][f][cfg.bindex[START_SYMBOL]] }
  }
}

fun walk(from: Σᐩ, next: (Σᐩ, List<Σᐩ>) -> Int): List<Σᐩ> {
val startVtx = from
val path = mutableListOf<Σᐩ>()
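The companion function above is the heart of this commit: instead of materializing and post-processing the full Bar-Hillel intersection grammar, it runs a CYK-style dynamic program directly over pairs of Levenshtein-automaton states, seeding the table with unit productions on individual arcs and closing it under binary productions through the midpoints supplied by allPairs. The table fill is worst-case O(|Q|³·|P|) time and O(|Q|²·|V|) space, though allPairs typically narrows the midpoints to a small fraction of the state space. A minimal usage sketch, mirroring the test added below (the input string and radius here are illustrative):

// Hypothetical driver: decides whether any string within Levenshtein
// distance 2 of the broken snippet is generated by the grammar.
val broken = "NAME ( NAME NEWLINE"
val repairable = FSA.levIntersect(broken, vanillaS2PCFG, 2)
println(if (repairable) "repair exists within radius 2" else "no repair within radius 2")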
@@ -24,7 +24,7 @@ class NOM(override val Q: TSA, override val init: Set<Σᐩ>, override val final
.mapValues { (_, v) -> v.map { it.second to it.third } }
}

val flattenedTriples by lazy {
val flattenedTriples: Set<Triple<Σᐩ, StrPred, Σᐩ>> by lazy {
Q.map { (a, b, c) -> a to b.predicate() to c }.toSet()
}

@@ -48,10 +48,10 @@ fun CFG.intersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CFG {
val validTriples = fsa.validTriples.map { arrayOf(it.π11, it.π21, it.π31) }.toTypedArray()

val ct = (fsa.validPairs * nonterminals.indices.toSet()).toList()
// val ct1 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } }
// val ct1 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } }
// ct.filter { lengthBoundsCache[it.π3].overlaps(fsa.SPLP(it.π1, it.π2)) }
// .forEach { ct1[it.π1.π1][it.π3][it.π2.π1] = true }
val ct2 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } }
val ct2 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } }
ct.filter { fsa.obeys(it.π1, it.π2, it.π3, parikhMap) }
.forEach { ct2[it.π11][it.π3][it.π21] = true }

@@ -173,6 +173,7 @@ fun CFG.dropVestigialProductions(
return if (rw.size == size) rw else rw.dropVestigialProductions(criteria)
}


// Generic Bar-Hillel construction for arbitrary CFL ∩ REG language
infix fun FSA.intersect(cfg: CFG) = cfg.freeze().intersect(this)

@@ -191,9 +192,7 @@ infix fun CFG.intersect(fsa: FSA): CFG {
nonterminalProductions.mapIndexed { i, it ->
val triples = fsa.states * fsa.states * fsa.states
val (A, B, C) = it.π1 to it.π2[0] to it.π2[1]
triples.map { (p, q, r) ->
"[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]")
}
triples.map { (p, q, r) -> "[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]") }
}.flatten()

return (initFinal + binaryProds + unitProds).toSet().postProcess()
@@ -60,6 +60,9 @@ val CFG.unicodeMap by cache { terminals.associateBy { Random(it.hashCode()).next
val CFG.ntLst by cache { (symbols + "ε").toList() }
val CFG.ntMap by cache { ntLst.mapIndexed { i, s -> s to i }.toMap() }

// Indexes every binary production w -> xz as a triple of nonterminal indices
val CFG.tripIntProd: Set<Π3A<Int>> by cache { filter { it.RHS.size == 2 }.map { bindex[it.LHS] to bindex[it.RHS[0]] to bindex[it.RHS[1]] }.toSet() }
// Maps each terminal to the indices of all nonterminals that emit it via a unit production
val CFG.revUnitProds: Map<Σᐩ, List<Int>> by cache { terminals.associate { it to bimap[listOf(it)].map { bindex[it] } } }

// Maps each nonterminal to the set of nonterminal pairs that can generate it,
// which is then flattened to a list of adjacent pairs of nonterminal indices
val CFG.vindex: Array<IntArray> by cache {
@@ -234,6 +237,7 @@ class JoinMap(val CFG: CFG) {
}

// Maps indices to nonterminals and nonterminals to indices
// TODO: Would be nice if START had a zero index (requires rebuilding caches)
class Bindex<T>(
val set: Set<T>,
val indexedNTs: List<T> = set.toList(),
@@ -70,7 +70,7 @@ class ParikhMap(val cfg: CFG, val size: Int, reconstruct: Boolean = true) {
fun deserializePM(str: String): Map<Int, ParikhBoundsMap> =
str.lines().map { it.split(" ") }.groupBy { it.first().toInt() }
.mapValues { (_, v) ->
v.map { it[1] to it.drop(3).chunked(3).map { it[0] to (it[1].toInt()..it[2].toInt()) }.toMap() }.toMap()
v.associate { it[1] to it.drop(3).chunked(3).associate { it[0] to (it[1].toInt()..it[2].toInt()) } }
}

fun deserialize(cfg: CFG, str: String): ParikhMap {
13 changes: 13 additions & 0 deletions src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt
@@ -207,6 +207,18 @@ val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>
dist
}

// AllPairs[p, q] is the set of all vertices, r, such that p ->* r ->* q
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.allPairs: Map<Pair<V, V>, Set<V>> by cache {
// All vertices reachable from v
val forward: Map<V, Set<V>> = vertices.associateWith { v -> transitiveClosure(setOf(v)) }

// All vertices that can reach v (i.e., reachable from v in the reversed graph)
val backward: Map<V, Set<V>> = reversed().let { it.vertices.associateWith { v -> it.transitiveClosure(setOf(v)) } }

// For every pair (p, q), collect all vertices r that lie on some path p ->* r ->* q
vertices.flatMap { p -> vertices.map { q -> Pair(Pair(p, q), (forward[p]!! intersect backward[q]!!)) } }.filter { it.second.isNotEmpty() }.toMap()
}
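// Example (assuming transitiveClosure includes its seed vertices): for the
// path graph a -> b -> c, allPairs[a to c] == setOf(a, b, c), while vertex
// pairs joined by no path are dropped from the map entirely.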

val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.degMap: Map<V, Int> by cache { vertices.associateWith { it.neighbors.size } }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.edges: Set<E> by cache { edgMap.values.flatten().toSet() }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.edgList: List<Π2<V, E>> by cache { vertices.flatMap { s -> s.outgoing.map { s to it } } }
@@ -312,6 +324,7 @@ val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V>.neighbors: Set<V> by cache { outgoing.map { it.target }.toSet() }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V>.outdegree: Int get() = neighbors.size


abstract class AGF<G, E, V> : IGF<G, E, V>
where G : IGraph<G, E, V>, E : IEdge<G, E, V>, V : IVertex<G, E, V> {
override val deepHashCode: Int = Random.nextInt()
@@ -1,7 +1,6 @@
package ai.hypergraph.kaliningraph.parsing

import Grammars
import Grammars.shortS2PParikhMap
import ai.hypergraph.kaliningraph.*
import ai.hypergraph.kaliningraph.automata.*
import ai.hypergraph.kaliningraph.repair.vanillaS2PCFG
@@ -320,7 +319,7 @@ class BarHillelTest {
val toRepair = origStr.tokenizeByWhitespace()
val levDist = 2
val levBall = makeLevFSA(toRepair, levDist)
println(levBall.states.size)
println(levBall.numStates)
// println(levBall.toDot())
// throw Exception("")
val intGram = gram.intersectLevFSA(levBall)
@@ -462,4 +461,16 @@ class BarHillelTest {

assertEquals(overwrittenRepairs, allTriples)
}

/*
./gradlew jvmTest --tests "ai.hypergraph.kaliningraph.parsing.BarHillelTest.matrixLBHTest"
*/
@Test
fun matrixLBHTest() {
val str = "NAME ( STRING . NAME ( ( NAME & NAME ) ) or STRING NEWLINE"
println(str.tokenizeByWhitespace().size)

measureTimedValue { FSA.levIntersect(str, vanillaS2PCFG, 3) }
.also { println("${it.value} / ${it.duration}") }
}
}