Commit 3922bac

implement alternative DP-based LBH algorithm

breandan committed Jan 13, 2025
1 parent 1a93f5c commit 3922bac
Showing 10 changed files with 204 additions and 118 deletions.
Binary file modified latex/thesis/Thesis.pdf
Binary file not shown.
4 changes: 2 additions & 2 deletions latex/thesis/content/Ch0_Literature_Review.tex
@@ -23,9 +23,9 @@ \section{Semantic program repair}

Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation based on syntactic validity. In practice, one might wish to layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. These additional criteria let us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs.

Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically implicates some degree of \textit{context-sensitivity}. A rough analogy follows: syntax is to semantics as parsers are to compilers. Compilers accept fewer programs than parsers -- like compilers, semantics is somehow more expressive than syntax -- beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, we gain expressive power. By increasing freedom, we lose expressive power.
Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically incorporates some degree of \textit{context-sensitivity}. A rough analogy holds between syntax and semantics as between parsing and compilation: just as compilers accept fewer programs than parsers, semantic validity is a stricter criterion than syntactic validity, one that lies beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, one gains expressive power. By increasing freedom, one loses expressive power.

This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the parser's symbol table. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building practical developer tools.
This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer hack, introducing type names into the parser's symbol table. Though the lexer hack is generally considered in poor taste from a language design perspective, handling these kinds of scenarios becomes important for building practical developer tools.

One approach to handling more complex synthesis tasks uses \textit{angelic execution} to generate and optimistically evaluate execution paths. Shi et al.'s FrAngel~\cite{shi2019frangel} is a particular example of this approach, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. The idea of angelic execution can be retraced to Bod\'ik et al.~\cite{bodik2010programming} who attribute the original idea to Floyd's nondeterministic \texttt{choice} operator~\cite{floyd1967nondeterministic}. In the context of semantic program repair, angelic execution has been successfully developed for program synthesis by Singh et al.~\cite{singh2013automated} for auto-grading and providing feedback on programming assignments.

78 changes: 67 additions & 11 deletions src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt
@@ -4,6 +4,7 @@ import ai.hypergraph.kaliningraph.*
import ai.hypergraph.kaliningraph.graphs.*
import ai.hypergraph.kaliningraph.parsing.*
import ai.hypergraph.kaliningraph.types.*
import kotlin.lazy
import kotlin.random.Random

typealias Arc = Π3A<Σᐩ>
@@ -18,10 +19,42 @@ fun STC.coords() = π2 to π3
open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<Σᐩ>) {
open val alphabet by lazy { Q.map { it.π2 }.toSet() }
val isNominalizable by lazy { alphabet.any { it.startsWith("[!=]") } }
val nominalForm: NOM by lazy { nominalize() }
val states by lazy { Q.states }
val stateLst by lazy { states.toList() }
val stateMap by lazy { states.withIndex().associate { it.value to it.index } }
val nominalForm: NOM by lazy { nominalize() } // Converts FSA to nominal form

val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } }
}
val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } }
}
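// e.g., an arc (q, σ, q') appears as (σ, q') in transit[q]
// and as (σ, q) in revtransit[q']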

val states: Set<Σᐩ> by lazy { Q.states }
// States, in a topological order (reverse postorder of a depth-first search
// from the initial states; assumes the transition graph is acyclic, as in a
// Levenshtein automaton -- a DFS preorder would not be a valid topological order)
val stateLst: List<Σᐩ> by lazy {
  val visited = mutableSetOf<Σᐩ>()
  val postOrder = mutableListOf<Σᐩ>()

  fun dfs(state: Σᐩ) {
    if (state !in visited) {
      visited.add(state)
      transit[state]?.forEach { (_, nextState) -> dfs(nextState) }
      postOrder.add(state)
    }
  }

  init.forEach { dfs(it) }
  postOrder.reversed()
}

// Seeds the DP base case: for every arc (q0 -σ-> q1) in the FSA and every unit
// production A -> σ in the CFG, emits the indexed triple (q0, A, q1)
fun allIndexedTxs(cfg: CFG): List<Π3A<Int>> =
  (cfg.unitProductions * nominalForm.flattenedTriples).filter { (_, σ: Σᐩ, arc) -> (arc.π2)(σ) }
    .map { (A, _, arc) -> Triple(stateMap[arc.π1]!!, cfg.ntMap[A]!!, stateMap[arc.π3]!!) }

val numStates: Int by lazy { states.size }

val stateMap: Map<Σᐩ, Int> by lazy { stateLst.withIndex().associate { it.value to it.index } }
// Maps every pair of states in the FSA to the shortest-path distance between them
val APSP: Map<Pair<Int, Int>, Int> by lazy {
graph.APSP.map { (k, v) ->
@@ -30,13 +63,14 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<
}.toMap()
}

val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } }
}
val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy {
Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } }
val allPairs: Map<Pair<Int, Int>, Set<Int>> by lazy {
graph.allPairs.entries.associate { (a, b) ->
Pair(Pair(stateMap[a.first.label]!!, stateMap[a.second.label]!!), b.map { stateMap[it.label]!! }.toSet())
}
}

val finalIdxs by lazy { final.map { stateMap[it]!! } }

val stateCoords: Sequence<STC> by lazy { states.map { it.coords().let { (i, j) -> Triple(stateMap[it]!!, i, j) } }.asSequence() }
var height = 0
var width = 0
@@ -53,9 +87,9 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<

fun Π3A<STC>.isValidStateTriple(): Boolean =
first.coords().dominates(second.coords()) &&
second.coords().dominates(third.coords())
second.coords().dominates(third.coords())

val edgeLabels by lazy {
val edgeLabels: Map<Pair<Σᐩ, Σᐩ>, Σᐩ> by lazy {
Q.groupBy { (a, b, c) -> a to c }
.mapValues { (_, v) -> v.map { it.π2 }.toSet().joinToString(",") }
}
@@ -95,6 +129,28 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<
// }
// }

companion object {
  // Decides whether the Levenshtein ball of the given radius around str
  // intersects L(cfg), without materializing the Bar-Hillel intersection
  // grammar itself.
  fun levIntersect(str: Σᐩ, cfg: CFG, radius: Int): Boolean {
    val levFSA = makeLevFSA(str, radius)

    // dp[p][q][A] == true iff nonterminal A derives the labels along some
    // path from state p to state q (entries default to false)
    val dp = Array(levFSA.numStates) { Array(levFSA.numStates) { BooleanArray(cfg.nonterminals.size) } }

    // Base case: unit productions A -> σ matching individual arcs (p -σ-> q)
    levFSA.allIndexedTxs(cfg).forEach { (q0, nt, q1) -> dp[q0][q1][nt] = true }

    // Inductive case: w -> xz with x spanning (p, r) and z spanning (r, q).
    // Since stateLst is topologically ordered, p < r < q, so iterating p in
    // descending and q in ascending order guarantees both subproblems are
    // already solved when (p, q) is visited.
    for (p in levFSA.numStates - 2 downTo 0)
      for (q in p + 1 until levFSA.numStates)
        for ((w, x, z) in cfg.tripIntProd) // w -> xz
          if (!dp[p][q][w])
            for (r in levFSA.allPairs[p to q] ?: emptySet())
              if (dp[p][r][x] && dp[r][q][z]) {
                dp[p][q][w] = true
                break
              }

    // Accept iff START spans a path from the initial state (index 0) to some final state
    return levFSA.finalIdxs.any { f -> dp[0][f][cfg.bindex[START_SYMBOL]] }
  }
}

fun walk(from: Σᐩ, next: (Σᐩ, List<Σᐩ>) -> Int): List<Σᐩ> {
val startVtx = from
val path = mutableListOf<Σᐩ>()
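The companion function above is the heart of this commit: instead of materializing and post-processing the full Bar-Hillel intersection grammar, it runs a CYK-style dynamic program directly over pairs of Levenshtein-automaton states, seeding the table with unit productions on individual arcs and closing it under binary productions through the midpoints supplied by allPairs. The table fill is worst-case O(|Q|³·|P|) time and O(|Q|²·|V|) space, though allPairs typically narrows the midpoints to a small fraction of the state space. A minimal usage sketch, mirroring the test added below (the input string and radius here are illustrative):

// Hypothetical driver: decides whether any string within Levenshtein
// distance 2 of the broken snippet is generated by the grammar.
val broken = "NAME ( NAME NEWLINE"
val repairable = FSA.levIntersect(broken, vanillaS2PCFG, 2)
println(if (repairable) "repair exists within radius 2" else "no repair within radius 2")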
@@ -24,7 +24,7 @@ class NOM(override val Q: TSA, override val init: Set<Σᐩ>, override val final
.mapValues { (_, v) -> v.map { it.second to it.third } }
}

val flattenedTriples by lazy {
val flattenedTriples: Set<Triple<Σᐩ, StrPred, Σᐩ>> by lazy {
Q.map { (a, b, c) -> a to b.predicate() to c }.toSet()
}

@@ -48,10 +48,10 @@ fun CFG.intersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CFG {
val validTriples = fsa.validTriples.map { arrayOf(it.π11, it.π21, it.π31) }.toTypedArray()

val ct = (fsa.validPairs * nonterminals.indices.toSet()).toList()
// val ct1 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } }
// val ct1 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } }
// ct.filter { lengthBoundsCache[it.π3].overlaps(fsa.SPLP(it.π1, it.π2)) }
// .forEach { ct1[it.π1.π1][it.π3][it.π2.π1] = true }
val ct2 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } }
val ct2 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } }
ct.filter { fsa.obeys(it.π1, it.π2, it.π3, parikhMap) }
.forEach { ct2[it.π11][it.π3][it.π21] = true }

@@ -173,6 +173,7 @@ fun CFG.dropVestigialProductions(
return if (rw.size == size) rw else rw.dropVestigialProductions(criteria)
}


// Generic Bar-Hillel construction for arbitrary CFL ∩ REG language
infix fun FSA.intersect(cfg: CFG) = cfg.freeze().intersect(this)

@@ -191,9 +192,7 @@ infix fun CFG.intersect(fsa: FSA): CFG {
nonterminalProductions.mapIndexed { i, it ->
val triples = fsa.states * fsa.states * fsa.states
val (A, B, C) = it.π1 to it.π2[0] to it.π2[1]
triples.map { (p, q, r) ->
"[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]")
}
triples.map { (p, q, r) -> "[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]") }
}.flatten()

return (initFinal + binaryProds + unitProds).toSet().postProcess()
@@ -60,6 +60,9 @@ val CFG.unicodeMap by cache { terminals.associateBy { Random(it.hashCode()).next
val CFG.ntLst by cache { (symbols + "ε").toList() }
val CFG.ntMap by cache { ntLst.mapIndexed { i, s -> s to i }.toMap() }

// Indexes every binary production w -> xz as a triple of nonterminal indices
val CFG.tripIntProd: Set<Π3A<Int>> by cache { filter { it.RHS.size == 2 }.map { bindex[it.LHS] to bindex[it.RHS[0]] to bindex[it.RHS[1]] }.toSet() }
// Maps each terminal to the indices of all nonterminals that emit it via a unit production
val CFG.revUnitProds: Map<Σᐩ, List<Int>> by cache { terminals.associate { it to bimap[listOf(it)].map { bindex[it] } } }

// Maps each nonterminal to the set of nonterminal pairs that can generate it,
// which is then flattened to a list of adjacent pairs of nonterminal indices
val CFG.vindex: Array<IntArray> by cache {
@@ -234,6 +237,7 @@ class JoinMap(val CFG: CFG) {
}

// Maps indices to nonterminals and nonterminals to indices
// TODO: Would be nice if START had a zero index (requires rebuilding caches)
class Bindex<T>(
val set: Set<T>,
val indexedNTs: List<T> = set.toList(),
@@ -70,7 +70,7 @@ class ParikhMap(val cfg: CFG, val size: Int, reconstruct: Boolean = true) {
fun deserializePM(str: String): Map<Int, ParikhBoundsMap> =
str.lines().map { it.split(" ") }.groupBy { it.first().toInt() }
.mapValues { (_, v) ->
v.map { it[1] to it.drop(3).chunked(3).map { it[0] to (it[1].toInt()..it[2].toInt()) }.toMap() }.toMap()
v.associate { it[1] to it.drop(3).chunked(3).associate { it[0] to (it[1].toInt()..it[2].toInt()) } }
}

fun deserialize(cfg: CFG, str: String): ParikhMap {
13 changes: 13 additions & 0 deletions src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt
@@ -207,6 +207,18 @@ val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>
dist
}

// AllPairs[p, q] is the set of all vertices, r, such that p ->* r ->* q
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.allPairs: Map<Pair<V, V>, Set<V>> by cache {
// All vertices reachable from v
val forward: Map<V, Set<V>> = vertices.associateWith { v -> transitiveClosure(setOf(v)) }

// All vertices that can reach v (i.e., reachable from v in the reversed graph)
val backward: Map<V, Set<V>> = reversed().let { it.vertices.associateWith { v -> it.transitiveClosure(setOf(v)) } }

// For every pair (p, q), collect all vertices r that lie on some path p ->* r ->* q
vertices.flatMap { p -> vertices.map { q -> Pair(Pair(p, q), (forward[p]!! intersect backward[q]!!)) } }.filter { it.second.isNotEmpty() }.toMap()
}
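// Example (assuming transitiveClosure includes its seed vertices): for the
// path graph a -> b -> c, allPairs[a to c] == setOf(a, b, c), while vertex
// pairs joined by no path are dropped from the map entirely.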

val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.degMap: Map<V, Int> by cache { vertices.associateWith { it.neighbors.size } }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.edges: Set<E> by cache { edgMap.values.flatten().toSet() }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.edgList: List<Π2<V, E>> by cache { vertices.flatMap { s -> s.outgoing.map { s to it } } }
@@ -312,6 +324,7 @@ val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V>.neighbors: Set<V> by cache { outgoing.map { it.target }.toSet() }
val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IVertex<G, E, V>.outdegree: Int get() = neighbors.size


abstract class AGF<G, E, V> : IGF<G, E, V>
where G : IGraph<G, E, V>, E : IEdge<G, E, V>, V : IVertex<G, E, V> {
override val deepHashCode: Int = Random.nextInt()
@@ -1,7 +1,6 @@
package ai.hypergraph.kaliningraph.parsing

import Grammars
import Grammars.shortS2PParikhMap
import ai.hypergraph.kaliningraph.*
import ai.hypergraph.kaliningraph.automata.*
import ai.hypergraph.kaliningraph.repair.vanillaS2PCFG
@@ -320,7 +319,7 @@ class BarHillelTest {
val toRepair = origStr.tokenizeByWhitespace()
val levDist = 2
val levBall = makeLevFSA(toRepair, levDist)
println(levBall.states.size)
println(levBall.numStates)
// println(levBall.toDot())
// throw Exception("")
val intGram = gram.intersectLevFSA(levBall)
@@ -462,4 +461,16 @@ class BarHillelTest {

assertEquals(overwrittenRepairs, allTriples)
}

/*
./gradlew jvmTest --tests "ai.hypergraph.kaliningraph.parsing.BarHillelTest.matrixLBHTest"
*/
@Test
fun matrixLBHTest() {
val str = "NAME ( STRING . NAME ( ( NAME & NAME ) ) or STRING NEWLINE"
println(str.tokenizeByWhitespace().size)

measureTimedValue { FSA.levIntersect(str, vanillaS2PCFG, 3) }
.also { println("${it.value} / ${it.duration}") }
}
}