diff --git a/latex/thesis/Thesis.pdf b/latex/thesis/Thesis.pdf index 524d8839..9ed80d38 100644 Binary files a/latex/thesis/Thesis.pdf and b/latex/thesis/Thesis.pdf differ diff --git a/latex/thesis/content/Ch0_Literature_Review.tex b/latex/thesis/content/Ch0_Literature_Review.tex index 2ae85625..576014c2 100644 --- a/latex/thesis/content/Ch0_Literature_Review.tex +++ b/latex/thesis/content/Ch0_Literature_Review.tex @@ -23,9 +23,9 @@ \section{Semantic program repair} Automated program repair is either implicitly or explicitly defined over a \textit{search space}, which is the space of all possible solutions. Previously, we looked at a very coarse-grained approximation, based on syntactic validity. In practice, one might wish to layer additional refinements on top of these syntactic constraints, corresponding to so-called \textit{semantic} properties such as type-soundness or well-formedness. These additional criteria let us \textit{prune} invalid solutions or \textit{quotient} the search space by an equivalence relation, often vastly reducing the set of candidate repairs. -Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically implicates some degree of \textit{context-sensitivity}. A rough analogy follows: syntax is to semantics as parsers are to compilers. Compilers accept fewer programs than parsers -- like compilers, semantics is somehow more expressive than syntax -- beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, we gain expressive power. By increasing freedom, we lose expressive power. +Semantically valid programs are always subsets of syntactically valid ones: syntax overapproximates semantics. What is meant by ``semantics'' typically incorporates some degree of \textit{context-sensitivity}. 
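A concrete instance of syntax overapproximating semantics, outside the thesis's own toolchain: CPython's parser accepts `break` at top level (it is grammatically well-formed), but the compiler's context-sensitive checks reject it. A minimal Python sketch of pruning a candidate set by this stricter semantic criterion:

```python
import ast

# `break` outside a loop is grammatically well-formed, so the parser accepts it...
tree = ast.parse("break")
assert isinstance(tree, ast.Module)

# ...but the context-sensitive compilation stage rejects it.
def compiles(src: str) -> bool:
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

assert not compiles("break")

# Pruning a (hypothetical) search space of candidate repairs by the semantic check:
candidates = ["break", "x = 1", "return 1", "while True: break"]
pruned = [c for c in candidates if compiles(c)]
print(pruned)  # prints ['x = 1', 'while True: break']
```

Every candidate that survives the semantic filter necessarily parsed as well, mirroring the subset relation described above.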
A rough analogy holds between syntax and semantics as between parsing and compilation: just as semantics is more expressive than syntax, compilers accept fewer programs than parsers, recognizing languages that lie beyond the reach of context-free grammars. This is the enigma of freedom and power in formal languages: by sacrificing contextual freedom and thereby restricting the language, one gains expressive power. By increasing freedom, one loses expressive power. -This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the parser's symbol table. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios is important for building practical developer tools. +This analogy breaks down in certain cases, as the syntax of a programming language may not even be context-free, whereby syntax repair may be viewed as a kind of semantic repair. The C/C++~\cite{mcpeak2004elkhound} language, for example, implements a so-called lexer-hack, introducing type names into the parser's symbol table. Though generally considered in poor taste from a language design perspective, handling these kinds of scenarios becomes important for building practical developer tools. One approach to handling more complex synthesis tasks uses \textit{angelic execution} to generate and optimistically evaluate execution paths. Shi et al.'s FrAngel~\cite{shi2019frangel} is a particular example of this approach, which uses component-based program synthesis in conjunction with angelic nondeterminism to repair a broken program. 
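Floyd's nondeterministic `choice` operator admits a naive "angelic" reading: a program sketch with holes is deemed repairable if *some* resolution of each choice passes every test. The following Python miniature resolves holes by exhaustive search; the names `angelic_search` and `max_sketch` are illustrative inventions, not FrAngel's actual API:

```python
import operator
from itertools import product

def angelic_search(sketch, hole_candidates, tests):
    """Angelic resolution: return the first assignment of candidates to holes
    under which the sketch passes every (input, output) test, else None."""
    for choice in product(*hole_candidates):
        if all(sketch(choice, inp) == out for inp, out in tests):
            return choice
    return None

# A buggy max-of-two with its comparison operator abstracted into a hole.
def max_sketch(choice, pair):
    (cmp,) = choice
    a, b = pair
    return a if cmp(a, b) else b

tests = [((3, 5), 5), ((7, 2), 7), ((1, 1), 1)]
candidates = [[operator.lt, operator.gt, operator.le, operator.ge]]
fix = angelic_search(max_sketch, candidates, tests)
print(fix)  # prints (<built-in function gt>,)
```

FrAngel and the auto-grading system of Singh et al. are of course far more sophisticated, guiding the search rather than enumerating it, but the optimistic treatment of nondeterminism is the same.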
The idea of angelic execution can be traced back to Bod\'ik et al.~\cite{bodik2010programming}, who attribute the original idea to Floyd's nondeterministic \texttt{choice} operator~\cite{floyd1967nondeterministic}. In the context of semantic program repair, angelic execution has been successfully developed for program synthesis by Singh et al.~\cite{singh2013automated} for auto-grading and providing feedback on programming assignments. diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt index a05a8bf0..9b2dc27a 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/FSA.kt @@ -4,6 +4,7 @@ import ai.hypergraph.kaliningraph.* import ai.hypergraph.kaliningraph.graphs.* import ai.hypergraph.kaliningraph.parsing.* import ai.hypergraph.kaliningraph.types.* +import kotlin.lazy import kotlin.random.Random typealias Arc = Π3A<Σᐩ> @@ -18,10 +19,42 @@ fun STC.coords() = π2 to π3 open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set<Σᐩ>) { open val alphabet by lazy { Q.map { it.π2 }.toSet() } val isNominalizable by lazy { alphabet.any { it.startsWith("[!=]") } } - val nominalForm: NOM by lazy { nominalize() } - val states by lazy { Q.states } - val stateLst by lazy { states.toList() } - val stateMap by lazy { states.withIndex().associate { it.value to it.index } } + val nominalForm: NOM by lazy { nominalize() } // Converts FSA to nominal form + + val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy { + Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } } + } + val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy { + Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } } + } + + val states: Set<Σᐩ> by lazy { Q.states } + // States, in a topological order + val stateLst: List<Σᐩ> by lazy { + val visited = mutableSetOf<Σᐩ>() + val topSort = mutableListOf<Σᐩ>() + val stack = 
init.toMutableList() + + fun dfs(state: Σᐩ) { + if (state !in visited) { + visited.add(state) + topSort.add(state) + transit[state]?.forEach { (_, nextState) -> dfs(nextState) } + stack.add(state) + } + } + + while (stack.isNotEmpty()) dfs(stack.removeLast()) + topSort + } + + fun allIndexedTxs(cfg: CFG): List<Π3A<Int>> = + (cfg.unitProductions * nominalForm.flattenedTriples).filter { (_, σ: Σᐩ, arc) -> (arc.π2)(σ) } + .map { (A, _, arc) -> Triple(stateMap[arc.π1]!!, cfg.ntMap[A]!!, stateMap[arc.π3]!!) } + + val numStates: Int by lazy { states.size } + + val stateMap: Map<Σᐩ, Int> by lazy { stateLst.withIndex().associate { it.value to it.index } } // Index of every state // Maps each pair of states in the FSA to the shortest path distance between them val APSP: Map<Pair<Σᐩ, Σᐩ>, Int> by lazy { graph.APSP.map { (k, v) -> @@ -30,13 +63,14 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set< }.toMap() } - val transit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy { - Q.groupBy { it.π1 }.mapValues { (_, v) -> v.map { it.π2 to it.π3 } } - } - val revtransit: Map<Σᐩ, List<Pair<Σᐩ, Σᐩ>>> by lazy { - Q.groupBy { it.π3 }.mapValues { (_, v) -> v.map { it.π2 to it.π1 } + val allPairs: Map<Pair<Int, Int>, Set<Int>> by lazy { + graph.allPairs.entries.associate { (a, b) -> + Pair(Pair(stateMap[a.first.label]!!, stateMap[a.second.label]!!), b.map { stateMap[it.label]!! }.toSet()) + } } + val finalIdxs by lazy { final.map { stateMap[it]!! 
} } + val stateCoords: Sequence by lazy { states.map { it.coords().let { (i, j) -> Triple(stateMap[it]!!, i, j) } }.asSequence() } var height = 0 var width = 0 @@ -53,9 +87,9 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set< fun Π3A.isValidStateTriple(): Boolean = first.coords().dominates(second.coords()) && - second.coords().dominates(third.coords()) + second.coords().dominates(third.coords()) - val edgeLabels by lazy { + val edgeLabels: Map, Σᐩ> by lazy { Q.groupBy { (a, b, c) -> a to c } .mapValues { (_, v) -> v.map { it.π2 }.toSet().joinToString(",") } } @@ -95,6 +129,28 @@ open class FSA(open val Q: TSA, open val init: Set<Σᐩ>, open val final: Set< // } // } + companion object { + fun levIntersect(str: Σᐩ, cfg: CFG, radius: Int): Boolean { + val levFSA = makeLevFSA(str, radius) + + val dp = Array(levFSA.numStates) { Array(levFSA.numStates) { BooleanArray(cfg.nonterminals.size) { false } } } + + levFSA.allIndexedTxs(cfg).forEach { (q0, nt, q1) -> dp[q0][q1][nt] = true } + + for (p in 0 until levFSA.numStates) + for (q in p+1 until levFSA.numStates) + for ((w, x, z) in cfg.tripIntProd) // w -> xz + if (!dp[p][q][w]) + for (r in levFSA.allPairs[p to q] ?: emptySet()) + if (dp[p][r][x] && dp[r][q][z]) { + dp[p][q][w] = true + break + } + + return levFSA.finalIdxs.any { f -> dp[0][f][cfg.bindex[START_SYMBOL]] } + } + } + fun walk(from: Σᐩ, next: (Σᐩ, List<Σᐩ>) -> Int): List<Σᐩ> { val startVtx = from val path = mutableListOf<Σᐩ>() diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/Nominal.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/Nominal.kt index e0e89049..f9ddd49b 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/Nominal.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/automata/Nominal.kt @@ -24,7 +24,7 @@ class NOM(override val Q: TSA, override val init: Set<Σᐩ>, override val final .mapValues { (_, v) -> v.map { it.second to it.third } } } - val flattenedTriples by 
lazy { + val flattenedTriples: Set> by lazy { Q.map { (a, b, c) -> a to b.predicate() to c }.toSet() } diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillel.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillel.kt index 82af2f3a..7b240f96 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillel.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillel.kt @@ -48,10 +48,10 @@ fun CFG.intersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CFG { val validTriples = fsa.validTriples.map { arrayOf(it.π1.π1, it.π2.π1, it.π3.π1) }.toTypedArray() val ct = (fsa.validPairs * nonterminals.indices.toSet()).toList() -// val ct1 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } } +// val ct1 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } } // ct.filter { lengthBoundsCache[it.π3].overlaps(fsa.SPLP(it.π1, it.π2)) } // .forEach { ct1[it.π1.π1][it.π3][it.π2.π1] = true } - val ct2 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } } + val ct2 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } } ct.filter { fsa.obeys(it.π1, it.π2, it.π3, parikhMap) } .forEach { ct2[it.π1.π1][it.π3][it.π2.π1] = true } @@ -173,6 +173,7 @@ fun CFG.dropVestigialProductions( return if (rw.size == size) rw else rw.dropVestigialProductions(criteria) } + // Generic Bar-Hillel construction for arbitrary CFL ∩ REG language infix fun FSA.intersect(cfg: CFG) = cfg.freeze().intersect(this) @@ -191,9 +192,7 @@ infix fun CFG.intersect(fsa: FSA): CFG { nonterminalProductions.mapIndexed { i, it -> val triples = fsa.states * fsa.states * fsa.states val (A, B, C) = it.π1 to it.π2[0] to it.π2[1] - triples.map { (p, q, r) -> - "[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]") - } + triples.map { (p, q, r) -> "[$p~$A~$r]" to listOf("[$p~$B~$q]", "[$q~$C~$r]") } }.flatten() return 
(initFinal + binaryProds + unitProds).toSet().postProcess() diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/CFG.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/CFG.kt index 28c33024..3a12307a 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/CFG.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/CFG.kt @@ -60,6 +60,9 @@ val CFG.unicodeMap by cache { terminals.associateBy { Random(it.hashCode()).next val CFG.ntLst by cache { (symbols + "ε").toList() } val CFG.ntMap by cache { ntLst.mapIndexed { i, s -> s to i }.toMap() } +val CFG.tripIntProd: Set<Π3A> by cache { filter { it.RHS.size == 2 }.map { bindex[it.LHS] to bindex[it.RHS[0]] to bindex[it.RHS[1]] }.toSet() } +val CFG.revUnitProds: Map<Σᐩ, List> by cache { terminals.associate { it to bimap[listOf(it)].map { bindex[it] } } } + // Maps each nonterminal to the set of nonterminal pairs that can generate it, // which is then flattened to a list of adjacent pairs of nonterminal indices val CFG.vindex: Array by cache { @@ -234,6 +237,7 @@ class JoinMap(val CFG: CFG) { } // Maps indices to nonterminals and nonterminals to indices +// TODO: Would be nice if START had a zero index (requires rebuilding caches) class Bindex( val set: Set, val indexedNTs: List = set.toList(), diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/Parikh.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/Parikh.kt index 64ea3e60..56326226 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/Parikh.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/parsing/Parikh.kt @@ -70,7 +70,7 @@ class ParikhMap(val cfg: CFG, val size: Int, reconstruct: Boolean = true) { fun deserializePM(str: String): Map = str.lines().map { it.split(" ") }.groupBy { it.first().toInt() } .mapValues { (_, v) -> - v.map { it[1] to it.drop(3).chunked(3).map { it[0] to (it[1].toInt()..it[2].toInt()) }.toMap() }.toMap() + v.associate { it[1] to 
it.drop(3).chunked(3).associate { it[0] to (it[1].toInt()..it[2].toInt()) } } } fun deserialize(cfg: CFG, str: String): ParikhMap { diff --git a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt index 112fa96c..477d8a99 100644 --- a/src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt +++ b/src/commonMain/kotlin/ai/hypergraph/kaliningraph/types/Graph.kt @@ -207,6 +207,18 @@ val , E: IEdge, V: IVertex> IGraph dist } +// AllPairs[p, q] is the set of all vertices, r, such that p ->* r ->* q +val <G: IGraph<G, E, V>, E: IEdge<G, E, V>, V: IVertex<G, E, V>> IGraph<G, E, V>.allPairs: Map<Pair<V, V>, Set<V>> by cache { + // All vertices reachable from v + val forward: Map<V, Set<V>> = vertices.associateWith { v -> transitiveClosure(setOf(v)) } + + // All vertices that can reach v (reachable from v in reversed graph) + val backward: Map<V, Set<V>> = reversed().let { it.vertices.associateWith { v -> it.transitiveClosure(setOf(v)) } } + + // For every pair (p, q), collect all vertices r that lie on some path p ->* r ->* q + vertices.flatMap { p -> vertices.map { q -> Pair(Pair(p, q), (forward[p]!! 
intersect backward[q]!!)) } }.filter { it.second.isNotEmpty() }.toMap() +} + val , E: IEdge, V: IVertex> IGraph.degMap: Map by cache { vertices.associateWith { it.neighbors.size } } val , E: IEdge, V: IVertex> IGraph.edges: Set by cache { edgMap.values.flatten().toSet() } val , E: IEdge, V: IVertex> IGraph.edgList: List<Π2> by cache { vertices.flatMap { s -> s.outgoing.map { s to it } } } @@ -312,6 +324,7 @@ val , E: IEdge, V: IVertex> IVertex, E: IEdge, V: IVertex> IVertex.neighbors: Set by cache { outgoing.map { it.target }.toSet() } val , E: IEdge, V: IVertex> IVertex.outdegree: Int get() = neighbors.size + abstract class AGF : IGF where G : IGraph, E : IEdge, V : IVertex { override val deepHashCode: Int = Random.nextInt() diff --git a/src/commonTest/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillelTest.kt b/src/commonTest/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillelTest.kt index 6cd808d8..a0745c0d 100644 --- a/src/commonTest/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillelTest.kt +++ b/src/commonTest/kotlin/ai/hypergraph/kaliningraph/parsing/BarHillelTest.kt @@ -1,7 +1,6 @@ package ai.hypergraph.kaliningraph.parsing import Grammars -import Grammars.shortS2PParikhMap import ai.hypergraph.kaliningraph.* import ai.hypergraph.kaliningraph.automata.* import ai.hypergraph.kaliningraph.repair.vanillaS2PCFG @@ -320,7 +319,7 @@ class BarHillelTest { val toRepair = origStr.tokenizeByWhitespace() val levDist = 2 val levBall = makeLevFSA(toRepair, levDist) - println(levBall.states.size) + println(levBall.numStates) // println(levBall.toDot()) // throw Exception("") val intGram = gram.intersectLevFSA(levBall) @@ -462,4 +461,16 @@ class BarHillelTest { assertEquals(overwrittenRepairs, allTriples) } + + /* + ./gradlew jvmTest --tests "ai.hypergraph.kaliningraph.parsing.BarHillelTest.matrixLBHTest" + */ + @Test + fun matrixLBHTest() { + val str = "NAME ( STRING . 
NAME ( ( NAME & NAME ) ) or STRING NEWLINE" + println(str.tokenizeByWhitespace().size) + + measureTimedValue { FSA.levIntersect(str, vanillaS2PCFG, 3) } + .also { println("${it.value} / ${it.duration}") } + } } \ No newline at end of file diff --git a/src/jvmMain/kotlin/ai/hypergraph/kaliningraph/parsing/JVMBarHillel.kt b/src/jvmMain/kotlin/ai/hypergraph/kaliningraph/parsing/JVMBarHillel.kt index 13411dac..99b0cd23 100644 --- a/src/jvmMain/kotlin/ai/hypergraph/kaliningraph/parsing/JVMBarHillel.kt +++ b/src/jvmMain/kotlin/ai/hypergraph/kaliningraph/parsing/JVMBarHillel.kt @@ -10,6 +10,8 @@ import kotlin.streams.* import kotlin.time.Duration.Companion.minutes import kotlin.time.TimeSource import java.util.concurrent.ConcurrentHashMap +import java.util.concurrent.ConcurrentLinkedQueue +import java.util.concurrent.atomic.LongAdder import kotlin.collections.asSequence fun CFG.parallelEnumSeqMinimalWOR( @@ -192,7 +194,7 @@ fun CFG.jvmIntersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CF var clock = TimeSource.Monotonic.markNow() // Tracks all nonterminals constructed on the left hand side of a synthetic production - val ntsb = Array(fsa.states.size) { Array(symbols.size) { Array(fsa.states.size) { false } } } + val ntsb = Array(fsa.numStates) { Array(symbols.size) { Array(fsa.numStates) { false } } } val initFinal = (fsa.init * fsa.final).map { (q, r) -> listOf(ntMap["START"]!!) 
to listOf(listOf(fsa.stateMap[q]!!, ntMap["START"]!!, fsa.stateMap[r]!!)) } @@ -214,7 +216,7 @@ fun CFG.jvmIntersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CF val ctClock = TimeSource.Monotonic.markNow() val ct = (fsa.validPairs * nonterminals.indices.toSet()).toList() - val ct2 = Array(fsa.states.size) { Array(nonterminals.size) { Array(fsa.states.size) { false } } } + val ct2 = Array(fsa.numStates) { Array(nonterminals.size) { Array(fsa.numStates) { false } } } ct.parallelStream() .filter { it: Π3 -> // Checks whether the distinct subtrajectory between two horizontal states is parseable by a given NT @@ -228,7 +230,7 @@ fun CFG.jvmIntersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CF && fsa.obeys(it.π1, it.π2, it.π3, parikhMap) } // .toList().also { -// val candidates = (fsa.states.size * nonterminals.size * fsa.states.size) +// val candidates = (fsa.numStates * nonterminals.size * fsa.numStates) // val fraction = it.size.toDouble() / candidates // println("Fraction of valid LBH triples: ${it.size}/$candidates ≈ $fraction") // } @@ -278,6 +280,7 @@ fun CFG.jvmIntersectLevFSAP(fsa: FSA, parikhMap: ParikhMap = this.parikhMap): CF .map { (l, r) -> l.toNT() to r.map { it.toNT() } } .collect(Collectors.toSet()) .also { println("Eliminated ${totalProds - it.size} extra productions before normalization") } +// .also { it.jdvpNew() } .jvmPostProcess(clock) .also { normMs += clock.elapsedNow().inWholeMilliseconds @@ -291,98 +294,6 @@ fun CFG.jvmPostProcess(clock: TimeSource.Monotonic.ValueTimeMark) = jvmDropVestigialProductions(clock) .also { println("Normalization eliminated ${size - it.size} productions in ${clock.elapsedNow()}") } -// Eliminates unit productions whose RHS is not a terminal. For Bar-Hillel intersections, we know the only -// examples of this are the (S -> *) rules, so elimination is much simpler than the full CNF normalization. 
-fun jvmElimVarUnitProds(cfg: CFG): CFG { - val scfg = cfg.asSequence() - val vars = scfg.asStream().parallel().map { it.first }.collect(Collectors.toSet()) - val toElim = scfg.asStream().parallel() - .filter { it.RHS.size == 1 && it.LHS == "START" && it.RHS[0] in vars } - .map { it.RHS[0] } - .collect(Collectors.toSet()) - val newCFG = scfg.asStream().parallel() - .filter { it.RHS.size > 1 || it.RHS[0] !in toElim } - .map { if (it.LHS in toElim) "START" to it.RHS else it } - .collect(Collectors.toSet()) - return newCFG -} - -// TODO: Incomplete / untested -// Based on: https://zerobone.net/blog/cs/non-productive-cfg-rules/ -// Precondition: The CFG must be binarized, i.e., almost CNF but may have useless productions -// Postcondition: The CFG is in Chomsky Normal Form (CNF) -//fun CFG.jdvpNew(): CFG { -// println("Total productions: $size") -// val timer = TimeSource.Monotonic.markNow() -// val counter = ConcurrentHashMap, LongAdder>() -// -// // Maps each nonterminal to the set of RHS sets that contain it -// val UDEPS = ConcurrentHashMap<Σᐩ, ConcurrentLinkedQueue>>(size) -// // Maps the set of symbols on the RHS of a production to the production -// val NDEPS = ConcurrentHashMap, ConcurrentLinkedQueue>(size).apply { -// put(emptySet(), ConcurrentLinkedQueue()) -// this@jdvpNew.asSequence().asStream().parallel().forEach { -// val v = it.second.toSet() // RHS set, i.e., the set of NTs on the RHS of a production -// // If |v| is 1, then the production must be a unit production, i.e, A -> a, b/c A -> B is not binarized -// getOrPut(if(it.second.size == 1) emptySet() else v) { ConcurrentLinkedQueue() }.add(it) -// v.forEach { s -> UDEPS.getOrPut(s) { ConcurrentLinkedQueue() }.add(v) } -// if (v.size == 2) counter.putIfAbsent(v, LongAdder().apply { add(2L) }) -// } -// } -// -// println("Built graph in ${timer.elapsedNow()}: ${counter.size} conjuncts, ${UDEPS.size + NDEPS.size} edges") -// -// val nextReachable: LinkedHashSet> = LinkedHashSet>().apply { 
add(emptySet()) } -// -// val productive = mutableSetOf() -// do { -//// println("Next reachable: ${nextReachable.size}, Productive: ${productive.size}") -// val q = nextReachable.removeFirst() -// if (counter[q]?.sum() == 0L || NDEPS[q]?.all { it in productive } == true) continue -// else if (q.size == 2) { // Conjunct -// val dec = counter[q]!!.apply { decrement() } -// if (dec.sum() == 0L) { // Seen both -// NDEPS[q]?.forEach { -// productive.add(it) -// UDEPS[it.LHS]?.forEach { st -> if (st !in productive) nextReachable.addLast(st) } -// } -// } else nextReachable.addLast(q) // Always add back if sum not zero -// } else { -// NDEPS[q]?.forEach { -// productive.add(it) -// UDEPS[it.LHS]?.forEach { st -> if (st !in productive) nextReachable.addLast(st) } -// } -// } -// } while (nextReachable.isNotEmpty()) -// -// println("Eliminated ${size - productive.size} unproductive productions in ${timer.elapsedNow()}") -// println("Resulting in ${productive.size} productions.") -// -// val QDEPS = -// ConcurrentHashMap<Σᐩ, ConcurrentLinkedQueue>(size).apply { -// productive.asSequence().asStream().parallel().forEach { -// getOrPut(it.LHS) { ConcurrentLinkedQueue() }.add(it) -// } -// } -// -// val done = mutableSetOf(START_SYMBOL) -// val nextProd: MutableList<Σᐩ> = mutableListOf(START_SYMBOL) -// val productiveAndReachable = mutableSetOf() -// -// do { -// val q = nextProd.removeFirst().also { done += it } -// QDEPS[q]?.forEach { it -> -// productiveAndReachable.add(it) -// it.RHS.forEach { if (it !in done) nextProd += it } -// } -// } while (nextProd.isNotEmpty()) -// -// println("Eliminated ${productive.size - productiveAndReachable.size} unreachable productions in ${timer.elapsedNow()}") -// println("Resulting in ${productiveAndReachable.size} productions.") -// -// return productiveAndReachable.freeze() -//} - fun CFG.jvmDropVestigialProductions(clock: TimeSource.Monotonic.ValueTimeMark): CFG { val start = clock.elapsedNow() var counter = 0 @@ -404,6 +315,22 @@ fun 
CFG.jvmDropVestigialProductions(clock: TimeSource.Monotonic.ValueTimeMark): else rw.jvmDropVestigialProductions(clock) } +// Eliminates unit productions whose RHS is not a terminal. For Bar-Hillel intersections, we know the only +// examples of this are the (S -> *) rules, so elimination is much simpler than the full CNF normalization. +fun jvmElimVarUnitProds(cfg: CFG): CFG { + val scfg = cfg.asSequence() + val vars = scfg.asStream().parallel().map { it.first }.collect(Collectors.toSet()) + val toElim = scfg.asStream().parallel() + .filter { it.RHS.size == 1 && it.LHS == "START" && it.RHS[0] in vars } + .map { it.RHS[0] } + .collect(Collectors.toSet()) + val newCFG = scfg.asStream().parallel() + .filter { it.RHS.size > 1 || it.RHS[0] !in toElim } + .map { if (it.LHS in toElim) "START" to it.RHS else it } + .collect(Collectors.toSet()) + return newCFG +} + /** * Eliminate all non-generating and unreachable symbols. * @@ -483,4 +410,80 @@ private fun CFG.jvmGenSym( // println("START: ${START_SYMBOL in allGenerating} ${allGenerating.size}") return allGenerating -} \ No newline at end of file +} + +// TODO: Incomplete / untested +// Based on: https://zerobone.net/blog/cs/non-productive-cfg-rules/ +// Precondition: The CFG must be binarized, i.e., almost CNF but may have useless productions +// Postcondition: The CFG is in Chomsky Normal Form (CNF) +//fun CFG.jdvpNew(): CFG { +// println("Total productions: $size") +// val timer = TimeSource.Monotonic.markNow() +// val counter = ConcurrentHashMap, LongAdder>() +// +// // Maps each nonterminal to the set of RHS sets that contain it +// val UDEPS = ConcurrentHashMap<Σᐩ, ConcurrentLinkedQueue>>(size) +// // Maps the set of symbols on the RHS of a production to the production +// val NDEPS = ConcurrentHashMap, ConcurrentLinkedQueue>(size).apply { +// put(emptySet(), ConcurrentLinkedQueue()) +// this@jdvpNew.asSequence().asStream().parallel().forEach { +// val v = it.second.toSet() // RHS set, i.e., the set of NTs on the 
RHS of a production +// // If |v| is 1, then the production must be a unit production, i.e, A -> a, b/c A -> B is not binarized +// getOrPut(if(it.second.size == 1) emptySet() else v) { ConcurrentLinkedQueue() }.add(it) +// v.forEach { s -> UDEPS.getOrPut(s) { ConcurrentLinkedQueue() }.add(v) } +// if (v.size == 2) counter.putIfAbsent(v, LongAdder().apply { add(2L) }) +// } +// } +// +// println("Built graph in ${timer.elapsedNow()}: ${counter.size} conjuncts, ${UDEPS.size + NDEPS.size} edges") +// +// val nextReachable: LinkedHashSet> = LinkedHashSet>().apply { add(emptySet()) } +// +// val productive = mutableSetOf() +// do { +//// println("Next reachable: ${nextReachable.size}, Productive: ${productive.size}") +// val q = nextReachable.removeFirst() +// if (counter[q]?.sum() == 0L || NDEPS[q]?.all { it in productive } == true) continue +// else if (q.size == 2) { // Conjunct +// val dec = counter[q]!!.apply { decrement() } +// if (dec.sum() == 0L) { // Seen both +// NDEPS[q]?.forEach { +// productive.add(it) +// UDEPS[it.LHS]?.forEach { st -> if (st !in productive) nextReachable.addLast(st) } +// } +// } else nextReachable.addLast(q) // Always add back if sum not zero +// } else { +// NDEPS[q]?.forEach { +// productive.add(it) +// UDEPS[it.LHS]?.forEach { st -> if (st !in productive) nextReachable.addLast(st) } +// } +// } +// } while (nextReachable.isNotEmpty()) +// +// println("Eliminated ${size - productive.size} unproductive productions in ${timer.elapsedNow()}") +// println("Resulting in ${productive.size} productions.") +// +// val QDEPS = +// ConcurrentHashMap<Σᐩ, ConcurrentLinkedQueue>(size).apply { +// productive.asSequence().asStream().parallel().forEach { +// getOrPut(it.LHS) { ConcurrentLinkedQueue() }.add(it) +// } +// } +// +// val done = mutableSetOf(START_SYMBOL) +// val nextProd: MutableList<Σᐩ> = mutableListOf(START_SYMBOL) +// val productiveAndReachable = mutableSetOf() +// +// do { +// val q = nextProd.removeFirst().also { done += it } +// 
QDEPS[q]?.forEach { it -> +// productiveAndReachable.add(it) +// it.RHS.forEach { if (it !in done) nextProd += it } +// } +// } while (nextProd.isNotEmpty()) +// +// println("Eliminated ${productive.size - productiveAndReachable.size} unreachable productions in ${timer.elapsedNow()}") +// println("Resulting in ${productiveAndReachable.size} productions.") +// +// return productiveAndReachable.freeze() +//} \ No newline at end of file
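The `dp` array in `levIntersect` above is an instance of the classical Bar-Hillel recognition scheme: `dp[p][q]` collects the nonterminals deriving some word that labels a path from state `p` to state `q`, seeded by unit productions on arcs and closed under binary productions through midpoints `r`. A language-agnostic Python sketch of the same decision procedure over an arbitrary NFA, using a naive fixpoint in place of the topologically ordered loop in the Kotlin code:

```python
def cfl_reg_nonempty(binary, unit, start, arcs, q0, finals, n):
    """Decide nonemptiness of CFL ∩ REG.
    binary: [(A, B, C)] for productions A -> B C (grammar in CNF)
    unit:   {terminal: [A, ...]} for productions A -> a
    arcs:   [(p, a, q)] transitions of an n-state automaton"""
    # dp[p][q] = nonterminals deriving some word labeling a path p ->* q
    dp = [[set() for _ in range(n)] for _ in range(n)]
    for p, a, q in arcs:                       # seed with unit productions
        dp[p][q].update(unit.get(a, ()))
    changed = True
    while changed:                             # close under A -> B C via midpoint r
        changed = False
        for p in range(n):
            for q in range(n):
                for A, B, C in binary:
                    if A not in dp[p][q] and any(
                        B in dp[p][r] and C in dp[r][q] for r in range(n)):
                        dp[p][q].add(A)
                        changed = True
    return any(start in dp[q0][f] for f in finals)

# Dyck-1 in CNF: S -> L R | L X | S S, X -> S R, L -> '(', R -> ')'
binary = [("S", "L", "R"), ("S", "L", "X"), ("X", "S", "R"), ("S", "S", "S")]
unit = {"(": ["L"], ")": ["R"]}

# A linear automaton accepting exactly "(())" intersects Dyck-1 nonemptily...
arcs = [(0, "(", 1), (1, "(", 2), (2, ")", 3), (3, ")", 4)]
print(cfl_reg_nonempty(binary, unit, "S", arcs, 0, {4}, 5))   # prints True

# ...while one accepting only the ill-bracketed "(()" does not.
arcs2 = [(0, "(", 1), (1, "(", 2), (2, ")", 3)]
print(cfl_reg_nonempty(binary, unit, "S", arcs2, 0, {3}, 4))  # prints False
```

In `levIntersect`, the automaton is the Levenshtein ball around the broken string, so a `True` answer means some string within the given edit radius is in the language; the `allPairs` midpoint sets and the `p < q` topological ordering are optimizations over this cubic fixpoint.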