A number of improvements

- HashMap and HashSet have been replaced by the implementation from antlr4ts. This required the addition of some comparators. On the other hand, the standard hash + equals helper functions could go.
- This significantly improved cold start times in some cases.
- RuleContext.toStringTree now accepts null for the ruleNames and Parser parameters.
- Updated the MySQL test parser grammar and test statements for the latest server version.
- Fixed the configuration for generated test cases so that tsc no longer complains about the code.

Signed-off-by: Mike Lischke <[email protected]>
mike-lischke · Feb 15, 2024 · 3c048c0
1 parent f2df072
commit 3c048c0

Showing 35 changed files with 1,026 additions and 661 deletions.
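
The comparator-based sets mentioned in the commit message can be pictured with a small sketch. It is not the runtime's actual `HashSet`; `Config`, `ConfigComparator` and `SimpleHashSet` are hypothetical, simplified stand-ins, and only the general idea (hashing and equality supplied by a comparator object, plus a `getOrAdd` that returns the element already stored) is taken from the diff below.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface EqualityComparator<T> {
    hashCode(obj: T): number;
    equals(a: T, b: T): boolean;
}

interface Config {
    state: number;
    alt: number;
    context: string; // Deliberately ignored by the comparator below.
}

// Hashes and compares configs by (state, alt) only, similar in spirit
// to the key-type comparer used for the config lookup.
class ConfigComparator implements EqualityComparator<Config> {
    public static readonly instance = new ConfigComparator();

    public hashCode(c: Config): number {
        let hash = 7;
        hash = (31 * hash + c.state) | 0;
        hash = (31 * hash + c.alt) | 0;

        return hash;
    }

    public equals(a: Config, b: Config): boolean {
        return a === b || (a.state === b.state && a.alt === b.alt);
    }
}

// A minimal bucketed set: one array of entries per hash code.
class SimpleHashSet<T> {
    private buckets = new Map<number, T[]>();
    private count = 0;
    private comparator: EqualityComparator<T>;

    public constructor(comparator: EqualityComparator<T>) {
        this.comparator = comparator;
    }

    public get size(): number {
        return this.count;
    }

    // Returns the stored element equal to `item`, or adds `item` and returns it.
    public getOrAdd(item: T): T {
        const hash = this.comparator.hashCode(item);
        let bucket = this.buckets.get(hash);
        if (bucket === undefined) {
            bucket = [];
            this.buckets.set(hash, bucket);
        }

        for (const existing of bucket) {
            if (this.comparator.equals(existing, item)) {
                return existing;
            }
        }

        bucket.push(item);
        ++this.count;

        return item;
    }

    public contains(item: T): boolean {
        const bucket = this.buckets.get(this.comparator.hashCode(item));

        return bucket !== undefined && bucket.some((e) => this.comparator.equals(e, item));
    }
}

const configSet = new SimpleHashSet<Config>(ConfigComparator.instance);
const firstConfig: Config = { state: 1, alt: 2, context: "a" };
const secondConfig: Config = { state: 1, alt: 2, context: "b" };

const added = configSet.getOrAdd(firstConfig);   // Not present yet, so it is stored.
const merged = configSet.getOrAdd(secondConfig); // Equal by (state, alt): the stored one comes back.
```

A caller can tell whether an element was new by comparing the return value with the argument, which is the pattern `ATNConfigSet.add` and the `closureBusy` checks rely on after this change.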
diff --git a/ReadMe.md b/ReadMe.md
@@ -152,14 +152,14 @@ Last release (pure TypeScript):
 
 | Test | Cold Run | Warm Run|
 | ---- | -------- | ------- |
-| Query Collection| 56252 ms | 326 ms |
-| Example File | 1010 ms | 195 ms |
-| Large Inserts | 13580 ms | 13677 ms |
-| Total | 20359 ms | 14238 ms |
+| Query Collection| 3230 ms | 311 ms |
+| Example File | 459 ms | 185 ms |
+| Large Inserts | 13092 ms | 13063 ms |
+| Total | 16739 ms | 13584 ms |
 
 The benchmarks consist of a set of query files, which are parsed by a MySQL parser. The MySQL grammar is one of the largest and most complex grammars you can find for ANTLR4, which, I think, makes it a perfect test case for parser tests.
 
-The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.0.0.
+The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.2.
 
 The large binary inserts file contains only a few dozen queries, but they are really large with deep recursions, so they stress the prediction engine of the parser. In addition, one query contains binary (image) data containing input characters from the entire UTF-8 range.
 
@@ -172,19 +172,19 @@ Since the Java runtime tests have been ported to TypeScript there's another set
 The original Java execution times have been taken on OS X with a 4 GHz Intel Core i7 (Java VM args: `-Xms2G -Xmx8g`):
 
 ```bash
-                loadNewUTF8 average time   356µs size  29191b over 3500 loads of 29191 symbols from Parser.java
-                loadNewUTF8 average time    75µs size   7552b over 3500 loads of  7552 symbols from RuleContext.java
-                loadNewUTF8 average time   122µs size  31784b over 3500 loads of 13379 symbols from udhr_hin.txt
-
-             lexNewJavaUTF8 average time   641µs over 2000 runs of 29191 symbols
-             lexNewJavaUTF8 average time  4987µs over 2000 runs of 29191 symbols DFA cleared
-
-         lexNewGraphemeUTF8 average time 13537µs over  400 runs of  6614 symbols from udhr_kor.txt
-         lexNewGraphemeUTF8 average time 13802µs over  400 runs of  6614 symbols from udhr_kor.txt DFA cleared
-         lexNewGraphemeUTF8 average time 18762µs over  400 runs of 13379 symbols from udhr_hin.txt
-         lexNewGraphemeUTF8 average time 18925µs over  400 runs of 13379 symbols from udhr_hin.txt DFA cleared
-         lexNewGraphemeUTF8 average time   340µs over  400 runs of    85 symbols from emoji.txt
-         lexNewGraphemeUTF8 average time   401µs over  400 runs of    85 symbols from emoji.txt DFA cleared
+                load_new_utf8 average time   232us size 131232b over 3500 loads of 29038 symbols from Parser.java
+                load_new_utf8 average time    69us size  32928b over 3500 loads of  7625 symbols from RuleContext.java
+                load_new_utf8 average time   210us size  65696b over 3500 loads of 13379 symbols from udhr_hin.txt
+
+            lex_new_java_utf8 average time   439us over 2000 runs of 29038 symbols
+            lex_new_java_utf8 average time   969us over 2000 runs of 29038 symbols DFA cleared
+
+        lex_new_grapheme_utf8 average time  4034us over  400 runs of  6614 symbols from udhr_kor.txt
+        lex_new_grapheme_utf8 average time  4173us over  400 runs of  6614 symbols from udhr_kor.txt DFA cleared
+        lex_new_grapheme_utf8 average time  7680us over  400 runs of 13379 symbols from udhr_hin.txt
+        lex_new_grapheme_utf8 average time  7946us over  400 runs of 13379 symbols from udhr_hin.txt DFA cleared
+        lex_new_grapheme_utf8 average time    70us over  400 runs of    85 symbols from emoji.txt
+        lex_new_grapheme_utf8 average time    82us over  400 runs of    85 symbols from emoji.txt DFA cleared
 ```
 
 The execute times on last release of this runtime have been measured as:
@@ -205,6 +205,8 @@ The execute times on last release of this runtime have been measured as:
          lexNewGraphemeUTF8 average time   387µs over  400 runs of    85 symbols from emoji.txt DFA cleared
 ```
 
+Note: some of the corpus sizes differ, because of the test restructuring. In any case, the numbers cannot be compared directly, because different machines were used to take them.
+
 ## Release Notes
 
 ### 2.0.10

diff --git a/package-lock.json b/package-lock.json
diff --git a/src/Parser.ts b/src/Parser.ts
@@ -686,7 +686,7 @@ export abstract class Parser extends Recognizer<ParserATNSimulator> {
     public dumpDFA(): void {
         let seenOne = false;
         for (const dfa of this.interpreter.decisionToDFA) {
-            if (dfa.states.length > 0) {
+            if (dfa.length > 0) {
                 if (seenOne) {
                     console.log();
                 }

diff --git a/src/RuleContext.ts b/src/RuleContext.ts
@@ -155,14 +155,14 @@ export class RuleContext implements ParseTree {
      * Print out a whole tree, not just a node, in LISP format
      * (root child1 .. childN). Print just a node if this is a leaf.
      */
-    public toStringTree(recog: Parser): string;
-    public toStringTree(ruleNames: string[], recog: Parser): string;
+    public toStringTree(recog: Parser | null): string;
+    public toStringTree(ruleNames: string[] | null, recog: Parser): string;
     public toStringTree(...args: unknown[]): string {
         if (args.length === 1) {
-            return Trees.toStringTree(this, null, args[0] as Parser);
+            return Trees.toStringTree(this, null, args[0] as Parser | null);
         }
 
-        return Trees.toStringTree(this, args[0] as string[], args[1] as Parser);
+        return Trees.toStringTree(this, args[0] as string[] | null, args[1] as Parser);
     }
 
     public toString(ruleNames?: string[] | null, stop?: RuleContext | null): string {

diff --git a/src/atn/ATNConfig.ts b/src/atn/ATNConfig.ts
@@ -170,25 +170,6 @@ export class ATNConfig {
             this.precedenceFilterSuppressed === other.precedenceFilterSuppressed;
     }
 
-    public hashCodeForConfigSet(): number {
-        let hashCode = 7;
-        hashCode = 31 * hashCode + this.state.stateNumber;
-        hashCode = 31 * hashCode + this.alt;
-        hashCode = 31 * hashCode + this.semanticContext.hashCode();
-
-        return hashCode;
-    }
-
-    public equalsForConfigSet(other: ATNConfig): boolean {
-        if (this === other) {
-            return true;
-        }
-
-        return this.state.stateNumber === other.state.stateNumber &&
-            this.alt === other.alt &&
-            this.semanticContext.equals(other.semanticContext);
-    }
-
     public toString(_recog?: Recognizer<ATNSimulator> | null, showAlt = true): string {
         let alt = "";
         if (showAlt) {

diff --git a/src/atn/ATNConfigSet.ts b/src/atn/ATNConfigSet.ts
@@ -4,12 +4,11 @@
  * can be found in the LICENSE.txt file in the project root.
  */
 
-/* eslint-disable jsdoc/require-returns, jsdoc/require-param */
+/* eslint-disable jsdoc/require-returns, jsdoc/require-param, max-classes-per-file */
 
 import { ATN } from "./ATN.js";
 import { SemanticContext } from "./SemanticContext.js";
 import { merge } from "./PredictionContextUtils.js";
-import { HashSet } from "../misc/HashSet.js";
 
 import { equalArrays, arrayToString } from "../utils/helpers.js";
 import { ATNConfig } from "./ATNConfig.js";
@@ -19,18 +18,31 @@ import { PredictionContext } from "./PredictionContext.js";
 import { ATNState } from "./ATNState.js";
 import { ATNSimulator } from "./ATNSimulator.js";
 import { MurmurHash } from "../utils/MurmurHash.js";
+import { HashSet } from "../misc/HashSet.js";
+import type { EqualityComparator } from "../misc/EqualityComparator.js";
 
-const hashATNConfig = (c: ATNConfig) => {
-    return c.hashCodeForConfigSet();
-};
+class KeyTypeEqualityComparer implements EqualityComparator<ATNConfig> {
+    public static readonly instance = new KeyTypeEqualityComparer();
 
-const equalATNConfigs = (a: ATNConfig, b: ATNConfig): boolean => {
-    if (a === b) {
-        return true;
-    } else if (a === null || b === null) {
-        return false;
-    } else { return a.equalsForConfigSet(b); }
-};
+    public hashCode(config: ATNConfig) {
+        let hashCode = 7;
+        hashCode = 31 * hashCode + config.state.stateNumber;
+        hashCode = 31 * hashCode + config.alt;
+        hashCode = 31 * hashCode + config.semanticContext.hashCode();
+
+        return hashCode;
+    }
+
+    public equals(a: ATNConfig, b: ATNConfig) {
+        if (a === b) {
+            return true;
+        }
+
+        return a.state.stateNumber === b.state.stateNumber &&
+            a.alt === b.alt &&
+            a.semanticContext.equals(b.semanticContext);
+    }
+}
 
 /**
  * Specialized {@link HashSet}`<`{@link ATNConfig}`>` that can track
@@ -50,7 +62,8 @@ export class ATNConfigSet {
      * All configs but hashed by (s, i, _, pi) not including context. Wiped out
      * when we go readonly as this set becomes a DFA state
      */
-    public configLookup: HashSet<ATNConfig> | null = new HashSet<ATNConfig>(hashATNConfig, equalATNConfigs);
+    public configLookup: HashSet<ATNConfig> | null =
+        new HashSet<ATNConfig>(KeyTypeEqualityComparer.instance);
 
     // Track the elements as they are added to the set; supports get(i).
     public configs: ATNConfig[] = [];
@@ -129,7 +142,7 @@ export class ATNConfigSet {
         }
 
         //config.useSimpleHash = true;
-        const existing = this.configLookup!.add(config);
+        const existing = this.configLookup!.getOrAdd(config);
         if (existing === config) {
             this.#cachedHashCode = -1;
             this.configs.push(config); // track order here
@@ -206,7 +219,7 @@ export class ATNConfigSet {
             throw new Error("This set is readonly");
         }
 
-        if (this.configLookup!.length === 0) {
+        if (this.configLookup!.size === 0) {
             return;
         }
 
@@ -232,11 +245,13 @@ export class ATNConfigSet {
             this.uniqueAlt === other.uniqueAlt &&
             this.conflictingAlts === other.conflictingAlts &&
             this.hasSemanticContext === other.hasSemanticContext &&
-            this.dipsIntoOuterContext === other.dipsIntoOuterContext) {
+            this.dipsIntoOuterContext === other.dipsIntoOuterContext &&
+            equalArrays(this.configs, other.configs)) {
+
             return true;
         }
 
-        return equalArrays(this.configs, other.configs);
+        return false;
     }
 
     public hashCode(): number {
@@ -264,15 +279,15 @@ export class ATNConfigSet {
             throw new Error("This method is not implemented for readonly sets.");
         }
 
-        return this.configLookup.has(item);
+        return this.configLookup.contains(item);
     }
 
     public containsFast(item: ATNConfig): boolean {
         if (this.configLookup === null) {
             throw new Error("This method is not implemented for readonly sets.");
         }
 
-        return this.configLookup.has(item);
+        return this.configLookup.contains(item);
     }
 
     public clear(): void {
@@ -281,7 +296,7 @@ export class ATNConfigSet {
         }
         this.configs = [];
         this.#cachedHashCode = -1;
-        this.configLookup = new HashSet();
+        this.configLookup = new HashSet(KeyTypeEqualityComparer.instance);
     }
 
     public setReadonly(readOnly: boolean): void {

diff --git a/src/atn/ATNSimulator.ts b/src/atn/ATNSimulator.ts
@@ -6,10 +6,11 @@
 
 import { DFAState } from "../dfa/DFAState.js";
 import { getCachedPredictionContext } from "./PredictionContextUtils.js";
-import { HashMap } from "../misc/HashMap.js";
 import { ATN } from "./ATN.js";
 import { PredictionContextCache } from "./PredictionContextCache.js";
 import { PredictionContext } from "./PredictionContext.js";
+import { HashMap } from "../misc/HashMap.js";
+import { ObjectEqualityComparator } from "../misc/ObjectEqualityComparator.js";
 
 export abstract class ATNSimulator {
     /** Must distinguish between missing edge and edge we know leads nowhere */
@@ -68,7 +69,7 @@ export abstract class ATNSimulator {
         if (this.sharedContextCache === null) {
             return context;
         }
-        const visited = new HashMap<PredictionContext, PredictionContext>();
+        const visited = new HashMap<PredictionContext, PredictionContext>(ObjectEqualityComparator.instance);
 
         return getCachedPredictionContext(context, this.sharedContextCache, visited);
     }

diff --git a/src/atn/LexerATNConfig.ts b/src/atn/LexerATNConfig.ts
@@ -74,11 +74,4 @@ export class LexerATNConfig extends ATNConfig {
 
     }
 
-    public override hashCodeForConfigSet(): number {
-        return this.hashCode();
-    }
-
-    public override equalsForConfigSet(other: LexerATNConfig): boolean {
-        return this.equals(other);
-    }
 }
diff --git a/src/atn/LexerATNSimulator.ts b/src/atn/LexerATNSimulator.ts
@@ -599,17 +599,15 @@ export class LexerATNSimulator extends ATNSimulator {
         }
 
         const dfa = this.decisionToDFA[this.mode];
-        const existing = dfa.states.get(proposed);
+        const existing = dfa.getState(proposed);
         if (existing !== null) {
             return existing;
         }
 
-        const newState = proposed;
-        newState.stateNumber = dfa.states.length;
         configs.setReadonly(true);
-        dfa.states.add(newState);
+        dfa.addState(proposed);
 
-        return newState;
+        return proposed;
     }
 }
 

diff --git a/src/atn/ParseInfo.ts b/src/atn/ParseInfo.ts
@@ -166,7 +166,7 @@ export class ParseInfo {
         } else {
             const decisionToDFA = this.atnSimulator.decisionToDFA[decision];
 
-            return decisionToDFA.states.length;
+            return decisionToDFA.length;
         }
     }
 

diff --git a/src/atn/ParserATNSimulator.ts b/src/atn/ParserATNSimulator.ts
@@ -1282,7 +1282,7 @@ export class ParserATNSimulator extends ATNSimulator {
                     }
 
                     c.reachesIntoOuterContext += 1;
-                    if (closureBusy.add(c) !== c) {
+                    if (closureBusy.getOrAdd(c) !== c) {
                         // avoid infinite recursion for right-recursive rules
                         continue;
                     }
@@ -1291,10 +1291,11 @@ export class ParserATNSimulator extends ATNSimulator {
                     configs.dipsIntoOuterContext = true;
                     newDepth -= 1;
                 } else {
-                    if (!t.isEpsilon && closureBusy.add(c) !== c) {
+                    if (!t.isEpsilon && closureBusy.getOrAdd(c) !== c) {
                         // avoid infinite recursion for EOF* and EOF+
                         continue;
                     }
+
                     if (t instanceof RuleTransition) {
                         // latch when newDepth goes negative - once we step out of the entry context we can't return
                         if (newDepth >= 0) {
@@ -1579,18 +1580,17 @@ export class ParserATNSimulator extends ATNSimulator {
         if (D === ATNSimulator.ERROR) {
             return D;
         }
-        const existing = dfa.states.get(D);
+        const existing = dfa.getState(D);
         if (existing !== null) {
             return existing;
         }
 
-        D.stateNumber = dfa.states.length;
         if (!D.configs.readOnly) {
             D.configs.optimizeConfigs(this);
             D.configs.setReadonly(true);
         }
 
-        dfa.states.add(D);
+        dfa.addState(D);
 
         return D;
     }

diff --git a/src/atn/PredictionContextCache.ts b/src/atn/PredictionContextCache.ts
@@ -6,14 +6,15 @@
 
 import { PredictionContext } from "./PredictionContext.js";
 import { HashMap } from "../misc/HashMap.js";
+import { ObjectEqualityComparator } from "../misc/ObjectEqualityComparator.js";
 
 /**
  * Used to cache {@link PredictionContext} objects. Its used for the shared
  * context cash associated with contexts in DFA states. This cache
  * can be used for both lexers and parsers.
  */
 export class PredictionContextCache {
-    private cache = new HashMap<PredictionContext, PredictionContext>();
+    private cache = new HashMap<PredictionContext, PredictionContext>(ObjectEqualityComparator.instance);
 
     /**
      * Add a context to the cache and return it. If the context already exists,
@@ -41,6 +42,6 @@ export class PredictionContextCache {
     }
 
     public get length(): number {
-        return this.cache.length;
+        return this.cache.size;
     }
 }