A number of improvements
- HashMap and HashSet have been replaced by the implementation from antlr4ts. This required the addition of some comparators. On the other hand, the standard hash + equals helper functions could go.
- This improved cold start times in some cases significantly.
- RuleContext.toStringTree now accepts null for the ruleNames and Parser parameters.
- Updated the MySQL test parser grammar and test statements for latest server version.
- Fixed configuration for generated test cases so that tsc no longer complains about the code.
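
A minimal sketch of the comparator idea from the first item, assuming a set that receives its hash and equality logic from a separate comparator object instead of from methods on the stored elements. `ComparatorHashSet`, `SketchConfig` and `configComparator` are hypothetical names for illustration only; the real `HashSet` in `src/misc` is not shown in this commit excerpt and will differ.

```typescript
// Hypothetical illustration; not the runtime's actual HashSet implementation.
interface EqualityComparator<T> {
    hashCode(value: T): number;
    equals(a: T, b: T): boolean;
}

class ComparatorHashSet<T> {
    private buckets = new Map<number, T[]>();
    private count = 0;
    private comparator: EqualityComparator<T>;

    public constructor(comparator: EqualityComparator<T>) {
        this.comparator = comparator;
    }

    /** Returns the stored element equal to `value`, or adds `value` and returns it. */
    public getOrAdd(value: T): T {
        const hash = this.comparator.hashCode(value);
        let bucket = this.buckets.get(hash);
        if (bucket === undefined) {
            bucket = [];
            this.buckets.set(hash, bucket);
        }

        for (const candidate of bucket) {
            if (this.comparator.equals(candidate, value)) {
                return candidate;
            }
        }

        bucket.push(value);
        ++this.count;

        return value;
    }

    public contains(value: T): boolean {
        const bucket = this.buckets.get(this.comparator.hashCode(value));

        return bucket !== undefined && bucket.some((e) => { return this.comparator.equals(e, value); });
    }

    public get size(): number {
        return this.count;
    }
}

// Simplified stand-in for an ATN config: the context is deliberately ignored by the comparator.
interface SketchConfig { state: number; alt: number; context: string; }

const configComparator: EqualityComparator<SketchConfig> = {
    hashCode: (c) => {
        let hash = 7;
        hash = 31 * hash + c.state;
        hash = 31 * hash + c.alt;

        return hash;
    },
    equals: (a, b) => { return a.state === b.state && a.alt === b.alt; },
};

const lookup = new ComparatorHashSet<SketchConfig>(configComparator);
const first: SketchConfig = { state: 1, alt: 2, context: "a" };
const second: SketchConfig = { state: 1, alt: 2, context: "b" };

const added = lookup.getOrAdd(first);     // Newly stored, so the same object is returned.
const existing = lookup.getOrAdd(second); // Equal per comparator, so `first` is returned instead.
console.log(added === first, existing === first, lookup.size);
```

The identity check on the return value is the same pattern `ATNConfigSet.add` uses in this commit: only when `getOrAdd` hands back the passed-in element was it actually added.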

Signed-off-by: Mike Lischke <[email protected]>
mike-lischke committed Feb 15, 2024
1 parent f2df072 commit 3c048c0
Showing 35 changed files with 1,026 additions and 661 deletions.
38 changes: 20 additions & 18 deletions ReadMe.md
@@ -152,14 +152,14 @@ Last release (pure TypeScript):

| Test | Cold Run | Warm Run|
| ---- | -------- | ------- |
| Query Collection| 56252 ms | 326 ms |
| Example File | 1010 ms | 195 ms |
| Large Inserts | 13580 ms | 13677 ms |
| Total | 20359 ms | 14238 ms |
| Query Collection| 3230 ms | 311 ms |
| Example File | 459 ms | 185 ms |
| Large Inserts | 13092 ms | 13063 ms |
| Total | 16739 ms | 13584 ms |

The benchmarks consist of a set of query files, which are parsed by a MySQL parser. The MySQL grammar is one of the largest and most complex grammars you can find for ANTLR4, which, I think, makes it a perfect test case for parser tests.

The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.0.0.
The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.2.

The large binary inserts file contains only a few dozen queries, but they are really large with deep recursions, so they stress the prediction engine of the parser. In addition, one query contains binary (image) data containing input characters from the entire UTF-8 range.
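
How the cold and warm columns are measured is not part of this excerpt. A minimal sketch, assuming a cold run is the first pass with empty caches and a warm run repeats the same input once the caches are filled. `timeRun`, `parse` and the tiny `cache` below are hypothetical stand-ins, not the project's benchmark code or parser.

```typescript
// Hypothetical helper: measures the wall clock time of a single run in milliseconds.
const timeRun = (action: () => void): number => {
    const start = performance.now();
    action();

    return performance.now() - start;
};

// Stand-in for a parser whose cache (think: DFA states) is filled during the first run.
const cache = new Map<string, number>();
const parse = (input: string): number => {
    let result = cache.get(input);
    if (result === undefined) {
        result = input.split(";").length; // Pretend this is the expensive prediction work.
        cache.set(input, result);
    }

    return result;
};

const queries = ["select 1; select 2", "insert into t values (1)"];
const runAll = (): void => {
    queries.forEach((query) => { parse(query); });
};

const cold = timeRun(runAll); // Caches are empty here.
const warm = timeRun(runAll); // Same input again, caches are filled now.
console.log(`cold: ${cold.toFixed(3)} ms, warm: ${warm.toFixed(3)} ms`);
```

With a stand-in this small the two numbers are mostly noise; the gap only becomes visible when the first pass has real work to cache.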

@@ -172,19 +172,19 @@ Since the Java runtime tests have been ported to TypeScript there's another set
The original Java execution times have been taken on OS X with a 4 GHz Intel Core i7 (Java VM args: `-Xms2G -Xmx8g`):

```bash
loadNewUTF8 average time 356µs size 29191b over 3500 loads of 29191 symbols from Parser.java
loadNewUTF8 average time 75µs size 7552b over 3500 loads of 7552 symbols from RuleContext.java
loadNewUTF8 average time 122µs size 31784b over 3500 loads of 13379 symbols from udhr_hin.txt

lexNewJavaUTF8 average time 641µs over 2000 runs of 29191 symbols
lexNewJavaUTF8 average time 4987µs over 2000 runs of 29191 symbols DFA cleared

lexNewGraphemeUTF8 average time 13537µs over 400 runs of 6614 symbols from udhr_kor.txt
lexNewGraphemeUTF8 average time 13802µs over 400 runs of 6614 symbols from udhr_kor.txt DFA cleared
lexNewGraphemeUTF8 average time 18762µs over 400 runs of 13379 symbols from udhr_hin.txt
lexNewGraphemeUTF8 average time 18925µs over 400 runs of 13379 symbols from udhr_hin.txt DFA cleared
lexNewGraphemeUTF8 average time 340µs over 400 runs of 85 symbols from emoji.txt
lexNewGraphemeUTF8 average time 401µs over 400 runs of 85 symbols from emoji.txt DFA cleared
load_new_utf8 average time 232us size 131232b over 3500 loads of 29038 symbols from Parser.java
load_new_utf8 average time 69us size 32928b over 3500 loads of 7625 symbols from RuleContext.java
load_new_utf8 average time 210us size 65696b over 3500 loads of 13379 symbols from udhr_hin.txt

lex_new_java_utf8 average time 439us over 2000 runs of 29038 symbols
lex_new_java_utf8 average time 969us over 2000 runs of 29038 symbols DFA cleared

lex_new_grapheme_utf8 average time 4034us over 400 runs of 6614 symbols from udhr_kor.txt
lex_new_grapheme_utf8 average time 4173us over 400 runs of 6614 symbols from udhr_kor.txt DFA cleared
lex_new_grapheme_utf8 average time 7680us over 400 runs of 13379 symbols from udhr_hin.txt
lex_new_grapheme_utf8 average time 7946us over 400 runs of 13379 symbols from udhr_hin.txt DFA cleared
lex_new_grapheme_utf8 average time 70us over 400 runs of 85 symbols from emoji.txt
lex_new_grapheme_utf8 average time 82us over 400 runs of 85 symbols from emoji.txt DFA cleared
```

The execute times on last release of this runtime have been measured as:
@@ -205,6 +205,8 @@ The execute times on last release of this runtime have been measured as:
lexNewGraphemeUTF8 average time 387µs over 400 runs of 85 symbols from emoji.txt DFA cleared
```

Note: some of the corpus sizes differ, because of the test restructuring. In any case, the numbers cannot be compared directly, because different machines were used to take them.

## Release Notes

### 2.0.10
2 changes: 1 addition & 1 deletion package-lock.json


2 changes: 1 addition & 1 deletion src/Parser.ts
@@ -686,7 +686,7 @@ export abstract class Parser extends Recognizer<ParserATNSimulator> {
public dumpDFA(): void {
let seenOne = false;
for (const dfa of this.interpreter.decisionToDFA) {
if (dfa.states.length > 0) {
if (dfa.length > 0) {
if (seenOne) {
console.log();
}
8 changes: 4 additions & 4 deletions src/RuleContext.ts
@@ -155,14 +155,14 @@ export class RuleContext implements ParseTree {
* Print out a whole tree, not just a node, in LISP format
* (root child1 .. childN). Print just a node if this is a leaf.
*/
public toStringTree(recog: Parser): string;
public toStringTree(ruleNames: string[], recog: Parser): string;
public toStringTree(recog: Parser | null): string;
public toStringTree(ruleNames: string[] | null, recog: Parser): string;
public toStringTree(...args: unknown[]): string {
if (args.length === 1) {
return Trees.toStringTree(this, null, args[0] as Parser);
return Trees.toStringTree(this, null, args[0] as Parser | null);
}

return Trees.toStringTree(this, args[0] as string[], args[1] as Parser);
return Trees.toStringTree(this, args[0] as string[] | null, args[1] as Parser);
}

public toString(ruleNames?: string[] | null, stop?: RuleContext | null): string {
19 changes: 0 additions & 19 deletions src/atn/ATNConfig.ts
@@ -170,25 +170,6 @@ export class ATNConfig {
this.precedenceFilterSuppressed === other.precedenceFilterSuppressed;
}

public hashCodeForConfigSet(): number {
let hashCode = 7;
hashCode = 31 * hashCode + this.state.stateNumber;
hashCode = 31 * hashCode + this.alt;
hashCode = 31 * hashCode + this.semanticContext.hashCode();

return hashCode;
}

public equalsForConfigSet(other: ATNConfig): boolean {
if (this === other) {
return true;
}

return this.state.stateNumber === other.state.stateNumber &&
this.alt === other.alt &&
this.semanticContext.equals(other.semanticContext);
}

public toString(_recog?: Recognizer<ATNSimulator> | null, showAlt = true): string {
let alt = "";
if (showAlt) {
55 changes: 35 additions & 20 deletions src/atn/ATNConfigSet.ts
@@ -4,12 +4,11 @@
* can be found in the LICENSE.txt file in the project root.
*/

/* eslint-disable jsdoc/require-returns, jsdoc/require-param */
/* eslint-disable jsdoc/require-returns, jsdoc/require-param, max-classes-per-file */

import { ATN } from "./ATN.js";
import { SemanticContext } from "./SemanticContext.js";
import { merge } from "./PredictionContextUtils.js";
import { HashSet } from "../misc/HashSet.js";

import { equalArrays, arrayToString } from "../utils/helpers.js";
import { ATNConfig } from "./ATNConfig.js";
@@ -19,18 +18,31 @@ import { PredictionContext } from "./PredictionContext.js";
import { ATNState } from "./ATNState.js";
import { ATNSimulator } from "./ATNSimulator.js";
import { MurmurHash } from "../utils/MurmurHash.js";
import { HashSet } from "../misc/HashSet.js";
import type { EqualityComparator } from "../misc/EqualityComparator.js";

const hashATNConfig = (c: ATNConfig) => {
return c.hashCodeForConfigSet();
};
class KeyTypeEqualityComparer implements EqualityComparator<ATNConfig> {
public static readonly instance = new KeyTypeEqualityComparer();

const equalATNConfigs = (a: ATNConfig, b: ATNConfig): boolean => {
if (a === b) {
return true;
} else if (a === null || b === null) {
return false;
} else { return a.equalsForConfigSet(b); }
};
public hashCode(config: ATNConfig) {
let hashCode = 7;
hashCode = 31 * hashCode + config.state.stateNumber;
hashCode = 31 * hashCode + config.alt;
hashCode = 31 * hashCode + config.semanticContext.hashCode();

return hashCode;
}

public equals(a: ATNConfig, b: ATNConfig) {
if (a === b) {
return true;
}

return a.state.stateNumber === b.state.stateNumber &&
a.alt === b.alt &&
a.semanticContext.equals(b.semanticContext);
}
}

/**
* Specialized {@link HashSet}`<`{@link ATNConfig}`>` that can track
@@ -50,7 +62,8 @@ export class ATNConfigSet {
* All configs but hashed by (s, i, _, pi) not including context. Wiped out
* when we go readonly as this set becomes a DFA state
*/
public configLookup: HashSet<ATNConfig> | null = new HashSet<ATNConfig>(hashATNConfig, equalATNConfigs);
public configLookup: HashSet<ATNConfig> | null =
new HashSet<ATNConfig>(KeyTypeEqualityComparer.instance);

// Track the elements as they are added to the set; supports get(i).
public configs: ATNConfig[] = [];
@@ -129,7 +142,7 @@ export class ATNConfigSet {
}

//config.useSimpleHash = true;
const existing = this.configLookup!.add(config);
const existing = this.configLookup!.getOrAdd(config);
if (existing === config) {
this.#cachedHashCode = -1;
this.configs.push(config); // track order here
@@ -206,7 +219,7 @@ export class ATNConfigSet {
throw new Error("This set is readonly");
}

if (this.configLookup!.length === 0) {
if (this.configLookup!.size === 0) {
return;
}

@@ -232,11 +245,13 @@ export class ATNConfigSet {
this.uniqueAlt === other.uniqueAlt &&
this.conflictingAlts === other.conflictingAlts &&
this.hasSemanticContext === other.hasSemanticContext &&
this.dipsIntoOuterContext === other.dipsIntoOuterContext) {
this.dipsIntoOuterContext === other.dipsIntoOuterContext &&
equalArrays(this.configs, other.configs)) {

return true;
}

return equalArrays(this.configs, other.configs);
return false;
}

public hashCode(): number {
@@ -264,15 +279,15 @@ export class ATNConfigSet {
throw new Error("This method is not implemented for readonly sets.");
}

return this.configLookup.has(item);
return this.configLookup.contains(item);
}

public containsFast(item: ATNConfig): boolean {
if (this.configLookup === null) {
throw new Error("This method is not implemented for readonly sets.");
}

return this.configLookup.has(item);
return this.configLookup.contains(item);
}

public clear(): void {
@@ -281,7 +296,7 @@ export class ATNConfigSet {
}
this.configs = [];
this.#cachedHashCode = -1;
this.configLookup = new HashSet();
this.configLookup = new HashSet(KeyTypeEqualityComparer.instance);
}

public setReadonly(readOnly: boolean): void {
5 changes: 3 additions & 2 deletions src/atn/ATNSimulator.ts
@@ -6,10 +6,11 @@

import { DFAState } from "../dfa/DFAState.js";
import { getCachedPredictionContext } from "./PredictionContextUtils.js";
import { HashMap } from "../misc/HashMap.js";
import { ATN } from "./ATN.js";
import { PredictionContextCache } from "./PredictionContextCache.js";
import { PredictionContext } from "./PredictionContext.js";
import { HashMap } from "../misc/HashMap.js";
import { ObjectEqualityComparator } from "../misc/ObjectEqualityComparator.js";

export abstract class ATNSimulator {
/** Must distinguish between missing edge and edge we know leads nowhere */
@@ -68,7 +69,7 @@ export abstract class ATNSimulator {
if (this.sharedContextCache === null) {
return context;
}
const visited = new HashMap<PredictionContext, PredictionContext>();
const visited = new HashMap<PredictionContext, PredictionContext>(ObjectEqualityComparator.instance);

return getCachedPredictionContext(context, this.sharedContextCache, visited);
}
7 changes: 0 additions & 7 deletions src/atn/LexerATNConfig.ts
@@ -74,11 +74,4 @@ export class LexerATNConfig extends ATNConfig {

}

public override hashCodeForConfigSet(): number {
return this.hashCode();
}

public override equalsForConfigSet(other: LexerATNConfig): boolean {
return this.equals(other);
}
}
8 changes: 3 additions & 5 deletions src/atn/LexerATNSimulator.ts
@@ -599,17 +599,15 @@ export class LexerATNSimulator extends ATNSimulator {
}

const dfa = this.decisionToDFA[this.mode];
const existing = dfa.states.get(proposed);
const existing = dfa.getState(proposed);
if (existing !== null) {
return existing;
}

const newState = proposed;
newState.stateNumber = dfa.states.length;
configs.setReadonly(true);
dfa.states.add(newState);
dfa.addState(proposed);

return newState;
return proposed;
}
}

2 changes: 1 addition & 1 deletion src/atn/ParseInfo.ts
@@ -166,7 +166,7 @@ export class ParseInfo {
} else {
const decisionToDFA = this.atnSimulator.decisionToDFA[decision];

return decisionToDFA.states.length;
return decisionToDFA.length;
}
}

10 changes: 5 additions & 5 deletions src/atn/ParserATNSimulator.ts
@@ -1282,7 +1282,7 @@ export class ParserATNSimulator extends ATNSimulator {
}

c.reachesIntoOuterContext += 1;
if (closureBusy.add(c) !== c) {
if (closureBusy.getOrAdd(c) !== c) {
// avoid infinite recursion for right-recursive rules
continue;
}
@@ -1291,10 +1291,11 @@ export class ParserATNSimulator extends ATNSimulator {
configs.dipsIntoOuterContext = true;
newDepth -= 1;
} else {
if (!t.isEpsilon && closureBusy.add(c) !== c) {
if (!t.isEpsilon && closureBusy.getOrAdd(c) !== c) {
// avoid infinite recursion for EOF* and EOF+
continue;
}

if (t instanceof RuleTransition) {
// latch when newDepth goes negative - once we step out of the entry context we can't return
if (newDepth >= 0) {
@@ -1579,18 +1580,17 @@ export class ParserATNSimulator extends ATNSimulator {
if (D === ATNSimulator.ERROR) {
return D;
}
const existing = dfa.states.get(D);
const existing = dfa.getState(D);
if (existing !== null) {
return existing;
}

D.stateNumber = dfa.states.length;
if (!D.configs.readOnly) {
D.configs.optimizeConfigs(this);
D.configs.setReadonly(true);
}

dfa.states.add(D);
dfa.addState(D);

return D;
}
5 changes: 3 additions & 2 deletions src/atn/PredictionContextCache.ts
@@ -6,14 +6,15 @@

import { PredictionContext } from "./PredictionContext.js";
import { HashMap } from "../misc/HashMap.js";
import { ObjectEqualityComparator } from "../misc/ObjectEqualityComparator.js";

/**
* Used to cache {@link PredictionContext} objects. It's used for the shared
* context cache associated with contexts in DFA states. This cache
* can be used for both lexers and parsers.
*/
export class PredictionContextCache {
private cache = new HashMap<PredictionContext, PredictionContext>();
private cache = new HashMap<PredictionContext, PredictionContext>(ObjectEqualityComparator.instance);

/**
* Add a context to the cache and return it. If the context already exists,
@@ -41,6 +42,6 @@ export class PredictionContextCache {
}

public get length(): number {
return this.cache.length;
return this.cache.size;
}
}