Now measuring lexing and parsing time separately.
Also added measured time for all available TS runtimes, for better comparison.

Signed-off-by: Mike Lischke <[email protected]>
mike-lischke committed Feb 24, 2024
1 parent 537d815 commit 67b2cbf
Showing 10 changed files with 136 additions and 134 deletions.
64 changes: 30 additions & 34 deletions ReadMe.md
Original file line number Diff line number Diff line change
Expand Up @@ -137,29 +137,25 @@ This suite consists of 530 tests and runs in about 9s.

### Real World Example

The following tables show the results of the benchmarks previously run on the old JS runtime and on the last release of this one. Warm times were taken from 5 runs, with the 2 slowest discarded and the rest averaged.
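The averaging scheme for the warm numbers can be sketched as follows; the helper name is an assumption, not part of the benchmark suite:

```typescript
// Hypothetical helper mirroring the warm-run scheme described above:
// take the times of 5 runs, discard the 2 slowest, and average the rest.
function warmAverage(runsMs: number[], discardSlowest = 2): number {
    const kept = [...runsMs]
        .sort((a, b) => a - b) // Fastest first.
        .slice(0, runsMs.length - discardSlowest);

    return kept.reduce((sum, t) => sum + t, 0) / kept.length;
}
```

For example, runs of 283 ms, 290 ms, 281 ms, 310 ms and 276 ms keep 276, 281 and 283 and average to 280 ms.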

Pure JavaScript release (with type definitions):

| Test | Cold Run | Warm Run|
| ---- | -------- | ------- |
| Query Collection| 8464 ms | 230 ms |
| Example File | 1043 ms | 112 ms |
| Large Inserts | 11022 ms | 10616 ms |
| Total | 20599 ms | 10978 ms |

Last release (pure TypeScript):

| Test | Cold Run | Warm Run|
| ---- | -------- | ------- |
| Query Collection| 2777 ms | 283 ms |
| Example File | 413 ms | 172 ms |
| Large Inserts | 11056 ms | 10939 ms |
| Total | 14315 ms | 11421 ms |
The following table shows the results of the benchmarks that were executed in the [antlr4wasm project](https://github.com/mike-lischke/antlr4wasm/tree/master/benchmarks/mysql). The column for antlr4ng, however, contains the current results of this runtime.

| |C++ |antlr4ng|antlr4|antlr4ts|antlr4wasm|
|---:|---:|---:|---:|---:|---:|
|Query Collection (cold)|1340 ms| <u>2703</u> ms| 7984 ms| 3402 ms| 3331 ms|
| Bitrix Queries (cold)| 195 ms| <u>409</u> ms| 1134 ms| 444 ms| 998 ms|
| Large Inserts (cold)|4981 ms|10966 ms|<u>10695</u> ms|11483 ms|34243 ms|
|Query Collection (warm)| 133 ms| 283 ms| <u>223</u> ms| 259 ms| 1177 ms|
| Bitrix Queries (warm)| 70 ms| 173 ms| <u>110</u> ms| 131 ms| 815 ms|
| Large Inserts (warm)|4971 ms|10859 ms|<u>10593</u> ms|11287 ms|36317 ms|
|||||||
|Total (cold) |6546 ms|<u>14137</u> ms|19878 ms|15403 ms|38641 ms|
|Total (warm) |5198 ms|11339 ms|<u>10944</u> ms|11697 ms|38329 ms|

Underlined entries mark the smallest values (not counting C++, which beats them all).

The benchmarks consist of a set of query files, which are parsed by a MySQL parser. The MySQL grammar is one of the largest and most complex grammars you can find for ANTLR4, which, I think, makes it a perfect test case for parsers.

The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.2.
The query collection file contains more than 900 MySQL queries of all kinds, from very simple comments-only statements to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The used MySQL server version was 8.2 (the grammar allows dynamic switching of server versions).
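One way to work around the stack-space limit mentioned above is to run the parse in a Node.js worker with an enlarged stack; a sketch (the 16 MB figure and the inline stand-in script are assumptions):

```typescript
import { Worker } from "node:worker_threads";

// Deeply nested selects can exhaust the default worker stack (4 MB in
// Node.js), so give the parsing worker a larger one via resourceLimits.
// The inline script here just stands in for the actual parse job.
const worker = new Worker(
    "require('node:worker_threads').parentPort.postMessage('done');",
    { eval: true, resourceLimits: { stackSizeMb: 16 } },
);

worker.on("message", (msg: string) => {
    console.log(`parse worker finished: ${msg}`);
});
```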

The large binary inserts file contains only a few dozen queries, but they are really large with deep recursions, so they stress the prediction engine of the parser. In addition, one query contains binary (image) data containing input characters from the entire UTF-8 range.

Expand All @@ -172,19 +168,19 @@ Since the Java runtime tests have been ported to TypeScript, there's another set
The original Java execution times have been taken on OS X with a 4 GHz Intel Core i7 (Java VM args: `-Xms2G -Xmx8g`):

```bash
load_new_utf8 average time 232us size 131232b over 3500 loads of 29038 symbols from Parser.java
load_new_utf8 average time 69us size 32928b over 3500 loads of 7625 symbols from RuleContext.java
load_new_utf8 average time 210us size 65696b over 3500 loads of 13379 symbols from udhr_hin.txt

lex_new_java_utf8 average time 439us over 2000 runs of 29038 symbols
lex_new_java_utf8 average time 969us over 2000 runs of 29038 symbols DFA cleared

lex_new_grapheme_utf8 average time 4034us over 400 runs of 6614 symbols from udhr_kor.txt
lex_new_grapheme_utf8 average time 4173us over 400 runs of 6614 symbols from udhr_kor.txt DFA cleared
lex_new_grapheme_utf8 average time 7680us over 400 runs of 13379 symbols from udhr_hin.txt
lex_new_grapheme_utf8 average time 7946us over 400 runs of 13379 symbols from udhr_hin.txt DFA cleared
lex_new_grapheme_utf8 average time 70us over 400 runs of 85 symbols from emoji.txt
lex_new_grapheme_utf8 average time 82us over 400 runs of 85 symbols from emoji.txt DFA cleared
load_new_utf8 average time 232µs size 131232b over 3500 loads of 29038 symbols from Parser.java
load_new_utf8 average time 69µs size 32928b over 3500 loads of 7625 symbols from RuleContext.java
load_new_utf8 average time 210µs size 65696b over 3500 loads of 13379 symbols from udhr_hin.txt

lex_new_java_utf8 average time 439µs over 2000 runs of 29038 symbols
lex_new_java_utf8 average time 969µs over 2000 runs of 29038 symbols DFA cleared

lex_new_grapheme_utf8 average time 4034µs over 400 runs of 6614 symbols from udhr_kor.txt
lex_new_grapheme_utf8 average time 4173µs over 400 runs of 6614 symbols from udhr_kor.txt DFA cleared
lex_new_grapheme_utf8 average time 7680µs over 400 runs of 13379 symbols from udhr_hin.txt
lex_new_grapheme_utf8 average time 7946µs over 400 runs of 13379 symbols from udhr_hin.txt DFA cleared
lex_new_grapheme_utf8 average time 70µs over 400 runs of 85 symbols from emoji.txt
lex_new_grapheme_utf8 average time 82µs over 400 runs of 85 symbols from emoji.txt DFA cleared
```

The execution times for the last release of this runtime have been measured as:
Expand All @@ -205,7 +201,7 @@ The execute times on last release of this runtime have been measured as:
lexNewGraphemeUTF8 average time 387µs over 400 runs of 85 symbols from emoji.txt DFA cleared
```

Note: some of the corpus sizes differ, because of the test restructuring. In any case, the numbers cannot be compared directly, because different machines were used to take them.
Note: Some of the corpus sizes differ due to the restructuring of the test. However, the numbers are not directly comparable anyway, as they were taken on different machines.

## Release Notes

Expand Down
20 changes: 12 additions & 8 deletions src/Lexer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -317,9 +317,7 @@ export abstract class Lexer extends Recognizer<LexerATNSimulator> implements Tok
const stop = this.#input.index;
const text = this.#input.getTextFromRange(start, stop);
const msg = "token recognition error at: '" + this.getErrorDisplay(text) + "'";
const listener = this.getErrorListenerDispatch();
listener.syntaxError(this, null, this.currentTokenStartLine,
this.currentTokenColumn, msg, e);
this.errorListenerDispatch.syntaxError(this, null, this.currentTokenStartLine, this.currentTokenColumn, msg, e);
}

public getErrorDisplay(s: string): string {
Expand All @@ -329,15 +327,21 @@ export abstract class Lexer extends Recognizer<LexerATNSimulator> implements Tok
public getErrorDisplayForChar(c: string): string {
if (c.charCodeAt(0) === Token.EOF) {
return "<EOF>";
} else if (c === "\n") {
}

if (c === "\n") {
return "\\n";
} else if (c === "\t") {
}

if (c === "\t") {
return "\\t";
} else if (c === "\r") {
}

if (c === "\r") {
return "\\r";
} else {
return c;
}

return c;
}

public getCharErrorDisplay(c: string): string {
Expand Down
3 changes: 1 addition & 2 deletions src/Parser.ts
Original file line number Diff line number Diff line change
Expand Up @@ -400,8 +400,7 @@ export abstract class Parser extends Recognizer<ParserATNSimulator> {
this.syntaxErrors += 1;
const line = offendingToken.line;
const column = offendingToken.column;
const listener = this.getErrorListenerDispatch();
listener.syntaxError(this, offendingToken, line, column, msg, err);
this.errorListenerDispatch.syntaxError(this, offendingToken, line, column, msg, err);
}

/**
Expand Down
2 changes: 1 addition & 1 deletion src/Recognizer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ export abstract class Recognizer<ATNInterpreter extends ATNSimulator> {
return "line " + line + ":" + column;
}

public getErrorListenerDispatch(): ANTLRErrorListener {
public get errorListenerDispatch(): ANTLRErrorListener {
return new ProxyErrorListener(this.#listeners);
}

Expand Down
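The change in `Recognizer.ts` turns the dispatch method into a getter, so call sites drop the parentheses. A minimal self-contained sketch of the pattern (all names except `errorListenerDispatch` are hypothetical stand-ins):

```typescript
interface ErrorListener { syntaxError(msg: string): void; }

// Stand-in for ProxyErrorListener: forwards an error to all registered listeners.
class ProxyListener implements ErrorListener {
    public constructor(private readonly targets: ErrorListener[]) { }

    public syntaxError(msg: string): void {
        this.targets.forEach((target) => { target.syntaxError(msg); });
    }
}

class MiniRecognizer {
    #listeners: ErrorListener[] = [];

    public addErrorListener(listener: ErrorListener): void {
        this.#listeners.push(listener);
    }

    // Before this commit: public getErrorListenerDispatch(): ErrorListener { ... }
    public get errorListenerDispatch(): ErrorListener {
        return new ProxyListener(this.#listeners);
    }
}

const recognizer = new MiniRecognizer();
const seen: string[] = [];
recognizer.addErrorListener({ syntaxError: (msg) => { seen.push(msg); } });

// Property access instead of a method call:
recognizer.errorListenerDispatch.syntaxError("mismatched input");
```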
6 changes: 3 additions & 3 deletions src/atn/ParserATNSimulator.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1589,15 +1589,15 @@ export class ParserATNSimulator extends ATNSimulator {
protected reportAttemptingFullContext(dfa: DFA, conflictingAlts: BitSet, configs: ATNConfigSet, startIndex: number,
stopIndex: number): void {
if (this.parser !== null) {
this.parser.getErrorListenerDispatch().reportAttemptingFullContext(this.parser, dfa, startIndex, stopIndex,
this.parser.errorListenerDispatch.reportAttemptingFullContext(this.parser, dfa, startIndex, stopIndex,
conflictingAlts, configs);
}
}

protected reportContextSensitivity(dfa: DFA, prediction: number, configs: ATNConfigSet, startIndex: number,
stopIndex: number): void {
if (this.parser !== null) {
this.parser.getErrorListenerDispatch().reportContextSensitivity(this.parser, dfa, startIndex, stopIndex,
this.parser.errorListenerDispatch.reportContextSensitivity(this.parser, dfa, startIndex, stopIndex,
prediction, configs);
}
}
Expand All @@ -1606,7 +1606,7 @@ export class ParserATNSimulator extends ATNSimulator {
protected reportAmbiguity(dfa: DFA, D: DFAState, startIndex: number, stopIndex: number,
exact: boolean, ambigAlts: BitSet | null, configs: ATNConfigSet): void {
if (this.parser !== null) {
this.parser.getErrorListenerDispatch().reportAmbiguity(this.parser, dfa, startIndex, stopIndex, exact,
this.parser.errorListenerDispatch.reportAmbiguity(this.parser, dfa, startIndex, stopIndex, exact,
ambigAlts, configs);
}
}
Expand Down
2 changes: 0 additions & 2 deletions tests/BitSet.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,4 @@ describe("BitSet", () => {
expect(bitSet.get(2000)).toEqual(false);
expect(bitSet.length).toEqual(23);
});

// ...
});
40 changes: 19 additions & 21 deletions tests/benchmarks/ParseService.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import {

import { MySQLLexer } from "./generated/MySQLLexer.js";
import { MySQLParser } from "./generated/MySQLParser.js";
import { IParserErrorInfo, MySQLParseUnit } from "./support/helpers.js";
import { IParserErrorInfo } from "./support/helpers.js";

import { MySQLErrorListener } from "./support/MySQLErrorListener.js";

Expand Down Expand Up @@ -51,15 +51,10 @@ export class ParseService {
/**
* Quick check for syntax errors.
*
* @param text The text to parse.
* @param unit The type of input. Can be used to limit the available syntax to certain constructs.
* @param serverVersion The version of MySQL to use for checking.
* @param sqlMode The current SQL mode in the server.
*
* @returns True if no error was found, otherwise false.
*/
public errorCheck(text: string, unit: MySQLParseUnit, serverVersion: number, sqlMode: string): boolean {
this.startParsing(text, serverVersion, sqlMode);
public errorCheck(): boolean {
this.startParsing();

return this.errors.length === 0;
}
Expand All @@ -82,30 +77,33 @@ export class ParseService {
}

/**
* This is the method to parse text. Depending on fast mode it creates a syntax tree and otherwise
* bails out if an error was found, asap.
* Initializes the lexer with a new string and lets the tokenizer load all tokens.
*
* @param text The text to parse.
* @param serverVersion The version of MySQL to use for checking.
* @param text The text to tokenize.
* @param serverVersion The version of the MySQL server to use for parsing.
* @param sqlMode The current SQL mode in the server.
*
* @returns A parse tree if enabled.
*/
private startParsing(text: string, serverVersion: number, sqlMode: string): ParseTree | undefined {
this.errors = [];
public tokenize(text: string, serverVersion: number, sqlMode: string): void {
this.lexer.inputStream = CharStream.fromString(text);
this.tokenStream.setTokenSource(this.lexer);
this.lexer.serverVersion = serverVersion;
this.lexer.sqlModeFromString(sqlMode);

this.tokenStream.fill();
}

/**
* This is the method to parse text. It uses the token stream from the last call to tokenize.
*/
private startParsing(): void {
this.errors = [];

this.parser.reset();
this.parser.buildParseTrees = false;

this.lexer.serverVersion = serverVersion;
this.lexer.sqlModeFromString(sqlMode);
this.parser.serverVersion = serverVersion;
this.parser.serverVersion = this.lexer.serverVersion;
this.parser.sqlModes = this.lexer.sqlModes;

this.tree = this.parser.query();

return this.tree;
}
}
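Splitting `tokenize()` out of `startParsing()` is what lets the benchmarks measure lexing and parsing time separately. A generic timing helper (an assumption, not part of the runtime) illustrates the idea; the commented lines show how it would wrap the ParseService API from the diff above:

```typescript
// Times a single phase and returns its result together with the elapsed ms.
function timePhase<T>(phase: () => T): [T, number] {
    const start = performance.now();
    const result = phase();

    return [result, performance.now() - start];
}

// With the split API, each phase gets its own measurement, e.g.:
//   const [, lexMs] = timePhase(() => service.tokenize(text, version, sqlMode));
//   const [ok, parseMs] = timePhase(() => service.errorCheck());
```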