Full TypeScript implementation

Signed-off-by: Mike Lischke <[email protected]>
mike-lischke · Nov 7, 2023 · commit fcb471e
1 parent 7c2a20e

Showing 313 changed files with 8,624 additions and 9,404 deletions.
diff --git a/.eslintrc.json b/.eslintrc.json
@@ -13,6 +13,7 @@
     ],
     "ignorePatterns": [
         "src/**/*.js",
+        "src/tree/xpath/XPathLexer.ts",
         "spec/**/*.*js",
         "dist/**/*",
         "cli/index.js",
@@ -536,7 +537,7 @@
         ],
         "@typescript-eslint/no-explicit-any": "error",
         "@typescript-eslint/no-parameter-properties": "off",
-        "@typescript-eslint/no-use-before-define": "error",
+        "@typescript-eslint/no-use-before-define": "off",
         "@typescript-eslint/no-unsafe-assignment": "off", // TODO: enable
         "@typescript-eslint/no-unsafe-member-access": "off", // TODO: enable
         "@typescript-eslint/no-unsafe-call": "error", // TODO: enable
@@ -610,6 +611,7 @@
         ],
         "@typescript-eslint/restrict-template-expressions": "off",
         "@typescript-eslint/restrict-plus-operands": "off",
+        "@typescript-eslint/no-base-to-string": "off",
         "jsdoc/check-alignment": "error",
         "jsdoc/check-indentation": "off",
         "jsdoc/require-param-type": "off",
@@ -620,6 +622,7 @@
             {
                 "startLines": 1
             }
-        ]
+        ],
+        "jsdoc/no-undefined-types": "off"
     }
 }
diff --git a/.github/workflows/nodejs.yml b/.github/workflows/nodejs.yml
@@ -5,7 +5,7 @@ name: Build & Test
 
 on:
   push:
-    branches: [ master ]
+    branches: [ master, ts-migration ]
   pull_request:
     branches: [ master ]
 

diff --git a/.vscode/launch.json b/.vscode/launch.json
@@ -8,8 +8,9 @@
             "type": "node",
             "request": "launch",
             "name": "Run current Jest test",
-            "runtimeExecutable": null,
+            "runtimeExecutable": "node",
             "runtimeArgs": [
+                "--experimental-vm-modules",
                 "${workspaceRoot}/node_modules/.bin/jest",
                 "${fileBasenameNoExtension}.ts",
                 "--no-coverage",
@@ -38,6 +39,7 @@
                 "ts-node/esm",
                 "tests/benchmarks/run-benchmarks.ts",
             ],
+            "sourceMaps": true,
         }
     ]
 }
diff --git a/ReadMe.md b/ReadMe.md
@@ -6,23 +6,18 @@
 
 # TypeScript Runtime for ANTLR 4
 
-This package is a fork of the official ANTLR4 JavaScript runtime (with its TypeScript additions), with the following changes:
+This package is a fork of the official ANTLR4 JavaScript runtime and has been fully transformed to TypeScript. Other improvements are:
 
-- Much improved TypeScript type definitions.
 - XPath implementation.
 - Vocabulary implementation.
 - Complete Interval implementation.
 - Parser and lexer interpreters.
-- A couple of bug fixes.
-- Consistent formatting (indentation, semicolons, spaces, etc.).
-- Project folder structure is now similar to the Java runtime.
-- Numerous smaller fixes (`null` instead of `undefined` and others).
+- Numerous bug fixes and other changes.
 - Smaller node package (no test specs or other unnecessary files).
 - No CommonJS support anymore (ESM only). No differentiation between node and browser environments.
-- Build is now based on esbuild.
 - Includes the `antlr4ng-cli` tool to generate parser files compatible with this runtime. This tool uses a custom build of the ANTLR4 tool.
 
-It is (mostly) a drop-in replacement of the `antlr4` package, and can be used as such. For more information about ANTLR see www.antlr.org. Read more details about the [JavaScript](https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md) and [TypeScript](https://github.com/antlr/antlr4/blob/master/doc/typescript-target.md) targets at the provided links, but keep in mind that this documentation applies to the original JS/TS target.
+This package is a blend of the original JS implementation and antlr4ts, an abandoned TypeScript implementation of the ANTLR4 runtime. It tries to keep the best of both worlds, while following the Java runtime as closely as possible. It's a bit slower than the JS runtime, but faster than antlr4ts.
 
 ## Installation
 
@@ -39,9 +34,22 @@ npm install --save-dev antlr4ng-cli
 ```
 See [its readme](./cli/ReadMe.md) for more information.
 
+If you come from one of the other JS/TS runtimes, you may have to adjust your code a bit. The antlr4ng package more strictly exposes the Java nullability for certain members. This will require that you either use the non-null assertion operator to force the compiler to accept your code, or you have to check for nullability before accessing a member. The latter is the recommended way, as it is safer.
+
+Additionally, some members have been renamed to more TypeScript-like names (e.g. Parser._ctx is now Parser.context). The following table shows the most important changes:
+
+| Old Name | New Name |
+| -------- | -------- |
+| Parser._ctx | Parser.context |
+| Parser._errHandler | Parser.errorHandler |
+| Parser._input | Parser.inputStream |
+| Parser._interp | Parser.interpreter |
+
+The package requires ES2022 or newer, for features like static initialization blocks in classes and private fields (`#field`). It is recommended to use the latest TypeScript version.
+
 ## Benchmarks
 
-This runtime is constantly monitored for performance regressions. The following table shows the results of the benchmarks run on last release:
+This runtime is monitored for performance regressions. The following table shows the results of the benchmarks run on the last release:
 
 | Test | Cold Run | Warm Run|
 | ---- | -------- | ------- |
@@ -50,11 +58,11 @@ This runtime is constantly monitored for performance regressions. The following
 | Large Inserts | 11022 ms | 10616 ms |
 | Total | 20599 ms | 10978 ms |
 
-The benchmarks consist of a set of query files, which are parsed by a MySQL parser. The query collection file contains more than 900 MySQL queries of all kinds, from very simple to complex stored procedures, including some deeply nested select queries that can easily exhaust available stack space. The minimum MySQL server version used was 8.0.0.
+The benchmarks consist of a set of query files, which are parsed by a MySQL parser. The query collection file contains more than 900 MySQL queries of all kinds, from very simple to complex stored procedures, including some deeply nested select queries that can easily exhaust the available stack space (in certain situations, such as parsing in a thread with default stack size). The minimum MySQL server version used was 8.0.0.
 
-The large binary inserts file contains only a few dozen queries, but they are really large with deep recursions, stressing so the prediction engine of the parser. Additionally, one query contains binary (image) data which contains input characters from the whole UTF-8 range.
+The large binary inserts file contains only a few dozen queries, but they are really large with deep recursions, so they stress the prediction engine of the parser. In addition, one query contains binary (image) data containing input characters from the entire UTF-8 range.
 
-The example file is a copy of the largest test file in [this repository](https://github.com/antlr/grammars-v4/tree/master/sql/mysql/Positive-Technologies/examples), and is known to be very slow to parse with other parsers, but the one used here.
+The example file is a copy of the largest test file in [this repository](https://github.com/antlr/grammars-v4/tree/master/sql/mysql/Positive-Technologies/examples), and is known to be very slow to parse with other MySQL grammars. The one used here, however, is fast.
 
 ## Release Notes
 

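Note on the ReadMe migration guidance in the hunks above: the two ways of handling the stricter nullability (a null check versus the non-null assertion operator) can be sketched as follows. This is only an illustration under stated assumptions. `ParserLike` and `RuleContextLike` are hypothetical stand-ins, not the antlr4ng types, and the nullability of `context` is assumed here purely to show the pattern; only the rename from `_ctx` to `context` comes from the ReadMe table.

```typescript
// Hypothetical stand-ins for illustration only; these are NOT the antlr4ng types.
interface RuleContextLike {
    getText(): string;
}

interface ParserLike {
    // Named `context` per the ReadMe rename table (formerly `_ctx`).
    // Assumed nullable here in order to demonstrate both access styles.
    context: RuleContextLike | null;
}

// Recommended style per the ReadMe: check for null before accessing a member.
function describeContext(parser: ParserLike): string {
    if (parser.context === null) {
        return "<no context>";
    }

    return parser.context.getText();
}

// Alternative: the non-null assertion operator satisfies the compiler,
// but the call throws at runtime if the value is actually null.
function describeContextUnsafe(parser: ParserLike): string {
    return parser.context!.getText();
}
```

The checked variant costs one extra branch but cannot fail at runtime, which matches the ReadMe's recommendation that checking is the safer option.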
diff --git a/cli/antlr4-4.13.2-SNAPSHOT-complete.jar b/cli/antlr4-4.13.2-SNAPSHOT-complete.jar
diff --git a/cspell.json b/cspell.json
@@ -13,23 +13,34 @@
         "rdbms",
         "runtimes",
         "sakila",
+        "unpredicated",
         "whitespaces"
     ],
     "ignoreWords": [
         "AMBIG",
         "Dlanguage",
         "Grosch",
         "Harwell",
+        "Hashable",
+        "IATN",
+        "Nondisjoint",
+        "Preds",
+        "Sethi",
+        "Ullman",
         "Wirth",
         "Xexact",
         "bitrix",
         "interp",
         "localctx",
+        "longlong",
         "nbits",
+        "opnds",
         "outfile",
         "parentctx",
+        "prec",
         "precpred",
         "recog",
+        "semctx",
         "sempred",
         "ttype"
     ],

diff --git a/package.json b/package.json
@@ -37,22 +37,20 @@
         "typescript": "5.2.2"
     },
     "scripts": {
-        "prepublishOnly": "npm run build && npm run test",
-        "build": "npm run generate-test-parser && esbuild ./src/index.js --bundle --outfile=dist/antlr4.mjs --format=esm --sourcemap=external --minify",
+        "prepublishOnly": "npm run build-minified && npm run test",
+        "tsc": "tsc --watch",
+        "build": "npm run generate-test-parser && esbuild ./src/index.js --bundle --outfile=dist/antlr4.mjs --format=esm --sourcemap",
+        "build-minified": "npm run generate-test-parser && esbuild ./src/index.js --bundle --outfile=dist/antlr4.mjs --format=esm --sourcemap --minify",
         "full-test": "npm run test && npm run run-benchmarks",
         "test": "node --no-warnings --experimental-vm-modules node_modules/jest/bin/jest.js --no-coverage",
-        "lint": "eslint src/",
         "generate-test-parser": "cli/index.js -Dlanguage=TypeScript -o tests/benchmarks/generated -visitor -listener -Xexact-output-dir tests/benchmarks/MySQLLexer.g4 tests/benchmarks/MySQLParser.g4",
-        "run-benchmarks": "node --no-warnings --experimental-vm-modules --loader ts-node/esm tests/benchmarks/run-benchmarks.ts"
+        "generate-xpath-lexer": "cli/index.js -Dlanguage=TypeScript -o src/tree/xpath/generated -no-visitor -no-listener -Xexact-output-dir src/tree/xpath/XPathLexer.g4",
+        "run-benchmarks": "node --no-warnings --experimental-vm-modules  --loader ts-node/esm tests/benchmarks/run-benchmarks.ts",
+        "profile benchmarks": "node --no-warnings --experimental-vm-modules --prof --loader ts-node/esm tests/benchmarks/run-benchmarks.ts",
+        "process profile tick file": " node --prof-process isolate-0x130008000-75033-v8.log > processed.txt"
     },
     "exports": {
-        "types": "./src/index.d.ts",
+        "types": "./src/index.ts",
         "default": "./dist/antlr4.mjs"
-    },
-    "babel": {
-        "presets": [
-            "@babel/preset-env"
-        ],
-        "targets": "defaults"
     }
 }
diff --git a/src/ANTLRErrorListener.ts b/src/ANTLRErrorListener.ts
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) The ANTLR Project. All rights reserved.
+ * Use of this file is governed by the BSD 3-clause license that
+ * can be found in the LICENSE.txt file in the project root.
+ */
+
+import { Parser } from "./Parser.js";
+import { RecognitionException } from "./RecognitionException.js";
+import { Recognizer } from "./Recognizer.js";
+import { ATNConfigSet } from "./atn/ATNConfigSet.js";
+import { DFA } from "./dfa/DFA.js";
+import { ATNSimulator } from "./atn/ATNSimulator.js";
+import { Token } from "./Token.js";
+import { BitSet } from "./misc/BitSet.js";
+
+/** How to emit recognition errors. */
+export interface ANTLRErrorListener {
+    /**
+     * Upon syntax error, notify any interested parties. This is not how to
+     * recover from errors or compute error messages. {@link ANTLRErrorStrategy}
+     * specifies how to recover from syntax errors and how to compute error
+     * messages. This listener's job is simply to emit a computed message,
+     * though it has enough information to create its own message in many cases.
+     *
+     * <p>The {@link RecognitionException} is non-null for all syntax errors except
+     * when we discover mismatched token errors that we can recover from
+     * in-line, without returning from the surrounding rule (via the single
+     * token insertion and deletion mechanism).</p>
+     *
+     * @param recognizer
+     *        What parser got the error. From this
+     * 		  object, you can access the context as well
+     * 		  as the input stream.
+     * @param offendingSymbol
+     *        The offending token in the input token
+     * 		  stream, unless recognizer is a lexer (then it's null). If
+     * 		  no viable alternative error, {@code e} has token at which we
+     * 		  started production for the decision.
+     * @param line
+     * 		  The line number in the input where the error occurred.
+     * @param charPositionInLine
+     * 		  The character position within that line where the error occurred.
+     * @param msg
+     * 		  The message to emit.
+     * @param e
+     *        The exception generated by the parser that led to
+     *        the reporting of an error. It is null in the case where
+     *        the parser was able to recover in line without exiting the
+     *        surrounding rule.
+     */
+    syntaxError<S extends Token, T extends ATNSimulator>(recognizer: Recognizer<T>,
+        offendingSymbol: S | null,
+        line: number,
+        charPositionInLine: number,
+        msg: string,
+        e: RecognitionException | null): void;
+
+    /**
+     * This method is called by the parser when a full-context prediction
+     * results in an ambiguity.
+     *
+     * <p>Each full-context prediction which does not result in a syntax error
+     * will call either {@link #reportContextSensitivity} or
+     * {@link #reportAmbiguity}.</p>
+     *
+     * <p>When {@code ambigAlts} is not null, it contains the set of potentially
+     * viable alternatives identified by the prediction algorithm. When
+     * {@code ambigAlts} is null, use {@link ATNConfigSet#getAlts} to obtain the
+     * represented alternatives from the {@code configs} argument.</p>
+     *
+     * <p>When {@code exact} is {@code true}, <em>all</em> of the potentially
+     * viable alternatives are truly viable, i.e. this is reporting an exact
+     * ambiguity. When {@code exact} is {@code false}, <em>at least two</em> of
+     * the potentially viable alternatives are viable for the current input, but
+     * the prediction algorithm terminated as soon as it determined that at
+     * least the <em>minimum</em> potentially viable alternative is truly
+     * viable.</p>
+     *
+     * <p>When the {@link PredictionMode#LL_EXACT_AMBIG_DETECTION} prediction
+     * mode is used, the parser is required to identify exact ambiguities so
+     * {@code exact} will always be {@code true}.</p>
+     *
+     * <p>This method is not used by lexers.</p>
+     *
+     * @param recognizer the parser instance
+     * @param dfa the DFA for the current decision
+     * @param startIndex the input index where the decision started
+     * @param stopIndex the input index where the ambiguity was identified
+     * @param exact {@code true} if the ambiguity is exactly known, otherwise
+     * {@code false}. This is always {@code true} when
+     * {@link PredictionMode#LL_EXACT_AMBIG_DETECTION} is used.
+     * @param ambigAlts the potentially ambiguous alternatives, or {@code null}
+     * to indicate that the potentially ambiguous alternatives are the complete
+     * set of represented alternatives in {@code configs}
+     * @param configs the ATN configuration set where the ambiguity was
+     * identified
+     */
+    reportAmbiguity(recognizer: Parser,
+        dfa: DFA,
+        startIndex: number,
+        stopIndex: number,
+        exact: boolean,
+        ambigAlts: BitSet | null,
+        configs: ATNConfigSet): void;
+
+    /**
+     * This method is called when an SLL conflict occurs and the parser is about
+     * to use the full context information to make an LL decision.
+     *
+     * <p>If one or more configurations in {@code configs} contains a semantic
+     * predicate, the predicates are evaluated before this method is called. The
+     * subset of alternatives which are still viable after predicates are
+     * evaluated is reported in {@code conflictingAlts}.</p>
+     *
+     * <p>This method is not used by lexers.</p>
+     *
+     * @param recognizer the parser instance
+     * @param dfa the DFA for the current decision
+     * @param startIndex the input index where the decision started
+     * @param stopIndex the input index where the SLL conflict occurred
+     * @param conflictingAlts The specific conflicting alternatives. If this is
+     * {@code null}, the conflicting alternatives are all alternatives
+     * represented in {@code configs}. At the moment, conflictingAlts is non-null
+     * (for the reference implementation, but Sam's optimized version can see this
+     * as null).
+     * @param configs the ATN configuration set where the SLL conflict was
+     * detected
+     */
+    reportAttemptingFullContext(recognizer: Parser,
+        dfa: DFA,
+        startIndex: number,
+        stopIndex: number,
+        conflictingAlts: BitSet | null,
+        configs: ATNConfigSet): void;
+
+    /**
+     * This method is called by the parser when a full-context prediction has a
+     * unique result.
+     *
+     * <p>Each full-context prediction which does not result in a syntax error
+     * will call either {@link #reportContextSensitivity} or
+     * {@link #reportAmbiguity}.</p>
+     *
+     * <p>For prediction implementations that only evaluate full-context
+     * predictions when an SLL conflict is found (including the default
+     * {@link ParserATNSimulator} implementation), this method reports cases
+     * where SLL conflicts were resolved to unique full-context predictions,
+     * i.e. the decision was context-sensitive. This report does not necessarily
+     * indicate a problem, and it may appear even in completely unambiguous
+     * grammars.</p>
+     *
+     * <p>{@code configs} may have more than one represented alternative if the
+     * full-context prediction algorithm does not evaluate predicates before
+     * beginning the full-context prediction. In all cases, the final prediction
+     * is passed as the {@code prediction} argument.</p>
+     *
+     * <p>Note that the definition of "context sensitivity" in this method
+     * differs from the concept in {@link DecisionInfo#contextSensitivities}.
+     * This method reports all instances where an SLL conflict occurred but LL
+     * parsing produced a unique result, whether or not that unique result
+     * matches the minimum alternative in the SLL conflicting set.</p>
+     *
+     * <p>This method is not used by lexers.</p>
+     *
+     * @param recognizer the parser instance
+     * @param dfa the DFA for the current decision
+     * @param startIndex the input index where the decision started
+     * @param stopIndex the input index where the context sensitivity was
+     * finally determined
+     * @param prediction the unambiguous result of the full-context prediction
+     * @param configs the ATN configuration set where the unambiguous prediction
+     * was determined
+     */
+    reportContextSensitivity(recognizer: Parser,
+        dfa: DFA,
+        startIndex: number,
+        stopIndex: number,
+        prediction: number,
+        configs: ATNConfigSet): void;
+}
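Note on the interface added above: a listener that only cares about syntax errors typically implements `syntaxError` and leaves the three prediction reports empty. The sketch below shows that shape. It is a minimal illustration, not code from this commit: `ErrorListenerLike` is a hypothetical, simplified stand-in declared locally so the block compiles on its own, whereas the real interface uses `Recognizer`, `Token`, `RecognitionException`, `DFA`, `BitSet` and `ATNConfigSet`. The class name `CollectingErrorListener` and the message format are likewise assumptions.

```typescript
// Hypothetical, simplified stand-in for the ANTLRErrorListener interface.
interface ErrorListenerLike {
    syntaxError(recognizer: object | null, offendingSymbol: object | null, line: number,
        charPositionInLine: number, msg: string, e: Error | null): void;
    reportAmbiguity(...args: unknown[]): void;
    reportAttemptingFullContext(...args: unknown[]): void;
    reportContextSensitivity(...args: unknown[]): void;
}

// Collects formatted syntax error messages instead of printing them.
class CollectingErrorListener implements ErrorListenerLike {
    public readonly messages: string[] = [];

    public syntaxError(_recognizer: object | null, _offendingSymbol: object | null, line: number,
        charPositionInLine: number, msg: string, _e: Error | null): void {
        // The listener only emits the already computed message, as the doc comment describes.
        this.messages.push(`line ${line}:${charPositionInLine} ${msg}`);
    }

    // The prediction reports are not needed for plain error collection.
    public reportAmbiguity(): void { /* intentionally empty */ }
    public reportAttemptingFullContext(): void { /* intentionally empty */ }
    public reportContextSensitivity(): void { /* intentionally empty */ }
}
```

Keeping the messages in an array rather than logging them makes the listener easy to inspect after a parse run, for example in unit tests.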
diff --git a/src/ANTLRErrorStrategy.d.ts → src/ANTLRErrorStrategy.ts b/src/ANTLRErrorStrategy.d.ts → src/ANTLRErrorStrategy.ts
@@ -25,7 +25,7 @@ import { Token } from "./Token.js";
  *
  * <p>TODO: what to do about lexers</p>
  */
-export declare interface ANTLRErrorStrategy {
+export interface ANTLRErrorStrategy {
     /**
      * Reset the error handler state for the specified {@code recognizer}.
      *

diff --git a/src/BailErrorStrategy.d.ts b/src/BailErrorStrategy.d.ts