diff --git a/CHANGELOG.md b/CHANGELOG.md
index 26ac534b..a536481f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,33 +3,66 @@ Change Log

 This file documents all notable changes to Peggy.

-Unreleased
------------
+3.0.0
+-----

-Released: TBD
+Released: 2023-02-21

 ### Major Changes

 - [#280](https://github.com/peggyjs/peggy/issues/280) Add inline examples to the documentation, from @hildjj
-- [#240](https://github.com/peggyjs/peggy/issues/240) Generate SourceNodes for bytecode
-- [#338](https://github.com/peggyjs/peggy/pull/338) BREAKING CHANGE. Update dependencies, causing minimum supported version of node.js to move to 14. Generated grammar source should still work on older node versions and some older browsers, but testing is currently manual for those.
-- [#291]: Add support for repetition operator `expression|min .. max, delimiter|`, from @Mingun
-
-Important information for plug-ins' authors: PR [#291] added 4 new opcodes to the bytecode:
+- [#240](https://github.com/peggyjs/peggy/issues/240) Generate SourceNodes for
+  bytecode, from @hildjj
+- [#338](https://github.com/peggyjs/peggy/pull/338) BREAKING CHANGE. Update
+  dependencies, causing minimum supported version of node.js to move to 14.
+  Generated grammar source should still work on older node versions and some
+  older browsers, but testing is currently manual for those, from @hildjj
+- [#291](https://github.com/peggyjs/peggy/pull/291): Add support for
+  repetition operator `expression|min .. max, delimiter|`, from @Mingun
+- [#339](https://github.com/peggyjs/peggy/pull/339): BREAKING CHANGE. Updated
+  the list of JavaScript reserved words. This will break existing grammars
+  that use any of the new words in their rule or label names, from @hildjj
+
+Important information for plug-in authors: PR [#291](https://github.com/peggyjs/peggy/pull/291) added 4 new opcodes to the bytecode:

 - `IF_LT`
 - `IF_GE`
 - `IF_LT_DYNAMIC`
 - `IF_GE_DYNAMIC`

-and added a new AST node and a visitor method `repeated`. Do not forgot to update your plug-ins.
-
-[#291]: https://github.com/peggyjs/peggy/pull/291
+and added a new AST node and a visitor method `repeated`. Do not forget to update your plug-ins.
+
+Important information for grammar authors: the following words, which used to
+be valid identifiers for rules and labels, are now treated as JavaScript
+reserved words, and will cause errors at compile time if you are using them:
+
+- abstract
+- arguments
+- as
+- async
+- boolean
+- byte
+- char
+- double
+- eval
+- final
+- float
+- from
+- get
+- goto
+- int
+- long
+- native
+- of
+- set
+- short
+- synchronized
+- throws
+- transient
+- volatile

 ### Minor Changes

-- [#274](https://github.com/peggyjs/peggy/issues/274) Use commander's new
-  `.conflicts()` to check for mutually-exclusive CLI options, from @hildjj
 - [#274](https://github.com/peggyjs/peggy/issues/274) `"*"` is now a valid `allowedStartRule`, which means all rules are allowed, from @hildjj
 - [#229](https://github.com/peggyjs/peggy/issues/229) new CLI option `-S ` or `--start-rule ` to specify the start rule when testing,
@@ -47,12 +80,17 @@ and added a new AST node and a visitor method `repeated`. Do not forgot to updat
   (can be useful for plugin writers), from @Mingun
 - [#294](https://github.com/peggyjs/peggy/pull/294) Website: show errors in the editors, from @Mingun
+- [#297](https://github.com/peggyjs/peggy/pull/297) Website: add Discord widget,
+  from @hildjj
 - [#299](https://github.com/peggyjs/peggy/issues/299) Add example grammar for a [SemVer.org](https://semver.org) semantic version string, from @dselman
 - [[#307](https://github.com/peggyjs/peggy/issues/307)] Allow grammars to have relative offsets into their source files (e.g. if embedded in another doc), from @hildjj.
-- [#308](https://github.com/peggyjs/peggy/pull/308) Add support for reading test data from stdin using `-T -`, from @hildjj.
+- [#308](https://github.com/peggyjs/peggy/pull/308) Add support for reading test
+  data from stdin using `-T -`, from @hildjj.
+- [#313](https://github.com/peggyjs/peggy/pull/313) Create the website using
+  eleventy, from @camcherry

 ### Bug Fixes

diff --git a/bin/peggy-cli.js b/bin/peggy-cli.js
index 7234ddf3..5f990675 100644
--- a/bin/peggy-cli.js
+++ b/bin/peggy-cli.js
@@ -111,6 +111,14 @@ class PeggyCLI extends Command {
         .default([], "the first rule in the grammar")
         .argParser(commaArg)
     )
+    .addOption(
+      new Option(
+        "--ast",
+        "Output a grammar AST instead of a parser code"
+      )
+        .default(false)
+        .conflicts(["test", "testFile", "sourceMap"])
+    )
     .option(
       "--cache",
       "Make generated parser cache results",
@@ -172,14 +180,6 @@ class PeggyCLI extends Command {
       "-m, --source-map [mapfile]",
       "Generate a source map. If name is not specified, the source map will be named \".map\" if input is a file and \"source.map\" if input is a standard input. If the special filename `inline` is given, the sourcemap will be embedded in the output file as a data URI. If the filename is prefixed with `hidden:`, no mapping URL will be included so that the mapping can be specified with an HTTP SourceMap: header. This option conflicts with the `-t/--test` and `-T/--test-file` options unless `-o/--output` is also specified"
     )
-    .addOption(
-      new Option(
-        "--ast",
-        "Output a grammar AST instead of a parser code"
-      )
-        .default(false)
-        .conflicts(["test", "testFile", "sourceMap"])
-    )
     .option(
       "-S, --start-rule ",
       "When testing, use the given rule as the start rule. If this rule is not in the allowed start rules, it will be added."
diff --git a/docs/_includes/components/footer.html b/docs/_includes/components/footer.html
index af79c390..bc39bb0d 100644
--- a/docs/_includes/components/footer.html
+++ b/docs/_includes/components/footer.html
@@ -1,8 +1,8 @@
\ No newline at end of file
+
diff --git a/docs/documentation.html b/docs/documentation.html
index aea207af..65123d5a 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -205,9 +205,10 @@

Command Line

 --extra-options-file <file>, you will need to ensure you are using the correct types. In particular, you may specify "plugin" as a string, or "plugins" as an array of objects that have a use
-method. Always use the long (two-dash) form of the option. Options that
-contain dashes should be specified in camel case. You may also specify an
-"input" field instead of using the command line. For example:
+method. Always use the long (two-dash) form of the option, without the
+dashes, as the key. Options that contain internal dashes should be specified
+in camel case. You may also specify an "input" field instead of using the
+command line. For example:

// config.js or config.cjs
@@ -223,18 +224,22 @@ 

Command Line

-You can test generated parser immediately if you specify the -t/--test or -T/--test-file
-option. This option conflicts with the option -m/--source-map unless -o/--output is
-also specified. This option conflicts with the --ast option.
+You can test the generated parser immediately if you specify the
+-t/--test or -T/--test-file
+option. This option conflicts with the
+--ast option, and also conflicts with the
+-m/--source-map option unless -o/--output is also
+specified.

The CLI will exit with the code:

-  • 0 if all was success
-  • 1 if you supply incorrect or conflicting parameters
-  • 2 if all parameters is correct, you specify the -t/--test or -T/--test-file option
-and specified input does not parsed with the specified grammar
+  • 0: if successful
+  • 1: if you supply incorrect or conflicting parameters
+  • 2: if you specified the
+-t/--test or -T/--test-file option and the specified
+input fails parsing with the specified grammar

Examples:

@@ -280,9 +285,10 @@

JavaScript API

import * as peggy from "peggy";
-For use in browsers, include the Peggy library in your web page or application using
-the <script> tag. If Peggy detects an AMD loader, it will
-define itself as a module, otherwise the API will be available in the
+For use in browsers, include the Peggy library in your web page or
+application using the <script> tag. If Peggy detects an AMD loader, it will define
+itself as a module, otherwise the API will be available in the
 peg global object.

To generate a parser, call the peggy.generate method and pass your
@@ -311,7 +317,7 @@

JavaScript API

false).
dependencies
-Parser dependencies, the value is an object which maps variables used to
+Parser dependencies. The value is an object which maps variables used to
 access the dependencies in the parser to module IDs used to load them; valid only when format is set to "amd", "commonjs", "es", or "umd".
@@ -340,11 +346,13 @@

JavaScript API

grammarSource
-any object that represent origin of the input grammar. The CLI will set
+Any object that represents the origin of the input grammar. The CLI will convert
 it to a string with the path to the file; parsers in network applications might use the socket and so on. The supplied object will be available at key source in the location objects, that returned by the
-location() API function (default: undefined).
+location() API function (default: undefined). It is
+recommended that if you do not use a string, the object you supply has a
+useful toString() implementation.
info
Callback for informational messages. See Error Reporting
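Because the CLI turns grammarSource into a string, and non-string sources are recommended to have a useful toString(), a custom source object might look like the minimal sketch below. The GrammarSource class and its path and host fields are hypothetical and not part of Peggy; only the grammarSource option name comes from this page.

```javascript
// Hypothetical grammarSource object (not part of Peggy's API).
// It carries extra application data but still prints as a readable path.
class GrammarSource {
  constructor(path, host) {
    this.path = path; // file path or URL of the grammar text
    this.host = host; // optional, application-specific context
  }

  // Called whenever the source is converted to a string, e.g. in messages.
  toString() {
    return this.host ? `${this.host}:${this.path}` : this.path;
  }
}

const source = new GrammarSource("grammars/arith.peggy", "build-server");
console.log(`${source}`); // build-server:grammars/arith.peggy
```

Such an object would then be supplied as `peggy.generate(text, { grammarSource: source })`, and it is the value that appears under the source key of location objects.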
@@ -442,7 +450,13 @@

Using the Parser

Name of the rule to start parsing from.
tracer
-Tracer to use.
+Tracer to use. A tracer is an object containing a trace() function. trace() takes a single parameter, an object containing "type" ("rule.enter", "rule.fail", or "rule.match"), "rule" (the rule name as a string), "location", and, if the type is "rule.match", "result" (what the rule returned).
... (any others)
Made available in the options variable
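A tracer following the description above could be sketched as below. It relies only on the event fields listed there (type, rule, location, and result for "rule.match"), keeps one indentation level per active rule, and is driven by hand-made events rather than a real parse. The makeIndentingTracer name is hypothetical; with Peggy the object would be passed as the tracer option to parser.parse(), presumably on a parser generated with tracing enabled.

```javascript
// Sketch of a tracer: an object with a trace(event) method.
// Every "rule.enter" is later closed by a "rule.match" or "rule.fail".
function makeIndentingTracer() {
  const lines = [];
  let depth = 0;
  return {
    lines,
    trace(event) {
      if (event.type === "rule.enter") {
        lines.push(`${"  ".repeat(depth)}enter ${event.rule}`);
        depth++;
      } else {
        depth--; // "rule.match" or "rule.fail" closes the matching enter
        const suffix = event.type === "rule.match"
          ? ` -> ${JSON.stringify(event.result)}`
          : "";
        lines.push(`${"  ".repeat(depth)}${event.type} ${event.rule}${suffix}`);
      }
    },
  };
}

// Hand-made events standing in for a real parse (location simplified, no source).
const loc = {
  start: { offset: 0, line: 1, column: 1 },
  end: { offset: 1, line: 1, column: 2 },
};
const tracer = makeIndentingTracer();
tracer.trace({ type: "rule.enter", rule: "start", location: loc });
tracer.trace({ type: "rule.enter", rule: "integer", location: loc });
tracer.trace({ type: "rule.match", rule: "integer", location: loc, result: 7 });
tracer.trace({ type: "rule.match", rule: "start", location: loc, result: 7 });
console.log(tracer.lines.join("\n"));
```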
@@ -451,16 +465,16 @@

Using the Parser

As you can see above, parsers can also support their own custom options. For example:

const parser = peggy.generate(`
-{
-// options are available in the per-parse initializer
-console.log(options.validWords);  // outputs "[ 'boo', 'baz', 'boop' ]"
-}
+  {
+  // options are available in the per-parse initializer
+  console.log(options.validWords);  // outputs "[ 'boo', 'baz', 'boop' ]"
+  }
 
-validWord = @word:$[a-z]+ &{ return options.validWords.includes(word) }
+  validWord = @word:$[a-z]+ &{ return options.validWords.includes(word) }
 `);
 
 const result = parser.parse("boo", {
-validWords: [ "boo", "baz", "boop" ]
+  validWords: [ "boo", "baz", "boop" ]
 });
 
 console.log(result);  // outputs "boo"
@@ -477,22 +491,22 @@ 

Grammar Syntax and Semantics

values.

start
-= additive
+  = additive
 
 additive
-= left:multiplicative "+" right:additive { return left + right; }
-/ multiplicative
+  = left:multiplicative "+" right:additive { return left + right; }
+  / multiplicative
 
 multiplicative
-= left:primary "*" right:multiplicative { return left * right; }
-/ primary
+  = left:primary "*" right:multiplicative { return left * right; }
+  / primary
 
 primary
-= integer
-/ "(" additive:additive ")" { return additive; }
+  = integer
+  / "(" additive:additive ")" { return additive; }
 
-integer "integer"
-= digits:[0-9]+ { return parseInt(digits.join(""), 10); }
+integer "simple number"
+  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

On the top level, the grammar consists of rules (in our example, there are five of them). Each rule has a name (e.g.
@@ -533,34 +547,34 @@

Grammar Syntax and Semantics

initializer and a per-parse initializer:

{{'{{'}}
-function makeInteger(o) {
-return parseInt(o.join(""), 10);
-}
+  function makeInteger(o) {
+    return parseInt(o.join(""), 10);
+  }
 }}
 
 {
-if (options.multiplier) {
-input = "(" + input + ")*(" + options.multiplier + ")";
-}
+  if (options.multiplier) {
+    input = `(${input})*(${options.multiplier})`;
+  }
 }
 
 start
-= additive
+  = additive
 
 additive
-= left:multiplicative "+" right:additive { return left + right; }
-/ multiplicative
+  = left:multiplicative "+" right:additive { return left + right; }
+  / multiplicative
 
 multiplicative
-= left:primary "*" right:multiplicative { return left * right; }
-/ primary
+  = left:primary "*" right:multiplicative { return left * right; }
+  / primary
 
 primary
-= integer
-/ "(" additive:additive ")" { return additive; }
+  = integer
+  / "(" additive:additive ")" { return additive; }
 
-integer "integer"
-= digits:[0-9]+ { return makeInteger(digits); }
+integer "simple number"
+  = digits:[0-9]+ { return makeInteger(digits); }
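To make the per-parse initializer concrete, the following hand-written recursive-descent evaluator mirrors the rules of the grammar above, one function per rule. It is an illustration, not the code Peggy generates, and the evaluate name is hypothetical. Because it applies the same rewrite of input when options.multiplier is set, "2+3" with a multiplier of 4 is evaluated as "(2+3)*(4)".

```javascript
// Hand-written mirror of the arithmetics grammar above (not Peggy output).
// `pos` plays the role of the parser's current offset into `input`.
function evaluate(input, options = {}) {
  // Per-parse initializer: optionally wrap the input in a multiplication.
  if (options.multiplier) {
    input = `(${input})*(${options.multiplier})`;
  }
  let pos = 0;

  // additive = left:multiplicative "+" right:additive / multiplicative
  function additive() {
    const left = multiplicative();
    if (input[pos] === "+") {
      pos++;
      return left + additive();
    }
    return left;
  }

  // multiplicative = left:primary "*" right:multiplicative / primary
  function multiplicative() {
    const left = primary();
    if (input[pos] === "*") {
      pos++;
      return left * multiplicative();
    }
    return left;
  }

  // primary = integer / "(" additive:additive ")"
  function primary() {
    if (input[pos] === "(") {
      pos++;
      const value = additive();
      if (input[pos] !== ")") throw new Error(`Expected ")" at offset ${pos}`);
      pos++;
      return value;
    }
    return integer();
  }

  // integer "simple number" = digits:[0-9]+
  function integer() {
    const start = pos;
    while (pos < input.length && input[pos] >= "0" && input[pos] <= "9") pos++;
    if (pos === start) throw new Error(`Expected simple number at offset ${pos}`);
    return parseInt(input.slice(start, pos), 10);
  }

  const result = additive(); // start = additive
  if (pos !== input.length) throw new Error(`Unexpected input at offset ${pos}`);
  return result;
}

console.log(evaluate("2*(3+4)"));                // 14
console.log(evaluate("2+3", { multiplier: 4 })); // 20
```

Unlike a PEG, which would backtrack into the second alternative, this sketch commits as soon as it sees "+" or "*". For this grammar that yields the same values for valid input, but different error messages for invalid input.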

The parsing expressions of the rules are used to match the input text to the grammar. There are various types of expressions — matching characters or
@@ -587,8 +601,8 @@

Grammar Syntax and Semantics

One special case of parser expression is a parser action — a piece of JavaScript code inside curly braces (“{” and “}”) that takes match
-results of some of the the preceding expressions and returns a JavaScript value.
-This value is considered match result of the preceding expression (in other
+results of the preceding expression and returns a JavaScript value.
+This value is then considered the match result of the preceding expression (in other
 words, the parser action is a match result transformer).

In our arithmetics example, there are many parser actions. Consider the
@@ -700,7 +714,7 @@

Parsing Expressio
rule
-Match a parsing expression of a rule recursively and return its match
+Match a parsing expression of a rule (perhaps recursively) and return its match
 result.

@@ -798,13 +812,14 @@

Parsing Expressio

Hence

-  • expression |..| is an equivalent of expression |0..|
+  • expression |..| is equivalent to expression |0..| and expression *
-  • expression |1..| is an equivalent of expression +
+  • expression |1..| is equivalent to expression +
-Optionally, delimiter expression can be specified. Delimiter must appear
- between expressions exactly once and it is not included in the final array.
+Optionally, a delimiter expression can be specified. The
+ delimiter is a separate parser expression whose match results are ignored,
+ and it must appear exactly once between matched expressions.
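The repetition rules above (at least min and at most max matches, and a delimiter that must appear exactly once between matches and whose results are dropped) can be sketched as a small combinator over an array of tokens. This only illustrates the documented behavior; it is not Peggy's generated code, and repeat, tok, and word are hypothetical helpers. The sketch also assumes that the repeated expression consumes at least one token whenever it succeeds.

```javascript
// Hypothetical combinator illustrating expression|min..max, delimiter|.
// A "parser" is a function (tokens, pos) => { value, pos } or null on failure.
function repeat(item, min = 0, max = Infinity, delimiter = null) {
  return (tokens, pos) => {
    const values = [];
    let cur = pos;
    while (values.length < max) {
      let next = cur;
      if (delimiter && values.length > 0) {
        const d = delimiter(tokens, next);
        if (!d) break;  // no delimiter: the repetition ends here
        next = d.pos;   // the delimiter's result is discarded
      }
      const r = item(tokens, next);
      if (!r) break;    // a delimiter without a following item is not consumed
      values.push(r.value);
      cur = r.pos;
    }
    return values.length >= min ? { value: values, pos: cur } : null;
  };
}

// Matches one specific token.
const tok = (t) => (tokens, pos) =>
  tokens[pos] === t ? { value: t, pos: pos + 1 } : null;

// Matches one alphabetic token.
const word = (tokens, pos) =>
  /^[a-z]+$/i.test(tokens[pos] ?? "") ? { value: tokens[pos], pos: pos + 1 } : null;

const list = repeat(word, 1, 3, tok(",")); // roughly word|1..3, ","|

console.log(list(["a", ",", "b", ",", "c"], 0)); // { value: [ 'a', 'b', 'c' ], pos: 5 }
console.log(list(["a", ",", "b", ","], 0));      // { value: [ 'a', 'b' ], pos: 3 }
console.log(list([","], 0));                     // null
```

The second call shows the detail that matters most for delimited lists: a trailing delimiter with nothing after it is left unconsumed (the parse stops at position 3), so a grammar that should accept one has to match it explicitly afterwards.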

count, min and max can be represented as:

@@ -989,10 +1004,11 @@

Parsing Expressio
label : expression
-Match the expression and remember its match result under given label.
-The label must be a JavaScript identifier, but not in the list of reserved words.
-By default this is a list of JavaScript reserved words,
-but plugins can change it.
+Match the expression and remember its match result under the given label. The
+label must be a JavaScript identifier, and must not be in the list of
+reserved words. By default this is a list of JavaScript
+reserved words, but plugins can change it.
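Since a label (and, per the 3.0.0 changelog, a rule name) that appears in the reserved-word list is rejected, checking a grammar's names before upgrading amounts to a set lookup. This sketch hard-codes only the 24 words that the changelog lists as newly reserved; a complete check would presumably use the full list from the peggy.RESERVED_WORDS property mentioned under the Plugins API. The findReservedNames helper is hypothetical, and collecting the names from a grammar is left out.

```javascript
// Words that the 3.0.0 changelog lists as newly reserved.
const NEWLY_RESERVED = new Set([
  "abstract", "arguments", "as", "async", "boolean", "byte", "char",
  "double", "eval", "final", "float", "from", "get", "goto", "int",
  "long", "native", "of", "set", "short", "synchronized", "throws",
  "transient", "volatile",
]);

// Hypothetical helper: returns the rule or label names that would now be
// rejected, so they can be renamed before upgrading.
function findReservedNames(names, reserved = NEWLY_RESERVED) {
  return names.filter((name) => reserved.has(name));
}

console.log(findReservedNames(["key", "value", "from", "of", "list"])); // [ 'from', 'of' ]
```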

Labeled expressions are useful together with actions, where saved match results can be accessed by action's JavaScript code.

@@ -1017,7 +1033,7 @@

Parsing Expressio

Match the expression and if the label exists, remember its match result under given label. The label must be a JavaScript identifier if it exists, but not in the list of reserved words.
-By default this is a list of JavaScript reserved words,
+By default this is a list of JavaScript reserved words,
 but plugins can change it.

Return the value of this expression from the rule, or "pluck" it. You
@@ -1181,38 +1197,51 @@

Parsing Lists

One of the most frequent questions about Peggy grammars is how to parse a delimited list of items. The cleanest current approach is:

-list = word|.., _ "," _|
-  word = $[a-z]i+
-  _ = [ \t]*
+list
+  = word|.., _ "," _|
+word
+  = $[a-z]i+
+_
+  = [ \t]*

If you want to allow a trailing delimiter, append it to the end of the rule:

-list = word|.., delimiter| delimiter?
-  delimiter = _ "," _
-  word = $[a-z]i+
-  _ = [ \t]*
+list
+  = word|.., delimiter| delimiter?
+delimiter
+  = _ "," _
+word
+  = $[a-z]i+
+_
+  = [ \t]*

In the grammars created before the repetition operator was added to the peggy (in 2.1.0) you could see that approach, which is equivalent of the new approach with the repetition operator, but less efficient on long lists:

-list = head:word tail:(_ "," _ @word)* { return [head, ...tail]; }
-word = $[a-z]i+
-_ = [ \t]*
+list
+  = head:word tail:(_ "," _ @word)* { return [head, ...tail]; }
+word
+  = $[a-z]i+
+_
+  = [ \t]*

Note that the @ in the tail section plucks the word out of the parentheses, NOT out of the rule itself.

Error Messages

-As described above, you can annotate your grammar rules with human-readable names that will be used in error messages. For example, this production:
+As described above, you can annotate your grammar rules with human-readable
+names that will be used in error messages. For example, this production:

-integer "integer"
-= digits:[0-9]+
+integer "simple number"
+  = digits:[0-9]+

will produce an error message like:

-Expected integer but "a" found.
+Expected simple number but "a" found.
-when parsing a non-number, referencing the human-readable name "integer." Without the human-readable name, Peggy instead uses a description of the character class that failed to match:
+when parsing a non-number, referencing the human-readable name "simple
+number." Without the human-readable name, Peggy instead uses a description of
+the character class that failed to match:

Expected [0-9] but "a" found.
@@ -1245,46 +1274,46 @@

Error Messages

let source = ...;
 try {
-peggy.generate(text, { grammarSource: source, ... }); // throws SyntaxError or GrammarError
-parser.parse(input, { grammarSource: source2, ... }); // throws SyntaxError
+  peggy.generate(text, { grammarSource: source, ... }); // throws SyntaxError or GrammarError
+  parser.parse(input, { grammarSource: source2, ... }); // throws SyntaxError
 } catch (e) {
-if (typeof e.format === "function") {
-console.log(e.format([
-{ source, text },
-{ source: source2, text: input },
-...
-]));
-} else {
-throw e;
-}
+  if (typeof e.format === "function") {
+    console.log(e.format([
+      { source, text },
+      { source: source2, text: input },
+      ...
+    ]));
+  } else {
+    throw e;
+  }
 }

Messages generated by format() look like this

Error: Possible infinite loop when parsing (left recursion: start -> proxy -> end -> start)
 --> .\recursion.pegjs:1:1
-|
+  |
 1 | start = proxy;
-| ^^^^^
+  | ^^^^^
 note: Step 1: call of the rule "proxy" without input consumption
 --> .\recursion.pegjs:1:9
-|
+  |
 1 | start = proxy;
-|         ^^^^^
+  |         ^^^^^
 note: Step 2: call of the rule "end" without input consumption
 --> .\recursion.pegjs:2:11
-|
+  |
 2 | proxy = a:end { return a; };
-|           ^^^
+  |           ^^^
 note: Step 3: call itself without input consumption - left recursion
 --> .\recursion.pegjs:3:8
-|
+  |
 3 | end = !start
-|        ^^^^^
+  |        ^^^^^
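In the output above, each snippet quotes one source line and places the ^ markers under the reported column. The sketch below renders one such snippet, assuming 1-based line and column numbers. renderSnippet is a hypothetical helper that reproduces the layout of the last note above; it is not Peggy's format() implementation.

```javascript
// Hypothetical helper that renders one annotated source line in the layout
// of the format() output above; line and column are 1-based.
function renderSnippet(sourceName, text, line, column, length) {
  const lineText = text.split("\n")[line - 1];
  const gutter = String(line);            // line number shown before "|"
  const pad = " ".repeat(gutter.length);  // blank gutter of the same width
  return [
    `${pad}--> ${sourceName}:${line}:${column}`,
    `${pad} |`,
    `${gutter} | ${lineText}`,
    `${pad} | ${" ".repeat(column - 1)}${"^".repeat(length)}`,
  ].join("\n");
}

const grammar = "start = proxy;\nproxy = a:end { return a; };\nend = !start";
console.log(renderSnippet(".\\recursion.pegjs", grammar, 3, 8, 5));
```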

A plugin may register additional passes that can generate GrammarErrors to report about problems, but they shouldn't do that by throwing an instance of GrammarError. They should
-use a session API instead.
+use the session API instead.

Locations

@@ -1293,9 +1322,9 @@

Locations

information by calling location() function, which returns you the following object:

{
-source: options.grammarSource,
-start: { offset: 23, line: 5, column: 6 },
-end: { offset: 25, line: 5, column: 8 }
+  source: options.grammarSource,
+  start: { offset: 23, line: 5, column: 6 },
+  end: { offset: 25, line: 5, column: 8 }
 }
 
@@ -1325,9 +1354,9 @@

Locations

For the per-parse initializer, the location is the start of the input, i.e.

{
-source: options.grammarSource,
-start: { offset: 0, line: 1, column: 1 },
-end: { offset: 0, line: 1, column: 1 }
+  source: options.grammarSource,
+  start: { offset: 0, line: 1, column: 1 },
+  end: { offset: 0, line: 1, column: 1 }
 }
 
@@ -1338,21 +1367,22 @@

Locations

the input.

Line and column are somewhat expensive to compute, so if you just need the
-offset, there's also a function offset() that returns just the start offset,
-and a function range() that returns the object:
+offset, there's also a function offset() that returns just the
+start offset, and a function range() that returns the object:

-{
-source: options.grammarSource,
-start: 23,
-end: 25
-}
+{
+  source: options.grammarSource,
+  start: 23,
+  end: 25
+}
-(i.e. difference from the location() result only in type of start and end
-properties, which contain just an offset instead of the Location object.)
+(i.e. it differs from the location() result only in the type of
+start and end properties, which contain just an
+offset instead of the Location
+object.)

-All notes about values for location() object is also applicable to the range()
+All of the notes about values for the location() object are also
+applicable to the range()
 and offset() calls.
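Line and column are more expensive than an offset because they have to be derived from the text before that offset. The sketch below shows one naive way to do that and builds location()-style and range()-style objects with the shapes shown above. It is not Peggy's internal code (which may cache intermediate results); the helper names are hypothetical, only "\n" is treated as a line break, and columns count UTF-16 code units.

```javascript
// Sketch: derive { offset, line, column } by scanning the text before the
// offset. Line and column are 1-based; only "\n" counts as a line break.
function positionAt(input, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset; i++) {
    if (input.charCodeAt(i) === 10) { // "\n"
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { offset, line, column };
}

// location()-style object: start and end are full positions (requires scanning).
function locationOf(source, input, start, end) {
  return { source, start: positionAt(input, start), end: positionAt(input, end) };
}

// range()-style object: start and end stay plain offsets (no scanning needed).
function rangeOf(source, start, end) {
  return { source, start, end };
}

const input = "first line\nx = 42\n";
console.log(locationOf("example", input, 15, 17)); // start is line 2, column 5; end is line 2, column 7
console.log(rangeOf("example", 15, 17));           // { source: 'example', start: 15, end: 17 }
```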

Currently, Peggy only works with the Basic Multilingual Plane (BMP) of Unicode.
@@ -1360,13 +1390,17 @@

Locations

try to parse characters outside this Plane (for example, emoji, or any surrogate pairs), you may get an offset inside a code point.

-Changing this behavior may be a breaking change and will not to be done before
-Peggy 2.0. You can join to the discussion for this topic on the GitHub Discussions page.
+Changing this behavior might be a breaking change, so it will likely cause
+a major version number increase if it happens. You can join the discussion
+for this topic on the GitHub Discussions
+page.
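The "offset inside a code point" problem follows from JavaScript strings being sequences of UTF-16 code units: a character outside the BMP, such as an emoji, is stored as a surrogate pair of two units. The plain JavaScript below (no Peggy involved) shows how an offset can land between the two halves.

```javascript
// "a", U+1F600 (an emoji outside the BMP), "b"
const text = "a\u{1F600}b";

console.log(text.length);       // 4: the emoji uses two UTF-16 code units
console.log([...text].length);  // 3: iterating a string yields code points
console.log(text.indexOf("b")); // 3: offsets are counted in code units

// Offset 2 falls inside the emoji: it holds the low surrogate.
const unit = text.charCodeAt(2);
console.log(unit.toString(16)); // de00

// Cutting the string at that offset leaves a lone high surrogate behind.
const head = text.slice(0, 2);
console.log(head.charCodeAt(1).toString(16)); // d83d
```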

Plugins API

-A plugin is an object with the use(config, options) method. That method will be
-called for all plugins in the options.plugins array, supplied to the generate()
+A plugin is an object with the use(config, options) method.
+That method will be called for all plugins in the options.plugins
+array, supplied to the generate()
 method.

use accepts these parameters:

@@ -1392,8 +1426,8 @@

config

  • generate — passes used for actual code generating
-A plugin that implement a pass usually should push it to the end of the correct
-array. Pass is a simple function with signature pass(ast, options, session):
+A plugin that implements a pass should usually push it to the end of the correct
+array. Each pass is a function with the signature pass(ast, options, session):

    • ast — the AST created by the config.parser.parse() method
@@ -1411,7 +1445,7 @@

      config

      label names. This list can be modified by plugins. That property is not required to be sorted or not contain duplicates, but it is recommend to remove duplicates.

-Default list contains JavaScript reserved words, and can be found
+Default list contains JavaScript reserved words, and can be found
 in the peggy.RESERVED_WORDS property.
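Putting the pieces of the Plugins API together, a plugin might look like the sketch below. It assumes that the stage arrays live under config.passes (check, transform, generate) and that the reserved-word list is config.reservedWords. It runs the pass on a hand-made config and AST (rule nodes carrying a name) instead of calling peggy.generate(). The plugin itself, the reserved word "internal", and the ruleNames option are all hypothetical.

```javascript
// Hypothetical plugin: reserves one extra word and adds a "check" pass.
const ruleNamePlugin = {
  use(config, options) {
    // Assumed: config.reservedWords holds the words not allowed as labels.
    // Only push the word if it is missing, to avoid duplicates in the list.
    if (!config.reservedWords.includes("internal")) {
      config.reservedWords.push("internal");
    }
    // Assumed: config.passes.check is the array of check-stage passes.
    config.passes.check.push((ast, passOptions, session) => {
      // Collect rule names into a caller-supplied array (hypothetical option).
      if (Array.isArray(passOptions.ruleNames)) {
        passOptions.ruleNames.push(...ast.rules.map((rule) => rule.name));
      }
    });
  },
};

// Hand-made stand-ins for what peggy.generate() would supply.
const config = {
  reservedWords: ["break", "case"], // shortened list for the example
  passes: { check: [], transform: [], generate: [] },
};
const options = { ruleNames: [] };
ruleNamePlugin.use(config, options);

// Run the check stage the way the compiler would: pass(ast, options, session).
const ast = {
  type: "grammar",
  rules: [
    { type: "rule", name: "start" },
    { type: "rule", name: "integer" },
  ],
};
for (const pass of config.passes.check) {
  pass(ast, options, null); // this sketch has no session object
}

console.log(options.ruleNames);                         // [ 'start', 'integer' ]
console.log(config.reservedWords.includes("internal")); // true
```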

    @@ -1424,7 +1458,7 @@

    options

    Session API

Each compilation request is represented by a Session instance. An object of this class
-is created by the compiler and passed to an each pass as a 3rd parameter. The session
+is created by the compiler and given to each pass as a 3rd parameter. The session
 object gives access to the various compiler services. At the present time there is only one such service: reporting of diagnostics.

    @@ -1480,16 +1514,17 @@

    Compatibility

    following environments:

-  • Node.js 12+
-  • Internet Explorer 9+
-  • Edge
-  • Firefox
-  • Chrome
-  • Safari
-  • Opera
+  • Node.js 14+
+  • Edge
+  • Firefox
+  • Chrome
+  • Safari
+  • Opera
+The generated parser is intended to run in older environments when the format
+chosen is "globals" or "umd". Extensive testing is NOT performed in these
+environments, but issues filed regarding the generated code will be fixed.