[Perl] Feature Update 2019 #2048
Merged
This commit fixes two issues with numeric constants.
1) Some of the constants were not scoped correctly if directly followed by a `+` operator. This was caused by the `+` in the `(?![\.\w+])` lookahead, which was probably added by accident. Investigation revealed that Perl correctly detects all numbers even if they are followed by `.`. It then just handles that dot as the concatenation operator.
   Example: `print 0x1.0x2` will print out 12
2) It's barely possible to distinguish a negative number from a subtraction operation. Therefore the leading sign match `[-+]?` was removed from unquoted constants. All `-` or `+` operators in front of or after numbers are now scoped as `keyword.operator` and no longer break highlighting.
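The effect of the stray `+` in the lookahead can be reproduced outside the syntax definition. The following is a minimal sketch in Python, assuming a simplified hex-only pattern; `BUGGY`, `FIXED` and `hex_literals` are hypothetical names and not the rules from the syntax file.

```python
import re

# Hypothetical stand-ins for the syntax rule, reduced to hex literals.
BUGGY = re.compile(r"\b0x[0-9a-fA-F]+(?![\.\w+])")  # stray `+` inside the class
FIXED = re.compile(r"\b0x[0-9a-fA-F]+(?![\.\w])")   # `+` removed from the class


def hex_literals(pattern, source):
    """Return every hex literal the given pattern finds in `source`."""
    return pattern.findall(source)


# With the stray `+`, a literal directly followed by `+` is rejected.
print(hex_literals(BUGGY, "0x1F+1"))
print(hex_literals(FIXED, "0x1F+1"))
```

The sketch only removes the `+` from the character class; it makes no attempt to model how the dot is handled afterwards.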
This commit scopes integers starting with `0` and containing digits from 0 to 7 as `constant.numeric.integer.octal.perl`.
Perl accepts all kinds of incomplete floating point numbers. This commit therefore scopes all of them properly. This commit doesn't split the `match` operations for performance reasons. The current implementation benchmarked as the fastest of the various alternatives tested against 10k lines of quoted and unquoted floating point numbers. Note: Need to make sure not to break the range operator `..`.
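To illustrate the tension between incomplete floats such as `1.` or `.5` and the range operator `..`, here is a rough sketch in Python. The pattern `FLOAT` is an assumption made for illustration and is not the single `match` the commit benchmarked.

```python
import re

# Hypothetical float pattern: accepts `1.`, `.5` and `1.5e3`,
# but refuses to treat either dot of the range operator `..` as a decimal point.
FLOAT = re.compile(
    r"(?:\d+\.(?!\.)\d*"        # digits, a dot not followed by another dot, optional fraction
    r"|(?<![.\w])\.\d+)"        # or a leading dot that is not part of `..` or an identifier
    r"(?:[eE][-+]?\d+)?"        # optional exponent
)


def floats(source):
    """Return all substrings of `source` the sketch would scope as floats."""
    return FLOAT.findall(source)


print(floats("1. .5 1.5e3"))
print(floats("1..5"))
```

Integers are intentionally left out, so the second call finds nothing: both operands of `1..5` are plain integers and neither dot is consumed.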
Languages like Go or Python scope the decimal separator already. With the changes made in the previous commit, doing so is too easy to ignore.
Languages like D scope the leading `0b` or `0x` of binary/hexadecimal constants as `punctuation.definition.numeric`. This commit adds the corresponding rules for Perl as well, for consistency with other languages.
Quoted numbers are interpreted as decimal (float or int) only. Octal numbers, like binary and hexadecimal ones, must be unquoted. Example: print "030" + "10"; # prints 40
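The arithmetic behind the example can be checked with a small Python sketch. `numify_quoted` is a hypothetical helper that only models the base-10 reading described above; Perl's real string-to-number conversion is more permissive.

```python
def numify_quoted(text):
    """Read a quoted numeric string as decimal only (simplified model)."""
    if any(ch in text for ch in ".eE"):
        return float(text)
    return int(text, 10)  # a leading zero does not switch to octal


# Decimal reading: 30 + 10
print(numify_quoted("030") + numify_quoted("10"))
# Octal reading of the same digits, for comparison: 24 + 10
print(int("030", 8) + 10)
```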
As the two removed contexts are not used separately at any point, they are merged into the `constants-numbers` context. The rules are sorted by number bases: bin/oct/hex/dec
Merging the numbers contexts revealed some missing boundary checks. Even though they are considered not needed they are added for safety and consistency reasons to prevent possible edge cases.
Up to this commit all `${...}` or `%{...}` tokens are scoped as `variable.other` even though they contain ordinary expressions.
This commit...
1. replaces the scope with `meta.variable` to avoid parts of the expressions being highlighted incorrectly.
2. adds the `regexp-pop` context right after the opening punctuation in order to fix an issue with `/pattern/flags` not being detected.
Note: This change is part of tackling issue sublimehq#2017
According to the test cases of ModernPerl, 01.0 is a decimal float, but it was treated as octal due to the current order of the matches. This commit re-sorts the rules to fix it.
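Why the order of the matches decides between octal and float can be shown with a first-match-wins rule list. This is a sketch in Python under the assumption of two heavily simplified patterns; `OCTAL`, `FLOAT` and `first_match` are illustrative names only.

```python
import re

# Simplified, hypothetical rules; each is a (scope name, pattern) pair.
OCTAL = ("octal", re.compile(r"0[0-7]+"))
FLOAT = ("float", re.compile(r"\d+\.\d*"))


def first_match(rules, text):
    """Apply the rules in order and return the first (name, matched text)."""
    for name, pattern in rules:
        match = pattern.match(text)
        if match:
            return name, match.group()
    return None


# Octal first: only the `01` prefix of `01.0` is consumed.
print(first_match([OCTAL, FLOAT], "01.0"))
# Float first: the whole literal is matched, and plain octals still work.
print(first_match([FLOAT, OCTAL], "01.0"))
print(first_match([FLOAT, OCTAL], "017"))
```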
Case doesn't matter when parsing 0x, 0b, or the exponent e. So 0X, 0B or E are valid as well.
Perl allows `_` to be used between the digits of a number for legibility.
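The case-insensitive prefixes and the underscore separators can be sketched together. The Python patterns below are assumptions for illustration: they capture the prefix separately (the part that would receive `punctuation.definition.numeric`) and only accept single underscores between digits, which is stricter than Perl itself.

```python
import re

# Hypothetical patterns; the named group `prefix` stands for the punctuation part.
HEX = re.compile(r"(?P<prefix>0[xX])(?P<digits>[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*)")
BIN = re.compile(r"(?P<prefix>0[bB])(?P<digits>[01]+(?:_[01]+)*)")


def split_literal(text):
    """Return (base, prefix, digits) for a hex or binary literal, else None."""
    for base, pattern in (("hex", HEX), ("bin", BIN)):
        match = pattern.fullmatch(text)
        if match:
            return base, match.group("prefix"), match.group("digits")
    return None


print(split_literal("0XFF_FF"))
print(split_literal("0b1010_0101"))
print(split_literal("0x"))
```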
Issue: Up to this commit the first `<` after a `$var++` term is scoped as the beginning of an angle quoted string, because angle quoted strings have been considered valid after each operator or punctuation. The following example illustrates how this edge case breaks the highlighting of normal comparisons.
Example:
    while ($var++ < 50) {}
                  ^^^^^^^^ string !! invalid match
Solution:
1. This commit creates a new `expression-begin` context to replace the `regexp-pop` context, whose name no longer expressed its meaning precisely since angle quoted matching was added. The new context is pushed everywhere but after a `++` or `--` operator.
2. The lookahead in the new `string-quoted-angle-pop` is extended to match an angular string only if ...
   a) the beginning does not conflict with other operators like `<<` or `<=>`.
   b) the angular string is terminated by `>` on the same line.
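The two lookahead conditions from the solution can be approximated as follows. This is a sketch in Python; `ANGLE_STRING_START` is a hypothetical pattern, not the one used in `string-quoted-angle-pop`.

```python
import re

# Hypothetical approximation of the two conditions:
#   a) `<` must not start `<<`, `<=` or `<=>`
#   b) a closing `>` must follow on the same line
ANGLE_STRING_START = re.compile(r"<(?![<=])(?=[^\n]*>)")


def starts_angle_string(text):
    """True if `text` begins with what the sketch treats as an angle quoted string."""
    return ANGLE_STRING_START.match(text) is not None


print(starts_angle_string("<STDIN>;"))   # file handle read
print(starts_angle_string("< 50) {}"))   # comparison, no closing `>`
print(starts_angle_string("<=> $b"))     # spaceship operator
print(starts_angle_string("<< 2 > 1"))   # shift operator
```

A lookahead like this is still only a heuristic: a line such as `< 50 and $b > 3` would pass both checks, which is why the context is additionally not pushed after `++` or `--`.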
A fully qualified identifier must not contain whitespace. Hence the accessor `::` must not be surrounded by whitespace either. This was supported up to this point in order to improve the writing experience by not breaking identifiers while typing. It will cause conflicts with parsing/scoping fully qualified identifiers in future changes and therefore needs to be restricted. The test cases containing whitespace surrounded accessors are removed for the moment, as properly handling such situations is part of a future change.
This commit
1. adds missing declaration or operator keywords to the list of reserved words. This list is used to ensure keywords are not matched in the wrong place and are not matched as anything else. Therefore it should be quite complete.
2. no longer cares about the `::` accessor when doing general checks against the list of reserved words. Hence `{{break}}` is replaced by `\b`.
This commit adds a dedicated context for package declarations. The reasons are:
1) correctly apply scope names according to the guidelines to the statement:
   - adds `meta.namespace.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to be matched in future changes would otherwise break the syntax highlighting of package declaration statements.
3) no variables nor other expressions are allowed and are thus scoped invalid.
Note: The order of the NAMESPACE and VERSION is not enforced. Wrong order won't be scoped invalid at the moment.
This commit adds the `require` context for package imports. The reasons are:
1) correctly apply scope names according to the guidelines to the statement:
   - adds `meta.import.require.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to be matched in future changes would otherwise break the syntax highlighting of require statements.
3) no variables nor other expressions are allowed and are thus scoped invalid. A leading `::` accessor is invalid, too.
This commit refactors the subroutine definition statement to...
1) support fully qualified identifiers
   Example: sub NS1::NS2::NS3::function { }
   A real life example can be found in <..>/core_perl/B/Debug.pm
   Before this commit fully qualified subroutine identifiers are scoped `invalid.illegal`. This is fixed by this commit.
2) add proper highlighting of code attributes.
   Example: sub name :attribute(attrargs) ($) { }
   Code attributes must be defined in the `BEGIN { }` preprocessor subroutine. They work pretty much like decorators in Python.
3) fix prototype parameters. The parameter list of a sub may only contain the types of the arguments, but no names. The parameters are separated by `;`
   Example: sub name ($ ; @ ; % ; $$) {}
4) remove the function block from the `meta.function` scope. The way blocks need to be handled due to HEREDOCs breaks the `meta.function` block in general. It would either pop off at the first closing brace or maybe never at all. That's why including it is useless anyway.
5) move the `sub-...` contexts upwards in the syntax definition in order to group all upcoming complex statement definitions at the top of the file, while keeping more atomic contexts at the bottom.
6) refactor the test cases (leading indentation).
Note: The namespace part is currently scoped as `support.class`, because
a) this is the scope being used in other situations as well
b) the scope naming guidelines don't yet suggest a scope for that part.
   - C# uses `variable.other.namespace`.
   - The rewritten Erlang introduces `variable.namespace` because it fits best with the `entity.name.namespace` which is to be used for namespace definitions.
   - The proposed general `variable.qualifier` would probably be the best and most general alternative, as it is hard/impossible to distinguish namespace from class access.
c) All `support.class` scopes should be renamed in one commit later.
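How a fully qualified subroutine name splits into namespace parts and the function name (point 1 above) can be sketched like this. `SUB_DEF` and `parse_sub` are hypothetical names in Python; attributes and prototypes are ignored by this simplified pattern.

```python
import re

# Hypothetical pattern: optional `NS::` path segments followed by the sub name.
SUB_DEF = re.compile(
    r"sub\s+(?P<path>(?:[A-Za-z_]\w*::)*)(?P<name>[A-Za-z_]\w*)"
)


def parse_sub(line):
    """Return (list of namespaces, sub name) for a `sub` definition, else None."""
    match = SUB_DEF.match(line)
    if not match:
        return None
    namespaces = [part for part in match.group("path").split("::") if part]
    return namespaces, match.group("name")


print(parse_sub("sub NS1::NS2::NS3::function { }"))
print(parse_sub("sub name :attribute(attrargs) ($) { }"))
```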
This commit moves the variable declaration keywords to the `control` context because 1) its remaining content is not worth a context 2) they are valid wherever control keywords are valid.
Group the managing contexts at the top of the file.
Issue: If a HEREDOC is used as an argument in a function call, the arguments context never pops off. Hence `meta.function-call.arguments` stays on the stack forever.
Example:
    function_name(<<" HEREDOC");
    This is the HEREDOCs value
    which is passed to the function
     HEREDOC
Solution: This commit removes any `meta.function-call` scopes and all the related contexts in favor of properly highlighting HEREDOC-like arguments in functions. As the function call can be nested and followed by arbitrary Perl expressions up to the end of the line, no better solution was found with the existing features of ST's lexer.
(I) Function calls without parentheses
The Perl documentation says: Encapsulating function arguments into parentheses is optional, if the expression is clearly identified as function-call. This statement applies to calls of subroutines which are already known to the interpreter, either because they are defined at the beginning of the script or because they are imported from other modules via a `use ...` statement.
This commit adds some heuristics to identify such function calls even though we don't really know whether the identifier is valid. A function is clearly identified if the `identifier` is followed by
- comment
- end of line
- end of expression (closing bracket)
- end of statement (semi-colon)
- HEREDOC
- quoted string
- regular expression
- variable
- word but no operator
(II) Each variable or function identifier can contain a qualified path.
Example:
    $NS1::NS2::variable
    NS1::NS2::function
The `main` namespace is shortened to a leading `::`
    $::variable
    ::function
This commit applies correct scopes for all qualified identifiers.
(III) Qualifiers or identifiers can consist of only capital letters. In order to distinguish them from global constants this commit adds the following rules:
1) Prefer scoping the file handles like STDERR, STDOUT, etc. as constants even though they are used as functions or look like a class or namespace. Even though these tokens don't have special meaning to the Perl interpreter, they are reserved for certain uses by the standard library.
2) If a user defined constant looks like a function it is scoped as such. In other words, tokens with capital letters only are scoped as constants only if they don't look like a namespace, class, object or function.
(IV) Object member functions
According to https://perldoc.perl.org/perlobj.html an object in Perl is nothing else than a normal data structure (array, hash, ...) which is bound to a class. The `->` accessor always indicates a member function call, if followed by an identifier. Attributes are accessed via getter/setter methods in modern Perl code.
Example:
    $obj->method
    $obj->getter
    $obj->setter <value>
A subroutine always returns a reference to an object. Therefore the following is invalid:
    $obj->method{key}
Another `->` accessor is needed to access the item of the hash returned by `method`:
    $obj->method->{key}
Otherwise the `->` could also be used to directly access data items in the underlying hash of an object (which is discouraged).
Example:
    $obj->{attribute}
Accessing members of a nested object looks like:
    $obj->{attribute}->method
Note: Object members like `->new()` in an expression like `Class->new()` are not yet part of meta.path as they must not be part of a string interpolation.
The `->` accessor scope is renamed to `punctuation.accessor.arrow` to comply with the scope naming guidelines and to give it the same color as the `::` operator which is used to access classes.
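The list of followers in section (I) can be turned into a rough lookahead. The Python sketch below is an assumption about how such a heuristic might look; `CALL_FOLLOWERS` and `looks_like_call` are hypothetical, and the regular expression follower is left out because `/` cannot be told apart from division without more context.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

# Hypothetical set of tokens that may follow a parenthesis-free function call.
CALL_FOLLOWERS = re.compile(
    r"""
    \s*(?:
        \#            # comment
      | $             # end of line
      | [)\]};]       # end of expression or statement
      | <<["'\w]      # HEREDOC tag
      | ["'`]         # quoted string
      | [$@%]\w       # variable
      | [A-Za-z_]     # another word
    )
    """,
    re.VERBOSE,
)


def looks_like_call(line):
    """True if the identifier at the start of `line` looks like a function call."""
    match = IDENT.match(line)
    return bool(match and CALL_FOLLOWERS.match(line, match.end()))


for sample in ("foo $x", "foo;", "foo <<EOT", "foo 'bar'", "foo + 1", "foo = 1"):
    print(sample, looks_like_call(sample))
```

Word operators such as `x` or `eq` would need to be excluded from the last alternative to honour "word but no operator"; the sketch skips that for brevity.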
All reserved words like `if`, `else`, `sub`, ... are defined in the `CORE` namespace. Thus prefixing them with `CORE::` is valid syntax. This commit therefore adds a simple context, which prevents such keywords from being highlighted as ordinary functions. Note: For simplicity reasons no `meta.path` is added at the moment as it would require adding a match for each qualified keyword. As a result the `CORE::` is not included in any other meta scope like `meta.function` etc.
This commit improves the overall variables matching:
1. add/fix regexp match group variables `$+`, `$-`, `%+`, `%-`, `@+`, `@-`
2. fix predefined variables pattern (replace `^` by `\^`)
3. clean up some character classes (remove escapes)
4. regroup/resort the rules
5. add special scopes for builtin variables
6. add builtin variables from English.pm
This commit adds the reference operator `\` as described at https://perldoc.perl.org/perlref.html#Making-References
This commit adds the dereference operators as described at https://perldoc.perl.org/perlref.html#Making-References Note: Perl calls the variable prefixes `type keywords`. This commit distinguishes the variable prefix, which is the rightmost leading character in front of the identifier, from all other type prefixes, which can be added to the left in order to perform dereferencing. The latter are scoped as `keyword.operator` in order to make sure all kinds of variables are scoped correctly.
This commit adds string interpolation to the following expressions
    s/<pattern>/<replacement with interpolation>/<flags>;
    tr/<pattern>/<replacement with interpolation>/<flags>;
    y/<pattern>/<replacement with interpolation>/<flags>;
This change also fixes an issue with broken highlighting if the expression spans multiple lines.
    s/<pattern>/
      <replacement with interpolation>
    /<flags>;
or
    s{
      <pattern>
    } [<replacement with interpolation>]<flags>;
Notes:
1) The context `quote-like-replace` is renamed to `quoted-like-replace` as this new name reflects Perl terminology more accurately.
2) The context `quoted-like-args-find-rexexp` is renamed to `quoted-like-args-pattern` as this is a more general name and matches the changes of the previous commit.
This commit adds interpolation support to comments in the same way as for strings in order to properly support the fenced code blocks. All comments are scoped with `meta.comment.perl`. The `comment` scope is cleared from stack within a fenced code block.
Nearly any character can be used as delimiter in quoted-like expressions. All the following expressions are equal:
    s/<pattern>/<repl>/<flags>
    s|<pattern>|<repl>|<flags>
    s#<pattern>#<repl>#<flags>
    s@<pattern>@<repl>@<flags>
The usage depends on the characters being used in the <pattern> or <repl>. If `/` is used as delimiter, it must not be part of them. Perl however accepts escapes like `\/`. Theoretically a <pattern> can end with an arbitrary number of `\`, which makes creating a robust `escape` pattern complicated or even impossible. Before this commit the escape patterns only check for the absence of a single `\` before the delimiter candidate `/`. The check is extended to `\` and `\\\` in order to allow Windows style path names ending with an escaped backslash. This is not a perfect solution, but should help in some rare edge cases.
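The underlying rule is that a delimiter is escaped exactly when it is preceded by an odd number of backslashes. The Python sketch below uses a general parity count, which is a different technique from the fixed `\` and `\\\` checks the commit adds to the patterns; `is_escaped` is a hypothetical helper.

```python
def is_escaped(text, index):
    """True if the character at `index` is preceded by an odd number of backslashes."""
    count = 0
    position = index - 1
    while position >= 0 and text[position] == "\\":
        count += 1
        position -= 1
    return count % 2 == 1


print(is_escaped(r"a\/b", 2))     # one backslash: the `/` is escaped
print(is_escaped(r"C:\\/", 4))    # two backslashes: an escaped backslash, `/` is a delimiter
print(is_escaped(r"C:\\\/", 5))   # three backslashes: the `/` is escaped again
```

Checking one and three backslashes, as the commit does, covers the first two odd cases of this parity rule.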
This commit fixes interpolated strings and quoted-like functions not being popped off correctly if the interpolated string ends with a variable punctuation character.
Examples:
  a) "$repl$"
           ^^ no variable!
  b) s/pattern/$repl$/g;
                    ^^ no variable!
Example (a) results in a syntax error as the Perl interpreter doesn't know whether to handle the `$"` as a variable or not. This commit does not introduce an `illegal` scope for it, though. It just ensures the string boundaries are not broken.
Example (b) shows quoted-like functions, which Perl tokenizes by the first character after the function identifier (here `/`) before it starts parsing the strings. The delimiters are therefore matched with higher priority.
This commit adopts this behavior by
1) consuming the variable punctuation `[$@%&*]#?` in front of the delimiter (or closing bracket). We need it because $/, $), $], etc. are valid built-in variables.
2) modifying the `variables-interpolation` context in order to make sure to pop off from a `meta.interpolation` after each variable. Otherwise 1) wouldn't work. In order to avoid sophisticated (and error prone) lookaheads for each ordinary variable, the existing `qualified-variables` and `unqualified-variables` contexts are merged for that purpose. The resulting context pops off, while the original ones don't.
This commit
1. adds the `literal-common` and `interpolated-common` contexts to group common content for interpolated and raw strings.
2. adds rules to scope C-style FORMAT placeholders as used by printf/sprintf, etc. See: https://perldoc.perl.org/functions/sprintf.html
The pattern to match the FORMAT is designed to be quite restrictive in order to prevent interference with variable interpolation. Perl uses `%` for some types of variables, and the FORMAT is to be supported in any kind of quoted string as the format string might be built dynamically.
Example:
    $format = "%s: %d";
    sprintf $format, $var1, $integer;
Note: Perl supports variable interpolation even in FORMAT patterns. Something like %0${varname}X is valid. This commit does not support such constructs as it is nearly impossible to distinguish between FORMAT patterns and hashes - both start with `%`.
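What "quite restrictive" can mean in practice is sketched below in Python. `FORMAT_PLACEHOLDER` is an assumption made for illustration and not the pattern from the syntax file; it rejects a placeholder that is directly followed by a word character, because that is more likely a hash such as `%data`.

```python
import re

# Hypothetical, deliberately strict pattern for C-style placeholders.
FORMAT_PLACEHOLDER = re.compile(
    r"%"
    r"[-+ 0#]*"               # flags
    r"(?:\d+|\*)?"            # minimum width
    r"(?:\.(?:\d+|\*))?"      # precision
    r"(?:hh?|ll?|[qLV])?"     # size modifier
    r"[csduoxXeEfgGbB]"       # conversion character
    r"(?!\w)"                 # a following word character suggests a hash name
)


def placeholders(text):
    """Return all substrings of `text` the sketch would scope as placeholders."""
    return FORMAT_PLACEHOLDER.findall(text)


print(placeholders("%s: %d"))
print(placeholders("%05.2f"))
print(placeholders("%hash and %data"))
```

The trade-off is visible in the last check: a legitimate format like `%dkm` would be rejected as well, which is the price of not interfering with hash interpolation.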
This commit applies all changes about qualified functions and variables to the `format` statement.
Tab indentation is deprecated in Perl, but many older library functions still use it. Some color schemes highlight the whole whitespace block. It just sucks.
This commit adds another HEREDOC test case to ensure not to break highlighting due to pushing into contexts when matching braces.
A code block can contain anything.
Using string.unquoted as hash key makes it hard to distinguish interpolated item access (1) from normal string content (2) because only the braces are highlighted differently.
Example:
    "string $hash{key} string"
            ^^^^^^^^^^ meta.interpolation
    "string $hash->{key} string"
            ^^^^^ meta.interpolation
This commit changes the `key` scope to `constant.other.key` as it is already used for defining hashes.
Example:
    %hash = ( key => "value", key2 => "value2")
This commit makes sure to highlight valid keys only when defining hashes.
This commit renames `support.class.perl` to `variable.namespace.perl` as a more general identifier for the path parts of a qualifier. The top-level scope `support.` is not sufficient as it is meant for use with built-in entities only. Any namespace can be user defined though. The `variable.namespace` was chosen as the counterpart of `entity.name.namespace`, which is used to define a namespace.
This commit renames the `keyword.declaration.variable` scope to `storage.type.variable`. This step seems consistent because all the other declaration/definition keywords were renamed according to the scope naming guidelines.
    `sub` => storage.type.function
    `package` => storage.type.namespace
So we have now:
    `my`, `local`, `state`, `our` => storage.type.variable.
Single words within expressions are most likely to be constants. They can be functions if defined by an import (use) or sub before, but we can't distinguish that.
Example:
    if (constant)
        ^^^^^^^^ constant or function possible.
Uppercase-only identifiers are already scoped as constants, if they don't look like a namespace or function call. This commit adds the `constant-identifier` context to scope all identifiers as constants which were not otherwise matched. This is only guesswork, but it is the only chance we have to scope everything correctly in most situations.
Most syntaxes use the newer name.
The documentation of Perl 5.30.0 moved to https transport.
Overall, great! This fixes all the things I wanted and many more.
Makes sure the functions are added to the index. Note: The meta.function-call was removed as it used to cover the whole arguments list, which doesn't work properly if HEREDOCS are passed to a function.
This commit renames a scope in the function reference test cases, which used to exist during development only.
This commit ...
1) splits logical operators into
   - `keyword.operator.comparison.perl`
   - `keyword.operator.logical.perl`
2) assigns `=>` the `punctuation.separator.key-value.perl` scope. Inspired by C# and Ruby.
This commit introduces fixes for the following issues:
1) Whitespace between the `<<` and the tag name in unquoted HEREDOCS (e.g.: `<< TAG`) is not allowed.
2) The type of whitespace in quoted HEREDOC tags doesn't matter. Both spaces and tabs are allowed.
Applies heredoc scopes according to sublimehq#2073.
mitranim pushed a commit to mitranim/Packages that referenced this pull request on Mar 25, 2022
* [Perl] Fix numeric constant boundaries This commit fixes two issues with numeric constants. 1) Some of the constants were not scoped correctly if directly followed by a `+` operator, which was caused by the `+` in the `(?![\.\w+])` lookahead, which was probably added by accident. Investigations revealed Perl correctly detecting all numbers even if followed by `.`. It then just handles that dot as concat operator. Example: `print 0x1.0x2` will print out 12 2) It's barely possible to distinguish a negative number from a subtraction operation. Therefore the leading number sign matching `[-+]?` was removed from unquoted constants. All `-` or `+` operators in front of or after numbers are now scoped as `keyword.operator` and do no longer break highlighting. * [Perl] Add scope for octal numbers This commit scopes integers starting with `0` and containing digits from 0 to 7 as `constant.numeric.integer.octal.perl`. * [Perl] Fix incomplete floating points Perl accepts all kinds of incomplete floating point numbers. This commit therefore scopes all of them properly. This commit doesn't split the `match` operations for performance reasons. The current implementation was benchmarked the fastest one from various alternatives tested against 10k lines of quoted and unquoted floating point numbers. Note: Need to make sure not to brake the range operator `..`. * [Perl] Add scope for decimal separator Languages like Go or Python scope the decimal separator already. With the changes made in the previous commit doing so is too easy to ignore it. * [Perl] Add scope for bin/hex/oct punctuation Languages like D scope the leading `0b` or `0x` of binary/hexadecimal constants as `punctuation.definition.numeric`. This commit adds the according rules to do so for Perl as well for consistency reasons with other languages. * [Perl] Fix unsupported quoted octal numbers Quoted numbers are interpreted as decimal (float or int) only. Octal numbers as binary and hexadecimal must be unquoted. 
Example: print "030" + "10"; # prints 40 * [Perl] Merge the constant-number contexts As the both removed contexts are not used separately at any point they are merged into the `constants-numbers` context. The rules are sorted by number bases: bin/oct/hex/dec * [Perl] Fix bin/hex/float boundaries Merging the numbers contexts revealed some missing boundary checks. Even though they are considered not needed they are added for safety and consistency reasons to prevent possible edge cases. * [Perl] Fix ${...} scope names Up to this commit all `${...}` or `%{...}` tokens are scoped as `variable.other` while they contain ordinary expressions. This commit... 1. replaces the scope by a `meta.variable` to avoid parts of the expressions being highlighted in a wrong way. 2. adds the `regexp-pop` context right after the opening punctuation in order to fix an issue with `/patter/flags` not being detected. Note: This change is part of tackling issue sublimehq#2017 * [Perl] Fix floats being treated as octal According to the test cases of ModernPerl 01.0 is a decimal float, but was treated as octal due to the current order of the matches. This commit resorts the rules to fix it. * [Perl] Fix case of bin/hex/float punctuations Case doesn't matter, when parsing 0x, 0b, or exponent e. So 0X, 0B or E are valid as well. * [Perl] Fix numbers with underscore Perl allows all digits in a number to be replaced by `_`. * [Perl] Fix test case * [Perl] Extend the list of reserved words This commit 1. adds missing declaration or operator keywords to the list of reserved words. This list is used to ensure not to match keywords in the wrong place or to avoid matching them as anything else. Therefore it should be quite complete. 2. Don't care about `::` accessor, when doing general checks against the list of reserved words. Hence replace `{{break}}` by `\b`. * [Perl] Improve sub routine definitions This commit refactors the subroutine definition statement to... 
1) support fully qualified identifiers Example: sub NS1::NS2::NS3::function { } A real life example can be found in <..>/core_perl/B/Debug.pm Before this commit fully qualified subroutine identifiers are scoped `invalid.illegal`. This is fixed by this commit. 2) add proper highlighting of code attributes. Example: sub name :attribute(attrargs) ($) { } Code attributes must be defined in the `BEGIN { }` preprocessor sub routine. They work pretty much like decorators in python. 3) fix prototype parameters. The parameter list of a sub may only contain the types of the arguments, but no names. The parameters are separated by `;` Example: sub name ($ ; @ ; % ; $$) {} 4) remove the function block from `meta.function` scope. The way blocks need to be handled due to HEREDOCs in general breaks the `meta.function` block anyway. It would either pop off at the first closing brace or maybe even never. That's why including it is useless anyway. 5) move the `sub-...` contexts upwards in the syntax definition in order to group all coming complex statement definitions at the top of the file, while keep more atomic contexts at the bottom. 6) refactor the test cases (leading indention). Note: The namespace part is currently scoped as `support.class`, because a) this is the scope being used in other situations as well b) the scope naming guidelines don't yet suggest a scope for that part. - C# uses `variable.other.namespace`. - The rewritten Erlang introduces `variable.namespace` because it fits best to the `entity.name.namespace` which is to be used for namespace definitions. - The proposed general `variable.qualifier` would probably the best most general alternative as it is hard/impossible to distinguish namespace from class access. c) All `support.class` scopes should be renamed in one commit later. 
* [Perl] Fix < operator after var++ Issue: Up to this commit the first `<` after a `$var++` term is scoped as the beginning of an angle quoted string as they have been considered valid after each operator or punctuation. The following example illustrates how this edge case breaks the highlighting of normal comparisons. Example: while ($var++ < 50) {} ^^^^^^^^ string !! invalid match Solution: 1. This commits creates a new `expression-begin` context to replace the `regexp-pop` context whose name didn't express its meaning precisely anymore since angle quoted matching was added. The new context is pushed everywhere but after an `++` or `--` operator, 2. The lookahead in the new `string-quoted-angle-pop` is extended to match an angular string only, if ... a) the beginning does not conflict with other operators like `<<` or `<=>`. b) the angular string is terminated by `>` at the same line. * [Perl] Add package declaration context This commit adds a dedicated context for package declarations. The reasons are: 1) correctly apply scope names according to the guidelines to the statement: - adds `meta.namespace.perl` to the whole statement - adds `meta.path.perl` to the fully qualified namespace identifier - scopes the identifiers as `entity.name.namespace` 2) the way fully qualified identifiers (variables & functions) are to be matched in future changes would otherwise break the package declaration statements' syntax highlighting. 3) no variables nor other expressions are allowed and thus scoped invalid. Note: The order of the NAMESPACE and VERSION is not enforced. Wrong order won't be scoped invalid at the moment. * [Perl] Restrict whitespace surrounding namespace accessors A fully qualified identifier must not contain whitespace. Hence the accessor `::` must not be surrounded by them as well. It was supported up to this point in order to improve the writing experience by not breaking identifiers during writing. 
This will cause conflicts with parsing/scoping fully qualified identifiers in future changes an therefore needs to be restricted. The test cases containing whitespace surrounded accessors are removed for the moment as properly handling such situations is part of a future change. * [Perl] Add package import context This commit adds the `require` context for package imports. The reasons are: 1) correctly apply scope names according to the guidelines to the statement: - adds `meta.import.require.perl` to the whole statement - adds `meta.path.perl` to the fully qualified namespace identifier - scopes the identifiers as `entity.name.namespace` 2) the way fully qualified identifiers (variables & functions) are to be matched in future changes would otherwise break the "require statements" syntax highlighting. 3) no variables nor other expressions are allowed and thus scoped invalid. Leading `::` accessors is invalid, too. * [Perl] Move declaration keywords This commit moves the variable declaration keywords to the `control` context because 1) its remaining content is not worth a context 2) they are valid wherever control keywords are valid. * [Perl] Move expressions context Group the managing contexts at the top of the file. * [Perl] Add string interpolation step 6 This commit adds string interpolation to the following expressions s/<pattern>/<replacement with interpolation>/<flags>; tr/<pattern>/<replacement with interpolation>/<flags>; y/<pattern>/<replacement with interpolation>/<flags>; This change also fixes an issue with broken highlighting if the expression spans multiple lines. s/<pattern>/ <replacement with interpolation> /<flags>; or s{ <pattern> } [<replacement with interpolation>]<flags>; Notes: 1) The context `quote-like-replace` is renamed to `quoted-like-replace` as this new name reflects Perl terminology more accurately. 
2) The context `quoted-like-args-find-rexexp` is renamed to `quoted-like-args-pattern` as this is more general name and matches the changes of the previous commit. * [Perl] Add string interpolation step 1 This commit does not introduce functional changes, but prepares some scope names for the string interpolation support. 1. The scopes of language identifiers of embedded code blocks are renamed according to Markdown's fenced code blocks. a) The identifiers in PODs (documentation comments) are renamed from `string.unquoted` to `constant.other.language-name.<name>`. b) The identifiers of HEREDOCs are renamed from `constant.language.heredoc.<name>` to `constant.other.language-name.<name>`. Note: The HEREDOCs language identifiers are stacked into a `string` as Perl uses quotations to pin leading whitespace to the identifier. This is how it allows the end tag to be indented with the surrounding code. This syntax definition doesn't want to include these spaces into the `constant`. 2. The `meta.string.perl` scope is added to all kinds of strings to be able to safely clear the `string` scope within interpolations. * [Perl] Add string interpolation step 2 This commit prepares the `string-format` context for variable interpolation by the following steps: 1. The scopes of the 'picture line patterns' are renamed from `variable.parameter` to `constant.other.placeholder` as the patterns like @#.# compare to C-style format strings like `%1.1f`. 2. The scopes for `~`, `~~` and `...` are renamed from `constant.character.escape` to `constant.other.placeholder.text` as they are used as placeholders/patterns to format the content of variables the same way as the patterns of (a). 3. Fix an issue with some of the variables being partly scoped as `constant.placeholder` by refining of their match patterns. Note: These changes are required to clearly distinct between format patterns and variables, which are evaluated by perl during runtime by using the patterns. 

* [Perl] Add string interpolation step 5

This commit adds interpolation support to quoted-like operators, which are a functional counterpart to normal quotations.

    q//;  - single quoted           -> no interpolation
    qq//; - double quoted           -> interpolation
    qx//; - backtick quoted         -> interpolation
    qw//; - split string into words -> no interpolation

Notes:

1) The related contexts are renamed to reflect their meaning in a more general and accurate way.
2) According to the scope naming guidelines all prefixes and punctuation are to be included in `meta.string`. Quoted-like operators like qq// are scoped as `meta.function-call` at the moment. Hence the `meta.string` does not cover the `qq` function identifier, even though it looks very similar to Python's `r""` prefix style. The discussion about whether to turn `q`, `qq`, `qx`, `qw` into prefixes in order to include them in the `meta.string` should be part of another commit. The solution should play well with other functions like s///, m//,

* [Perl] Improve escaping from embedded regexp

Nearly any character can be used as delimiter in quoted-like expressions. All the following expressions are equal:

    s/<pattern>/<repl>/<flags>
    s|<pattern>|<repl>|<flags>
    s#<pattern>#<repl>#<flags>
    s@<pattern>@<repl>@<flags>

The choice depends on the characters being used in the <pattern> or <repl>. If `/` is used as the delimiter, it must not be part of them. Perl however accepts escapes like `\/`.

Theoretically a <pattern> can end with an arbitrary number of `\`, which makes creating a robust `escape` pattern complicated or even impossible.

Before this commit the escape patterns checked only for the absence of a single `\` before the delimiter candidate `/`. The check is extended to `\` and `\\\` in order to allow Windows-style path names ending with an escaped backslash. This is not a perfect solution, but should help in some rare edge cases.
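
The backslash problem described above can be illustrated with a small sketch. It is written in Python rather than as a syntax rule, and the helper names `is_escaped` and `find_closing_delimiter` are hypothetical. It only shows the general parity rule (a delimiter is escaped when an odd number of `\` precedes it), which the fixed check for `\` and `\\\` approximates; it is not what this commit implements.

```python
def is_escaped(text: str, pos: int) -> bool:
    """Return True if text[pos] is preceded by an odd number of backslashes."""
    count = 0
    i = pos - 1
    # Walk backwards over the run of backslashes directly before `pos`.
    while i >= 0 and text[i] == "\\":
        count += 1
        i -= 1
    return count % 2 == 1


def find_closing_delimiter(text: str, delim: str = "/", start: int = 0) -> int:
    """Return the index of the first unescaped `delim` at or after `start`, or -1."""
    for pos in range(start, len(text)):
        if text[pos] == delim and not is_escaped(text, pos):
            return pos
    return -1
```

For the body `a\/b/` the first `/` is escaped and the second one terminates the pattern; for `C:\\path\\/repl/` the first `/` follows two backslashes and therefore already terminates it. A lexer pattern with a fixed-width check cannot count arbitrarily long runs, which is why the commit calls its solution imperfect.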

* [Perl] Add string interpolation step 7

This commit adds interpolation support to comments in the same way as for strings in order to properly support the fenced code blocks. All comments are scoped with `meta.comment.perl`. The `comment` scope is cleared from the stack within a fenced code block.

* [Perl] Fix HEREDOC content boundaries

This commit fixes two issues related to interpolation:

1. By adding the `meta.string` scope to the `string-heredoc-other` context, the number of meta scopes to clear in the line after the HEREDOC tag got unbalanced.
   Fix: Add a dedicated `string-heredoc-expr-other` context to clear the `meta.string string.quoted.other` scopes.
2. The `string-heredoc-expr` context was terminated by `- match: $`. Hence the HEREDOC content block started at the end of the first line, which is inaccurate.
   Fix: Use `^` in the pattern to make the content start at the beginning of the line after the HEREDOC tag.

* [Perl] Add sprintf format placeholders

This commit

1. adds the `literal-common` and `interpolated-common` contexts to group common content for interpolated and raw strings.
2. adds rules to scope C-style FORMAT placeholders as used by printf/sprintf, etc.
   See: https://perldoc.perl.org/functions/sprintf.html

The pattern to match the FORMAT is deliberately restrictive in order to prevent interference with variable interpolation. Perl uses `%` for some types of variables, and the FORMAT is to be supported in any kind of quoted string because the format string might be built dynamically.

Example:

    $format = "%s: %d";
    sprintf $format, $var1, $integer;

Note: Perl supports variable interpolation even in FORMAT patterns. Something like %0${varname}X is valid. This commit does not support such constructs as it is nearly impossible to distinguish between FORMAT patterns and hashes, since both start with `%`.
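
To make the idea of a restrictive FORMAT pattern more tangible, here is a minimal sketch in Python. The regular expression `SPRINTF_PLACEHOLDER` and the function `find_placeholders` are assumptions made for illustration; they are not the pattern used in the syntax definition. The sketch refuses to match when the conversion character is followed by another word character, so a hash name such as `%hash` or an interpolated form such as `%0${varname}X` is not mistaken for a placeholder.

```python
import re

# Hypothetical, simplified placeholder pattern (not the one from this PR).
SPRINTF_PLACEHOLDER = re.compile(
    r"""
    %
    (?:\d+\$)?            # explicit argument index, e.g. %2$s
    [-+ 0#]*              # flags
    (?:\d+|\*)?           # minimum width
    (?:\.(?:\d+|\*))?     # precision
    (?:hh|ll|[hlqLV])?    # size modifier
    [%csduoxXeEfgGbBaAn]  # conversion character
    (?!\w)                # reject if an identifier continues, e.g. %hash
    """,
    re.VERBOSE,
)


def find_placeholders(text: str) -> list:
    """Return all substrings of `text` that look like sprintf placeholders."""
    return SPRINTF_PLACEHOLDER.findall(text)
```

Applied to the format string from the example above, `find_placeholders("%s: %d")` yields both placeholders, while `find_placeholders("%hash")` yields nothing.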

* [Perl] Fix HEREDOC function call arguments

Issue: If a HEREDOC is used as an argument in a function call, the arguments context never pops off. Hence `meta.function-call.arguments` stays on the stack forever.

Example:

    function_name(<<" HEREDOC");
    This is the HEREDOCs value which is passed to the function
     HEREDOC

Solution: This commit removes any `meta.function-call` scopes and all the related contexts in favor of properly highlighting HEREDOC-like arguments in functions. As the function call can be nested and followed by arbitrary Perl expressions up to the end of line, no better solution was found with the existing features of ST's lexer.

* [Perl] Add string interpolation step 4

This commit adds support for string interpolation within HEREDOCs. HEREDOCs with a single quoted name (e.g. <<'EOT') are not interpolated. Unquoted (<<EOT) or double quoted (<<"EOT") HEREDOCs are interpolated.

* [Perl] Fix qualified functions and variables

(I) Function calls without parentheses

The Perl documentation says: Encapsulating function arguments in parentheses is optional, if the expression is clearly identified as a function call.

This statement applies to calls of subroutines which are already known to the interpreter, either because they are defined at the beginning of the script or because they are imported from other modules via a `use ...` statement.

This commit adds some heuristics to identify such kinds of function calls even though we don't really know about the validity of the identifier.

A function is clearly identified if the `identifier` is followed by
- comment
- end of line
- end of expression (closing bracket)
- end of statement (semicolon)
- HEREDOC
- quoted string
- regular expression
- variable
- word but no operator

(II) Each variable or function identifier can contain a qualified path.

Example:

    $NS1::NS2::variable
    NS1::NS2::function

The `main` namespace is shortened by a leading `::`

    $::variable
    ::function

This commit applies correct scopes for all qualified identifiers.

(III) Qualifiers or identifiers can consist of only capital letters. In order to distinguish them from global constants this commit adds the following rules:

1) Prefer scoping the file handles like STDERR, STDOUT, etc. as constants even though they are used as functions or look like a class or namespace. Even though these tokens don't have special meaning to the Perl interpreter, they are reserved for certain uses by the standard library.
2) If a user defined constant looks like a function, it is scoped as such. In other words, tokens with capital letters only are scoped as constants only if they don't look like a namespace, class, object or function.

(IV) Object member functions

According to https://perldoc.perl.org/perlobj.html an object in Perl is nothing else than a normal data structure (array, hash, ...) which is bound to a class.

The `->` accessor always indicates a member function call, if followed by an identifier. Attributes are accessed via getter/setter methods in modern Perl code.

Example:

    $obj->method
    $obj->getter
    $obj->setter <value>

A subroutine always returns a reference to an object. Therefore the following is invalid:

    $obj->method{key}

Another `->` accessor is needed to access the item of the hash returned by `method`:

    $obj->method->{key}

Otherwise the `->` could also be used to directly access data items in the underlying hash of an object (which is discouraged).

Example:

    $obj->{attribute}

Accessing members of a nested object looks like:

    $obj->{attribute}->method

Note: Object members like `->new()` in an expression like `Class->new()` are not yet part of meta.path as they must not be part of a string interpolation.

The `->` accessor scope is renamed to `punctuation.accessor.arrow` to comply with the scope naming guidelines and give it the same color as the `::` operator which is used to access classes.
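
The structure of a qualified identifier as described in (II) can be sketched in a few lines of Python. The names `Qualified`, `QUALIFIED` and `parse_qualified` are hypothetical and exist only for this illustration; the syntax definition itself works with contexts and scopes, not with a parser like this. The sketch splits a token into its sigil, its namespace path and its final name, and expands a leading `::` to the `main` namespace.

```python
import re
from typing import List, NamedTuple, Optional


class Qualified(NamedTuple):
    sigil: str             # "$", "@", "%", "&", "*" or "" for functions
    namespaces: List[str]  # e.g. ["NS1", "NS2"]
    name: str              # the final identifier


# Hypothetical pattern: optional sigil, optional leading "::",
# any number of "Name::" parts, then the identifier itself.
QUALIFIED = re.compile(
    r"""
    (?P<sigil>[$@%&*]?)
    (?P<path>(?:::)?(?:[A-Za-z_]\w*::)*)
    (?P<name>[A-Za-z_]\w*)
    \Z
    """,
    re.VERBOSE,
)


def parse_qualified(token: str) -> Optional[Qualified]:
    """Split a (possibly qualified) identifier; return None if it does not match."""
    m = QUALIFIED.match(token)
    if m is None:
        return None
    path = m.group("path")
    if path.startswith("::"):
        # A leading "::" is shorthand for the main namespace.
        path = "main" + path
    namespaces = [part for part in path.split("::") if part]
    return Qualified(m.group("sigil"), namespaces, m.group("name"))
```

With this sketch `parse_qualified("$NS1::NS2::variable")` separates the `$` sigil, the path parts `NS1` and `NS2`, and the name `variable`, which roughly corresponds to the pieces that receive the `variable.namespace` and accessor scopes inside `meta.path`.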

* [Perl] Add HEREDOC in block test case

This commit adds another HEREDOC test case to ensure that highlighting does not break due to pushing into contexts when matching braces.

* [Perl] Add reference operator

This commit adds the reference operator `\` as described at https://perldoc.perl.org/perlref.html#Making-References

* [Perl] Add string interpolation step 3

This commit adds a `variable-interpolation` context, which can be included wherever interpolation support is needed. It removes the last scope on the stack, which should be a subscope of `string`, and adds `meta.interpolation.perl`.

This commit adds interpolation support to:
- double quoted strings
- backtick quoted strings
- format strings

Note: Perl does not evaluate variables in single quoted strings.

* [Perl] Add keyword qualifier

All reserved words like `if`, `else`, `sub`, ... are defined in the `CORE` namespace. Thus prefixing them with `CORE::` is valid syntax. This commit therefore adds a simple context, which prevents such keywords from being highlighted as ordinary functions.

Note: For simplicity no `meta.path` is added at the moment, as it would require adding a match for each qualified keyword. As a result the `CORE::` is not included in any other meta scope like `meta.function` etc.

* [Perl] Improve variable scoping

This commit improves the overall matching of variables:

1. add/fix regexp match group variables `$+`, `$-`, `%+`, `%-`, `@+`, `@-`
2. fix predefined variables pattern (replace `^` by `\^`)
3. clean up some character classes (remove escapes)
4. regroup/resort the rules
5. add special scopes for builtin variables
6. add builtin variables from English.pm

* [Perl] Add dereference operator

This commit adds the dereference operators as described at https://perldoc.perl.org/perlref.html#Making-References

Note: Perl calls the variable prefixes `type keywords`. This commit distinguishes between the variable prefix, which is the rightmost leading character in front of the identifier.
All other type prefixes, which can be added to the left in order to perform dereferencing, are scoped as `keyword.operator` in order to make sure all kinds of variables are scoped correctly.

* [Perl] Add missing modulo operator

* [Perl] Remove deprecated scope from PODs

Tab indentation is deprecated in Perl, but many older library functions still use it. Some color schemes highlight the whole whitespace block. It just sucks.

* [Perl] Fix block content

A code block can contain anything.

* [Perl] Fix qualified format statement

This commit applies all changes about qualified functions and variables to the `format` statement.

* [Perl] Tweak hash key scope name

Using string.unquoted as hash key makes it hard to distinguish interpolated item access (1) from normal string content (2) because only the braces are highlighted differently.

Example:

    "string $hash{key} string"
            ^^^^^^^^^^ meta.interpolation
    "string $hash->{key} string"
            ^^^^^ meta.interpolation

This commit changes the `key` scope to `constant.other.key` as it is already used for defining hashes.

Example:

    %hash = ( key => "value", key2 => "value2")

This commit makes sure to highlight valid keys only when defining hashes.

* [Perl] Tweak namespace scopes

This commit renames `support.class.perl` to `variable.namespace.perl` as a more general identifier for the path parts of a qualifier. The top-level scope `support.` is not sufficient as it is meant for use with built-in entities only. Any namespace can be user defined though. The `variable.namespace` scope was chosen as the counterpart of `entity.name.namespace`, which is used to define a namespace.

* [Perl] Rename Miscellaneous to Comments.tmPreferences

Most syntaxes use the newer name.

* [Perl] Fix interpolated string termination

This commit fixes interpolated strings and quoted-like functions not being popped off correctly if the interpolated string ends with a variable punctuation.

Examples:

    a) "$repl$"
             ^^ no variable!
    b) s/pattern/$repl$/g;
                      ^^ no variable!
Example (a) results in a syntax error as the Perl interpreter doesn't know whether to handle the `$"` as a variable or not. This commit does not introduce an `illegal` scope for it, though. It just ensures that the string boundaries are not broken.

Example (b) shows quoted-like functions, which Perl tokenizes by the first character after the function identifier (here `/`) before it starts parsing the strings. The delimiters are therefore matched with higher priority.

This commit adopts this behavior by

1) consuming the variable punctuation `[$@%&*]#?` in front of the delimiter (or closing bracket). We need it because $/, $), $], etc. are valid built-in variables.
2) modifying the `variables-interpolation` context in order to make sure to pop off from a `meta.interpolation` after each variable. Otherwise 1) wouldn't work.

In order to avoid sophisticated (and error-prone) lookaheads for each ordinary variable, the existing `qualified-variables` and `unqualified-variables` contexts are merged for that purpose. The resulting context pops off, while the original ones don't.

* [Perl] Tweak function vs. constant matching

Single words within expressions are most likely to be constants. They can be functions if defined by an import (use) or sub before, but we can't distinguish that.

Example:

    if (constant)
        ^^^^^^^^ constant or function possible.

Uppercase-only identifiers are already scoped as constants, if they don't look like a namespace or function call.

This commit adds the `constant-identifier` context to scope all identifiers as constants, which were not otherwise matched. This is only guesswork but the only chance we have to hopefully scope everything correctly in most situations.

* [Perl] Tweak storage scopes

This commit renames the `keyword.declaration.variable` scope to `storage.type.variable`. This step seems consistent because all the other declaration/definition keywords were renamed according to the scope naming guidelines.

    `sub` => storage.type.function
    `package` => storage.type.namespace

So we have now:

    `my`, `local`, `state`, `our` => storage.type.variable.

* [Perl] Optimize expressions contexts

* [Perl] Update perldoc urls

The documentation of Perl 5.30.0 moved to https transport.

* [Perl] Add meta.function-call to function identifiers

This causes the functions to be added to the index.

Note: The meta.function-call was removed as it used to cover the whole arguments list, which doesn't work properly if HEREDOCs are passed to a function.

* [Perl] Fix test cases

This commit renames a scope in the function reference test cases, which used to exist during development only.

* [Perl] Tweak operator scopes

This commit ...

1) splits the logical operators into
   - `keyword.operator.comparison.perl`
   - `keyword.operator.logical.perl`
2) assigns `=>` the `punctuation.separator.key-value.perl` scope. Inspired by C# and Ruby.

* [Perl] Fix unquoted Heredoc tag

This commit introduces fixes for the following issues:

1) Whitespace between the `<<` and the tag name in unquoted HEREDOCs (e.g.: `<< TAG`) is not allowed.
2) The type of whitespace in quoted HEREDOC tags doesn't matter. Both spaces and tabs are allowed.

* [Perl] Tweak HEREDOC scope names

Applies heredoc scopes according to sublimehq#2073.

* [Perl] Case insensitive first_line_match
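
The HEREDOC tag rules from the commits above (interpolation depending on the quoting style, no whitespace before an unquoted tag, spaces or tabs accepted around quoted tags) can be summarized in a small Python sketch. The pattern `HEREDOC_TAG` and the function `classify_heredoc` are hypothetical names for illustration only, and the sketch is based on the assumption that whitespace is accepted between `<<` and a quoted tag. It ignores backtick-quoted tags and the ambiguity with the `<<` shift operator.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern covering the three tag styles mentioned in the commits.
HEREDOC_TAG = re.compile(
    r"""
    <<
    (?:
        (?P<unquoted>[A-Za-z_]\w*)      # <<EOT: no whitespace allowed after <<
      | [ \t]*"(?P<double>[^"\n]*)"     # <<"EOT": spaces or tabs may precede the quote
      | [ \t]*'(?P<single>[^'\n]*)'     # <<'EOT': raw, not interpolated
    )
    """,
    re.VERBOSE,
)


def classify_heredoc(line: str) -> Optional[Tuple[str, bool]]:
    """Return (tag, interpolated) for the first HEREDOC tag in `line`, or None."""
    m = HEREDOC_TAG.search(line)
    if m is None:
        return None
    if m.group("single") is not None:
        return m.group("single"), False
    tag = m.group("unquoted")
    if tag is None:
        tag = m.group("double")
    return tag, True
```

For instance, `classify_heredoc("print <<'EOT';")` reports a raw HEREDOC, while `classify_heredoc("print << EOT;")` returns `None` because an unquoted tag must directly follow `<<`.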
Fixes #1325 (already fixed in 3207)
Fixes #1326 (already fixed in 3207)
Fixes #1029 (already fixed in 3207)
Fixes #1495 (already fixed in 3207)
Fixes #2017
Preamble
I know the PR is way too big again. I wouldn't have expected this amount of changes at all when I started to think about implementing string interpolation in spring. But the latest perldocs, further investigations of the Perl interpreter itself, a comparison with ModernPerl, and some discussions in the forum and in issue #2017 revealed a number of shortcomings of the current implementation, which prevented correct interpolation support. The assumptions about how functions and variables are identified are just not accurate enough, and the resulting implementation is too "simple" to match them all correctly. A little more context-sensitive scoping is needed to accomplish better results.
Main Goals
This PR therefore includes all required changes and fixes to accomplish the following main goals:
- `meta.path` and `variable.namespace` scopes as a suggestion resulting from [RFC] Scopes for module/namespace access #1842). All the different expressions were therefore tested against the Perl interpreter to make sure the added syntax test cases are correct.
- `meta.string.perl meta.interpolation.perl` and clearing the `string` scope.
- `\` and `&`.
Fixes
- `sub name :codeattribute ...`
- `type keywords` like `$`, `@`, ... only. No names allowed.
- `%` modulo operator was not scoped
- `s/pattern\\/` failed due to the `\\`.
- `invalid.deprecated` was somehow annoying.
Benchmarks
The overall performance impact of the changes is negligible, except that the parsing time of code with many interpolated strings increases with the number of interpolated variables.
The benchmark file used to check the performance impact of all the changes contains many interpolated strings. Its overall parsing time increased by about 15%, but that is still only 20% of the time needed with ModernPerl.
Known issues
The usage of the Regular Expressions package prevents proper matching of interpolated variables in regular expressions. Fixing it means implementing a new regexp syntax for Perl.
Notes
This PR includes #2032 and #2036 for technical reasons. I leave it to @wbond to merge them first or together with this PR.