[Perl] Feature Update 2019 #2048

Merged 59 commits on Oct 17, 2019

@deathaxe deathaxe commented Aug 14, 2019

Fixes #1325 (already fixed in 3207)
Fixes #1326 (already fixed in 3207)
Fixes #1029 (already fixed in 3207)
Fixes #1495 (already fixed in 3207)
Fixes #2017

Preamble

I know the PR is way too big again. I wouldn't have expected this amount of changes at all when I started to think about implementing string interpolation in spring. But the latest perldocs, further investigation of the Perl interpreter itself, comparison with ModernPerl, and some discussions in the forum and in issue #2017 revealed a number of shortcomings in the current implementation which prevented correct interpolation support. The assumptions about how functions and variables are identified are just not accurate enough, and the resulting implementation is too "simple" to match them all correctly. A little more context-sensitive scoping is needed to accomplish better results.

Main Goals

This PR therefore includes all changes and fixes required to accomplish the following main goals:

  1. Correctly scope unqualified and qualified, case-insensitive function identifiers and variables (including meta.path and variable.namespace scopes as a suggestion resulting from [RFC] Scopes for module/namespace access #1842).
    All the different expressions were therefore tested against the Perl interpreter to make sure the added syntax test cases are correct.
  2. Add support for string interpolation including meta.string.perl meta.interpolation.perl and clearing the string scope.
  3. Improve the scoping of referencing and dereferencing operators and add missing \ and &.

Fixes

  • some comparison expressions were matched as angular quoted strings
  • qualified identifiers must not contain whitespace
  • subroutine definitions can contain attributes like sub name :codeattribute ...
  • subroutine parameter lists consist of type keywords like $, @, ... only. No names allowed.
  • the % modulo operator was not scoped
  • terminating some regular expressions like s/pattern\\/ failed due to the \\.
  • scoping the deprecated tab indentation in POD comments as invalid.deprecated was somewhat annoying.

Benchmarks

The overall performance impact of the changes is negligible, except that the parsing time for code with many interpolated strings increases with the number of interpolated variables.

The benchmark file which was used to check performance impact of all the changes contains many interpolated strings. The overall parsing time increased by about 15% but it is still only 20% of the time which is needed with ModernPerl.

Known issues

The usage of the Regular Expressions package prevents proper matching of interpolated variables in regular expressions. Fixing it means implementing a new regexp syntax for Perl.

Notes

This PR includes #2032 and #2036 for technical reasons. I leave it to @wbond to merge them first or together with this PR.

DeathAxe added 30 commits August 1, 2019 14:27
This commit fixes two issues with numeric constants.

1) Some of the constants were not scoped correctly if directly followed
   by a `+` operator, which was caused by the `+` in the `(?![\.\w+])`
   lookahead, which was probably added by accident.

   Investigations revealed Perl correctly detecting all numbers even
   if followed by `.`. It then just handles that dot as concat operator.

   Example:

       `print 0x1.0x2` will print out 12

2) It's barely possible to distinguish a negative number from a
   subtraction operation. Therefore the leading number sign matching
   `[-+]?` was removed from unquoted constants.

   All `-` or `+` operators in front of or after numbers are now
   scoped as `keyword.operator` and no longer break highlighting.
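
The effect of the stray `+` in the lookahead described in 1) can be reproduced with a minimal Python sketch. Both patterns are simplified, hypothetical stand-ins for the hexadecimal rule and are not copied from the syntax definition; only the `(?![\.\w+])` lookahead is taken from the commit message.

```python
import re

# Hypothetical, simplified stand-ins for the hexadecimal rule.
# OLD_HEX keeps the accidental `+` inside the negative lookahead.
OLD_HEX = re.compile(r"0x[0-9a-fA-F]+(?![.\w+])")
# NEW_HEX only refuses to match if a word character follows (an assumption).
NEW_HEX = re.compile(r"0x[0-9a-fA-F]+(?!\w)")


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for source in ("0x1+0x2", "0x1.0x2"):
        print(source, first_match(OLD_HEX, source), first_match(NEW_HEX, source))
```

With the old lookahead the number in `0x1+0x2` is not matched at all, while the relaxed lookahead matches `0x1` and leaves `+` or `.` to be scoped as an operator.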
This commit scopes integers starting with `0` and containing digits
from 0 to 7 as `constant.numeric.integer.octal.perl`.
Perl accepts all kinds of incomplete floating point numbers.

This commit therefore scopes all of them properly.

This commit doesn't split the `match` operations for performance
reasons. The current implementation was benchmarked the fastest one
from various alternatives tested against 10k lines of quoted and
unquoted floating point numbers.

Note:
  Need to make sure not to break the range operator `..`.
Languages like Go or Python scope the decimal separator already.
With the changes made in the previous commit, doing so is too easy to
ignore.
Languages like D scope the leading `0b` or `0x` of binary/hexadecimal
constants as `punctuation.definition.numeric`.

This commit adds the corresponding rules to do so for Perl as well for
consistency reasons with other languages.
Quoted numbers are interpreted as decimal (float or int) only.
Octal numbers, like binary and hexadecimal ones, must be unquoted.

Example:

  print "030" + "10";    # prints 40
As both removed contexts are not used separately at any point they
are merged into the `constants-numbers` context.

The rules are sorted by number bases: bin/oct/hex/dec
Merging the numbers contexts revealed some missing boundary checks.
Even though they are considered not needed they are added for safety
and consistency reasons to prevent possible edge cases.
Up to this commit all `${...}` or `%{...}` tokens are scoped as
`variable.other` while they contain ordinary expressions.

This commit...

1. replaces the scope by a `meta.variable` to avoid parts of the
   expressions being highlighted in a wrong way.
2. adds the `regexp-pop` context right after the opening punctuation
   in order to fix an issue with `/pattern/flags` not being detected.

Note:
  This change is part of tackling issue sublimehq#2017
According to the test cases of ModernPerl, `01.0` is a decimal float, but
it was treated as octal due to the current order of the matches.

This commit re-sorts the rules to fix it.
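
Why rule order matters here can be shown with a small first-match-wins sketch in Python. The three patterns are hypothetical simplifications, and the float and decimal scope names are assumptions; only `constant.numeric.integer.octal` is named in this PR.

```python
import re

# Hypothetical, simplified rules. `(?!\.)` keeps the range operator `..` intact.
FLOAT = (r"\d+\.(?!\.)\d*(?:[eE][-+]?\d+)?", "constant.numeric.float")
OCTAL = (r"0[0-7]+", "constant.numeric.integer.octal")
DECIMAL = (r"\d+", "constant.numeric.integer.decimal")


def scope_of(text, rules):
    """Return (matched text, scope) of the first rule matching at the start."""
    for pattern, scope in rules:
        m = re.match(pattern, text)
        if m:
            return m.group(0), scope
    return None


if __name__ == "__main__":
    print(scope_of("01.0", [OCTAL, FLOAT, DECIMAL]))  # octal rule tried first
    print(scope_of("01.0", [FLOAT, OCTAL, DECIMAL]))  # float rule tried first
```

With the octal rule first, only `01` is consumed and scoped as octal. Trying the float rule first scopes the whole `01.0` as a float, while plain octal numbers such as `017` still fall through to the octal rule.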
Case doesn't matter, when parsing 0x, 0b, or exponent e. So 0X, 0B or E
are valid as well.
Perl allows `_` as a separator between the digits of a number.
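
Case-insensitive prefixes and `_` inside numbers can be sketched together in Python. The pattern below is a hypothetical approximation for illustration only; the group names and the exact character classes are assumptions and are looser than what Perl itself accepts.

```python
import re

# Hypothetical pattern: 0x/0X and 0b/0B prefixes, e/E exponent, `_` in digit runs.
NUMBER = re.compile(
    r"(?P<hex>0[xX][0-9a-fA-F_]+)"
    r"|(?P<bin>0[bB][01_]+)"
    r"|(?P<dec>\d[\d_]*(?:\.[\d_]*)?(?:[eE][-+]?[\d_]+)?)"
)


def classify(text):
    """Return 'hex', 'bin' or 'dec' if the whole text is a number, else None."""
    m = NUMBER.fullmatch(text)
    return m.lastgroup if m else None


if __name__ == "__main__":
    for literal in ("0XFF_FF", "0B1010_1010", "1_000_000", "1.5E3", "0xZZ"):
        print(literal, classify(literal))
```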
Issue:

Up to this commit the first `<` after a `$var++` term is scoped as the
beginning of an angle quoted string as they have been considered valid
after each operator or punctuation. The following example illustrates
how this edge case breaks the highlighting of normal comparisons.

Example:

  while ($var++ < 50) {}
                ^^^^^^^^ string  !! invalid match

Solution:

1. This commit creates a new `expression-begin` context to replace the
   `regexp-pop` context, whose name no longer expressed its meaning
   precisely since angle quoted matching was added. The new context is
   pushed everywhere except after a `++` or `--` operator.
2. The lookahead in the new `string-quoted-angle-pop` is extended to
   match an angular string only if ...
   a) the beginning does not conflict with other operators like
      `<<` or `<=>`.
   b) the angular string is terminated by `>` on the same line.
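
Conditions a) and b) of the lookahead can be approximated with a short Python sketch. The pattern is a hypothetical illustration and not the one used in `string-quoted-angle-pop`; the context handling from point 1 (nothing pushed after `++` or `--`) is not modeled.

```python
import re

# Hypothetical lookahead: `<` must not start `<<`, `<=` or `<=>` (condition a)
# and a closing `>` must follow on the same line (condition b).
ANGLE_STRING_BEGIN = re.compile(r"<(?![<=])(?=[^<>\n]*>)")


def starts_angle_string(text):
    """Return True if `text` begins with what looks like an angle quoted string."""
    return ANGLE_STRING_BEGIN.match(text) is not None


if __name__ == "__main__":
    for source in ("<STDIN>", "<*.txt>", "<< EOT", "<=> $b", "< 50) {}"):
        print(repr(source), starts_angle_string(source))
```

The tail of `while ($var++ < 50) {}` is rejected by condition b) alone, because no `>` follows on the line.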
A fully qualified identifier must not contain whitespace. Hence the
accessor `::` must not be surrounded by them as well.

It was supported up to this point in order to improve the writing
experience by not breaking identifiers during writing.

This will cause conflicts with parsing/scoping fully qualified
identifiers in future changes and therefore needs to be restricted.

The test cases containing whitespace surrounded accessors are removed
for the moment as properly handling such situations is part of a future
change.
This commit

1. adds missing declaration or operator keywords to the list of
   reserved words.

   This list is used to ensure not to match keywords in the wrong place
   or to avoid matching them as anything else. Therefore it should be
   quite complete.

2. ignores the `::` accessor when doing general checks against
   the list of reserved words. Hence `{{break}}` is replaced by `\b`.
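
How a reserved word list guarded by `\b` keeps keywords from being matched as identifiers can be sketched in Python. The word list is heavily shortened and the pattern is hypothetical; what `{{break}}` expands to is not shown in this PR, so the sketch only models the `\b` variant.

```python
import re

# Hypothetical, heavily shortened list of reserved words.
RESERVED = ["if", "else", "elsif", "sub", "my", "our", "package", "use", "while"]

# An identifier that is not exactly one of the reserved words.
IDENTIFIER = re.compile(r"\b(?!(?:%s)\b)[A-Za-z_]\w*" % "|".join(RESERVED))


def identifier_at_start(text):
    """Return the identifier at the start of `text`, or None for reserved words."""
    m = IDENTIFIER.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for word in ("if", "iffy", "sub", "subtotal", "package::name"):
        print(word, identifier_at_start(word))
```

Because `\b` also matches between a word character and `:`, a reserved word stays reserved in this sketch even when it is directly followed by `::`.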
This commit adds a dedicated context for package declarations.

The reasons are:

1) correctly apply scope names according to the guidelines to the
   statement:
   - adds `meta.namespace.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to
   be matched in future changes would otherwise break the package
   declaration statements' syntax highlighting.
3) no variables nor other expressions are allowed and thus scoped
   invalid.

Note:
  The order of the NAMESPACE and VERSION is not enforced.
  Wrong order won't be scoped invalid at the moment.
This commit adds the `require` context for package imports.

The reasons are:

1) correctly apply scope names according to the guidelines to the
   statement:
   - adds `meta.import.require.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to
   be matched in future changes would otherwise break the "require
   statements" syntax highlighting.
3) no variables nor other expressions are allowed and thus scoped
   invalid. Leading `::` accessors is invalid, too.
This commit refactors the subroutine definition statement to...

1) support fully qualified identifiers

   Example:

     sub NS1::NS2::NS3::function { }

   A real life example can be found in <..>/core_perl/B/Debug.pm

   Before this commit fully qualified subroutine identifiers are scoped
   `invalid.illegal`. This is fixed by this commit.

2) add proper highlighting of code attributes.

   Example:

     sub name :attribute(attrargs) ($) { }

   Code attributes must be defined in the `BEGIN { }` preprocessor
   subroutine. They work pretty much like decorators in Python.

3) fix prototype parameters.

   The parameter list of a sub may only contain the types of the
   arguments, but no names. The parameters are separated by `;`

   Example:

     sub name ($ ; @ ; % ; $$) {}

4) remove the function block from `meta.function` scope.

   The way blocks need to be handled due to HEREDOCs in general breaks
   the `meta.function` block anyway. It would either pop off at the
   first closing brace or maybe even never. That's why including it
   is useless anyway.

5) move the `sub-...` contexts upwards in the syntax definition in
   order to group all coming complex statement definitions at the top
   of the file, while keeping more atomic contexts at the bottom.

6) refactor the test cases (leading indentation).

Note:

  The namespace part is currently scoped as `support.class`, because
  a) this is the scope being used in other situations as well
  b) the scope naming guidelines don't yet suggest a scope for that
     part.

     - C# uses `variable.other.namespace`.
     - The rewritten Erlang introduces `variable.namespace` because it
       fits best to the `entity.name.namespace` which is to be used for
       namespace definitions.
     - The proposed general `variable.qualifier` would probably be the
       best, most general alternative as it is hard/impossible to
       distinguish namespace from class access.

   c) All `support.class` scopes should be renamed in one commit later.
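
The three syntactic parts covered by points 1) to 3) above (qualified name, code attribute, prototype) can be pulled apart with a Python sketch. The regular expression and the `parse_sub` helper are hypothetical and written only for the examples in this commit message; they are not a port of the `sub-...` contexts.

```python
import re

# Hypothetical pattern for `sub` headers as shown in the examples above.
SUB_DEF = re.compile(
    r"sub\s+"
    r"(?P<path>(?:[A-Za-z_]\w*::)*)"          # optional NS1::NS2:: qualifier
    r"(?P<name>[A-Za-z_]\w*)"                 # the subroutine name itself
    r"(?:\s*:(?P<attr>\w+(?:\([^)]*\))?))?"   # optional :attribute(args)
    r"(?:\s*\((?P<proto>[$@%&*;\\\s]*)\))?"   # optional prototype, types only
)


def parse_sub(text):
    """Return a dict with path, name, attr and proto, or None if no match."""
    m = SUB_DEF.match(text)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_sub("sub NS1::NS2::NS3::function { }"))
    print(parse_sub("sub name :attribute(attrargs) ($) { }"))
    print(parse_sub("sub name ($ ; @ ; % ; $$) {}"))
```

A parameter list containing a name, such as `($x)`, does not match the prototype group in this sketch, which mirrors the "types only, no names" rule from point 3).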
This commit moves the variable declaration keywords to the `control`
context because
1) its remaining content is not worth a context
2) they are valid wherever control keywords are valid.
Group the managing contexts at the top of the file.
Issue:

If a HEREDOC is used as an argument in a function call, the arguments
context never pops off. Hence `meta.function-call.arguments` stays on
the stack forever.

Example:

   function_name(<<"   HEREDOC");
   This is the HEREDOCs value which is passed to the function
   HEREDOC

Solution:

This commit removes any `meta.function-call` scopes and all the related
contexts in favor of properly highlighting HEREDOC like arguments in
functions.

As the function call can be nested and followed by arbitrary perl
expressions up to the end of line, no better solution was found with
the existing features of ST's lexer.
(I) Function calls without parentheses

Perl Documentation says: Encapsulating function arguments into
parentheses is optional, if the expression is clearly identified as
function-call.

This statement applies to calls of sub routines which are already known
to the interpreter by defining them at the beginning of the script or
by importing them via `use ...` statement from other modules.

This commit adds some heuristics to identify such kinds of function
calls even though we don't really know about the validity of the
identifier.

A function is clearly identified if the `identifier` is followed by
  - comment
  - end of line
  - end of expression (closing bracket)
  - end of statement (semi-colon)
  - HEREDOC
  - quoted string
  - regular expression
  - variable
  - word but no operator
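
The list above can be read as a lookahead on whatever follows the identifier. The Python sketch below is a rough, hypothetical approximation: the set of operator words is an assumption, and the "regular expression" case is left out because `/` cannot be told apart from division without more context.

```python
import re

# Assumption: word-like operators that must not turn an identifier into a call.
OPERATOR_WORDS = {"and", "or", "not", "xor", "eq", "ne", "lt", "gt",
                  "le", "ge", "cmp", "x"}

FOLLOWED_BY = re.compile(
    r"""\s*(?:
        \#                        # comment
      | $                         # end of line
      | [)\]}]                    # end of expression (closing bracket)
      | ;                         # end of statement
      | <<["'A-Za-z_]             # HEREDOC
      | ["']                      # quoted string
      | [$@%][\w{:]               # variable
      | (?P<word>[A-Za-z_]\w*)    # word, checked against operators below
    )""",
    re.VERBOSE,
)


def looks_like_call(rest):
    """Decide from the text after an identifier whether it is a function call."""
    m = FOLLOWED_BY.match(rest)
    if m is None:
        return False
    word = m.group("word")
    return word is None or word not in OPERATOR_WORDS


if __name__ == "__main__":
    for rest in (" # note", "", ");", ' <<"EOT";', " $var", " eq 'x'", " + 1"):
        print(repr(rest), looks_like_call(rest))
```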

(II) Each variable or function identifier can contain a qualified path.

Example:

  $NS1::NS2::variable
   NS1::NS2::function

The `main` namespace is shortened to a leading `::`

  $::variable
  ::function

This commit applies correct scopes for all qualified identifiers.
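
Splitting a qualified identifier into its sigil, namespace path and name can be sketched in Python as follows. The pattern and the `split_identifier` helper are hypothetical illustrations; the set of sigils is an assumption.

```python
import re

# Hypothetical pattern: optional sigil, optional `::`-separated path, final name.
QUALIFIED = re.compile(
    r"(?P<sigil>[$@%&*]?)"
    r"(?P<path>(?:::)?(?:[A-Za-z_]\w*::)*)"
    r"(?P<name>[A-Za-z_]\w*)"
)


def split_identifier(text):
    """Return (sigil, namespaces, name) or None if `text` is not an identifier."""
    m = QUALIFIED.fullmatch(text)
    if m is None:
        return None
    path = m.group("path")
    namespaces = [part for part in path.split("::") if part]
    if path.startswith("::"):
        namespaces.insert(0, "main")  # a leading `::` is short for `main::`
    return m.group("sigil"), namespaces, m.group("name")


if __name__ == "__main__":
    for ident in ("$NS1::NS2::variable", "NS1::NS2::function",
                  "$::variable", "::function", "NS1 :: function"):
        print(ident, split_identifier(ident))
```

Whitespace around `::` makes the match fail, in line with the restriction on qualified identifiers introduced earlier in this PR.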

(III) Qualifiers or identifiers can consist of only capital letters.

In order to distinguish them from global constants this commit adds
the following rules:

1) Prefer scoping the file handles like STDERR, STDOUT, etc. as
   constants even though they are used as function or look like a class
   or namespace. Even though these tokens don't have special meaning
   to the Perl interpreter they are considered for certain use by the
   standard library.
2) If a user defined constant looks like a function it is scoped as
   such. In other words tokens with capital letters only are scoped as
   constant only, if they don't look like a namespace, class, object or
   function.

(IV) Object member functions

According to https://perldoc.perl.org/perlobj.html an object in Perl
is nothing else than a normal data structure (array, hash, ...) which
is bound to a class.

The `->` accessor always indicates a member function call, if followed
by an identifier. Attributes are accessed via getter/setter methods in
modern Perl code.

Example:

   $obj->method
   $obj->getter
   $obj->setter <value>

A subroutine always returns a reference to an object. Therefore the
following is invalid:

   $obj->method{key}

Another `->` accessor is needed to access the item of the returned hash by `method`:

  $obj->method->{key}

Otherwise the `->` could also be used to directly access data items in
the underlying hash of an object (which is discouraged).

Example:

   $obj->{attribute}

Accessing members of a nested object looks like:

   $obj->{attribute}->method

Note:

  Object members like `->new()` in an expression like `Class->new()`
  are not yet part of meta.path as they must not be part of a string
  interpolation.

The `->` accessor scope is renamed to `punctuation.accessor.arrow` to
comply with scope naming guidelines and give it the same color as the
`::` operator which is used to access classes.
All reserved words like `if`, `else`, `sub`, ... are defined in the
`CORE` namespace. Thus prefixing them with `CORE::` is valid syntax.

This commit therefore adds a simple context, which prevents such
keywords from being highlighted as ordinary function.

Note:
  For simplicity reasons no `meta.path` is added at the moment as it
  would require to add a match for each qualified keyword.

  As a result the `CORE::` is not included in any other meta scope like
  `meta.function` etc.
This commit improves the overall variables matching:

1. add/fix regexp match group variables `$+`, `$-`,`%+`, `%-`,`@+`, `@-`
2. fix predefined variables pattern (replace `^` by `\^`)
3. clean up some character classes (remove escapes)
4. regroup/resort the rules
5. add special scopes for builtin variables
6. add builtin variables from English.pm
This commit adds the reference operator `\` as described at

   https://perldoc.perl.org/perlref.html#Making-References
This commit adds the dereference operators as described at

   https://perldoc.perl.org/perlref.html#Making-References

Note:
  Perl calls the variable prefixes `type keywords`.
  This commit distinguishes between the variable prefix, which is the
  most right leading character in front of the identifier. All other
  type prefixes, which can be added to the left in order to perform
  dereferencing are scoped as `keyword.operator` in order to make sure
  all kinds of variables are scoped correctly.
DeathAxe added 15 commits August 14, 2019 18:53
This commit adds string interpolation to the following expressions

  s/<pattern>/<replacement with interpolation>/<flags>;
  tr/<pattern>/<replacement with interpolation>/<flags>;
  y/<pattern>/<replacement with interpolation>/<flags>;

This change also fixes an issue with broken highlighting if the
expression spans multiple lines.

  s/<pattern>/
  <replacement with interpolation>
  /<flags>;

or

  s{
    <pattern>
  }
  [<replacement with interpolation>]<flags>;

Notes:

1) The context `quote-like-replace` is renamed to `quoted-like-replace`
   as this new name reflects Perl terminology more accurately.

2) The context `quoted-like-args-find-rexexp` is renamed to
   `quoted-like-args-pattern` as this is a more general name and matches
   the changes of the previous commit.
This commit adds interpolation support to comments in the same way as
for strings in order to properly support the fenced code blocks.

All comments are scoped with `meta.comment.perl`.
The `comment` scope is cleared from stack within a fenced code block.
Nearly any character can be used as delimiter in quoted-like
expressions.

All the following expressions are equal:

   s/<pattern>/<repl>/<flags>
   s|<pattern>|<repl>|<flags>
   s#<pattern>#<repl>#<flags>
   s@<pattern>@<repl>@<flags>

The usage depends on the characters being used in the <pattern> or
<repl>. If `/` is used as the delimiter, it must not be part of them.
Perl however accepts escapes like `\/`.

Theoretically a <pattern> can end with an arbitrary number of `\`,
which makes creating a robust `escape` pattern complicated or even
impossible.

Before this commit the escape patterns check only for the absence of a
single `\` before the delimiter candidate `/`.
The check is extended to `\` and `\\\` in order to allow Windows-style
path names ending with an escaped backslash.

This is not a perfect solution, but should help in some rare edge cases.
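
The underlying rule is that a delimiter is escaped only if it is preceded by an odd number of backslashes. The Python sketch below contrasts a hypothetical single-backslash lookbehind with a small scanner that skips escape pairs. The scanner only illustrates the general rule; it is not how the syntax definition works, which has to rely on fixed-length lookbehinds.

```python
import re

# Hypothetical equivalent of the old check: no single `\` before the delimiter.
NAIVE = re.compile(r"(?<!\\)/")


def find_closing_delimiter(text, delim="/"):
    """Return the index of the first unescaped `delim`, or -1 if there is none."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2          # skip the backslash and the character it escapes
        elif text[i] == delim:
            return i
        else:
            i += 1
    return -1


if __name__ == "__main__":
    body = r"C:\\dir\\/repl/"   # pattern ends with an escaped backslash
    print(NAIVE.search(body).start(), find_closing_delimiter(body))
```

For the Windows-style pattern, the naive check skips the real delimiter because a backslash precedes it, while the scanner treats `\\` as a pair and stops at the first `/`.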
This commit fixes interpolated strings and quoted-like functions not
being popped off correctly if the interpolated string ends with a
variable punctuation.

Examples:

  a) "$repl$"
           ^^ no variable!

  b) s/pattern/$repl$/g;
                    ^^ no variable!

Example (a) results in a syntax error as the Perl interpreter doesn't
know whether to handle the `$"` as variable or not. This commit does
not introduce an `illegal` scope for it, though. It just ensures not
to break the string boundaries.

Example (b) shows quoted like functions, which Perl tokenizes by the
first character after the function identifier (here `/`) first before
it starts parsing the strings. The delimiters are therefore matched
with higher priority.

This commit adapts this behavior by

1) consuming the variable punctuation `[$@%&*]#?` in front of the
   delimiter (or closing bracket).

   We need it because, $/, $), $], etc. are valid built-in variables.

2) modifying the `variables-interpolation` context in order to make
   sure to pop off from a `meta.interpolation` after each variable.
   Otherwise 1) wouldn't work.

   In order to prevent sophisticated (and error prone) lookaheads for
   each ordinary variable, the existing `qualified-variables` and
   `unqualified-variables` contexts are merged for that purpose.

   The resulting context pops off, while the original ones don't.
This commit

1. adds the `literal-common` and `interpolated-common` contexts to
   group common content for interpolated and raw strings.
2. adds rules to scope C-style FORMAT placeholders as being used by
   printf/sprintf, etc..

   See: https://perldoc.perl.org/functions/sprintf.html

The pattern to match the FORMAT is designed to be quite restrictive in
order to prevent interference with variable interpolation. Perl uses `%`
for some types of variables and the FORMAT is to be supported in any kind
of quoted string because the format string might be built dynamically.

Example:

  $format = "%s: %d";
  sprintf $format, $var1, $integer;

Note:

  Perl supports variable interpolation even in FORMAT patterns.
  Something like %0${varname}X is valid. This commit does not support
  such constructs as it is nearly impossible to distinguish between
  FORMAT patterns and hashes - both start with `%`.
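
What a restrictive FORMAT pattern can look like is sketched below in Python. The pattern is a hypothetical approximation covering only flags, width, precision and a few conversions. The trailing lookahead that rejects a following word character is an assumption made to keep hashes like `%data` from matching.

```python
import re

# Hypothetical, deliberately restrictive placeholder pattern.
FORMAT = re.compile(
    r"%[-+ 0#]*"              # flags
    r"\d*"                    # minimum width
    r"(?:\.\d+)?"             # precision
    r"[csduoxXeEfgG]"         # conversion
    r"(?![A-Za-z0-9_])"       # assumption: no identifier character may follow
)


def placeholders(text):
    """Return all FORMAT placeholders found in `text`."""
    return FORMAT.findall(text)


if __name__ == "__main__":
    for text in ("%s: %d", "%5.2f and %-10s", "%hash", "%data{key}",
                 "%0${varname}X"):
        print(repr(text), placeholders(text))
```

As in the commit, an interpolated width like `%0${varname}X` is not recognized by this sketch.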
This commit applies all changes about qualified functions and variables
to the `format` statement.
Tab indentation is deprecated in Perl, but many older library functions
still use it. Some color schemes highlight the whole whitespace block.
It just sucks.
This commit adds another HEREDOC test case to ensure not to break
highlighting due to pushing into contexts when matching braces.
A code block can contain anything.
Using string.unquoted as hash key makes it hard to distinguish
interpolated item access (1) from normal string content (2) because
only the braces are highlighted differently.

Example:

   "string $hash{key} string"
           ^^^^^^^^^^ meta.interpolation

   "string $hash->{key} string"
           ^^^^^ meta.interpolation

This commit changes the `key` scope to `constant.other.key` as it is
already used for defining hashes.

Example:

    %hash = ( key => "value", key2 => "value2")

This commit makes sure to highlight valid keys only when defining
hashes.
This commit renames `support.class.perl` to `variable.namespace.perl`
as more general identifier for the path parts of a qualifier.

The top-level scope `support.` is not sufficient as it is meant for use
with built-in entities only. Any namespace can be user defined though.

The `variable.namespace` was chosen as the counterpart of
`entity.name.namespace` which is used to define a namespace.
This commit renames the `keyword.declaration.variable` scope to
`storage.type.variable`. This step seems consistent because all the
other declaration/definition keywords were renamed according to the
scope naming guidelines.

`sub` => storage.type.function
`package` => storage.type.namespace

So we have now:

`my`, `local`, `state`, `our` => storage.type.variable.
Single words within expressions are most likely to be constants.
They can be functions if defined by an import (use) or sub before,
but we can't distinguish that.

Example:

    if (constant)
        ^^^^^^^^  constant or function possible.

Uppercase only identifiers are already scoped as constants, if they
don't look like a namespace or function call.

This commit adds the `constant-identifier` context to scope all
identifiers as constants, which were not otherwise matched.

This is only guesswork but the only chance we have to hopefully scope
everything correctly in most situations.
Most syntaxes use the newer name.
The documentation of Perl 5.30.0 moved to https transport.
@michaelblyons michaelblyons left a comment


Overall, great! This fixes all the things I wanted and many more.

DeathAxe added 3 commits August 15, 2019 18:27
Makes sure the functions are added to the index.

Note:
  The meta.function-call was removed as it used to cover the whole
  arguments list, which doesn't work properly if HEREDOCS are passed to
  a function.
This commit renames a scope in the function reference test cases,
which used to exist during development only.
This commit ...

1) splits logical operators into
   - `keyword.operator.comparison.perl`
   - `keyword.operator.logical.perl`

2) assigns `=>` the `punctuation.separator.key-value.perl` scope.

Inspired by C# and Ruby.
DeathAxe added 3 commits August 28, 2019 18:50
This commit introduces fixes for the following issues:

1) Whitespace between the `<<` and the tag name in unquoted HEREDOCS
   (e.g.: `<< TAG`) is not allowed.

2) The type of whitespace in quoted HEREDOC tags doesn't matter. Both
   space and tabs are allowed.
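
Both HEREDOC rules can be sketched with a hypothetical Python pattern. It assumes that rule 2) refers to whitespace inside the quoted tag, as in the `<<"   HEREDOC"` example earlier in this PR, and it ignores other HEREDOC forms.

```python
import re

# Hypothetical pattern: a bare tag must follow `<<` directly, a quoted tag may
# contain any mix of spaces and tabs.
HEREDOC_BEGIN = re.compile(
    r"<<(?:"
    r"(?P<bare>[A-Za-z_]\w*)"
    r'|"(?P<interpolated>[^"\n]*)"'
    r"|'(?P<raw>[^'\n]*)'"
    r")"
)


def heredoc_tag(text):
    """Return the HEREDOC tag that starts `text`, or None if there is none."""
    m = HEREDOC_BEGIN.match(text)
    return m.group(m.lastgroup) if m else None


if __name__ == "__main__":
    for source in ("<<TAG", "<< TAG", '<<"   HEREDOC"', "<<'\tTAG'"):
        print(repr(source), repr(heredoc_tag(source)))
```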
Applies heredoc scopes according to sublimehq#2073.
@wbond wbond merged commit d4c38f2 into sublimehq:master Oct 17, 2019
@deathaxe deathaxe deleted the pr/perl/feature-patch-2019 branch October 18, 2019 15:10
mitranim pushed a commit to mitranim/Packages that referenced this pull request Mar 25, 2022
* [Perl] Fix numeric constant boundaries

This commit fixes two issues with numeric constants.

1) Some of the constants were not scoped correctly if directly followed
   by a `+` operator, which was caused by the `+` in the `(?![\.\w+])`
   lookahead, which was probably added by accident.

   Investigations revealed Perl correctly detecting all numbers even
   if followed by `.`. It then just handles that dot as concat operator.

   Example:

       `print 0x1.0x2` will print out 12

2) It's barely possible to distinguish a negative number from a
   subtraction operation. Therefore the leading number sign matching
   `[-+]?` was removed from unquoted constants.

   All `-` or `+` operators in front of or after numbers are now
   scoped as `keyword.operator` and no longer break highlighting.

* [Perl] Add scope for octal numbers

This commit scopes integers starting with `0` and containing digits
from 0 to 7 as `constant.numeric.integer.octal.perl`.

* [Perl] Fix incomplete floating points

Perl accepts all kinds of incomplete floating point numbers.

This commit therefore scopes all of them properly.

This commit doesn't split the `match` operations for performance
reasons. The current implementation was benchmarked the fastest one
from various alternatives tested against 10k lines of quoted and
unquoted floating point numbers.

Note:
  Need to make sure not to break the range operator `..`.

* [Perl] Add scope for decimal separator

Languages like Go or Python scope the decimal separator already.
With the changes made in the previous commit, doing so is too easy to
ignore.

* [Perl] Add scope for bin/hex/oct punctuation

Languages like D scope the leading `0b` or `0x` of binary/hexadecimal
constants as `punctuation.definition.numeric`.

This commit adds the corresponding rules to do so for Perl as well for
consistency reasons with other languages.

* [Perl] Fix unsupported quoted octal numbers

Quoted numbers are interpreted as decimal (float or int) only.
Octal numbers, like binary and hexadecimal ones, must be unquoted.

Example:

  print "030" + "10";    # prints 40

* [Perl] Merge the constant-number contexts

As both removed contexts are not used separately at any point they
are merged into the `constants-numbers` context.

The rules are sorted by number bases: bin/oct/hex/dec

* [Perl] Fix bin/hex/float boundaries

Merging the numbers contexts revealed some missing boundary checks.
Even though they are considered not needed they are added for safety
and consistency reasons to prevent possible edge cases.

* [Perl] Fix ${...} scope names

Up to this commit all `${...}` or `%{...}` tokens are scoped as
`variable.other` while they contain ordinary expressions.

This commit...

1. replaces the scope by a `meta.variable` to avoid parts of the
   expressions being highlighted in a wrong way.
2. adds the `regexp-pop` context right after the opening punctuation
   in order to fix an issue with `/pattern/flags` not being detected.

Note:
  This change is part of tackling issue sublimehq#2017

* [Perl] Fix floats being treated as octal

According to the test cases of ModernPerl, `01.0` is a decimal float, but
it was treated as octal due to the current order of the matches.

This commit re-sorts the rules to fix it.

* [Perl] Fix case of bin/hex/float punctuations

Case doesn't matter, when parsing 0x, 0b, or exponent e. So 0X, 0B or E
are valid as well.

* [Perl] Fix numbers with underscore

Perl allows `_` as a separator between the digits of a number.

* [Perl] Fix test case

* [Perl] Extend the list of reserved words

This commit

1. adds missing declaration or operator keywords to the list of
   reserved words.

   This list is used to ensure not to match keywords in the wrong place
   or to avoid matching them as anything else. Therefore it should be
   quite complete.

2. ignores the `::` accessor when doing general checks against
   the list of reserved words. Hence `{{break}}` is replaced by `\b`.

* [Perl] Improve sub routine definitions

This commit refactors the subroutine definition statement to...

1) support fully qualified identifiers

   Example:

     sub NS1::NS2::NS3::function { }

   A real life example can be found in <..>/core_perl/B/Debug.pm

   Before this commit fully qualified subroutine identifiers are scoped
   `invalid.illegal`. This is fixed by this commit.

2) add proper highlighting of code attributes.

   Example:

     sub name :attribute(attrargs) ($) { }

   Code attributes must be defined in the `BEGIN { }` preprocessor
   subroutine. They work pretty much like decorators in Python.

3) fix prototype parameters.

   The parameter list of a sub may only contain the types of the
   arguments, but no names. The parameters are separated by `;`

   Example:

     sub name ($ ; @ ; % ; $$) {}

4) remove the function block from `meta.function` scope.

   The way blocks need to be handled due to HEREDOCs in general breaks
   the `meta.function` block anyway. It would either pop off at the
   first closing brace or maybe even never. That's why including it
   is useless anyway.

5) move the `sub-...` contexts upwards in the syntax definition in
   order to group all coming complex statement definitions at the top
   of the file, while keeping more atomic contexts at the bottom.

6) refactor the test cases (leading indentation).

Note:

  The namespace part is currently scoped as `support.class`, because
  a) this is the scope being used in other situations as well
  b) the scope naming guidelines don't yet suggest a scope for that
     part.

     - C# uses `variable.other.namespace`.
     - The rewritten Erlang introduces `variable.namespace` because it
       fits best to the `entity.name.namespace` which is to be used for
       namespace definitions.
     - The proposed general `variable.qualifier` would probably be the
       best, most general alternative as it is hard/impossible to
       distinguish namespace from class access.

   c) All `support.class` scopes should be renamed in one commit later.

* [Perl] Fix < operator after var++

Issue:

Up to this commit the first `<` after a `$var++` term is scoped as the
beginning of an angle quoted string as they have been considered valid
after each operator or punctuation. The following example illustrates
how this edge case breaks the highlighting of normal comparisons.

Example:

  while ($var++ < 50) {}
                ^^^^^^^^ string  !! invalid match

Solution:

1. This commit creates a new `expression-begin` context to replace the
   `regexp-pop` context whose name didn't express its meaning precisely
   anymore since angle quoted matching was added. The new context is
   pushed everywhere but after a `++` or `--` operator.
2. The lookahead in the new `string-quoted-angle-pop` is extended to
   match an angular string only if ...
   a) the beginning does not conflict with other operators like
      `<<` or `<=>`.
   b) the angular string is terminated by `>` on the same line.
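
The two lookahead conditions of step 2 can be sketched in Python as
follows. This is only an approximation for illustration: the name
`ANGLE_STRING_START`, the helper `may_start_angle_string` and the regular
expression itself are assumptions, not the pattern used in the syntax
definition. The context based rule of step 1 (no angle string after
`++` or `--`) is not modelled here.

```python
import re

# Hypothetical approximation of the lookahead; the real pattern in the
# syntax definition may differ.
ANGLE_STRING_START = re.compile(
    r"<"              # candidate opening bracket
    r"(?![<=])"       # (a) not the start of `<<`, `<=` or `<=>`
    r"(?=[^>\n]*>)"   # (b) a closing `>` follows on the same line
)


def may_start_angle_string(line: str, pos: int) -> bool:
    """Return True if the `<` at `pos` may begin an angle quoted string."""
    return ANGLE_STRING_START.match(line, pos) is not None
```

With this sketch `<STDIN>` is accepted, while the `<` in
`while ($var++ < 50) {}` is rejected because no `>` follows on the line.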

* [Perl] Add package declaration context

This commit adds a dedicated context for package declarations.

The reasons are:

1) correctly apply scope names according to the guidelines to the
   statement:
   - adds `meta.namespace.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to
   be matched in future changes would otherwise break the package
   declaration statements' syntax highlighting.
3) no variables nor other expressions are allowed and thus scoped
   invalid.

Note:
  The order of the NAMESPACE and VERSION is not enforced.
  Wrong order won't be scoped invalid at the moment.

* [Perl] Restrict whitespace surrounding namespace accessors

A fully qualified identifier must not contain whitespace. Hence the
accessor `::` must not be surrounded by whitespace either.

It was supported up to this point in order to improve the writing
experience by not breaking identifiers during writing.

This will cause conflicts with parsing/scoping fully qualified
identifiers in future changes and therefore needs to be restricted.

The test cases containing whitespace surrounded accessors are removed
for the moment as properly handling such situations is part of a future
change.

* [Perl] Add package import context

This commit adds the `require` context for package imports.

The reasons are:

1) correctly apply scope names according to the guidelines to the
   statement:
   - adds `meta.import.require.perl` to the whole statement
   - adds `meta.path.perl` to the fully qualified namespace identifier
   - scopes the identifiers as `entity.name.namespace`
2) the way fully qualified identifiers (variables & functions) are to
   be matched in future changes would otherwise break the "require
   statements" syntax highlighting.
3) no variables nor other expressions are allowed and thus scoped
   invalid. A leading `::` accessor is invalid, too.

* [Perl] Move declaration keywords

This commit moves the variable declaration keywords to the `control`
context because
1) its remaining content is not worth a context
2) they are valid wherever control keywords are valid.

* [Perl] Move expressions context

Group the managing contexts at the top of the file.

* [Perl] Add string interpolation step 6

This commit adds string interpolation to the following expressions

  s/<pattern>/<replacement with interpolation>/<flags>;
  tr/<pattern>/<replacement with interpolation>/<flags>;
  y/<pattern>/<replacement with interpolation>/<flags>;

This change also fixes an issue with broken highlighting if the
expression spans multiple lines.

  s/<pattern>/
  <replacement with interpolation>
  /<flags>;

or

  s{
    <pattern>
  }
  [<replacement with interpolation>]<flags>;

Notes:

1) The context `quote-like-replace` is renamed to `quoted-like-replace`
   as this new name reflects Perl terminology more accurately.

2) The context `quoted-like-args-find-rexexp` is renamed to
   `quoted-like-args-pattern` as this is a more general name and
   matches the changes of the previous commit.

* [Perl] Add string interpolation step 1

This commit does not introduce functional changes, but prepares some
scope names for the string interpolation support.

1. The scopes of language identifiers of embedded code blocks are
   renamed according to Markdown's fenced code blocks.

   a) The identifiers in PODs (documentation comments) are renamed
      from `string.unquoted` to `constant.other.language-name.<name>`.
   b) The identifiers of HEREDOCs are renamed
      from `constant.language.heredoc.<name>`
      to `constant.other.language-name.<name>`.

      Note:

      The HEREDOC language identifiers are stacked into a `string` as
      Perl uses quotations to pin leading whitespace to the identifier.
      This is how it allows the end tag to be indented with the
      surrounding code. This syntax definition doesn't want to include
      these spaces into the `constant`.

2. The `meta.string.perl` scope is added to all kinds of strings to be
   able to safely clear the `string` scope within interpolations.

* [Perl] Add string interpolation step 2

This commit prepares the `string-format` context for variable
interpolation by the following steps:

1. The scopes of the 'picture line patterns' are renamed from
   `variable.parameter` to `constant.other.placeholder` as patterns
   like `@#.#` are comparable to C-style format strings like `%1.1f`.

2. The scopes for `~`, `~~` and `...` are renamed
   from `constant.character.escape`
   to `constant.other.placeholder.text`
   as they are used as placeholders/patterns to format the content of
   variables the same way as the patterns of (1).

3. Fix an issue with some of the variables being partly scoped as
   `constant.placeholder` by refining their match patterns.

Note:

 These changes are required to clearly distinguish between format
 patterns and variables, which are evaluated by Perl at runtime by
 using the patterns.

* [Perl] Add string interpolation step 5

This commit adds interpolation support to quoted-like operators, which
are the functional counterpart to normal quotations.

  q//;   - single quoted -> no interpolation
  qq//;  - double quoted -> interpolation
  qx//;  - backtick quoted -> interpolation
  qw//;  - split string into words -> no interpolation

Notes:

1) The related contexts are renamed to reflect their meaning in a more
   general and accurate way.

2) According to the scope naming guideline all prefixes and punctuation
   are to be included into `meta.string`. Quoted-like operators like
   qq// are scoped as `meta.function-call` at the moment.
   Hence the `meta.string` does not cover the `qq` function identifier,
   even though it looks very similar to python's `r""` prefix style.

   The discussion about whether to turn `q`, `qq`, `qx`, `qw` into
   prefixes in order to include them into the `meta.string` should be
   part of another commit.

   The solution should play well with other functions like s///, m//,

* [Perl] Improve escaping from embedded regexp

Nearly any character can be used as delimiter in quoted-like
expressions.

All the following expressions are equal:

   s/<pattern>/<repl>/<flags>
   s|<pattern>|<repl>|<flags>
   s#<pattern>#<repl>#<flags>
   s@<pattern>@<repl>@<flags>

The choice depends on the characters being used in the <pattern> or
<repl>. If `/` is used as the delimiter, it must not be part of them.
Perl however accepts escapes like `\/`.

Theoretically a <pattern> can end with an arbitrary number of `\`,
which makes creating a robust `escape` pattern complicated or even
impossible.

Before this commit the escape patterns checked for the absence of a
single `\` before the delimiter candidate `/` only.
The check is extended to `\` and `\\\` in order to allow Windows style
path names ending with an escaped backslash.

This is not a perfect solution, but should help in some rare edge cases.
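
For comparison, the general rule that the fixed-length checks for `\`
and `\\\` approximate can be written down in a few lines of Python: a
delimiter candidate is escaped if it is preceded by an odd number of
backslashes. The helper name `is_escaped` is an assumption made for this
sketch. A lookbehind in the syntax definition can't count like this,
which is why only the one and three backslash cases are covered there.

```python
def is_escaped(text: str, pos: int) -> bool:
    """Return True if the character at `pos` is preceded by an odd
    number of consecutive backslashes."""
    count = 0
    i = pos - 1
    # Walk backwards over the run of backslashes in front of `pos`.
    while i >= 0 and text[i] == "\\":
        count += 1
        i -= 1
    return count % 2 == 1
```

In `C:\\/` the `/` follows two backslashes (an escaped backslash) and
therefore still terminates the pattern, while in `a\/b` it does not.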

* [Perl] Add string interpolation step 7

This commit adds interpolation support to comments in the same way as
for strings in order to properly support the fenced code blocks.

All comments are scoped with `meta.comment.perl`.
The `comment` scope is cleared from stack within a fenced code block.

* [Perl] Fix HEREDOC content boundaries

This commit fixes 2 issues which are related with interpolation:

1. By adding the `meta.string` scope to the `string-heredoc-other`
   context the number of meta scopes to clear in the line after the
   HEREDOC tag got unbalanced.

   Fix: Add a dedicated `string-heredoc-expr-other` context to clear
        the `meta.string string.quoted.other` scopes.

2. The `string-heredoc-expr` context was terminated by `- match: $`.
   Hence the HEREDOC content block started at the end of the first
   line, which is inaccurate.

   Fix: Use the `^` in the pattern to make the content start at the
        beginning of the line after the HEREDOC tag.

* [Perl] Add sprintf format placeholders

This commit

1. adds the `literal-common` and `interpolated-common` contexts to
   group common content for interpolated and raw strings.
2. adds rules to scope C-style FORMAT placeholders as used by
   printf/sprintf, etc.

   See: https://perldoc.perl.org/functions/sprintf.html

The pattern to match the FORMAT is designed to be quite restrictive in
order to prevent interference with variable interpolation. Perl uses `%`
for some types of variables and the FORMAT is to be supported in any
kind of quoted string as the format string might be built dynamically.

Example:

  $format = "%s: %d";
  sprintf $format, $var1, $integer;

Note:

  Perl supports variable interpolation even in FORMAT patterns.
  Something like %0${varname}X is valid. This commit does not support
  such constructs as it is nearly impossible to distinguish between
  FORMAT patterns and hashes - both start with `%`.
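
A minimal Python sketch of such a restrictive placeholder pattern is
shown below. The regular expression, the name `PLACEHOLDER` and the
helper `find_placeholders` are assumptions for illustration and cover
only a subset of the sprintf syntax. They are not the rules added by
this commit. The trailing `(?!\w)` is one possible way to keep hash
names such as `%data` from being mistaken for a placeholder.

```python
import re

# Hypothetical, simplified placeholder pattern (subset of sprintf).
PLACEHOLDER = re.compile(
    r"%(?:"
    r"%"                    # literal percent sign `%%`
    r"|[-+ 0#]*"            # flags
    r"\d*(?:\.\d+)?"        # minimum width and precision
    r"[hlqLV]?"             # size modifier
    r"[csduoxXeEfgGbB]"     # conversion character
    r"(?!\w)"               # reject longer words such as `%data`
    r")"
)


def find_placeholders(text: str) -> list:
    """Return all substrings of `text` that look like FORMAT placeholders."""
    return [m.group(0) for m in PLACEHOLDER.finditer(text)]
```

Applied to the format string from the example, the sketch finds `%s`
and `%d`, but nothing in `%hash and %data` or in `%0${varname}X`.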

* [Perl] Fix HEREDOC function call arguments

Issue:

If a HEREDOC is used as argument in a function call, the arguments
context never pops off. Hence `meta.function-call.arguments` stays on
the stack forever.

Example:

   function_name(<<"   HEREDOC");
   This is the HEREDOCs value which is passed to the function
   HEREDOC

Solution:

This commit removes any `meta.function-call` scopes and all the related
contexts in favor of properly highlighting HEREDOC like arguments in
functions.

As the function call can be nested and followed by arbitrary perl
expressions up to the end of line, no better solution was found with
the existing features of ST's lexer.

* [Perl] Add string interpolation step 4

This commit adds support for string interpolation within HEREDOCs.

HEREDOCs with single quoted name (e.g. <<'EOT') are not interpolated.
Unquoted (<<EOT) or double quoted (<<"EOT") HEREDOCs are interpolated.
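
The decision can be sketched in Python as a small classifier for the
HEREDOC opener. The names `HEREDOC_OPENER` and `heredoc_interpolates`
as well as the regular expression are assumptions for illustration.
The syntax definition handles this with separate contexts instead.

```python
import re
from typing import Optional

# Hypothetical opener pattern: `<<` followed by a double quoted,
# single quoted or bare tag name.
HEREDOC_OPENER = re.compile(
    r"<<(?:"
    r'"(?P<dq>[^"\n]*)"'        # <<"EOT"
    r"|'(?P<sq>[^'\n]*)'"       # <<'EOT'
    r"|(?P<bare>[A-Za-z_]\w*)"  # <<EOT
    r")"
)


def heredoc_interpolates(opener: str) -> Optional[bool]:
    """Return whether the HEREDOC body is interpolated, or None if
    `opener` does not start a HEREDOC."""
    m = HEREDOC_OPENER.match(opener)
    if m is None:
        return None
    # Only a single quoted tag disables interpolation.
    return m.group("sq") is None
```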

* [Perl] Fix qualified functions and variables

(I) Function calls without parentheses

Perl Documentation says: Encapsulating function arguments into
parentheses is optional, if the expression is clearly identified as
function-call.

This statement applies to calls of subroutines which are already known
to the interpreter by defining them at the beginning of the script or
by importing them via a `use ...` statement from other modules.

This commit adds some heuristics to identify such kinds of function
calls even though we don't really know about the validity of the
identifier.

A function is clearly identified if the `identifier` is followed by
  - comment
  - end of line
  - end of expression (closing bracket)
  - end of statement (semi-colon)
  - HEREDOC
  - quoted string
  - regular expression
  - variable
  - word but no operator

(II) Each variable or function identifier can contain a qualified path.

Example:

  $NS1::NS2::variable
   NS1::NS2::function

The `main` namespace is shortened to a leading `::`

  $::variable
  ::function

This commit applies correct scopes for all qualified identifiers.
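
How such an identifier decomposes can be sketched in Python. The
pattern `QUALIFIED` and the helper `split_qualified` are assumptions
made for this illustration, not the contexts of the syntax definition.
The sketch rejects whitespace around `::`, in line with the earlier
commit that restricts it.

```python
import re

# Hypothetical pattern: optional sigil, optional namespace path, name.
QUALIFIED = re.compile(
    r"^(?P<sigil>[$@%&*]?)"
    r"(?P<path>(?:::)?(?:[A-Za-z_]\w*::)*)"
    r"(?P<name>[A-Za-z_]\w*)$"
)


def split_qualified(identifier: str):
    """Split an identifier into (sigil, namespaces, name) or return None."""
    m = QUALIFIED.match(identifier)
    if m is None:
        return None
    path = m.group("path")
    namespaces = [part for part in path.split("::") if part]
    if path.startswith("::"):
        # A leading `::` is shorthand for the `main` namespace.
        namespaces.insert(0, "main")
    return m.group("sigil"), namespaces, m.group("name")
```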

(III) Qualifiers or identifiers can consist of only capital letters.

In order to distinguish them from global constants this commit adds
the following rules:

1) Prefer scoping the file handles like STDERR, STDOUT, etc. as
   constants even though they are used like a function or look like a
   class or namespace. Even though these tokens don't have special
   meaning to the Perl interpreter, they are intended for certain uses
   by the standard library.
2) If a user defined constant looks like a function it is scoped as
   such. In other words tokens with capital letters only are scoped as
   constant only, if they don't look like a namespace, class, object or
   function.

(IV) Object member functions

According to https://perldoc.perl.org/perlobj.html an object in Perl
is nothing else than a normal data structure (array, hash, ...) which
is bound to a class.

The `->` accessor always indicates a member function call, if followed
by an identifier. Attributes are accessed via getter/setter methods in
modern Perl code.

Example:

   $obj->method
   $obj->getter
   $obj->setter <value>

A subroutine always returns a reference to an object. Therefore the
following is invalid:

   $obj->method{key}

Another `->` accessor is needed to access an item of the hash returned
by `method`:

  $obj->method->{key}

Otherwise the `->` could also be used to directly access data items in
the underlying hash of an object (which is discouraged).

Example:

   $obj->{attribute}

Accessing members of a nested object looks like:

   $obj->{attribute}->method

Note:

  Object members like `->new()` in an expression like `Class->new()`
  are not yet part of meta.path as they must not be part of a string
  interpolation.

The `->` accessor scope is renamed to `punctuation.accessor.arrow` to
comply with scope naming guidelines and give it the same color as the
`::` operator which is used to access classes.

* [Perl] Add HEREDOC in block test case

This commit adds another HEREDOC test case to ensure not to break
highlighting due to pushing into contexts when matching braces.

* [Perl] Add reference operator

This commit adds the reference operator `\` as described at

   https://perldoc.perl.org/perlref.html#Making-References

* [Perl] Add string interpolation step 3

This commit adds a `variable-interpolation` context, which can be
included wherever interpolation support is needed.

It removes the last scope on stack, which should be a subscope of
`string` and adds the `meta.interpolation.perl`.

This commit adds interpolation support to:

 - double quoted strings
 - backtick quoted strings
 - format strings

Note: Perl does not evaluate variables in single quoted strings.

* [Perl] Add keyword qualifier

All reserved words like `if`, `else`, `sub`, ... are defined in the
`CORE` namespace. Thus prefixing them with `CORE::` is valid syntax.

This commit therefore adds a simple context, which prevents such
keywords from being highlighted as ordinary functions.

Note:
  For simplicity reasons no `meta.path` is added at the moment as it
  would require to add a match for each qualified keyword.

  As a result the `CORE::` is not included in any other meta scope like
  `meta.function` etc.

* [Perl] Improve variable scoping

This commit improves the overall variables matching:

1. add/fix regexp match group variables `$+`, `$-`, `%+`, `%-`, `@+`, `@-`
2. fix predefined variables pattern (replace `^` by `\^`)
3. clean up some character classes (remove escapes)
4. regroup/resort the rules
5. add special scopes for builtin variables
6. add builtin variables from English.pm

* [Perl] Add dereference operator

This commit adds the dereference operators as described at

   https://perldoc.perl.org/perlref.html#Making-References

Note:
  Perl calls the variable prefixes `type keywords`.
  This commit distinguishes the variable prefix, which is the rightmost
  leading character in front of the identifier, from all other type
  prefixes, which can be added to the left in order to perform
  dereferencing. The latter are scoped as `keyword.operator` in order
  to make sure all kinds of variables are scoped correctly.
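
The split between dereference operators and the variable prefix can be
sketched in Python. The pattern `DEREF` and the helper `split_sigils`
are assumptions for illustration. Special variables such as `$$` are
deliberately ignored in this sketch.

```python
import re

# Hypothetical pattern: any number of leading type prefixes, then the
# rightmost prefix, then the identifier.
DEREF = re.compile(
    r"^(?P<ops>[$@%&*]*)(?P<prefix>[$@%&*])(?P<name>[A-Za-z_]\w*)$"
)


def split_sigils(token: str):
    """Split a token into (dereference operators, variable prefix, name)."""
    m = DEREF.match(token)
    if m is None:
        return None
    return list(m.group("ops")), m.group("prefix"), m.group("name")
```

For `@$ref` the `@` is the dereference operator and `$` the variable
prefix of `ref`.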

* [Perl] Add missing modulo operator

* [Perl] Remove deprecated scope from PODs

Tab indentation is deprecated in Perl, but many older library functions
still use it. Some color schemes highlight the whole whitespace block.
It just sucks.

* [Perl] Fix block content

A code block can contain anything.

* [Perl] Fix qualified format statement

This commit applies all changes about qualified functions and variables
to the `format` statement.

* [Perl] Tweak hash key scope name

Using string.unquoted as hash key makes it hard to distinguish
interpolated item access (1) from normal string content (2) because
only the braces are highlighted differently.

Example:

   "string $hash{key} string"
           ^^^^^^^^^^ meta.interpolation

   "string $hash->{key} string"
           ^^^^^ meta.interpolation

This commit changes the `key` scope to `constant.other.key` as it is
already used for defining hashes.

Example:

    %hash = ( key => "value", key2 => "value2")

This commit makes sure to highlight valid keys only when defining
hashes.

* [Perl] Tweak namespace scopes

This commit renames `support.class.perl` to `variable.namespace.perl`
as a more general identifier for the path parts of a qualifier.

The top-level scope `support.` is not sufficient as it is meant for use
with built-in entities only. Any namespace can be user defined though.

The `variable.namespace` was chosen as counterpart of
`entity.name.namespace` which is used to define a namespace.

* [Perl] Rename Miscellaneous to Comments.tmPreferences

Most syntaxes use the newer name.

* [Perl] Fix interpolated string termination

This commit fixes interpolated strings and quoted like functions not
being popped off correctly if the interpolated string ends with a
variable punctuation.

Examples:

  a) "$repl$"
           ^^ no variable!

  b) s/pattern/$repl$/g;
                    ^^ no variable!

Example (a) results in a syntax error as the Perl interpreter doesn't
know whether to handle the `$"` as variable or not. This commit does
not introduce an `illegal` scope for it, though. It just ensures not
to break the string boundaries.

Example (b) shows quoted like functions, which Perl tokenizes by the
first character after the function identifier (here `/`) first before
it starts parsing the strings. The delimiters are therefore matched
with higher priority.

This commit adapts this behavior by

1) consuming the variable punctuation `[$@%&*]#?` in front of the
   delimiter (or closing bracket).

   We need it because, $/, $), $], etc. are valid built-in variables.

2) modifying the `variables-interpolation` context in order to make
   sure to pop off from a `meta.interpolation` after each variable.
   Otherwise 1) wouldn't work.

   In order to prevent sophisticated (and error prone) lookaheads for
   each ordinary variable, the existing `qualified-variables` and
   `unqualified-variables` contexts are merged for that purpose.

   The resulting context pops off, while the original ones don't.
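
The "delimiters first" behavior of example (b) can be sketched in
Python: the expression is split at unescaped delimiters before the
parts are looked at, so the trailing `$` of `$repl$` never swallows
the delimiter. The helper `split_substitution` is an assumption for
illustration. It only handles `s` with identical delimiter characters
and no bracket pairs like `s{...}[...]`.

```python
def split_substitution(expr: str):
    """Split `s<d>pattern<d>replacement<d>flags` at unescaped delimiters
    and return (pattern, replacement, flags), or None on failure."""
    if len(expr) < 2 or expr[0] != "s":
        return None
    delim = expr[1]
    parts, current, escaped = [], [], False
    for ch in expr[2:]:
        if len(parts) < 2 and ch == delim and not escaped:
            # Unescaped delimiter: close the current part.
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
        # A backslash escapes the next character unless it is escaped itself.
        escaped = ch == "\\" and not escaped
    if len(parts) != 2:
        return None
    return parts[0], parts[1], "".join(current)
```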

* [Perl] Tweak function vs. constant matching

Single words within expressions are most likely to be constants.
They can be functions if defined by an import (use) or sub before,
but we can't distinguish that.

Example:

    if (constant)
        ^^^^^^^^  constant or function possible.

Uppercase only identifiers are already scoped as constants, if they
don't look like a namespace or function call.

This commit adds the `constant-identifier` context to scope all
identifiers as constants, which were not otherwise matched.

This is only guesswork but the only chance we have to hopefully scope
everything correctly in most situations.

* [Perl] Tweak storage scopes

This commit renames the `keyword.declaration.variable` scope to
`storage.type.variable`. This step seems consistent because all the
other declaration/definition keywords were renamed according to the
scope naming guidelines.

`sub` => storage.type.function
`package` => storage.type.namespace

So we have now:

`my`, `local`, `state`, `our` => storage.type.variable.

* [Perl] Optimize expressions contexts

* [Perl] Update perldoc urls

The documentation of Perl 5.30.0 moved to https transport.

* [Perl] Add meta.function-call to function identifiers

This causes the functions to be added to the index.

Note:
  The meta.function-call was removed as it used to cover the whole
  arguments list, which doesn't work properly if HEREDOCS are passed to
  a function.

* [Perl] Fix test cases

This commit renames a scope in the function reference test cases,
which used to exist during development only.

* [Perl] Tweak operator scopes

This commit ...

1) splits the logical operators into
   - `keyword.operator.comparison.perl`
   - `keyword.operator.logical.perl`

2) assigns `=>` the `punctuation.separator.key-value.perl` scope.

Inspired by C# and Ruby.

* [Perl] Fix unquoted Heredoc tag

This commit introduces fixes for the following issues:

1) Whitespace between the `<<` and the tag name in unquoted HEREDOCS
   (e.g.: `<< TAG`) is not allowed.

2) The type of whitespace in quoted HEREDOC tags doesn't matter. Both
   space and tabs are allowed.

* [Perl] Tweak HEREDOC scope names

Applies heredoc scopes according to sublimehq#2073.

* [Perl] Case insensitive first_line_match