-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve performance for specific grammars with deeply nested closures and build CJS version #25
Conversation
Signed-off-by: Peter van Gulik <[email protected]>
Signed-off-by: Peter van Gulik <[email protected]>
Signed-off-by: Peter van Gulik <[email protected]>
Signed-off-by: Peter van Gulik <[email protected]>
…ance for grammars with deeply nested closures Signed-off-by: Peter van Gulik <[email protected]>
Great, thanks @Codeneos! I'm a bit ambivalent to bring back CJS builds, but maybe it's better that way to ease usage with plain Node.js. The implementation of these classes was on my mental todo list anyway (as is the entire optimization of the runtime). Just today I implemented a |
src/misc/HashMap.ts
Outdated
private _values: TValue[] = []; | ||
private _keys: TKey[] = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these are especially marked with a leading underscore (which my linter rules disallow btw.) I'd say they should be declared as real private (e.g. #values
). Maybe even make all private fields really private. I come from a C++ background where private really means private :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 100%! I was a bit worried about the synthetic sugar that tsc adds when using private fields would have to much impact on performance. And as values
and keys
are both functions I decided to go with the classic _
prefix.
I'll do a quick test if real #
private fields have a noticeable performance impact and if not make the changes suggested 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. the release bundle is not built by TSC. That creates only the typings. esbuild
is what creates the release and I hope it directly takes over private members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, your right! I forgot that esbuild
doesn't use tsc
.
And yes your right, it takes over the private members as is. No strange hidden map and helper functions that you sometimes get with tsc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are especially marked with a leading underscore (which my linter rules disallow btw.) I'd say they should be declared as real private (e.g.
#values
). Maybe even make all private fields really private. I come from a C++ background where private really means private :-)
Ok changes applied - I left the hashFunction
and equalsFunction
as design-time privates.
For me the problem with ESM only is that it doesn't work for everything yet. For example VSCode extensions need to be written in CJS format as ESM is not yet supported by VSCode. There are some workarounds converting ESM back to CJS but it isn't ideal. So if you would be okay with supporting CJS for the time being it would help me a lot! I noticed optimizations are very specific, the HashMap and Set changes I did work great for my grammar and source code but when I run the benchmark it has no effect at all. |
I wonder why Git blocks merging, saying that all commits need to be signed. They are all signed.... |
When you bundle your extension (as you should) then you can generate a CJS bundle from any type of module (I do that with my extension already). But running with test frameworks needs a separate transformer, so a CJS bundle here simplifies things there. |
I am using webpack and esbuild but was getting some ERR about the ESM module not found when I ran quick test earlier today to see if I could move to ESM. Could be I missed something though somewhere. |
Signed-off-by: Peter van Gulik <[email protected]>
@Codeneos Unfortunately I had to revert your changes in HashSet and HashMap, because they broke the Java runtime tests. To my shame I have to admit that I did not run these tests for a while. Otherwise I would have seen that problem earlier. Only because I'm currently converting the Java runtime tests to TypeScript I saw this problem. If you want you can open a new PR with updated classes that also work well in the Java runtime tests. |
Thanks for letting me know! I'll recreate the PR and fix the issue. Do you already have some pointers as to what is causing the Java tests to fail? Oh and, for my knowledge how do I run the Java runtime tests? Are these part of the repo or do need something specific for that? |
Running the Java runtime tests is a pain. You need to build ANTLR4 locally and then symlink the JavaScript runtime folder in the ANTLR4 repo folder to the repo folder of antlr4ng. The good news is however, that I managed to complete the TS port of the ANTLR4 runtime tests today. Clone this repo locally, run It's all a bit rough yet and if you want to run a specific test you have to manually change the |
Great that the tests already ported everything to TS! I'll clone the repo and run the tests from there to pin point the issue. |
For your information I just reached a big milestone: All tests now use Jest and run in parallel, reducing total execution time to under 20s on my box. You can even debug individual tests, which should help you to better figure out why a test fails. |
I've re-created the PR and made the changes I had intended to make from scratch again, this time I'm adding jest unit tests as I go to ensure that everything works as intended and keeps working as intended :) |
Both the
antlr4ng
andantlr4
implement a customHashMap
andSet
that is compatible with their Java counterparts. Feature wise the implementations are correct but with certain grammars and specific source files that have deeply nested closures theSet
class is stressed unproportionally.This PR optimizes the methods of the
Set
andHashMap
which yields a significantly performance improvement in parsing time. For me parsing of 2318 sources improved from ~165.365s to ~28.300s. This doesn't improve performance for all grammars.This was tested using APEX, which is a c-style programming language for Salesforce CRM. The grammar file used was derived from Java which is what APEX is loosely based on.
Before
After
Analysis
The 2 anonymous functions on line
600
(1) and602
(2) are from thelength
function of theHashSet
. Changing thelength
function to a more simpler function that doesn't usemap
orreduce
significantly improves performance. As thelength
function alone counts for 15% of the execution time.I suspect this is due to how
Object.keys
andmap
work in that both methods allocate new arrays which is a relatively costly operation.Solution
As both the
Set
andHashMap
in the antlr4 runtime only support add, contains and get operations making them collection types from which no elements can be removed thus optimizing them is relatively simple.By stored the actual data of both collections in an array we can speed up accessing both the keys and the values of the collections. Next the length property can be determined by checking the length of the values array avoiding
Object.keys
andmap
.The hash buckets instead of storing the actual value will store a reference to the index in the array. Arrays in JS are spares by design so you can add an element at index 100 without needing to set the previous 99 indices. Also you will not get an out of bounds error when accessing an index beyond the length of an array, you will simply get
undefined
. But as the Set and HashMap do not support removal of entries this doesn't matter that much :).Other changes
CharStream
, with antlr4ng I cannot implement theCharStream
anymore due to it's#private
property.