perf(genAdds): refactor recursive procedure to iterative #1277
Conversation
Force-pushed from a0284c0 to f5f9789
@eoghanmurray nice find on the blocked, I moved that too quickly. Feel free to just leave a review next time and I'm happy to work on the changes.
I'm tempted to leave the Array pre-allocation out of this PR for code readability reasons. Also, this article suggests the V8 engine is much more performant with
We should also include the test changes in this PR for further review — see the
@eoghanmurray no problem, let me make the change. Is there a reason that you decided to move genAdds inside the
@eoghanmurray I updated the PR based on your review.
Do you mean f5f9789?
Yeah, that's what I meant. Let me know if I need to update anything else.
I'm not sure why the test is failing.
Seems unrelated, I will rebase the PR.
```typescript
const next = genAddsQueue.pop();
if (!next) {
  throw new Error(
    'Add queue is corrupt, there is no next item to process',
  );
}
```
This should be unreachable. I can add a continue clause in case we don't want to throw, or just silence TS if that's the preference.
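For reference, a minimal sketch of the non-throwing alternative mentioned here; the surrounding loop shape and the `QueueItem` type are assumptions for illustration, not the actual rrweb code:

```typescript
// Illustrative only: QueueItem and the drain loop are assumed shapes.
type QueueItem = { id: string };

function drainQueue(genAddsQueue: QueueItem[]): string[] {
  const processed: string[] = [];
  while (genAddsQueue.length > 0) {
    const next = genAddsQueue.pop();
    if (!next) {
      // Alternative to throwing: skip the (theoretically unreachable)
      // case where pop() returns undefined, and keep draining.
      continue;
    }
    processed.push(next.id);
  }
  return processed;
}
```

Either way the branch should never fire while the length check guards the pop; the choice is only about how loudly to fail if the invariant is ever broken.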
@eoghanmurray friendly ping :)
@eoghanmurray small ping here :)
Apologies for the delay; do you have this running in production? I've just cherry-picked this PR today and am putting it out in a 'canary' environment to a subset of customers. Reasons to be cautious: I've fixed a completely separate 'node ordering' issue on the replay side of things, where non-anticipated ordering of nodes in a mutation produced worst case performance and ultimately broke the replay (as it triggered a timeout).
We do not.
Perfect, let me know. I looked at your PR about the node ordering issue, but since it's been a while since I looked at this code I've lost some context and requirements, so I have a hard time imagining the exact issue you are fixing.
@eoghanmurray any findings from the canary builds here?
No issues that I know of.
Does it give us the confidence we need to merge this one in? Wondering if we should be cherry-picking it too, or whether it merges here and we update our fork.
Force-pushed from 3ef5f64 to cd2ea39
See also #1398 ... I'm trying to work out how these two PRs might interact.
@eoghanmurray I think my PR already implements the changes from that PR, as it performs an in-order traversal to enqueue items and then proceeds to pop the items from the queue (LIFO). That is essentially the same as iterating backwards as you process items, with the added benefit that we remove recursion.
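A minimal sketch of that equivalence (the node shape here is assumed): enqueuing children in document order and popping LIFO visits siblings right-to-left, the same order as a backwards iteration, with a flat call stack instead of recursion:

```typescript
// Assumed node shape for illustration; not the rrweb serialized node type.
interface TreeNode {
  id: number;
  children: TreeNode[];
}

function collectLifo(root: TreeNode): number[] {
  const queue: TreeNode[] = [root];
  const visited: number[] = [];
  while (queue.length > 0) {
    const next = queue.pop()!; // LIFO: last enqueued child comes out first
    visited.push(next.id);
    for (const child of next.children) {
      queue.push(child); // enqueue children in document order
    }
  }
  return visited;
}
```

For a root with children [2, 3], this visits 1, then 3, then 2, i.e. siblings in reverse order, without any recursive calls.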
Super, I missed that it was LIFO.
Yeah, you want to avoid unshift in this case because it's O(n); we would have had to use a list in that case. Btw, what's the status of this PR, can we merge this or do you want me to do anything else here?
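To illustrate the point about unshift: each unshift moves every existing element, so building a reversed list that way is O(n) per insert and O(n^2) overall, while push stays O(1) amortized and a single reverse pass recovers the same order. The function names here are hypothetical:

```typescript
// Illustrative only: unshift shifts all existing elements on every call.
function buildWithUnshift(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) out.unshift(i); // O(n) per insert
  return out;
}

// Same result, but push is O(1) amortized and reverse is one O(n) pass.
function buildWithPush(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) out.push(i);
  return out.reverse();
}
```

Both produce [n-1, ..., 1, 0]; only the asymptotic cost differs.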
I think we should have merged a long time ago; however, with something like this it's hard to tell if there isn't some case where it makes things worse (not that I believe this is the case from having examined it). I meant to put together another 'throw everything at it' mutation benchmark test today with this in mind, but ran into some technical issues. I think it will be merged soon.
You could rebase/merge on master which will re-trigger fresh tests; not sure if the indicated test failure is significant (it is no longer available for viewing).
Force-pushed from cd2ea39 to ebe0db6
To summarize benchmarks: they don't show any difference (so probably aren't exacting enough?)
@eoghanmurray I need to look more into it. There is a chance that the benchmarks are not impacted because the benchmarked DOM is in most cases fairly shallow, which could mean recursive calls are never deep enough to affect these benchmarks. I may create a separate benchmark that adds deep DOM trees; it is something we should benchmark for.
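A sketch of the kind of deep-tree fixture described above. A plain object chain stands in for nested DOM elements so this runs outside a browser; at these depths a recursive walk would risk a stack overflow, which is exactly what the iterative version avoids:

```typescript
// Hypothetical fixture shape: a linked chain standing in for nested elements.
interface ChainNode {
  child?: ChainNode;
}

function buildDeepChain(depth: number): ChainNode {
  const root: ChainNode = {};
  let current = root;
  for (let i = 1; i < depth; i++) {
    const next: ChainNode = {};
    current.child = next;
    current = next;
  }
  return root;
}

// Iterative traversal: depth is bounded only by heap, not the call stack.
function measureDepthIteratively(root: ChainNode): number {
  let depth = 0;
  let node: ChainNode | undefined = root;
  while (node) {
    depth++;
    node = node.child;
  }
  return depth;
}
```

A depth in the tens of thousands is trivially handled here, whereas the equivalent recursive measurement would typically exceed the engine's stack limit.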
@eoghanmurray I won't have the time to look into adding a proper benchmark here soon. Can this still be merged or should we wait for that?
@eoghanmurray Any chance we could merge this? I would like to move on to splitting up the other larger PR I had.
@eoghanmurray ping
@eoghanmurray I realized that the benchmarks were so similar because I wasn't rebuilding the packages and running yarn benchmark from within the rrweb package. I've rerun the benchmarks with the following results. The largest change was observed on the "append 1000 elements and reverse their order" benchmark, where perf seems to have improved by 15%. The other benchmarks all stayed within a few percent of the current value; those deltas look like standard error, so I don't think we can take them into account.
1121ms -> 1075ms
165ms -> 140ms (-15%, largest change)
1390ms -> 1401ms
81.7ms -> 82.9ms
54ms -> 56ms
22.8 -> 21.8
89.6ms -> 89.8ms
19.3ms -> 17.9ms
I've rebased on master and cleaned up a little in https://github.com/eoghanmurray/rrweb/tree/genadds-recursion If that branch is acceptable you can
(I thought this branch was already merged)
Force-pushed from 0435984 to 2b8de82
@eoghanmurray the tests fail because the order of iteration is reversed, which is expected given the changes (iirc this doesn't have an effect), but please let me know if it does and I can see if there is something I can fix.
So do I understand it correctly again that this changes things from depth-first to breadth-first? Otherwise we'll need to
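To check the question above with a minimal sketch (node shape assumed): a LIFO stack keeps the loop depth-first, only reversing the order among siblings; it is swapping in a FIFO queue that would make the traversal breadth-first:

```typescript
// Assumed node shape for illustration.
interface N {
  id: number;
  children: N[];
}

function traverse(root: N, lifo: boolean): number[] {
  const pending: N[] = [root];
  const order: number[] = [];
  while (pending.length > 0) {
    // shift() is O(n) and used here purely to illustrate FIFO behavior.
    const next = lifo ? pending.pop()! : pending.shift()!;
    order.push(next.id);
    pending.push(...next.children);
  }
  return order;
}
```

For the tree 1 -> [2 -> [4], 3], the LIFO variant yields 1, 3, 2, 4 (depth-first, siblings right-to-left), while the FIFO variant yields 1, 2, 3, 4 (breadth-first, level by level).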
@eoghanmurray I'm going to close this PR as it has been broken down into individual PRs.
Is #1539 the equivalent? Are you sure this isn't still important?
Not sure how representative the benchmark is, so I'm adding the screenshot of the profile (deep DOM tree benchmark). Looking at timings, it seems to take ~18ms before and 13ms after my change, or about ~30% gain. Feel free to do more extensive benchmarking and take those timings with a grain of salt as I just ran the benchmarks once
Before
After
We can see the stack is shallower as expected (and uses less memory, sadly no metrics on that).
A small benefit is that we also remove the overhead of calling into the polymorphic genAdds fn.
The queue has a magic default size of 1000; feel free to tweak that to some better value, I'm not sure what a sensible default could be here.
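One way such a default capacity could be used is sketched below; the class and names are hypothetical, the value 1000 mirrors the default mentioned above, and whether pre-allocation actually helps is engine-dependent (see the pre-allocation caveat earlier in the thread):

```typescript
// Hedged sketch: a pre-sized backing array with push/pop semantics.
// 1000 is the assumed default from the discussion, not a measured optimum.
const DEFAULT_QUEUE_SIZE = 1000;

class GrowableQueue<T> {
  private items: (T | undefined)[] = new Array(DEFAULT_QUEUE_SIZE);
  private size = 0;

  push(item: T): void {
    this.items[this.size++] = item; // grows past the capacity automatically
  }

  pop(): T | undefined {
    if (this.size === 0) return undefined;
    const item = this.items[--this.size];
    this.items[this.size] = undefined; // release the reference for GC
    return item;
  }

  get length(): number {
    return this.size;
  }
}
```

The only tunable is the initial capacity: too small means early re-growth, too large wastes memory on shallow trees, which is why the "sensible default" question is open.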