problems with GC memory reclamation on wpe 2.38 #1363
Comments
Hi Tomasz, sorry for the delay. I have been struggling to get a WPE build working on an rpi remotely all last week, but my pi finally arrived in the mail today, so I should be able to reproduce this tomorrow. I have tried unsuccessfully to reproduce this on desktop with PSON (process swap on navigation) disabled, but I will try a few more configurations. I am curious though: why does wpe navigate in the same process? This is not an area that I am at all familiar with. My intuition is that this would be faster, and also reduce memory fragmentation, without too much memory overhead.
hello @justinmichaud,
I was able to get a heap snapshot ('gcdebugging' type), it's here: snapshot-leak-gddebugging.json.z01.txt snapshot-leak-gddebugging.json.zip.txt (had to split the zip file - too big for github - and add '.txt' suffixes to upload, since github doesn't accept 'z01' files). Once unzipped, it is possible to open that file in the web inspector; it is visible that the heap still contains objects referencing both pages that I've visited ('https://stv.prd.sky.ch/store/' and 'https://widgets.metrological.com/lightning/liberty/2e3c4fc22f0d35e3eb7fdb47eb7d4658#boot'). Also, curiously, the web inspector shows Window objects that seem to be connected to both these pages (and 3 more). I've reformatted the GC roots information from this heap, it is here: snapshot-leak-gddebugging.rootsinfo.json. Again, the heap seems to contain GC roots that reference both the old & new pages.
also, regarding PSON - this seems to be disabled by default on wpe:
One more update, regarding our platform - it might not be obvious from the description, but this is 32-bit ARM (armv7l):

```
cat /proc/cpuinfo
processor : 1
processor : 2
processor : 3
Hardware  : Broadcom STB (Flattened Device Tree)
```
Hey! No worries, I just got back from vacation myself. I am still working on this, but getting an armv7 setup working is proving to be challenging. In the meantime, I have been trying to do some investigation on the desktop port. I would like to be able to log where the roots are coming from without WebInspector, because it can keep some things alive. I am also trying to find a way to manually trigger the memory pressure handler. Does this issue reproduce on 64-bit builds for you?
hello,
fwiw, I achieved something like this by directly calling GCController::singleton().dumpHeap() via WebProcess::garbageCollectJavaScriptObjects, which can be invoked from wpe thunder plugin in our integration. I also needed to modify HeapSnapshotBuilder.cpp to dump directly to the file, otherwise it will quickly require a lot of memory (json StringBuffer resizes exponentially)
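For reference, a minimal sketch of that hook, based only on the description above (the exact WPE 2.38 signatures and the HeapSnapshotBuilder file-streaming change are not shown and may differ):

```cpp
// Sketch only - not the actual patch. Make the embedder-triggered collection in
// Source/WebKit/WebProcess/WebProcess.cpp also dump a 'gcdebugging' heap snapshot.
void WebProcess::garbageCollectJavaScriptObjects()
{
    WebCore::GCController::singleton().garbageCollectNow();
    // Added for debugging: writes the snapshot that can then be loaded in the
    // Web Inspector or post-processed offline.
    WebCore::GCController::singleton().dumpHeap();
}
```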
I haven't tested that on 64-bit builds. I could probably try on my desktop with x86_64, but what exact setup would you suggest? wpe 2.38 with the cog (https://github.com/Igalia/cog) application?
Don't worry, I thought you may have had a 64-bit device handy running the same WPE build and software stack, since that would reduce the number of variables. I'll try to reproduce this on desktop too.
Hello, I've been doing some investigation of this issue in parallel, and thought I'd share what I've seen so far. Since the web inspector is rather underwhelming when it comes to analysing the heap dumps, I've prepared a small tool myself (see nodereader-networkx.py attached) and did the following test, typical of our application flow:
And after that, I took a heap snapshot ('gcdebugging' type). With the help of this Python tool (using the 'node_by_class Window' command) I can see that there are multiple Window objects; the most recent one still has the proper label with the latest url, while the old ones have 'about:blank' labels (but still exist and, via some links, can be connected to the previously visited apps). Something like this:

```
Enter node id: node_by_class Window
```

Taking a look at the root objects that are holding these Window objects (the 'closest_root_paths $window_object_node_id' command), it seems that the old (not the latest) Window objects are mostly held by root paths that look like this:

```
(4523988 class 'TextTrack' label 'url https://tv.playsuisse.ch/Landing' cell 0x79675c98(0xab95d3f8) ROOT)
  -> [0/na] -> (4523988 class 'Structure' label 'TextTrack' cell 0x7a0b62b0(0x0))
  -> [0/na] -> (4523988 class 'Window' label 'about:blank' cell 0x7e812038(0x0))

(4523988 class 'CSSStyleDeclaration' label 'url https://tv.playsuisse.ch/Landing' cell 0x7a0d45e0(0xab938294) ROOT)
  -> [0/na] -> (4523988 class 'Structure' label 'CSSStyleDeclaration' cell 0x7a0d8c30(0x0))
  -> [0/na] -> (4523988 class 'Window' label 'about:blank' cell 0x7e812038(0x0))
```
I've looked into Source/JavaScriptCore/runtime/Structure.cpp / Structure::visitChildrenImpl; during GC it is indeed visiting m_globalObject, so I guess that's where the GC link comes from. I've actually tried to comment this part out.
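Roughly, the experiment looked like this (a reconstruction of the truncated snippet above; the surrounding code is approximate, not the exact upstream source):

```cpp
// Source/JavaScriptCore/runtime/Structure.cpp (approximate)
template<typename Visitor>
void Structure::visitChildrenImpl(JSCell* cell, Visitor& visitor)
{
    Structure* thisObject = jsCast<Structure*>(cell);
    Base::visitChildren(thisObject, visitor);
    // This is the edge that ties every live Structure back to its global object
    // (i.e. the Window). The experiment was simply to disable it - obviously not
    // shippable, just to confirm where the retaining path comes from.
    // visitor.append(thisObject->m_globalObject);
    // ... remaining child edges left unchanged ...
}
```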
After that change, the root paths look like this:

```
(3793819 class 'CSSStyleDeclaration' label 'url https://stv.prd.sky.ch/store/' cell 0x764c7520(0xab9144c4) ROOT)
  -> [0/na] -> (1934867 class 'Structure' label 'CSSStyleDeclaration' cell 0x82263520(0x0))
  -> [0/na] -> (1922666 class 'CSSStyleDeclaration' label '' cell 0x84bc3c30(0x0))
  -> [1/setProperty] -> (1926559 class 'Function' label 'setProperty' cell 0x7e2b4be0(0x0))
  -> [0/na] -> (1913906 class 'Window' label 'about:blank' cell 0xab8d8038(0x0))
```

So now it's like 'class X -> Structure of class X -> class X again (but a different cell/object now?) -> some Function (like 'setProperty' here) -> Window'. I don't know how to get rid of this 'Function' link. It seems that these global object / Window instances are held locally, while the global object actually changes when you navigate (right?). Attaching this new gcdebugging snapshot (5-snapshot-then-exited-sstore.json); it should be possible to look into it with the helper tool I've attached.
That looks suspicious! Why is this a root? I'll investigate, but this seems like the culprit. Just an update: I fixed a bug in WebInspector so Linux users can actually load heap dumps now. This Python tool is super helpful, thanks! Separately, I created a way to manually trigger the memory pressure handler, which does free up some of the memory. I think this might be related to a different issue we were experiencing with MotionMark.
Weak references now display the correct root marking reason. Weak references shouldn't be marked unless they are still reachable, so I am trying to see why these weak references are reachable. It is unfortunate that they make it harder to see where the "true" root is coming from. I have made a little script that exports the heap snapshot to GraphML, and I am opening it in Gephi. I will see if I can spot any interesting structure.
Update: I see that there are a bunch of Handle<>s keeping the Window object alive. I am adding some logging to see where these are allocated from. My guess at the moment is that something in our generated DOM bindings code is keeping these alive when it shouldn't.
I have removed as many Strong<>s as I can and logged the rest. It seems like ScriptController is keeping the JS Window objects alive via a Strong<>. My hypothesis is that a C++ reference cycle is keeping Frame alive, which is the owner of ScriptController.
Can you see if this fixes your issue? I will check on my rpi tomorrow, I am just suggesting it here in case you can get to it first (timezones):
For completeness, I am running with the following options and I can no longer reproduce the issue I was seeing on ToT desktop. That doesn't necessarily mean anything about WPE.
I saw the following trace:
Hello, I've tried the InspectorInstrumentation.cpp change; something like: --- a/Source/WebCore/inspector/InspectorInstrumentation.cpp
with the following envs: JSC_useSourceProviderCache=0
The change seems to be applied - I can see the added logs - but unfortunately it doesn't seem to improve the memory situation; nothing more is released on a full GC. I have not applied the previous patch you supplied (heapSnapshot.patch.txt) - it would require some rebasing on my side; it seems to be prepared against the 2.42 version (looking at the Heap.cpp differences). Should I try with those changes applied first?
Another thing: I think I've found a reliable workaround on our platform - you can see the change here: 44bb7312475166426ede9a046e353bbd9e9ef314.patch.txt. What the change does is disable the conservative scan on demand, and I do it only when manually forcing GC (from our Thunder plugin, via the webkit_web_context_garbage_collect_javascript_objects call, which goes to WebProcess::garbageCollectJavaScriptObjects). I only do it at a 'safe point', when wpe has navigated to the 'idle/#boot' webpage. This has the effect of releasing the memory right away, just like on wpe 2.22 - the heap is mostly clean. I've tested this workaround on 2 of our platforms, and it seems to work 100% of the time on both (from what I've tested so far). My current understanding is:
https://3df51d5ca14f.edge.sdk.awswaf.com/3df51d5ca14f/4104523408d1/challenge.js - which contains a long Uint32Array of 32-bit pseudo-random values. If the conservative scan tries to interpret those as pointers, it is almost 100% certain that some of them will point to some allocated heap objects (e.g. with 5% of memory occupied by the app data, the probability of not hitting this area at random - (95/100)^128 - is already only around 0.1%). If that hypothesis is right, would we be able to do anything else about it? This seems to be how a conservative root scan is supposed to work, after all. Probably this is a much smaller issue on 64-bit ... And what do you think about such a workaround? (Again, this would need to be done only in 'safe places', and the interface would be something different than the current atomic_bool ...)
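For what it's worth, a quick back-of-the-envelope check of that estimate (the 5% heap coverage and the 128 random words are just the assumptions from the comment above):

```cpp
#include <cmath>
#include <cstdio>

int main()
{
    // Assumptions taken from the comment above: live JS heap data covers ~5% of
    // the 32-bit address space and ~128 random 32-bit words end up being scanned
    // conservatively.
    const double heapFraction = 0.05;
    const int randomWords = 128;
    const double pNoFalseRoot = std::pow(1.0 - heapFraction, randomWords);
    // Prints ~0.0014, i.e. only ~0.1% chance that none of the random values
    // happens to look like a pointer into occupied heap memory.
    std::printf("P(no false conservative root) = %.4f\n", pNoFalseRoot);
    return 0;
}
```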
This was my fear. That being said, I am not so sure that this is the case we are encountering yet. In JSC, we only conservatively scan the stack + registers. It is very easy for something to find its way in there accidentally, but in that case it would show up clearly in the heap snapshot. Skipping Weaks, the heap mark reason should be ConservativeRoots if this were the case. My patch also fixes the heap root mark reason being NONE, but only for Strongs. So I am not so sure that this is the case. In the meantime, if you can try only disabling parts of the conservative collection, that might help narrow it down. Make sure the VM top callframe is being set correctly too, in case we are scanning too much. For a workaround you can actually ship (in addition to providing a performance benefit), you can enable PSON. Even if conservative scanning is the cause, this is the only sound workaround that I can think of. If anything blocks you from doing that, we can try to fix it.
PSON is disabled on WebPlatformForEmbedded because of the extra memory usage of having several web processes alive simultaneously. WPE is usually running in an environment where memory is scarce, and those extra MB are very valuable.
Hello @justinmichaud,
@magomez if there is enough memory to JIT, there should be enough to use PSON. We should figure out what issues are blocking enablement. Even if we discover that conservative scanning is not the cause of this leak, it may well cause memory issues in the future that are far greater than the cost of the prewarmed process. We can disable the back-forward cache. The workaround above is very incorrect and will lead to crashes; PSON is the only workaround that can ship.
So it seems this GC problem can be narrowed down to:

```
conservativeRoots.add(currentThreadState.stackTop, currentThreadState.stackOrigin, jitStubRoutines, codeBlocks);
```

in MachineThreads::gatherFromCurrentThread; disabling that part seems enough to prevent the weird GC behavior. From some other debugging I've added, the stack span being considered is e.g. 'begin: 0xbe84c680 end: 0xbe84e000', 6528 bytes. Some of the values that are considered look like pointers, some don't:

```
ConservativeRoots::genericAddPointer p 0xb634b29c
```

(There is also this removeArrayPtrTag invocation in genericAddPointer, but it doesn't seem to modify the pointers on our arch.)
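For context, this is roughly where that call sits (paraphrased from Source/JavaScriptCore/heap/MachineStackMarker.cpp; the register-state scan is omitted and the flag is a hypothetical debug switch, not an existing JSC option):

```cpp
static std::atomic<bool> s_skipCurrentThreadStackScan { false }; // hypothetical debug switch

void MachineThreads::gatherFromCurrentThread(ConservativeRoots& conservativeRoots,
    JITStubRoutineSet& jitStubRoutines, CodeBlockSet& codeBlocks,
    CurrentThreadState& currentThreadState)
{
    // ... register-state scan elided ...
    if (s_skipCurrentThreadStackScan)
        return; // experiment: don't treat anything on the C stack as a GC root
    // Every pointer-sized value in [stackTop, stackOrigin) is treated as a
    // potential pointer into the JS heap.
    conservativeRoots.add(currentThreadState.stackTop, currentThreadState.stackOrigin,
        jitStubRoutines, codeBlocks);
}
```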
I also took a coredump around MachineThreads::gatherFromCurrentThread; from what I see:
currentThreadState.stackTop: 0xbeacf680 currentThreadState.stackOrigin: 0xbead1000.
(gdb) f 24
(gdb) bt
(gdb) f 57
So the frame is at 0xbead0128 while currentThreadState.stackOrigin is 0xbead1000, quite a bit higher - might that be the 'scanning too much' issue? In any case, it seems to me that for this particular case - when GC is invoked via WebKit::WebProcess::garbageCollectJavaScriptObjects (called via VM::whenIdle), from an IPC message - the part of the stack that is being scanned for roots (frames #24 - #58) can only provide false positives, as happens to us. I mean, in this case there seems to be no user-script-processing code on the stack, only some internal GC logic, IPC and glib main loop frames. And there are >1600 potential 'addresses' to scan, which, with a 32-bit address space and a relatively big JavaScript heap, makes us hold the memory forever ...
Hello @justinmichaud, one more debugging result that might be interesting: I've tried to limit the CurrentThreadState stack range that MachineThreads::gatherFromCurrentThread is considering; simply something like this: in Source/WebKit/WebProcess/WebProcess.cpp there is now `extern std::atomic<void*> GCJSO_STACK;`, and then in Source/JavaScriptCore/heap/MachineStackMarker.cpp, MachineThreads::gatherFromCurrentThread, if this 'GCJSO_STACK' falls into the range [currentThreadState.stackTop, currentThreadState.stackOrigin], stackOrigin is corrected to the 'GCJSO_STACK' value. This makes the range much smaller:

```
(debug) currentThreadState.stackTop: 0xbec01660 GCJSO_STACK: 0xbec017ec currentThreadState.stackOrigin: 0xbec03000; correcting stackOrigin to GCJSO_STACK
```

And from what I've tested so far, this seems enough to get the correct behavior. So I suppose that indeed, as you suggested, the VM top callframe not being set correctly might be the root cause of the problem here.
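A condensed sketch of that experiment, following the description above (the GCJSO_STACK name is from the comment; how the marker is captured and reset is my assumption):

```cpp
// Source/WebKit/WebProcess/WebProcess.cpp (sketch): remember roughly where the
// stack was when the embedder-triggered GC entry point ran.
std::atomic<void*> GCJSO_STACK { nullptr };

void WebProcess::garbageCollectJavaScriptObjects()
{
    int stackMarker;
    GCJSO_STACK.store(&stackMarker); // frames below this are IPC / main loop plumbing
    WebCore::GCController::singleton().garbageCollectNow();
    // Intentionally left set so a deferred collection still sees it; proper
    // lifetime handling is out of scope for this sketch.
}

// Source/JavaScriptCore/heap/MachineStackMarker.cpp (sketch), called inside
// MachineThreads::gatherFromCurrentThread just before conservativeRoots.add():
extern std::atomic<void*> GCJSO_STACK;

static void clampStackOrigin(CurrentThreadState& currentThreadState)
{
    auto guard = reinterpret_cast<uintptr_t>(GCJSO_STACK.load());
    auto top = reinterpret_cast<uintptr_t>(currentThreadState.stackTop);
    auto origin = reinterpret_cast<uintptr_t>(currentThreadState.stackOrigin);
    if (guard && guard > top && guard < origin) {
        // Only scan the frames above the GC entry point; everything deeper is
        // main loop / IPC state that should not be able to root JS objects.
        currentThreadState.stackOrigin = reinterpret_cast<void*>(guard);
    }
}
```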
Nice, this is a super promising lead! Sorry for the slow progress over the past couple of days, I have been taking some PTO. I have been attempting to upstream my debugging tools, but in the process, I broke ToT again. I have been trying to understand why, since the heap snapshots you provided seem to suggest that some Strong<> is responsible for keeping these Window objects alive. It is sounding like the ToT issue may be unrelated, which is not surprising. I will investigate how VMTopCallframe is being set; this sounds very suspicious. For reference: WebKit/WebKit@b52c0f4 seems related. I will follow up when I have more information, thank you for your thorough debugging!
* ARRISEOS-45892 limit conservative scan range

Turns out that conservative GC root scanning is scanning too deep into the stack and this results in GC being prevented (for more details see the ticket & upstream issue: WebPlatformForEmbedded#1363). This workaround introduces 'stack guards' that prevent this in particular cases, that is - if there is a GLib main loop routine on the stack (see RunLoopGLib), the GC root search will not go deeper than that frame. This is enough to make GC work when invoked synchronously, via VM::shrinkFootprintWhenIdle; therefore, for the workaround to work, the wpe plugin needs to call that (another part of the workaround is implemented in the WebKitBrowser plugin).

* ARRISEOS-45892 limit conservative scan range

Small correction: an immediate garbageCollectNow is still useful, since under heavy load shrinkFootprintWhenIdle might be executed quite late.
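A very rough illustration of the 'stack guard' idea from that commit message (all names here are illustrative, not the actual downstream patch; the scanner-side clamping works like the GCJSO_STACK sketch above):

```cpp
#include <glib.h>

// Placeholder for the run loop's existing "do WebKit work" call.
static void dispatchWebKitWork(gpointer userData);

// Thread-local guard recorded by the GLib run loop glue (RunLoopGLib in WTF):
// when the main loop hands control to WebKit, remember how deep the stack is.
// The conservative scanner clamps its stack range to this value, so the GC root
// search never walks into the GLib frames below it.
static thread_local void* s_mainLoopStackGuard = nullptr;

static gboolean dispatchWithStackGuard(gpointer userData)
{
    int marker;
    s_mainLoopStackGuard = &marker; // frames below this belong to GLib / event loop plumbing
    dispatchWebKitWork(userData);
    s_mainLoopStackGuard = nullptr;
    return G_SOURCE_CONTINUE;
}
```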
I have finally gotten a fully working RPI setup with buildroot, debugging info, and cogctl allowing JS input. I made a hack that also fixes the memory leak on ToT by making some things weak references. Of course this can't ship, but it helps to narrow it down. This also fixes the issue on the pi:
I think it is important to find a good explanation for this. Since I observe the same behaviour on 64-bit, it makes me wonder why scanning too deeply could be the cause. I'll follow up with more results.
Turns out that conservative GC root scanning is going too deep into the stack and this may prevent the collection of objects - see: WebPlatformForEmbedded#1363. The change introduces 'stack guards' that prevent this in particular cases, that is - if there is a GLib main loop routine on the stack, the GC root stack search will not go deeper than that frame. This avoids some 'false positive' conservative roots that can make the app hold on to a lot of memory.
Hello,
On our platform (LGI boxes, ARM / Broadcom SoC, e.g. BCM72180, and other Broadcom platforms we have) we are experiencing an issue where sometimes the wpe GC seems unable to reclaim memory after exiting some of the heavier web apps. Some examples are the Sky apps:
https://stv.prd.sky.ch/store/
https://stv.prd.sky.ch/show/
but a similar effect can be observed e.g. with Apple TV+ (https://atve.tv.apple.com/94819831-5404-4438-810e-afb648d6a826/tvw_0e7f0489b969451a9f79941a0d18fad1/). These apps normally need more than 200MB to work, and we have a container cgroup limit of 550MB, so when the memory is not reclaimed quickly enough, wpe gets killed.
Often when these applications exit, I can see that even after a GC Full Collection (seen with JSC_logGC=1) the memory is still not reclaimed. Like here - GC was run, spent quite a lot of time, visited a lot of memory, but still wasn't able to release much ('Full sweep: 142348kb => 141164kb'):
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: START M 142134kb => FullCollection, v=0kb (C:0 M:0 P1:0) o=0 b=15866 i#1:N<CsMsrShDMsm(0)> 1+0 v=109974kb (C:55860 M:0 P1:54113) o=11 b=15866 i#2:N 366+0 v=109990kb (C:55872 M:0 P1:54118) o=13 b=15866 i#3:P 0+0 v=109990kb (C:55872 M:0 P1:54118) o=13 b=15866 i#4:P<WsOJwMsrShCsMsm(0)DDomoCb> => 120237kb, p=2353.949000ms (max 2353.949000), cycle 2353.843000ms END]
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: GC END!
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: finalize 49.610000ms]
1720512019 209
Jul 09 10:00:19 E0B7B1-APLSTB-300037462300 wpe.sh[13290]: [GC<0x8aff5050>: Full sweep: 142348kb => 141164kb, 78.841000ms]
In our setup, each time we leave any app, we return to the metro '#boot' page (https://widgets.metrological.com/lightning/liberty/2e3c4fc22f0d35e3eb7fdb47eb7d4658#boot). This 'application' is pretty small and normally uses maybe 20MB of RAM. But even when we wait for some time and then browse to some other webapp, the memory often is not released and keeps piling up, and so we run into OOM after several iterations.
I've tried with some upstream changes, like this one:
#1285
Also tried some additional modifications to GC process, like here:
LibertyGlobal@ab127e3
(For example, I've modified Source/bmalloc/bmalloc/AvailableMemory.cpp to check for WPE_RAM_SIZE when determining available memory; we are running wpe in an LXC container, so the limits are different from what sysinfo returns, I believe. But this doesn't seem to fix the problem by itself; also, I've enabled some bmalloc verbose flags and from what I've seen it doesn't seem that bmalloc is to blame here.)
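A sketch of that AvailableMemory.cpp tweak as I understand it from the description (the WPE_RAM_SIZE variable is ours; the fallback parameter stands in for bmalloc's existing sysinfo-based detection):

```cpp
// Sketch for Source/bmalloc/bmalloc/AvailableMemory.cpp: let an environment
// variable override the detected RAM size, since inside an LXC container
// sysinfo() reports the host's memory rather than the container limit.
#include <cstdlib>

static size_t availableMemoryWithOverride(size_t detectedBytes)
{
    if (const char* envValue = std::getenv("WPE_RAM_SIZE")) {
        char* end = nullptr;
        unsigned long long bytes = std::strtoull(envValue, &end, 10);
        if (end != envValue && bytes)
            return static_cast<size_t>(bytes); // container limit supplied by the platform
    }
    return detectedBytes; // bmalloc's normal detection path
}
```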
I was also checking out many JSC options, like:
JSC_forceRAMSize=576716800
JSC_mediumHeapGrowthFactor=1.1
JSC_smallHeapGrowthFactor=1.1
JSC_largeHeapGrowthFactor=1.1
JSC_smallHeapRAMFraction=0.25
JSC_mediumHeapRAMFraction=0.5
JSC_customFullGCCallbackBailThreshold=1.0
JSC_maximumMutatorUtilization=0.6
JSC_minimumGCPauseMS=1
JSC_useStochasticMutatorScheduler=false
JSC_gcIncrementScale=1
JSC_criticalGCMemoryThreshold=0.5
JSC_forceDidDeferGCWork=true
JSC_useGlobalGC=true
Some of these sometimes seem to help for a while, but then the app runs into the same problem.
I've also added some logic (on the Thunder plugin side) to force GC at regular intervals after the application returns to the #boot app (via webkit_web_context_garbage_collect_javascript_objects, which ends up calling WebProcess::garbageCollectJavaScriptObjects); this generally helps, but even this is not 100% reliable. Even if it sometimes makes the situation better, at times I still run into the scenario where the app memory is not reclaimed, even when GC is invoked manually from the inspector (via $vm.gc(), if enabled via JSC_enableDollarVM=true). This behavior seems a bit random - for example, it might start working fine after a reboot. I've checked wpe 2.22 and didn't observe these issues; memory is reclaimed pretty quickly when I leave the app.
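For reference, the plugin-side forcing looks roughly like this (a sketch: webkit_web_context_garbage_collect_javascript_objects() is the call available in this stack as mentioned above, its exact signature is assumed, and the 30-second interval is arbitrary):

```cpp
#include <glib.h>
#include <wpe/webkit.h>

// Sketch of the plugin-side logic described above: once we are back on the
// lightweight '#boot' page, periodically nudge the web process to collect.
static gboolean forceGarbageCollection(gpointer userData)
{
    auto* context = static_cast<WebKitWebContext*>(userData);
    // Goes to WebProcess::garbageCollectJavaScriptObjects in the web process.
    webkit_web_context_garbage_collect_javascript_objects(context);
    return G_SOURCE_CONTINUE; // keep firing until the source is removed
}

static guint startPeriodicGC(WebKitWebContext* context)
{
    return g_timeout_add_seconds(30, forceGarbageCollection, context);
}
```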
Looking into web inspector / Timelines / memory tab, I see that most of this 'unreclaimable' memory is still counted as 'JavaScript' - in line with what I can see in the GC run results.
So for example, on our platform, the memory graph for 2.22 versus 2.38 compares like this:
for 2.22: see attached wpe-2.22-graph.png
for 2.38: see attached wpe-2.38-graph.png
So while 2.22 was running the test scenario just fine, 2.38 OOMed pretty quickly.
I've tried a similar test on wpe 2.38 with a different platform, the 'Video Accelerator' (https://rdkcentral.com/rdk-video-accelerator). Here there are no containers, but when I check the WPEWebProcess RSS memory after exiting these apps, I see something like this (checked after each iteration of 'enter the app - exit to #boot'):
RssAnon: 14580 kB
RssAnon: 14580 kB
RssAnon: 211136 kB
RssAnon: 215216 kB
RssAnon: 67124 kB
RssAnon: 228000 kB
RssAnon: 228052 kB
RssAnon: 236304 kB
RssAnon: 381584 kB
RssAnon: 234400 kB
RssAnon: 234400 kB
RssAnon: 235620 kB
So sometimes the memory gets cleaned up, and sometimes it keeps piling up. With our 550MB memory limit, 381584 kB in the idle state might be enough to run into OOM when we try to start e.g. the Sky Store app.
Do you have any clues as to what could cause the difference in GC behaviour in 2.38? Maybe there are some other config options we are not aware of?