[pulsar-next] Figure out file-watcher approach that avoids crashes #1104
To elaborate on this: I had a theory that the reload could be deferred until every pending watcher promise had settled. So I tried this:

```js
function unloadDeferralListener(event) {
  event.preventDefault();
  console.log('Waiting for unload…');
  NativeWatcher.waitForStop().then(() => {
    console.log('…done.');
    window.removeEventListener('beforeunload', unloadDeferralListener);
    location.reload();
  });
}

class NativeWatcher {
  static promises = new Set();
  static initialized = false;

  static async waitForStop() {
    let { promises } = NativeWatcher;
    if (promises.size === 0) {
      return Promise.resolve();
    }
    console.log('Waiting for', promises.size, 'promises');
    await Promise.allSettled(promises);
  }

  static initialize() {
    if (this.initialized) return;
    addEventListener('beforeunload', unloadDeferralListener);
    this.initialized = true;
  }

  // ...
}
```

The static `waitForStop` method resolves once every pending promise has settled. In the `beforeunload` listener, we cancel the unload, wait for those promises, and only then trigger the reload ourselves.

This is hacky, but it works. It's hacky because it'd probably fall apart as a strategy if more than one thing on the page assumed it could use this technique. But it's good as a best alternative. If we fail to solve this in a more elegant fashion, I could zhuzh this up by creating a more formal internal API around it so that it could safely be used by more than one thing. |
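A more formal internal API like the one alluded to above might look something like this sketch. The `UnloadBlockers` name and shape are hypothetical, not anything that exists in Pulsar; the idea is just that any number of consumers can register promises that must settle before the page is allowed to unload.

```javascript
// Hypothetical sketch of a shared unload-deferral API, so that more than
// one consumer can safely block a reload on pending async teardown.
class UnloadBlockers {
  constructor() {
    this.pending = new Set();
  }

  // Register a promise that must settle before the page may unload.
  add(promise) {
    this.pending.add(promise);
    // Remove it from the set as soon as it settles (fulfilled or rejected).
    promise.finally(() => this.pending.delete(promise));
    return promise;
  }

  get size() {
    return this.pending.size;
  }

  // Resolves once all registered promises have settled.
  async whenIdle() {
    // Loop in case new blockers are registered while we wait.
    while (this.pending.size > 0) {
      await Promise.allSettled([...this.pending]);
    }
  }
}
```

A `beforeunload` listener would then call `whenIdle()` once, rather than each consumer installing its own listener.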
Good news! My horrible idea from the previous comment won't be needed. The specific crash I've been seeing is the fault of our path-watching strategy.

It took me a while to process the fact that the crash seemed to be happening during a call to `start`. Why would anything be starting a watcher during page unload? Because we were disposing of watchers en masse during page unload. None of those watchers realized that the point was to disable all watchers, so they tried to start new ones in a terminating renderer environment. This is fixable simply by setting a flag as soon as teardown begins, so that nothing tries to start a watcher after that point.

So this is the right answer for now; setting that flag seems to consistently prevent renderer process crashes. Still, it's not great to know that there's a way for the renderer process to crash, even if we're using entirely thread-safe native libraries! In fact, it happens because Node has no other way of signaling that something's gone wrong at that point.
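A minimal sketch of that flag-based teardown (all names are hypothetical, not Pulsar's actual code): once the flag is set, no watcher is allowed to start, so nothing can spin up a new native watcher in a dying renderer.

```javascript
// Illustrative sketch: a pool of watchers with a teardown flag.
class WatcherPool {
  constructor() {
    this.watchers = new Set();
    this.shuttingDown = false;
  }

  // Returns false (and does nothing) once teardown has begun.
  start(watcher) {
    if (this.shuttingDown) return false;
    this.watchers.add(watcher);
    return true;
  }

  disposeAll() {
    // Set the flag *before* stopping anything, so that any restart logic
    // triggered as a side effect of `stop` sees it and bails out.
    this.shuttingDown = true;
    for (const watcher of this.watchers) watcher.stop();
    this.watchers.clear();
  }
}
```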
The ideal situation would be for this failure to surface as a catchable error rather than a hard crash of the renderer. I've filed an issue about this upstream. It's not a high-priority thing, but it might be something we start thinking about if we find that a community package is affected by it. |
Good news! |
There's a second half of this that I failed to mention. The old node-pathwatcher library is something that we'd love to eliminate our dependency on; in an ideal world we'd replace it outright. Its `File` and `Directory` classes are thin, pleasant abstractions around file reading/writing and directory iteration, but they also accept callbacks (like `onDidChange`) so that consumers can react to changes on disk. It's not a big deal to ask people to wait briefly when setting up file monitoring, but the issue is that there isn't an easy way to add asynchronicity to an existing API that lots of unmaintained packages rely on. So there are a few options.
I've spent more than a week on option 2 (rewriting the native bindings) and made quite a bit of progress despite it being a C++-heavy exercise. It's a huge headache to rewrite the native bindings for a library that largely eschewed even the conveniences that were available to it. The hardest part has been Windows support. I can get it to pass locally in almost all cases, but a few edge cases elude me, and I get different results locally than I do in CI. So the best option might actually be to wrap another library. I just discovered efsw, a filesystem watcher written in C++. There's an NPM library that wraps it, but it isn't something we could adopt as-is. Amazingly, this might be the simplest option of them all; fingers crossed. |
@savetheclocktower a question about the problem you had with combining paths into a single watcher: that was in NSFW, right? I'm asking because, for me, it would be amazing if we had one path-watching solution, not two as we have today (PathWatcher and NSFW). So if we could somehow converge on a single implementation, that would be a big win. There are several advantages I see in using the same pathwatcher everywhere.
Am I on the right track, or is this not possible for reasons you found out while investigating this issue? |
The one issue I had with NSFW when combining paths into a single watcher is the one I described above. Still, it's by far the most popular filesystem watcher in Node that meets our needs, so I feel much more confident rolling with it. The upside of consolidating on a single path-watching solution is exactly what you describe. |
OK, my best effort so far is this branch. It defers to the underlying native library wherever it can. There are some subtle differences in behavior compared to the old library, but nothing alarming. I've got CI that tests the library in a multi-context environment (worker threads) in addition to the existing suite of specs, and everything is green right now. The only concession I really had to make in the specs was to introduce a small delay after each test; without it the tests ran together too much (since most tests target the same watched file/directory) and caused flakiness.

Using it in an Electron renderer process is still slightly painful, because you've got to be careful about unwatching everything in the case of a window reload! The equivalent scenario in a web browser would be a not-yet-fulfilled promise surviving past the page that created it. The fix is pretty simple, as it turns out: be sure to stop every watcher before the window unloads. You might ask how I pinned that down; I tried to add some logging to the existing stable version for comparison, but it didn't get me far. In the meantime, we'll have to make that change in our cleanup hygiene (an early call to stop all watchers during window teardown). |
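The per-test delay mentioned above can be sketched as a tiny helper (illustrative, not the actual spec code); pausing briefly after each spec gives the shared watched file/directory time to settle before the next spec touches it.

```javascript
// Resolve after `ms` milliseconds; usable as a spec teardown step.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Illustrative usage in a spec suite:
// afterEach(() => wait(50));
```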
Everything was going great with the new library until the specs abruptly stopped passing on macOS. While trying to figure out why, I opened up Console.app and found a new error message I hadn't seen before. |
This isn't a limit that I could see us hitting under normal circumstances… except that each individual request to watch a path counts as a single “client” in macOS's accounting. Right now, I think the best fix is to create far fewer native watchers in the first place. Another possibility would be some sort of automatic failover to a different watching strategy when a native watcher can't be created. |
The problem with that limit is that it isn't under our control. Since this is only a problem on macOS, I've got a nuclear option in my back pocket: we could sidestep the macOS-native watching backend on that platform entirely and use a different mechanism. The other option is one that I'm pretty bullish on: there's a weird piece of code in Pulsar whose purpose is to minimize the number of native file-watchers needed for our built-in file-watcher API. There's a huge opportunity here because our watchers are needlessly redundant. On a simple project I observed 91 watchers during a test, albeit a test in dev mode where a lot of packages’ asset directories were being monitored for changes via that API. I've copied that consolidation code into the new implementation to see how far it can get us. |
Preliminary results are positive. The 91 native watchers I noticed in a test window a few days ago are, with this new approach, already down to 13. This can surely be streamlined further; I've just got to figure out how to do it. There's lots of redundant watching of entries inside the same parent directories.

One of the side-effects of this approach is the occasional spurious event, like a file being reported as newly created when it has existed all along. Luckily, weird things like that can be filtered out pretty easily: in the C++, record a timestamp when you start watching a given path, and then you'll be able to consult that timestamp later. If the file's own creation time predates the time you started watching, ignore its creation event.

Anyway, this should be good enough in the near term. I'm sure there are some quirks that make it different from the old behavior; I'll shake those out as I find them. |
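The timestamp filter described above, sketched in JavaScript for clarity (the real work would happen in the C++ bindings, and the event shape here is illustrative): a “created” event for a file whose birth time predates the start of watching gets dropped.

```javascript
// Build a predicate that decides whether an event should be reported.
function makeCreationFilter(watchStartMs) {
  return function shouldReport(event) {
    // Only "created" events are suspect; pass everything else through.
    if (event.action !== 'created') return true;
    // If the file existed before we started watching, the "creation" is
    // an artifact of broadening the watcher, not a real event.
    return event.birthtimeMs >= watchStartMs;
  };
}
```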
This has been a time sink! Updating the watcher-consolidation logic has taken far longer than I expected.

This is frustrating because I’ve proceeded with the work described above and introduced new strategies for consolidating native watchers. These strategies reduce the number of native watchers we use, but they vastly complicate the codebase and present hard trade-offs that force us to use imperfect heuristics.
Suppose you have two watchers on two sibling directories deep inside a project. For them to share a native watcher, that native watcher has to live at a common ancestor directory, which means it reports events for everything underneath that ancestor; each consumer then has to filter out the events it never asked about. Sharing native watchers increases this filtering burden; that’s the main trade-off. For this reason, there’s a heuristic in place that would prevent the scenario in the above paragraph; a shared ancestor directory that is too distant from the watched paths is rejected. Without a heuristic like this, all watchers on the same volume would eventually be consolidated into a single native watcher at the root of the filesystem, and that’s silly. And, since file-watching APIs vary greatly in design across platforms, it’s ill-advised to use one fixed strategy across all platforms. If native watchers are to be reused at all, then they must all be initialized as recursive, since there’s no telling in advance whether a future consumer will need events from a subdirectory. Windows is a bit different: every watcher created through its native API can monitor an entire subtree from the start. The unit tests are passing and I haven’t had to renegotiate any expectations in the specs. In the meantime, there are other options. All of them probably involve reverting to my solution from about a week ago, before I brought the registry into the picture, since that was the most reliable solution for non-macOS platforms. The options for macOS are as follows.
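The distance heuristic described above can be sketched like this. The threshold and function names are illustrative, not the actual implementation: two watched paths may share a native watcher only if their nearest common ancestor is within a few directory levels of each of them.

```javascript
// Nearest common ancestor of two POSIX-style absolute paths.
function commonAncestor(a, b) {
  const segsA = a.split('/').filter(Boolean);
  const segsB = b.split('/').filter(Boolean);
  const shared = [];
  for (let i = 0; i < Math.min(segsA.length, segsB.length); i++) {
    if (segsA[i] !== segsB[i]) break;
    shared.push(segsA[i]);
  }
  return '/' + shared.join('/');
}

// Allow sharing only when the ancestor is at most `maxHops` levels above
// the deeper of the two paths; otherwise everything would eventually merge
// into one watcher at the filesystem root.
function shouldShareWatcher(a, b, maxHops = 2) {
  const depth = (p) => p.split('/').filter(Boolean).length;
  const hops = Math.max(depth(a), depth(b)) - depth(commonAncestor(a, b));
  return hops <= maxHops;
}
```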
There’s one other thing I’d love to fix: right now, every single filesystem event processed in the C++ bindings involves invoking a callback defined in JavaScript. It’s a no-brainer to batch events here (where possible) to reduce the cost of going across that bridge. |
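The batching idea could look something like this sketch (in JavaScript for illustration; the real buffering would live on the native side). Events are queued as they arrive and delivered to the callback in one batch per turn of the event loop.

```javascript
// Coalesce many per-event callbacks into one callback per batch.
class EventBatcher {
  constructor(onBatch) {
    this.onBatch = onBatch;
    this.queue = [];
    this.scheduled = false;
  }

  push(event) {
    this.queue.push(event);
    if (!this.scheduled) {
      this.scheduled = true;
      // Flush once the current synchronous work finishes.
      queueMicrotask(() => this.flush());
    }
  }

  flush() {
    this.scheduled = false;
    const batch = this.queue;
    this.queue = [];
    if (batch.length > 0) this.onBatch(batch);
  }
}
```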
I went with option 2…
…and it's gone beautifully. I'll try this out in Pulsar tomorrow and hopefully bring an end to this saga. |
Thanks in advance for your bug report!
What happened?
As far as I can tell, nsfw is the most widely used library for listening to filesystem events. Atom implemented a couple of watchers of their own, and though it's tempting to try to adapt one of those to the exact needs of modern Electron, it's not much fun.
The latest version of `nsfw` is almost perfect, but it still manages to crash Pulsar on window reload about half the time. After some research, this seems to happen because:

- On window reload, each `NativeWatcher` in `src/path-watcher.js` created on startup will `dispose` and call `stop` on the inner `nsfw` watcher, which is an async operation that returns a promise; because of reasons explained below, this often starts a different watcher.
- That `start` method outlives the web page that initiated it; when it resolves, it does something that causes a crash, presumably because it's trying to access something that no longer exists.

I can think of a few ways to handle this:

- Fix it upstream, if `nsfw`’s maintainer would accept a patch.
- Defer window unload until each `stop` method resolves; I've hacked together a proof of concept and it actually works, but it's a bit impractical in the general case.
- Set a flag when the `AtomEnvironment` instance is being destroyed so that watchers don't restart. (This is what we've implemented for now.)

Pulsar version
pulsar-next
Which OS does this happen on?
🍎 macOS
OS details
Sonoma 14.7
Which CPU architecture are you running this on?
Apple M-series
What steps are needed to reproduce this?
On PulsarNext…
Additional Information:
No response