fseventsd “too many clients in system” error #185
Hi @savetheclocktower! Initially I wanted to apply the quick fix: the fallback to the KQueue / generic watcher. But I read in your issue in the pulsar repository that you have a problem with a 500 ms delay, and that's literally a sleep in the implementation. The KQueue implementation is really old (it predates FSEvents) and I never used it much, given that it's a very limiting API: the documentation comment in the implementation notes that kqueue needs an open file descriptor for every watched file, so it was extremely easy to reach the limit, making it an inappropriate API for the job. The sleep you saw in the KQueue implementation is the frequency at which we refresh the kqueue events, since the `kevent` call is a blocking call and a kqueue is created for each watch (I guess a similar problem to the one you're mentioning).

This is my major worry for the proposed new implementation: I don't think implementing the changes would be much work, but my main problem is that I don't have unit tests to verify that I didn't break anything, and I know that there are several projects using the library currently, which makes me very conservative with changes. efsw needs unit tests, but I never took the time myself (too many side projects). But if you're already testing it with some pulsar unit tests, maybe you can give me access to those tests so I can verify against them (plus my own testing). We can also work on a different branch until we're confident that the changes are stable.

I must mention that the library is designed with recursive watchers in mind and does not expect to watch the same directory more than once; there's currently a bug in the Linux implementation on this point, because the inotify implementation needs to add a watcher for each watched directory (as you mention in your issue thread). I strongly recommend using a recursive watcher for your use case. I must say I don't completely understand how you reach this limit, and I should clarify that I have plenty of experience with editors, given that I'm working on my own editor, which uses efsw. I can see that you ditched efsw for the macOS implementation, but anyway I'm open to working on a solution. Regards
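For readers unfamiliar with the pattern being described, here is an illustrative sketch (not efsw's actual source) of the polling loop a kqueue-based watcher needs. Because `kevent` blocks, the watch thread waits with a timeout (or sleeps between refreshes), which is where a fixed latency on the order of 500 ms can come from:

```cpp
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

void pollKqueue( int kq ) {
	struct kevent events[32];
	// Wake up at least twice a second even if nothing has changed; this
	// timeout (or an equivalent sleep) is the source of the perceived delay.
	struct timespec timeout = { 0, 500 * 1000 * 1000 }; // 500 ms
	for ( ;; ) {
		int n = kevent( kq, nullptr, 0, events, 32, &timeout );
		for ( int i = 0; i < n; i++ ) {
			// Dispatch events[i] to the watcher that registered this kqueue.
		}
	}
}
```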
The good news is that this new approach passes all our unit tests and is API-compatible with
Because I'm constantly hacking on Pulsar, I am a worst-case scenario; so many of my Pulsar packages are loaded from my filesystem in “dev mode,” and in that mode Pulsar watches certain directories in each package for CSS changes so that they can be instantly applied. That's a situation that cries out for something smart enough to realize that it's watching a bunch of directories that, via symlink, share a very close ancestor in the filesystem, and to listen there instead!
Exactly — Node is historically single-threaded and the entire ecosystem is optimized for that. Workers are far more likely to be implemented as separate processes, even in environments where worker threads are an option (since Node ~12, I think). And even if we could do it in a separate process/thread, it'd be an awkward architecture, and quite inferior to one in which the filtering is done in the native code. It's a bit less painful with the sort of approach that
I noticed that — but I didn't seem to have much luck with reducing that value. It was a few days ago and I was trying lots of things, but it didn't seem to cause
Excellent! Your implementation looks correct; I'm glad it worked. I'll implement it and do some testing when I have some time. Thanks for taking the time to investigate the issue.
I ran into a situation where `efsw` was silently failing to notice filesystem events on my Mac and I couldn't figure out why. After lots of troubleshooting, I spotted a “too many clients in system” message from `fseventsd` in Console.app.

I can find very, very little information about this, but it does seem to be true that `fseventsd` enforces a hard system-wide limit on clients. Some experiments seemed to confirm that every individual directory added via `addWatch` counts as a “client” for these purposes — there's a 1:1 correlation between calls to `addWatch` and calls to `FSEventStreamCreate`.

The error wouldn't actually happen until you called `FSEventStreamStart` — `efsw` doesn't check its return value, but it can return `false` for opaque reasons.
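For concreteness, a minimal sketch of what checking that return value could look like (not efsw's current code; `startOrFail` is a hypothetical helper, and `stream` is assumed to be a freshly created `FSEventStreamRef`):

```cpp
#include <CoreServices/CoreServices.h>
#include <dispatch/dispatch.h>

bool startOrFail( FSEventStreamRef stream, dispatch_queue_t queue ) {
	FSEventStreamSetDispatchQueue( stream, queue );
	if ( !FSEventStreamStart( stream ) ) {
		// fseventsd refused the stream (e.g. "too many clients in system").
		FSEventStreamInvalidate( stream );
		FSEventStreamRelease( stream );
		return false; // the caller could fall back to a generic watcher here
	}
	return true;
}
```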
My use case is a Node module that wraps `efsw` and exposes a file-watcher API in JavaScript. It's replacing an older library that did manual filesystem watching. In a single session, as many as 100 directories might be watched, and multiple sessions can run at once. (It's used by an Electron code editor that can have any number of windows open.) So it's extremely plausible for us to run up against this limit of 1024 clients on our own.

Our immediate workaround for this problem is to employ strategies that encourage watcher reuse and make it possible for several “wrapped” watchers to use the same underlying “native” (`efsw`) watcher:

- A wrapped watcher on `/foo/bar/baz/` can reuse a native watcher on `/foo/bar` if the latter already exists
- The same reuse can happen in reverse, as when adding a wrapped watcher at `/foo/bar` when the `/foo/bar/baz` watcher already exists — creating a new native watcher at `/foo/bar`, pointing both wrapped watchers at it, and destroying the `/foo/bar/baz` native watcher
- Sibling paths can share a common ancestor: given an existing native watcher at `/foo/bar/zort`, adding a wrapped watcher at `/foo/bar/baz` triggers a new native watcher at `/foo/bar` that can supply events for both paths and destroys the `/foo/bar/zort` native watcher

This works well enough, but it increases the complexity of the implementation quite a bit. Our watchers don't need to be recursive, but this reuse strategy means that each watcher must be initialized as a recursive watcher (in case it is reused later). Since the reuse logic lives in the JavaScript, it's also the JavaScript code's job to receive all filesystem events and decide which ones match up with active watchers.
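A hypothetical sketch of the ancestor test this reuse strategy hinges on (not Pulsar's actual code; `canReuse` is an illustrative name): can an existing recursive native watcher rooted at `root` also serve a new wrapped watcher at `path`?

```cpp
#include <string>

bool canReuse( const std::string& root, const std::string& path ) {
	if ( root.empty() || path.size() < root.size() ||
		 path.compare( 0, root.size(), root ) != 0 )
		return false;
	// Require a path-separator boundary so that a watcher on "/foo/bar"
	// doesn't claim "/foo/barbecue".
	return path.size() == root.size() || root.back() == '/' ||
		   path[root.size()] == '/';
}
```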
It's also possible to reuse too much — after all, we could create one watcher at the volume root and have all of our wrapped watchers use it, but we'd be “drinking from the firehose” and asking our wrapped watchers to spend lots of time processing events that they will eventually ignore (because they happen in directories that aren't being watched).
The quickest fix here, I think, would be to check for a `false` return value from `FSEventStreamStart` and handle it by falling back to a generic watcher (or the `kqueue` watcher, perhaps). But that would be a pretty shallow fix.

The more thorough way to fix this would be to reduce the number of `FSEventStreamRef`s created. The documentation for `FSEventStreamCreate` suggests that there is no built-in limit to the number of paths that can be watched by a single stream; but research suggests that you'd have to restart the stream every time you change the list of watched paths.

I don't have the C++ experience to implement this, but imagine:
- `efsw::FileWatcher` aims to manage two different `FSEventStreamRef`s: one for paths that need recursion and one for paths that do not need recursion
- The first time `watch` is called (or, if it's called before any paths have been added, the first time `addWatch` is called), one or both of these `FSEventStreamRef`s is created
- Subsequent calls to `addWatch` work as follows: a new `FSEventStreamRef` is created with the new list of paths and is started, then swapped in for the old stream (a rough sketch of this swap follows the list)
- I'd hope the swap can be done with `FSEvents` in such a way that no filesystem events are lost during the handover; but if I'm wrong, then the swap could instead be staggered, with the new stream starting before the old one is stopped
- The hard part would be keeping track of `efsw::WatchID`s and figuring out how to match up filesystem events to the watchers that initiated them; easier for the non-recursive watchers (each event belongs to exactly one `efsw::WatchID`) than the recursive watchers (each event can belong to multiple `efsw::WatchID`s)
- Calls to `removeWatch` would behave similarly to calls to `addWatch` — create a new stream without the removed path, then swap it in.
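To make the proposal concrete, here is a minimal sketch of what the swap step might look like. This is an assumption about the shape of the code, not efsw's actual implementation; `rebuildStream` and `noopCallback` are hypothetical names, the latency and flags are arbitrary choices, and a real version would need to check `FSEventStreamStart`'s return value and map events back to `efsw::WatchID`s:

```cpp
#include <CoreServices/CoreServices.h>
#include <dispatch/dispatch.h>
#include <string>
#include <vector>

static void noopCallback( ConstFSEventStreamRef, void*, size_t numEvents,
						  void* eventPaths, const FSEventStreamEventFlags*,
						  const FSEventStreamEventId* ) {
	// A real implementation would walk eventPaths[0..numEvents) and map each
	// path to the efsw::WatchID (or IDs, for recursive watches) covering it.
	(void)numEvents;
	(void)eventPaths;
}

FSEventStreamRef rebuildStream( FSEventStreamRef oldStream,
								const std::vector<std::string>& paths,
								dispatch_queue_t queue ) {
	// Convert the updated path list to the CFArray that FSEvents expects.
	CFMutableArrayRef cfPaths = CFArrayCreateMutable(
		kCFAllocatorDefault, (CFIndex)paths.size(), &kCFTypeArrayCallBacks );
	for ( const std::string& p : paths ) {
		CFStringRef s = CFStringCreateWithCString(
			kCFAllocatorDefault, p.c_str(), kCFStringEncodingUTF8 );
		CFArrayAppendValue( cfPaths, s );
		CFRelease( s );
	}
	FSEventStreamRef newStream = FSEventStreamCreate(
		kCFAllocatorDefault, &noopCallback, nullptr, cfPaths,
		kFSEventStreamEventIdSinceNow, 0.1 /* latency, seconds */,
		kFSEventStreamCreateFlagFileEvents );
	CFRelease( cfPaths );
	// Staggered handover: start the replacement before stopping the old
	// stream, so no window of time goes unwatched (events may arrive twice
	// during the overlap, so the consumer should tolerate duplicates).
	FSEventStreamSetDispatchQueue( newStream, queue );
	FSEventStreamStart( newStream ); // return value should be checked, per above
	if ( oldStream ) {
		FSEventStreamStop( oldStream );
		FSEventStreamInvalidate( oldStream );
		FSEventStreamRelease( oldStream );
	}
	return newStream;
}
```

With this shape, `addWatch` and `removeWatch` both reduce to "edit the path list, call `rebuildStream`," which keeps the number of live `FSEventStreamRef`s constant (one per recursion mode) no matter how many directories are watched.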
— create a new stream without the removed path, then swap it in.The text was updated successfully, but these errors were encountered: