Problem with the g_max_inflight_buffers design? #2

bhaller · 2020-06-07T15:20:52Z

Hi Nick. Your code has been useful for me to understand how to make a screensaver that uses Metal, thanks. I think there might be an issue with it that needs fixing, though. I'm not sure whether this has user-visible consequences or not.

Here's the problem. The CVDisplayLink timer set up in -startAnimation ticks along, firing DisplayLinkCallback periodically, which calls -animateOneFrame. That keeps going unless/until -stopAnimation is called, or the screensaver wakes up. So far, so good.

On the other hand, -animateOneFrame basically makes an autorelease pool and calls -re_render, and -rc_render does the following:

dispatch_semaphore_wait(_inflight_semaphore, DISPATCH_TIME_FOREVER);

This blocks the thread until _inflight_semaphore says we're clear to proceed with rendering the next frame. And that makes sense; we don't want to get ahead of the GPU too much, and we have a limited number of vector buffers to work with. So far, so good.

The issue, it seems to me – if there is an issue! – lies in the way these two pieces of code interact. Suppose the GPU is chronically behind in handling the rendering; it can't keep up with the frame rate we'd like to feed it. Once it gets sufficiently far behind, the dispatch_semaphore_wait() call will block waiting for the GPU to catch up; but the CVDisplayLink will keep firing, and so further calls will be made to -animateOneFrame, and onward to -rc_render, which will also block. So if the GPU is slow enough, and just can't keep up, you'll start burning through threads, all blocked in dispatch_semaphore_wait(), adding a new one every 60th of a second or whatever, and releasing one whenever the GPU finishes rendering a frame and _inflight_semaphore allows one thread to unblock.

Is this not a problem for some reason? I guess I'm assuming that CVDisplayLink fires on whatever free thread it is given by GCD, and that if that thread blocks, it will then fire next time on whatever different thread it is given by GCD the next time around, or something. But maybe if the thread that it fires on blocks, it just doesn't fire any more until that thread returns?

I'm wondering about all this in the context of my own screensaver, Satori, which I just recently ported to Metal (http://www.sticksoftware.com/software/Satori.html). Based on some Apple sample code, I set up my CVDisplayLink timer a bit differently:

- (BOOL)setupCVDisplayLinkForScreen:(NSScreen*)screen
{
	CVReturn cvReturn;
	
	// Run our own timer, synchronized to the display's refresh rate
	if (!displaySource_)
	{
		// The CVDisplayLink callback, DispatchRenderLoop, never executes on the main thread. To execute rendering on the main thread, create a dispatch source
		// using the main queue (the main thread).  DispatchRenderLoop merges this dispatch source in each call to execute rendering on the main thread.
		displaySource_ = dispatch_source_create(DISPATCH_SOURCE_TYPE_DATA_ADD, 0, 0, dispatch_get_main_queue());
		__weak ComStickSatoriView *weakSelf = self;
		dispatch_source_set_event_handler(displaySource_, ^(){
			[weakSelf animationTick];
		});
		dispatch_resume(displaySource_);
	}
	
	if (displayLink_)
		CVDisplayLinkRelease(displayLink_);
	
	// Create a display link capable of being used with all active displays
	cvReturn = CVDisplayLinkCreateWithActiveCGDisplays(&displayLink_);
	
	if (cvReturn != kCVReturnSuccess)
		return NO;
	
	// Set DispatchRenderLoop as the callback function and supply displaySource_ as the argument to the callback.
	cvReturn = CVDisplayLinkSetOutputCallback(displayLink_, &DispatchRenderLoop, (__bridge void *)displaySource_);
	
	if (cvReturn != kCVReturnSuccess)
		return NO;
	
	// Associate the display link with the display on which the view resides
	CGDirectDisplayID viewDisplayID = [self.window.screen.deviceDescription[@"NSScreenNumber"] unsignedIntValue];
	
	cvReturn = CVDisplayLinkSetCurrentCGDisplay(displayLink_, viewDisplayID);
	
	if (cvReturn != kCVReturnSuccess)
		return NO;
	
	CVDisplayLinkStart(displayLink_);
	
    return YES;
}

Even if your code does not suffer from the problem I described above, I'm wondering whether maybe mine does, since I shift the rendering work onto the main thread. The main thread is then the one that blocks in dispatch_semaphore_wait(), in my code; so then perhaps further firings of CVDisplayLink can occur, and clog up background threads, while the main thread is blocked? I'm still trying to get used to how GCD and blocks and such manage threads; this is a new world for me, I've used NSThread for multithreading until this recent work moving Satori to Metal.

If you're willing to take a few minutes to clarify whether this is an issue, for either your code or mine, and if not, why not, I'd greatly appreciate the guidance. Again, thanks for the very useful example code!

The text was updated successfully, but these errors were encountered:

bhaller · 2020-06-07T15:33:30Z

It may be that I answered my own question, in the course of writing up the above message. But since I'm not sure, I figured I'd still send the message to you and see whether you can clarify my thinking on this.

One thing I found while writing up the above is in the doc for CVDisplayLink; it says "A Core Video display link provides a separate high-priority thread to notify your application when a given display will need each frame." That makes it sound like it is a single thread; and so presumably that means that if that thread blocks, doing work or stuck in dispatch_semaphore_wait(), the CVDisplayLink will simply not fire until the thread returns. So maybe, for your code, that is why this is a non-issue.

For my code, the other thing I found is in the doc for dispatch_source_create(), which says: "Dispatch sources are not reentrant. Any events received while the dispatch source is suspended or while the event handler block is currently executing are coalesced and delivered after the dispatch source is resumed or the event handler block has returned." So I guess that means that perhaps my code also does not have a problem; when the CVDisplayLink fires, that will call through my dispatch_source_t to the main thread, and so when the main thread then blocks on dispatch_semaphore_wait(), the main thread is then blocked but the CVDisplayLink thread is also blocked, waiting for the dispatch_source_t to return. So the CVDisplayLink will not fire again, perhaps; but even if it did, the dispatch_source_t would simply coalesce additional calls from the CVDisplayLink together and send them on once it becomes free again. There would be no pileup.

Am I thinking about all this right? It's a confusing new world! :->

nickzman · 2020-06-07T16:30:58Z

Wow, I didn't realize Satori had been ported to the macOS screen saver engine! That screen saver, as well as Lunatic Fringe and You Bet Your Head, were my favorite After Dark screen savers back in the classic macOS days. (You must get that a lot.) I just installed it. Thanks for working on that!

It's been a while since I worked on this, but from what I recall, CVDisplayLink will always fire on the same thread, so it's okay to pause the thread if the CPU gets ahead of the GPU; no more threads will be created. Is there a reason why the drawing needs to happen on the main thread? Metal really wants you to set up your drawables on a background thread.

At the time I wrote this screen saver, I used CVDisplayLink and CAMetalLayer because I wanted to figure out how to make a Metal-powered animation at a low level. If I didn't care about figuring out how this all worked at a low level, I would've just used MTKView, which automates media timing to be synchronized with the display. Maybe that would work better for you?

bhaller · 2020-06-07T18:15:23Z

Wow, I didn't realize Satori had been ported to the macOS screen saver engine! That screen saver, as well as Lunatic Fringe and You Bet Your Head, were my favorite After Dark screen savers back in the classic macOS days. (You must get that a lot.) I just installed it. Thanks for working on that!

Hey, that's nice that you still remember my old stuff. Not as many people do any more. :->

It's been a while since I worked on this, but from what I recall, CVDisplayLink will always fire on the same thread, so it's okay to pause the thread if the CPU gets ahead of the GPU; no more threads will be created.

OK, good. I think I understand how it's all working, then.

Is there a reason why the drawing needs to happen on the main thread? Metal really wants you to set up your drawables on a background thread.

There was indeed a reason, but I forget what it was. I don't think it really matters, though; the drawable setup for Satori is utterly trivial: two triangles filled with a pre-existing texture, basically. The texture gets calculated and set up in the background, so the main thread is pretty free.

At the time I wrote this screen saver, I used CVDisplayLink and CAMetalLayer because I wanted to figure out how to make a Metal-powered animation at a low level. If I didn't care about figuring out how this all worked at a low level, I would've just used MTKView, which automates media timing to be synchronized with the display. Maybe that would work better for you?

Yeah; when I started doing this my thinking was "ScreenSaverView doesn't inherit from MTKView, so I have to use CAMetalLayer", but of course I could have just created an MTKView subview of my ScreenSaverView. Oh duh. But the CAMetalLayer design seems to be working very well, thanks to your example code.

Thanks for the reply! I think I get all this stuff now, so feel free to close this issue unless you want to keep chatting. :-> Nice to meet you!

bhaller · 2020-06-07T23:28:08Z

Hi again Nick. Actually: while I've got you on the horn, let me ask you something unrelated to the above issues. I'm setting up a display link associated with the current screen in pretty much the way you do it, and everything seems to work well, EXCEPT that the frame rate is inconsistent; it speeds up and slows down by a factor of two for no apparent reason. Because of your comment above, I have now moved my Metal rendering onto the CVDisplayLink's background thread, rather than the main thread, so that isn't the issue. With simple code similar to yours that just samples the clock at the start of each frame and does (current time - previous time), I get a log of time-between-frames like this:

.232561 time since previous: 0.0167
.241131 time spent rendering: 0.0086
.249151 time since previous: 0.0166
.257429 time spent rendering: 0.0083
.265918 time since previous: 0.0168
.274366 time spent rendering: 0.0085
.282520 time since previous: 0.0166
.291020 time spent rendering: 0.0085
.299184 time since previous: 0.0167
.307693 time spent rendering: 0.0085
.315856 time since previous: 0.0167
.324387 time spent rendering: 0.0085
.332522 time since previous: 0.0167
.341052 time spent rendering: 0.0085
.348293 time since previous: 0.0158
.374675 time spent rendering: 0.0264
.390583 time since previous: 0.0423
.417895 time spent rendering: 0.0273
.432359 time since previous: 0.0418
.457420 time spent rendering: 0.0251
.465868 time since previous: 0.0335
.495187 time spent rendering: 0.0293
.498300 time since previous: 0.0324
.524050 time spent rendering: 0.0258
.531619 time since previous: 0.0333
.540694 time spent rendering: 0.0091
.548320 time since previous: 0.0167
.574073 time spent rendering: 0.0258
.582547 time since previous: 0.0342
.591141 time spent rendering: 0.0086
.599153 time since previous: 0.0166
.607387 time spent rendering: 0.0082
.615878 time since previous: 0.0167
.624060 time spent rendering: 0.0082
.632577 time since previous: 0.0167
.640751 time spent rendering: 0.0082
.648625 time since previous: 0.0161
.657407 time spent rendering: 0.0088
.665915 time since previous: 0.0173
.674076 time spent rendering: 0.0082
.682558 time since previous: 0.0166
.690718 time spent rendering: 0.0082

As you can see, at the beginning and the end of these logs the time spent rendering is around 0.008-0.009 seconds, which is more than fast enough to sustain a frame rate of 1/60th of a second, and so the "time since previous" for each frame is around 0.0166. But then there's a period in the middle where the time spent rendering jumps to 0.0270, and so I start missing frames and drop to a frame rate of 1/30th of a second. This is quite visible in Satori's animation. You don't really see anything in between; the time spent rendering is never really in between the two values, when it stalls it stalls for about a full frame. As far as I can tell from Instruments, the main time-sinks in the rendering are [metalLayer_ nextDrawable] and [commandBuffer renderCommandEncoderWithDescriptor:]. When things stall, I think it is probably [metalLayer_ nextDrawable], and I'm suspicious of a function shown in Instruments under nextDrawable named CAImageQueueCollect_ but it's hard to know. I've done a bunch of googling, and as far as I can tell this is out of my control – I'm being pretty careful to return drawables quickly, not request them until needed, etc., I think, and I don't start rendering the next frame until the previous frame has finished rendering so I should never run out of drawables (I have a frames-in-flight semaphore, and it never stalls).

To cope with this problem, I've introduced variable-speed logic: I look at the time that elapsed from the start of the previous render to the start of this render, and I advance my animation state proportionally to that time. So if the previous render stalled and we missed a frame, the animation state gets advanced by double the usual amount, making the speed of the animation smooth even if frames get dropped. It seems like the best I can do. I'm wondering if you've confronted this issue, though. I'm surprised that it's so hard for me to get a steady 60fps, since my rendering is so trivial (two triangles per frame, four textures that don't change from one frame to the next, a couple very small constant buffers). I'm wondering if I'm just doing something very basic wrong – failing to set some flag, and thus putting Metal into a less efficient mode than it could be in.

Anyway, this is probably all gibberish; sorry to waste your time, if you're not interested. If you are interested, though, I'd be happy to share the code with you as long as you promise to keep it to yourself (I don't own the copyright on it, as it happens; I'm able to release Satori on OS X because of a licensing deal). Thanks!

bhaller · 2020-06-07T23:39:31Z

Update: it is indeed -nextDrawable. It occurred to me that I could time that call specifically to see how it performs. Usually it returns in about 0.008 s; but now and then it takes about 0.025 s, which kills the next frame. Nothing in between; the method clearly has two different "modes" that it falls into, although it may be that I'm doing something wrong that pushes it into the bad "mode". Anyway, there it is. At this point I think I'll log a bug on Apple and move on, unless you have any ideas. :-> Nice chatting with/at you, enjoy your weekend. :->

nickzman · 2020-06-08T02:44:59Z

How complicated is your shader program? Since macOS isn't guaranteed to run on the same GPU across all installations, Metal has to recompile the shader under certain circumstances, which causes drawing to chug a little until the MTLCompilerService task is done recompiling the shader. Maybe that might be why you're seeing a little chugging? Do you see MTLCompilerService doing something while the screen saver runs?

bhaller · 2020-06-08T02:53:40Z

I've never seen MTLCompilerService in a backtrace. My shaders are pretty simple. I've got less than ten of them, all pretty similar to this:

fragment float4 texturedFragmentShader_MIX_AVG(TexturedRasterizerData in [[stage_in]],
									   const texture2d<SATORI_DATA_SIZE_M> satoriTexture [[ texture(SatoriTextureIndexBaseImage) ]],
									   const texture1d<half> colorTable [[ texture(SatoriTextureIndexColorTable) ]],
									   const texture2d<SATORI_DATA_SIZE_M> satoriTexture2 [[ texture(SatoriTextureIndexBaseImage2) ]],
									   const texture1d<half> colorTable2 [[ texture(SatoriTextureIndexColorTable2) ]],
									   constant float &animationIndex [[buffer(SatoriFragmentInputIndexAnimationIndex)]],
									   constant float &animationIndex2 [[buffer(SatoriFragmentInputIndexAnimationIndex2)]],
									   constant float &fade [[buffer(SatoriFragmentInputIndexFade)]])
{
    constexpr sampler textureSampler (mag_filter::nearest, min_filter::nearest);
    constexpr sampler colorTableSampler (mag_filter::linear, min_filter::linear);
	
    // Sample the texture to obtain an index value
    const SATORI_DATA_SIZE_M4 indexSample = satoriTexture.sample(textureSampler, in.textureCoordinate);
    const SATORI_DATA_SIZE_M4 indexSample2 = satoriTexture2.sample(textureSampler, in.textureCoordinate);
	
	// Offset the index value by the animation index
	const float colorTableIndex = fmod(indexSample.r + animationIndex, 1.0);
	const float colorTableIndex2 = fmod(indexSample2.r + animationIndex2, 1.0);
	
	// Look up the fragment color from the color table using the index value
	const half4 colorSample = colorTable.sample(colorTableSampler, colorTableIndex);
	const half4 colorSample2 = colorTable2.sample(colorTableSampler, colorTableIndex2);
	
	// Mix the color components to get the final fragment color
	const half4 mixedColor = (colorSample + colorSample2) / 2.0;
	
    // Return the color value, scaled down by the fade factor
	return float4(mixedColor) * fade;
}

Shouldn't take it more than a second to compile them all, I wouldn't think, but the missed frames persist for at least ten seconds; I think they just keep happening no matter how long I run for, actually, but I haven't actually tried. I really think it's -nextDrawable; I logged a bug on Apple about it. :->

I just released Satori 2.1.1, a minor bugfix release on top of what you recently downloaded. It starts up faster (the initial image is now calculated in a compute shader in Metal, even though all later images are done on the CPU so as to keep the GPU free for animation). It's also smoother, as a result of smoothing over the missed frames with better animation management – the frames still get dropped but the user can't see it. :-> I'm really happy with it now; I think it's the best Satori has ever been. Metal is a truly amazing technology, it has been really fun learning how to use it.

Thanks for your help! Stay well!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with the g_max_inflight_buffers design? #2

Problem with the g_max_inflight_buffers design? #2

bhaller commented Jun 7, 2020 •

edited

Loading

bhaller commented Jun 7, 2020

nickzman commented Jun 7, 2020

bhaller commented Jun 7, 2020

bhaller commented Jun 7, 2020

bhaller commented Jun 7, 2020

nickzman commented Jun 8, 2020

bhaller commented Jun 8, 2020

Problem with the g_max_inflight_buffers design? #2

Problem with the g_max_inflight_buffers design? #2

Comments

bhaller commented Jun 7, 2020 • edited Loading

bhaller commented Jun 7, 2020

nickzman commented Jun 7, 2020

bhaller commented Jun 7, 2020

bhaller commented Jun 7, 2020

bhaller commented Jun 7, 2020

nickzman commented Jun 8, 2020

bhaller commented Jun 8, 2020

bhaller commented Jun 7, 2020 •

edited

Loading