Add BackdropBlurContainer #6393

Open
wants to merge 20 commits into
base: master

minetoblend commented Oct 19, 2024

Companion to ppy/osu#30347

Adds a BackdropBlurContainer that blurs the background behind its children.
It requires a parent of type IBackbufferProvider (a BufferedContainer or a RefCountedBackbufferProvider), whose framebuffer it blurs and then draws back onto that framebuffer, using a masking shader based on the BackdropBlurContainer's children.

Also adds a RefCountedBackbufferProvider, which can be used to, e.g., wrap the entire game in an on-demand buffered container that is automatically enabled when any of its children need a backbuffer.
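
As a rough illustration of the on-demand idea, here is a minimal Python sketch of reference counting. The class and method names are hypothetical and are not the PR's C# API; the sketch only shows that the buffer stays enabled while at least one user holds it.

```python
class RefCountedBackbuffer:
    """Hypothetical stand-in for a ref-counted backbuffer provider."""

    def __init__(self):
        self._users = 0

    @property
    def enabled(self):
        # The offscreen buffer is only needed while someone is using it.
        return self._users > 0

    def acquire(self):
        # Called by a child that starts needing the backbuffer.
        self._users += 1

    def release(self):
        # Called by a child that no longer needs the backbuffer.
        if self._users == 0:
            raise RuntimeError("release() called without a matching acquire()")
        self._users -= 1


provider = RefCountedBackbuffer()
provider.acquire()       # first blur container appears
provider.acquire()       # second blur container appears
provider.release()       # one goes away, the buffer is still needed
still_needed = provider.enabled
provider.release()       # last one goes away, the buffer can be disabled
```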

There's still a bit of weirdness with texture clamping going on when EffectBufferScale goes above 1, but I couldn't quite figure that out on my own.

2024-10-19.01-04-00.mp4

Collaborator

bdach commented Oct 21, 2024

OP is missing discussion of performance of this, which I'd expect to see given reliance on buffered containers which we have found to be rather expensive (especially in terms of vram usage).

Author

minetoblend commented Oct 21, 2024

> OP is missing discussion of performance of this, which I'd expect to see given reliance on buffered containers which we have found to be rather expensive (especially in terms of vram usage).

Sure thing, I'll need a bit to do some proper benchmarks but for now I can go a bit into the performance considerations/testing I did so far. Is there anything specific you'd like to see in the benchmarks?

In the bit of testing I did so far with lazer (frame limiter on "basically unlimited", NVIDIA 980 Ti, 2560x1440) I saw a drop from around 950 fps to 700-800 fps on more slider-heavy sections at 50% blur resolution, and down to 600-700 fps at full resolution. Lowering the resolution below that had diminishing returns, so I'm assuming the framebuffer overhead and/or blur shader is a key contributor here (though I can't say anything for sure before I get around to actually measuring things).

The whole thing uses a single BufferedContainer at the root, acting as the backbuffer. Whenever a BackdropBlurContainer renders, it works similarly to a BufferedContainer but uses part of the backbuffer as the input for the blur shader instead of the MainBuffer.
I added support for blurring at a different resolution than the MainBuffer (which would usually stay at 100%), and in testing found that even at 25% resolution (so 6.25% of the texture memory of a full-size buffer, ignoring overhead and other GPU memory weirdness) the results still look acceptable.
The primary memory overhead is the MainBuffer, which in the case of sliders adds +100% on top of the framebuffer used for drawing the slider path.
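
The 6.25% figure follows from the resolution scale applying to both width and height. A minimal Python sketch of that arithmetic (the function name is illustrative, and alignment, padding and driver overhead are ignored):

```python
def texture_memory_fraction(resolution_scale: float) -> float:
    """Memory of a scaled buffer relative to a full-size one.

    The scale applies to both axes, so memory shrinks with its square.
    """
    return resolution_scale ** 2


for scale in (1.0, 0.5, 0.25):
    print(f"{scale:.0%} resolution -> {texture_memory_fraction(scale):.2%} of full-size memory")
```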

The most effective way to reduce vram overhead would probably be to use some form of texture/framebuffer pooling, requesting a temporary framebuffer with a minimum required size from the pool on each draw & returning it right after. I contributed the same effect to pixijs a couple months ago, which makes extensive use of texture pooling for filters/effects. (Example)
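
To make the pooling idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: buffers are modelled as plain (width, height) tuples, and the names `FramebufferPool`, `rent` and `give_back` are made up for illustration rather than taken from osu-framework or pixijs.

```python
class FramebufferPool:
    """Hypothetical pool that hands out buffers of at least a requested size."""

    def __init__(self):
        self._free = []          # buffers currently not in use, as (width, height)
        self.allocations = 0     # how many buffers were ever created

    def rent(self, min_width, min_height):
        # Prefer the smallest free buffer that is large enough.
        candidates = [b for b in self._free
                      if b[0] >= min_width and b[1] >= min_height]
        if candidates:
            best = min(candidates, key=lambda b: b[0] * b[1])
            self._free.remove(best)
            return best
        # Nothing suitable is free, so allocate a new buffer.
        self.allocations += 1
        return (min_width, min_height)

    def give_back(self, buffer):
        # Return the buffer right after drawing so later draws can reuse it.
        self._free.append(buffer)


pool = FramebufferPool()
first = pool.rent(640, 360)
pool.give_back(first)
second = pool.rent(320, 180)   # reuses the 640x360 buffer instead of allocating
```

With this scheme the number of live buffers is bounded by how many are needed at the same time during a frame, not by the number of drawables.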

Some other ideas for performance improvements are lowering the kernel size in the blur shader, unrolling the blur loop and/or using precomputed values for the gaussian distribution. I don't really know if those would improve performance in a meaningful way, they're just things I know pixijs does to speed up their blur shader, which has a shader pool that generates shader code on the fly for any requested kernel size. Can't really say anything conclusive about it though without testing & measuring first.
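
For reference, precomputing the Gaussian distribution just means evaluating the weights once per kernel size on the CPU instead of per pixel in the shader. A minimal Python sketch, with `sigma` and `radius` as illustrative parameters (this is not the framework's shader code):

```python
import math


def gaussian_weights(sigma: float, radius: int) -> list:
    """Normalised 1D Gaussian weights for taps -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    # Normalise so the blur does not change overall brightness.
    return [w / total for w in raw]


weights = gaussian_weights(sigma=2.0, radius=4)
print(len(weights), sum(weights))
```

The resulting list could be baked into generated shader source or uploaded as constants, which is where an unrolled loop would read it from.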

Author

minetoblend commented Oct 21, 2024

I managed to get rid of most of the framebuffers. I made a BackdropBlurPath drawable which uses only one extra framebuffer per path/slider (previously it was three). If the buffer has a resolution of 25%, that reduces VRAM usage per slider from 2.125x to 1.0625x, so it should barely make a dent in VRAM usage now.
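
A small Python sketch of how these multipliers can be computed, counting the slider's own full-resolution framebuffer as 1.0 and each extra buffer as its resolution scale squared. The split of the earlier three buffers into one full-resolution and two 25% buffers is an assumption: the post does not state it, it is simply a combination that reproduces 2.125x.

```python
def vram_multiplier(extra_buffer_scales) -> float:
    """VRAM per slider relative to the slider path's own framebuffer."""
    # 1.0 is the full-resolution framebuffer the path already uses.
    return 1.0 + sum(scale ** 2 for scale in extra_buffer_scales)


# One extra buffer at 25% resolution.
print(vram_multiplier([0.25]))             # 1.0625

# ASSUMPTION: three extra buffers, one full-size and two at 25%.
print(vram_multiplier([1.0, 0.25, 0.25]))  # 2.125
```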

I also tried using 2 buffers so that both blur passes can run at a lower resolution, and I think that was slightly faster. My guess is that this is because the blur shader seems to have quite a bit of overhead on its own, but I'll need to do more measuring to say for sure.

The drawing logic is now:

  • Content gets rendered to the MainBuffer at full resolution.
  • The backbuffer gets drawn to the effect buffer at lower resolution, blurred on the x axis.
  • The effect buffer gets drawn onto the backbuffer scaled up to full resolution, blurred on the y axis. It gets masked and blended with the MainBuffer in the same pass.
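
The x-then-y structure of the steps above is a separable blur. The following Python sketch shows only that part on a small grayscale image stored as nested lists; the resolution change, masking and blending are left out, and all names are illustrative rather than taken from the PR.

```python
import math


def gaussian_weights(sigma, radius):
    """Normalised 1D Gaussian weights for taps -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def blur_rows(image, weights):
    """First pass: blur along x, clamping reads at the image edges."""
    radius = len(weights) // 2
    width = len(image[0])
    return [
        [
            sum(w * row[min(max(x + k - radius, 0), width - 1)]
                for k, w in enumerate(weights))
            for x in range(width)
        ]
        for row in image
    ]


def transpose(image):
    return [list(column) for column in zip(*image)]


def blur_columns(image, weights):
    """Second pass: blur along y by reusing the row blur on the transpose."""
    return transpose(blur_rows(transpose(image), weights))


def two_pass_blur(image, sigma, radius):
    weights = gaussian_weights(sigma, radius)
    return blur_columns(blur_rows(image, weights), weights)


# A single bright pixel spreads into a small Gaussian blob.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
blurred = two_pass_blur(image, sigma=1.0, radius=2)
```

Two 1D passes cost roughly 2 × (2·radius + 1) samples per pixel instead of (2·radius + 1)² for a single 2D pass, which is why the blur is split across two draws.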

The blending formula in the final pass is still slightly incorrect, which shows up as a slightly dark tint when alpha gets low, as well as slight artifacts on anti-aliased edges. There is some kind of premultiplied alpha weirdness going on that I cannot fully wrap my head around.
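
For context, one common source of dark tints with premultiplied alpha is applying the straight-alpha blend formula to a colour that was already multiplied by its alpha, so alpha ends up applied twice. The Python sketch below illustrates that general effect only; it is not a diagnosis of this PR's shader, and the function names are made up.

```python
def over_premultiplied(src, dst):
    """'Over' blend for (r, g, b, a) colours that are premultiplied by alpha."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    inv = 1.0 - sa
    return (sr + dr * inv, sg + dg * inv, sb + db * inv, sa + da * inv)


def straight_formula_on_premultiplied(src, dst):
    """The mistake: the straight-alpha formula multiplies src by alpha again."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    inv = 1.0 - sa
    return (sr * sa + dr * inv, sg * sa + dg * inv, sb * sa + db * inv,
            sa + da * inv)


half_white = (0.5, 0.5, 0.5, 0.5)     # white at 50% alpha, premultiplied
opaque_white = (1.0, 1.0, 1.0, 1.0)

correct = over_premultiplied(half_white, opaque_white)
darkened = straight_formula_on_premultiplied(half_white, opaque_white)
print(correct)    # (1.0, 1.0, 1.0, 1.0)
print(darkened)   # (0.75, 0.75, 0.75, 1.0)
```

White over white should stay white; the second result is visibly darker, and the error grows as alpha moves away from 0 or 1, which also makes it show up on anti-aliased edges.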

Member

peppy commented Oct 22, 2024

Also a touch concerned about performance of this given that we've seen the blur algorithm we're using perform pretty badly on some hardware already.

Author

minetoblend commented Oct 22, 2024

If the blur shader in particular is of concern, I think it's probably best if I use 2 lowres buffers and only do the blending in the final pass so the second blur pass can happen at a lower resolution too. Assuming 25% resolution that would increase the amount of vram used per slider to 1.125x compared to master.

Author

minetoblend commented Oct 22, 2024

With the second blur pass being at a lower resolution too now, assuming 25% resolution and 16px blur sigma it should be doing 5 texture samples per pixel for each blur pass.

I also noticed that performance was degraded when BlurSigma was zero, so I made some changes to only activate the backbuffer when a container is actually doing some blurring. With the new changes I'm seeing pretty much identical performance with BlurSigma=0 compared to master now.

3 participants