Postprocessing: Add SSILVB GI and AO #29668

Open
zalo opened this issue Oct 15, 2024 · 31 comments

Comments

@zalo
Contributor

zalo commented Oct 15, 2024

Current Demo

Description

Recent advances in screen-space global illumination have yielded exceptional improvements to the realism and quality of real-time scenes, even over GTAO. One particular advancement is SSILVB, a screen-space global illumination technique that eases the computational burden by keeping track of the occluded horizon via a bitmask (allowing elements in the scene to have finite thickness and for more samples to be collected).
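For intuition, the core of the trick looks roughly like the sketch below (hypothetical names, not any of the implementations verbatim): each slice of the hemisphere is split into 32 angular sectors, every sample marks the sectors it occludes in a bit field, and visibility is just the fraction of bits still unset.

```glsl
// Minimal sketch of visibility-bitmask horizon tracking (hypothetical names).
// Each of the 32 bits stands for one angular sector of the current horizon slice.

// Bits covered by a sample whose front/back horizon angles span [minH, maxH], normalized to [0, 1].
uint sectorsForRange(float minH, float maxH) {
    uint startBit    = uint(minH * 32.0);
    uint sectorCount = uint(ceil((maxH - minH) * 32.0));
    uint mask = (sectorCount > 0u) ? (0xFFFFFFFFu >> (32u - sectorCount)) : 0u;
    return mask << startBit;
}

// OR the bits together over all samples along the slice; occlusion is the occupied fraction.
// Marking a front-to-back range per sample is what gives scene elements a finite thickness.
float sliceOcclusion(uint occludedSectors) {
    return float(bitCount(occludedSectors)) / 32.0; // bitCount() needs a shim on WebGL
}
```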

image

image

image

Solution

There are three MIT-Licensed implementations:

I have a flawed attempt at porting @cybereality's implementation onto @Rabbid76's GTAO here (it's flawed because the samples seem unbalanced, and it's AO-only with no illumination):
https://raw.githack.com/zalo/three.js/feat-ssilvb/examples/webgl_postprocessing_ssilvb.html

Alternatives

For reference, compare to the existing GTAO algorithm, which exhibits:

  • Short range / Garish contrast
  • No indirect illumination
  • No shadows

Additional context

These noisy GI techniques may benefit from screen-space temporal accumulation as well... But this is a request for another issue 😄

@zalo
Contributor Author

zalo commented Oct 15, 2024

Another comparison from my port-in-progress...

GTAO
image
SSILVB
image

@cybereality

cybereality commented Oct 16, 2024

Hey, thanks for looking at my implementation of the SSILVB paper. I'm fairly confident the code works as a reference implementation, but there are a lot of details which aren't in the blog post (some of which I'm still figuring out). In terms of denoising and how to blend it properly with the rest of the lighting, I think that's still an open question. Your demo looks pretty close and definitely a step up from GTAO. I probably won't update the blog until I release something, but you can try different sampling strategies, and also using an accumulation buffer over a couple of frames.
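(By accumulation buffer I just mean something along these lines; a rough sketch with placeholder uniform names, ignoring the reprojection and disocclusion handling you'd want for a moving camera.)

```glsl
// Blend this frame's noisy AO/GI toward a history buffer (exponential moving average).
// Placeholder names; no reprojection or disocclusion handling.
precision highp float;

uniform sampler2D tHistory;  // accumulated result from previous frames
uniform sampler2D tCurrent;  // this frame's noisy AO/GI
uniform float blendFactor;   // e.g. 0.1 roughly averages over the last ~10 frames

in vec2 vUv;
out vec4 fragColor;

void main() {
    vec4 history = texture(tHistory, vUv);
    vec4 current = texture(tCurrent, vUv);
    fragColor = mix(history, current, blendFactor);
}
```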

Here are some shots I took just now (since I've been tweaking things since the blog went up). Granted, the effect is a bit boosted (i.e. not exactly accurate), but I wanted it to be obvious since it's taking a chunk of the frametime. You probably have to open the images in a new tab to A/B test them, as the thumbnails don't really show it.

SSILVB_Oct24_NO_AO

SSILVB_Oct24_GTAO

SSILVB_Oct24_SSILVB

I may make another post once it's 100% done, but I'd be interested to see what you discover, as the demo you posted actually looks pretty promising and better in some ways than what I have now. Thanks.

@liuyehua

amazing

@zalo
Contributor Author

zalo commented Oct 16, 2024

Aha, @cybereality I think the root of the "error" in my port is the coordinate space that my position is in...
Where you define vec3 direction = vec3(omega.x, omega.y, 0.0);, I need vec3(omega.y, 0.0, omega.x); (to get artifacts similar to the ones shown in the low-sector-count image below).

image

However, I realized that my position vector is in view space, so that direction changes as the camera moves around... I suspect it should be in world space 😅

As far as how the blending works... I'm just piggybacking off of the GTAO that @Rabbid76 wrote; it should be around here:
https://github.com/mrdoob/three.js/blob/dev/examples/jsm/shaders/GTAOShader.js

@zalo
Contributor Author

zalo commented Oct 16, 2024

Actually, scratch what I said earlier; it wasn't world vs. screen coordinates or the direction variable.... it was the normals!

They were being packed 0-1 via normal.xyz * 0.5 + 0.5;, and I wasn't unpacking them, so all the normals had a left-shift.
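For anyone else hitting this, the fix is just the usual unpack on read (a sketch, assuming the normal target stores normal.xyz * 0.5 + 0.5):

```glsl
// Undo the [0, 1] packing the normal buffer was written with, back to [-1, 1].
vec3 unpackNormal(vec3 packedNormal) {
    return normalize(packedNormal * 2.0 - 1.0);
}
```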

In addition to creating that leftward shadow, it also happened to make the scene have much softer lighting... I like the softer lighting, so I'll leave it in as a checkbox until I can figure out how to do it properly...

Now just to figure out why it's casting rays left/right much farther than up/down...

@cybereality

cybereality commented Oct 16, 2024

Oh yeah. I think I might have posted a bug in the code. This line is actually supposed to be this:
float sliceRotation = pi / float(sliceCount - 1);
There were a few other small things, but I think that was the only error.

@zalo
Contributor Author

zalo commented Oct 16, 2024

Thanks! I think that doubles the sample efficiency 😄
I also found out where the left/right shadows were coming from... I had the aspect ratio upside down... 😅

I think that's all for the obvious bugs... which is interesting, because I think I liked the aesthetic more when some of the bugs were left in 🤔 ; the results are much more similar to GTAO now (which, I suppose, makes sense, given that it's meant to be a refinement of the concept).

I'm setting up the SSRT3 Reference Unity Repo now to see how it compares...

@zalo
Contributor Author

zalo commented Oct 16, 2024

Here are some recordings:
Flying around the classroom on default settings:
https://www.youtube.com/watch?v=WdWV6FIzNNM

Afterward, I discovered this "Mip Optimization" flag was responsible for most of the flickering in the GI:

MipOptimizationAliasing.mp4

The ambient occlusion-only mode looks phenomenal; very few of the artifacts from the GI mode:

SSRTAOTest.mp4

The denoiser they're using isn't great; the one in three.js is much better, I believe 😄

With the Mip Optimization "Bug" fixed, it feels very nice; similar to how CryEngine used to feel

HighestQuality.mp4

With some temporal reprojection and the three.js denoiser, I bet it would feel pretty solid.

Then, I tested the LittlestTokyo Scene: https://youtu.be/BGg_Z5icnl4

WOW! The effect that this shader has on this scene is truly unreal. The Unity version is definitely doing something my three.js version is not 😅
Will have to dig into it later...

@Mugen87
Collaborator

Mugen87 commented Oct 16, 2024

@zalo Do you mind implementing this feature with TSL? TBH, I don't think we want to add new post processing passes to the old effect composer since the new post processing in WebGPURenderer is more efficient and provides MRT support.

https://threejs.org/examples/?q=webgpu%20postprocessing

E.g. the new motion blur and TRAA implementations are WebGPURenderer only as well.

@Mugen87
Collaborator

Mugen87 commented Oct 16, 2024

The TSL based GTAO implementation is here:

https://github.com/mrdoob/three.js/blob/dev/examples/jsm/tsl/display/GTAONode.js

@zalo
Contributor Author

zalo commented Oct 18, 2024

I think I might have made a minor breakthrough...

I decided to port the reference HLSL implementation from the SSRT3 repo and found an interesting bug... when it's accumulating the two halves of the horizon, only the second half of the accumulation comes out properly!
https://github.com/cdrinmatane/SSRT3/blob/39654c0df5415505749977fc01cfda0b4710b125/HDRP/Shaders/Resources/SSRTCS.compute#L362-L363

One slice in the good hemisphere:
image

One slice in the bad hemisphere:
image

It seems like the problem is somewhere inside of the ComputeOccludedBitfield (or updateSectors function in @cybereality 's code), either with the bit twiddling itself, or the WebGL shim for the HLSL countbits function.
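(For reference, the countbits shim I mean is just a popcount over the 32-bit sector mask; something like the classic parallel bit-sum below, since WebGL2's GLSL ES 3.00 doesn't expose a native bitCount().)

```glsl
// Parallel bit-summing popcount over a 32-bit mask (WebGL2 / GLSL ES 3.00 has no bitCount()).
uint countbits(uint v) {
    v = v - ((v >> 1u) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2u) & 0x33333333u);
    v = (v + (v >> 4u)) & 0x0F0F0F0Fu;
    return (v * 0x01010101u) >> 24u;
}
```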

I don't see this two-part hemisphere accumulation in cybereality's code, so it's possible he already noticed something funny about this 😅

Anyways, if I accumulate 8 slices of only the working hemisphere, then the AO starts to look preeeetty good:
image

Unlike the picture at the beginning of the thread, this one is also mostly correct (no weird shadows going off in one direction).... though, the thickness is turned up pretty high...

The code is pretty messy right now... a battlefield of default "uniforms" defined in-line and failed debugging codepaths... but I'll see if I can get it pushed soon for folks to look at. EDIT: Just pushed it here: https://raw.githack.com/zalo/three.js/feat-ssilvb/examples/webgl_postprocessing_ssilvb.html

If I resolve the hemisphere thing to my satisfaction, then I'll look at reflected light (GI), and then maybe a TSL port...

@cybereality

Interesting. My original code was doing two horizons, but I found it wasn't needed (and doubles the amount of samples you need). How I think it should work is you have a vector (lets say the surface normal) and then you sample 180 degrees (Pi) centered around the normal. The horizon itself captures the hemisphere, so you don't need to explicitly check each side. But that was the most confusing part, and I did a lot of trial and error on those variables, so it's perhaps incorrect.

@zalo
Contributor Author

zalo commented Oct 18, 2024

Interesting. My original code was doing two horizons, but I found it wasn't needed (and doubles the amount of samples you need). How I think it should work is you have a vector (lets say the surface normal) and then you sample 180 degrees (Pi) centered around the normal. The horizon itself captures the hemisphere, so you don't need to explicitly check each side. But that was the most confusing part, and I did a lot of trial and error on those variables, so it's perhaps incorrect.

After going back to the one based on your code, I think you're right, and I can see that they're pretty much doing the same thing (just needing twice as many slices for the same number of samples, and with different default constants). The original code does guarantee that every left sample will be balanced by a right sample, which could have some subtle aesthetic benefit 🤔 (though yours should also have that if there's an even number of slices)...

However, I think this correction float sliceRotation = pi / float(sliceCount - 1); might be mistaken... it seems like it's not covering the full circle when it's used (jitter disabled):
image

vs. float sliceRotation = twoPi / float(sliceCount - 1.0);
image

The latter does seem like it's doing two samples really close to each other, so perhaps that's what the perceived bug was... try it out in your engine to see if I made a mistake somewhere else 😅
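(To spell out the reasoning: the slice constant turns into a marching direction roughly like the sketch below, with approximate names rather than your exact code. When each slice marches only a single direction, the slice angles need to sweep the full 2π; if both +omega and -omega are marched per slice, like the two-horizon SSRT3 code, π is enough.)

```glsl
// Sketch (approximate names): each slice turns sliceRotation into a screen-space direction.
// With sliceRotation = PI / (sliceCount - 1), phi never exceeds PI, so a single-direction
// march per slice only covers half the circle of possible directions.
for (int slice = 0; slice < sliceCount; slice++) {
    float phi = sliceRotation * (float(slice) + jitter);
    vec2 omega = vec2(cos(phi), sin(phi)); // marching direction for this slice
    // ... march along +omega (and optionally -omega), accumulating the sector bitmask ...
}
```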

I think the GI really shines when it's in a scene with both light and shadow nearby each other... neither of our test scenes seem very good for this...

Also, fwiw this is the blending code I found in SSRT3: https://github.com/cdrinmatane/SSRT3/blob/main/HDRP/Shaders/Resources/SSRT.shader#L242-L267

@cybereality

Actually, you're right. It was a bug in my code. I was originally doing 2*PI and changed it later (after I published the blog post). It seems that, because I had a lot of jitter, it kind of compensated, but it was incorrect. Thanks a bunch for catching this.

@cybereality

I just made that change to 2 * PI, and yeah it looks a lot better. Here are the screenshots.

SSIL_Fix_Color

SSIL_Fix_GI

@zalo
Contributor Author

zalo commented Oct 18, 2024

Alrighty, as of now, I have three versions:

GLSL Port of SSRT3 (this is the best atm): https://raw.githack.com/zalo/three.js/feat-ssilvb/examples/webgl_postprocessing_ssilvb.html
GLSL Port of cybereality's simplification: https://raw.githack.com/zalo/three.js/feat-ssilvb-cybereality/examples/webgl_postprocessing_ssilvb.html
TSL Port of the GLSL port of cybereality's simplification: https://raw.githack.com/zalo/three.js/feat-ssilvb-tsl/examples/webgpu_postprocessing_ao.html (which doesn't work via githack for some reason 💀 )

I assume there are subtle bugs in the way I did the ports that account for the differences; I was only able to step-by-step debug the SSRT3 one against a running reference implementation 😅

The TSL version is a little shiny for some reason... that'll have to wait for tomorrow...

@zalo
Contributor Author

zalo commented Oct 18, 2024

It also occurs to me that (when I figure out the GI half) this technique should probably also handle Screen Space Reflections (and maybe contact shadows), since it’s already tapping the textures in the right way… I guess that means it needs a roughness GBuffer 🧐

And, if we want to go crazy, we can probably solve the pop-in issues

  • at the edges of the screen
  • around occluded corners

by sampling from a stochastic depth buffer cubemap at the player’s head. 🤯
https://xevius.org/papers/approxradpaper2010.pdf

If anyone remembers my old stochastic depth buffering demo…
https://zalo.github.io/three.js/examples/webgl_shadowmap_progressive.html

The cost might be worth it if we can also accumulate temporal reprojection samples to surfaces outside of the FoV and around corners… 😅

Perhaps just multiple layered cubemaps for an approximate depth peeling that builds up and reprojects over time…. 🤯

@cybereality

I've thought about using this for other screen space techniques, but I'm not sure it's the best idea. Because for light you mostly want to sample in an even, uniform way, but for something like reflection you are sampling in increasing increments. There are also ways to do SSR with Hi-Z that I don't think would be optimal for AO (but perhaps it could work, I didn't try it).

@zalo
Contributor Author

zalo commented Oct 18, 2024

I think reflective surfaces don't accumulate AO or GI in the same way that diffuse surfaces do. A mirror doesn't need AO or GI... there's a continuum between diffuse reflection and mirror reflection, so perhaps this algorithm can just tighten the cone on smoother surfaces?

Hi-Z looks like fun (if a bit time-consuming to implement); I think it would work for SSILVB too.
I'd be more inclined to try temporal stochastic cube maps first though, since it sounds like more fun 😅

@zalo
Contributor Author

zalo commented Oct 21, 2024

It’s seems like AO is in the air 😅
https://x.com/sonic_ether/status/1848025383028871649?s=46

Time to find out if there are any tricks we’re not using yet… perhaps something good in here:
https://x.com/mirko_salm/status/1833211198009184650?s=46

EDIT: The author of the SSILVB/SAOVB shader I'm porting says it is indeed a straight up improvement:
https://x.com/volfaze/status/1849437680506909016?s=46

Thank goodness it is still licensed under MIT 😄

@zalo
Contributor Author

zalo commented Oct 30, 2024

Had a go at porting the GT-VBAO shadertoy by @Mirko-Salm : https://raw.githack.com/zalo/three.js/feat-ssilvb-gtvbao/examples/webgl_postprocessing_ssilvb.html

Observations compared to the SSRT3 (original "VBAO") version...

  • There's a lot more contrast, dark areas go all the way to full black, etc.
  • The noise function is different and the Poisson Denoiser doesn't like it... might have to switch to the temporal denoiser...
  • It feels more expensive per quality increment... maybe it's just the noise or the sound my GPU makes
  • It seems to approximate physically accurate AO the best as the settings are turned up...
  • The code is ginormous! May need to tree-shake it before trying to port it over to TSL...

I think I'd need to see it with potentially a better(?) noise function or temporal antialiasing to truly compare them...

I'm still annoyed at the popping artifacts (off the side of the screen and around corners)... likewise, defining a fixed "thickness" is rough too... stochastic depth buffer cubemaps would solve all of these (at the cost of 9-25x increased texture samples 💀 )...

I just found a paper trying it, and it seems to do alright:
https://graphics.tudelft.nl/Publications-new/2021/VSE21/SDAO.pdf
https://www.youtube.com/watch?v=X7JnEF__ZsQ
But it more than triples the cost of the shader... which is sad...
Perhaps if the stochastic depth layers are separated (and reprojected forward through time) in a separate pass, the cost to the AO/GI pass can be greatly reduced... And it can be used for GI and SSR as well 🤔

@Mirko-Salm

"The noise function is different and the Poisson Denoiser doesn't like it..." - The particular choice of noise function is not integral to GT-VBAO. You can use whatever noise function you like as long as it follows a uniform distribution. In my experience interleaved gradient noise works better than R1-Hilbert noise for denoising. I chose R1-Hilbert noise as the default for the Shadertoy demo simply because its screen space characteristics are more isotropic than those of IGN (as long as you don't denoise).

@zalo
Contributor Author

zalo commented Oct 30, 2024

Thank you for the advice; I really appreciate the work that you've done on this! With interleaved gradient noise, the noise pattern now seems to match the other implementations... however, it seems like the increased contrast from this (more correct, natural-looking AO) is still too much for the current atemporal denoiser to smooth out... 🫠 I did manage to tune the denoiser parameters a little bit to preserve edges better, but it's not quite enough without accumulation.

The side-by-side is quite striking:
GTVBAOvsVBAO

For context again, that's:

Short (compressed) video futzing around with the two implementations:

Screen.Recording.2024-10-30.140235.mp4

three.js is getting a new temporal antialiasing technique that should be able to handle denoising over time, but I'm also considering the atemporal option (there are always folks who are up in arms about temporal smearing artifacts).

Considering the broader denoising landscape, there are only a few options:

And others that probably aren't viable:

I wonder how hard it would be to train a new neural denoiser on low sample-count ao, converged ao, normals, etc...

@zalo
Contributor Author

zalo commented Oct 31, 2024

Scratch the other denoiser suggestions; the three.js denoiser looks insanely good in comparison to them 😅

Enabling the temporal jitter with the denoiser starts to look really good on my 240hz monitor (biological TRAA 😅), so I'll just rest my hopes there for now...

Thanks again @Mirko-Salm for publishing your implementation!

@Mirko-Salm

I'm glad to see that people find it useful!
Just as an aside with regard to the deltaPosBack calculation, I think the 'normalize'-variant is probably a bit faster than the 'VPos_from_SPos'-variant while not really being less accurate (just a different assumption about the finite thickness behavior). I chose 'VPos_from_SPos' as the default because it matches the implicit behavior of the reference ray-marcher (which could also be modified to use the 'normalize'-variant, but here it would be the less performant choice since here we get the 'VPos_from_SPos'-behavior for free).

@zalo
Contributor Author

zalo commented Oct 31, 2024

@Mirko-Salm I’m glad you mentioned that; I actually switched it to the normalized version while porting since the first version didn’t render correctly in three.js 🫠
https://github.com/zalo/three.js/blob/58d5c9d5e9e509e0f1bd75af70b7fbfbb089526d/examples/jsm/shaders/SSILVBShader.js#L1150

I suspect the depth unit is different enough (not-linear? reversed-z?) that applying the thickness to it directly behaved funnily 😅
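(For context, the usual way to get linear view-space depth out of a three.js depth texture is something like the sketch below; cameraNear/cameraFar are assumed uniforms, and perspectiveDepthToViewZ comes from the packing shader chunk, so this presumes the shader goes through ShaderMaterial's #include resolution.)

```glsl
#include <packing>

uniform sampler2D tDepth;
uniform float cameraNear;
uniform float cameraFar;

// Turn a [0, 1] perspective depth-buffer sample into positive linear view-space depth.
float getLinearDepth(vec2 uv) {
    float fragCoordZ = texture2D(tDepth, uv).x;
    float viewZ = perspectiveDepthToViewZ(fragCoordZ, cameraNear, cameraFar); // negative in front of the camera
    return -viewZ;
}
```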

As an aside; do you feel like the modifications to the VBAO would preclude adding GI functionality back into the shader? I’ll admit I can’t fully extrapolate whether the additional cosine weighting is just as good for reflected ambient illumination as it is for ambient occlusion…

Also what do you think about doing contact shadows and maybe screen space reflections in the same pass?

Also, according to a reply tweet, it seems like the GLSL/githack version has a memory leak that could be artificially slowing it down… rather than troubleshoot it, I'll just hope it goes away with the port to the simpler TSL system or WebGPU. 😅

@Mirko-Salm

Mirko-Salm commented Nov 1, 2024

I suspect the depth unit is different enough (not-linear? reversed-z?)

It's just linear z from 0 to inf.

As an aside; do you feel like the modifications to the VBAO would preclude adding GI functionality back into the shader? I’ll admit I can’t fully extrapolate whether the additional cosine weighting is just as good for reflected ambient illumination as it is for ambient occlusion…

GI should also be possible. I haven't looked into it yet since I'm still busy with the uni-directional variant of GT-VBAO (having to always ray march bi-directionally is a bit of a downer). I can't really say how big of an improvement the cosine weighting would be for GI, but it shouldn't be hard to set up a reference GI ray marcher to get an upper bound on the possible quality improvements.

Also what do you think about doing contact shadows and maybe screen space reflections in the same pass?

Hard to say if doing it all in one pass is going to be beneficial. Seems like something you just have to try and profile.

RE performance optimizations: try disabling #define USE_HQ_ACOS; the approximation should be good enough while being quite a bit faster. I only had the high quality variants on by default to show that the converged results exactly match those of the reference ray marcher. Disabling #define USE_HQ_APPROX_SLICE_IMPORTANCE_SAMPLING might not be worth it, though.

EDIT: I have just added an improved version of ACos_Approx() to the GT-VBAO shadertoy code. The error introduced due to using the acos approximation is now so small that there is basically no reason not to do so (by disabling #define USE_HQ_ACOS).
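For readers who don't want to dig through the Shadertoy: such acos approximations are typically a short polynomial along the lines below (the common single-term GPU variant, roughly 0.01 rad max error; not the improved version in the demo):

```glsl
// Cheap acos approximation: sqrt(1 - |x|) * (PI/2 - 0.156583 * |x|), mirrored for negative x.
// Max error is on the order of 0.01 radians, usually fine for AO weighting.
float acosApprox(float x) {
    float ax = abs(x);
    float res = (1.5707963 - 0.156583 * ax) * sqrt(1.0 - ax);
    return (x >= 0.0) ? res : 3.14159265 - res;
}
```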

@cdrintherrieno

It's amazing to see those implementations in three.js! The GT-VBAO method is a great improvement over VBAO and makes it more physically correct (i.e. cosine-weighted vs. uniform sampling).

I think the visual difference should be more subtle though (maybe there is a sampling problem in the three.js SSRT3 implementation?).

SSRT3 (three.js):

image

SSRT3 (Unity implementation, from the video above with same parameters):

image

GT-VBAO:

image

It looks like a lot of details are missing in the three.js version.

In any case, very nice work, thanks for sharing :)

@zalo
Contributor Author

zalo commented Nov 1, 2024

I think the visual difference should be more subtle though (maybe there is a sampling problem in the three.js SSRT3 implementation?).
It looks like a lot of details are missing in the three.js version.

Thanks for bringing that up; that's a great observation! I only did line-by-line, side-by-side comparisons of the two implementations right up until the first slice (without noise/jitter), just to see if the inputs were the same handedness, but the odds are very good I messed up somewhere else in the Unity HLSL -> three.js GLSL port...

The GTVBAO port looks much more similar to the SSRT3 Unity screenshot (though, iirc, the Unity screenshot also has some tonemapping/postprocessing I forgot to disable before recording, leading to the soft brown coloration 😅 ). It should also be noted that the three.js screenshots have noise, cartoon outlines, and incorrect transparency... so I wonder if that accounts for the majority of the remaining visual difference between three-GTVBAO and unity-VBAO...

I'll try pushing forward with the GTVBAO implementation for now and hope that it closes the visual fidelity gap after accounting for these differences 🧐

I'll include hedging notes here suggesting my port is buggy (and I'm sorry if the three.js tweet inadvertently maligned SSRT3! It's an awesome package for Unity, worth double the price 😄 And the SSRT3 scene looks so nice in AO; I want to live there... )

@zalo
Contributor Author

zalo commented Nov 8, 2024

@Mirko-Salm Should I switch to the new Unidirectional Variant? What are the benefits?

@Mirko-Salm

The benefit is that you cut the number of depth buffer samples in half without doing the same to the quality. Well, at least if your scene doesn't primarily consist of camera facing surfaces. In that case you lose about as much quality as you gain performance. Whether you would want to switch probably depends on whether marching a single direction per-pixel gives you sufficiently good results for the denoiser.
