Q: Traversal #3

Open
ib00 opened this issue Aug 16, 2023 · 2 comments
Comments

@ib00

ib00 commented Aug 16, 2023

What algorithm (paper) do you use for BVH traversal?

How does performance of compute traversal compare to hardware (Vulkan) traversal?

@sergcpp
Owner

sergcpp commented Aug 16, 2023

On the GPU I use a fairly simple two-level binary BVH traversal with the stack sitting in shared memory. I tested a stackless approach some time ago, but ended up with this one after all. You can find the source code in intersect_scene.comp.glsl (the functions are Traverse_BLAS_WithStack and Traverse_TLAS_WithStack). I wanted to try "Compressed Wide BVHs" some time ago, but it seems you can rely on hardware raytracing these days, as it is supported by all recent GPUs.
By enabling HWRT on an RTX 3080 I get about a 4x overall speedup. The difference is less pronounced on AMD.
You can try it yourself with the sample application (https://github.com/sergcpp/RayDemo/releases) by adding the "--nohwrt" argument to disable hardware raytracing.
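
For readers unfamiliar with the technique, here is a minimal single-level C++ sketch of stack-based binary BVH traversal. It is not the code from intersect_scene.comp.glsl: the `Node` and `Ray` layouts, `hit_box`, `traverse` and `kStackSize` are all hypothetical names, the local array only stands in for the shared-memory stack, and leaf bounds stand in for real primitives to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical node layout (an assumption, not the layout used in Ray).
struct Node {
    float bmin[3], bmax[3];
    uint32_t left, right; // child indices, used only by inner nodes
    bool leaf;            // leaf bounds stand in for a primitive here
};

struct Ray {
    float o[3];     // origin
    float inv_d[3]; // 1 / direction, precomputed by the caller
    float tmax;     // maximum hit distance
};

// Slab test of a ray against an axis-aligned box, limited to [0, tmax].
// NaN handling (origin exactly on a slab plane, zero direction) is ignored.
inline bool hit_box(const Ray &r, const float bmin[3], const float bmax[3],
                    float tmax, float &t_near) {
    float t0 = 0.0f, t1 = tmax;
    for (int i = 0; i < 3; ++i) {
        float ta = (bmin[i] - r.o[i]) * r.inv_d[i];
        float tb = (bmax[i] - r.o[i]) * r.inv_d[i];
        if (ta > tb) std::swap(ta, tb);
        t0 = std::max(t0, ta);
        t1 = std::min(t1, tb);
    }
    t_near = t0;
    return t0 <= t1;
}

constexpr int kStackSize = 32; // fixed size, like a per-thread stack slice

// Returns the index of the nearest leaf hit (or -1) and its distance in t_hit.
int traverse(const std::vector<Node> &nodes, const Ray &r, float &t_hit) {
    uint32_t stack[kStackSize];
    int sp = 0;
    stack[sp++] = 0; // start at the root
    int best = -1;
    t_hit = r.tmax;
    while (sp > 0) {
        const uint32_t idx = stack[--sp];
        const Node &n = nodes[idx];
        float t;
        // Re-test on pop so nodes behind the current closest hit are culled.
        if (!hit_box(r, n.bmin, n.bmax, t_hit, t)) continue;
        if (n.leaf) {
            if (t < t_hit) { t_hit = t; best = int(idx); }
            continue;
        }
        float tl, tr;
        const bool hl = hit_box(r, nodes[n.left].bmin, nodes[n.left].bmax, t_hit, tl);
        const bool hr = hit_box(r, nodes[n.right].bmin, nodes[n.right].bmax, t_hit, tr);
        uint32_t first = n.left, second = n.right;
        if (hl && hr && tr < tl) std::swap(first, second);
        assert(sp + 2 <= kStackSize);
        // Push the farther child first so the nearer one is popped next.
        if (hl && hr) { stack[sp++] = second; stack[sp++] = first; }
        else if (hl) stack[sp++] = n.left;
        else if (hr) stack[sp++] = n.right;
    }
    return best;
}
```

A two-level variant would presumably run the same loop over the TLAS and, on reaching an instance leaf, transform the ray and continue into that instance's BLAS; that part is omitted here.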

On the CPU it is a little more complicated. I use the idea from "Shallow Bounding Volume Hierarchies for Fast SIMD Ray Tracing of Incoherent Rays". The binary BVH gets flattened into an 8-wide one (up to 8 children per node). During traversal a single ray is tested against 8 bounding boxes at once (using SSE/AVX), which should work better for incoherent rays than packet traversal. But it is still quite inefficient and I plan to improve it.
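
As a rough illustration of the "one ray against 8 boxes" step, here is a portable C++ sketch. It uses plain loops over a structure-of-arrays node instead of SSE/AVX intrinsics, so each inner loop corresponds to what a single 8-wide SIMD instruction would do. `WideNode` and `intersect8` are hypothetical names, and the layout is an assumption rather than the one used in Ray.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 8-wide node: child bounds stored as structure-of-arrays, so
// each row of 8 floats would map onto one AVX register (or two SSE ones).
struct WideNode {
    float bmin[3][8];
    float bmax[3][8];
};

// Tests one ray against all 8 child boxes of a node.
// Returns a bitmask with bit j set if child j is hit within [0, tmax];
// t_near[j] receives the entry distance for child j.
// inv_d is 1 / direction, precomputed by the caller. NaN cases are ignored.
uint32_t intersect8(const WideNode &n, const float o[3], const float inv_d[3],
                    float tmax, float t_near[8]) {
    float t0[8], t1[8];
    for (int j = 0; j < 8; ++j) {
        t0[j] = 0.0f;
        t1[j] = tmax;
    }
    for (int i = 0; i < 3; ++i) {
        // One slab test per axis, applied to all 8 lanes.
        for (int j = 0; j < 8; ++j) {
            const float ta = (n.bmin[i][j] - o[i]) * inv_d[i];
            const float tb = (n.bmax[i][j] - o[i]) * inv_d[i];
            t0[j] = std::max(t0[j], std::min(ta, tb));
            t1[j] = std::min(t1[j], std::max(ta, tb));
        }
    }
    uint32_t mask = 0;
    for (int j = 0; j < 8; ++j) {
        t_near[j] = t0[j];
        if (t0[j] <= t1[j]) mask |= (1u << j);
    }
    return mask;
}
```

The traversal loop would then iterate over the set bits of the returned mask, ideally nearest child first, and push those children onto the stack. Unlike packet traversal, the SIMD lanes here are filled by boxes rather than by rays, so utilization does not depend on neighbouring rays taking the same path.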

@ib00
Author

ib00 commented Aug 16, 2023

Thanks! Very cool project.

So, for GPU (explicit compute shader, not HW), you would try this:
https://research.nvidia.com/publication/2017-07_efficient-incoherent-ray-traversal-gpus-through-compressed-wide-bvhs

I think https://github.com/pablode/gatling had CWBVH, but he moved to HW intersection.
It would be interesting to see how far a hand-written BVH traversal can be pushed.
