WebGPURenderer Performance Significantly Lower Than WebGLRenderer #30560

Open
tenkkov opened this issue Feb 19, 2025 · 12 comments

@tenkkov

tenkkov commented Feb 19, 2025

Description

Summary

When switching from WebGLRenderer to WebGPURenderer, I experience a significant drop in performance. The same scene, containing thousands of non-instanced meshes, runs smoothly at 60 FPS on WebGL but drops to 15 FPS on WebGPU, a 4x decrease in performance.

Expected Behavior

WebGPURenderer should provide comparable or better performance than WebGLRenderer, given its modern API and intended improvements over WebGL.

Current Behavior

Rendering 20,000 non-instanced basic cube meshes:

WebGLRenderer: ~60 FPS on Mac (Apple Silicon M1 Pro)

WebGPURenderer: ~15 FPS (4x slower)

No errors or warnings appear in the Chrome console.

Reproduction steps

  1. Create a Three.js scene with 20,000 individual (non-instanced) Mesh objects.
  2. Use WebGLRenderer and observe smooth 60 FPS performance.
  3. Switch to WebGPURenderer by uncommenting the renderer swap.
  4. Observe FPS dropping significantly (down to 15 FPS).

Code

See the live example below.

Live example

https://jsfiddle.net/15zfestk/1/

Screenshots

No response

Version

r173 (0.173.0)

Device

Desktop

Browser

Chrome

OS

macOS

@Mugen87
Collaborator

Mugen87 commented Feb 19, 2025

The live example uses more or less the worst-case setup for the renderer: many objects that update their transformation every frame. Since the example uses neither instancing nor batching, existing performance issues in WebGPURenderer are exposed.

Because every object has its own UBO for managing its object-scope uniforms, all UBOs must be bound and updated each frame, which seems to cause a considerable amount of overhead. The WebGL backend spends most of its time in the bindBufferBase(), bindBuffer(), and bufferData() calls.

[Image]

@Mugen87
Collaborator

Mugen87 commented Feb 19, 2025

To further explain the major performance gap: This is how WebGLRenderer renders the scene when four cubes are configured:

[Image: WebGLRenderer call list for four cubes]

There are no major state changes between the draw calls (except for some single uniform updates which are not displayed in the list). Compared to that, WebGPURenderer does the following:

[Image: WebGPURenderer call list for four cubes]

As you can see, there are a considerable number of state changes between consecutive drawElements() calls. Many scenes won't have an issue with this because the number of render objects is low. But the more render objects a scene has, the sooner WebGPURenderer becomes CPU-limited.

#30562 fixes the VAO-related issues, but their impact is unfortunately negligible compared to the UBO-related overhead. I guess we need a different approach in the renderer to minimize these state changes.

@sunag @RenaudRohlinger @aardgoose Would a single UBO for all object-scope uniforms be a potential solution?

@RenaudRohlinger
Collaborator

Nice catch with the VAO!

Related (I like the CommonUniformBuffer interface):
#27388

Unless we implement a pool system, I don't think a single UBO for all object-scope uniforms is a viable solution, since it would severely limit the number of meshes. With a typical 16KB max block size and each mat4 taking 64 bytes in std140, that limits us to about 256 meshes.
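The 256-mesh figure is straightforward division. A minimal sketch, assuming a 16384-byte block (the minimum MAX_UNIFORM_BLOCK_SIZE that WebGL 2 guarantees) and std140 packing; the helper name is hypothetical, and the second case (adding a mat3 normalMatrix per mesh) is an illustrative assumption showing how quickly capacity shrinks as more object-scope uniforms are added.

```typescript
// std140 rounds the stride of an array of structs up to a multiple of 16 bytes.
const STD140_ALIGN = 16;

function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

// Hypothetical helper: how many per-object records fit in one uniform block.
function objectsPerBlock(maxBlockBytes: number, bytesPerObject: number): number {
  const stride = alignUp(bytesPerObject, STD140_ALIGN);
  return Math.floor(maxBlockBytes / stride);
}

const MAX_BLOCK_BYTES = 16 * 1024; // 16384: WebGL 2's guaranteed minimum MAX_UNIFORM_BLOCK_SIZE
const MAT4_BYTES = 16 * 4; // 16 floats x 4 bytes = 64 bytes
const MAT3_STD140_BYTES = 3 * 16; // std140 pads each mat3 column to a vec4: 48 bytes

// Assumption: one mat4 (modelMatrix) per mesh.
const meshesWithOneMat4 = objectsPerBlock(MAX_BLOCK_BYTES, MAT4_BYTES);

// Assumption: modelMatrix plus a mat3 normalMatrix per mesh (112-byte stride).
const meshesWithNormalMatrix = objectsPerBlock(MAX_BLOCK_BYTES, MAT4_BYTES + MAT3_STD140_BYTES);

console.log(meshesWithOneMat4, meshesWithNormalMatrix); // 256 146
```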

@Mugen87
Collaborator

Mugen87 commented Feb 20, 2025

Good to know that. I hope we can revisit #27388 soon.

@CodyJasonBennett
Contributor

With a typical 16KB max block size

This is the per-draw-call limit guaranteed by the WebGL 2 specification. You can have a larger buffer bound and adjust the offset dynamically. I've written a lot about this and about scheduling in general, but reading this, I'm not sure it has been heard.

@aardgoose
Contributor

aardgoose commented Feb 21, 2025

Can this issue be clarified? Is the OP's performance problem with the WebGL fallback backend, the WebGPU backend, or both?

Re #27388: I'll revisit it in a few weeks' time. I recall looking at applying a similar mechanism to the WebGL fallback but finding it more complicated because of the different API styles, rather than because of a buffer size issue, although I'd have to check.

@Mugen87
Collaborator

Mugen87 commented Feb 21, 2025

Can this issue be clarified, is it performance with the webGL fallback backend that the OP has an issue with or the WebGPU backend or both?

Both backends have the performance issue.

@Mugen87
Collaborator

Mugen87 commented Feb 24, 2025

You can have a larger buffer bound and adjust the offset dynamically.

If I understand the spec correctly, you can use gl.bindBufferRange() in WebGL 2 for that purpose.

https://registry.khronos.org/OpenGL-Refpages/es3.0/html/glBindBufferRange.xhtml

In WebGPU, it should be the dynamicOffsets parameter of setBindGroup().

https://www.w3.org/TR/webgpu/#gpubindingcommandsmixin-setbindgroup

It seems neither API is used in #27388 yet.
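Both mechanisms come down to giving each object its own aligned slot inside one large buffer. A minimal sketch of that layout math; the helper names are hypothetical, and the 256-byte alignment is an assumption (it is WebGPU's default minUniformBufferOffsetAlignment and a common value of WebGL 2's UNIFORM_BUFFER_OFFSET_ALIGNMENT), so the real limit should be queried at runtime.

```typescript
function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

interface SlotLayout {
  stride: number; // bytes between consecutive object slots
  offsets: number[]; // byte offset of each object's slot in the shared buffer
  totalBytes: number; // required size of the shared buffer
}

// Hypothetical helper: one aligned slot per object inside a single large buffer.
function layoutSlots(objectCount: number, bytesPerObject: number, offsetAlignment: number): SlotLayout {
  const stride = alignUp(bytesPerObject, offsetAlignment);
  const offsets = Array.from({ length: objectCount }, (_, i) => i * stride);
  return { stride, offsets, totalBytes: stride * objectCount };
}

// Query the real alignment instead of hard-coding 256:
//   WebGL 2: gl.getParameter(gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT)
//   WebGPU:  device.limits.minUniformBufferOffsetAlignment
const layout = layoutSlots(4, 64, 256);
// layout.stride is 256 and layout.offsets is [0, 256, 512, 768]

// Per draw, the same buffer is then bound at a different offset, e.g.:
//   WebGL 2: gl.bindBufferRange(gl.UNIFORM_BUFFER, bindingPoint, ubo, layout.offsets[i], 64);
//   WebGPU:  passEncoder.setBindGroup(groupIndex, objectBindGroup, [layout.offsets[i]]);
//            (the bind group layout entry needs buffer: { type: "uniform", hasDynamicOffset: true })
```

With a 64-byte record and a 256-byte alignment, three quarters of each slot is padding, so the alignment limit matters as much as the block size when sizing such a buffer.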

@sunag
Collaborator

sunag commented Feb 24, 2025

@mrdoob @Mugen87 My apologies for the delay. I'm currently dealing with a health issue, but I will look into it as soon as I recover.

@aardgoose
Contributor

#27388 only applies to WebGPU; dynamicOffsets isn't really useful in the current renderer AFAICS. The offset in createBindGroup() is all that is required to use a single buffer.
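With static offsets, the single-buffer approach amounts to one bind group per object, each pointing at the same buffer with a different offset. A sketch of just the descriptor construction, assuming a 64-byte record and a 256-byte minUniformBufferOffsetAlignment; the local interfaces are stand-ins for the WebGPU types so it compiles without @webgpu/types, the helper is hypothetical, and this is not the code in #27388.

```typescript
// Local stand-ins for the WebGPU descriptor shapes (GPUBuffer, GPUBindGroupEntry,
// GPUBindGroupDescriptor) so the sketch compiles without @webgpu/types.
interface BufferRef { label: string }
interface BindGroupEntry {
  binding: number;
  resource: { buffer: BufferRef; offset: number; size: number };
}
interface BindGroupDescriptor { layout: unknown; entries: BindGroupEntry[] }

// Hypothetical helper: one bind group per object, all pointing at the same
// uniform buffer and differing only in their static offset.
function objectBindGroupDescriptors(
  layout: unknown,
  sharedBuffer: BufferRef,
  objectCount: number,
  bytesPerObject: number,
  offsetAlignment: number, // device.limits.minUniformBufferOffsetAlignment
): BindGroupDescriptor[] {
  const stride = Math.ceil(bytesPerObject / offsetAlignment) * offsetAlignment;
  return Array.from({ length: objectCount }, (_, i) => ({
    layout,
    entries: [{ binding: 0, resource: { buffer: sharedBuffer, offset: i * stride, size: bytesPerObject } }],
  }));
}

const descriptors = objectBindGroupDescriptors({}, { label: "objectUniforms" }, 3, 64, 256);

// In a real renderer (not executed here):
//   - each descriptor is passed once to device.createBindGroup(descriptor);
//   - each frame, object data is written into a CPU-side ArrayBuffer at i * stride;
//   - one device.queue.writeBuffer(sharedBuffer, 0, staging) is issued before
//     device.queue.submit([...]), so every queued draw sees the updated data.
```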

The issue with WebGL is that the buffer updates and draw calls are interleaved and executed in a single pass, whereas the WebGPU renderer updates the ArrayBuffer and queues the draw calls for later execution. This allows the single buffer update to be inserted before the queued draw calls are executed.

For a WebGL solution, you need two passes through the render list:

  1. Pass 1: update the intermediate ArrayBuffers only.
  2. Write the GL buffer.
  3. Pass 2: issue the draw calls.

This doesn't match the current code structure.
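The three steps above can be sketched with a recording mock in place of a real WebGL 2 context. This is a hypothetical illustration, not three.js code: MiniGL is a hand-written subset of WebGL2RenderingContext, and the sketch assumes one 64-byte model matrix per object at a 256-byte stride that satisfies UNIFORM_BUFFER_OFFSET_ALIGNMENT.

```typescript
// Hypothetical subset of WebGL2RenderingContext used by this sketch; a recording
// mock implements it below so the sequence of calls can be inspected.
interface MiniGL {
  bindBuffer(target: number, buffer: object): void;
  bufferSubData(target: number, dstByteOffset: number, srcData: Float32Array): void;
  bindBufferRange(target: number, index: number, buffer: object, offset: number, size: number): void;
  drawElements(mode: number, count: number, type: number, offset: number): void;
}

const UNIFORM_BUFFER = 0x8a11; // gl.UNIFORM_BUFFER
const TRIANGLES = 0x0004; // gl.TRIANGLES
const UNSIGNED_SHORT = 0x1403; // gl.UNSIGNED_SHORT

interface RenderItem { modelMatrix: Float32Array; indexCount: number }

function renderTwoPass(gl: MiniGL, ubo: object, items: RenderItem[], strideBytes: number, bindingPoint: number): void {
  const floatsPerSlot = strideBytes / 4;
  const staging = new Float32Array(floatsPerSlot * items.length);

  // Pass 1: update the intermediate ArrayBuffer only; no GL calls here.
  items.forEach((item, i) => staging.set(item.modelMatrix, i * floatsPerSlot));

  // Write the GL buffer once for the whole frame (ubo must already be large enough).
  gl.bindBuffer(UNIFORM_BUFFER, ubo);
  gl.bufferSubData(UNIFORM_BUFFER, 0, staging);

  // Pass 2: draw calls, binding the same buffer at each object's offset.
  items.forEach((item, i) => {
    gl.bindBufferRange(UNIFORM_BUFFER, bindingPoint, ubo, i * strideBytes, item.modelMatrix.byteLength);
    gl.drawElements(TRIANGLES, item.indexCount, UNSIGNED_SHORT, 0);
  });
}

// Recording mock standing in for a real WebGL 2 context.
const calls: string[] = [];
const mockGL: MiniGL = {
  bindBuffer: () => { calls.push("bindBuffer"); },
  bufferSubData: () => { calls.push("bufferSubData"); },
  bindBufferRange: () => { calls.push("bindBufferRange"); },
  drawElements: () => { calls.push("drawElements"); },
};

const items: RenderItem[] = Array.from({ length: 3 }, () => ({ modelMatrix: new Float32Array(16), indexCount: 36 }));
renderTwoPass(mockGL, {}, items, 256, 0);
```

For N objects this issues 2 + 2N calls per frame with a single buffer upload, instead of a bind-and-upload sequence per object; the restructuring cost is that every object's uniforms must be final before the first draw.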

@Spiri0
Contributor

Spiri0 commented Mar 3, 2025

I don't know whether this is related or a separate topic, but since r173 I have noticed frame drops with WebGPURenderer: the frame rate suddenly drops from 120 fps to 30 fps and only sporadically peaks back to 120 fps. I haven't changed anything in the app itself, only the three.js release (from r172 to r173, and now r174), and I keep seeing this. There is no error message, which makes the analysis more difficult. Since I'm not allocating any new buffers or geometries, this is strange. Up until r172 it always ran at 120 fps, so something must have changed between r172 and r173.

Projects
None yet
Development

No branches or pull requests

7 participants