Instanced rendering not the fastest #4

mtsr · 2021-05-13T13:29:40Z

According to some comments and https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau, instanced rendering of small meshes is not the fastest and can be outperformed by one or more larger draw calls.

I've tested a simple 4-vertex index buffer alternative in this branch, but it's no faster and possibly slower. cwfitzgerald/DrawCallYeeter, which tested this in a different case, had already suggested it would be slower.

The current method uses the vertex buffer as an instance buffer, storing [p0, p1, p2, ..] with an instance stride of 1 point. This only works well for instanced rendering. Non-instanced rendering requires each point to be duplicated 4x (indexed) or 6x (non-indexed).

One alternative is to store the points in an SSBO and index into it manually in the shader, using instance_index = vertex_index / 6. The coefficients can then be looked up using coefficient_index = vertex_index % 6.
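To make the index arithmetic concrete, here is a minimal CPU-side sketch in Rust of the mapping the shader would perform. The names `VERTICES_PER_SEGMENT` and `split_vertex_index` are made up for illustration, and the 6 vertices per segment assume the non-indexed case with two triangles per segment.

```rust
/// Vertices emitted per segment in the non-indexed case
/// (assumed: two triangles, so 6 vertices).
const VERTICES_PER_SEGMENT: u32 = 6;

/// Splits a flat vertex index into (instance_index, coefficient_index),
/// mirroring the division and modulo the shader would do.
fn split_vertex_index(vertex_index: u32) -> (u32, u32) {
    (
        vertex_index / VERTICES_PER_SEGMENT, // which segment to read from the SSBO
        vertex_index % VERTICES_PER_SEGMENT, // which corner coefficients to use
    )
}
```

A non-instanced draw of `6 * segment_count` vertices would then cover every segment exactly once.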

One downside is that SSBOs have a WGPU default size limit of 128MB, which means we can store at most 128MB / 12 bytes (about 11 million) segments per SSBO unless we query for the actual SSBO size limit. So we'd need to render in chunks for anything larger than that.
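The chunking could look roughly like the sketch below. It assumes the limit is 128 MiB and that each stored element is 12 bytes (three `f32`s); `MAX_SSBO_BYTES`, `BYTES_PER_ELEMENT`, `max_elements_per_chunk` and `chunk_ranges` are hypothetical names, not part of wgpu.

```rust
use std::ops::Range;

/// Assumed default SSBO binding size limit (128 MiB).
const MAX_SSBO_BYTES: u64 = 128 * 1024 * 1024;
/// Assumed size of one stored element: three f32 values.
const BYTES_PER_ELEMENT: u64 = 12;

/// Number of whole elements that fit in one SSBO under the assumed limit.
fn max_elements_per_chunk() -> u64 {
    MAX_SSBO_BYTES / BYTES_PER_ELEMENT
}

/// Splits `total` elements into consecutive ranges of at most `max_per_chunk`.
fn chunk_ranges(total: u64, max_per_chunk: u64) -> Vec<Range<u64>> {
    assert!(max_per_chunk > 0, "chunk size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + max_per_chunk).min(total);
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

Each range would be bound and drawn separately. Querying the adapter's real limit instead of hard-coding the default would likely allow larger chunks on most hardware.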

We should also test whether the SSBO solution above is faster with or without an index buffer. A single index buffer of, say, 8096 entries can be reused for every line and chunk, which is at least some saving.
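For the indexed variant, the reusable index buffer could be generated once along these lines. This is only a sketch: `quad_indices` is a hypothetical helper, and the corner order and winding (4 vertices per segment, two triangles sharing an edge) are assumptions that would need to match the shader.

```rust
/// Builds indices for `quad_count` quads with 4 vertices each,
/// two triangles per quad: (0, 1, 2) and (2, 1, 3) relative to the quad base.
fn quad_indices(quad_count: u32) -> Vec<u32> {
    let mut indices = Vec::with_capacity(quad_count as usize * 6);
    for quad in 0..quad_count {
        let base = quad * 4;
        indices.extend_from_slice(&[base, base + 1, base + 2, base + 2, base + 1, base + 3]);
    }
    indices
}
```

With an index buffer like this, the shader would see index values rather than a running count, so the SSBO lookup would presumably divide and take the remainder by 4 instead of 6.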
