-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bevy egui performance improvements #359
Bevy egui performance improvements #359
Conversation
PPakalns
commented
Feb 8, 2025
•
edited
Loading
edited
- Move wgpu::Buffer preperation to bevy render world system running in Prepare system set. This allows parallel execution with other Prepare system set systems. If it is done in render graph, execution is serial.
- Parallelize render command encoding.
- Optimize extraction of EguiRenderOutput to render world. Huge performance gain (More than 1ms in my case per frame). Instead of cloning whole array, Vec is wrapped inside an Arc and just simple Clone operation is done. This operation allows Main world and render world to resume processing instead of waiting until all data is cloned.
- Optimize RenderPass scissor rect handling. Bevy TrackedRenderPass doesn't track scissor rect changes and manual handling is needed to remove unnecessary commands.
- Remove unnecessary memory allocations (Frequent small vector creation) during vertex/index buffer preparation.
This should be merged after egui 0.31 support is merged. |
Hi! This is awesome, thank you a lot for the PR! |
/// and processed during [`egui_node::EguiNode`]'s `update`. | ||
/// | ||
/// Value is wrapped in [`Arc`] to improve [`ExtractComponent`] performance. | ||
pub paint_jobs: Arc<Vec<egui::ClippedPrimitive>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what's the etiquette for drive-by commenting on random projects on the internet but:
Arc<[egui::ClippedPrimitive]>
would avoid double allocation and I don't see the array being appended to after being created.
Apologies if this is unwanted nitpicking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviews are always welcome ;)
Suggested approach introduces additional full array memory copy.
Because slice in Arc<[T]> can contain only initialized members and Vec may contain additional capacity.
See source code: https://doc.rust-lang.org/src/alloc/sync.rs.html#3719
Maybe some compiler optimizations can optimize this (but I am not sure because memory allocators are involved), but current approach would work even in debug builds.
Moving Vec<> into Arc<Vec<>> guarantees that underlaying Vec memory is not touched, but only Vec structure that contains pointer, capacity, used length is copied into Arc. This is a lot faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked in compiler explorer.
In suggested appraoch more allocation / memcpy operations are introduced:
https://godbolt.org/z/zx7js9E4q
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would be possible to avoid copies/reallocations if Egui itself returned paint jobs as Box<[egui::ClippedPrimitive]>
(but I'm not sure if there's a way to cast Box
to Arc
without copying) or just Arc<[egui::ClippedPrimitive]>
. And I imagine that would involve some unsafe machinery to initialise such an array on the Egui end as well, which would probably bring only negligible gains anyway.
Arc<Vec<egui::ClippedPrimitive>>
seems like the best option for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I learned something today. There's no optimization in the stdlib for moving Box
(e.g. Vec::into_boxed_slice
) into an Arc
because Arc
sticks the refcounts in the allocation and so can't reuse Box's allocation.
Sorry for the static, I'm fresh off the heals of converting a bunch of Vec
, String
and PathBuf
s to Box<[T]>
, Box<str>
and Box<Path>
and saw modest gains overall so I guess I was overzealous.
I've just tested it and confirmed the performance gains, thank you! When I ran the Thank you again! |
* update to egui 0.31 * update examples * Support better parallelization for egui UI rendering * Scissor rect optimization and do not allocate unneeded Vec * Improve EguiRenderOutput extraction performance, fix index buffer bug. * Fix cargo clippy suggestions * Remove redundant logic to round up to power of 2 --------- Co-authored-by: tomara_x <[email protected]>