Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bevy egui performance improvements #359

Merged
merged 7 commits into from
Feb 16, 2025

Conversation

PPakalns
Copy link
Contributor

@PPakalns PPakalns commented Feb 8, 2025

  1. Move wgpu::Buffer preperation to bevy render world system running in Prepare system set. This allows parallel execution with other Prepare system set systems. If it is done in render graph, execution is serial.
  2. Parallelize render command encoding.
  3. Optimize extraction of EguiRenderOutput to render world. Huge performance gain (More than 1ms in my case per frame). Instead of cloning whole array, Vec is wrapped inside an Arc and just simple Clone operation is done. This operation allows Main world and render world to resume processing instead of waiting until all data is cloned.
  4. Optimize RenderPass scissor rect handling. Bevy TrackedRenderPass doesn't track scissor rect changes and manual handling is needed to remove unnecessary commands.
  5. Remove unnecessary memory allocations (Frequent small vector creation) during vertex/index buffer preparation.

@PPakalns
Copy link
Contributor Author

PPakalns commented Feb 8, 2025

This should be merged after egui 0.31 support is merged.

@vladbat00
Copy link
Owner

Hi! This is awesome, thank you a lot for the PR!
I might need some time before I can review and test it, as I'm a bit too busy with IRL stuff atm, but I hope to get to the review this weekend.

/// and processed during [`egui_node::EguiNode`]'s `update`.
///
/// Value is wrapped in [`Arc`] to improve [`ExtractComponent`] performance.
pub paint_jobs: Arc<Vec<egui::ClippedPrimitive>>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what's the etiquette for drive-by commenting on random projects on the internet but:

Arc<[egui::ClippedPrimitive]> would avoid double allocation and I don't see the array being appended to after being created.

Apologies if this is unwanted nitpicking.

Copy link
Contributor Author

@PPakalns PPakalns Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviews are always welcome ;)

Suggested approach introduces additional full array memory copy.
Because slice in Arc<[T]> can contain only initialized members and Vec may contain additional capacity.
See source code: https://doc.rust-lang.org/src/alloc/sync.rs.html#3719
Maybe some compiler optimizations can optimize this (but I am not sure because memory allocators are involved), but current approach would work even in debug builds.

Moving Vec<> into Arc<Vec<>> guarantees that underlaying Vec memory is not touched, but only Vec structure that contains pointer, capacity, used length is copied into Arc. This is a lot faster.

Copy link
Contributor Author

@PPakalns PPakalns Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked in compiler explorer.

In suggested appraoch more allocation / memcpy operations are introduced:
https://godbolt.org/z/zx7js9E4q

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be possible to avoid copies/reallocations if Egui itself returned paint jobs as Box<[egui::ClippedPrimitive]> (but I'm not sure if there's a way to cast Box to Arc without copying) or just Arc<[egui::ClippedPrimitive]>. And I imagine that would involve some unsafe machinery to initialise such an array on the Egui end as well, which would probably bring only negligible gains anyway.

Arc<Vec<egui::ClippedPrimitive>> seems like the best option for now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I learned something today. There's no optimization in the stdlib for moving Box (e.g. Vec::into_boxed_slice) into an Arc because Arc sticks the refcounts in the allocation and so can't reuse Box's allocation.

Rust playground link

Sorry for the static, I'm fresh off the heals of converting a bunch of Vec, String and PathBufs to Box<[T]>, Box<str> and Box<Path> and saw modest gains overall so I guess I was overzealous.

@PPakalns PPakalns requested a review from nsabovic February 16, 2025 09:06
@vladbat00
Copy link
Owner

I've just tested it and confirmed the performance gains, thank you! When I ran the ui example on my mac, extracting render output was about 16 microseconds, with the changes in your PR it's now like a couple of microseconds. Probably not the best example to test the performance difference, but it is there! 😄

Thank you again!

@vladbat00 vladbat00 merged commit d351599 into vladbat00:main Feb 16, 2025
39 checks passed
gogo2077 pushed a commit to gogo2077/bevy_egui that referenced this pull request Feb 21, 2025
* update to egui 0.31

* update examples

* Support better parallelization for egui UI rendering

* Scissor rect optimization and do not allocate unneeded Vec

* Improve EguiRenderOutput extraction performance, fix index buffer bug.

* Fix cargo clippy suggestions

* Remove redundant logic to round up to power of 2

---------

Co-authored-by: tomara_x <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants