register_component / register_resource calls cause noticeable growth in compile time #643

Open
philpax opened this issue Sep 13, 2024 · 3 comments

philpax commented Sep 13, 2024

I was investigating why our protocol crate was so slow to build, and narrowed it down to register_component / register_resource. To reproduce this, I created compiletime_test as an example in examples:

examples/compiletime_test/Cargo.toml:

[package]
name = "compiletime_test"
version = "0.1.0"
edition = "2021"

[dependencies]
lightyear = { path = "../../lightyear", features = ["webtransport"] }
bevy = { version = "0.14", default-features = false, features = [
    "multi_threaded",
    "bevy_state",
    "serialize",
] }
serde = { version = "1.0", features = ["derive"] }

examples/compiletime_test/src/main.rs:

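use bevy::prelude::*;
use lightyear::prelude::*;
// Needed for the Serialize / Deserialize derives used in the macro below
use serde::{Deserialize, Serialize};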

macro_rules! register_components {
    ($($component:ident)*) => {
        fn register_components(app: &mut App) {
            $(
                #[derive(Component, Serialize, Deserialize, PartialEq, Clone)]
                struct $component;

                app.register_component::<$component>(ChannelDirection::Bidirectional);
            )*

            $(
                app.world_mut().spawn(($component,));
            )*
        }
    };
}

register_components!(C1);
// register_components!(C1 C2 C3 C4 C5);
// register_components!(C1 C2 C3 C4 C5 C6 C7 C8 C9 C10);
// register_components!(C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36 C37 C38 C39 C40 C41 C42 C43 C44 C45 C46 C47 C48 C49 C50);
// register_components!(C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36 C37 C38 C39 C40 C41 C42 C43 C44 C45 C46 C47 C48 C49 C50 C51 C52 C53 C54 C55 C56 C57 C58 C59 C60 C61 C62 C63 C64 C65 C66 C67 C68 C69 C70 C71 C72 C73 C74 C75 C76 C77 C78 C79 C80 C81 C82 C83 C84 C85 C86 C87 C88 C89 C90 C91 C92 C93 C94 C95 C96 C97 C98 C99 C100);
// n = 1000 invocation elided

fn main() {
    let mut app = App::new();
    register_components(&mut app);
    app.run();
}

I then rebuilt it with the following command, which keeps the dependencies built but forces the binary itself to be rebuilt from scratch:

rm -rf target/debug/.fingerprint/compiletime_test* && rm -rf target/debug/incremental/compiletime_test* && rm -rf target/debug/compiletime_test* && cargo build -p compiletime_test

Here's the data I collected:

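| n    | without register_component (s) | with register_component (s) |
|------|--------------------------------|-----------------------------|
| 1    | 3.00                           | 9.13                        |
| 5    | 3.00                           | 10.57                       |
| 10   | 3.13                           | 12.09                       |
| 50   | 3.28                           | 24.09                       |
| 100  | 3.88                           | 41.87                       |
| 1000 | 14.28                          | 391.00                      |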
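
To put a rough number on the growth, here is a small sketch that computes the average extra build time per added component relative to the n = 1 row. It assumes the timings in the table above are in seconds; the names `TIMINGS`, `marginal_cost` and `marginal_costs` are made up for this illustration.

```rust
/// (n, time without register_component, time with register_component),
/// copied from the table above. Assumed to be seconds.
const TIMINGS: [(u32, f64, f64); 6] = [
    (1, 3.00, 9.13),
    (5, 3.00, 10.57),
    (10, 3.13, 12.09),
    (50, 3.28, 24.09),
    (100, 3.88, 41.87),
    (1000, 14.28, 391.00),
];

/// Average extra build time per added component, measured from a baseline row.
fn marginal_cost(n: u32, time: f64, base_n: u32, base_time: f64) -> f64 {
    (time - base_time) / f64::from(n - base_n)
}

/// For every row after the first, returns
/// (n, per-component cost without registration, per-component cost with registration).
fn marginal_costs() -> Vec<(u32, f64, f64)> {
    let (base_n, base_without, base_with) = TIMINGS[0];
    TIMINGS[1..]
        .iter()
        .map(|&(n, without, with)| {
            (
                n,
                marginal_cost(n, without, base_n, base_without),
                marginal_cost(n, with, base_n, base_with),
            )
        })
        .collect()
}
```

With these numbers, each additional registered component costs roughly 0.3 to 0.4 s of build time, against about 0.01 s or less when the component is only declared and spawned. The growth looks close to linear in n.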

Plotted, just for the fun of it:
image

This is also true of register_resource. I tested with

        fn register_components(app: &mut App) {
            $(
                #[derive(Resource, Serialize, Deserialize, PartialEq, Clone)]
                struct $component;

                app.register_resource::<$component>(ChannelDirection::Bidirectional);
            )*

            $(
                app.world_mut().insert_resource($component);
            )*
        }

as the body of the macro, and found the following (didn't test all of the cases):

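| n  | without register_resource (s) | with register_resource (s) |
|----|-------------------------------|----------------------------|
| 5  | 3.12                          | 11.99                      |
| 50 | 3.22                          | 25.11                      |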

My suspicion is that this is due to monomorphisation generating junk code, but I haven't been able to confirm that yet.
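
For anyone unfamiliar with the mechanism, here is a std-only sketch of why monomorphisation would scale with the number of registered types. The `register` function is a hypothetical stand-in, not lightyear's API: the compiler emits a separate copy of its body for every `T` it is called with, so N component types mean N copies of everything reachable from it.

```rust
use std::any::type_name;

/// Hypothetical stand-in for a generic registration function.
/// One copy of this body is generated per distinct `T`.
fn register<T: Default + 'static>(registry: &mut Vec<&'static str>) {
    let _instance = T::default();
    registry.push(type_name::<T>());
}

/// Declares one unit struct per identifier and registers each of them,
/// mirroring the shape of the `register_components!` macro above.
macro_rules! register_all {
    ($($component:ident)*) => {
        fn register_all(registry: &mut Vec<&'static str>) {
            $(
                #[derive(Default)]
                struct $component;

                // Each call instantiates `register` for a new type.
                register::<$component>(registry);
            )*
        }
    };
}

register_all!(C1 C2 C3);

/// Runs the registration and returns the recorded type names.
fn registered_names() -> Vec<&'static str> {
    let mut registry = Vec::new();
    register_all(&mut registry);
    registry
}
```

Here three component types produce three instantiations of `register`. In the real crate each instantiation would also pull in its own copies of the generic systems and closures it touches.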

@cBournhonesque
Owner

Yes, a large amount of code is currently generated on registration! I think it's possible to cut this down by type-erasing some of the systems (message replication, prediction, interpolation), similar to what is already done for replication.
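
As a rough illustration of what type erasure means here, the following std-only sketch keeps the per-type code down to a small shim stored as a function pointer, while the logic that uses it is non-generic and compiled once. `Registry`, `ErasedFns` and `debug_fmt_typed` are hypothetical names for this example and are not lightyear's actual types.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::fmt::Debug;

/// Hypothetical type-erased entry: a plain function pointer instead of a generic system.
struct ErasedFns {
    debug_fmt: fn(&dyn Any) -> String,
}

/// The only per-type code: a small shim that downcasts and forwards.
fn debug_fmt_typed<T: Any + Debug>(value: &dyn Any) -> String {
    let typed = value
        .downcast_ref::<T>()
        .expect("entry registered under the wrong TypeId");
    format!("{typed:?}")
}

#[derive(Default)]
struct Registry {
    fns: HashMap<TypeId, ErasedFns>,
}

impl Registry {
    /// Generic but tiny: it only stores a function pointer.
    fn register<T: Any + Debug>(&mut self) {
        self.fns.insert(
            TypeId::of::<T>(),
            ErasedFns {
                debug_fmt: debug_fmt_typed::<T>,
            },
        );
    }

    /// Non-generic: compiled once, however many types are registered.
    fn describe(&self, value: &dyn Any) -> Option<String> {
        let fns = self.fns.get(&value.type_id())?;
        Some((fns.debug_fmt)(value))
    }
}
```

The trade-off is an indirect call and a downcast at runtime in exchange for far less generated code per registered type.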

@cBournhonesque
Owner

cBournhonesque commented Jan 5, 2025

I'll add your benchmark to the repo; I can reproduce roughly the same results.
image

I tried removing the register_component_send function, and it made a big difference in compile times (from 16.6 s to 4.7 s).

I've been using cargo llvm-lines to check the number of monomorphized lines:

torch ❯ CARGO_PROFILE_RELEASE_LTO=fat cargo llvm-lines | grep lightyear | head -30
   Compiling lightyear v0.18.0 (/Users/cbournhonesque/Snapchat/dev/rust/lightyear/lightyear)
   Compiling compiletime v0.18.0 (/Users/cbournhonesque/Snapchat/dev/rust/lightyear/benches/compiletime)
    Finished `dev` profile [optimized + debuginfo] target(s) in 1m 02s
    21650 (1.0%, 45.4%)     50 (0.1%, 13.3%)  lightyear::server::replication::send::send_component_removed::{{closure}}
    13900 (0.6%, 59.5%)     50 (0.1%, 22.3%)  lightyear::client::replication::send::send_component_removed
    12350 (0.5%, 63.5%)     50 (0.1%, 25.4%)  lightyear::protocol::component::register_component_send
    11950 (0.5%, 64.5%)     50 (0.1%, 26.1%)  lightyear::protocol::component::replication::<impl lightyear::protocol::component::ComponentRegistry>::write
     6600 (0.3%, 76.1%)    300 (0.6%, 40.4%)  lightyear::shared::events::systems::push_component_events::{{closure}}
     5750 (0.3%, 78.1%)     50 (0.1%, 42.0%)  <bevy_app::app::App as lightyear::protocol::component::AppComponentExt>::register_component::{{closure}}
     3700 (0.2%, 84.7%)    100 (0.2%, 53.0%)  lightyear::shared::events::systems::push_component_events
     3600 (0.2%, 85.6%)     50 (0.1%, 54.2%)  lightyear::protocol::component::serialize::<impl lightyear::protocol::component::ComponentRegistry>::raw_deserialize
     3550 (0.2%, 85.9%)     50 (0.1%, 54.4%)  lightyear::protocol::registry::TypeMapper<K>::add
     3400 (0.1%, 86.3%)     50 (0.1%, 54.6%)  lightyear::protocol::serialize::ErasedSerializeFns::typed
     3350 (0.1%, 86.4%)     50 (0.1%, 54.7%)  lightyear::protocol::serialize::ErasedSerializeFns::deserialize
     3200 (0.1%, 86.8%)     50 (0.1%, 56.0%)  <lightyear::server::events::ServerEvents as lightyear::shared::events::connection::IterComponentInsertEvent<lightyear::connection::id::ClientId>>::iter_component_insert::{{closure}}
     3200 (0.1%, 87.0%)     50 (0.1%, 56.1%)  <lightyear::server::events::ServerEvents as lightyear::shared::events::connection::IterComponentRemoveEvent<lightyear::connection::id::ClientId>>::iter_component_remove::{{closure}}
     3200 (0.1%, 87.1%)     50 (0.1%, 56.2%)  <lightyear::server::events::ServerEvents as lightyear::shared::events::connection::IterComponentUpdateEvent<lightyear::connection::id::ClientId>>::iter_component_update::{{closure}}
     3100 (0.1%, 87.5%)    250 (0.5%, 56.9%)  lightyear::server::replication::send::send_component_removed::{{closure}}::{{closure}}
     2550 (0.1%, 89.1%)     50 (0.1%, 59.8%)  <lightyear::shared::events::connection::ConnectionEvents as lightyear::shared::events::connection::IterComponentInsertEvent>::iter_component_insert
     2550 (0.1%, 89.2%)     50 (0.1%, 59.9%)  <lightyear::shared::events::connection::ConnectionEvents as lightyear::shared::events::connection::IterComponentRemoveEvent>::iter_component_remove
     2550 (0.1%, 89.3%)     50 (0.1%, 60.0%)  <lightyear::shared::events::connection::ConnectionEvents as lightyear::shared::events::connection::IterComponentUpdateEvent>::iter_component_update
     2350 (0.1%, 90.4%)     50 (0.1%, 62.6%)  lightyear::protocol::component::replication::<impl lightyear::protocol::component::ComponentRegistry>::set_replication_fns
     2200 (0.1%, 90.9%)     50 (0.1%, 63.1%)  lightyear::protocol::serialize::erased_serialize_fn
     1700 (0.1%, 92.1%)     50 (0.1%, 65.9%)  lightyear::protocol::serialize::ErasedSerializeFns::new
     1700 (0.1%, 92.2%)     50 (0.1%, 66.0%)  lightyear::protocol::serialize::serialize_map_entities
     1550 (0.1%, 92.6%)     50 (0.1%, 67.1%)  lightyear::protocol::serialize::default_serialize
     1350 (0.1%, 93.4%)    100 (0.2%, 70.7%)  lightyear::shared::events::components::ComponentInsertEvent<C,Ctx>::new
     1350 (0.1%, 93.4%)    100 (0.2%, 70.9%)  lightyear::shared::events::components::ComponentRemoveEvent<C,Ctx>::new
     1350 (0.1%, 93.5%)    100 (0.2%, 71.1%)  lightyear::shared::events::components::ComponentUpdateEvent<C,Ctx>::new
     1300 (0.1%, 93.6%)     50 (0.1%, 71.2%)  lightyear::protocol::serialize::default_deserialize
     1100 (0.0%, 94.5%)     50 (0.1%, 75.8%)  lightyear::protocol::component::ComponentRegistry::net_id::{{closure}}
     1000 (0.0%, 94.9%)     50 (0.1%, 78.2%)  lightyear::protocol::component::ComponentRegistry::register_component
     1000 (0.0%, 95.0%)     50 (0.1%, 78.3%)  lightyear::server::replication::send::send_component_removed

One possible trick that I found is to use a non-generic inner function to reduce the amount of monomorphized code.
I tried it in the send_component_removed function but it didn't make too much of a difference.

    /// Send component remove message when a component gets removed
    pub(crate) fn send_component_removed<C: Component>(
        trigger: Trigger<OnRemove, C>,
        registry: Res<ComponentRegistry>,
        mut sender: ResMut<ConnectionManager>,
        // only remove the component for entities that are being actively replicated
        query: Query<
            (&ReplicationGroup, Option<&DisabledComponents>),
            (With<Replicating>, With<ReplicateToServer>),
        >,
    ) {
        let entity = trigger.entity();
        let kind = ComponentKind::of::<C>();
        let net_id = registry.net_id::<C>();
        trace!(?entity, kind = ?std::any::type_name::<C>(), "Sending RemoveComponent");

        // Inner function with no generics to reduce compilation time
        // https://www.possiblerust.com/pattern/non-generic-inner-functions
        fn send_component_removed_inner(
            mut entity: Entity,
            registry: &ComponentRegistry,
            sender: &mut ConnectionManager,
            query: &Query<
                (&ReplicationGroup, Option<&DisabledComponents>),
                (With<Replicating>, With<ReplicateToServer>),
            >,
            kind: ComponentKind,
            net_id: ComponentNetId,
        ) {
            // convert the entity to a network entity (possibly mapped)
            entity = sender
                .replication_receiver
                .remote_entity_map
                .to_remote(entity);
            if let Ok((group, disabled_components)) = query.get(entity) {
                // do not replicate components (even removals) that are disabled
                if disabled_components
                    .is_some_and(|disabled_components| !disabled_components.enabled_kind(kind))
                {
                    return;
                }
                let group_id = group.group_id(Some(entity));
                sender
                    .replication_sender
                    .prepare_component_remove(entity, group_id, net_id);
            }
        }
        send_component_removed_inner(
            entity,
            registry.as_ref(),
            sender.as_mut(),
            &query,
            kind,
            net_id,
        );
    }

It reduces the number of lines for that function by about 25% (from 13900 to 10500):

10500 (0.5%, 66.5%)     50 (0.1%, 28.6%)  lightyear::client::replication::send::send_component_removed
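
For reference, here is the same non-generic inner function pattern in isolation, as a minimal std-only sketch. `count_words` is a made-up example and is unrelated to lightyear.

```rust
/// Generic over anything string-like, so it is instantiated once per caller type.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Non-generic inner function: compiled once and shared by every instantiation.
    fn count_words_inner(text: &str) -> usize {
        text.split_whitespace().count()
    }

    // The generic outer shell only performs the cheap conversion.
    count_words_inner(text.as_ref())
}
```

Calling `count_words("a b c")` and `count_words(String::from("one two"))` creates two instantiations of the thin outer shell, but only one copy of the inner body. The saving is limited to whatever part of the function can be moved behind the non-generic boundary.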

I will have to think of more ways to reduce the amount of generated code.

@cBournhonesque
Owner

Actually just removing the replication Events code (ComponentInsertEvent, ComponentRemoveEvent, etc.) results in a significant reduction in compile time, since they generate a lot of code.
image

I might gate them behind a feature, or just get rid of them altogether. I'm not sure that they provide much compared to the Bevy Changed, Removed, and Added filters.
