-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce address comparisons for network topology replica calculation #532
base: master
Are you sure you want to change the base?
Conversation
This uses a `DenseHashSet` to keep prevent duplicate replicas instead of doing a linear scan through the existing replicas. I'm seeing around a 4.5x speed up for larger replication factors (rf = 54).
size_t replication_factor = 3; | ||
size_t total_replicas = std::min(num_hosts, replication_factor) * num_dcs; | ||
size_t replication_factor = 54; | ||
size_t total_replicas = replication_factor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should likely be reverted. This is a pathological use case though.
@@ -79,6 +79,8 @@ class BufferBuilder { | |||
|
|||
static size_t size_of(const String& value) { return value.size(); } | |||
|
|||
static size_t size_of(const Address& value) { char buf[16]; return value.to_inet(buf); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes fix test warnings.
src/token_map_impl.hpp
Outdated
++replica_count_this_dc; | ||
} | ||
} else { | ||
TokenHostQueue& skipped_endpoints_this_dc = dc_rack_info.skipped_endpoints; | ||
if (racks_observed_this_dc.count(rack) > 0) { | ||
skipped_endpoints_this_dc.push_back(curr_token_it); | ||
} else { | ||
if (add_replica(replicas, Host::Ptr(host))) { | ||
if (replicas_set.insert(host).second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just embed the changes in the add_replica()
?
another option is to try keep replicas
sorted and use binary search? i.e. use
std::sort()
, or even std::stable_sort()
that should be faster for sorted or, almost sorted data, then just use std::lower_bound()
with custom Comparator
to check if the host needs to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The token order of replicas
matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still could embed the logic in to add_replica()
This uses a
DenseHashSet
to keep prevent duplicate replicas instead of doing a linear scan through the existing replicas. I'm seeing around a 4.5x speed up for larger replication factors (rf = 54).