-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Balanced channel measures RTT, hoping to stay within AZ (and save $) #794
Conversation
Generate changelog in
|
@Override | ||
public String toString() { | ||
return "SortableChannel{score=" + score + ", delegate=" + delegate + '}'; | ||
return "SortableChannel{" + "score=" + score + ", rtt=" + rtt + ", delegate=" + delegate + '}'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return "SortableChannel{" + "score=" + score + ", rtt=" + rtt + ", delegate=" + delegate + '}'; | |
return "SortableChannel{score=" + score + ", rtt=" + rtt + ", delegate=" + delegate + '}'; |
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
EDIT solved. |
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
@@ -56,13 +68,24 @@ | |||
final class BalancedNodeSelectionStrategyChannel implements LimitedChannel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the description so that it includes the additional parameters to our cost function
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would it take to support a flag to opt in/out of this feature for initial rollout now that we have excavated out BALANCED?
Added a few comments from an initial read through, still need to do a higher level pass.
.describedAs("%s: Constant 4xxs did move the needle %s", Duration.ofNanos(clock.read()), channel) | ||
.containsExactly(1, 0); | ||
|
||
incrementClockBy(Duration.ofSeconds(5)); | ||
|
||
assertThat(channel.getScores()) | ||
assertThat(channel.getScoresForTesting().map(c -> c.getScore())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peanut gallery: I want an error-prone rule that can refactor this automatically
assertThat(channel.getScoresForTesting().map(c -> c.getScore())) | |
assertThat(channel.getScoresForTesting().map(BalancedNodeSelectionStrategyChannel::getScore)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be pretty cool! Think we just need to improve https://github.com/palantir/gradle-baseline/blob/15661f333a76997d0a5b79a52af8767e01a86ca1/baseline-error-prone/src/main/java/com/palantir/baseline/errorprone/LambdaMethodReference.java#L45 to handle more than suppliers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there ya go: palantir/gradle-baseline#1359
dialogue-target/src/main/java/com/palantir/dialogue/HttpMethod.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
// can return in ~1 ms but others return in ~5ms, the 1ms nodes will all have a similar rttScore (near zero). | ||
// Note, this can only be computed when we have all the snapshots in front of us. | ||
long rttRange = worstRttNanos - bestRttNanos; | ||
if (bestRttNanos != Long.MAX_VALUE && worstRttNanos != 0 && rttRange > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a negative sentinel value, extracted to a static field to make the code easier to read? bestRttNanos != UNKNOWN_RTT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is actually the result of doing the min
and max
things above, so it's not actually an arbitrary sentinel. I tried using streams to compute these:
Arrays.stream(snapshotArray)
.filter(snapshot -> snapshot.rttNanos.isPresent())
.mapToLong(snapshot -> snapshot.rttNanos.getAsLong())
.summaryStatistics();
but the summaryStatistics also gives you a Long.MIN_VALUE/MAX_VALUE if there is no inputs.
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
...ogue-core/src/main/java/com/palantir/dialogue/core/BalancedNodeSelectionStrategyChannel.java
Outdated
Show resolved
Hide resolved
Arrays.fill(samples, newMeasurement); | ||
bestRttNanos = newMeasurement; | ||
} else { | ||
System.arraycopy(samples, 1, samples, 0, NUM_MEASUREMENTS - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of copying the array could we treat the array as a circular buffer keeping track of our current location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I considered this, but figured it was actually gonna be more readable to try and keep it simple here - especially when this codepath only gets invoked at most once per second, and in a non-blocking way :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks great! excited to try this out
👍 |
Before this PR
There's an outstanding FR to make dialogue smart enough to reduce our AWS spend (#686).
After this PR
There are a variety of ways to achieve this, calling the AWS API directly, baking in a hostname convention, but this PR implements a version which just measures the RTT for a HTTP OPTIONS call.
==COMMIT_MSG==
Balanced channel should now bias towards whichever node has the lowest latency, which should reduce AWS spend by routing requests within AZ.
==COMMIT_MSG==
TODO:
maybe just go back to HEAD, OPTIONS doesn't buy us muchPossible downsides?
after merging this, the graphs will probably show 'balanced' sending lots of traffic to one node (within AZ) and very little to another. this might confuse people.EDIT it's now opt-in