
[fix][broker] Fix rate limiter token bucket and clock consistency issues causing excessive throttling and connection timeouts #23930

Merged

Conversation

lhotari
Member

@lhotari lhotari commented Feb 5, 2025

Fixes #23920

Motivation

Prior to this PR, connections in Pulsar could experience excessive throttling and timeouts due to issues in the rate limiting implementation. These were caused both by calculation errors in the token bucket logic and by problems with clock source handling in virtualized environments. This PR addresses both issues to prevent unintended long throttling periods and connection timeouts.

Two key issues are addressed in this PR:

  1. Calculation Issue in AsyncTokenBucket:

    • The token calculation logic was flawed when reconciling consumed tokens with newly added tokens
    • When adding new tokens, the eventually consistent token calculation should subtract the consumed tokens before capping the balance at the bucket capacity limit. Because the new tokens were first added and capped at the capacity limit, with the consumed tokens subtracted only afterwards, the token balance could become strongly negative whenever the DefaultMonotonicSnapshotClock's thread did not run at the expected interval, for example due to high CPU usage (see the sketch after this list).
  2. Clock Source Consistency:

    • The rate limiting implementation in the Pulsar broker can be affected by clock source issues, since System.nanoTime() isn't strictly monotonic and consistent across multiple CPUs
    • This is a known issue in virtualized environments and certain hardware configurations, particularly:
      • Multi-socket systems where each CPU may have its own TSC (Time Stamp Counter)
      • When CPUs are affected by power management features and running at different frequencies
    • The previous implementation was exposed to this because System.nanoTime() was read from multiple threads.
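
To make the calculation issue concrete, here is a minimal single-threaded sketch of the corrected update order. The names and structure are simplified assumptions for illustration; the actual AsyncTokenBucket uses lock-free, eventually consistent updates.

```java
// Simplified sketch of the corrected update order; names are illustrative
// and do not match the actual lock-free AsyncTokenBucket implementation.
class TokenBucketSketch {
    private final long capacity;        // bucket capacity limit
    private long tokens;                // current token balance
    private long pendingConsumedTokens; // tokens consumed since the last update

    TokenBucketSketch(long capacity) {
        this.capacity = capacity;
        this.tokens = capacity;
    }

    void consumeTokens(long amount) {
        pendingConsumedTokens += amount;
    }

    void updateTokens(long newTokens) {
        // Before the fix, the balance was capped first and consumed tokens were
        // subtracted afterwards, which could drive the balance far negative when
        // the clock update thread ran late and newTokens covered a long interval:
        //   tokens = Math.min(tokens + newTokens, capacity) - pendingConsumedTokens;

        // After the fix, consumed tokens are subtracted before capping at the
        // capacity limit, so a delayed update cannot strand a large negative balance:
        tokens = Math.min(tokens + newTokens - pendingConsumedTokens, capacity);
        pendingConsumedTokens = 0;
    }
}
```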

For details about timekeeping challenges in virtualized environments, see the Linux kernel documentation on timekeeping.

Without this fix, when time leaps backwards (for example, because System.nanoTime() is read by a thread running on a different CPU with an unsynchronized TSC), the rate limiter would incorrectly throttle connections for the duration of the time leap. In virtualized environments, the time difference between CPU cores can be significant. Even though such large variances are uncommon, they can cause extended connection timeouts when they occur. This PR significantly improves reliability by preventing such time-variance-induced throttling entirely.

While both forward and backward time leaps can occur in these environments, only backward leaps are explicitly handled. Forward leaps cannot be reliably detected within the eventually consistent logic used in AsyncTokenBucket, and their impact is bounded: at most one rate period (default 1 second) of potential overuse. This design choice favors implementation simplicity and performance while still protecting against the more problematic backward leap case that could cause extended throttling. A minimal sketch of the backward-leap handling follows.
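
The sketch below shows how a single update thread and backward-leap handling fit together. It is a hedged illustration with invented names; the real DefaultMonotonicSnapshotClock differs in detail.

```java
// Sketch: a monotonic snapshot clock driven by a single dedicated thread.
// Names are illustrative; not the actual DefaultMonotonicSnapshotClock code.
class MonotonicSnapshotClockSketch {
    private volatile long snapshotTickNanos; // last published monotonic tick
    private long previousNanos;              // last raw reading (update thread only)

    MonotonicSnapshotClockSketch() {
        previousNanos = System.nanoTime();
    }

    // Called periodically from one dedicated thread, so all raw
    // System.nanoTime() readings come from a single thread (and typically
    // a single CPU), avoiding cross-CPU clock variance.
    void updateSnapshot() {
        long now = System.nanoTime();
        long elapsed = now - previousNanos;
        if (elapsed < 0) {
            elapsed = 0; // backward leap: never move the published tick backwards
        }
        previousNanos = now;
        // Only the update thread writes this field, so += is safe here;
        // volatile makes the new value visible to reader threads.
        snapshotTickNanos += elapsed;
    }

    // Readers on any thread observe an eventually consistent, monotonic tick.
    long getTickNanos() {
        return snapshotTickNanos;
    }
}
```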

Although this PR also covers clock source consistency issues, the most probable root cause of the reported issue #23920 is the calculation logic, which assumed that the DefaultMonotonicSnapshotClock's update thread would always run at a consistent interval.

Modifications

  1. AsyncTokenBucket improvements:

    • Fixed token calculation logic to properly handle consumed tokens:
      • Subtract pendingConsumedTokens from new tokens before capping at bucket capacity
      • This ensures correct token balance even when DefaultMonotonicSnapshotClock's update thread is delayed
    • Added comprehensive test coverage for negative balance and eventual consistency scenarios
    • Added test cases to verify proper token bucket behavior under various conditions
  2. DefaultMonotonicSnapshotClock enhancements:

    • Implemented a single dedicated thread approach to minimize clock inconsistencies
      • Since System.nanoTime() values can vary slightly between CPUs in multi-core/multi-socket systems, using a single thread ensures time values are read consistently from one CPU
      • This effectively eliminates time variance issues by avoiding cross-CPU clock reads
    • Added handling of backward time leaps in clock source
    • Forward time leaps are intentionally not handled since they cannot be reliably detected in the eventually consistent token bucket logic
    • Forward leaps have minimal impact: at most 1 second of potential overuse with the default rate period settings
    • Implemented a low-overhead synchronization mechanism for obtaining consistent clock values when required
    • The implementation is optimized for the common path while providing guarantees when needed
  3. Rate limiter implementations:

    • Updated PublishRateLimiterImpl to use proper executor for unthrottling
    • Added error handling for producer unthrottling
    • Modified DispatchRateLimiter and SubscribeRateLimiter to use the shared monotonic clock instance (see the usage sketch after this list)
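
As a usage sketch, the rate limiters derive elapsed time from the shared clock rather than calling System.nanoTime() on their own threads. The interface and names below are illustrative assumptions, not the actual Pulsar API.

```java
// Illustrative sketch: a rate limiter consuming a shared monotonic clock.
// The interface and names are assumptions, not the actual Pulsar API.
interface MonotonicClock {
    long getTickNanos();
}

class RateLimiterSketch {
    private final MonotonicClock clock;
    private long lastRefillTickNanos;

    RateLimiterSketch(MonotonicClock clock) {
        this.clock = clock;
        this.lastRefillTickNanos = clock.getTickNanos();
    }

    long elapsedNanosSinceLastRefill() {
        // All rate limiter instances read the same clock source, so
        // cross-CPU System.nanoTime() variance cannot skew the calculation.
        long now = clock.getTickNanos();
        long elapsed = now - lastRefillTickNanos;
        lastRefillTickNanos = now;
        return elapsed;
    }
}
```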

Performance

JMH results for AsyncTokenBucketBenchmark and DefaultMonotonicSnapshotClockBenchmark demonstrate high throughput (results on an Apple M3 Max):

Benchmark                                                                      Mode  Cnt            Score             Error  Units
AsyncTokenBucketBenchmark.consumeTokensBenchmark001Threads                    thrpt    3    235889973.656 ±    20309920.612  ops/s
AsyncTokenBucketBenchmark.consumeTokensBenchmark010Threads                    thrpt    3   2092331094.972 ±     4154666.893  ops/s
AsyncTokenBucketBenchmark.consumeTokensBenchmark100Threads                    thrpt    3   2423947092.794 ±  1185164011.463  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos001Threads                 thrpt    3   2031791421.024 ±   281179462.872  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos010Threads                 thrpt    3  19192224651.486 ±  3492465614.920  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos100Threads                 thrpt    3  22500950540.764 ± 15333188461.467  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot001Threads  thrpt    3     71259586.362 ±     7340997.362  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot010Threads  thrpt    3     22880953.901 ±     1521620.667  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot100Threads  thrpt    3     16236151.209 ±     5642180.371  ops/s

Results on a Dell XPS 15 7590 (i9-9980HK) running Linux:

Benchmark                                                                      Mode  Cnt            Score            Error  Units
AsyncTokenBucketBenchmark.consumeTokensBenchmark001Threads                    thrpt    3     54622720.701 ±    4953944.847  ops/s
AsyncTokenBucketBenchmark.consumeTokensBenchmark010Threads                    thrpt    3    361746065.391 ±  155306621.346  ops/s
AsyncTokenBucketBenchmark.consumeTokensBenchmark100Threads                    thrpt    3    410425438.712 ±  524401828.340  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos001Threads                 thrpt    3   1776174139.570 ±  604272160.441  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos010Threads                 thrpt    3  10480409935.377 ± 3681168439.649  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanos100Threads                 thrpt    3  10572683864.234 ±  783977752.777  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot001Threads  thrpt    3     30378314.738 ±   11642686.027  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot010Threads  thrpt    3    222770085.250 ±   42935412.356  ops/s
DefaultMonotonicSnapshotClockBenchmark.getTickNanosRequestSnapshot100Threads  thrpt    3    287689452.858 ±  153899991.642  ops/s
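
For context, these are standard multi-threaded JMH throughput benchmarks. The sketch below shows the benchmark pattern only; a LongAdder stands in for the token bucket's consume call, since wiring up a real AsyncTokenBucket is Pulsar-specific.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import org.openjdk.jmh.annotations.*;

// Sketch of the multi-threaded JMH throughput benchmark pattern.
// A LongAdder stands in for the token bucket's consume call here.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class TokenConsumeBenchmarkSketch {
    private final LongAdder pendingConsumedTokens = new LongAdder();

    // JMH runs this body concurrently on 10 threads and reports ops/s.
    @Threads(10)
    @Benchmark
    public void consumeTokensBenchmark010Threads() {
        pendingConsumedTokens.add(1);
    }
}
```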

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@codecov-commenter

codecov-commenter commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 95.09202% with 8 lines in your changes missing coverage. Please review.

Project coverage is 74.29%. Comparing base (bbc6224) to head (42fb876).
Report is 887 commits behind head on master.

Files with missing lines                                Patch %   Lines
...lsar/broker/qos/DefaultMonotonicSnapshotClock.java   96.66%    3 Missing ⚠️
.../pulsar/broker/service/PublishRateLimiterImpl.java   66.66%    2 Missing ⚠️
...org/apache/pulsar/broker/qos/AsyncTokenBucket.java   96.15%    0 Missing and 1 partial ⚠️
...rg/apache/pulsar/broker/service/BrokerService.java   75.00%    0 Missing and 1 partial ⚠️
...roker/service/persistent/SubscribeRateLimiter.java   83.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files


@@             Coverage Diff              @@
##             master   #23930      +/-   ##
============================================
+ Coverage     73.57%   74.29%   +0.72%     
+ Complexity    32624    31923     -701     
============================================
  Files          1877     1853      -24     
  Lines        139502   143855    +4353     
  Branches      15299    16357    +1058     
============================================
+ Hits         102638   106877    +4239     
+ Misses        28908    28586     -322     
- Partials       7956     8392     +436     
Flag       Coverage Δ
inttests   26.76% <35.58%> (+2.18%) ⬆️
systests   23.19% <35.58%> (-1.13%) ⬇️
unittests  73.82% <95.09%> (+0.97%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                Coverage Δ
...che/pulsar/broker/qos/AsyncTokenBucketBuilder.java   100.00% <100.00%> (+22.22%) ⬆️
...pulsar/broker/qos/DynamicRateAsyncTokenBucket.java   75.00% <100.00%> (-8.34%) ⬇️
...broker/qos/DynamicRateAsyncTokenBucketBuilder.java   62.50% <ø> (ø)
...e/pulsar/broker/qos/FinalRateAsyncTokenBucket.java   91.66% <100.00%> (ø)
...r/broker/qos/FinalRateAsyncTokenBucketBuilder.java   100.00% <ø> (ø)
...broker/service/persistent/DispatchRateLimiter.java   80.46% <100.00%> (+1.83%) ⬆️
...org/apache/pulsar/broker/qos/AsyncTokenBucket.java   92.85% <96.15%> (-0.11%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java   85.25% <75.00%> (+4.46%) ⬆️
...roker/service/persistent/SubscribeRateLimiter.java   56.79% <83.33%> (+2.24%) ⬆️
.../pulsar/broker/service/PublishRateLimiterImpl.java   91.30% <66.66%> (+5.15%) ⬆️
... and 1 more

... and 1032 files with indirect coverage changes

@lhotari
Copy link
Member Author

lhotari commented Feb 7, 2025

I fixed the flaky throttling tests and moved them to the broker-api group. There were multiple issues in the flaky tests.

Member

@nodece nodece left a comment

Overall, LGTM

@lhotari lhotari merged commit f8e4c11 into apache:master Feb 8, 2025
52 checks passed
lhotari added a commit that referenced this pull request Feb 8, 2025
…ues causing excessive throttling and connection timeouts (#23930)

(cherry picked from commit f8e4c11)
lhotari added a commit that referenced this pull request Feb 8, 2025
…ues causing excessive throttling and connection timeouts (#23930)

(cherry picked from commit f8e4c11)
Development

Successfully merging this pull request may close these issues.

[Bug] Producers getting timed out intermittently
5 participants