Add walqueue sharding #2665
`allowed_network_error_percent` | `float` | The allowed error rate before scaling down. | `0.50` | no

Parallelism determines when to scale up or down the number of desired connections. This is accomplished by a variety of inputs.
By determining the drift between the incoming and outgoing timestamps that will determine whether to increase or decrease the
Not sure if this connects / flows with previous sentence. Is it meant to be a bullet point?
Co-authored-by: Piotr <[email protected]>
…queue.md Co-authored-by: Piotr <[email protected]>
…queue.md Co-authored-by: Piotr <[email protected]>
…queue.md Co-authored-by: Piotr <[email protected]>
Parallelism determines when to scale up or down the number of desired connections. This is accomplished by a variety of inputs:

By determining the drift between the incoming and outgoing timestamps that will determine whether to increase or decrease the
desired connections. This is represented by `drift_scale_up_seconds` and `drift_scale_down_seconds`, if the drift is between these
two values then the value will stay the same.

Network success and failures are recorded and kept in memory, this helps determine
the nature of the drift. For instance if the drift is increasing but the network failures are increasing we should not increase
desired connections since that would only increase load on the endpoint.

Flapping prevention accomplished with `desired_check_interval`, each time a desired connection is calculated it is added to a list, before actually changing the
desired connection the system will choose the highest value in the lookback buffer. Example; for the past 5 minutes desired connections have been: [2,1,1] the check runs
and determines that the desired connections are 1, but will not change the value since the value 2 is still in the lookback. On the next check we have [1,1,1],
now it will change to 1. In general the system is fast to increase and slow to decrease.
Parallelism determines when to scale up or down the number of desired connections.
The drift between the incoming and outgoing timestamps determines whether to increase or decrease the desired connections.
The value stays the same if the drift is between `drift_scale_up_seconds` and `drift_scale_down_seconds`.
Network successes and failures are recorded and kept in memory.
This data helps determine the nature of the drift.
For example, if the drift is increasing and the network failures are increasing, the desired connections should not increase because that would increase the load on the endpoint.
The `desired_check_interval` prevents connection flapping.
Each time a desired connection is calculated, the connection is added to a list.
Before changing the desired connection, the system will choose the highest value in the lookback buffer.
For example, for the past 5 minutes, desired connections have been: [2,1,1].
The check determines that the desired connections are 1, and the number of desired connections will not change because the value `2` is still in the lookback buffer.
On the next check, the desired connections are [1,1,1].
Now, it will change to 1. In general, the system is fast to increase and slow to decrease.
This is a first pass at reworking the description here. I don't think I've accurately distilled what you were trying to explain here... so I expect we will have to go over this at least once more.
Questions...
- What is `the system` here? Alloy? The connector?
- In the 4th paragraph example, what changes to `1` when the lookback is `[1,1,1]`?
- Yes, but more specifically the component `prometheus.write.queue` so will be specific there.
- The `desired_connections` change to 1.
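For readers following the thread, the behavior under discussion can be sketched in Go. This is a minimal illustration, not the component's actual implementation: `driftDecision`, `lookback`, the step size of one connection per check, and a buffer bounded by entry count rather than by time are all assumptions made for the example. The network failure check is left out for brevity.

```go
package main

import "fmt"

// driftDecision is a hypothetical helper. It returns the connection count
// suggested by a single drift reading, before any lookback is applied.
// Assumption: the count moves by one connection per check.
func driftDecision(current int, driftSeconds, scaleUpSeconds, scaleDownSeconds float64) int {
	switch {
	case driftSeconds > scaleUpSeconds:
		return current + 1
	case driftSeconds < scaleDownSeconds && current > 1:
		return current - 1
	default:
		// Drift is between the two thresholds, so the value stays the same.
		return current
	}
}

// lookback is a hypothetical buffer holding the most recent desired values.
// Assumption: it is bounded by entry count, not by elapsed time.
type lookback struct {
	size   int
	values []int
}

// add records a newly calculated desired value and returns the highest
// value still in the buffer, which is the value that would be applied.
func (l *lookback) add(v int) int {
	l.values = append(l.values, v)
	if len(l.values) > l.size {
		l.values = l.values[len(l.values)-l.size:]
	}
	highest := l.values[0]
	for _, x := range l.values[1:] {
		if x > highest {
			highest = x
		}
	}
	return highest
}

func main() {
	lb := &lookback{size: 3}
	// Calculated values arrive as 2, 1, 1, 1. The applied value stays at 2
	// until the 2 leaves the buffer, then drops to 1.
	for _, v := range []int{2, 1, 1, 1} {
		fmt.Println(lb.add(v))
	}
}
```

With a buffer of three entries, the applied value only drops once every entry agrees on the lower number, while a single higher reading raises it immediately. That matches the "fast to increase and slow to decrease" description in the thread.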
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
…queue.md Co-authored-by: Clayton Cornell <[email protected]>
I'd say the docs for this new block are fine as-is now. I'll do a pass though later when I update the Prometheus topics for overall style/consistency... for now it matches the other Prometheus topics :-)
PR Description
This adds dynamic sharding configuration into prometheus.write.queue
Which issue(s) this PR fixes
Notes to the Reviewer
PR Checklist