[FM-751] Add system task offset evaluation strategy #179

Merged
merged 6 commits into from
Jan 8, 2025

@jaro0149 (Collaborator) commented Nov 11, 2024

FM-751 Add system task offset evaluation strategy

Pull Request type

  • Bugfix
  • Feature
  • Refactoring (no functional changes, no api changes)
  • Build related changes (Please run ./gradlew generateLock saveLock to refresh dependencies)
  • WHOSUSING.md
  • Other (please describe):

Changes in this PR

- Added an option to customize, per task type, the strategy used to
  compute the offset of a postponed system task:

  conductor.app.system-task-offset-evaluation.[task-type]=[strategy]

  [task-type] - type of the task, e.g. join, simple, ...
  [strategy] - strategy used to compute the system task offset;
    the currently supported options are:
    a. 'constant_default_offset'
    b. 'backoff_to_default_offset'
    c. 'scaled_by_queue_size'
    d. 'scaled_by_task_duration'
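A minimal sketch of how these per-task-type keys could be resolved. It assumes the properties arrive as a flat 'Map<String, String>' and that a task type without a key falls back to the defaults described in this PR ('backoff_to_default_offset' for 'join', 'constant_default_offset' for every other task type). 'OffsetStrategy' and 'OffsetStrategyResolver' are hypothetical names, not Conductor's actual classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical enum mirroring the strategy names listed in this PR;
// not Conductor's actual type.
enum OffsetStrategy {
    CONSTANT_DEFAULT_OFFSET,
    BACKOFF_TO_DEFAULT_OFFSET,
    SCALED_BY_QUEUE_SIZE,
    SCALED_BY_TASK_DURATION;

    // Maps a property value such as "scaled_by_queue_size" to an enum constant;
    // throws IllegalArgumentException for unknown names.
    static OffsetStrategy fromPropertyValue(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

// Hypothetical resolver for per-task-type overrides (assumption, not the PR's code).
final class OffsetStrategyResolver {
    static final String PREFIX = "conductor.app.system-task-offset-evaluation.";

    private final Map<String, OffsetStrategy> overrides = new HashMap<>();

    OffsetStrategyResolver(Map<String, String> properties) {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX)) {
                // Task types are compared case-insensitively, so "join" and "JOIN" match.
                String taskType = key.substring(PREFIX.length()).toLowerCase(Locale.ROOT);
                overrides.put(taskType, OffsetStrategy.fromPropertyValue(entry.getValue()));
            }
        }
    }

    // Returns the configured strategy, or the defaults stated in this PR:
    // 'backoff_to_default_offset' for join, 'constant_default_offset' otherwise.
    OffsetStrategy resolve(String taskType) {
        String normalized = taskType.toLowerCase(Locale.ROOT);
        OffsetStrategy configured = overrides.get(normalized);
        if (configured != null) {
            return configured;
        }
        return normalized.equals("join")
                ? OffsetStrategy.BACKOFF_TO_DEFAULT_OFFSET
                : OffsetStrategy.CONSTANT_DEFAULT_OFFSET;
    }
}
```

For example, with conductor.app.system-task-offset-evaluation.join=scaled_by_queue_size, this sketch resolves 'JOIN' to SCALED_BY_QUEUE_SIZE, while other task types keep 'constant_default_offset'.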

- 'constant_default_offset' - uses the configured value of the
  'systemTaskWorkerCallbackDuration' property as a constant offset;
  by default, it is used by all system tasks except 'join'
- 'backoff_to_default_offset' - scales the offset exponentially (2^n)
  with the task's poll count, capped at the value of the
  'systemTaskWorkerCallbackDuration' configuration property;
  by default, it is used by the 'join' system task
- 'scaled_by_queue_size' - scales the offset exponentially (2^n) based
  on the task's poll count and the current queue size, up to:
  a. the 'backoff_to_default_offset' value, if the queue size is 0
  b. 'backoff_to_default_offset' * 'queue_size', otherwise
  this strategy is not used in the default configuration
- 'scaled_by_task_duration' - computes the evaluation offset of
  a postponed task from the task's duration, using settings that
  define the offset for different levels of task duration
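The first three strategies can be sketched as plain functions. This is one reading of the descriptions above, not the PR's implementation: offsets are assumed to be in seconds, 'defaultOffset' stands for 'systemTaskWorkerCallbackDuration', n is taken to be the raw poll count, and for 'scaled_by_queue_size' the queue size is assumed to raise the cap rather than multiply the result. The class and method names are hypothetical.

```java
// Hypothetical sketches of three offset strategies (one reading of the PR text).
// All values are in seconds; 'defaultOffset' stands for systemTaskWorkerCallbackDuration.
final class OffsetStrategies {
    private OffsetStrategies() {}

    // 2^pollCount, saturating at Long.MAX_VALUE instead of overflowing.
    private static long pow2(int pollCount) {
        if (pollCount >= 63) {
            return Long.MAX_VALUE;
        }
        return 1L << Math.max(pollCount, 0);
    }

    // 'constant_default_offset': always the configured callback duration.
    static long constantDefaultOffset(long defaultOffset) {
        return defaultOffset;
    }

    // 'backoff_to_default_offset': 2^n in the poll count, capped at the default offset.
    static long backoffToDefaultOffset(int pollCount, long defaultOffset) {
        return Math.min(pow2(pollCount), defaultOffset);
    }

    // 'scaled_by_queue_size' (assumed reading): 2^n in the poll count, capped at the
    // default offset for an empty queue and at defaultOffset * queueSize otherwise.
    static long scaledByQueueSize(int pollCount, long queueSize, long defaultOffset) {
        long cap = queueSize <= 0 ? defaultOffset : Math.multiplyExact(defaultOffset, queueSize);
        return Math.min(pow2(pollCount), cap);
    }
}
```

With a 30-second default and 100 queued tasks, this sketch gives backoffToDefaultOffset(10, 30) = 30 but scaledByQueueSize(10, 100, 30) = 1024, and the latter keeps growing until it reaches 3000 seconds, so long-blocked tasks in a large queue are re-evaluated far less often.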

Reasoning:
- The new strategies were implemented primarily to solve performance
  issues on join queues that contain a large number of join tasks
  blocked for days or weeks by wait/human tasks in some of their forks.
- The set of strategies can easily be extended in the future
  while preserving backward compatibility.
- Improved configurability of the task offset evaluation.

Alternatives considered

https://nitish1503.medium.com/decoding-challenges-with-netflix-conductor-6a623b47291f - this approach would require changes to Conductor's core architecture that are too large

- The new 'scaled_by_queue_size' strategy is appropriate for relatively
  large queues (hundreds to thousands of tasks) that contain
  long-running tasks (days to weeks) with high poll counts.

@jaro0149 added the enhancement (New feature or request) label on Nov 11, 2024
jaro0149 and others added 5 commits November 11, 2024 14:58
- from BACKOFF_TO_DEFAULT_OFFSET
- to SCALED_BY_QUEUE_SIZE
- goal: cleaner goals, separated configuration and implementation
  aspects
- ConductorProperties can be injected directly into strategy
  implementations, which are represented by Spring components
- introduced TaskOffsetEvaluationSelector, which allows other
  components to load the implementation of a specific strategy
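A hypothetical sketch of the selector idea: each strategy implementation reports which strategy it implements, and the selector indexes the implementations in an 'EnumMap' so that other components can look one up. The names 'OffsetStrategy', 'OffsetEvaluation' and 'OffsetEvaluationSelector' and all method signatures are assumptions; the actual TaskOffsetEvaluationSelector API may differ.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical strategy identifiers (assumption, not Conductor's actual enum).
enum OffsetStrategy {
    CONSTANT_DEFAULT_OFFSET, BACKOFF_TO_DEFAULT_OFFSET, SCALED_BY_QUEUE_SIZE, SCALED_BY_TASK_DURATION
}

// Hypothetical contract that each strategy implementation would fulfil; in a
// Spring application, implementations could be @Component beans.
interface OffsetEvaluation {
    OffsetStrategy strategy();

    long computeOffsetSeconds(int pollCount, long queueSize);
}

// Hypothetical selector in the spirit of TaskOffsetEvaluationSelector:
// indexes the available implementations by the strategy they implement.
final class OffsetEvaluationSelector {
    private final Map<OffsetStrategy, OffsetEvaluation> byStrategy =
            new EnumMap<>(OffsetStrategy.class);

    // Spring can inject every bean of a given type as a List via constructor injection.
    OffsetEvaluationSelector(List<OffsetEvaluation> implementations) {
        for (OffsetEvaluation impl : implementations) {
            if (byStrategy.putIfAbsent(impl.strategy(), impl) != null) {
                throw new IllegalStateException("Duplicate implementation for " + impl.strategy());
            }
        }
    }

    // Returns the implementation for the requested strategy or fails loudly.
    OffsetEvaluation select(OffsetStrategy strategy) {
        OffsetEvaluation impl = byStrategy.get(strategy);
        if (impl == null) {
            throw new IllegalArgumentException("No implementation registered for " + strategy);
        }
        return impl;
    }
}
```

Failing fast on duplicate or missing implementations surfaces a misconfigured strategy when the selector is built or first used, instead of silently falling back to another offset.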
- Computes the evaluation offset for a postponed task based
  on the task's duration and settings that define the offset
  for different levels of task durations.
- In this strategy, the offset increases in steps based on settings
  that define the offset for different levels of task duration.
  The task duration is derived from
  'TaskModel#getScheduledTime()' and the current time.
- This strategy is appropriate for tasks whose durations vary
  widely and whose offset should therefore scale with the task's
  duration.
- The keys defined in the settings delimit the duration intervals
  for which the offset is set to the corresponding value:
  [0, d1) = 0, [d1, d2) = value(d1), [d2, d3) = value(d2).
- The order of the keys is not important, as the map is sorted by
  key before the evaluation.
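A minimal sketch of the step function described above, using 'TreeMap.floorEntry' to find the interval [dN, dN+1) that contains the task duration. It assumes that the settings map duration thresholds to offsets, both in seconds, that the scheduled time is in epoch milliseconds, and that durations beyond the largest key keep that key's offset. The class name 'DurationScaledOffset' is hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of 'scaled_by_task_duration'. Keys are task-duration
// thresholds and values are offsets, both assumed to be in seconds.
final class DurationScaledOffset {
    private final NavigableMap<Long, Long> offsetsByDuration;

    DurationScaledOffset(Map<Long, Long> settings) {
        // Copying into a TreeMap sorts the keys, so the input order does not matter.
        this.offsetsByDuration = new TreeMap<>(settings);
    }

    // Returns the offset of the interval [dN, dN+1) containing the task duration,
    // or 0 when the duration is shorter than the smallest configured key.
    long offsetSeconds(long scheduledTimeMillis, long nowMillis) {
        long durationSeconds = Math.max(0L, (nowMillis - scheduledTimeMillis) / 1000L);
        Map.Entry<Long, Long> entry = offsetsByDuration.floorEntry(durationSeconds);
        return entry == null ? 0L : entry.getValue();
    }
}
```

With settings {60: 5, 3600: 60, 86400: 600}, this sketch gives a task scheduled 30 seconds ago an offset of 0, one scheduled 2 hours ago an offset of 60, and one scheduled 3 days ago an offset of 600.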
@jaro0149 merged commit 54a66d3 into master on Jan 8, 2025
2 checks passed
@jaro0149 deleted the fm-751 branch on January 8, 2025 09:38
Labels
enhancement New feature or request
3 participants