Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency limiter controller #699

Open
imjasonh opened this issue Jan 26, 2021 · 19 comments
Open

Concurrency limiter controller #699

imjasonh opened this issue Jan 26, 2021 · 19 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@imjasonh
Copy link
Member

Opening this issue to collect ideas, discussion, interest, etc., for a supplemental PipelineRun controller (and possibly TaskRun controller?) that manages Pending PipelineRuns and update them to a Running state to limit execution concurrency.

We've heard a few use cases for limiting execution concurrency, but so far it's been hard to generalize the various needs into one single unified "concurrency" concept that we can apply across all of Tekton Pipelines. Some users might only want to have "deployment" pipeline running at a time, across the whole cluster. Others might want one "deployment" pipeline per namespace, or per deployment target (only one pipeline can deploy to Prod at a time, but you can deploy to Prod and Staging at the same time), or per input source (only deploy my Git repo to one place at a time), or per authorizing user (Alice can only deploy to one place at a time).

Users might also want to limit TaskRun concurrency, either when run as part of a PipelineRun or when executed directly.

We can experiment with supporting these various models and provide a runnable example of limiting concurrency, that users can adapt to their own needs.


As an initial idea, a concurrency controller could be configured with a ConfigMap describing a concurrency key format, and a concurrency limit:

kind: ConfigMap
metadata:
  name: concurrency-controller
data:
  concurrency-key: $(metadata.namespace)-$(spec.pipelineRef.name)
  concurrency-limit: 3

In this example, the key would limit the execution of PipelineRuns referencing the same Pipeline, running in the same namespace, to a max of 3. The concurrency controller would watch for Pending PipelineRuns, derive their keys, count ongoing PipelineRuns with the matching key, and choose to start the new Pending PipelineRun if count < limit. When a PipelineRun finishes, the concurrency controller would reevaluate any Pending runs, and choose one to start if it's under the limit.

(This is just one idea for describing this, if you have something else in mind please contribute it below)

@imjasonh imjasonh added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 26, 2021
@ghost
Copy link

ghost commented Jan 26, 2021

Here are the issues / PRs / TEPs related to this that I have seen so far:

Big +1 from my pov on making this a component external to Pipelines.

@bigkevmcd
Copy link
Member

Should there be some sort of load-shedding?

Can you queue PipelineRuns for ever? Do they timeout?

@imjasonh
Copy link
Member Author

imjasonh commented Jan 26, 2021

cc @jbarrick-mesosphere for his work on the Pending TEP

@imjasonh
Copy link
Member Author

imjasonh commented Jan 26, 2021

Should there be some sort of load-shedding?

Can you queue PipelineRuns for ever? Do they timeout?

Excellent question! This seems like another useful configuration for the limiter, max age before dropping it on the ground.

Users might also want to be able to describe/derive a priority, which would weight a Pending PipelineRun ahead of others in the same concurrency bucket. edit: Along with priority comes preemption -- e.g., a new high-priority Pending PipelineRun should cancel an ongoing run to make room for it.

Ultimately the deliverable here isn't a production-grade maximally configurable controller, just a minimally useful example that operators can potentially modify to their own needs.

@mjgallag
Copy link

mjgallag commented Feb 28, 2021

I'm currently facing this issue trying to do "branch preview", i.e. building and deploying each branch on every push to separate urls. Multiple pushes to multiple branches can run in parallel but multiple pushes to a single branch should be processed in order one at a time. I believe this use case would require the concurrency key format to have access to PipelineRun fields so that branch name could be included.

@julweber
Copy link

+1

1 similar comment
@eccox
Copy link

eccox commented Jul 1, 2021

+1

@dbazhal
Copy link

dbazhal commented Jul 8, 2021

Plusing simultaneous pipelineruns limit.

Would like something as simple as

kind: Pipeline
...
spec:
    runPolicy:
      type: Parallel
      parallel:
          maxLimit: 3

with alternatives as Sequential, and LatestOnly, first executing run requests in natural order, starting next one when previous finishes or cancelled, and last one cancelling any previous runs as the new run is created.

And i expect that this functionality is tekton operator domain, because it would be strange if some external would decide should pipeline operator start processing next run, or should it wait.

I assume pending state is for situations like this.

I refer to openshift operators processing parallelism for BuildConfigs and Builds as straight analogy and good example how it should be done.

https://docs.okd.io/4.7/cicd/builds/advanced-build-operations.html#builds-build-run-policy_advanced-build-operations

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/build/build_controller.go#L714

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/policy/serial.go

@dbazhal
Copy link

dbazhal commented Jul 8, 2021

I suppose run policy is also somehow connected with tektoncd/operator#209

@dbazhal
Copy link

dbazhal commented Jul 8, 2021

As an alternative to operator functionality, i can make pipeline runs lock on something with first task of pipeline, and release lock with the finally. But it would break any pipeline run timing metrics as pipelines will start running much longer(including lock release wait time). I'd like pipeline duration numbers contain only "useful" info, showing how long task execution took, but not how long pipeline was waiting for another run to complete.

@juliaaano
Copy link

This is an important feature I have found and used in most CI systems.

Usually it is not affordable having two pipeline runs running at the same time if they modify a shared resource, such as if they result in api calls to a single instance of a system.

An approach like the one used in GitHub Actions seems an elegant way of implementing this feature: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#concurrency

@dmikalova
Copy link

dmikalova commented Oct 26, 2021

This would be useful for serializing Terraform runs:

  • If I have several triggers around the same time Terraform should only run one at a time in the order they came in.
  • An option to cancel intermediate runs - so that any pending runs are cancelled by newer pending runs, but running runs are not cancelled.
  • The queue should be keyed - it's not so much the Terraform pipeline that needs to be serialized, but Terraform runs of a specific key that need to be serialized.

I was able to implement this in Jenkins but the syntax for it was torturous.

@tekton-robot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2022
@dbazhal
Copy link

dbazhal commented Jan 24, 2022

/remove-lifecycle stale
/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 24, 2022
@david972
Copy link

david972 commented Nov 2, 2022

+1

@shaharb-hs
Copy link

👍

@AshwinSridharan0410
Copy link

Hi. I would like to run my databases parallely so that when I give the flyway command, it should happen parallely to all the databases. I dont want the process to happen sequentially.Any idea would be helpful

@emirot
Copy link

emirot commented Jul 31, 2023

Any updates on that ?
Found that workaround https://holly-k-cummins.medium.com/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297
but this is not native and does not have ordering.

@jimmyjones2
Copy link

With TEP-0135 coscheduling mode it'll delete PVCs when PipelineRuns are finished. Maybe adding a ResourceQuota for number of PVC will therefore limit the number of concurrent PipelineRuns to that limit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Todo
Status: Todo
Development

No branches or pull requests