Implement Concurrent Action Execution in `executeTask` #85

naveensrinivasan · 2024-02-26T19:22:55Z

Body:

Summary

The current implementation of the executeTask method in runner.go executes each action within a task sequentially. This approach ensures simplicity and straightforward execution flow but may not efficiently utilize system resources, especially when actions are independent and could be executed concurrently. This proposal suggests enhancing the executeTask method to support concurrent execution of actions, potentially improving performance and reducing task completion time.

Motivation

Tasks that contain multiple independent actions can benefit significantly from concurrent execution. In the current sequential execution model, the total execution time is the sum of all actions' execution times. By executing actions concurrently, we can reduce the total execution time to the duration of the longest-running action, assuming sufficient system resources. This improvement is particularly beneficial for tasks with many independent actions or tasks where actions are I/O bound and spend significant time waiting.

This should reduce the wait time to bootstrap an env

Proposed Changes

Concurrency Model: Utilize Go's concurrency model, leveraging goroutines and channels, to execute actions within a task concurrently.
Error Handling: Implement a mechanism to collect and handle errors from concurrently executed actions. This could involve aggregating errors and returning a composite error if multiple actions fail.
Environment Variables Management: Ensure that the setup of default environment variables (defaultEnv) and task-specific environment variables (task.EnvPath) is thread-safe and correctly applied to each action executed concurrently.
Dependency Management: Introduce a way to specify dependencies between actions, if necessary. This could be important for tasks where certain actions must be executed in sequence due to dependencies.

Implementation Details

Use a sync.WaitGroup to wait for all goroutines (actions) to complete.
Use a buffered channel to collect errors from all concurrently executed actions. This channel allows us to aggregate errors and handle them appropriately after all actions have been executed.
Ensure that the modification of any shared state (e.g., environment variables) is thread-safe. This may involve using mutexes or designing the system in a way that avoids shared mutable state.

Risks and Mitigations

Race Conditions: Introducing concurrency can lead to race conditions if actions modify shared state. We will mitigate this risk by avoiding shared mutable state or using synchronization primitives (e.g., mutexes) where necessary.
Error Handling Complexity: Concurrent execution makes error handling more complex, as multiple actions could fail simultaneously. We will address this by aggregating errors and ensuring that the user is informed of all issues that occur during task execution.
Resource Saturation: Executing many actions concurrently could overwhelm system resources. We will mitigate this by allowing the user to configure the maximum number of concurrent actions, providing a balance between concurrency and resource utilization.

Prototype

func (r *Runner) executeTask(task types.Task) error {
    if len(task.Files) > 0 {
        if err := r.placeFiles(task.Files); err != nil {
            return err
        }
    }

    defaultEnv := []string{}
    for name, inputParam := range task.Inputs {
        d := inputParam.Default
        if d == "" {
            continue
        }
        defaultEnv = append(defaultEnv, formatEnvVar(name, d))
    }

    // load the tasks env file into the runner, can override previous task's env files
    if task.EnvPath != "" {
        r.envFilePath = task.EnvPath
    }

    // Use a WaitGroup to wait for all goroutines to finish
    var wg sync.WaitGroup
    // Channel to collect errors from goroutines
    errChan := make(chan error, len(task.Actions))

    for _, action := range task.Actions {
        wg.Add(1)
        go func(action types.Action) {
            defer wg.Done()
            action.Env = mergeEnv(action.Env, defaultEnv)
            if err := r.performAction(action); err != nil {
                errChan <- err
                return
            }
            errChan <- nil
        }(action)
    }

    // Wait for all actions to complete
    wg.Wait()
    close(errChan)

    // Check for errors
    for err := range errChan {
        if err != nil {
            return err // or accumulate errors if you prefer
        }
    }

    return nil
}

not tested or compiled

Conclusion

Adopting concurrent action execution in the executeTaskmethod has the potential to significantly improve performance for tasks with multiple independent actions. By carefully managing the risks associated with concurrency, we can make the task execution process more efficient and responsive to user needs.

The text was updated successfully, but these errors were encountered:

zachariahmiller · 2024-02-28T14:30:16Z

Hey @naveensrinivasan maybe i'm misunderstanding, but i dont think we ever want any tasks to ever be executed concurrently unless there is a mechanism in the task file's syntax explicitly indicating that should be the behavior or multiple tasks are called in a uds run command execution.

That being said, i had created a related [issue](https://github.com/defenseunicorns/uds-cli/issues/187) around when the task runner got added to this project where concurrent task execution would come into play.

The only other scenarios i can think of would again require another keyword to indicate that a task can run out of order of the other tasks (i think this might get convoluted in terms of the yaml syntax and we dont want that) or to allow multiple tasks to be executed by uds run like uds run lint test build and have a defined, documented behavior that running things like this will execute concurrently.

Outside of those scenarios and the use case described in the linked issue i think we'd want to be careful about how we approach that.

I'll definitely defer to @UncleGedd here if he has more specific thoughts, but just wanted to make you aware of that other issue that had been created. Thanks for the thorough issue definition here!

naveensrinivasan · 2024-02-28T16:58:11Z

Given the concerns raised about concurrent tasks in Kubernetes, it's crucial to understand how Kubernetes' architecture and its principle of eventual consistency effectively manage such scenarios. Kubernetes is designed to handle many operations concurrently, ensuring system stability and reliability. Here's a detailed explanation:

Kubernetes and Eventual Consistency

Kubernetes employs an eventual consistency model through its controllers and control loops. Each controller watches for changes to specific resources, analyzing the current state against the desired state and acting to reconcile any discrepancies. This continuous process allows Kubernetes to adapt to changes and maintain the desired system state over time.

Controller Lifecycle and Concurrent Tasks

The lifecycle of a Kubernetes controller can be summarized in a continuous loop of observing changes, analyzing the current versus desired state, and acting to reconcile. This loop ensures that Kubernetes can handle concurrent tasks without leading to system instability. Here's a simplified diagram illustrating this process:

graph TD;
    A[Start] --> B{Observe Changes};
    B --> C[Analyze Current vs. Desired State];
    C --> D{Is Reconciliation Needed?};
    D -->|Yes| E[Act to Reconcile];
    E --> B;
    D -->|No| F[Wait for Next Change];
    F --> B;

Why Concurrent Tasks Do Not Pose an Issue

Decoupled Operations: Kubernetes controllers operate independently, allowing them to manage their respective resources concurrently without interference.
Continuous Reconciliation: The control loop's design ensures that any inconsistencies introduced by concurrent tasks are detected and corrected in subsequent iterations.
Concurrency Management: Kubernetes employs mechanisms like optimistic concurrency control to handle simultaneous updates, preventing conflicts.
Resilience to Failures: The system's design inherently provides resilience to temporary failures or delays, common in concurrent operations. Controllers will retry operations as needed to achieve the desired state.

Caveats and Considerations

Despite the robustness of Kubernetes' design, certain scenarios require careful management of task sequences and dependencies to avoid issues:

Persistent Volume Claims (PVCs) and Persistent Volumes (PVs): PVCs created before the availability of a suitable PV can remain pending, disrupting dependent workloads.
ConfigMaps and Secrets as Volume Mounts: Pods failing to start if referenced ConfigMaps or Secrets are not pre-existing, necessitating recreation or restart of the Pod.
Service Dependencies: Workloads may encounter errors or fail to function correctly if dependent services are not yet created or are misconfigured.
Ingress and Service Dependencies: Misordered creation of Ingress and Services can lead to non-functional ingress until the Service is correctly configured.
Role-Based Access Control (RBAC) Settings: Workloads requiring specific Kubernetes API permissions might fail if the necessary RBAC settings are not established beforehand.
Custom Resource Definitions (CRDs) and Custom Resources (CRs): CRs applied before their CRDs can fail, common in environments with operators and custom controllers.
Network Policies: Improperly ordered application of Network Policies can block necessary Pod communication.
Init Containers Dependencies: Pods with init containers may be affected if these containers depend on resources that are not yet available.

Mitigation Strategies

For these scenarios, incorporating retry logic, readiness checks, and careful planning of task sequences can mitigate issues by ensuring that dependent resources are fully ready before proceeding with subsequent operations.

Conclusion

While Kubernetes' architecture and eventual consistency model provide a solid foundation for managing concurrent tasks, awareness of and planning for specific dependency scenarios are crucial. By understanding these caveats and employing strategic task management, we can leverage Kubernetes' strengths while mitigating potential issues related to resource dependencies and operation orders.

UncleGedd · 2024-02-29T20:53:23Z

Thanks for the conversation, concurrency isn't a priority atm but I'm sure it will be valuable in the future. I think for now we can table this issue while the runner code gets migrated to its own repo, and I'll transfer this issue to the new repo

zachariahmiller · 2024-02-29T22:38:21Z

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

naveensrinivasan · 2024-02-29T23:18:22Z

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

My ignorance! Thanks for the clarification! I appreciate it! My k8s post was me being a noob

zachariahmiller · 2024-02-29T23:47:21Z

I'll be honest I'm kind of confused by this discussion there is a lot of talk about kubernetes, but this function under uds run has nothing to do with kubernetes. Apologies if I'm missing something here but this is my reasoning:

It (uds run) is a task runner that essentially executes shell commands for use in CI and locally similar to what you would use make or Taskfile for. Also similar to say job execution in CI systems such as gitlab ci or GitHub workflows.

You might use a task to run a command that does something in kubernetes (like kubectl) but that is the tool this is executing doing that, not the tasks themselves and therefore assumptions based on how kubernetes works should not be built into the way we handle execution in uds run tasks.

My ignorance! Thanks for the clarification! I appreciate it! My k8s post was me being a noob

No worries. Totally understandable... it's a lot to try to digest all at once!

zachariahmiller · 2024-03-01T00:06:53Z

@naveensrinivasan just to close out this thread. We definitely would still want to retain the ability in bundle deployments (ie uds deploy) to explicitly control ordering of package deployments, but this issue is targeted at trying to find a way to sanely implement concurrency for certain packages to complement the sequential ordering that exists today.

defenseunicorns/uds-cli#30

If there are no objections though, I think this specific issue can be closed.

UncleGedd · 2024-03-01T14:11:33Z

We can keep this issue, I'll migrate it over to the new runner repo once that is all set up. FWIW I could see a use case for concurrent tasks when doing things like building multiple, independent Zarf pkgs

naveensrinivasan · 2024-03-01T15:26:40Z

We can keep this issue, I'll migrate it over to the new runner repo once that is all set up. FWIW I could see a use case for concurrent tasks when doing things like building multiple, independent Zarf pkgs

Sounds good. Thanks!

Racer159 · 2024-10-22T22:18:17Z

closing in favor of #138 #131 and #164 since these focus on some of the more specific changes

UncleGedd transferred this issue from defenseunicorns/uds-cli May 15, 2024

Racer159 closed this as completed Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Concurrent Action Execution in `executeTask` #85

Implement Concurrent Action Execution in `executeTask` #85

naveensrinivasan commented Feb 26, 2024

zachariahmiller commented Feb 28, 2024 •

edited

Loading

naveensrinivasan commented Feb 28, 2024

UncleGedd commented Feb 29, 2024

zachariahmiller commented Feb 29, 2024

naveensrinivasan commented Feb 29, 2024

zachariahmiller commented Feb 29, 2024

zachariahmiller commented Mar 1, 2024

UncleGedd commented Mar 1, 2024

naveensrinivasan commented Mar 1, 2024

Racer159 commented Oct 22, 2024

Implement Concurrent Action Execution in executeTask #85

Implement Concurrent Action Execution in executeTask #85

Comments

naveensrinivasan commented Feb 26, 2024

Body:

Summary

Motivation

Proposed Changes

Implementation Details

Risks and Mitigations

Prototype

Conclusion

zachariahmiller commented Feb 28, 2024 • edited Loading

naveensrinivasan commented Feb 28, 2024

Kubernetes and Eventual Consistency

Controller Lifecycle and Concurrent Tasks

Why Concurrent Tasks Do Not Pose an Issue

Caveats and Considerations

Mitigation Strategies

Conclusion

UncleGedd commented Feb 29, 2024

zachariahmiller commented Feb 29, 2024

naveensrinivasan commented Feb 29, 2024

zachariahmiller commented Feb 29, 2024

zachariahmiller commented Mar 1, 2024

UncleGedd commented Mar 1, 2024

naveensrinivasan commented Mar 1, 2024

Racer159 commented Oct 22, 2024

Implement Concurrent Action Execution in `executeTask` #85

Implement Concurrent Action Execution in `executeTask` #85

zachariahmiller commented Feb 28, 2024 •

edited

Loading