Update Argo Workflows to latest version #1639
Conversation
You can submit or manage workflows via the Argo CLI. The Argo CLI can be downloaded from the
[Argo Releases](https://github.com/argoproj/argo-workflows/releases) page. After downloading the CLI, you can get your
This can be done outside of Nebari. I'm not sure if we want to provide some better solution to install the CLI tool inside Nebari. It's not available via conda that I could find.
Note: Users can download the binary from the releases page and use it inside of Nebari as well
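As a rough sketch of that manual install, fetching a release binary might look like the following. The version and platform values here are assumptions for illustration; check the releases page for the current release.

```shell
# Hedged sketch: build the download URL for an Argo CLI release binary.
# ARGO_VERSION and ARGO_PLATFORM are illustrative assumptions.
ARGO_VERSION="v3.4.4"
ARGO_PLATFORM="linux-amd64"
ARGO_URL="https://github.com/argoproj/argo-workflows/releases/download/${ARGO_VERSION}/argo-${ARGO_PLATFORM}.gz"
echo "${ARGO_URL}"
# Uncomment to actually download and install into a user-writable bin dir:
# curl -sLo argo.gz "${ARGO_URL}" && gunzip argo.gz \
#   && chmod +x argo && mv argo ~/.local/bin/argo
```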
I haven't tested this PR out, but I'd like @iameskild to review it so that we have multiple experts with this tool. I think Argo is critical to Nebari and we need to ensure the user experience is great with this component. kbatch looks like it is not actively being maintained, so this will likely be our long-term workflow solution.
...ri/template/stages/07-kubernetes-services/modules/kubernetes/services/argo-workflows/main.tf
@iameskild can you review/work with @Adam-D-Lewis on this PR? Also, how do we schedule the Kubernetes tests to run? If possible, I'd like this PR to include an integration test using Argo so we know that it works.
To turn off the cluster monitoring on Nebari deployments, simply turn off the feature flag within your
`nebari-config.yaml` file. For example:
This should say "To disable Argo Workflows"
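For reference, disabling the integration would look something like this in `nebari-config.yaml` (a sketch; the exact key layout may differ between Nebari versions, so verify against the config docs):

```yaml
# Assumed key name; disables the Argo Workflows integration.
argo_workflows:
  enabled: false
```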
I moved the documentation updates here.
/bot run tests
@iameskild do you have an idea why the Kubernetes tests are not running for this?
Looks excellent; tested on a local/AWS instance. We need to move the docs to the new location, and we also need a bigger example task to run.
@ericdatakelly may have a better example to run
I think we need several examples to cover different use cases. This looks like a good start, so how about we open new issues for the other use cases? I'm working on the case in which a user wants to run an ML pipeline with multiple tasks that depend on outputs/artifacts from previous steps. I was running it from the Argo UI. Now that Hera Workflows are working, I can move to that implementation (in Python code), which is a better fit for the use case.
Thanks for making these updates @Adam-D-Lewis!
Although it looks like the Kubernetes test didn't run, I can assume folks have this deployed and working on a fresh install. I instead tried deploying these changes to an existing cluster and unfortunately ran into some problems when trying to update the Argo-Workflow CRDs.
```
[terraform]: │ Error: rendered manifests contain a resource that already exists. Unable to continue with update: CustomResourceDefinition "clusterworkflowtemplates.argoproj.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "argo-workflows"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "dev"
[terraform]: │
[terraform]: │ with module.argo-workflows[0].helm_release.argo-workflows,
[terraform]: │ on modules/kubernetes/services/argo-workflows/main.tf line 6, in resource "helm_release" "argo-workflows":
[terraform]: │ 6: resource "helm_release" "argo-workflows" {
[terraform]: │
[terraform]: ╵
```
At first I thought adding `force_update = true` would do the trick, but upon further investigation this looks like an upstream issue with Helm and updating CRDs in general.
- Relevant issue in helm terraform-provider: Upgrading CRDs hashicorp/terraform-provider-helm#944
- Relevant issue in helm: helm template produces different results to stdout vs outputDir helm/helm#11321
At this point, if we need to get around this we might want to "render the CRDs via the helm_template data-source and apply them using kubernetes_manifest..." (quote source).
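A sketch of that suggestion might look like the following. This is untested, and the chart coordinates and attribute names are assumptions to verify against the helm and kubernetes provider docs.

```hcl
# Render the chart (including CRDs) without installing it, then apply the
# rendered manifests with kubernetes_manifest so Helm never owns the CRDs.
data "helm_template" "argo_workflows" {
  name         = "argo-workflows"
  namespace    = "dev"
  repository   = "https://argoproj.github.io/argo-helm"
  chart        = "argo-workflows"
  include_crds = true
}

resource "kubernetes_manifest" "argo_crds" {
  # In practice this map would need to be filtered down to just the CRD
  # documents before applying.
  for_each = data.helm_template.argo_workflows.manifests
  manifest = yamldecode(each.value)
}
```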
Thanks for reviewing @iameskild. So what would you say are the next steps needed on this PR?
Hey @Adam-D-Lewis, I see two potential workarounds:
Perhaps @costrouc has some other ideas for how to get around this.
Actually, never mind; I read Helm's docs and the idea above doesn't seem likely to work.
@iameskild I'll work on adding the Argo Workflows CRDs as `kubernetes_manifest` resources since that seems like the most established solution.
I tested the latest version locally. I think users will need to disable argo_workflows, deploy, then enable, and redeploy before this will work correctly.
@Adam-D-Lewis is this ready for a final review? I think this PR is also needed before #1622.
@eskild No, it's not quite ready yet. I want to test whether users will need to disable argo_workflows, deploy, then enable and redeploy before this works correctly. If so, I might want to look at alternatives. I can do this this week. I also want to get feedback on the update prompt from the other devs before I'd say it's ready for review. There are other things for #1622 that still need to be done and can be worked on in the meantime, right? This isn't currently blocking, right?
@iameskild Yeah, I think this is ready for review now. I tested it again today and it seemed to work fine.
From @dharhas: "You need a link to docs that explain the whole process including how to set up kubectl." |
I made Dharhas's suggested change. Here's the prompt now: I'm assuming Vini's PR (nebari-dev/nebari-docs#315) gets merged before this.
This is looking great @Adam-D-Lewis!
I just tested this myself and things seem to be working as expected. I left two minor comments, but otherwise I feel this is ready to be merged.
Co-authored-by: eskild <[email protected]>
This looks great to me! As I mentioned yesterday, I tested this on an existing test cluster and everything seemed to work as expected. Thanks @Adam-D-Lewis 🎉
Added a commit to fix some failing test_upgrade tests
Reference Issues or PRs
Related to #1495, but I wouldn't say this closes it necessarily.
What does this implement/fix?
Enables basic functionality of the previously broken Argo integration.
The Argo Workflows integration did not allow end users to submit workflows. I looked into this a bit and found that, at the moment, you can attempt to submit workflows in two ways:
1) from the UI
2) from the Argo CLI
Neither way was working for the example Argo workflow. This was because the workflow needed some permissions which were not granted by default. This comment (argoproj/argo-workflows#2522 (comment)) gives a decent review of the three relevant service accounts at play. We hadn't adequately configured the third one listed: the container service account used by the actual pods spun up as part of the workflow. By default, the pods use the default service account for the namespace where the workflow is deployed, and that default service account has no permissions.
I solved this issue by setting the helm chart configuration to create a service account that workflows can use. I then set the workflow controller config to use that service account by default for all workflows (the service account used can still be overridden). I also upgraded the helm chart, and added some additional user guide documentation on how to submit a workflow from the Argo CLI.
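With those permissions in place, submitting a workflow from the CLI looks roughly like the following. The namespace and service account names are illustrative assumptions, not values from this PR.

```shell
# Submit the upstream hello-world example; -n selects the namespace, and
# --serviceaccount overrides the default service account for workflow pods.
argo submit -n dev \
  --serviceaccount argo-workflows \
  --watch \
  https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/hello-world.yaml
```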
Put an `x` in the boxes that apply.

Testing