Extension Mechanism Implementation #1833
Conversation
From discussion with @iameskild, we need significant control over how and when certain stages are chosen. For example, here are some use cases that need to be considered:
A proposal (from me) is to use priority/name.
@iameskild is suggesting a plugin hook to be added which will modify the ordering of the hooks. There should be a pre/post hook for check, and it would be nice to have the same for render/deploy/destroy:

```python
def pre_check(stage_name: str):
    ...  # run this before the stage
```

There should be a way to have a schema for that given stage as input and output. Schemas everywhere. @iameskild mentioned that the NebariTerraformStage should have validation on internal methods, e.g. input_vars.

Some next steps:

- A built-in subcommand to read an arbitrary python module.
- We can give back to the jupyterhub / zero-to-jupyterhub community to have an easy z2jh install.
- There should be a way to explicitly override the given stages.
- This shows the importance of devtools to help us see what is going on before and after stages. How could we do this?

```python
def post_check(name='infrastructure'):
    breakpoint()
```
|
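To make the pre/post idea concrete, here is a minimal pluggy sketch; the project name ("nebari") and the hook names are assumptions drawn from this discussion, not an existing nebari API.

```python
# Minimal sketch of pre/post check hooks with pluggy; the project name and
# hook names are assumptions taken from the discussion above.
import pluggy

hookspec = pluggy.HookspecMarker("nebari")
hookimpl = pluggy.HookimplMarker("nebari")


class CheckHookSpecs:
    @hookspec
    def pre_check(self, stage_name: str):
        """Called immediately before the named stage's check step."""

    @hookspec
    def post_check(self, stage_name: str):
        """Called immediately after the named stage's check step."""


class DebugHooks:
    @hookimpl
    def post_check(self, stage_name: str):
        # The devtool use case mentioned above: drop into a debugger
        # right after the infrastructure stage's check.
        if stage_name == "infrastructure":
            breakpoint()


pm = pluggy.PluginManager("nebari")
pm.add_hookspecs(CheckHookSpecs)
pm.register(DebugHooks())
pm.hook.pre_check(stage_name="infrastructure")
```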
@costrouc here is a basic POC of what an outer priority manager plugin object might do. If you change the priority level of either stage, the selection changes accordingly. From here we can implement more complex logic that handles replacing plugins for the same stage, resolves conflicts if the priority level is the same, etc. |
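A condensed sketch of the kind of resolution logic such an outer priority manager could grow into, assuming each stage exposes `name` and `priority` attributes:

```python
# Sketch of priority-based stage resolution; the `name`/`priority` attributes
# are assumptions about the stage interface being discussed.
def resolve_stages(registered_stages):
    """Keep only the highest-priority stage per name, ordered by priority."""
    chosen = {}
    for stage in registered_stages:
        current = chosen.get(stage.name)
        if current is None or stage.priority > current.priority:
            chosen[stage.name] = stage
    return sorted(chosen.values(), key=lambda stage: stage.priority)
```

Handling of equal priorities (error out, keep the first registration, keep the last) can then be layered into the same function.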
Thanks @iameskild, check out https://github.com/nebari-dev/nebari/blob/extension-mechanism/src/_nebari/stages/base.py#L89-L106 for the basic implementation here. I don't have plans to make any additions to this function, so I definitely think we should add some of the capabilities you mentioned, like being able to filter stages etc. I've done the initial work (still somewhat broken) of moving all commands to a `Subcommand` plugin hook. Also, the stages seem to sorta work: I've been able to successfully run the render and validate commands. I also spent a lot of time working on the schema (it still needs a lot more work) and removing all |
With further work I was able to get about midway through a local deployment (up to the kubernetes ingress stage). |
As I've been working on the schema (the main schema and the ones specific to the InputVars), it got me thinking that it might make more logical sense to make stages (plugins) that are specific to cloud providers as well, instead of a single stage covering every provider. Although it is more work upfront, I see the benefits in the future when others want to (natively) support new cloud providers. A possible downside to this approach is that the number of plugins will likely grow quite quickly, which might reduce the speed with which we can deploy the whole cluster (given the overhead of each stage). The tricky part of this entire extension-mechanism work seems to be how we handle prioritizing and swapping out plugins. cc @costrouc |
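As a rough illustration of the provider-specific direction (every name here is hypothetical, not code from this branch), such a stage could advertise which configuration it applies to:

```python
# Hypothetical provider-specific stage; the class name, attributes, and the
# `applies_to` filter are illustrative assumptions only.
class GCPInfrastructureStage:
    name = "infrastructure"
    priority = 20

    def __init__(self, config):
        self.config = config

    @classmethod
    def applies_to(cls, config) -> bool:
        # Only select this stage when the config chooses the gcp provider.
        return config.get("provider") == "gcp"

    def deploy(self, stage_outputs):
        ...  # GCP-specific infrastructure deployment
```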
@iameskild I think these are great ideas. This work will make "stages" much cheaper to create, so I think it absolutely makes sense. The question I have here is about stages and how they get selected. In the case you are suggesting we would have certain stages that only apply when a certain config value is set. How do we want to expose that? |
More thoughts on the schema. If and when we break out the Kubernetes services (i.e. Argo-Workflows, conda-store, etc.) and cloud providers into their own plugins, the main schema will need to be modified (namely, removing their sections from the main schema). We might want to make it so the main schema is extensible based on the plugins that are installed. This would require a dynamic schema; pydantic has just the tool for this, and this gist gives a taste of how it might be accomplished. In the long run this would reduce the size and scope of the main schema to only those components that are used throughout the deployment (name, domain, etc.), and everything else becomes the responsibility of the plugin to manage. cc @costrouc |
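A minimal sketch of what an extensible main schema could look like, assuming the pydantic tool being alluded to is `create_model`; the model and field names below are made up for illustration:

```python
# Sketch of assembling the main schema dynamically from installed stage
# plugins; assumes pydantic's `create_model` is the tool referred to above.
from pydantic import BaseModel, create_model


class MainConfig(BaseModel):
    # components used throughout the deployment
    project_name: str
    domain: str = ""


class MyStageConfig(BaseModel):
    # hypothetical section contributed by a stage plugin
    enabled: bool = True


def extend_schema(base, stage_schemas):
    """Attach each installed stage's schema as an optional section of the base model."""
    fields = {name: (schema, None) for name, schema in stage_schemas.items()}
    return create_model("ExtendedConfig", __base__=base, **fields)


ExtendedConfig = extend_schema(MainConfig, {"mystage": MyStageConfig})
config = ExtendedConfig(project_name="demo", mystage={"enabled": False})
```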
Thank you for thinking about this! YES, this is something that needs to be addressed, and I think it is important that we get this part right from the start. I would really appreciate you doing some more research on that. One thought I had is that each stage would have a "reserved" part of the schema, but this looks and feels too verbose:

```yaml
stages:
  mystage: {}  # arbitrary dict for the given stage name; the stage could supply its own schema
```

Or each stage could "reserve" several keys at the root level:

```yaml
mystage: {}
othervalue: 1
```

I'm struggling to think of how the stages could collaborate on what the schema would look like; the last example is the best that I could come up with. Totally agree that |
How does this work with conda packages? |
Wouldn't it make more sense to namespace the keys by the stage name so they can never clash? |
@dharhas there shouldn't be any issue. This is using Python entry points.
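For context, a sketch of how an extension package (installed with either pip or conda) would advertise itself through packaging metadata; the entry-point group "nebari" and the package/module names are assumptions:

```python
# Hypothetical setup.py for a third-party extension; the entry-point group
# ("nebari") and names are assumptions for illustration.
from setuptools import setup

setup(
    name="nebari-my-extension",
    version="0.1.0",
    py_modules=["nebari_my_extension"],
    entry_points={
        # The same metadata is present whether the package came from pip or conda.
        "nebari": ["my-extension = nebari_my_extension"],
    },
)
```

pluggy's `PluginManager.load_setuptools_entrypoints("nebari")` can then discover and load the module regardless of which package manager installed it.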
I'm split on this. Pros:
Cons:
Backwards compatibility is the main concern for me. |
Backwards compatibility isn't a huge issue for me as long as we have clear messaging and an upgrade path. Our install base is small enough to be manageable and we have relationships with the folks using it. |
@dharhas I'm fine with us making this move. I think this schema-per-stage change will need to happen in another PR then; I'd like this PR to focus on not changing anything and locking down the current schema. We can create a PR which makes this move and I think it will be extremely beneficial. As of now I have this PR working on all clouds, so I'd say this PR now needs cleanup and checking that the features are preserved. |
I've been thinking quite a bit about this after the initial presentation in one of our syncs. My biggest concern is that the extension mechanism is too flexible. While this is usually a good thing for power users, on the flip side it easily becomes detrimental for regular ones. Since nebari is about making stuff easy, I would err on the side of the regular users. I would propose something that hopefully also covers all use cases, but is a little simpler:
As it stands, this will then limit nebari to k8s. However, we don't need the full flexibility proposed by this PR to remove this limit again: since the number of deployment targets is fairly low (right now I can think of k8s, HPC, and local), we could just build sets of fixed stages for all the targets that we want to support and within them go with the scheme I proposed above. |
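A sketch of what the simpler scheme could look like, with a fixed, ordered set of stages per deployment target; the target names and stage lists are placeholders, not an agreed design:

```python
# Illustrative sketch of fixed stage sets per deployment target; the target
# names and stage lists are placeholders, not an agreed-upon design.
FIXED_STAGE_SETS = {
    "k8s": [
        "terraform-state",
        "infrastructure",
        "kubernetes-initialize",
        "kubernetes-ingress",
        "kubernetes-services",
    ],
    "hpc": ["hpc-bootstrap", "hpc-services"],
    "local": ["kubernetes-initialize", "kubernetes-ingress", "kubernetes-services"],
}


def stages_for(target: str) -> list[str]:
    """Return the fixed, ordered list of stages for a supported deployment target."""
    return FIXED_STAGE_SETS[target]
```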
This took a long time to properly rebase since I had to incorporate #1832 and #1773. The only errors that exist are the ones we currently see in the develop branch: https://github.com/nebari-dev/nebari/actions/runs/5742626558/job/15565133390 |
The final (hopefully) rebase has been completed. The Digital Ocean provider tests are failing, and this is expected given that they recently deprecated Kubernetes version 1.24. |
Based on our discussions with the team, we agreed to merge this PR and work through bugs and enhancements as they come up. This issue tracks the current list of enhancements: #1894
@aktech I'm going to merge this PR but it will require some work to get it back to running the test_deployment code you wrote. |
Closes nebari-dev/governance#35
Closes #865
Closes #1046
Introduction
This is an initial POC of the extension mechanism. The goal is to provide a public API, via a pluggy interface, for managing stages and subcommands. With this work, the majority of what Nebari is becomes a collection of extensions: Nebari can finally be oblivious to the stages that are being run and have a much smaller core.
Emphasis on pydantic schema for nebari-config.yaml
The following is now a valid `nebari-config.yaml`. Significant work has been put into `nebari.schema`, which will provide much better error messages than before. The hope is that `nebari.schema` is THE way to know valid configuration.

How will developers use the extension mechanism?
There are several ways developers will extend Nebari:

- Via a pip-installed package with an entry hook that will auto-register the plugins within the module.
- Via a command line option to load a plugin module. Multiple `--import-plugin` statements can be used.

Currently we have two plugin hooks which are mentioned below and worth reviewing: `nebari.hookspecs.NebariStage` and `nebari.hookspecs.Subcommand`. Subcommand is fairly straightforward and you can see several examples in `_nebari.subcommands`. The stages are a bit more complicated and many examples are in `_nebari.stages.*`. The important thing is that nebari now fully embraces the extensions and we use the extensions internally ourselves.

You can easily replace a stage by specifying a stage with the same name but a higher priority, e.g. the sketch below.
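As a hedged sketch of such an override (the hook name `nebari_stage`, the decorator, and the method signatures are assumptions about the interfaces named above, not verbatim code from this PR):

```python
# Hypothetical override of a built-in stage: same `name`, higher `priority`.
# The hook name and method signatures are assumptions for illustration.
from nebari.hookspecs import NebariStage, hookimpl


class MyInfrastructureStage(NebariStage):
    name = "infrastructure"  # matches the stage being replaced
    priority = 100           # higher than the built-in stage, so this one wins

    def render(self):
        ...  # write out this stage's files

    def deploy(self, stage_outputs):
        ...  # custom deployment logic


@hookimpl
def nebari_stage():
    # Returning the class lets the plugin manager collect it alongside the
    # built-in stages and resolve by name/priority.
    return [MyInfrastructureStage]
```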
Currently there is a `NebariStage` class and a `NebariTerraformStage` class, but it would be easy to add additional stage base classes. The goal is to make extending the stages as easy as possible. There has already been conversation about a `NebariTerragruntStage`.

Key interfaces to review
- `nebari.schema` is now our public view and source of truth for the nebari configuration.
- `nebari.hookspecs.NebariStage` is the base class for all stages.
- `nebari.hookspecs.Subcommand` exposes a hook for adding arbitrary `Typer` subcommands; we follow a similar pattern to datasette and conda here.
- `_nebari.stages.base.NebariTerraformStage` is a subclass of `NebariStage` which implements convenience features that allow terraform stages to be more concise. A majority of the stages use this class.

Render, deploy, and destroy logic is significantly simpler. For example, the deploy logic now boils down to the loop sketched below.
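Roughly, the deploy flow now has the following shape; the names and signatures in this sketch are assumptions, not the exact code in the PR:

```python
# Sketch of the simplified deploy flow; class/method names are assumptions.
from contextlib import ExitStack


def deploy(config, stages):
    stage_outputs = {}
    with ExitStack() as stack:
        # Stages run in priority order; each deploys on top of the outputs of
        # the stages before it, and ExitStack unwinds their cleanup in reverse
        # order when the block exits.
        for stage_class in sorted(stages, key=lambda s: s.priority):
            stage = stage_class(config=config)
            stack.enter_context(stage.deploy(stage_outputs))
    return stage_outputs
```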
Progress
Progress towards prior functionality:

- init
- guided init
- dev
- validate
- keycloak
- support
- upgrade
- local provider: render, deploy, destroy
- existing provider: render, deploy, destroy
- gcp provider: render, deploy, destroy
- azure provider: render, deploy, destroy
- aws provider: render, deploy, destroy
- do provider: render, deploy, destroy
Feature ideas
- `nebari.hookspecs.NebariStage`
- `nebari.hookspecs.Subcommand`
- `priority` / `name` (currently if a stage has a higher priority and the same name it will replace the lower-priority stage)
- `--import-plugin _nebari.stages.bootstrap`
- `pre_*` and `post_*` plugins to allow for arbitrary code before and after render, deploy, destroy
- `provider = aws`