Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stack persistent volume claim #72

Closed
wants to merge 5 commits into from

Conversation

pmorillon
Copy link
Contributor

Description

A proposal to fix #42 like PR #63, taking into account the new method to download Hook artifacts with a Job (previously with an init container). PVC accessModes are checked to disable hook downloader Job and inform user when ReadWriteOnce is used (default to ReadWriteMany).

Why is this needed

Fixes: #42

How Has This Been Tested?

Only tested with ReadWriteOnce PVC.
Checked generated manifests with helm install --debug --dry-run...

How are existing users impacted? What migration steps/scripts do we need?

Actual default behavior should be preserved

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

Signed-off-by: Pascal Morillon <[email protected]>

You are using no persistence or ReadWriteOnce access modes on PersistentVolumeClaim, Job `download-hook` is disabled.
You will need to manage hooks artifacts manually.
Also stack Deployment is not scalable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @pmorillon, thanks for this PR. What does this mean, the stack deployment is not scalable? is this only when the HookOS download job is false? or in general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this confusing note and add comment into values file and update _helper to apply this behavior :

  • stack.hook.enabled to false : no download job, no PVC, no hook-artifacts Volume
  • stack.hook.enabled to true (default) :
    • stack.hook.persistence.enabled to false : volume is an empty dir, so no download job, no PVC
    • stack.hook.persistence.enabled to true (default) :
      • stack.hook.persistence.type = hostpath (default) : volume is an hostpath, download job enabled, no PVC. Only works with a one worker node cluster for hook downloader and scale up.
      • stack.hook.persistence.type = pvc : volume is based on a PVC
        • stack.hook.persistent.accessModes contains ReadWriteMany (default) : download job enabled. Will works on multinode cluster and scale up is possible.
        • stack.hook.persistent.accessModes contains ReadWriteOnce : download job is disabled (job won't be able to mount already mounted volume), scale up will not be possible.

@@ -1,4 +1,5 @@
{{- if .Values.stack.hook.enabled }}
{{- $hookDownloadJobEnabled := eq (include "stack.hookDownloadJobEnabled" .) "true" -}}
{{- if and .Values.stack.hook.enabled $hookDownloadJobEnabled }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its hard to tell where the corresponding {{- end }} is for this. mind putting the corresponding{{- end }} at the same level indent?

@@ -12,6 +12,20 @@ stack:
lbClass: kube-vip.io/kube-vip-class
# Once the Kubernetes Gateway API is more stable, we will use that for all services instead of nginx.
image: nginx:1.25.1
persistence:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i understand correctly, persistence is only related to the HookOS download? if that is the case, would this block be more ideal under hook:?

@pmorillon pmorillon marked this pull request as draft October 27, 2023 08:17
@pmorillon pmorillon marked this pull request as ready for review October 27, 2023 08:21
@jacobweinstock
Copy link
Member

Hey @pmorillon , thanks for this. Looks good. Just going to run a few test scenarios locally and then we'll merge it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this file _hook.tpl.

persistentVolumeClaim:
claimName: {{ .Values.stack.hook.persistence.existingClaim | default "hook-artifacts" }}
{{- else }}
{{- fail "value for .Values.stack.hook.persistence.type is not as expected" }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- fail "value for .Values.stack.hook.persistence.type is not as expected" }}
{{- fail "value for .Values.stack.hook.persistence.type is unsupported" }}

Comment on lines 28 to 32
## type hostpath or pvc
# hostpath : only works on a single worker node cluster
type: hostpath
## When type is pvc
# storageClassName: default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## type hostpath or pvc
# hostpath : only works on a single worker node cluster
type: hostpath
## When type is pvc
# storageClassName: default
# The storage type to use for persisting HookOS artifacts. The hostpath
# option can only be used with single worker node clusters.
# Options: hostpath, pvc.
type: hostpath
# The PVC storage class to use. Only valid with type=pvc.
# storageClassName: default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may also be useful to include links to Kubernetes documentation.

# storageClassName: default

## PVC accessModes :
# Require ReadWriteMany in order to download Hook and scale up kubernetes tink-stack Deployment resource.
Copy link
Member

@chrisdoherty4 chrisdoherty4 Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence has little context which makes it a somewhat confusing on what I should set. Specifically, I'm left wondering what ReadWriteMany has to do with scaling a tink-stack deployment and subsequently need to dive into the templates to try and understand.

Perhaps this could be worded as "If you include ReadWriteMany it will allow X to happen. This will accommodate scaling the Tink Stack because Y".

Also, what happens if multiple access mode are included?

Comment on lines 40 to 41
# annotations: {}
# selectorLabels: {}
Copy link
Member

@chrisdoherty4 chrisdoherty4 Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# annotations: {}
# selectorLabels: {}
# Annotations to include on the PVC object.
annotations: {}
# An optional selector to narrow the Volumes considered for binding to the PVC.
# Should be structured the same as a raw PVC selector (see documentation for more info).
#
# https://kubernetes.io/docs/concepts/storage/persistent-volumes/#selector.
selector: {}

I'm suggesting changing the selector so users can define the selector using expressions as well as labels.

enabled: true
## type hostpath or pvc
# hostpath : only works on a single worker node cluster
type: hostpath
Copy link
Member

@chrisdoherty4 chrisdoherty4 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to throw a spanner in the works, but when I looked at this again it made me wonder if we'd be better using keys to represent the persistence type avoiding the 'only use this if using pvc' commentary about each of the items below.

E.g.

hook:
  persistence:
    hostpath: {}
    pvc:
      storageClassName: ""
      ...

The user should specify only 1. We would want a comment on persistence indicating how to use it.

@jacobweinstock Do you have a preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your suggestion we could still use type: and then both hostpath: and pvc: could be populated without issue.

hook:
  persistence:
    # type must be either "hostpath" or "pvc"
    # hostpath: only works on a single worker node cluster
    type: hostpath
    hostpath: {}
    pvc:
      storageClassName: ""
      ...

Copy link
Member

@chrisdoherty4 chrisdoherty4 Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to that, though I don't think its strictly necessary because we can test for the presence of either hostpath or pvc keys. I'm not sure which is clearer so I'm good either way.

That said, I don't think there would be a need for hostpath: {} at all because we don't expose options yet. We have a path forward if we do want to specify options though.

pmorillon added a commit to Eskemm-Numerique/tinkerbell-charts that referenced this pull request Dec 7, 2023
@pmorillon
Copy link
Contributor Author

I took into account all your comments in the last commit

Signed-off-by: Pascal Morillon <[email protected]>
@chrisdoherty4
Copy link
Member

Thank you, this is awesome.

Copy link
Member

@jacobweinstock jacobweinstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmorillon, thanks for this. Almost there. I left a few comments for your review. Thank you!

@@ -0,0 +1,14 @@
{{- define "stack.hookDownloadJobEnabled" -}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get a code comment here about what this is and why? Reading the code is a little hard to follow and will require that I re-understand it every time I come back to it. Also, i don't know if I understand it in the first place.

Is this logic to determine if downloading HookOS should occur?
Does this logic say that if there is a pvc with accessMode of ReadWriteOnce then we won't download HookOS? if so, why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic to determine if downloading HookOS should occur?

Yes

Does this logic say that if there is a pvc with accessMode of ReadWriteOnce then we won't download HookOS? if so, why is that?

Because you can't determine on which node will be scheduled the Kubernetes Job, so stack Deployment resource already mounting the PVC, the download job will fail to mount the PVC

@@ -1,4 +1,5 @@
{{- if .Values.stack.hook.enabled }}
{{- $hookDownloadJobEnabled := eq (include "stack.hookDownloadJobEnabled" .) "true" -}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comparison redundant?

true == true is true
false == true is false

couldn't we just use include "stack.hookDownloadJobEnabled" directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind renaming this file to nginx_pvc.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, why not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if this empty file provides us something or if it can be removed.

@jacobweinstock
Copy link
Member

Looks like this branch needs a rebase of main.

@mddeff
Copy link
Contributor

mddeff commented May 7, 2024

@chrisdoherty4 @jacobweinstock

I came to this PR after following #42 and #63 while in search of a solution for hook that didnt use hostPath. It looks like we (and by we, I mean @pmorillon) were SO close to getting this merged.

If the final code-comment you requested is added and re-base complete, can we get this merged? Using hostPath could be considered a blocker to running Tinkerbell in many production environments.

I love what the Tinkerbell project is doing, awesome stuff!

jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 5, 2024
jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 5, 2024
jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 5, 2024
@jacobweinstock jacobweinstock mentioned this pull request Oct 5, 2024
3 tasks
jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 21, 2024
jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 21, 2024
jacobweinstock pushed a commit to jacobweinstock/charts that referenced this pull request Oct 21, 2024
jacobweinstock added a commit that referenced this pull request Oct 22, 2024
Persistence:

## Description

<!--- Please describe what this PR is going to change -->
Update Tink-stack to use a PVC for the HookOS artifacts download. This allows single node and multi-node Kubernetes cluster to work for HookOS artifact serving. What used to be a Kubernetes Job to download HookOS, is now a container in the tink-stack pod. This removes the challenges of getting a Kubernetes Job to download HookOS on the same node as the tink-stack pod. This is only an issue with the local persistent volume.

## Why is this needed

<!--- Link to issue you have raised -->

Fixes: #72
Fixes: #63 

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->


## How are existing users impacted? What migration steps/scripts do we need?

<!--- Fixes a bug, unblocks installation, removes a component of the stack etc -->
<!--- Requires a DB migration script, etc. -->


## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [ ] added unit or e2e tests
- [ ] provided instructions on how to upgrade
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hook artifacts not available for multi node clusters
4 participants