Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: tolerate failed checkins / ready #392

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

redref
Copy link

@redref redref commented Aug 26, 2024

Making checkin process optional (default to activated)

We quite often have apiserver issues which does not allow some pods to update their last-seen annotation, thus some kilo mesh nodes going "unready" and flapping wireguard configuration(s).

With this setup (currently in test our side), we have a more static setup over time. It also reduces calls to apiserver.

I do not see any drawback, but maybe I just lack imagination 😉

Making checkin process optional (default to activated)
Copy link
Owner

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over all this looks very reasonable for static/predictable clusters.
Thanks a lot for you contribution @redref!
Please see the suggestions and regenerate the docs and then we'll be good to merge!

cmd/kg/main.go Outdated
@@ -134,6 +135,7 @@ var (

func init() {
cmd.Flags().StringVar(&backend, "backend", k8s.Backend, fmt.Sprintf("The backend for the mesh. Possible values: %s", availableBackends))
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should kilo regularly check-in in backend")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should kilo regularly check-in in backend")
cmd.Flags().BoolVar(&checkIn, "check-in", true, "Should Kilo regularly check in with the backend")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -119,6 +120,7 @@ func main() {
SilenceErrors: true,
}
cmd.PersistentFlags().StringVar(&backend, "backend", k8s.Backend, fmt.Sprintf("The backend for the mesh. Possible values: %s", availableBackends))
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should kilo consider check-in (LastSeen) in backend")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should kilo consider check-in (LastSeen) in backend")
cmd.PersistentFlags().BoolVar(&checkIn, "check-in", true, "Should Kilo prune nodes that have not checked in with the backend")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +86 to +91
var checkedIn bool
if (n != nil) && (n.Key != wgtypes.Key{}) && (n.Subnet != nil) && (n.CheckLastSeen) {
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second)
} else {
checkedIn = true
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to add more conditions to the check-in logic.
Let's keep this scoped to the change at hand for now. If we need to add more safety let's consider that in a different PR / issue.

Suggested change
var checkedIn bool
if (n != nil) && (n.Key != wgtypes.Key{}) && (n.Subnet != nil) && (n.CheckLastSeen) {
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second)
} else {
checkedIn = true
}
var checkedIn bool
if n.CheckLastSeen {
checkedIn = time.Now().Unix()-n.LastSeen < int64(checkInPeriod)*2/int64(time.Second)
} else {
checkedIn = true
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests are here to protect against nil values.
They do not add any logic as they are also in the return statement and would return false anyway.

That was my first implem, but it did not pass the tests. I agree it is not pretty now (return one liner), but did not wanted to refactor too much.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh yes, I woke up in the middle of the night today and literally said "I understand now". I have a little refactor in mind that might be more legible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants