-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
State Sync Health Check #1892
State Sync Health Check #1892
Conversation
…ithInvalidAncestor`
…put) instead of `O` (output)
During state sync, blocks are accepted without proper verification since the VM lacks state. Add `preRejectedSubs` to tracking if these blocks get rejected when VM transitions to normal operation.
…ot verified on Reject
snow/vm.go
Outdated
stateSync, err := v.stateSyncHealthChecker().HealthCheck(ctx) | ||
if err != nil { | ||
return nil, fmt.Errorf("state sync health check failed: %w", err) | ||
} | ||
|
||
return map[string]interface{}{ | ||
"stateSync": stateSync, | ||
}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will ignore any health checks registered through AddHealthCheck
.
Could we maintain a map of named health checkers, include all of the results when HealthCheck
is called, and return errors.Join(...)
of any errors that occur?
snow/vm.go
Outdated
func (v *VM[I, O, A]) AddHealthCheck(name string, healthChecker health.Checker) { | ||
v.healthCheckers[name] = healthChecker | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a lock around healthCheckers
. HealthCheck
may be called async by AvalancheGo, so we need to support concurrent R/W.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return an error if there's a name collision?
snow/vm.go
Outdated
func (v *VM[I, O, A]) stateSyncHealthChecker() *stateSyncHealthChecker[I, O, A] { | ||
if checker, ok := v.healthCheckers[StateSyncHealthCheckerNamespace].(*stateSyncHealthChecker[I, O, A]); ok { | ||
return checker | ||
} | ||
newChecker := newStateSyncHealthChecker(v) | ||
v.healthCheckers[StateSyncHealthCheckerNamespace] = newChecker | ||
|
||
return newChecker | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function, or could we just register this check when we transition from dynamic state sync to normal operation and not provide a helper to access it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, removed. Thanks!
snow/health_check.go
Outdated
"sync" | ||
) | ||
|
||
const StateSyncHealthCheckerNamespace = "stateSyncHealthChecker" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we update the string to snowStateSync
?
snow/health_check.go
Outdated
// accepted during state sync will eventually be rejected in favor of valid blocks. | ||
type stateSyncHealthChecker[I, O, A Block] struct { | ||
vm *VM[I, O, A] | ||
unresolvedBlocks map[ids.ID]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use set.Set[ids.ID]
here? This is just a convenience that we commonly use, so we can keep the code consistent.
snow/health_check.go
Outdated
return s | ||
} | ||
|
||
func (s *stateSyncHealthChecker[I, O, A]) HealthCheck(_ context.Context) (interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: health check will be called async periodically from AvalancheGo, so this may be racy
Co-authored-by: aaronbuchwald <[email protected]> Signed-off-by: Elvis <[email protected]>
…rified blocks as rejected
Closes: #1878
This patch set introduces a new health check mechanism for the Snow VM, which monitors VM readiness and tracks vacuously verified blocks.