-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hotfix(exporter): ensure thread-safe committee observer management #1883
base: stage
Are you sure you want to change the base?
Conversation
c.committeesObserversMutex.Lock() | ||
defer c.committeesObserversMutex.Unlock() | ||
|
||
if msg.MsgType == spectypes.SSVConsensusMsgType { | ||
// Check if the observer already exists | ||
existingObserver := c.committeesObservers.Get(ssvMsg.GetID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't ttlcache
already thread safe? why we need the mutex here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the mutex, there’s a risk of race conditions where multiple goroutines might attempt to create and set the same observer simultaneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be also solved by the cache's GetOrSet
func, but involves creating the observer each time and sometimes for nothing. though it will be GC'd quickly if not used, there is no goroutine in the creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that GetOrSet
avoids race conditions without a mutex, but it can lead to unnecessary observer creation, even if quickly garbage collected. The current mutex ensures observers are only created when needed, minimizing overhead for non-trivial operations. If the cost of unused creation is negligible, switching to GetOrSet
could simplify the code. Let me know if we’d like to explore this direction further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (with minor comments)
var nonCommitteeValidatorTTLs = map[spectypes.RunnerRole]int{ | ||
spectypes.RoleCommittee: 64, | ||
spectypes.RoleProposer: 4, | ||
spectypes.RoleAggregator: 4, | ||
//spectypes.BNRoleSyncCommittee: 4, | ||
var committeeObserverValidatorTTLs = map[spectypes.RunnerRole]int{ | ||
spectypes.RoleCommittee: 64, | ||
spectypes.RoleProposer: 4, | ||
spectypes.RoleAggregator: 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe clarify that's denominated in slots (and not seconds or something)
func (c *controller) calculateObserverTTL(roleType spectypes.RunnerRole) time.Duration { | ||
ttlSlots := committeeObserverValidatorTTLs[roleType] | ||
return time.Duration(ttlSlots) * c.beacon.GetBeaconNetwork().SlotDurationSec() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably better to "panic" than "default to 0" in case there is a new type introduced and we forgot to add it to committeeObserverValidatorTTLs
map (so we can discover this issue early)
This change resolves a data race by restructuring the management of
committeeObserver
instances