Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hotfix(exporter): ensure thread-safe committee observer management #1883

Open
wants to merge 8 commits into
base: stage
Choose a base branch
from

Conversation

olegshmuelov
Copy link
Contributor

@olegshmuelov olegshmuelov commented Nov 26, 2024

This change resolves a data race by restructuring the management of committeeObserver instances

@nkryuchkov nkryuchkov changed the title Exporter race fix hotfix(exporter): fix data race Nov 26, 2024
c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()

if msg.MsgType == spectypes.SSVConsensusMsgType {
// Check if the observer already exists
existingObserver := c.committeesObservers.Get(ssvMsg.GetID())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't ttlcache already thread safe? why we need the mutex here then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the mutex, there’s a risk of race conditions where multiple goroutines might attempt to create and set the same observer simultaneously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be also solved by the cache's GetOrSet func, but involves creating the observer each time and sometimes for nothing. though it will be GC'd quickly if not used, there is no goroutine in the creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that GetOrSet avoids race conditions without a mutex, but it can lead to unnecessary observer creation, even if quickly garbage collected. The current mutex ensures observers are only created when needed, minimizing overhead for non-trivial operations. If the cost of unused creation is negligible, switching to GetOrSet could simplify the code. Let me know if we’d like to explore this direction further.

@olegshmuelov olegshmuelov self-assigned this Dec 3, 2024
@olegshmuelov olegshmuelov changed the title hotfix(exporter): fix data race hotfix(exporter): ensure thread-safe committee observer management Dec 4, 2024
@olegshmuelov olegshmuelov marked this pull request as ready for review December 4, 2024 09:13
Copy link
Contributor

@iurii-ssv iurii-ssv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (with minor comments)

Comment on lines -368 to +371
var nonCommitteeValidatorTTLs = map[spectypes.RunnerRole]int{
spectypes.RoleCommittee: 64,
spectypes.RoleProposer: 4,
spectypes.RoleAggregator: 4,
//spectypes.BNRoleSyncCommittee: 4,
var committeeObserverValidatorTTLs = map[spectypes.RunnerRole]int{
spectypes.RoleCommittee: 64,
spectypes.RoleProposer: 4,
spectypes.RoleAggregator: 4,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that's denominated in slots (and not seconds or something)

Comment on lines +425 to +428
func (c *controller) calculateObserverTTL(roleType spectypes.RunnerRole) time.Duration {
ttlSlots := committeeObserverValidatorTTLs[roleType]
return time.Duration(ttlSlots) * c.beacon.GetBeaconNetwork().SlotDurationSec()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better to "panic" than "default to 0" in case there is a new type introduced and we forgot to add it to committeeObserverValidatorTTLs map (so we can discover this issue early)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants