fix(exporter): memory leak and lock contentions #2034
base: stage
Conversation
Conflicts:
- message/validation/consensus_validation.go
- message/validation/partial_validation.go

Conflicts:
- message/validation/common_checks.go
- message/validation/consensus_validation.go
- message/validation/const.go
- message/validation/validation.go
operator/validator/controller.go
Outdated
c.committeesObserversMutex.RLock()

// Check if the observer already exists
existingObserver, ok := c.committeesObservers.Get(msgID)
if ok {
	c.committeesObserversMutex.RUnlock()
	return existingObserver
}

c.committeesObserversMutex.RUnlock()

c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()
I think it's a potential race condition, because another goroutine might also see non-existence in between our RUnlock -> Lock and then overwrite us later.
If we want to keep this optimization, we could re-check existence after Lock and exit if it already exists.
However, if you don't think it's a big deal for performance, then we can just do this:
Suggested change:

-	c.committeesObserversMutex.RLock()
-	// Check if the observer already exists
-	existingObserver, ok := c.committeesObservers.Get(msgID)
-	if ok {
-		c.committeesObserversMutex.RUnlock()
-		return existingObserver
-	}
-	c.committeesObserversMutex.RUnlock()
-	c.committeesObserversMutex.Lock()
-	defer c.committeesObserversMutex.Unlock()
+	c.committeesObserversMutex.Lock()
+	defer c.committeesObserversMutex.Unlock()
+	// Check if the observer already exists
+	existingObserver, ok := c.committeesObservers.Get(msgID)
+	if ok {
+		return existingObserver
+	}
I guess this optimization is not necessary; I tried it before the last change that actually worked. I'll revert.
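For reference, the variant mentioned above (keeping the read-lock fast path but re-checking existence after taking the write lock) would look roughly like the sketch below. This is a minimal, self-contained illustration with stand-in types (messageID, a plain map), not the actual controller or cache implementation:

```go
package observer

import "sync"

// messageID and committeeObserver are simplified stand-ins for
// spectypes.MessageID and the real committeeObserver type.
type messageID string

type committeeObserver struct{ id messageID }

type controller struct {
	committeesObserversMutex sync.RWMutex
	committeesObservers      map[messageID]*committeeObserver
}

// getOrCreateObserver keeps the RLock fast path but re-checks existence after
// taking the write lock, so an insert made by another goroutine between
// RUnlock and Lock is not silently overwritten.
func (c *controller) getOrCreateObserver(id messageID) *committeeObserver {
	// Fast path: read lock only.
	c.committeesObserversMutex.RLock()
	if existing, ok := c.committeesObservers[id]; ok {
		c.committeesObserversMutex.RUnlock()
		return existing
	}
	c.committeesObserversMutex.RUnlock()

	// Slow path: write lock, then re-check before creating.
	c.committeesObserversMutex.Lock()
	defer c.committeesObserversMutex.Unlock()
	if existing, ok := c.committeesObservers[id]; ok {
		return existing
	}
	observer := &committeeObserver{id: id}
	c.committeesObservers[id] = observer
	return observer
}
```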
-	committeesObservers      *ttlcache.Cache[spectypes.MessageID, *committeeObserver]
-	committeesObserversMutex sync.Mutex
+	// committeeObservers is a cache of initialized committeeObserver instances
+	committeesObservers *hashmap.Map[spectypes.MessageID, *committeeObserver] // todo: need to evict?
We don't have to; it's an optimization to stay lean by evicting:
- removed/liquidated/exited validators
- proposal duties
If you don't delete them, they accumulate over time. However, we can try without it and revert if needed.
I agree we should eventually evict, but I felt a TTL cache might not be the best fit here. Maybe we could somehow remove entries on actual events in the future.
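As a rough illustration of evicting on actual events rather than on a TTL, here is a minimal sketch; the handler name and the way it would be wired into validator removal/liquidation/exit events are assumptions, not existing code:

```go
package observer

import "sync"

type messageID string // stand-in for spectypes.MessageID

type committeeObserver struct{}

type controller struct {
	mu        sync.Mutex
	observers map[messageID]*committeeObserver
}

// onValidatorRemoved is a hypothetical hook that an event pipeline would call
// when a validator is removed, liquidated, or exits; deleting the entry here
// keeps entries for gone validators from accumulating over time.
func (c *controller) onValidatorRemoved(id messageID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.observers, id)
}
```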
// nonCommitteeInstanceContainerCapacity returns the capacity of InstanceContainer for non-committee validators
func nonCommitteeInstanceContainerCapacity(fullNode bool) int {
	if fullNode {
		// Helps full nodes reduce
		return 2
	}
	return 1
}
is it unused?
@nkryuchkov why do we need locks in ValidatorState and OperatorState, or really anything under the top-level item (ConsensusState?), if the top-level item is already locked?
@moshe-blox I think you're right, perhaps we could remove them
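To illustrate the point being made here, a simplified, hypothetical shape of the state types (field names and layout are assumptions, not the actual ssv code): when every access to the inner state goes through the top-level lock, inner mutexes on ValidatorState and OperatorState never protect anything that is not already serialized.

```go
package state

import "sync"

// Illustrative only: inner types carry no mutex of their own.
type ValidatorState struct {
	Slot uint64 // safe to mutate while the parent lock is held
}

type OperatorState struct {
	Round uint64
}

type ConsensusState struct {
	mu         sync.Mutex
	Validators map[string]*ValidatorState
	Operators  map[uint64]*OperatorState
}

// UpdateValidatorSlot shows all inner access already serialized by the parent lock,
// which is why separate inner locks would add overhead without adding protection.
func (s *ConsensusState) UpdateValidatorSlot(pubKey string, slot uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.Validators[pubKey]; ok {
		v.Slot = slot
	}
}
```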
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files: ☔ View full report in Codecov by Sentry.