Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(exporter): memory leak and lock contentions #2034

Open
wants to merge 48 commits into
base: stage
Choose a base branch
from

Conversation

y0sher
Copy link
Contributor

@y0sher y0sher commented Feb 12, 2025

Comment on lines 369 to 382
c.committeesObserversMutex.RLock()

// Check if the observer already exists
existingObserver, ok := c.committeesObservers.Get(msgID)
if ok {
c.committeesObserversMutex.RUnlock()
return existingObserver
}

c.committeesObserversMutex.RUnlock()

c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()

Copy link
Contributor

@moshe-blox moshe-blox Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it'a potential race condition because another goroutine might also see non-existance in-between our RUnlock -> Lock and then overwrite us later

if we want to keep this optimization, we could re-check existance after Lock and exit if it already exists

however if you don't think it's a big deal for performance then we can just do this:

Suggested change
c.committeesObserversMutex.RLock()
// Check if the observer already exists
existingObserver, ok := c.committeesObservers.Get(msgID)
if ok {
c.committeesObserversMutex.RUnlock()
return existingObserver
}
c.committeesObserversMutex.RUnlock()
c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()
c.committeesObserversMutex.Lock()
defer c.committeesObserversMutex.Unlock()
// Check if the observer already exists
existingObserver, ok := c.committeesObservers.Get(msgID)
if ok {
return existingObserver
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this optimization is not necessary, I tried it before the last one that worked. I'll revert.

committeesObservers *ttlcache.Cache[spectypes.MessageID, *committeeObserver]
committeesObserversMutex sync.Mutex
// committeeObservers is a cache of initialized committeeObserver instances
committeesObservers *hashmap.Map[spectypes.MessageID, *committeeObserver] // todo: need to evict?
Copy link
Contributor

@moshe-blox moshe-blox Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have to, it's an optimization to stay lean by evicting:

  • removed/liquidated/exited validators
  • proposal duties

if you don't delete them they accumulate over time, however we can try without it and revert if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should eventually evict but I felt ttl cache might not be the best here. maybe we could somehow remove it on actual events in the future.

Comment on lines 215 to 222
// nonCommitteeInstanceContainerCapacity returns the capacity of InstanceContainer for non-committee validators
func nonCommitteeInstanceContainerCapacity(fullNode bool) int {
if fullNode {
// Helps full nodes reduce
return 2
}
return 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it unused?

@y0sher y0sher changed the title exporter simplify mtx fix(exporter): memory leak and lock contentions Feb 12, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkryuchkov why do we need locks in ValidatorState and OperatorState, or really anything under the top level item (ConsensusState?), if the top level item is already locked?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moshe-blox I think you're right, perhaps we could remove them

Copy link

codecov bot commented Feb 12, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants