hotfix(message/validation): optimize signer state memory usage #1874
base: stage
Looks good!
Great work!!
The issue I raised isn't related to this PR, so approving :)
# Conflicts:
#	message/validation/consensus_validation.go
#	message/validation/partial_validation.go
@@ -76,15 +69,15 @@ func (os *OperatorState) Set(slot phase0.Slot, epoch phase0.Epoch, state *Signer
 }

 func (os *OperatorState) MaxSlot() phase0.Slot {
-	os.mu.RLock()
-	defer os.mu.RUnlock()
+	os.mu.Lock()
Was there a specific reason for replacing the RWMutex with a write-only Mutex?
@oleg-ssvlabs `RWMutex` consumes more memory, and the `OperatorState` memory consumption seems to be a bottleneck in the exporter. I'd use `RWMutex` here only if we benchmark it and see a significant improvement.
Ah, interesting. Do you happen to have any numbers for comparison? It would be really compelling to see the difference (specifically between `Mutex` and `RWMutex`).
It's 64 vs 80 bytes for each `OperatorState`, IIRC. Not much of a difference, but I wanted to squeeze everything out of the state structure because we allocate a lot of them: one per validator, per role, per operator. So I guess on mainnet the total difference would be a few tens of megabytes (~60K validators * 4 roles * ~5-6 avg committee size * (80 - 64) bytes), which is not very much.
I agree that `RWMutex` would reduce mutex block time, but I don't think the difference would be very big. Generally it looks like a trade-off to me, and since we're currently fighting exporter memory issues, I prefer to reduce memory use.
@oleg-ssvlabs actually, perhaps we could remove this mutex as @moshe-blox suggested in #2034 (comment). We have a validation lock by msg ID, and the message validation doesn't have concurrent checks, so we shouldn't have any data race in `OperatorState` and `ValidatorState`.
# Conflicts:
#	message/validation/common_checks.go
#	message/validation/consensus_validation.go
#	message/validation/const.go
#	message/validation/validation.go
Changes:
Calculations of signer state for 50K validators for `MessageCounts`:
We store state for each validator for each of the 6 roles. For each role, we store state for each signer (up to 13). For each signer, we store state for 64 slots, one signer state per slot. So overall we have up to 50000 * 6 * 13 * 64 = 249_600_000 signer states. This is the maximum; the actual value should be lower because the average number of operators is less than 13.
So if `MessageCounts` is reduced from 48 bytes to 1, the maximum theoretical memory consumption should drop from ~12 GB to ~250 MB.
Changes in Pyroscope: