-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2 Whisk bugs due to sampling with replacment #3609
Comments
Thanks for raising this issue! Replacement was already given headaches (e.g. #3481). Agree this should be addressed. I haven't devoted much time to find an engineering solution, but preventing replacement will increase the cost of selection. A naive implementation like while len(selected) < candidate_tracker_count:
index = select_index()
if index not in selected:
selected.add(index) will become an infinite loop in the case where EDIT: Fixed above inequality from @GottfriedHerold's comment below |
Hm. I think you meant I do not recommend using a while loop and (re-)try until something gets selected that was not selected previously. This is slow and consumes a varying amount of randomness.
(Personally, I would prefer to use a stateful deterministic pseudo-random generator object that is initialized from the seed and has Then a default way to sample k out of n elements randomly is
(i.e. use the Fisher-Yates Shuffle and stop after subset_size many rounds) If you are concerned with efficiency (I would be surprised if it was an issue and so would stick with a simple solution), then the thing to be improved is actually the way you extract randomess. Using a XOF (extensible output function) from the seed to output a long pseudo-random string may be much more efficient than repeatedly calling hash(seed,+i). |
This concerns the Whisk proposal, as defined in
consensus-specs/specs/_features/whisk/beacon-chain.md
First bug:
In select_whisk_proposer_trackers(state,epoch),
a given whisk_candidate_tracker can be selected multiple times as a proposer tracker. In fact, any given tracker is selected >1 times with probability roughly 1-1.5*exp(-0.5) \approx 10%.
Observe that as soon as the owner of a proposer tracker reveals himself for the first instance of this tracker later, any further occurrence of in the proposer tracker list provides no anonymity at all: they are the same tracker (no randomization between them) and belong to the same owner. This seriously downgrades the guarantees provided by Whisk by an unacceptable amount (compared to the security levels the parameters have been set for).
Second bug (more severe, actually):
In get_shuffle_indices(randao_reveal),
indices are sampled with replacement. As a consequence, the indices vector may contain the same number multiple times.
I did not check that curdleproofs even works for this; I'm optimistically assuming here it does (anything else would only increase the bug's severity), meaning that in
process_shuffled_trackers(state,body)
pre_shuffle_trackers and body.whisk_post_shuffle_trackers are (rerandomized) permutations of each other, respecting multiplicity.
If some index is contained N times in indices, pre_shuffle_trackers will contain N copies of the same tracker and whisk_post_shuffle_tracker will then again contain N copies of this tracker (that are unlinkable to anyone but the owner and the suffler due to rerandomization).
Observe that in
we now overwrite state.whisk_condidate_trackers[shuffle_index] multiple times in case of repeated entries in indices.
As a consequence, all but the last assignment is lost. This allows the shuffler to drop trackers from the pool and replace them with multiple-selected ones: To give an example:
Assume indices is [0,0,1], whisk_candidate_trackers = [T1,T2, ...]
Then pre_shuffle_trackers = [T1,T1,T2].
The shuffler can select post_shuffle_trackers (assume no rerandomization) as [T2,T1,T1]
Then the above loop will yield
state.whisk_candidate_trackers[0] = T2
state.whisk_candidate_trackers[0] = T1
state.whisk_candidate_trackers[1] = T1
Effectively replacing T2 by T1. Note here that the permutation is completely controlled by the shuffler, who can choose whether he does this or not, knowing whether T1 is his own trackers.
Generally speaking, for each index that is selected N times, the shuffler can choose to replace N-1 other trackers of his choice by copies of the multiply-selected index. Due to rerandomization, this attack is not obvious to detect. It also occurs for a honest shuffler; the problem is that a dishonest shuffler might do so selectively. This gives a large-stake proposer the power to both increase his own number of trackers and to selectively censor other trackers (selective censorship is only feasible at the beginning of a a Whisk epoch, where some trackers have not been shuffled yet).
With the given parameters, we expect approximately 0.5 duplicate indices per shuffle. There are approx. 2^13 shuffles per Whisk epoch. As a consequence, a, say, 25% attacker (who would normally get 2^12 out of the 2^14 candiate trackers) has
2^10 opportunities to duplicate a tracker. Naively, 1/4 of these opportunities concerns his own trackers, which means a 6% profit increase for him. If all these opportunities were helpful, it's a whopping 25% increase.
Note that in reality, the number is significantly > 6%, because
a) during each Whisk epoch, the attacker gains trackers, so later during the Epoch, his change for a duplication opportunity to be helpful increases.
b) the attacker actually knows the result of get_shuffle_indices(randao_reaveal) for its own shuffles in advance, because he knows the value of randao_reveal for his own future slots. So he known which indices are duplicatable in advance and he can shuffle his own tracker into those indices in preceding shuffles.
I strongly recommend changing both selections to selection WITHOUT replacement. This adds 2-3 extra lines of code at most.
For clarity, I would add a helper function (select k-out-of-n using random seed r) to unify the code; in fact, this would improve clarity of the code, because right now, the randomness handling (seed extraction etc.) is mixed with the actual shuffling logic, which hurts readability.
Pinging @asn-d6 (with whom this was discussed already) and @dapplion
The text was updated successfully, but these errors were encountered: