You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Verification of the ecc_montgomerymultiplier submodule.
Expected Behaviour
a,b are inputs and p is prime number.
Montgomery multiplication does the computation of the following output using an efficient algorithm.
output = a.b.R^-1 mod p
where R^-1 = R invmod p
where R = 2^((ceil(bits(p)/RADIX)+1)*RADIX) (RADIX is word size)
n_prime_i = (-invmod(p,2^RADIX))%2^RADIX
Reference: Optimized Algorithms and Architectures for Montgomery Multiplication for Post-Quantum Cryptography Rami El Khatib. section 2.1, 4.2
Constants:
Example: RADIX = 4 and bits in p is 32 then for choosen p
i.e p = 32'hFFFFFF47, then R^-1 = 28'h0E9DC04E , n_prime_i=4'h9
for RADIX = 4 and bits in p is 16 then
p = 16'hFFF1, R^-1 = 16'hFEE0, n_prime_i = 4'hF
The latency:
The rtl design produces correct output exactly after 3*S_num cycles, where S_num=$ceil(bits(p) / RADIX)+1;
for 384 bit and RADIX 32 it is 39 cycles
for 32 bit and RADIX 4 it is 27 cycles
for 16 bit and RADIX 4 it is 15 cycles
Received Counterexample
Assumptions from the FIOS algorithm:
a < p
b < p
Additional information:
p_internal is the register which stores the computed value from the subcomponents.
CEX:
The first counter example is observed when design is checked on REG_SIZE = 16. RADIX = 4. When an overflow occurs or to say that p_internal > n_i(prime) && p_internal[REG_SIZE+1]==1 then the expected output (p_o) should have the p_subtracted_internal register value.
i.e p_subtracted_internal = p_internal - n_i. whereas the resulted output differs this.
The second counter example is observed when design is checked on REG_SIZE = 16. RADIX = 4. When no overflow occurs but still the result greater than prime or to say that p_internal > n_i(prime) and p_internal[REG_SIZE+1]!=1 then the expected output (p_o) should have the p_subtracted_internal register value.
i.e p_subtracted_internal = p_internal - n_i. whereas the resulted output takes p_internal register value.
Solution
Line 295 of "ecc_montgomerymultiplier.sv" should be modified as follows:
current design:
- always_comb sub_b_o[t1] = sub_res[t1][RADIX];
modified design:
+ always_ff @(posedge clk or negedge reset_n) begin
+ if (~reset_n)
+ sub_b_o[t1] <= '0;
+ else if (zeroize)
+ sub_b_o[t1] <= '0;
+ else
+ sub_b_o[t1] <= sub_res[t1][RADIX];
+ end
The text was updated successfully, but these errors were encountered:
Description
Verification of the ecc_montgomerymultiplier submodule.
Expected Behaviour
a,b are inputs and p is prime number.
Montgomery multiplication does the computation of the following output using an efficient algorithm.
output = a.b.R^-1 mod p
where R^-1 = R invmod p
where R = 2^((ceil(bits(p)/RADIX)+1)*RADIX) (RADIX is word size)
n_prime_i = (-invmod(p,2^RADIX))%2^RADIX
Reference: Optimized Algorithms and Architectures for Montgomery Multiplication for Post-Quantum Cryptography Rami El Khatib. section 2.1, 4.2
Constants:
Example: RADIX = 4 and bits in p is 32 then for choosen p
i.e p = 32'hFFFFFF47, then R^-1 = 28'h0E9DC04E , n_prime_i=4'h9
for RADIX = 4 and bits in p is 16 then
p = 16'hFFF1, R^-1 = 16'hFEE0, n_prime_i = 4'hF
The latency:
The rtl design produces correct output exactly after 3*S_num cycles, where S_num=$ceil(bits(p) / RADIX)+1;
for 384 bit and RADIX 32 it is 39 cycles
for 32 bit and RADIX 4 it is 27 cycles
for 16 bit and RADIX 4 it is 15 cycles
Received Counterexample
Assumptions from the FIOS algorithm:
a < p
b < p
Additional information:
p_internal is the register which stores the computed value from the subcomponents.
CEX:
i.e p_subtracted_internal = p_internal - n_i. whereas the resulted output differs this.
i.e p_subtracted_internal = p_internal - n_i. whereas the resulted output takes p_internal register value.
Solution
Line 295 of "ecc_montgomerymultiplier.sv" should be modified as follows:
The text was updated successfully, but these errors were encountered: