Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destruct intermediate values #763

Merged
merged 2 commits into from
Feb 11, 2025
Merged

Destruct intermediate values #763

merged 2 commits into from
Feb 11, 2025

Conversation

hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Feb 8, 2025

FIPS-203, Section 3.3 demands the destruction of intermediate values
before returning to the caller. This commit implements this.

A new function ct_zeroize() is introduced which is used to wipe
intermediate stack buffers prior to function return.
(NB: The constant-time'ness is not relevant here, but
conceptually the function is still best-placed in verify.h alongside
other functions which, when naively implemented, would be prone to
harmful compiler optimization -- ct_zeroize fits this category).

By default, ct_zeroize() is implemented via SecureZeroMemory on
Windows, and a plain memset + compiler barrier otherwise. If neither
makes sense, compilation fails.
Using memset_s would be preferred, but there is no portable way of
detecting whether it is available.

If users need to register a custom ct_zeroize, they can do so by
defining MLK_CT_ZEROIZE_NATIVE and defining ct_zeroize_native,
similar to how the arithmetic and FIPS-202 backends work.
This includes the case where the use does not wish to do stack
zeroization, or has an entirely separate (and more robust) method
for it, such as compiler instrumentation. In this case,
ct_zeroize_native() can be set to a no-op.

@hanno-becker hanno-becker force-pushed the zeroize branch 5 times, most recently from 2d9ffde to 3d726e2 Compare February 8, 2025 06:16
@hanno-becker hanno-becker added the benchmark this PR should be benchmarked in CI label Feb 8, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 9612 cycles 9290 cycles 1.03
ML-KEM-512 encaps 11416 cycles 10792 cycles 1.06
ML-KEM-512 decaps 15285 cycles 14743 cycles 1.04
ML-KEM-768 keypair 16422 cycles 15860 cycles 1.04
ML-KEM-768 encaps 17730 cycles 17191 cycles 1.03
ML-KEM-768 decaps 23447 cycles 22916 cycles 1.02
ML-KEM-1024 keypair 22095 cycles 21454 cycles 1.03
ML-KEM-1024 encaps 24142 cycles 23380 cycles 1.03
ML-KEM-1024 decaps 31817 cycles 30800 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 9612 cycles 9290 cycles 1.03
ML-KEM-512 encaps 11416 cycles 10792 cycles 1.06
ML-KEM-512 decaps 15285 cycles 14743 cycles 1.04
ML-KEM-768 keypair 16422 cycles 15860 cycles 1.04
ML-KEM-768 encaps 17730 cycles 17191 cycles 1.03
ML-KEM-1024 encaps 24142 cycles 23380 cycles 1.03
ML-KEM-1024 decaps 31817 cycles 30800 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 29552 cycles 28963 cycles 1.02
ML-KEM-512 encaps 35000 cycles 34253 cycles 1.02
ML-KEM-512 decaps 45672 cycles 44726 cycles 1.02
ML-KEM-768 keypair 50275 cycles 49306 cycles 1.02
ML-KEM-768 encaps 55681 cycles 54570 cycles 1.02
ML-KEM-768 decaps 70922 cycles 69422 cycles 1.02
ML-KEM-1024 keypair 73254 cycles 71924 cycles 1.02
ML-KEM-1024 encaps 82262 cycles 80613 cycles 1.02
ML-KEM-1024 decaps 102461 cycles 100358 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 29331 cycles 29598 cycles 0.99
ML-KEM-512 encaps 34365 cycles 35251 cycles 0.97
ML-KEM-512 decaps 43930 cycles 45900 cycles 0.96
ML-KEM-768 keypair 47669 cycles 47201 cycles 1.01
ML-KEM-768 encaps 54893 cycles 55578 cycles 0.99
ML-KEM-768 decaps 68156 cycles 67573 cycles 1.01
ML-KEM-1024 keypair 69991 cycles 71653 cycles 0.98
ML-KEM-1024 encaps 81599 cycles 82054 cycles 0.99
ML-KEM-1024 decaps 97192 cycles 99289 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 17301 cycles 17029 cycles 1.02
ML-KEM-512 encaps 19063 cycles 18689 cycles 1.02
ML-KEM-512 decaps 24514 cycles 24105 cycles 1.02
ML-KEM-768 keypair 29309 cycles 28686 cycles 1.02
ML-KEM-768 encaps 30574 cycles 29786 cycles 1.03
ML-KEM-768 decaps 38404 cycles 37578 cycles 1.02
ML-KEM-1024 keypair 43129 cycles 41705 cycles 1.03
ML-KEM-1024 encaps 45007 cycles 43928 cycles 1.02
ML-KEM-1024 decaps 55999 cycles 54284 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-1024 keypair 43129 cycles 41705 cycles 1.03
ML-KEM-1024 decaps 55999 cycles 54284 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 16137 cycles 15938 cycles 1.01
ML-KEM-512 encaps 18414 cycles 18016 cycles 1.02
ML-KEM-512 decaps 24958 cycles 24523 cycles 1.02
ML-KEM-768 keypair 27798 cycles 27365 cycles 1.02
ML-KEM-768 encaps 29444 cycles 28926 cycles 1.02
ML-KEM-768 decaps 38835 cycles 38397 cycles 1.01
ML-KEM-1024 keypair 38158 cycles 36965 cycles 1.03
ML-KEM-1024 encaps 40696 cycles 39909 cycles 1.02
ML-KEM-1024 decaps 53204 cycles 52429 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-1024 keypair 38158 cycles 36965 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 59406 cycles 58132 cycles 1.02
ML-KEM-512 encaps 67051 cycles 65069 cycles 1.03
ML-KEM-512 decaps 86307 cycles 83738 cycles 1.03
ML-KEM-768 keypair 101154 cycles 98815 cycles 1.02
ML-KEM-768 encaps 112372 cycles 109560 cycles 1.03
ML-KEM-768 decaps 139977 cycles 135860 cycles 1.03
ML-KEM-1024 keypair 153654 cycles 149801 cycles 1.03
ML-KEM-1024 encaps 170792 cycles 165774 cycles 1.03
ML-KEM-1024 decaps 206998 cycles 202756 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 18041 cycles 17720 cycles 1.02
ML-KEM-512 encaps 21414 cycles 20978 cycles 1.02
ML-KEM-512 decaps 28172 cycles 27648 cycles 1.02
ML-KEM-768 keypair 30986 cycles 30521 cycles 1.02
ML-KEM-768 encaps 34032 cycles 33423 cycles 1.02
ML-KEM-768 decaps 43999 cycles 42976 cycles 1.02
ML-KEM-1024 keypair 44762 cycles 44141 cycles 1.01
ML-KEM-1024 encaps 50325 cycles 49443 cycles 1.02
ML-KEM-1024 decaps 63307 cycles 62368 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 39782 cycles 39490 cycles 1.01
ML-KEM-512 encaps 47984 cycles 47498 cycles 1.01
ML-KEM-512 decaps 62376 cycles 61772 cycles 1.01
ML-KEM-768 keypair 65104 cycles 63899 cycles 1.02
ML-KEM-768 encaps 76226 cycles 75261 cycles 1.01
ML-KEM-768 decaps 94828 cycles 93961 cycles 1.01
ML-KEM-1024 keypair 95888 cycles 95477 cycles 1.00
ML-KEM-1024 encaps 109472 cycles 108981 cycles 1.00
ML-KEM-1024 decaps 133245 cycles 132609 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 29561 cycles 28983 cycles 1.02
ML-KEM-512 encaps 35008 cycles 34277 cycles 1.02
ML-KEM-512 decaps 45680 cycles 44774 cycles 1.02
ML-KEM-768 keypair 50300 cycles 49319 cycles 1.02
ML-KEM-768 encaps 55686 cycles 54582 cycles 1.02
ML-KEM-768 decaps 70930 cycles 69415 cycles 1.02
ML-KEM-1024 keypair 73246 cycles 71948 cycles 1.02
ML-KEM-1024 encaps 82284 cycles 80619 cycles 1.02
ML-KEM-1024 decaps 102530 cycles 100421 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 46891 cycles 46455 cycles 1.01
ML-KEM-512 encaps 55193 cycles 54673 cycles 1.01
ML-KEM-512 decaps 70980 cycles 70370 cycles 1.01
ML-KEM-768 keypair 77302 cycles 76692 cycles 1.01
ML-KEM-768 encaps 87935 cycles 87298 cycles 1.01
ML-KEM-768 decaps 108417 cycles 107781 cycles 1.01
ML-KEM-1024 keypair 112600 cycles 112271 cycles 1.00
ML-KEM-1024 encaps 126774 cycles 126407 cycles 1.00
ML-KEM-1024 decaps 153054 cycles 152509 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 11521 cycles 11280 cycles 1.02
ML-KEM-512 encaps 13083 cycles 12848 cycles 1.02
ML-KEM-512 decaps 18065 cycles 17674 cycles 1.02
ML-KEM-768 keypair 20614 cycles 19727 cycles 1.04
ML-KEM-768 encaps 21057 cycles 20613 cycles 1.02
ML-KEM-768 decaps 28072 cycles 27686 cycles 1.01
ML-KEM-1024 keypair 27216 cycles 26296 cycles 1.03
ML-KEM-1024 encaps 29121 cycles 28198 cycles 1.03
ML-KEM-1024 decaps 38664 cycles 37875 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-768 keypair 20614 cycles 19727 cycles 1.04
ML-KEM-1024 keypair 27216 cycles 26296 cycles 1.03
ML-KEM-1024 encaps 29121 cycles 28198 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 35825 cycles 35497 cycles 1.01
ML-KEM-512 encaps 41114 cycles 40640 cycles 1.01
ML-KEM-512 decaps 52129 cycles 51635 cycles 1.01
ML-KEM-768 keypair 59144 cycles 58475 cycles 1.01
ML-KEM-768 encaps 66620 cycles 65240 cycles 1.02
ML-KEM-768 decaps 81277 cycles 80455 cycles 1.01
ML-KEM-1024 keypair 88888 cycles 88157 cycles 1.01
ML-KEM-1024 encaps 98791 cycles 96947 cycles 1.02
ML-KEM-1024 decaps 117721 cycles 116629 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 36508 cycles 36080 cycles 1.01
ML-KEM-512 encaps 42882 cycles 42318 cycles 1.01
ML-KEM-512 decaps 55883 cycles 55443 cycles 1.01
ML-KEM-768 keypair 59029 cycles 58544 cycles 1.01
ML-KEM-768 encaps 67618 cycles 67003 cycles 1.01
ML-KEM-768 decaps 84620 cycles 84120 cycles 1.01
ML-KEM-1024 keypair 87646 cycles 86563 cycles 1.01
ML-KEM-1024 encaps 98157 cycles 97250 cycles 1.01
ML-KEM-1024 decaps 119850 cycles 118853 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 59550 cycles 58979 cycles 1.01
ML-KEM-512 encaps 68184 cycles 67443 cycles 1.01
ML-KEM-512 decaps 86973 cycles 86004 cycles 1.01
ML-KEM-768 keypair 99537 cycles 98295 cycles 1.01
ML-KEM-768 encaps 110811 cycles 109139 cycles 1.02
ML-KEM-768 decaps 135599 cycles 133914 cycles 1.01
ML-KEM-1024 keypair 150220 cycles 147099 cycles 1.02
ML-KEM-1024 encaps 165649 cycles 162161 cycles 1.02
ML-KEM-1024 decaps 197390 cycles 193778 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 19187 cycles 18920 cycles 1.01
ML-KEM-512 encaps 22855 cycles 22424 cycles 1.02
ML-KEM-512 decaps 30188 cycles 29673 cycles 1.02
ML-KEM-768 keypair 32794 cycles 32318 cycles 1.01
ML-KEM-768 encaps 36417 cycles 35810 cycles 1.02
ML-KEM-768 decaps 46905 cycles 46193 cycles 1.02
ML-KEM-1024 keypair 47357 cycles 46636 cycles 1.02
ML-KEM-1024 encaps 53359 cycles 52341 cycles 1.02
ML-KEM-1024 decaps 67399 cycles 66378 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 38942 cycles 38687 cycles 1.01
ML-KEM-512 encaps 44917 cycles 44313 cycles 1.01
ML-KEM-512 decaps 56642 cycles 56140 cycles 1.01
ML-KEM-768 keypair 64369 cycles 63842 cycles 1.01
ML-KEM-768 encaps 72019 cycles 70974 cycles 1.01
ML-KEM-768 decaps 87781 cycles 86931 cycles 1.01
ML-KEM-1024 keypair 96204 cycles 95403 cycles 1.01
ML-KEM-1024 encaps 106718 cycles 105307 cycles 1.01
ML-KEM-1024 decaps 126817 cycles 125690 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpacemiT K1 8 (Banana Pi F3) benchmarks

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 227194 cycles 224414 cycles 1.01
ML-KEM-512 encaps 272230 cycles 269013 cycles 1.01
ML-KEM-512 decaps 345897 cycles 342307 cycles 1.01
ML-KEM-768 keypair 374160 cycles 370362 cycles 1.01
ML-KEM-768 encaps 433097 cycles 429131 cycles 1.01
ML-KEM-768 decaps 530563 cycles 526716 cycles 1.01
ML-KEM-1024 keypair 559994 cycles 551742 cycles 1.01
ML-KEM-1024 encaps 636792 cycles 627734 cycles 1.01
ML-KEM-1024 decaps 757648 cycles 747580 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 keypair 52733 cycles 51566 cycles 1.02
ML-KEM-512 encaps 60712 cycles 59039 cycles 1.03
ML-KEM-512 decaps 77942 cycles 75037 cycles 1.04
ML-KEM-768 keypair 89628 cycles 87984 cycles 1.02
ML-KEM-768 encaps 97894 cycles 95219 cycles 1.03
ML-KEM-768 decaps 122323 cycles 118664 cycles 1.03
ML-KEM-1024 keypair 135903 cycles 132132 cycles 1.03
ML-KEM-1024 encaps 148223 cycles 144497 cycles 1.03
ML-KEM-1024 decaps 181647 cycles 177187 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker hanno-becker marked this pull request as ready for review February 8, 2025 06:41
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I tested that I can provide my own native implementation.

Why did you not zeroize the following functions?
poly_rej_uniform - buf
check_pk - p_reencoded
check_sk - check
indcpa_dec - b + b_cache

mlkem/config.h Outdated Show resolved Hide resolved
mlkem/verify.h Outdated Show resolved Hide resolved
mlkem/config.h Show resolved Hide resolved
@hanno-becker hanno-becker changed the title Add option for destruction of intermediate values Destruct intermediate values Feb 10, 2025
@hanno-becker hanno-becker force-pushed the zeroize branch 2 times, most recently from 0e7a2b9 to 3bc2784 Compare February 10, 2025 13:13
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Feb 10, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 encaps 67051 cycles 65069 cycles 1.03
ML-KEM-512 decaps 86307 cycles 83738 cycles 1.03
ML-KEM-768 decaps 139977 cycles 135860 cycles 1.03
ML-KEM-1024 encaps 170792 cycles 165774 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies - I found another missed buffer in polyvec_basemul_acc_montgomery_cached_avx2.

Another question is how we handle Keccak states - should those be zeroed out as well?

@hanno-becker
Copy link
Contributor Author

@mkannwischer I checked what AWS-LC does, which should be our initial orientation. They zeroize intermediate Keccak states as well, so I added it.

@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Feb 11, 2025
@hanno-becker hanno-becker force-pushed the zeroize branch 5 times, most recently from 29ad627 to 6a4cd36 Compare February 11, 2025 12:52
FIPS-203, Section 3.3 demands the destruction of intermediate values
before returning to the caller. This commit implements this.

A new function `ct_zeroize()` is introduced which is used to wipe
intermediate stack buffers prior to function return.
(NB: The constant-time'ness is not relevant here, but
conceptually the function is still best-placed in verify.h alongside
other functions which, when naively implemented, would be prone to
harmful compiler optimization -- ct_zeroize fits this category).

By default, ct_zeroize() is implemented via SecureZeroMemory on
Windows, and a plain memset + compiler barrier otherwise. If neither
makes sense, compilation fails.
Using memset_s would be preferred, but there is no portable way of
detecting whether it is available.

If users need to register a custom ct_zeroize, they can do so by
defining MLK_CT_ZEROIZE_NATIVE and defining ct_zeroize_native,
similar to how the arithmetic and FIPS-202 backends work.
This includes the case where the use does not wish to do stack
zeroization, or has an entirely separate (and more robust) method
for it, such as compiler instrumentation. In this case,
`ct_zeroize_native()` can be set to a no-op.

Signed-off-by: Hanno Becker <[email protected]>
@mkannwischer mkannwischer added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Feb 11, 2025
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 200f1da Previous: 695ed0f Ratio
ML-KEM-512 decaps 77942 cycles 75037 cycles 1.04
ML-KEM-768 decaps 122323 cycles 118664 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker hanno-becker merged commit 8582a06 into main Feb 11, 2025
212 checks passed
@hanno-becker hanno-becker deleted the zeroize branch February 11, 2025 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark this PR should be benchmarked in CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants