Use randomness more efficiently #335

Merged
merged 8 commits into from
Oct 29, 2024

jan-ferdinand
Member

Remove the “randomized trace table,” which stored a copy of the entire trace interleaved with randomness.

Collaborator

@aszepieniec aszepieniec left a comment

The on-the-fly derivation of column randomizers is, as best I can tell, correct and secure, and the interface to get them looks spot on to me.

That said, the current implementation misses out on performance benefits.

The problem is that there are still points in time where the entire randomized trace lives in RAM (yes, in the monomial coefficient basis, but that changes nothing). According to my understanding the RAM cost does drop but not substantially:

Before this PR:

  1. store the entire randomized trace throughout ($2 \cdot N \cdot \mathsf{w}$)
  2. allocate memory for a duplicate of the entire randomized trace ($2 \cdot N \cdot \mathsf{w}$)
  3. run JIT LDE for quotient calculation using that duplicate matrix and no other memory by repeatedly interpolating and evaluating to a new coset before evaluating AIR (0).

Total memory cost: $4 \cdot N \cdot \mathsf{w}$.

With this PR:

  1. store the entire unrandomized trace throughout along with seeds ($N \cdot \mathsf{w} + \epsilon \cdot \mathsf{w}$)
  2. allocate memory for a duplicate of the entire randomized trace in monomial coefficient form ($2 \cdot N \cdot \mathsf{w}$)
  3. run JIT LDE for quotient calculation using a new matrix derived from the randomized trace ($2 \cdot N \cdot \mathsf{w}$)

Total memory cost: $5 \cdot N \cdot \mathsf{w} + \epsilon \cdot \mathsf{w}$.

So I expect the memory cost to increase, not decrease.
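
The two tallies above can be reproduced with a short sketch. It assumes, as the accounting does, that memory is counted in field elements, with `n` standing for $N$, `w` for $\mathsf{w}$, and `eps` for $\epsilon$; the function names are hypothetical and do not exist in Triton VM.

```rust
/// Peak memory in field elements before the PR, following the review's tally.
/// Hypothetical helper for illustration only.
fn peak_before_pr(n: u64, w: u64) -> u64 {
    // Randomized trace, kept for the whole proof.
    let randomized_trace = 2 * n * w;
    // Duplicate of the randomized trace used as scratch space.
    let duplicate = 2 * n * w;
    // JIT LDE interpolates and evaluates inside the duplicate: no extra memory.
    let jit_lde = 0;
    randomized_trace + duplicate + jit_lde
}

/// Peak memory in field elements with the PR as first submitted.
/// Hypothetical helper for illustration only.
fn peak_with_pr(n: u64, w: u64, eps: u64) -> u64 {
    // Unrandomized trace plus one seed of size `eps` per column.
    let trace_and_seeds = n * w + eps * w;
    // Randomized trace in monomial coefficient form.
    let coefficient_form = 2 * n * w;
    // New matrix derived from the randomized trace for JIT LDE.
    let jit_lde = 2 * n * w;
    trace_and_seeds + coefficient_form + jit_lde
}
```

For any positive `n` and `w`, `peak_with_pr` exceeds `peak_before_pr` by $N \cdot \mathsf{w} + \epsilon \cdot \mathsf{w}$, which is the increase the review anticipates.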

Here are a few workflows you might want to consider instead; select one depending on your level of ambition.

Workflow 1:

  1. store the entire unrandomized trace throughout along with master seed ($N \cdot \mathsf{w} + \epsilon$)
  2. allocate a new matrix containing the entire randomized trace in monomial coefficient basis ($2 \cdot N \cdot \mathsf{w}$)
  3. use it in JIT LDE to compute quotients (0)

Total memory cost: $3 \cdot N \cdot \mathsf{w} + \epsilon$.

Workflow 2:

  1. store the entire unrandomized trace throughout along with master seed ($N \cdot \mathsf{w} + \epsilon$)
  2. allocate a new matrix containing the unrandomized trace in monomial coefficient basis ($N \cdot \mathsf{w}$)
  3. use it in JIT LDE to compute quotients (0), and note that
  • you need to manually add in terms originating from randomizers for every coset
  • you need to tweak segmentify so that it works on the unrandomized instead of randomized domain

Total memory cost: $2 \cdot N \cdot \mathsf{w} + \epsilon$.
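
A minimal sketch of "manually adding in the terms originating from randomizers", under a loudly stated assumption: the randomized column is taken to have the form $t(x) + Z(x) \cdot r(x)$, where $t$ interpolates the unrandomized trace, $Z(x) = x^n - 1$ is the zerofier of the trace domain, and $r$ is the randomizer polynomial. Whether Triton VM uses exactly this form is not confirmed by this thread. The field (modulus 97) and all function names are toy choices for illustration.

```rust
/// Toy prime modulus (assumption for illustration; not Triton VM's field).
const P: u64 = 97;

/// Modular exponentiation by repeated squaring.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Evaluate a polynomial (ascending coefficients, all below P) via Horner's rule.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Zerofier of a multiplicative subgroup of order `n`: Z(x) = x^n - 1.
fn zerofier(n: u64, x: u64) -> u64 {
    (pow_mod(x, n) + P - 1) % P
}

/// Randomized column at `x` without materializing its 2N coefficients:
/// only the unrandomized trace polynomial and the short randomizer are needed.
fn eval_randomized(trace: &[u64], randomizer: &[u64], n: u64, x: u64) -> u64 {
    (eval(trace, x) + zerofier(n, x) * eval(randomizer, x)) % P
}

/// Reference only: explicit coefficients of t(x) + (x^n - 1) * r(x).
fn materialize_randomized(trace: &[u64], randomizer: &[u64], n: usize) -> Vec<u64> {
    let len = trace.len().max(n + randomizer.len());
    let mut coeffs = vec![0; len];
    for (i, &t) in trace.iter().enumerate() {
        coeffs[i] = t % P;
    }
    for (i, &r) in randomizer.iter().enumerate() {
        coeffs[i] = (coeffs[i] + P - r % P) % P; // contribution of -r(x)
        coeffs[i + n] = (coeffs[i + n] + r % P) % P; // contribution of x^n * r(x)
    }
    coeffs
}
```

On the trace domain the zerofier vanishes, so `eval_randomized` agrees with the plain trace there; on any coset it differs by exactly the randomizer term, which is the correction that has to be added per coset.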

Workflow 3:

  • Same as workflow 2 but the trace stored throughout is the same as the matrix used in the course of JIT LDE

Total memory cost: $N \cdot \mathsf{w} + \epsilon$.

@jan-ferdinand
Member Author

jan-ferdinand commented Oct 9, 2024

My measurements of proving the program fibonacci with input 10000 indicate that your remarks regarding growth of the memory requirements in the quotienting step are accurate. However, they are offset by savings in other places, making the suggested changes zero-sum with respect to memory consumption. Note that the total numbers are rounded very liberally, while the percentages provide higher precision.

with explicit randomizers (old)

| operation | abs [GB] | relative |
|---|---:|---:|
| total | 3.2 | 100.0% |
| compute_quotient_segments | 1.5 | 46.7% |
| MainTable::new | 0.8 | 24.8% |
| MainTable::extend | 0.5 | 17.2% |

without explicit randomizers (new)

| operation | abs [GB] | relative |
|---|---:|---:|
| total | 3.2 | 100.0% |
| compute_quotient_segments | 1.8 | 57.0% |
| MainTable::new | 0.4 | 12.4% |
| MainTable::extend | 0.3 | 8.6% |

That said, it is worth some additional effort to bring compute_quotient_segments back down again, netting a win.

@jan-ferdinand jan-ferdinand force-pushed the remove_explicit_rand_trace branch 3 times, most recently from 954bb27 to 30968d2 Compare October 14, 2024 18:10
@jan-ferdinand
Member Author

jan-ferdinand commented Oct 14, 2024

With the most recent changes, we now have:

basically “Workflow 2” (new new)

| operation | abs [GB] | relative |
|---|---:|---:|
| total | 2.5 | 100.0% |
| compute_quotient_segments | 1.4 | 53.6% |
| MainTable::new | 0.4 | 15.7% |
| MainTable::extend | 0.3 | 10.9% |

The total has dropped by 0.7 GB (~20%). The difference in compute_quotient_segments is a measurable 0.4 GB (~12% of the previous total) but does not account for the entire delta, which I find surprising. It could be that the call graph is somewhat obscured due to parallelization, which is now used slightly differently.

Collaborator

@aszepieniec aszepieniec left a comment

There are still some up-for-grabs memory savers, although I am not sure how they compare to the memory costs of other steps in the pipeline. More comments inline.

@jan-ferdinand jan-ferdinand force-pushed the remove_explicit_rand_trace branch 2 times, most recently from f1f7375 to 2a875a9 Compare October 16, 2024 09:51
@jan-ferdinand
Member Author

jan-ferdinand commented Oct 26, 2024

With the most recent changes, we now have:

basically “Workflow 3” (new new new)

| operation | abs [GB] | relative |
|---|---:|---:|
| total | 1.8 | 100.0% |
| compute_quotient_segments | 0.8 | 44.9% |
| MainTable::new | 0.4 | 21.6% |
| MainTable::extend | 0.3 | 15.1% |

The total has dropped by another 0.7 GB (~28%), all of which seems to be from savings in compute_quotient_segments.

Additionally, my benchmarks indicate that runtime performance has improved a tad.

Collaborator

@aszepieniec aszepieniec left a comment

Overall: lgtm. Some minor comments and change requests inline.

No need to bounce back though; I trust that your response to the requested changes will be excellent.

One final thing: do tell please, what's the magnitude of tad?

jan-ferdinand and others added 7 commits October 28, 2024 08:53
In particular, the memory-vs-compute-time trade-off can now be tuned via
constant `RANDOMIZED_TRACE_LEN_TO_WORKING_DOMAIN_LEN_RATIO`, which is
hardcoded for now.

Co-authored-by: Alan <[email protected]>
@jan-ferdinand
Member Author

> do tell please, what's the magnitude of tad?

The runtime performance gain of the memory-efficient path that comes with this PR¹ is 15%. 🎉

Profile at branch's base


Prove Fibonacci 10000   time:   [10.097 s 10.112 s 10.128 s]

### Prove Fibonacci 10000 111.46s #Reps Share Category
├─Fiat-Shamir: claim 100.63µs 11 0.00% (hash – 0.00%)
├─derive additional parameters 130.27µs 11 0.00%
├─main tables 33.10s 11 29.70%
│ ├─create 3.61s 11 3.24% (gen – 73.94%)
│ ├─pad 517.12ms 11 0.46% (gen – 10.58%)
│ │ ├─pad original tables 256.78ms 11 0.23%
│ │ └─fill degree-lowering table 260.23ms 11 0.23%
│ ├─randomize trace 143.19ms 11 0.13% (gen – 2.93%)
│ ├─LDE 12.18µs 11 0.00% (LDE – 0.00%)
│ ├─Merkle tree 28.33s 11 25.42% (hash – 46.16%)
│ │ ├─leafs 27.63s 11 24.78%
│ │ └─Merkle tree 702.06ms 11 0.63%
│ ├─Fiat-Shamir 340.85µs 11 0.00% (hash – 0.00%)
│ └─extend 493.85ms 11 0.44% (gen – 10.11%)
│   ├─initialize master table 253.06ms 11 0.23%
│   ├─slice master table 95.73µs 11 0.00%
│   ├─all tables 120.70ms 11 0.11%
│   └─fill degree lowering table 119.82ms 11 0.11%
├─aux tables 32.00s 11 28.71%
│ ├─randomize trace 119.03ms 11 0.11% (gen – 2.44%)
│ ├─LDE 12.79µs 11 0.00% (LDE – 0.00%)
│ ├─Merkle tree 31.88s 11 28.61% (hash – 51.95%)
│ │ ├─leafs 31.22s 11 28.01%
│ │ └─Merkle tree 660.74ms 11 0.59%
│ └─Fiat-Shamir 1.72ms 11 0.00% (hash – 0.00%)
├─quotient calculation (just-in-time) 30.65s 11 27.50%
│ ├─zero-initialization 592.32ms 11 0.53%
│ ├─calculate quotients 26.37s 11 23.66%
│ │ ├─LDE 18.16s 44 16.29% (LDE – 100.00%)
│ │ └─AIR evaluation 8.22s 44 7.37% (AIR – 100.00%)
│ │   ├─zerofier inverse 1.24s 44 1.11%
│ │   └─evaluate AIR, compute quotient codeword 6.96s 44 6.24%
│ └─segmentify 2.67s 11 2.40%
├─hash rows of quotient segments 497.55ms 11 0.45% (hash – 0.81%)
├─Merkle tree 657.56ms 11 0.59% (hash – 1.07%)
├─out-of-domain rows 2.75s 11 2.46%
├─Fiat-Shamir 1.00ms 11 0.00% (hash – 0.00%)
├─linear combination 2.81s 11 2.52%
│ ├─main 133.07ms 11 0.12% (CC – 8.01%)
│ ├─aux 91.31ms 11 0.08% (CC – 5.50%)
│ └─quotient 1.23s 11 1.11% (CC – 74.27%)
├─DEEP 1.12s 11 1.01%
│ ├─main&aux curr row 358.60ms 11 0.32%
│ ├─main&aux next row 381.28ms 11 0.34%
│ └─segmented quotient 382.62ms 11 0.34%
├─combined DEEP polynomial 203.16ms 11 0.18%
│ └─sum 203.12ms 11 0.18% (CC – 12.23%)
├─FRI 2.30s 11 2.06%
└─open trace leafs 3.83s 11 3.43%

### Categories
hash 61.37s 55.06%
LDE 18.16s 16.29%
AIR 8.22s 7.37%
gen 4.89s 4.38%
CC 1.66s 1.49%

Clock frequency is 9869 Hz (100009 clock cycles / (111459 ms / 11 iterations))
Optimal clock frequency is 12935 Hz (131072 padded height / (111459 ms / 11 iterations))
FRI domain length is 2^20

Profile at branch's tip


Prove Fibonacci 10000   time:   [8.5598 s 8.5781 s 8.5966 s]
                        change: [-15.388% -15.170% -14.928%] (p = 0.00 < 0.05)
                        Performance has improved.

### Prove Fibonacci 10000 94.60s #Reps Share Category
├─Fiat-Shamir: claim 80.16µs 11 0.00% (hash – 0.00%)
├─derive additional parameters 125.57µs 11 0.00%
├─main tables 29.07s 11 30.73%
│ ├─create 451.81ms 11 0.48% (gen – 36.97%)
│ ├─pad 435.43ms 11 0.46% (gen – 35.63%)
│ │ ├─pad original tables 240.52ms 11 0.25%
│ │ └─fill degree-lowering table 194.80ms 11 0.21%
│ ├─Merkle tree 27.85s 11 29.44%
│ │ ├─leafs 27.14s 11 28.69%
│ │ │ ├─LDE 14.96s 33 15.81% (LDE – 41.64%)
│ │ │ └─hash rows 5.45s 33 5.76% (hash – 43.73%)
│ │ └─Merkle tree 700.77ms 11 0.74% (hash – 5.63%)
│ ├─Fiat-Shamir 336.35µs 11 0.00% (hash – 0.00%)
│ └─extend 335.01ms 11 0.35% (gen – 27.41%)
│   ├─initialize master table 150.29ms 11 0.16%
│   ├─slice master table 62.92µs 11 0.00%
│   ├─all tables 97.81ms 11 0.10%
│   └─fill degree lowering table 86.70ms 11 0.09%
├─aux tables 31.09s 11 32.87%
│ ├─Merkle tree 31.09s 11 32.86%
│ │ ├─leafs 30.42s 11 32.16%
│ │ │ ├─LDE 9.61s 11 10.15% (LDE – 26.74%)
│ │ │ └─hash rows 4.51s 11 4.77% (hash – 36.21%)
│ │ └─Merkle tree 665.64ms 11 0.70% (hash – 5.35%)
│ └─Fiat-Shamir 1.83ms 11 0.00% (hash – 0.01%)
├─quotient calculation (just-in-time) 22.62s 11 23.92%
│ ├─zero-initialization 332.69ms 11 0.35%
│ ├─fetch trace randomizers 3.97ms 11 0.00%
│ ├─poly interpolate 696.45ms 11 0.74% (LDE – 1.94%)
│ ├─calculate quotients 18.27s 11 19.32%
│ │ ├─poly evaluate 4.71s 88 4.98% (LDE – 13.11%)
│ │ ├─trace randomizers 5.28s 88 5.58% (LDE – 14.69%)
│ │ └─AIR evaluation 8.28s 88 8.76% (AIR – 100.00%)
│ │   ├─zerofier inverse 1.30s 88 1.37%
│ │   └─evaluate AIR, compute quotient codeword 6.96s 88 7.36%
│ ├─segmentify 2.12s 11 2.24%
│ └─restore original trace 676.69ms 11 0.72% (LDE – 1.88%)
├─hash rows of quotient segments 469.03ms 11 0.50% (hash – 3.77%)
├─Merkle tree 659.00ms 11 0.70% (hash – 5.29%)
├─out-of-domain rows 1.38s 11 1.46%
├─Fiat-Shamir 1.01ms 11 0.00% (hash – 0.01%)
├─linear combination 2.69s 11 2.84%
│ ├─main 220.19ms 11 0.23% (CC – 12.08%)
│ ├─aux 193.60ms 11 0.20% (CC – 10.62%)
│ └─quotient 1.19s 11 1.26% (CC – 65.39%)
├─DEEP 1.11s 11 1.18%
│ ├─main&aux curr row 354.48ms 11 0.37%
│ ├─main&aux next row 369.62ms 11 0.39%
│ └─segmented quotient 387.42ms 11 0.41%
├─combined DEEP polynomial 217.43ms 11 0.23%
│ └─sum 217.39ms 11 0.23% (CC – 11.92%)
├─FRI 2.34s 11 2.47%
└─open trace leafs 1.91s 11 2.02%
  └─recompute rows 1.91s 22 2.02%

### Categories
LDE 35.93s 37.98%
hash 12.45s 13.16%
AIR 8.28s 8.76%
CC 1.82s 1.93%
gen 1.22s 1.29%

Clock frequency is 11629 Hz (100009 clock cycles / (94596 ms / 11 iterations))
Optimal clock frequency is 15241 Hz (131072 padded height / (94596 ms / 11 iterations))
FRI domain length is 2^20

A substantial share of the savings comes from witness generation. This makes sense, as the tables held in memory are now half as large, and we now generate only as much randomness as zero-knowledge requires.

Note that the large shift away from the category “hash” is due to now-correct annotations.
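
The clock frequency lines in the two profiles above follow from the quantities printed in their parentheses. A minimal sketch of that arithmetic, assuming the result is truncated to whole hertz (the rounding mode is an assumption, and `frequency_hz` is a hypothetical name, not the profiler's API):

```rust
/// Rows (clock cycles or padded height) proven per second, rounded down.
/// `total_ms` is the wall time of all `iterations` together, in milliseconds.
fn frequency_hz(rows: u64, total_ms: u64, iterations: u64) -> u64 {
    // rows / ((total_ms / 1000) / iterations), rearranged to stay in integers.
    rows * iterations * 1000 / total_ms
}
```

With the base profile's numbers, `frequency_hz(100_009, 111_459, 11)` yields 9869 and `frequency_hz(131_072, 111_459, 11)` yields 12935, matching the printed values.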

Footnotes

  1. i.e., not exactly the “tad” from above, which was talking about the performance improvements of the latest commit, which are less relevant in the grand scheme of this PR

@jan-ferdinand
Member Author

The runtime performance gain of the caching path that comes with this PR is 10%. 🎉

Profile at branch's base

Prove Fibonacci 10000   time:   [6.5011 s 6.5121 s 6.5221 s]

### Prove Fibonacci 10000 72.05s #Reps Share Category
├─Fiat-Shamir: claim 111.03µs 11 0.00% (hash – 0.00%)
├─derive additional parameters 121.68µs 11 0.00%
├─main tables 27.90s 11 38.72%
│ ├─create 3.53s 11 4.90% (gen – 73.90%)
│ ├─pad 489.70ms 11 0.68% (gen – 10.25%)
│ │ ├─pad original tables 237.87ms 11 0.33%
│ │ └─fill degree-lowering table 251.72ms 11 0.35%
│ ├─randomize trace 141.57ms 11 0.20% (gen – 2.96%)
│ ├─LDE 15.69s 11 21.78% (LDE – 51.19%)
│ │ ├─polynomial zero-initialization 35.95µs 11 0.00%
│ │ ├─interpolation 970.66ms 11 1.35%
│ │ ├─resize 1.26s 11 1.75%
│ │ ├─evaluation 13.29s 11 18.45%
│ │ └─memoize 8.03µs 11 0.00%
│ ├─Merkle tree 7.54s 11 10.47% (hash – 50.06%)
│ │ ├─leafs 6.87s 11 9.53%
│ │ └─Merkle tree 669.20ms 11 0.93%
│ ├─Fiat-Shamir 334.26µs 11 0.00% (hash – 0.00%)
│ └─extend 499.51ms 11 0.69% (gen – 10.45%)
│   ├─initialize master table 256.70ms 11 0.36%
│   ├─slice master table 65.68µs 11 0.00%
│   ├─all tables 120.33ms 11 0.17%
│   └─fill degree lowering table 122.27ms 11 0.17%
├─aux tables 17.84s 11 24.76%
│ ├─randomize trace 116.29ms 11 0.16% (gen – 2.43%)
│ ├─LDE 11.36s 11 15.77% (LDE – 37.07%)
│ │ ├─polynomial zero-initialization 21.19µs 11 0.00%
│ │ ├─interpolation 1.15s 11 1.59%
│ │ ├─resize 880.49ms 11 1.22%
│ │ ├─evaluation 9.34s 11 12.96%
│ │ └─memoize 9.10µs 11 0.00%
│ ├─Merkle tree 6.36s 11 8.83% (hash – 42.23%)
│ │ ├─leafs 5.69s 11 7.90%
│ │ └─Merkle tree 667.54ms 11 0.93%
│ └─Fiat-Shamir 1.64ms 11 0.00% (hash – 0.01%)
├─quotient calculation (cached) 6.88s 11 9.54% (CC – 80.43%)
│ ├─zerofier inverse 992.21ms 11 1.38%
│ └─evaluate AIR, compute quotient codeword 5.88s 11 8.17%
├─quotient LDE 3.60s 11 4.99% (LDE – 11.74%)
├─hash rows of quotient segments 493.16ms 11 0.68% (hash – 3.27%)
├─Merkle tree 664.39ms 11 0.92% (hash – 4.41%)
├─out-of-domain rows 2.74s 11 3.80%
├─Fiat-Shamir 1.02ms 11 0.00% (hash – 0.01%)
├─linear combination 2.83s 11 3.92%
│ ├─main 134.71ms 11 0.19% (CC – 1.58%)
│ ├─aux 93.65ms 11 0.13% (CC – 1.10%)
│ └─quotient 1.24s 11 1.72% (CC – 14.50%)
├─DEEP 1.14s 11 1.58%
│ ├─main&aux curr row 360.76ms 11 0.50%
│ ├─main&aux next row 384.33ms 11 0.53%
│ └─segmented quotient 390.91ms 11 0.54%
├─combined DEEP polynomial 205.40ms 11 0.29%
│ └─sum 205.36ms 11 0.29% (CC – 2.40%)
├─FRI 2.40s 11 3.33%
└─open trace leafs 7.62ms 11 0.01%

### Categories
LDE 30.65s 42.55%
hash 15.06s 20.90%
CC 8.55s 11.87%
gen 4.78s 6.63%

Clock frequency is 15269 Hz (100009 clock cycles / (72046 ms / 11 iterations))
Optimal clock frequency is 20012 Hz (131072 padded height / (72046 ms / 11 iterations))
FRI domain length is 2^20

Profile at branch's tip

Prove Fibonacci 10000   time:   [5.8572 s 5.8721 s 5.8835 s]
                        change: [-10.083% -9.8281% -9.5672%] (p = 0.00 < 0.05)
                        Performance has improved.

### Prove Fibonacci 10000 64.80s #Reps Share Category
├─Fiat-Shamir: claim 117.95µs 11 0.00% (hash – 0.00%)
├─derive additional parameters 136.39µs 11 0.00%
├─main tables 23.77s 11 36.68%
│ ├─create 443.93ms 11 0.69% (gen – 36.74%)
│ ├─pad 437.82ms 11 0.68% (gen – 36.24%)
│ │ ├─pad original tables 238.96ms 11 0.37%
│ │ └─fill degree-lowering table 198.74ms 11 0.31%
│ ├─LDE 15.05s 11 23.23% (LDE – 51.17%)
│ │ ├─polynomial zero-initialization 25.70µs 11 0.00%
│ │ ├─interpolation 637.42ms 11 0.98%
│ │ ├─resize 1.25s 11 1.94%
│ │ ├─evaluation 13.16s 11 20.31%
│ │ └─memoize 8.25µs 11 0.00%
│ ├─Merkle tree 7.51s 11 11.58%
│ │ ├─leafs 6.84s 11 10.56%
│ │ │ └─hash rows 6.84s 11 10.56% (hash – 45.82%)
│ │ └─Merkle tree 661.81ms 11 1.02% (hash – 4.43%)
│ ├─Fiat-Shamir 351.57µs 11 0.00% (hash – 0.00%)
│ └─extend 326.39ms 11 0.50% (gen – 27.02%)
│   ├─initialize master table 148.76ms 11 0.23%
│   ├─slice master table 61.77µs 11 0.00%
│   ├─all tables 93.12ms 11 0.14%
│   └─fill degree lowering table 84.30ms 11 0.13%
├─aux tables 16.89s 11 26.07%
│ ├─LDE 10.61s 11 16.37% (LDE – 36.06%)
│ │ ├─polynomial zero-initialization 18.20µs 11 0.00%
│ │ ├─interpolation 513.83ms 11 0.79%
│ │ ├─resize 879.87ms 11 1.36%
│ │ ├─evaluation 9.21s 11 14.22%
│ │ └─memoize 7.91µs 11 0.00%
│ ├─Merkle tree 6.28s 11 9.69%
│ │ ├─leafs 5.61s 11 8.66%
│ │ │ └─hash rows 5.61s 11 8.66% (hash – 37.59%)
│ │ └─Merkle tree 665.64ms 11 1.03% (hash – 4.46%)
│ └─Fiat-Shamir 1.70ms 11 0.00% (hash – 0.01%)
├─quotient calculation (cached) 6.80s 11 10.50% (CC – 79.12%)
│ ├─zerofier inverse 987.79ms 11 1.52%
│ └─evaluate AIR, compute quotient codeword 5.81s 11 8.97%
├─quotient LDE 3.76s 11 5.80% (LDE – 12.77%)
├─hash rows of quotient segments 495.65ms 11 0.76% (hash – 3.32%)
├─Merkle tree 650.83ms 11 1.00% (hash – 4.36%)
├─out-of-domain rows 1.31s 11 2.02%
├─Fiat-Shamir 1.00ms 11 0.00% (hash – 0.01%)
├─linear combination 2.73s 11 4.22%
│ ├─main 209.11ms 11 0.32% (CC – 2.43%)
│ ├─aux 184.97ms 11 0.29% (CC – 2.15%)
│ └─quotient 1.20s 11 1.85% (CC – 13.93%)
├─DEEP 1.09s 11 1.69%
│ ├─main&aux curr row 350.01ms 11 0.54%
│ ├─main&aux next row 370.70ms 11 0.57%
│ └─segmented quotient 372.15ms 11 0.57%
├─combined DEEP polynomial 202.92ms 11 0.31%
│ └─sum 202.88ms 11 0.31% (CC – 2.36%)
├─FRI 2.25s 11 3.47%
└─open trace leafs 7.42ms 11 0.01%

### Categories
LDE 29.41s 45.39%
hash 14.93s 23.04%
CC 8.60s 13.26%
gen 1.21s 1.86%

Clock frequency is 16976 Hz (100009 clock cycles / (64800 ms / 11 iterations))
Optimal clock frequency is 22249 Hz (131072 padded height / (64800 ms / 11 iterations))
FRI domain length is 2^20

Transform codewords into polynomials in monomial coefficient form
in-place, reducing maximum RAM consumption.

Also, clear unusable caches to decrease RAM usage even further.
@jan-ferdinand jan-ferdinand merged commit 833244d into master Oct 29, 2024
4 checks passed
@jan-ferdinand jan-ferdinand deleted the remove_explicit_rand_trace branch October 29, 2024 12:14