Use randomness more efficiently #335
Conversation
The on-the-fly derivation of column randomizers is, as best as I can tell, correct and secure, and the interface to get them looks exactly spot-on to me.
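For readers without the diff in front of them, here is a minimal sketch of the kind of on-the-fly derivation meant here. The function name, the choice of ChaCha as the stream source, the XOR-based domain separation, and `NUM_RANDOMIZERS` are all my assumptions, not necessarily what the PR implements:

```rust
use rand::{RngCore, SeedableRng};
use rand_chacha::ChaCha20Rng;

/// Number of randomizer coefficients per column; stands in for the ε above.
/// The real value depends on the protocol's zero-knowledge requirements.
const NUM_RANDOMIZERS: usize = 4;

/// Hypothetical sketch: derive one column's randomizers from a master seed,
/// so that the randomizers never need to be stored alongside the trace.
fn column_randomizers(master_seed: [u8; 32], column_index: u64) -> Vec<u64> {
    // Domain-separate per column by mixing the column index into the seed.
    let mut seed = master_seed;
    for (s, b) in seed[..8].iter_mut().zip(column_index.to_le_bytes()) {
        *s ^= b;
    }
    let mut rng = ChaCha20Rng::from_seed(seed);
    (0..NUM_RANDOMIZERS).map(|_| rng.next_u64()).collect()
}
```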
That said, the current implementation misses out on performance benefits.
The problem is that there are still points in time where the entire randomized trace lives in RAM (yes, in the monomial coefficient basis, but that changes nothing). As far as I can tell, the peak RAM cost does not drop; if anything, it increases.

Before this PR:
- store the entire randomized trace throughout ($2 \cdot N \cdot \mathsf{w}$)
- allocate memory for a duplicate of the entire randomized trace ($2 \cdot N \cdot \mathsf{w}$)
- run JIT LDE for quotient calculation using that duplicate matrix and no other memory, by repeatedly interpolating and evaluating on a new coset before evaluating the AIR ($0$)

Total memory cost: $4 \cdot N \cdot \mathsf{w}$

With this PR:
- store the entire unrandomized trace throughout, along with seeds ($N \cdot \mathsf{w} + \epsilon \cdot \mathsf{w}$)
- allocate memory for a duplicate of the entire randomized trace in monomial coefficient form ($2 \cdot N \cdot \mathsf{w}$)
- run JIT LDE for quotient calculation using a new matrix derived from the randomized trace ($2 \cdot N \cdot \mathsf{w}$)

Total memory cost: $5 \cdot N \cdot \mathsf{w} + \epsilon \cdot \mathsf{w}$

So I expect the memory cost to increase, not decrease.
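To put rough numbers on this comparison, here is a throwaway calculation; $N$, $\mathsf{w}$, and the field-element size are made up for illustration, not measured:

```rust
/// Illustrative parameters only; real values depend on the program being proven.
const N: usize = 1 << 20; // unrandomized trace length
const W: usize = 380; // number of trace columns, "w" above
const FIELD_ELEMENT_BYTES: usize = 8;

fn gib(field_elements: usize) -> f64 {
    (field_elements * FIELD_ELEMENT_BYTES) as f64 / (1usize << 30) as f64
}

fn main() {
    let before = 4 * N * W; // 2·N·w stored + 2·N·w duplicate
    let after = 5 * N * W; // N·w stored + 2·N·w duplicate + 2·N·w in JIT LDE (ε omitted)
    println!("before this PR: {:.2} GiB", gib(before)); // ~11.88 GiB
    println!("with this PR:   {:.2} GiB", gib(after)); // ~14.84 GiB
}
```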
Here are a few workflows you might want to consider instead; select one depending on your level of ambition.
Workflow 1:
- store the entire unrandomized trace throughout, along with a master seed ($N \cdot \mathsf{w} + \epsilon$)
- allocate a new matrix containing the entire randomized trace in monomial coefficient basis ($2 \cdot N \cdot \mathsf{w}$)
- use it in JIT LDE to compute quotients ($0$)

Total memory cost: $3 \cdot N \cdot \mathsf{w} + \epsilon$
Workflow 2:
- store the entire unrandomized trace throughout, along with a master seed ($N \cdot \mathsf{w} + \epsilon$)
- allocate a new matrix containing the unrandomized trace in monomial coefficient basis ($N \cdot \mathsf{w}$)
- use it in JIT LDE to compute quotients ($0$), noting that:
  - you need to manually add in the terms originating from the randomizers on every coset (see the sketch after Workflow 3)
  - you need to tweak `segmentify` so that it works on the unrandomized instead of the randomized domain

Total memory cost: $2 \cdot N \cdot \mathsf{w} + \epsilon$
Workflow 3:
- same as Workflow 2, but the trace stored throughout is the same matrix that is used in the course of JIT LDE

Total memory cost: $N \cdot \mathsf{w} + \epsilon$
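Regarding the extra step in Workflow 2: since the randomized interpolant has the shape $f(X) + Z_H(X) \cdot r(X)$, where $f$ interpolates the unrandomized column over the trace domain $H$, $Z_H$ is the zerofier of $H$, and $r$ is the small randomizer polynomial, the randomizer contribution can be added coset by coset without ever materializing the randomized trace. A minimal sketch, generic over the field type; all names here are mine:

```rust
use std::ops::{Add, Mul};

/// Sketch: evaluate one randomized column on a coset without storing the
/// randomized trace. Inputs are the evaluations of f, Z_H, and r on the
/// coset; the output is f + Z_H · r, evaluated pointwise.
fn randomized_column_on_coset<F>(
    f_on_coset: &[F],
    zerofier_on_coset: &[F],
    randomizer_on_coset: &[F],
) -> Vec<F>
where
    F: Copy + Add<Output = F> + Mul<Output = F>,
{
    f_on_coset
        .iter()
        .zip(zerofier_on_coset)
        .zip(randomizer_on_coset)
        .map(|((&f, &z), &r)| f + z * r)
        .collect()
}

fn main() {
    // Toy demo over plain integers; a real prover would use field elements.
    let f = [1u64, 2, 3];
    let z = [5u64, 5, 5];
    let r = [7u64, 0, 1];
    assert_eq!(randomized_column_on_coset(&f, &z, &r), vec![36, 2, 8]);
}
```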
My measurements of proving the program:

with explicit randomizers (old)
without explicit randomizers (new)

That said, it is worth some additional effort to bring the memory consumption down further.
Force-pushed from 954bb27 to 30968d2.
With the most recent changes, we now have basically “Workflow 2” (new new).

The total has dropped by 0.7 GB (~20%).
There are still some memory savings up for grabs, although I am not sure how they compare to the memory costs of other steps in the pipeline. More comments inline.
Force-pushed from f1f7375 to 2a875a9.
Force-pushed from 2a875a9 to d948934.
With the most recent changes, we now have basically “Workflow 3” (new new new).

The total has dropped by another 0.7 GB (~28%), all of which seems to come from savings in one place. Additionally, my benchmarks indicate that runtime performance has improved a tad.
Overall: lgtm. Some minor comments and change requests inline.
No need to bounce back though; I trust that your response to the requested changes will be excellent.
One final thing: do tell, please: what's the magnitude of “a tad”?
changelog: ignore
In particular, the memory-vs-compute-time trade-off can now be tuned via the constant `RANDOMIZED_TRACE_LEN_TO_WORKING_DOMAIN_LEN_RATIO`, which is hardcoded for now.

Co-authored-by: Alan <[email protected]>
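The constant's name is from this PR, but the following sketch of the trade-off it might control is my own illustration: a larger ratio means a shorter working domain, so the coset-sized scratch matrix is smaller (less RAM), at the price of visiting more cosets (more compute).

```rust
/// Value and surrounding code are illustrative only.
const RANDOMIZED_TRACE_LEN_TO_WORKING_DOMAIN_LEN_RATIO: usize = 2;

/// Sketch: derive the working domain length and the number of cosets that
/// must be visited to cover the whole quotient domain.
fn quotient_cosets(randomized_trace_len: usize, quotient_domain_len: usize) -> (usize, usize) {
    let working_domain_len =
        randomized_trace_len / RANDOMIZED_TRACE_LEN_TO_WORKING_DOMAIN_LEN_RATIO;
    let num_cosets = quotient_domain_len / working_domain_len;
    (working_domain_len, num_cosets)
}

fn main() {
    // Made-up sizes: randomized trace of length 2^21, quotient domain 2^23.
    let (len, cosets) = quotient_cosets(1 << 21, 1 << 23);
    println!("working domain: {len}, cosets to visit: {cosets}"); // 1048576, 8
}
```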
The runtime performance gain of the memory-efficient path that comes with this PR is 15%. 🎉

Profile at branch's base
Profile at branch's tip

A substantial amount of the savings comes from witness generation. This makes sense, as the tables held in memory are now half as large, and we now generate only exactly as much randomness as ZK requires. Note that the large shift away from the category “hash” is due to now-correct annotations.
Force-pushed from d948934 to 3ffa68f.
The runtime performance gain of the caching path that comes with this PR is 10%. 🎉

Profile at branch's base:
Prove Fibonacci 10000: time [6.5011 s 6.5121 s 6.5221 s]

Profile at branch's tip:
Prove Fibonacci 10000: time [5.8572 s 5.8721 s 5.8835 s], change [-10.083% -9.8281% -9.5672%] (p = 0.00 < 0.05). Performance has improved.
Transform codewords into polynomials in monomial coefficient form in place, reducing peak RAM consumption. Also, clear caches that have become unusable to decrease RAM usage even further.
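A sketch of the two memory savers described in this commit message; the struct and field names are hypothetical stand-ins for the real table structures, and `intt_in_place` stands in for whatever inverse-NTT routine the codebase uses:

```rust
use std::collections::HashMap;

struct MasterTable {
    /// One buffer per column: before the transform it holds the codeword
    /// (evaluations); afterwards, the monomial coefficients.
    columns: Vec<Vec<u64>>,
    /// Cached low-degree extensions that become unusable once the columns
    /// switch representation.
    lde_cache: HashMap<usize, Vec<u64>>,
}

impl MasterTable {
    fn to_monomial_coefficient_form(&mut self, intt_in_place: impl Fn(&mut [u64])) {
        for column in &mut self.columns {
            // In-place inverse NTT: coefficients overwrite evaluations, so
            // no duplicate matrix is ever allocated.
            intt_in_place(column.as_mut_slice());
        }
        // Replace the cache with an empty map; dropping the old one returns
        // its memory immediately.
        self.lde_cache = HashMap::new();
    }
}
```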
Force-pushed from 3ffa68f to 4adeda7.
Remove the “randomized trace table,” which stored a copy of the entire trace interleaved with randomness.