
perf: make fair benchmark comparison #30

Open · wants to merge 2 commits into main
Conversation

@meskill (Contributor) commented Nov 26, 2024

The previous benchmark implementation measured only the called operations themselves, ignoring the deferred computation that must eventually run before the calculated result can actually be used.

Why is this important? It makes the comparison fairer: with Vec and its methods we get a value that can be used immediately, whether for accessing specific elements or for iterating. With Chunk we first have to call the .as_vec method before we can do the same.

This changes the results significantly and shows the downsides of using Chunk more clearly.
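For illustration, here is a minimal sketch of the shape of the change (not the exact code in this PR). It assumes the crate's Chunk type is in scope and exposes new, append, and as_vec as the benchmark names and the description above imply; the element count is made up:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_append(c: &mut Criterion) {
    c.bench_function("append/Chunk", |b| {
        b.iter(|| {
            // Assumed constructor; the real crate may differ.
            let mut chunk = Chunk::new();
            for i in 0..10_000 {
                chunk = chunk.append(i);
            }
            // The fix: materialize the result so the deferred
            // computation is included in the measurement.
            black_box(chunk.as_vec());
        })
    });

    c.bench_function("append/Vec", |b| {
        b.iter(|| {
            let mut vec = Vec::new();
            for i in 0..10_000 {
                vec.push(i);
            }
            // A Vec is already usable as-is; nothing extra to pay.
            black_box(vec);
        })
    });
}

criterion_group!(benches, bench_append);
criterion_main!(benches);
```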

Here are the results on my machine:

```
append/Chunk            time:   [1.0212 ms 1.0430 ms 1.0703 ms]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
append/Vec              time:   [481.98 µs 483.49 µs 485.36 µs]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

Benchmarking prepend/Chunk: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.0s, enable flat sampling, or reduce sample count to 50.
prepend/Chunk           time:   [1.7710 ms 1.7810 ms 1.7927 ms]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
prepend/Vec             time:   [16.131 ms 16.282 ms 16.461 ms]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

Benchmarking concat/Chunk: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, or reduce sample count to 90.
concat/Chunk            time:   [51.900 ms 52.134 ms 52.406 ms]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
Benchmarking concat/Vec: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, or reduce sample count to 90.
concat/Vec              time:   [50.810 ms 51.041 ms 51.352 ms]
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

clone/Chunk             time:   [1.8813 µs 1.9128 µs 1.9514 µs]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
clone/Vec               time:   [1.3566 µs 1.3694 µs 1.3856 µs]
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

from_iter/Chunk         time:   [3.2536 µs 3.2681 µs 3.2904 µs]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
from_iter/Vec           time:   [1.3630 µs 1.3896 µs 1.4259 µs]
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
```

@tusharmath (Contributor) commented Nov 26, 2024

The premise of your benchmarks is that a chunk needs to be converted to a vector in every case, so the benchmarks show that Chunks are slower in almost every instance apart from prepend. This contradicts the whole purpose of having this data structure, when in reality it exists to combat the performance penalty we otherwise have to deal with while working with Vecs.

In the real world, you won't always materialize the value as a vector.

  1. For example, I might only need the total count of values, or their sum, or the maximum. There are plenty of operations where materializing the chunk as a Vec can be skipped.
  2. There are also cases where materialization needs to be performed only once rather than on every iteration (as in the current benchmark), for example concatenating a large number of chunks and then materializing the result once (a sketch of this pattern follows below).
  3. And there are cases where, instead of materializing a chunk to a Vec, a LinkedList, a Tree, or a Trie is more appropriate.

Overall, I think the premise that Chunks have to be converted to Vec to do anything meaningful is inaccurate.
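Roughly, the pattern in point 2 looks like this (a hypothetical helper, assuming concat and as_vec from this crate's API and that concat takes its arguments by value):

```rust
// Combine many chunks cheaply, then pay the materialization cost once.
fn combine_all(chunks: Vec<Chunk<u32>>) -> Vec<u32> {
    chunks
        .into_iter()
        .reduce(|acc, next| acc.concat(next)) // cheap structural joins
        .map(|combined| combined.as_vec())    // single materialization
        .unwrap_or_default()
}
```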

@meskill (Contributor, Author) commented Nov 26, 2024

> The premise of your benchmarks is that a chunk needs to be converted to a vector in every case, so the benchmarks show that Chunks are slower in almost every instance apart from prepend. This contradicts the whole purpose of having this data structure, when in reality it exists to combat the performance penalty we otherwise have to deal with while working with Vecs.
>
> In the real world, you won't always materialize the value as a vector.
>
> 1. For example, I might only need the total count of values, or their sum, or the maximum. There are plenty of operations where materializing the chunk as a Vec can be skipped.
> 2. There are also cases where materialization needs to be performed only once rather than on every iteration (as in the current benchmark), for example concatenating a large number of chunks and then materializing the result once.
> 3. And there are cases where, instead of materializing a chunk to a Vec, a LinkedList, a Tree, or a Trie is more appropriate.
>
> Overall, I think the premise that Chunks have to be converted to Vec to do anything meaningful is inaccurate.

That's the point of this PR: to showcase examples close to real use cases. Otherwise the benchmarks could be seen as hand-crafted to showcase benefits that are unlikely to be achievable in practice.

Let's break down the examples you've provided:

  1. If you don't need the elements, why do you need any data structure at all? And even if you do use one for this, how could Chunk be used here? There seem to be no such methods, and iterating over the data structure will cost CPU time anyway, which the current benchmark misses, right? If you still think this case applies, let's add more benchmarks and compare against Vec's speed (see the sketch after this list).
  2. That is how the benchmark is already implemented for append, prepend, and concat: execute the operation N times and then convert once with as_vec. Please check the code if you think it's implemented incorrectly.
  3. Converting to a different structure doesn't mean the result would be fast just because its operations are fast; we still have to account for the conversion stage, right? What if building a LinkedList/Tree etc. from the start turns out to be faster than using Chunk?
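Something like the following could serve as the extra benchmark from point 1. It is a sketch only: it assumes the same Chunk::new/append/as_vec API as the earlier example, and uses as_vec as the element-access path because that is the one this discussion establishes the crate exposes:

```rust
use criterion::Criterion;
use std::hint::black_box;

fn bench_sum(c: &mut Criterion) {
    // Illustrative setup; assumes Chunk::new/append as before.
    let chunk = (0..10_000u64).fold(Chunk::new(), |ch, i| ch.append(i));
    let vec: Vec<u64> = (0..10_000).collect();

    c.bench_function("sum/Chunk", |b| {
        // Even a pure aggregation pays for the deferred work somewhere;
        // here it shows up as the as_vec call on every measured pass.
        b.iter(|| black_box(chunk.as_vec().iter().sum::<u64>()))
    });
    c.bench_function("sum/Vec", |b| {
        b.iter(|| black_box(vec.iter().sum::<u64>()))
    });
}
```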

The current benchmark feels more like comparing iterator adapters such as filter and map without any consuming iterator and concluding that they are faster than a for loop. We know those adapters don't run the actual calculation, so that would be considered an unfair comparison.
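In plain standard Rust, the analogy looks like this: adapters like map and filter are lazy and compute nothing until a consuming operation runs, so timing only the adapter calls measures almost nothing:

```rust
fn main() {
    let v = vec![1, 2, 3, 4, 5];

    // "Fast": this only builds an iterator; no element is processed yet.
    let lazy = v.iter().map(|x| x * 2).filter(|&x| x > 4);

    // The real work happens only when the iterator is consumed.
    let total: i32 = lazy.sum(); // 6 + 8 + 10
    assert_eq!(total, 24);
}
```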

@meskill changed the title from "perf: make fair benchmark comparison fwii" to "perf: make fair benchmark comparison" on Nov 27, 2024
@tusharmath (Contributor) commented:
The concat comparison is currently fair. Typically, in any immutable data structure, concat guarantees that if two instances are concatenated, the internal order of values in the resulting structure follows the order of concatenation. It doesn't promise that each value is accessible by index, and because the structure is immutable, it must ensure that neither of the two inputs is modified. The proposed benchmarks technically don't perform concatenation in the vector implementation: they modify the base value in place with the new value, which is not a fair comparison because it causes a mutation.

The same is true for append. Appending a value only needs to guarantee that, while iterating, it is the last value produced. Again, the vector implementation isn't immutable, and hence it's not an apples-to-apples comparison.
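In plain Rust terms (the Chunk signature in the comment below is hypothetical, purely for illustration):

```rust
fn main() {
    let mut base = vec![1, 2, 3];
    let tail = vec![4, 5];

    // Vec-style "concat": base is modified in place; its old value is gone.
    base.extend_from_slice(&tail);
    assert_eq!(base, [1, 2, 3, 4, 5]);

    // An immutable concat would instead be shaped like this
    // (hypothetical signature for illustration):
    //     fn concat(&self, other: &Chunk<T>) -> Chunk<T>
    // Both inputs stay valid and shareable; a new value is returned.
}
```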
