Custom collectors #444

Open

lud-wj opened this issue Dec 22, 2024 · 7 comments
lud-wj commented Dec 22, 2024

edit by @PragTob: This is the original issue, we arrived at the solution though that custom collectors would solve this: #444 (comment)

Hello,

To keep the computation from being polluted by a process dictionary already populated with memoized results, the function I want to benchmark boils down to basically this:

fn input ->
  parent = self()
  spawn_link(fn -> send(parent, actual_computation(input)) end)
  receive(do: (msg -> msg))
end

The problem is that the "entrypoint" I can give to benchee must be the external function, to preserve the isolation.

Is there a way to provide benchee with a special scenario, something like this:

fn input ->
  parent = self()
  spawn_link(fn ->
    _result = actual_computation(input) # not used for the benchmark
    measurements = Benchee.some_function_to_capture_measurements()
    send(parent, measurements)
  end)
  # return the measurements to benchee
  receive(do: (msg -> msg))
end

I want to do that first because the measurements should be taken deeper in the call stack (here it's just an example), and second because for memory consumption I don't know whether benchee looks at the whole system or just the current process (I guess the latter, given the code I have read). So if the function spawns, the memory figures will be off.

Edit:

Like memory measurements, this only tracks reductions directly in the function given to benchee, not processes spawned by it or other processes it uses.

Thank you


PragTob commented Dec 30, 2024

Heyo, thanks for reporting an issue!

What you're asking for is tough to accomplish (I think). I haven't looked deeply into the code again, but essentially iirc Benchee needs to know the process to measure memory consumption and reductions.

As the process in your example is spawned dynamically on every benchmark invocation, it's not known upfront. That means we'd need to introduce some sort of state in benchee that could be added to/called on demand. I don't really wanna do that - I like benchee being functional/just a bunch of functions.

You could theoretically call the functions we use to measure memory consumption directly - for starters you could print them out. Theoretically speaking, we could envision a feature where a benchmarking function can return a special value that tells benchee "add this to the measurements" but that'd break with how return values work so far.

I'd ask though - do you really need to spawn to keep the isolation? Each benchmarking scenario is already executed in its own process, and if you want to clean up something after the invocation you can use after_each to do that cleanup manually.

Not the best, but it should work. What do you think?


lud-wj commented Jan 6, 2025

Hello,

Thank you for your answer. I totally understand that you want to keep things simple, it's much better for fixing problems and adding new features.

In my case what I imagined is another configuration entry for benchee where we could pass a runner function. I guess somewhere in the code you have something like this:

def compute(input, implem, remaining_time, acc) when remaining_time > 0 do
  t0 = current_time()
  measurements = take_measurements(fn -> implem.(input) end)
  t1 = current_time()
  compute(input, implem, remaining_time - (t1 - t0), [measurements | acc])
end

So maybe we could pass a different runner that has to provide the same mechanism.

My use case is Advent of Code. I have a small library that handles all the boilerplate for people. It provides a command to run the solution with benchee to get an average execution time. I guess I could allow users to define an optional benchmark_clean/0 callback that I'd call in after_each, but I fear this would distract from AoC itself.

Shutting down the process would automatically cleanup the process dictionary cache, or any ETS table used for caching.

For now I think I will use after_each to delete all process dictionary keys and ETS tables owned by the current process; that should be enough!
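A minimal sketch of that cleanup, assuming the caches live in the process dictionary and in ETS tables owned by the benchmark process (after_each is a real Benchee hook; solve/1 and puzzle_input are hypothetical placeholders, and the ETS sweep is only safe if the benchmark itself created those tables):

```elixir
Benchee.run(
  %{"day_1" => fn input -> solve(input) end},
  inputs: %{"puzzle" => puzzle_input},
  after_each: fn _result ->
    # Wipe memoized results from the process dictionary.
    Enum.each(Process.get_keys(), &Process.delete/1)

    # Delete ETS tables owned by this process (illustrative; only
    # safe when the benchmarked code created them itself).
    for tab <- :ets.all(), :ets.info(tab, :owner) == self() do
      :ets.delete(tab)
    end
  end
)
```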


PragTob commented Jan 6, 2025

Heyo,

I don't think another runner makes sense here but another collector might do: https://github.com/bencheeorg/benchee/tree/main/lib/benchee/benchmark/collect

FYI, the memory and reduction collectors already spawn a separate process to take their measurements as precisely as possible.

So, the only one that doesn't run in a separate process is the time collector:

defmodule Benchee.Benchmark.Collect.Time do
  @moduledoc false

  # Measure the time elapsed while executing a given function.
  #
  # In contrast to `:timer.tc/1` it always returns the result in nano seconds instead of micro
  # seconds. This helps us avoid losing precision as both Linux and MacOSX seem to be able to
  # measure in nano seconds. `:timer.tc/n`
  # [forfeits this precision](
  # https://github.com/erlang/otp/blob/main/lib/stdlib/src/timer.erl#L164-L169).

  @behaviour Benchee.Benchmark.Collect

  @spec collect((-> any)) :: {non_neg_integer, any}
  def collect(function) do
    start = :erlang.monotonic_time()
    result = function.()
    finish = :erlang.monotonic_time()

    duration_nano_seconds = :erlang.convert_time_unit(finish - start, :native, :nanosecond)

    {duration_nano_seconds, result}
  end
end

We could ship a version of it that measures the function in a new process. As I understand this, this would solve your problem - right?
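For illustration, such a spawned-process variant could look roughly like this - a sketch against the collect/1 contract quoted above (the module name is hypothetical, and note that sending the result back copies it between processes):

```elixir
defmodule SpawnedTimeCollector do
  @behaviour Benchee.Benchmark.Collect

  # Run the function in a freshly spawned process and send the
  # timing (and result) back to the calling process.
  @spec collect((-> any)) :: {non_neg_integer, any}
  def collect(function) do
    parent = self()

    spawn_link(fn ->
      start = :erlang.monotonic_time()
      result = function.()
      finish = :erlang.monotonic_time()

      duration = :erlang.convert_time_unit(finish - start, :native, :nanosecond)
      send(parent, {:measurement, duration, result})
    end)

    receive do
      {:measurement, duration, result} -> {duration, result}
    end
  end
end
```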

I'm not a huge fan of that; there's a reason we don't run it in a new process every time - f.ex. growing process memory interferes, and often you do want a "warm" environment for benchmarking, hence the warmup. Sending all the data to a new process also means fewer measurements are taken in total. So as a default the measurements there are probably worse, although by how much I'd need to check.

However:

  • we already have different collectors
    • we could provide another one
    • making them exchangeable actually plays fine with benchee's philosophy of extensibility, people could then specify their own collectors

So, I'm considering that and if I understand you correctly that'd help your use case.

I do have a couple of questions though:

I guess I could allow users to define an optional benchmark_clean/0 callback that I'd call in after_each but I fear this would defocus from AoC itself.

Shutting down the process would automatically cleanup the process dictionary cache, or any ETS table used for caching.

Depends on how/where the ETS table is started, right? And that's the thing - I think you're trying to automagically take care of the avenues people use for caching, but there's a myriad of ways a new process doesn't solve (writing out to disk, any sort of DB..., libraries they may use that could have their own cache in their own process). I think it's better to be explicit about the needed cleanup work, then.


lud-wj commented Jan 6, 2025

Yes while typing my comment I thought about a classic use case: a globally shared ETS table.

In the case of Advent of Code we generally write independent modules and ad-hoc memoization solutions. Personally I generally use the pdict because it magically applies memoization without needing to pass a table ref or pid down the stack, so the code layout does not change between the two parts of the puzzle.

But yes in the end being explicit seems to be right.

So yeah maybe we can forget about this issue.

Opening up for custom collectors could be nice though. I don't have a use case right now (besides spawning for time), but adding collectors could be great for reporting on side effects (like how many HTTP calls were made, or whether the cache directory got filled with more files), although that kind of thing should be deterministic and should not change between calls.


PragTob commented Jan 7, 2025

Yeah, opening up collectors to be fully custom is even more involved, but it's already the direction we've been moving in. That said, I believe the keys used to gather these values are all hard-coded in a struct definition right now, so this would change the benchee data structure and would be up for a major version bump - before which I gotta take care of some other stuff here 😅

As for memoization, personally I always like passing data structures along - it's more explicit and I can see what uses the cache and what doesn't, so that's nice. That said, esp. for something like AoC the pdict is a quick and good solution.


lud-wj commented Jan 7, 2025

Yeah with AoC I kinda like it when it's hacky :D

@PragTob PragTob changed the title Using benchee to benchmark processes that run in another process Custom collectors Jan 12, 2025

PragTob commented Jan 12, 2025

Renamed issue and added an edit on top so when coming back here I wouldn't need to wonder and read through it all to see what the potential solution would be.
