bench: add memory benchmarks #255

Merged Apr 15, 2024 (1 commit)

@rgrinberg rgrinberg commented Apr 15, 2024

Running $ dune exec benchmarks/memory.exe demonstrates the pathological example.

Signed-off-by: Rudi Grinberg <[email protected]>

@rgrinberg rgrinberg force-pushed the ps/rr/bench__add_memory_benchmarks branch from aa84882 to 6ba83be Compare April 15, 2024 20:46
@rgrinberg rgrinberg merged commit ec98adc into master Apr 15, 2024
3 checks passed
@vouillon

The automaton remembers the last size + 1 characters, which takes an amount of memory exponential in size.
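To make the blow-up concrete, here is a minimal sketch that counts the states a plain subset construction would reach. It assumes the benchmarked expression has the classic shape (0|1)* 1 (0|1){n}, which this thread does not spell out. The helper count_dfa_states is hypothetical and not part of the library, and the library's actual automaton construction may differ in detail.

```ocaml
(* Hypothetical helper, not part of the library: counts the DFA states reached
   by subset construction for the NFA of (0|1)* 1 (0|1){n} (assumed shape). *)
let count_dfa_states n =
  (* NFA states: 0 is the looping start state, 1..n+1 are the positions after
     the chosen '1'; state n+1 is accepting. A set of states is a bitmask. *)
  let mask = (1 lsl (n + 2)) - 1 in
  let step set bit =
    (* States 1..n advance on either character; state n+1 has no successor,
       so its shifted bit is masked out. *)
    let shifted = ((set land (lnot 1)) lsl 1) land mask in
    (* State 0 always stays in the set, and also moves to state 1 on '1'. *)
    let from_start = if bit = 1 then 0b11 else 0b01 in
    shifted lor from_start
  in
  let seen = Hashtbl.create 64 in
  let rec explore set =
    if not (Hashtbl.mem seen set) then begin
      Hashtbl.add seen set ();
      explore (step set 0);
      explore (step set 1)
    end
  in
  explore 1;
  Hashtbl.length seen

let () =
  List.iter
    (fun n -> Printf.printf "n = %2d -> %d states\n" n (count_dfa_states n))
    [ 1; 2; 4; 8; 12 ]
```

Each reachable set encodes which of the last n + 1 characters were a 1, so the count doubles every time n grows by one: 2^(n + 1) states in total.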

In this case, one only needs to check that size zeroes or ones follow the first 1, which would only require a linear amount of memory. For that, when building the automaton, one would need to detect that repn (set "01") (n + 1) (Some (n + 1)) is subsumed by repn (set "01") n (Some n).
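As a rough illustration of why a linear number of states suffices for this particular check, the sketch below decides whether a string contains a match of (0|1)* 1 (0|1){n} using a single counter. The function has_match is a hypothetical stand-alone helper, not something the library provides, and it only answers whether a match exists, not where it ends.

```ocaml
(* Hypothetical helper: does [s] contain a substring matching
   (0|1)* 1 (0|1){n}? Only one counter is kept, so an equivalent
   automaton needs about n + 2 states rather than 2^(n + 1). *)
let has_match n s =
  (* [after] is None while no '1' has been seen in the current run of 0/1
     characters, otherwise Some k where k characters followed the first '1'. *)
  let rec go i after =
    match after with
    | Some k when k >= n -> true
    | _ ->
      if i >= String.length s then false
      else (
        match s.[i], after with
        | ('0' | '1'), Some k -> go (i + 1) (Some (k + 1))
        | '1', None -> go (i + 1) (Some 0)
        | '0', None -> go (i + 1) None
        (* Any other character ends the run, so the counter is reset. *)
        | _ -> go (i + 1) None)
  in
  go 0 None
```

The counter tracks only the first 1 of each run because that 1 has the most characters after it; any later 1 in the same run can never succeed where the first one fails.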

But this is very fragile. With longest-match or first-match (greedy) semantics, one really does need to remember all these characters. And if we change the regular expression only slightly, as below, the exponential behavior cannot be avoided either.

  seq
    [ rep (set "01")
    ; char '1'
    ; repn (set "01") size (Some size)
    ; char 'x'
    ]

DFAs are not good at counting...
