Performance issue with --pattern-from #32
Comments
Just not quite a quick fix, I'm afraid. OOC, do any of those patterns consist of just looking for a literal string?
A first stab at merging consecutive regexes into a single regex for multiple patterns
0.2.14 now in the ecosystem. Please let me know how this affected the performance!
Great, I'll test again and let you know. All patterns are regexes.
OK, with the latest version it's better than before: 129 sec now (as opposed to 284 sec) for the list of patterns, vs 81 sec with the single regex joined with |.
ok, that's better. By any chance, are one or more of the regexes literal? Consisting of just
Sorry, I forgot to add the type to the cmd, so no pattern in my list is interpreted as literal, right?
With a literal regex I mean
My patterns consist of IP addresses and filenames.
Could you give me an example of the IP address and a filename?
Sure: "8.8.8.8" and "test.exe", for example.
When you have 8.8.8.8 in your regex, you actually mean the string "8.8.8.8", right? And you would not want to match "8a8b8c8", right? If that is so, why are you using regexes? Why not just the string "8.8.8.8" (which would do a much cheaper '.contains("8.8.8.8")' on the targets)?
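To make the difference concrete, here is a minimal Python sketch (not Raku, and not App::Rak's own code) of how the unescaped pattern 8.8.8.8 behaves as a regex versus as a literal substring. The sample target strings are made up for illustration.

```python
import re

# Made-up target lines for illustration.
targets = ["ping 8.8.8.8 ok", "id 8a8b8c8", "run test.exe now"]

# As a regex, every unescaped "." matches any single character.
as_regex = re.compile("8.8.8.8")
regex_hits = [t for t in targets if as_regex.search(t)]

# As a literal, only the exact substring matches (a plain "contains" check).
literal_hits = [t for t in targets if "8.8.8.8" in t]

# Escaping the dots gives a regex that agrees with the literal check.
escaped = re.compile(re.escape("8.8.8.8"))
escaped_hits = [t for t in targets if escaped.search(t)]

print(regex_hits)    # includes the unintended "id 8a8b8c8"
print(literal_hits)  # only the line containing the real address
```

The literal check avoids the false positive and skips the regex engine entirely, which is why it tends to be the cheaper option.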
Yes, but I also have tests?.exe in the same list of patterns.
ok, then I suggest grouping the ones with wildcards together, and see how that affects the performance, as the logic now groups any consecutive regexes into a single regex.
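Why the ordering of the pattern list matters can be sketched as follows. This is a rough Python illustration of "merge consecutive regexes", not App::Rak's actual implementation. The (kind, text) pair format, the function names, and the pattern evil\d+\.dll are all hypothetical, and it assumes literals are kept as individual substring checks.

```python
import re
from itertools import groupby

def build_matchers(specs):
    """Turn (kind, text) pairs into a list of line predicates.

    Consecutive "regex" entries are merged into one compiled alternation;
    each "literal" entry becomes a plain substring check.
    """
    matchers = []
    for kind, run in groupby(specs, key=lambda spec: spec[0]):
        texts = [text for _, text in run]
        if kind == "regex":
            combined = re.compile("|".join(f"(?:{t})" for t in texts))
            matchers.append(lambda line, rx=combined: rx.search(line) is not None)
        else:
            for text in texts:
                matchers.append(lambda line, s=text: s in line)
    return matchers

def matches(matchers, line):
    """True if any matcher accepts the line."""
    return any(m(line) for m in matchers)

# Wildcard patterns listed together collapse into a single regex.
grouped = [("regex", r"tests?\.exe"), ("regex", r"evil\d+\.dll"),
           ("literal", "8.8.8.8"), ("literal", "test.exe")]
# Interleaving them breaks the run, so two separate regexes are compiled.
interleaved = [("regex", r"tests?\.exe"), ("literal", "8.8.8.8"),
               ("regex", r"evil\d+\.dll"), ("literal", "test.exe")]

print(len(build_matchers(grouped)), len(build_matchers(interleaved)))
```

Both orderings accept the same lines; the grouped list simply ends up with fewer matchers to run per line.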
FWIW, I'm keeping this one open until it's time to rewrite the expression parsing using RakuAST, and optimize from there
This has now happened with 0.3.4 https://raku.land/zef:lizmat/App::Rak/changes?v=0.3.4 . This now builds a single Please confirm whether this makes things better or worse in your use case.
Hi Liz,
Another quick fix: when you're using --pattern-from with regexes, you loop over each pattern rather than compiling a single regex that joins all patterns with |. This creates a huge performance penalty, because the same file is checked over and over, once per line in the input file. In my test on a large CSV (93 MB), it takes 294 secs against a pattern file of 99 regexes vs 84 sec if the patterns are converted into a single regex with |.
Can you fix this?
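The two strategies described in this report can be sketched in Python as below. This is only an illustration of the general idea, not how App::Rak is written; the function names, sample patterns, and sample lines are hypothetical stand-ins for a pattern file and a CSV.

```python
import re

# Hypothetical stand-ins for the lines of a pattern file and of a CSV.
patterns = [r"8\.8\.8\.8", r"tests?\.exe"]
lines = ["1,8.8.8.8,dns", "2,10.0.0.1,lan", "3,test.exe,file",
         "4,tests.exe,file", "5,readme.txt,file"]

def match_one_by_one(patterns, lines):
    """Scan all of the data once per pattern (N patterns, N passes)."""
    hits = set()
    for rx in [re.compile(p) for p in patterns]:
        for i, line in enumerate(lines):
            if rx.search(line):
                hits.add(i)
    return sorted(hits)

def match_combined(patterns, lines):
    """Join the patterns with | and scan the data a single time."""
    rx = re.compile("|".join(f"(?:{p})" for p in patterns))
    return [i for i, line in enumerate(lines) if rx.search(line)]

print(match_one_by_one(patterns, lines))
print(match_combined(patterns, lines))
```

Each pattern is wrapped in a non-capturing group so that a pattern containing its own | cannot change the meaning of its neighbours. Both functions select the same lines; only the number of passes over the data differs.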