Implementation of radix trie #153

Merged
merged 9 commits into from
Jan 26, 2025

Conversation

gavlyukovskiy
Collaborator

@gavlyukovskiy gavlyukovskiy commented Jul 13, 2024

Radix tree

This PR implements a radix trie with a reduced retained memory footprint. I had also hoped to see improvements in the benchmarks that exercise processing, but I don't see anything conclusive 🤷 Perhaps that is because the branching in the matching methods is now a bit more complicated.

  • JSONPath tests are not passing, needs further investigation
  • Left a couple of TODOs with further optimizations
  • Look a bit more into why we didn't see any improvements for masking
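
For readers unfamiliar with the data structure, here is a minimal sketch of a radix (compressed) trie over UTF-8 key bytes. It is not the implementation from this PR: the class `RadixTrieSketch`, its `Node` layout, and the `HashMap`-based children are all assumptions chosen for brevity. It only illustrates why retained memory can drop: chains of single-child nodes collapse into one edge that carries several bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; names and layout are not taken from the PR. */
public class RadixTrieSketch {
    private static final class Node {
        byte[] label;          // bytes on the edge leading into this node
        boolean terminal;      // true if a stored key ends at this node
        final Map<Byte, Node> children = new HashMap<>();

        Node(byte[] label, boolean terminal) {
            this.label = label;
            this.terminal = terminal;
        }
    }

    private final Node root = new Node(new byte[0], false);
    private int nodeCount = 1; // counts the root

    public void insert(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        Node node = root;
        int pos = 0;
        while (pos < k.length) {
            Node child = node.children.get(k[pos]);
            if (child == null) {
                // No edge starts with this byte: store the whole remainder on one edge.
                node.children.put(k[pos], new Node(Arrays.copyOfRange(k, pos, k.length), true));
                nodeCount++;
                return;
            }
            int common = commonPrefix(child.label, k, pos);
            if (common < child.label.length) {
                // The key diverges inside the edge: split it at the shared prefix.
                Node rest = new Node(
                        Arrays.copyOfRange(child.label, common, child.label.length), child.terminal);
                rest.children.putAll(child.children);
                child.children.clear();
                child.children.put(rest.label[0], rest);
                child.label = Arrays.copyOf(child.label, common);
                child.terminal = false;
                nodeCount++;
            }
            pos += common;
            node = child;
        }
        node.terminal = true;
    }

    public boolean contains(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        Node node = root;
        int pos = 0;
        while (pos < k.length) {
            Node child = node.children.get(k[pos]);
            if (child == null || k.length - pos < child.label.length) {
                return false;
            }
            // The whole edge label must match before descending.
            for (int i = 0; i < child.label.length; i++) {
                if (k[pos + i] != child.label[i]) {
                    return false;
                }
            }
            pos += child.label.length;
            node = child;
        }
        return node.terminal;
    }

    public int nodeCount() {
        return nodeCount;
    }

    private static int commonPrefix(byte[] label, byte[] key, int pos) {
        int max = Math.min(label.length, key.length - pos);
        int i = 0;
        while (i < max && label[i] == key[pos + i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        RadixTrieSketch trie = new RadixTrieSketch();
        trie.insert("cardNumber");
        trie.insert("cardHolder");
        trie.insert("iban");
        System.out.println(trie.contains("cardHolder") + " " + trie.nodeCount());
    }
}
```

With the three keys in `main`, this sketch allocates five nodes (root, `card`, `Number`, `Holder`, `iban`), whereas a one-byte-per-node trie would need 21 for the same keys. The extra comparison loop per edge is one plausible source of the additional branching mentioned above.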


github-actions bot commented Jul 13, 2024

Note

These results are affected by shared workloads on GitHub runners. Use the results only to detect possible regressions, and always rerun on a more stable machine before drawing any conclusions!

Benchmark results (pull-request, a22beae)

Benchmark                                                           (characters)  (jsonPath)  (jsonSize)  (keyLength)  (maskedKeyProbability)  (numberOfTargetKeys)  (streamInputType)  (streamOutputType)   Mode  Cnt         Score        Error   Units
BaselineBenchmark.countBytes                                             unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4   2587665.087 ± 151853.193   ops/s
BaselineBenchmark.countBytes:gc.alloc.rate.norm                          unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4         0.001 ±      0.001    B/op
BaselineBenchmark.jacksonParseAndMask                                    unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     29672.694 ±    943.417   ops/s
BaselineBenchmark.jacksonParseAndMask:gc.alloc.rate.norm                 unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     64448.071 ±      0.045    B/op
BaselineBenchmark.jacksonParseOnly                                       unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     50272.695 ±    619.645   ops/s
BaselineBenchmark.jacksonParseOnly:gc.alloc.rate.norm                    unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     24312.042 ±      0.028    B/op
BaselineBenchmark.regexReplace                                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      5310.533 ±     45.073   ops/s
BaselineBenchmark.regexReplace:gc.alloc.rate.norm                        unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     61656.396 ±      0.274    B/op
BaselineBenchmark.writeFile                                              unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      4313.778 ±    726.036   ops/s
BaselineBenchmark.writeFile:gc.alloc.rate.norm                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10648.487 ±      0.378    B/op
InstanceCreationBenchmark.jsonMasker                                         N/A         N/A         N/A          N/A                     N/A                  1000                N/A                 N/A  thrpt    4      1508.685 ±     85.417   ops/s
InstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm                      N/A         N/A         N/A          N/A                     N/A                  1000                N/A                 N/A  thrpt    4   1672691.179 ±     24.145    B/op
JsonMaskerBenchmark.jsonMaskerByteArrayStreams                           unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    227254.718 ±   5475.626   ops/s
JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm        unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10816.009 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerByteArrayStreams                           unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    214138.090 ±   3665.709   ops/s
JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm        unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     12240.009 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerBytes                                      unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    398200.162 ±  12722.187   ops/s
JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm                   unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      2240.005 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerBytes                                      unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    361469.933 ±  10073.051   ops/s
JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm                   unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      2072.005 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerString                                     unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    216092.411 ±   2268.145   ops/s
JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm                  unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10144.009 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerString                                     unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    196081.607 ±   1817.209   ops/s
JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm                  unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10992.010 ±      0.001    B/op
LargeKeySetInstanceCreationBenchmark.jsonMasker                              N/A         N/A         N/A          100                     N/A                  1000                N/A                 N/A  thrpt    4       137.870 ±      1.591   ops/s
LargeKeySetInstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm           N/A         N/A         N/A          100                     N/A                  1000                N/A                 N/A  thrpt    4  32420278.352 ±    215.669    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream     ByteArrayStream  thrpt    4    256799.050 ±   4075.265   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream     ByteArrayStream  thrpt    4     12240.008 ±      0.007    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream          FileStream  thrpt    4      4409.339 ±   1264.289   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream          FileStream  thrpt    4      9280.479 ±      0.396    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream     ByteArrayStream  thrpt    4     81257.822 ±    920.347   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream     ByteArrayStream  thrpt    4     12368.026 ±      0.022    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream          FileStream  thrpt    4      4067.271 ±   1095.568   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream          FileStream  thrpt    4      9408.521 ±      0.517    B/op
ValueMaskerBenchmark.maskWithRawValueFunction                            unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    575826.299 ±  10482.694   ops/s
ValueMaskerBenchmark.maskWithRawValueFunction:gc.alloc.rate.norm         unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1600.003 ±      0.001    B/op
ValueMaskerBenchmark.maskWithStatic                                      unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    616368.698 ±  56288.392   ops/s
ValueMaskerBenchmark.maskWithStatic:gc.alloc.rate.norm                   unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1240.003 ±      0.001    B/op
ValueMaskerBenchmark.maskWithTextValueFunction                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    519315.122 ±   9345.862   ops/s
ValueMaskerBenchmark.maskWithTextValueFunction:gc.alloc.rate.norm        unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1888.004 ±      0.001    B/op

Benchmark results (master, 007bb6a)

Benchmark                                                           (characters)  (jsonPath)  (jsonSize)  (keyLength)  (maskedKeyProbability)  (numberOfTargetKeys)  (streamInputType)  (streamOutputType)   Mode  Cnt         Score        Error   Units
BaselineBenchmark.countBytes                                             unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4   2589161.779 ± 147513.772   ops/s
BaselineBenchmark.countBytes:gc.alloc.rate.norm                          unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4         0.001 ±      0.001    B/op
BaselineBenchmark.jacksonParseAndMask                                    unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     29378.132 ±    950.980   ops/s
BaselineBenchmark.jacksonParseAndMask:gc.alloc.rate.norm                 unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     64272.072 ±      0.050    B/op
BaselineBenchmark.jacksonParseOnly                                       unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     50066.705 ±    636.332   ops/s
BaselineBenchmark.jacksonParseOnly:gc.alloc.rate.norm                    unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     24312.042 ±      0.029    B/op
BaselineBenchmark.regexReplace                                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      5299.154 ±    128.770   ops/s
BaselineBenchmark.regexReplace:gc.alloc.rate.norm                        unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     61656.397 ±      0.266    B/op
BaselineBenchmark.writeFile                                              unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      5514.633 ±   3079.881   ops/s
BaselineBenchmark.writeFile:gc.alloc.rate.norm                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10648.381 ±      0.190    B/op
InstanceCreationBenchmark.jsonMasker                                         N/A         N/A         N/A          N/A                     N/A                  1000                N/A                 N/A  thrpt    4       676.779 ±     24.061   ops/s
InstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm                      N/A         N/A         N/A          N/A                     N/A                  1000                N/A                 N/A  thrpt    4   2638443.754 ±      2.294    B/op
JsonMaskerBenchmark.jsonMaskerByteArrayStreams                           unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    246302.442 ±  10080.332   ops/s
JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm        unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10816.008 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerByteArrayStreams                           unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    295133.247 ±   3081.008   ops/s
JsonMaskerBenchmark.jsonMaskerByteArrayStreams:gc.alloc.rate.norm        unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     11560.007 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerBytes                                      unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    426052.074 ±   9204.901   ops/s
JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm                   unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      2240.005 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerBytes                                      unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    420062.796 ±  13471.776   ops/s
JsonMaskerBenchmark.jsonMaskerBytes:gc.alloc.rate.norm                   unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1392.005 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerString                                     unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    227751.633 ±   3704.734   ops/s
JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm                  unicode       false         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10144.009 ±      0.001    B/op
JsonMaskerBenchmark.jsonMaskerString                                     unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    212119.103 ±   3419.989   ops/s
JsonMaskerBenchmark.jsonMaskerString:gc.alloc.rate.norm                  unicode        true         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4     10312.009 ±      0.001    B/op
LargeKeySetInstanceCreationBenchmark.jsonMasker                              N/A         N/A         N/A          100                     N/A                  1000                N/A                 N/A  thrpt    4        10.815 ±      0.463   ops/s
LargeKeySetInstanceCreationBenchmark.jsonMasker:gc.alloc.rate.norm           N/A         N/A         N/A          100                     N/A                  1000                N/A                 N/A  thrpt    4  62613616.052 ±    116.600    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream     ByteArrayStream  thrpt    4    250395.630 ±   3329.874   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream     ByteArrayStream  thrpt    4     11560.008 ±      0.007    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream          FileStream  thrpt    4      4399.552 ±   1819.367   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A    ByteArrayStream          FileStream  thrpt    4      8600.484 ±      0.584    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream     ByteArrayStream  thrpt    4     84742.483 ±   1625.590   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream     ByteArrayStream  thrpt    4     11688.025 ±      0.021    B/op
StreamTypeBenchmark.jsonMaskerStreams                                        N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream          FileStream  thrpt    4      4291.711 ±    616.750   ops/s
StreamTypeBenchmark.jsonMaskerStreams:gc.alloc.rate.norm                     N/A         N/A         1kb          N/A                     N/A                   N/A         FileStream          FileStream  thrpt    4      8728.491 ±      0.367    B/op
ValueMaskerBenchmark.maskWithRawValueFunction                            unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    570716.027 ±  32104.447   ops/s
ValueMaskerBenchmark.maskWithRawValueFunction:gc.alloc.rate.norm         unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1600.003 ±      0.001    B/op
ValueMaskerBenchmark.maskWithStatic                                      unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    657577.844 ±  63065.677   ops/s
ValueMaskerBenchmark.maskWithStatic:gc.alloc.rate.norm                   unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1240.003 ±      0.001    B/op
ValueMaskerBenchmark.maskWithTextValueFunction                           unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4    540648.618 ±   6476.507   ops/s
ValueMaskerBenchmark.maskWithTextValueFunction:gc.alloc.rate.norm        unicode         N/A         1kb          N/A                     0.1                   N/A                N/A                 N/A  thrpt    4      1888.004 ±      0.001    B/op

@gavlyukovskiy gavlyukovskiy force-pushed the radix-trie branch 2 times, most recently from 50895c5 to 6d0b359 Compare July 16, 2024 18:32

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud


@gavlyukovskiy gavlyukovskiy force-pushed the radix-trie branch 3 times, most recently from b4cce84 to 0702a2d Compare September 19, 2024 19:57
@gavlyukovskiy gavlyukovskiy marked this pull request as ready for review September 21, 2024 19:36
@gavlyukovskiy
Collaborator Author

I was testing one of the benchmarks locally, and key matching actually got faster by quite a bit:

Before:

Benchmark                         (caseSensitive)  (mode)   Mode  Cnt         Score   Error  Units
KeyMatcherBenchmark.matchAllKeys            false    mask  thrpt    2  16196074.837          ops/s
KeyMatcherBenchmark.matchAllKeys            false   allow  thrpt    2  16032336.619          ops/s
KeyMatcherBenchmark.matchAllKeys             true    mask  thrpt    2  13108860.366          ops/s
KeyMatcherBenchmark.matchAllKeys             true   allow  thrpt    2  13075189.300          ops/s

After:

Benchmark                         (caseSensitive)  (mode)   Mode  Cnt         Score   Error  Units
KeyMatcherBenchmark.matchAllKeys            false    mask  thrpt    2  20520848.607          ops/s
KeyMatcherBenchmark.matchAllKeys            false   allow  thrpt    2  20466876.402          ops/s
KeyMatcherBenchmark.matchAllKeys             true    mask  thrpt    2  20085632.774          ops/s
KeyMatcherBenchmark.matchAllKeys             true   allow  thrpt    2  19213484.306          ops/s

So it looks like the reason we don't see much improvement in the existing benchmarks is that they use maskedKeyProbability = 0.1, so there just aren't many keys to mask. I will include this benchmark in the PR.
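
To put a number on "quite a bit", the throughput ratios can be computed directly from the scores in the two tables above. The snippet below is only an arithmetic helper; the class name `KeyMatcherSpeedup` is made up for illustration. Because both runs have Cnt = 2 and report no error margin, the ratios should be read as indicative rather than statistically established.

```java
import java.util.Locale;

/** Hypothetical helper: throughput ratios from the KeyMatcherBenchmark scores quoted above. */
public class KeyMatcherSpeedup {
    /** Ratio of after/before throughput; values above 1.0 mean the "after" run was faster. */
    static double speedup(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    public static void main(String[] args) {
        String[] labels = {
            "caseSensitive=false, mode=mask",
            "caseSensitive=false, mode=allow",
            "caseSensitive=true,  mode=mask",
            "caseSensitive=true,  mode=allow",
        };
        // Scores (ops/s) copied from the "Before" and "After" tables.
        double[] before = {16196074.837, 16032336.619, 13108860.366, 13075189.300};
        double[] after = {20520848.607, 20466876.402, 20085632.774, 19213484.306};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%s: %.2fx%n", labels[i], speedup(before[i], after[i]));
        }
    }
}
```

By this calculation the case-insensitive runs improve by roughly 1.27x and the case-sensitive runs by roughly 1.5x.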

@gavlyukovskiy gavlyukovskiy force-pushed the radix-trie branch 3 times, most recently from 5f4bc37 to ff235e4 Compare September 26, 2024 19:47

Collaborator

@donavdey donavdey left a comment

I guess there is room to optimize JsonMasker instance creation time if we set our minds to it, but otherwise the implementation seems pretty nice.
As discussed before, I will do something about the JSONPath stack so that we don't need a separate StatefulRadixTrieNode implementation.

@Breus Breus self-requested a review October 28, 2024 09:21
@Breus
Owner

Breus commented Nov 20, 2024

@gavlyukovskiy ignore the comments and the last commit for now; I am applying the comments myself and want to play around with it a bit more, adding lower-level unit tests and some JavaDoc.

@Breus Breus mentioned this pull request Jan 26, 2025
@gavlyukovskiy gavlyukovskiy enabled auto-merge (squash) January 26, 2025 12:44
@Breus Breus self-requested a review January 26, 2025 12:51
@gavlyukovskiy gavlyukovskiy merged commit 7eb9a97 into master Jan 26, 2025
5 checks passed
@gavlyukovskiy gavlyukovskiy deleted the radix-trie branch January 26, 2025 12:51

3 participants