
improve performance of unique and unique_by #3254

Open
wants to merge 1 commit into
base: master
from


@itchyny (Contributor) commented Feb 15, 2025


Previously, the `unique` and `unique_by` filters were implemented using
`group_by` followed by `map(.[0])`. This commit reimplements them in C to
avoid the unnecessary boxing of groups, which improves performance.
@wader (Member) left a comment


Looks good! Any ballpark idea of how much faster it is and how much less memory it uses?

@itchyny (Contributor, Author) commented Feb 15, 2025

Good question, and I agree it's important. However, I've measured it several times and it doesn't seem to make much difference: the run-to-run variation of the same command is larger than the difference before and after the change.

@itchyny (Contributor, Author) commented Feb 15, 2025

Oh, I made some mistakes while profiling because the executables were not statically linked. Performance is actually better: `unique_by` is now about 60% faster and `unique` is about 3 times faster.

 $ jq -nc '[range(100000)]' > /tmp/x.json
 $ hyperfine --warmup 10 './jq-old "unique_by(.)[0]" /tmp/x.json' './jq-new "unique_by(.)[0]" /tmp/x.json'
Benchmark 1: ./jq-old "unique_by(.)[0]" /tmp/x.json
  Time (mean ± σ):     116.1 ms ±   1.1 ms    [User: 105.5 ms, System: 9.4 ms]
  Range (min … max):   114.8 ms … 119.6 ms    25 runs

Benchmark 2: ./jq-new "unique_by(.)[0]" /tmp/x.json
  Time (mean ± σ):      73.0 ms ±   0.5 ms    [User: 67.1 ms, System: 4.9 ms]
  Range (min … max):    72.4 ms …  74.3 ms    38 runs

Summary
  ./jq-new "unique_by(.)[0]" /tmp/x.json ran
    1.59 ± 0.02 times faster than ./jq-old "unique_by(.)[0]" /tmp/x.json

 $ hyperfine --warmup 10 './jq-old "unique[0]" /tmp/x.json' './jq-new "unique[0]" /tmp/x.json'
Benchmark 1: ./jq-old "unique[0]" /tmp/x.json
  Time (mean ± σ):     115.9 ms ±   1.9 ms    [User: 103.7 ms, System: 10.2 ms]
  Range (min … max):   113.1 ms … 120.7 ms    25 runs

Benchmark 2: ./jq-new "unique[0]" /tmp/x.json
  Time (mean ± σ):      37.2 ms ±   4.2 ms    [User: 31.5 ms, System: 4.0 ms]
  Range (min … max):    35.3 ms …  69.5 ms    75 runs

Summary
  ./jq-new "unique[0]" /tmp/x.json ran
    3.12 ± 0.36 times faster than ./jq-old "unique[0]" /tmp/x.json
