Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize(Displayserver): performance up #636

Closed

Conversation

apocelipes
Copy link
Contributor

Replace stdio with the posix api, and drop fscanf.

It shows a huge performance up in my local benchmarks.

About 35~40% faster per modefile than the old one.

Here is the highlights of benchmarks:

Running ./bench_linux_amd64
Run on (8 X 1800.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.12, 0.19, 0.16
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BenchStdIO         1712 ns         1712 ns       341076
BenchPOSIXIO        998 ns          998 ns       655856

updates #634

Replace stdio with the posix api, and drop fscanf.

It shows a huge performance up in my local benchmarks.

About 35~40% faster per modefile than the old one.

Here is the highlights of benchmarks:

```text
Running ./bench_linux_amd64
Run on (8 X 1800.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.12, 0.19, 0.16
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BenchStdIO         1712 ns         1712 ns       341076
BenchPOSIXIO        998 ns          998 ns       655856
```

updates #634
@CarterLi
Copy link
Member

Do you really think this PR can address any performance issues?

You are right that parsing string manually is always faster than fscanf, but why fscanf exists?

Come on! 1ms = 1000000ns. Several nanoseconds are not bottleneck, because it's executed only once. In addition, as for #634, this code is NOT executed unless --ds-force-drm is used.

Since @flipflop133 was using X11, the real bottleneck was the xrandr stuff. If you really want to improve fastfetch, please check that.

@abbycin
Copy link

abbycin commented Nov 28, 2023

这个pr真是闲的蛋疼

@CarterLi
Copy link
Member

Don't get me wrong, I like performance improvements, and I'm accepting changes to improve performance at the cost of code complexity, but ONLY in common code.

Closing.

@CarterLi CarterLi closed this Nov 28, 2023
@apocelipes
Copy link
Contributor Author

Do you really think this PR can address any performance issues?

You are right that parsing string manually is always faster than fscanf, but why fscanf exists?

Come on! 1ms = 1000000ns. Several nanoseconds are not bottleneck, because it's executed only once. In addition, as for #634, this code is NOT executed unless --ds-force-drm is used.

Since @flipflop133 was using X11, the real bottleneck was the xrandr stuff. If you really want to improve fastfetch, please check that.

Why we use fscanf? Because of it's a much simpler way to handle a COMPLEX parsing. The tradeoff is performance.

Since things like "1024x768" is simple and intuitive, there is no readability advantage to using fscanf, and it even costs twice as much in performance.

@CarterLi
Copy link
Member

5x more lines of code and one more magic number are complexity.

it even costs twice as much in performance

1712ns -> 998ns can be called twice.

500 * 1000 * 1000ns + 1712ns -> 500 * 1000 * 1000ns + 998ns can't.

@apocelipes apocelipes deleted the feat-optimize-displayserver branch November 30, 2023 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants