A simple numerical program using integers, fixed-size arrays, and nested loops.
(Note: it computes some of the first keys for a poker hand evaluator.)
Implemented in C (gcc, clang), Rust, and Go.
# hardware overview
# OS: Ubuntu 22.04.3 LTS x86_64
# Host: NUC10i7FNH M38010-308
# Kernel: 6.2.0-39-generic
# Shell: zsh 5.8.1
# DE: GNOME 42.9
# Terminal: gnome-terminal
# CPU: Intel i7-10710U (12) @ 4.700GHz
# GPU: Intel Comet Lake UHD Graphics
# Memory: 31800MiB
┌───────────┬──────────┬─────────────────────────┬─────────┬───────────────────────────────┐
│ algo      ┆ compiler ┆ opt_level               ┆ runtime ┆ best vs. naive & not parallel │
│ ---       ┆ ---      ┆ ---                     ┆ ---     ┆ ---                           │
│ str       ┆ str      ┆ str                     ┆ f64     ┆ f64                           │
╞═══════════╪══════════╪═════════════════════════╪═════════╪═══════════════════════════════╡
│ optimized ┆ rust     ┆ release parallel safe b ┆ 0.08    ┆ 0.05                          │
│ optimized ┆ rust     ┆ release parallel safe a ┆ 0.1     ┆ 0.06                          │
│ optimized ┆ rust     ┆ release safe            ┆ 0.19    ┆ 0.12                          │
│ naive     ┆ rust     ┆ release parallel unsafe ┆ 0.48    ┆ 0.31                          │
│ naive     ┆ gcc      ┆ -O3                     ┆ 1.54    ┆ 1.0                           │
│ naive     ┆ rust     ┆ release v2 unsafe       ┆ 1.56    ┆ 1.01                          │
│ naive     ┆ rust     ┆ release v3 safe         ┆ 2.2     ┆ 1.43                          │
│ naive     ┆ rust     ┆ release v3b safe        ┆ 2.22    ┆ 1.44                          │
│ naive     ┆ clang    ┆ -O3                     ┆ 2.27    ┆ 1.47                          │
│ naive     ┆ clang    ┆ -O1                     ┆ 2.3     ┆ 1.49                          │
│ naive     ┆ go       ┆                         ┆ 2.34    ┆ 1.52                          │
│ naive     ┆ rust     ┆ release v4 safe         ┆ 2.62    ┆ 1.7                           │
│ naive     ┆ gcc      ┆ -O2                     ┆ 2.9     ┆ 1.88                          │
│ naive     ┆ rust     ┆ release v5 safe         ┆ 2.89    ┆ 1.88                          │
│ naive     ┆ gcc      ┆ -O1                     ┆ 2.95    ┆ 1.92                          │
│ naive     ┆ clang    ┆ -O2                     ┆ 4.35    ┆ 2.82                          │
│ naive     ┆ gcc      ┆                         ┆ 10.1    ┆ 6.56                          │
│ naive     ┆ clang    ┆                         ┆ 11.53   ┆ 7.49                          │
│ naive     ┆ rust     ┆ debug v1                ┆ 16.61   ┆ 10.79                         │
└───────────┴──────────┴─────────────────────────┴─────────┴───────────────────────────────┘
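There are two parallel versions (a and b); see the scripts.
The last column of the table above is each runtime divided by the best naive, non-parallel runtime (gcc -O3, 1.54).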
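As a quick sanity check, the ratio column of the first table can be reproduced by dividing each runtime by the gcc -O3 baseline. This is only a sketch: the helper name `ratio` is hypothetical (it is not taken from the repo scripts), and rounding to two decimals is an assumption made to match the displayed values.

```rust
/// Hypothetical helper (not from the repo): runtime relative to a baseline,
/// rounded to two decimals (assumption, to match the table's display).
fn ratio(runtime: f64, baseline: f64) -> f64 {
    (runtime / baseline * 100.0).round() / 100.0
}

fn main() {
    // Baseline: the naive gcc -O3 row of the first table.
    let baseline = 1.54;
    // A few (label, runtime) pairs copied from the first table.
    let rows = [
        ("rust release parallel safe b", 0.08),
        ("gcc -O3", 1.54),
        ("rust release v3 safe", 2.2),
        ("rust debug v1", 16.61),
    ];
    for (label, runtime) in rows {
        println!("{label:<30} {:>6.2}", ratio(runtime, baseline));
    }
}
```

A value below 1.0 means the variant beats the best naive, non-parallel build; a value above 1.0 means it is slower.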
┌───────────┬──────────┬─────────────────────────┬─────────┐
│ algo      ┆ compiler ┆ desc                    ┆ runtime │
│ ---       ┆ ---      ┆ ---                     ┆ ---     │
│ str       ┆ str      ┆ str                     ┆ f64     │
╞═══════════╪══════════╪═════════════════════════╪═════════╡
│ naive     ┆ rust     ┆ safe 5-card             ┆ 0.26    │
│ optimized ┆ rust     ┆ safe 5-card parallel a  ┆ 0.08    │
│ optimized ┆ rust     ┆ safe 5-card parallel b  ┆ 0.05    │
│ optimized ┆ rust     ┆ safe 7-card parallel a  ┆ 10.6    │
│ optimized ┆ rust     ┆ safe 7-card parallel b  ┆ 7.3     │
└───────────┴──────────┴─────────────────────────┴─────────┘
Cf. forum question https://users.rust-lang.org/t/rust-vs-c-vs-go-runtime-speed-comparison/104107
Conversation with the community brought the Rust runtime from 10x to 1x the best C runtime in 6 hours!
Then the next day it went down to 0.12x! And the day after: a parallel version! Total: a 200x speedup and several clever recipes in the Rust _v2_ algos.
I'm seriously impressed 👏
Special mention to steffhan for the optimized algo and the efficient parallel version.