Performance comparison

This page compares the performance of different surrogate models, acquisition functions and explore/exploit factors. See the summary.

Tuning works by playing matches between a test_engine and a base_engine. The test_engine takes its param from the optimizer, while the base_engine uses the best param found so far. At the start, the best param is the default param. After each match (500 games at depth 6 in this case; see the sample tuning command lines), the result is sent to the optimizer. If the test_engine wins (scores more than 50%), the best param is updated. In the next trial, the base_engine uses the current best param while the test_engine takes a new param suggested by the optimizer.
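The trial loop described above can be sketched in a few lines of Python. This is only an illustration, not the tuner's actual code: `tune`, `toy_suggest` and `toy_match` are hypothetical names, and the toy scoring rule is invented so that the example runs without an engine.

```python
def tune(suggest, play_match, default_param, trials):
    """Sketch of the loop: the test engine uses the suggested param,
    the base engine uses the best param found so far."""
    best_param = dict(default_param)  # the first base engine uses the defaults
    history = []
    for trial in range(trials):
        test_param = suggest(trial)
        # Match result from the test engine's point of view, in [0, 1].
        result = play_match(test_param, best_param)
        history.append((test_param, result))  # what the optimizer is told
        if result > 0.5:  # the test engine won the match
            best_param = dict(test_param)
    return best_param, history


# Toy stand-ins (hypothetical): pretend FutMargin=300 is ideal and the
# engine whose value is closer to it scores better.
def toy_match(test_param, base_param):
    test_dist = abs(test_param['FutMargin'] - 300)
    base_dist = abs(base_param['FutMargin'] - 300)
    return 0.5 + (base_dist - test_dist) / 1000


suggestions = [100, 280, 260, 310]


def toy_suggest(trial):
    return {'FutMargin': suggestions[trial]}


best, history = tune(toy_suggest, toy_match, {'FutMargin': 227}, trials=4)
print(best)
```

Note that the base engine only changes when a trial scores above 0.5, so a later trial is measured against a stronger opponent than an earlier one.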

Studies are run with Optuna Game Parameter Tuner v0.19.0

1. Study

engine: stockfish
study match depth control: --depth 6 --base-time-sec 30
games per trial: --games-per-trial 500
trials: --trials 100
pruner: --threshold-pruner result=0.45

Parameters to be optimized

input param: OrderedDict([('FutMargin', {'default': 227, 'min': 50, 'max': 350, 'step': 4}), ('RazorMargin', {'default': 527, 'min': 250, 'max': 650, 'step': 4})])

init param: {'FutMargin': 227, 'RazorMargin': 527}
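To get a feel for the size of this search space, the sketch below enumerates the values each parameter can take. It assumes the grid starts at `min` and advances by `step`, which is how a stepped integer range usually works; under that assumption the default values 227 and 527 are not themselves grid points, while the best params reported in the studies are. `grid_values` is a hypothetical helper, not part of the tuner.

```python
input_param = {
    'FutMargin': {'default': 227, 'min': 50, 'max': 350, 'step': 4},
    'RazorMargin': {'default': 527, 'min': 250, 'max': 650, 'step': 4},
}


def grid_values(spec):
    """All values from min to max (inclusive) in increments of step."""
    return list(range(spec['min'], spec['max'] + 1, spec['step']))


grids = {name: grid_values(spec) for name, spec in input_param.items()}
combinations = 1
for name, values in grids.items():
    on_grid = input_param[name]['default'] in values
    print(name, 'values:', len(values), 'default on grid:', on_grid)
    combinations *= len(values)
print('total combinations:', combinations)
```

With 100 trials per study, each study samples only a small fraction of these combinations.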

Study 1, explore

--sampler name=skopt acquisition_function=EI xi=10000

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_1 --pgn-output study_1.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=10000
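The command lines on this page differ only in the study name, the PGN output file and the `--sampler` arguments. A small helper like the one below could generate them; it is a convenience sketch, not part of the tuner, and `build_command` is a hypothetical name. Every flag and value is copied from the command line above.

```python
INPUT_PARAM = (
    "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, "
    "'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}"
)


def build_command(study_number, sampler_args):
    """Return the argument list for one study; only the name and sampler vary."""
    name = f'study_{study_number}'
    return [
        'python', 'tuner.py',
        '--engine', './engines/stockfish-modern/stockfish.exe',
        '--hash', '64', '--concurrency', '6',
        '--opening-file', './start_opening/ogpt_chess_startpos.epd',
        '--input-param', INPUT_PARAM,
        '--plot', '--base-time-sec', '120', '--depth', '6',
        '--study-name', name, '--pgn-output', f'{name}.pgn',
        '--trials', '100', '--games-per-trial', '500',
        '--threshold-pruner', 'result=0.45',
        '--sampler', *sampler_args,
    ]


cmd = build_command(2, ['name=skopt', 'acquisition_function=EI', 'xi=0.0001'])
print(cmd[-4:])
```

The resulting list can be handed to `subprocess.run(cmd)` as is, which avoids shell quoting problems with the `--input-param` value.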

Result

study best param: {'FutMargin': 306, 'RazorMargin': 558}
study best value: 0.5095625000000003
study best trial number: 94

Plot

Every dot is a 500-game match of stockfish at depth 6. The objective value is the match result between the param suggested by the optimizer and the default param. The result is from the point of view of the optimizer param.

The dark dots (darker means a higher trial number) are more spread out than in study 2, because study 1 favors exploration (high xi value) while study 2 favors exploitation (low xi value).

slice1 hist1
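The explore/exploit factors used in these studies (xi for EI and PI, kappa for LCB) can be illustrated with the textbook forms of the three acquisition functions. The sketch below is written in the minimization convention that scikit-optimize uses and is not the library's or the tuner's code. The two candidates and their predicted mean and standard deviation are made up, with match scores negated so that lower is better.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def lower_confidence_bound(mu, std, kappa=1.96):
    """LCB: the candidate with the lowest value is preferred."""
    return mu - kappa * std


def probability_of_improvement(mu, std, y_best, xi=0.01):
    """PI: chance of beating the best value so far by more than xi."""
    return norm_cdf((y_best - xi - mu) / std)


def expected_improvement(mu, std, y_best, xi=0.01):
    """EI: expected amount by which the best value is beaten, offset by xi."""
    improve = y_best - xi - mu
    z = improve / std
    return improve * norm_cdf(z) + std * norm_pdf(z)


# Hypothetical candidates as (predicted mean, predicted std).
candidates = {'well_known': (-0.51, 0.005), 'uncertain': (-0.49, 0.02)}
y_best = -0.505  # best (negated) score seen so far, also made up


def pick_lcb(kappa):
    return min(candidates,
               key=lambda k: lower_confidence_bound(*candidates[k], kappa=kappa))


def pick_ei(xi):
    return max(candidates,
               key=lambda k: expected_improvement(*candidates[k], y_best, xi=xi))


print('LCB kappa=0.0001 picks', pick_lcb(0.0001))
print('LCB kappa=10000 picks', pick_lcb(10000))
print('EI xi=0.0001 picks', pick_ei(0.0001))
# With a very large xi the improvement term is so negative that EI
# evaluates to zero for both candidates in this toy calculation.
print('EI xi=10000 values',
      [expected_improvement(*c, y_best, xi=10000) for c in candidates.values()])
```

A small kappa makes LCB follow the predicted mean (exploitation), while a large kappa makes the uncertainty term dominate (exploration). In this toy setup a very large xi leaves EI unable to rank the candidates at all.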

Study 2, exploit

--sampler name=skopt acquisition_function=EI xi=0.0001

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_2 --pgn-output study_2.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=0.0001

Result

study best param: {'FutMargin': 290, 'RazorMargin': 650}
study best value: 0.5086562500000001
study best trial number: 97

Plot

slice2 hist2

Study 3, explore

--sampler name=skopt acquisition_function=PI xi=10000

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_3 --pgn-output study_3.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=10000

Result

study best param: {'FutMargin': 326, 'RazorMargin': 334}
study best value: 0.5129687500000003
study best trial number: 98

Plot

slice3 hist3

Study 4, exploit

--sampler name=skopt acquisition_function=PI xi=0.0001

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_4 --pgn-output study_4.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=0.0001

Result

study best param: {'FutMargin': 350, 'RazorMargin': 250}
study best value: 0.5089062499999999
study best trial number: 95

Plot

slice4 hist4

Study 5, explore

--sampler name=skopt acquisition_function=LCB kappa=10000

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_5 --pgn-output study_5.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=10000

Result

study best param: {'FutMargin': 282, 'RazorMargin': 650}
study best value: 0.506265625
study best trial number: 96

Plot

slice5 hist5

Study 6, exploit

--sampler name=skopt acquisition_function=LCB kappa=0.0001

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_6 --pgn-output study_6.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=0.0001

Result

study best param: {'FutMargin': 178, 'RazorMargin': 594}
study best value: 0.5071718750000004
study best trial number: 97

Plot

slice6 hist6

Study 7

--sampler name=skopt acquisition_function=PI

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_7 --pgn-output study_7.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI

Result

study best param: {'FutMargin': 350, 'RazorMargin': 258}
study best value: 0.5101249999999999
study best trial number: 89

Plot

slice7 hist7

Study 8

--sampler name=skopt acquisition_function=LCB

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_8 --pgn-output study_8.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB

Result

study best param: {'FutMargin': 226, 'RazorMargin': 650}
study best value: 0.5086718750000001
study best trial number: 92

Plot

slice8 hist8

Study 9

--sampler name=skopt acquisition_function=EI

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_9 --pgn-output study_9.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI

Result

study best param: {'FutMargin': 298, 'RazorMargin': 250}
study best value: 0.5093906250000005
study best trial number: 99

Plot

slice9 hist9

Study 10

--sampler name=skopt

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_10 --pgn-output study_10.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt

Result

study best param: {'FutMargin': 306, 'RazorMargin': 250}
study best value: 0.5119843750000003
study best trial number: 95

Plot

slice10 hist10

Study 11

--sampler name=tpe

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_11 --pgn-output study_11.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=tpe

Result

study best param: {'FutMargin': 110, 'RazorMargin': 338}
study best value: 0.5089843750000003
study best trial number: 93

Plot

slice11 hist11

Study 12

--sampler name=cmaes

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_12 --pgn-output study_12.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=cmaes

Result

study best param: {'FutMargin': 190, 'RazorMargin': 494}
study best value: 0.5138437500000004
study best trial number: 98

Plot

slice12 hist12

Study 13

--sampler name=skopt acquisition_function=PI base_estimator=GBRT

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_13 --pgn-output study_13.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=GBRT

Result

study best param: {'FutMargin': 326, 'RazorMargin': 442}
study best value: 0.5124062500000004
study best trial number: 99

Plot

slice13 hist13

Study 14

--sampler name=skopt acquisition_function=PI base_estimator=ET

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_14 --pgn-output study_14.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=ET

Result

study best param: {'FutMargin': 110, 'RazorMargin': 594}
study best value: 0.5097031250000001
study best trial number: 94

Study 15

--sampler name=skopt acquisition_function=PI base_estimator=RF

Command line

python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_15 --pgn-output study_15.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=RF

Result

study best param: {'FutMargin': 166, 'RazorMargin': 254}
study best value: 0.5099531250000003
study best trial number: 98

Plot

slice15 hist15

2. Game verification

Engine setup

engine: stockfish
hash: 64
threads: 1
opponent: default param
depth control: 6
games: 10k

Match setup

study 1 best param after 100 trials vs default -> Elo of study 1 best param
study 2 best param after 100 trials vs default -> Elo of study 2 best param
...

Result

Study 1

Score of sf_study_1 vs sf_default: 4037 - 4719 - 1244  [0.466] 10000
...      sf_study_1 playing White: 1865 - 2482 - 653  [0.438] 5000
...      sf_study_1 playing Black: 2172 - 2237 - 591  [0.493] 5000
...      White vs Black: 4102 - 4654 - 1244  [0.472] 10000
Elo difference: -23.7 +/- 6.4, LOS: 0.0 %, DrawRatio: 12.4 %
Finished match

Study 2

Score of sf_study_2 vs sf_default: 4470 - 4219 - 1311  [0.513] 10000
...      sf_study_2 playing White: 2372 - 2025 - 603  [0.535] 5000
...      sf_study_2 playing Black: 2098 - 2194 - 708  [0.490] 5000
...      White vs Black: 4566 - 4123 - 1311  [0.522] 10000
Elo difference: 8.7 +/- 6.3, LOS: 99.6 %, DrawRatio: 13.1 %
Finished match

Study 3

Score of sf_study_3 vs sf_default: 4368 - 4126 - 1506  [0.512] 10000
...      sf_study_3 playing White: 2175 - 1845 - 980  [0.533] 5000
...      sf_study_3 playing Black: 2193 - 2281 - 526  [0.491] 5000
...      White vs Black: 4456 - 4038 - 1506  [0.521] 10000
Elo difference: 8.4 +/- 6.3, LOS: 99.6 %, DrawRatio: 15.1 %
Finished match

Study 4

Score of sf_study_4 vs sf_default: 4479 - 4165 - 1356  [0.516] 10000
...      sf_study_4 playing White: 2459 - 1956 - 585  [0.550] 5000
...      sf_study_4 playing Black: 2020 - 2209 - 771  [0.481] 5000
...      White vs Black: 4668 - 3976 - 1356  [0.535] 10000
Elo difference: 10.9 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.6 %
Finished match

Study 5

Score of sf_study_5 vs sf_default: 4216 - 4270 - 1514  [0.497] 10000
...      sf_study_5 playing White: 2232 - 2042 - 726  [0.519] 5000
...      sf_study_5 playing Black: 1984 - 2228 - 788  [0.476] 5000
...      White vs Black: 4460 - 4026 - 1514  [0.522] 10000
Elo difference: -1.9 +/- 6.3, LOS: 27.9 %, DrawRatio: 15.1 %
Finished match

Study 6

Score of sf_study_6 vs sf_default: 4082 - 4587 - 1331  [0.475] 10000
...      sf_study_6 playing White: 2116 - 2090 - 794  [0.503] 5000
...      sf_study_6 playing Black: 1966 - 2497 - 537  [0.447] 5000
...      White vs Black: 4613 - 4056 - 1331  [0.528] 10000
Elo difference: -17.6 +/- 6.3, LOS: 0.0 %, DrawRatio: 13.3 %
Finished match

Study 7

Score of sf_study_7 vs sf_default: 4558 - 3950 - 1492  [0.530] 10000
...      sf_study_7 playing White: 2443 - 1825 - 732  [0.562] 5000
...      sf_study_7 playing Black: 2115 - 2125 - 760  [0.499] 5000
...      White vs Black: 4568 - 3940 - 1492  [0.531] 10000
Elo difference: 21.2 +/- 6.3, LOS: 100.0 %, DrawRatio: 14.9 %
Finished match

Study 8

Score of sf_study_8 vs sf_default: 4516 - 4177 - 1307  [0.517] 10000
...      sf_study_8 playing White: 2177 - 2172 - 651  [0.500] 5000
...      sf_study_8 playing Black: 2339 - 2005 - 656  [0.533] 5000
...      White vs Black: 4182 - 4511 - 1307  [0.484] 10000
Elo difference: 11.8 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.1 %
Finished match

Study 9

Score of sf_study_9 vs sf_default: 4411 - 4125 - 1464  [0.514] 10000
...      sf_study_9 playing White: 2366 - 1892 - 742  [0.547] 5000
...      sf_study_9 playing Black: 2045 - 2233 - 722  [0.481] 5000
...      White vs Black: 4599 - 3937 - 1464  [0.533] 10000
Elo difference: 9.9 +/- 6.3, LOS: 99.9 %, DrawRatio: 14.6 %
Finished match

Study 10

Score of sf_study_10 vs sf_default: 4007 - 4385 - 1608  [0.481] 10000
...      sf_study_10 playing White: 2048 - 2130 - 822  [0.492] 5000
...      sf_study_10 playing Black: 1959 - 2255 - 786  [0.470] 5000
...      White vs Black: 4303 - 4089 - 1608  [0.511] 10000
Elo difference: -13.1 +/- 6.2, LOS: 0.0 %, DrawRatio: 16.1 %
Finished match

Study 11

Score of sf_study_11 vs sf_default: 3998 - 4206 - 1796  [0.490] 10000
...      sf_study_11 playing White: 2006 - 2134 - 860  [0.487] 5000
...      sf_study_11 playing Black: 1992 - 2072 - 936  [0.492] 5000
...      White vs Black: 4078 - 4126 - 1796  [0.498] 10000
Elo difference: -7.2 +/- 6.2, LOS: 1.1 %, DrawRatio: 18.0 %
Finished match

Study 12

Score of sf_study_12 vs sf_default: 4393 - 4155 - 1452  [0.512] 10000
...      sf_study_12 playing White: 2440 - 2037 - 523  [0.540] 5000
...      sf_study_12 playing Black: 1953 - 2118 - 929  [0.483] 5000
...      White vs Black: 4558 - 3990 - 1452  [0.528] 10000
Elo difference: 8.3 +/- 6.3, LOS: 99.5 %, DrawRatio: 14.5 %
Finished match

Study 13

Score of sf_study_13 vs sf_default: 4202 - 4324 - 1474  [0.494] 10000
...      sf_study_13 playing White: 2131 - 2068 - 801  [0.506] 5000
...      sf_study_13 playing Black: 2071 - 2256 - 673  [0.481] 5000
...      White vs Black: 4387 - 4139 - 1474  [0.512] 10000
Elo difference: -4.2 +/- 6.3, LOS: 9.3 %, DrawRatio: 14.7 %
Finished match

Study 14

Score of sf_study_14 vs sf_default: 4245 - 4322 - 1433  [0.496] 10000
...      sf_study_14 playing White: 2059 - 2139 - 802  [0.492] 5000
...      sf_study_14 playing Black: 2186 - 2183 - 631  [0.500] 5000
...      White vs Black: 4242 - 4325 - 1433  [0.496] 10000
Elo difference: -2.7 +/- 6.3, LOS: 20.3 %, DrawRatio: 14.3 %
Finished match

Study 15

Score of sf_study_15 vs sf_default: 4421 - 4380 - 1199  [0.502] 10000
...      sf_study_15 playing White: 2425 - 2002 - 573  [0.542] 5000
...      sf_study_15 playing Black: 1996 - 2378 - 626  [0.462] 5000
...      White vs Black: 4803 - 3998 - 1199  [0.540] 10000
Elo difference: 1.4 +/- 6.4, LOS: 66.9 %, DrawRatio: 12.0 %
Finished match
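The Elo difference, error margin and LOS lines in the results above can be recomputed from the win, loss and draw counts. The sketch below uses the usual logistic Elo formula, a 95% interval on the mean score, and the common LOS approximation. Treat these as assumptions about how the match tool derives its numbers, although for Study 1 they agree with the reported -23.7 +/- 6.4 after rounding. `elo_diff` and `match_stats` are hypothetical names.

```python
import math
from statistics import NormalDist


def elo_diff(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)


def match_stats(wins, losses, draws):
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2  # mean score per game
    # Per-game variance of the score (win=1, draw=0.5, loss=0).
    var = w * (1 - mu) ** 2 + l * (0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% interval
    margin = (elo_diff(mu + z * stdev) - elo_diff(mu - z * stdev)) / 2
    # Likelihood of superiority; draws carry no information here.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return mu, elo_diff(mu), margin, los


# Study 1: sf_study_1 vs sf_default, 4037 - 4719 - 1244
mu, elo, margin, los = match_stats(4037, 4719, 1244)
print(f'score {mu:.3f}, Elo {elo:+.1f} +/- {margin:.1f}, LOS {100 * los:.1f} %')
```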

3. Summary

Studies

The studies (tuning runs) all use engine vs engine matches at depth 6, with 500 games per trial, to get the objective value. Each study consists of 100 trials.

Game validation

The best param in each study is matched against the default param. The Elo of the default param is set to 0 as reference.
Elo1 is from a match at depth 6 over 10k games.

Time control inference

As in game validation, the best param in each study is matched against the default param, but this match is played with a tc (time control).
Elo2 is from a match at tc 1s+50ms over 1k games. The average depth is around 13.

Table 1

study	sampler	model	acq_func	explore/exploit	Elo1	Elo2
1	skopt	GP	EI	xi=10000, explore	-23.7 +/- 6.4	2.1 +/- 15.8
2	skopt	GP	EI	xi=0.0001, exploit	8.7 +/- 6.3	-8.0 +/- 15.9
3	skopt	GP	PI	xi=10000, explore	8.4 +/- 6.3	-10.8 +/- 16.1
4	skopt	GP	PI	xi=0.0001, exploit	10.9 +/- 6.3	-9.4 +/- 16.1
5	skopt	GP	LCB	kappa=10000, explore	-1.9 +/- 6.3	8.7 +/- 15.9
6	skopt	GP	LCB	kappa=0.0001, exploit	-17.6 +/- 6.3	9.0 +/- 15.8
7	skopt	GP	PI	xi=0.01, default	21.2 +/- 6.3	-12.9 +/- 16.2
8	skopt	GP	LCB	kappa=1.96, default	11.8 +/- 6.3	6.9 +/- 15.9
9	skopt	GP	EI	xi=0.01, default	9.9 +/- 6.3	-5.9 +/- 15.9
10	skopt	GP	gp_hedge	-	-13.1 +/- 6.2	11.8 +/- 15.9
11	optuna	TPE	EI	-	-7.2 +/- 6.2	25.4 +/- 16.2
12	optuna	CmaEs	EI	-	8.3 +/- 6.3	4.2 +/- 15.8
13	skopt	GBRT	PI	xi=0.01, loss=quantile	-4.2 +/- 6.3	-4.9 +/- 16.2
14	skopt	ET	PI	xi=0.01, default	-2.7 +/- 6.3	34.2 +/- 16.5
15	skopt	RF	PI	xi=0.01, default	1.4 +/- 6.4	9.0 +/- 16.3
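One rough way to read Table 1 is to check, for each study, whether the Elo interval excludes zero. The sketch below does that for both columns. The Elo1 margins are taken from the game verification results and the Elo2 values from the table. Treating "+/-" as an interval that must clear zero is a simplifying assumption, and `verdict` is a hypothetical helper.

```python
# study: (Elo1, margin1, Elo2, margin2)
table = {
    1: (-23.7, 6.4, 2.1, 15.8),
    2: (8.7, 6.3, -8.0, 15.9),
    3: (8.4, 6.3, -10.8, 16.1),
    4: (10.9, 6.3, -9.4, 16.1),
    5: (-1.9, 6.3, 8.7, 15.9),
    6: (-17.6, 6.3, 9.0, 15.8),
    7: (21.2, 6.3, -12.9, 16.2),
    8: (11.8, 6.3, 6.9, 15.9),
    9: (9.9, 6.3, -5.9, 15.9),
    10: (-13.1, 6.2, 11.8, 15.9),
    11: (-7.2, 6.2, 25.4, 16.2),
    12: (8.3, 6.3, 4.2, 15.8),
    13: (-4.2, 6.3, -4.9, 16.2),
    14: (-2.7, 6.3, 34.2, 16.5),
    15: (1.4, 6.4, 9.0, 16.3),
}


def verdict(elo, margin):
    """Classify a result by whether its interval lies fully above or below 0."""
    if elo - margin > 0:
        return 'better'
    if elo + margin < 0:
        return 'worse'
    return 'unclear'


better_depth = [s for s, (e1, m1, _, _) in table.items() if verdict(e1, m1) == 'better']
better_tc = [s for s, (_, _, e2, m2) in table.items() if verdict(e2, m2) == 'better']
better_both = [s for s in better_depth if s in better_tc]

print('better than default at depth 6:', better_depth)
print('better than default at tc 1s+50ms:', better_tc)
print('better in both:', better_both)
```

The Elo2 matches use only 1k games, so their intervals are much wider than those of Elo1 and fewer studies can be separated from the default.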

References

Optuna
- Samplers
- Threshold pruner
scikit-optimize or skopt
- Optimizer
- GP
- GBRT
- ET
- RF
- Exploration and Exploitation