Performance comparison

fsmosca edited this page Oct 4, 2020 · 43 revisions

Performance comparison among different surrogate models, acquisition functions and explore/exploit factors. See summary.

Tuning is done by playing matches between the test_engine and the base_engine. The test_engine takes its param from the optimizer, while the base_engine uses the best param found so far. At the start the best param is the default param. After each match (500 games at depth 6 in this case; see the sample tuning command line), the result is sent to the optimizer. If the test_engine won (scored more than 50%), the best param is updated. In the next trial the base_engine uses the current best param and the test_engine takes a new param suggested by the optimizer.
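
The update rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tuner's actual code: `play_match`, `tune` and `suggest` are hypothetical names, and the engine match is replaced by a random stand-in so the loop runs without any engine.

```python
import random


def play_match(test_param, base_param, games=500, rng=None):
    """Hypothetical stand-in for an engine match.

    Returns the test side's score in [0, 1]. A real run would play
    engine games; here each game result is drawn at random.
    """
    rng = rng or random.Random(0)
    points = sum(rng.choice((0.0, 0.5, 1.0)) for _ in range(games))
    return points / games


def tune(default_param, suggest, trials, match=play_match):
    """Run the trial loop and return (best_param, history)."""
    best_param = dict(default_param)  # base_engine starts with the defaults
    history = []
    for trial in range(trials):
        test_param = suggest(trial, history)    # optimizer proposes a param
        result = match(test_param, best_param)  # score from test_engine's side
        history.append((test_param, result))    # result goes back to optimizer
        if result > 0.5:                        # test_engine won the match
            best_param = dict(test_param)       # base_engine uses it next trial
    return best_param, history
```

Passing a real match runner as `match` and an optimizer's ask step as `suggest` would give the same control flow as the description.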

Studies are run with Optuna Game Parameter Tuner v0.19.0

1. Study

engine: stockfish
study match depth control: --depth 6 --base-time-sec 30
games per trial: --games-per-trial 500
trials: --trials 100
pruner: --threshold-pruner result=0.45

Parameters to be optimized

input param: OrderedDict([('FutMargin', {'default': 227, 'min': 50, 'max': 350, 'step': 4}), ('RazorMargin', {'default': 527, 'min': 250, 'max': 650, 'step': 4})])

init param: {'FutMargin': 227, 'RazorMargin': 527}
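
To get a feel for the size of this search space, the sketch below enumerates the values each parameter can take. It assumes candidates are generated as min + k * step up to max, which is an assumption about the tuner rather than something stated on this page; it is at least consistent with the best params reported in the studies, which all lie on that grid.

```python
# Parameter ranges copied from the input param above.
input_param = {
    "FutMargin": {"default": 227, "min": 50, "max": 350, "step": 4},
    "RazorMargin": {"default": 527, "min": 250, "max": 650, "step": 4},
}


def grid_values(spec):
    """Values min, min + step, ... up to and including max (assumed grid)."""
    return list(range(spec["min"], spec["max"] + 1, spec["step"]))


# Number of candidate values per parameter.
sizes = {name: len(grid_values(spec)) for name, spec in input_param.items()}

# Total number of distinct parameter combinations.
total = 1
for count in sizes.values():
    total *= count

# Whether each default value is itself a point on the assumed grid.
default_on_grid = {
    name: spec["default"] in grid_values(spec)
    for name, spec in input_param.items()
}

print(sizes, total, default_on_grid)
```

Under this assumption there are 76 values for FutMargin and 101 for RazorMargin, so 7676 combinations in total, of which 100 trials can visit only a small fraction. The defaults 227 and 527 do not fall on the grid, so the optimizer can approach them but not suggest them exactly.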

Study 1, explore

--sampler name=skopt acquisition_function=EI xi=10000

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_1 --pgn-output study_1.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=10000
Result
study best param: {'FutMargin': 306, 'RazorMargin': 558}
study best value: 0.5095625000000003
study best trial number: 94
Plot

Every dot is a 500-game match of stockfish at depth 6. The objective value is the result of the match between the param suggested by the optimizer and the default param, taken from the point of view of the optimizer param.

The dark dots (darker means a higher trial number) are more spread out than in study 2, because study 1 favors exploration (high xi value) while study 2 favors exploitation (low xi value).

slice1 hist1
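
The effect of xi on exploration can be illustrated with the usual closed form of expected improvement for a Gaussian prediction. The sketch assumes a minimization convention (lower objective is better) and uses made-up candidate values; how the tuner maps the match result onto the optimizer's objective is not shown on this page, and the xi values here are far milder than the 10000 and 0.0001 used in studies 1 and 2.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, y_best, xi):
    """Expected improvement over y_best by at least xi (minimization)."""
    improve = y_best - xi - mu
    z = improve / sigma
    # First term rewards a good predicted mean, second term rewards uncertainty.
    return improve * normal_cdf(z) + sigma * normal_pdf(z)


# Illustrative candidates: (label, predicted mean, predicted std dev).
candidates = [("near_best", 0.49, 0.005), ("uncertain", 0.50, 0.02)]
y_best = 0.50


def pick(xi):
    """Label of the candidate with the highest expected improvement."""
    return max(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], y_best, xi),
    )[0]


print(pick(0.0))   # low xi: the candidate with the better mean wins
print(pick(0.02))  # higher xi: the more uncertain candidate wins
```

With a small xi the candidate whose predicted mean already beats the best value is preferred; raising xi shifts the choice toward the candidate with the larger predicted uncertainty, which matches the more scattered dots of study 1.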

Study 2, exploit

--sampler name=skopt acquisition_function=EI xi=0.0001

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_2 --pgn-output study_2.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI xi=0.0001
Result
study best param: {'FutMargin': 290, 'RazorMargin': 650}
study best value: 0.5086562500000001
study best trial number: 97
Plot

slice2 hist2

Study 3, explore

--sampler name=skopt acquisition_function=PI xi=10000

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_3 --pgn-output study_3.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=10000
Result
study best param: {'FutMargin': 326, 'RazorMargin': 334}
study best value: 0.5129687500000003
study best trial number: 98
Plot

slice3 hist3

Study 4, exploit

--sampler name=skopt acquisition_function=PI xi=0.0001

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_4 --pgn-output study_4.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI xi=0.0001
Result
study best param: {'FutMargin': 350, 'RazorMargin': 250}
study best value: 0.5089062499999999
study best trial number: 95
Plot

slice4 hist4

Study 5, explore

--sampler name=skopt acquisition_function=LCB kappa=10000

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_5 --pgn-output study_5.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=10000
Result
study best param: {'FutMargin': 282, 'RazorMargin': 650}
study best value: 0.506265625
study best trial number: 96
Plot

slice5 hist5

Study 6, exploit

--sampler name=skopt acquisition_function=LCB kappa=0.0001

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_6 --pgn-output study_6.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB kappa=0.0001
Result
study best param: {'FutMargin': 178, 'RazorMargin': 594}
study best value: 0.5071718750000004
study best trial number: 97
Plot

slice6 hist6
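
Studies 5 and 6 vary kappa instead of xi. A minimal sketch of the lower confidence bound, again assuming a minimization convention and made-up predictions, shows why a large kappa explores and a small kappa exploits. The function and candidate names are illustrative only.

```python
def lower_confidence_bound(mu, sigma, kappa):
    """LCB acquisition value: predicted mean minus kappa times std dev."""
    return mu - kappa * sigma


# Illustrative candidates: (label, predicted mean, predicted std dev).
candidates = [("near_best", 0.49, 0.005), ("uncertain", 0.50, 0.02)]


def pick(kappa):
    """Label of the candidate with the lowest LCB (the one sampled next)."""
    return min(
        candidates,
        key=lambda c: lower_confidence_bound(c[1], c[2], kappa),
    )[0]


print(pick(0.0001))  # tiny kappa: the mean dominates, so exploit
print(pick(10000))   # huge kappa: the uncertainty dominates, so explore
```

With kappa=0.0001 the uncertainty term is negligible and the candidate with the best predicted mean is chosen; with kappa=10000 the mean is negligible and the most uncertain candidate is chosen.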

Study 7

--sampler name=skopt acquisition_function=PI

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_7 --pgn-output study_7.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI
Result
study best param: {'FutMargin': 350, 'RazorMargin': 258}
study best value: 0.5101249999999999
study best trial number: 89
Plot

slice7 hist7

Study 8

--sampler name=skopt acquisition_function=LCB

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_8 --pgn-output study_8.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=LCB
Result
study best param: {'FutMargin': 226, 'RazorMargin': 650}
study best value: 0.5086718750000001
study best trial number: 92
Plot

slice8 hist8

Study 9

--sampler name=skopt acquisition_function=EI

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_9 --pgn-output study_9.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=EI
Result
study best param: {'FutMargin': 298, 'RazorMargin': 250}
study best value: 0.5093906250000005
study best trial number: 99
Plot

slice9 hist9

Study 10

--sampler name=skopt

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_10 --pgn-output study_10.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt
Result
study best param: {'FutMargin': 306, 'RazorMargin': 250}
study best value: 0.5119843750000003
study best trial number: 95
Plot

slice10 hist10

Study 11

--sampler name=tpe

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_11 --pgn-output study_11.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=tpe
Result
study best param: {'FutMargin': 110, 'RazorMargin': 338}
study best value: 0.5089843750000003
study best trial number: 93
Plot

slice11 hist11

Study 12

--sampler name=cmaes

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_12 --pgn-output study_12.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=cmaes
Result
study best param: {'FutMargin': 190, 'RazorMargin': 494}
study best value: 0.5138437500000004
study best trial number: 98
Plot

slice12 hist12

Study 13

--sampler name=skopt acquisition_function=PI base_estimator=GBRT

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_13 --pgn-output study_13.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=GBRT
Result
study best param: {'FutMargin': 326, 'RazorMargin': 442}
study best value: 0.5124062500000004
study best trial number: 99
Plot

slice13 hist13

Study 14

--sampler name=skopt acquisition_function=PI base_estimator=ET

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_14 --pgn-output study_14.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=ET
Result
study best param: {'FutMargin': 110, 'RazorMargin': 594}
study best value: 0.5097031250000001
study best trial number: 94

Study 15

--sampler name=skopt acquisition_function=PI base_estimator=RF

Command line
python tuner.py --engine ./engines/stockfish-modern/stockfish.exe --hash 64 --concurrency 6 --opening-file ./start_opening/ogpt_chess_startpos.epd --input-param "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, 'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}" --plot --base-time-sec 120 --depth 6 --study-name study_15 --pgn-output study_15.pgn --trials 100 --games-per-trial 500 --threshold-pruner result=0.45 --sampler name=skopt acquisition_function=PI base_estimator=RF
Result
study best param: {'FutMargin': 166, 'RazorMargin': 254}
study best value: 0.5099531250000003
study best trial number: 98
Plot

slice15 hist15
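
The fifteen command lines differ only in the study name, the PGN output file and the sampler options. The sketch below rebuilds them as argument lists from the flags shown above; `build_command` is a hypothetical helper, not part of the tuner, and the flag order differs slightly from the original commands.

```python
INPUT_PARAM = (
    "{'RazorMargin': {'default':527, 'min':250, 'max':650, 'step':4}, "
    "'FutMargin': {'default':227, 'min':50, 'max':350, 'step':4}}"
)

# Options shared by every study, copied from the command lines above.
COMMON = [
    "python", "tuner.py",
    "--engine", "./engines/stockfish-modern/stockfish.exe",
    "--hash", "64", "--concurrency", "6",
    "--opening-file", "./start_opening/ogpt_chess_startpos.epd",
    "--input-param", INPUT_PARAM,
    "--plot", "--base-time-sec", "120", "--depth", "6",
    "--trials", "100", "--games-per-trial", "500",
    "--threshold-pruner", "result=0.45",
]

# Sampler options per study number, as listed in each study section.
SAMPLERS = {
    1: "name=skopt acquisition_function=EI xi=10000",
    2: "name=skopt acquisition_function=EI xi=0.0001",
    3: "name=skopt acquisition_function=PI xi=10000",
    4: "name=skopt acquisition_function=PI xi=0.0001",
    5: "name=skopt acquisition_function=LCB kappa=10000",
    6: "name=skopt acquisition_function=LCB kappa=0.0001",
    7: "name=skopt acquisition_function=PI",
    8: "name=skopt acquisition_function=LCB",
    9: "name=skopt acquisition_function=EI",
    10: "name=skopt",
    11: "name=tpe",
    12: "name=cmaes",
    13: "name=skopt acquisition_function=PI base_estimator=GBRT",
    14: "name=skopt acquisition_function=PI base_estimator=ET",
    15: "name=skopt acquisition_function=PI base_estimator=RF",
}


def build_command(study):
    """Return the argument list for one study (hypothetical helper)."""
    name = f"study_{study}"
    return COMMON + [
        "--study-name", name,
        "--pgn-output", f"{name}.pgn",
        "--sampler", *SAMPLERS[study].split(),
    ]


commands = [build_command(study) for study in sorted(SAMPLERS)]
```

Each list can be handed to `subprocess.run` unchanged, which avoids shell quoting problems with the `--input-param` value.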

2. Game verification

Engine setup

engine: stockfish
hash: 64
threads: 1
opponent: default param
depth control: 6
games: 10k
Match setup
study 1 best param after 100 trials vs default -> Elo of study 1 best param
study 2 best param after 100 trials vs default -> Elo of study 2 best param
...

Result

Study 1
Score of sf_study_1 vs sf_default: 4037 - 4719 - 1244  [0.466] 10000
...      sf_study_1 playing White: 1865 - 2482 - 653  [0.438] 5000
...      sf_study_1 playing Black: 2172 - 2237 - 591  [0.493] 5000
...      White vs Black: 4102 - 4654 - 1244  [0.472] 10000
Elo difference: -23.7 +/- 6.4, LOS: 0.0 %, DrawRatio: 12.4 %
Finished match
Study 2
Score of sf_study_2 vs sf_default: 4470 - 4219 - 1311  [0.513] 10000
...      sf_study_2 playing White: 2372 - 2025 - 603  [0.535] 5000
...      sf_study_2 playing Black: 2098 - 2194 - 708  [0.490] 5000
...      White vs Black: 4566 - 4123 - 1311  [0.522] 10000
Elo difference: 8.7 +/- 6.3, LOS: 99.6 %, DrawRatio: 13.1 %
Finished match
Study 3
Score of sf_study_3 vs sf_default: 4368 - 4126 - 1506  [0.512] 10000
...      sf_study_3 playing White: 2175 - 1845 - 980  [0.533] 5000
...      sf_study_3 playing Black: 2193 - 2281 - 526  [0.491] 5000
...      White vs Black: 4456 - 4038 - 1506  [0.521] 10000
Elo difference: 8.4 +/- 6.3, LOS: 99.6 %, DrawRatio: 15.1 %
Finished match
Study 4
Score of sf_study_4 vs sf_default: 4479 - 4165 - 1356  [0.516] 10000
...      sf_study_4 playing White: 2459 - 1956 - 585  [0.550] 5000
...      sf_study_4 playing Black: 2020 - 2209 - 771  [0.481] 5000
...      White vs Black: 4668 - 3976 - 1356  [0.535] 10000
Elo difference: 10.9 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.6 %
Finished match
Study 5
Score of sf_study_5 vs sf_default: 4216 - 4270 - 1514  [0.497] 10000
...      sf_study_5 playing White: 2232 - 2042 - 726  [0.519] 5000
...      sf_study_5 playing Black: 1984 - 2228 - 788  [0.476] 5000
...      White vs Black: 4460 - 4026 - 1514  [0.522] 10000
Elo difference: -1.9 +/- 6.3, LOS: 27.9 %, DrawRatio: 15.1 %
Finished match
Study 6
Score of sf_study_6 vs sf_default: 4082 - 4587 - 1331  [0.475] 10000
...      sf_study_6 playing White: 2116 - 2090 - 794  [0.503] 5000
...      sf_study_6 playing Black: 1966 - 2497 - 537  [0.447] 5000
...      White vs Black: 4613 - 4056 - 1331  [0.528] 10000
Elo difference: -17.6 +/- 6.3, LOS: 0.0 %, DrawRatio: 13.3 %
Finished match
Study 7
Score of sf_study_7 vs sf_default: 4558 - 3950 - 1492  [0.530] 10000
...      sf_study_7 playing White: 2443 - 1825 - 732  [0.562] 5000
...      sf_study_7 playing Black: 2115 - 2125 - 760  [0.499] 5000
...      White vs Black: 4568 - 3940 - 1492  [0.531] 10000
Elo difference: 21.2 +/- 6.3, LOS: 100.0 %, DrawRatio: 14.9 %
Finished match
Study 8
Score of sf_study_8 vs sf_default: 4516 - 4177 - 1307  [0.517] 10000
...      sf_study_8 playing White: 2177 - 2172 - 651  [0.500] 5000
...      sf_study_8 playing Black: 2339 - 2005 - 656  [0.533] 5000
...      White vs Black: 4182 - 4511 - 1307  [0.484] 10000
Elo difference: 11.8 +/- 6.3, LOS: 100.0 %, DrawRatio: 13.1 %
Finished match
Study 9
Score of sf_study_9 vs sf_default: 4411 - 4125 - 1464  [0.514] 10000
...      sf_study_9 playing White: 2366 - 1892 - 742  [0.547] 5000
...      sf_study_9 playing Black: 2045 - 2233 - 722  [0.481] 5000
...      White vs Black: 4599 - 3937 - 1464  [0.533] 10000
Elo difference: 9.9 +/- 6.3, LOS: 99.9 %, DrawRatio: 14.6 %
Finished match
Study 10
Score of sf_study_10 vs sf_default: 4007 - 4385 - 1608  [0.481] 10000
...      sf_study_10 playing White: 2048 - 2130 - 822  [0.492] 5000
...      sf_study_10 playing Black: 1959 - 2255 - 786  [0.470] 5000
...      White vs Black: 4303 - 4089 - 1608  [0.511] 10000
Elo difference: -13.1 +/- 6.2, LOS: 0.0 %, DrawRatio: 16.1 %
Finished match
Study 11
Score of sf_study_11 vs sf_default: 3998 - 4206 - 1796  [0.490] 10000
...      sf_study_11 playing White: 2006 - 2134 - 860  [0.487] 5000
...      sf_study_11 playing Black: 1992 - 2072 - 936  [0.492] 5000
...      White vs Black: 4078 - 4126 - 1796  [0.498] 10000
Elo difference: -7.2 +/- 6.2, LOS: 1.1 %, DrawRatio: 18.0 %
Finished match
Study 12
Score of sf_study_12 vs sf_default: 4393 - 4155 - 1452  [0.512] 10000
...      sf_study_12 playing White: 2440 - 2037 - 523  [0.540] 5000
...      sf_study_12 playing Black: 1953 - 2118 - 929  [0.483] 5000
...      White vs Black: 4558 - 3990 - 1452  [0.528] 10000
Elo difference: 8.3 +/- 6.3, LOS: 99.5 %, DrawRatio: 14.5 %
Finished match
Study 13
Score of sf_study_13 vs sf_default: 4202 - 4324 - 1474  [0.494] 10000
...      sf_study_13 playing White: 2131 - 2068 - 801  [0.506] 5000
...      sf_study_13 playing Black: 2071 - 2256 - 673  [0.481] 5000
...      White vs Black: 4387 - 4139 - 1474  [0.512] 10000
Elo difference: -4.2 +/- 6.3, LOS: 9.3 %, DrawRatio: 14.7 %
Finished match
Study 14
Score of sf_study_14 vs sf_default: 4245 - 4322 - 1433  [0.496] 10000
...      sf_study_14 playing White: 2059 - 2139 - 802  [0.492] 5000
...      sf_study_14 playing Black: 2186 - 2183 - 631  [0.500] 5000
...      White vs Black: 4242 - 4325 - 1433  [0.496] 10000
Elo difference: -2.7 +/- 6.3, LOS: 20.3 %, DrawRatio: 14.3 %
Finished match
Study 15
Score of sf_study_15 vs sf_default: 4421 - 4380 - 1199  [0.502] 10000
...      sf_study_15 playing White: 2425 - 2002 - 573  [0.542] 5000
...      sf_study_15 playing Black: 1996 - 2378 - 626  [0.462] 5000
...      White vs Black: 4803 - 3998 - 1199  [0.540] 10000
Elo difference: 1.4 +/- 6.4, LOS: 66.9 %, DrawRatio: 12.0 %
Finished match
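
The Elo difference in each report follows from the win, loss and draw counts through the logistic Elo formula. The sketch below recomputes it for Study 1. The error margin is derived from the per-game score variance with a 1.96 multiplier; that derivation is an assumption about how the match tool reports its margin, although it does reproduce the +/- 6.4 shown for Study 1.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin, score) from win/loss/draw counts."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + losses * (0.0 - score) ** 2
        + draws * (0.5 - score) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return elo_from_score(score), (high - low) / 2.0, score


# Study 1: Score of sf_study_1 vs sf_default: 4037 - 4719 - 1244
elo, margin, score = elo_with_margin(4037, 4719, 1244)
print(f"score={score:.3f} elo={elo:.1f} +/- {margin:.1f}")
```

Because the standard error shrinks with the square root of the number of games, a 1k-game match has a margin roughly three times wider than a 10k-game match.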

3. Summary

Studies

All studies (tuning runs) use engine vs engine matches at depth 6, with 500 games per trial, to get the objective value. Each study consists of 100 trials.

Game validation

The best param in each study is matched against the default param. The Elo of default param is set to 0 as reference.
Elo1 is from a match at depth 6 on 10k games.

Time control inference

As in game validation, the best param from each study is matched against the default param, but this match is played at a time control (tc) instead of a fixed depth.
Elo2 is from a match at tc 1s+50ms on 1k games. Average depth is around 13.

Table 1

| study | sampler | model | acq_func | explore/exploit | Elo1 | Elo2 |
|---|---|---|---|---|---|---|
| 1 | skopt | GP | EI | xi=10000, explore | -23.7 +/- 6.4 | 2.1 +/- 15.8 |
| 2 | skopt | GP | EI | xi=0.0001, exploit | 8.7 +/- 6.3 | -8.0 +/- 15.9 |
| 3 | skopt | GP | PI | xi=10000, explore | 8.4 +/- 6.3 | -10.8 +/- 16.1 |
| 4 | skopt | GP | PI | xi=0.0001, exploit | 10.9 +/- 6.3 | -9.4 +/- 16.1 |
| 5 | skopt | GP | LCB | kappa=10000, explore | -1.9 +/- 6.3 | 8.7 +/- 15.9 |
| 6 | skopt | GP | LCB | kappa=0.0001, exploit | -17.6 +/- 6.3 | 9.0 +/- 15.8 |
| 7 | skopt | GP | PI | xi=0.01, default | 21.2 +/- 6.3 | -12.9 +/- 16.2 |
| 8 | skopt | GP | LCB | kappa=1.96, default | 11.8 +/- 6.3 | 6.9 +/- 15.9 |
| 9 | skopt | GP | EI | xi=0.01, default | 9.9 +/- 6.3 | -5.9 +/- 15.9 |
| 10 | skopt | GP | gp_hedge | - | -13.1 +/- 6.2 | 11.8 +/- 15.9 |
| 11 | optuna | TPE | EI | - | -7.2 +/- 6.2 | 25.4 +/- 16.2 |
| 12 | optuna | CmaEs | EI | - | 8.3 +/- 6.3 | 4.2 +/- 15.8 |
| 13 | skopt | GBRT | PI | xi=0.01, loss=quantile | -4.2 +/- 6.3 | -4.9 +/- 16.2 |
| 14 | skopt | ET | PI | xi=0.01, default | -2.7 +/- 6.3 | 34.2 +/- 16.5 |
| 15 | skopt | RF | PI | xi=0.01, default | 1.4 +/- 6.4 | 9.0 +/- 16.3 |
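
When reading Table 1 it helps to check which Elo values exceed their own error margin and whether Elo1 and Elo2 point in the same direction. The sketch below does that with the values from the table, taking the Elo1 margins from the match reports in section 2. Treating an absolute Elo larger than its margin as distinguishable from zero is a simplification chosen for illustration, not a test used on this page.

```python
# (study, Elo1, Elo1 margin, Elo2, Elo2 margin)
TABLE = [
    (1, -23.7, 6.4, 2.1, 15.8),
    (2, 8.7, 6.3, -8.0, 15.9),
    (3, 8.4, 6.3, -10.8, 16.1),
    (4, 10.9, 6.3, -9.4, 16.1),
    (5, -1.9, 6.3, 8.7, 15.9),
    (6, -17.6, 6.3, 9.0, 15.8),
    (7, 21.2, 6.3, -12.9, 16.2),
    (8, 11.8, 6.3, 6.9, 15.9),
    (9, 9.9, 6.3, -5.9, 15.9),
    (10, -13.1, 6.2, 11.8, 15.9),
    (11, -7.2, 6.2, 25.4, 16.2),
    (12, 8.3, 6.3, 4.2, 15.8),
    (13, -4.2, 6.3, -4.9, 16.2),
    (14, -2.7, 6.3, 34.2, 16.5),
    (15, 1.4, 6.4, 9.0, 16.3),
]


def exceeds_margin(elo, margin):
    """True if the Elo estimate is larger in size than its error margin."""
    return abs(elo) > margin


# Studies whose Elo1 / Elo2 estimate is larger than its margin.
elo1_clear = [s for s, e1, m1, e2, m2 in TABLE if exceeds_margin(e1, m1)]
elo2_clear = [s for s, e1, m1, e2, m2 in TABLE if exceeds_margin(e2, m2)]

# Studies where the depth-6 and time-control results have the same sign.
same_sign = [s for s, e1, m1, e2, m2 in TABLE if (e1 > 0) == (e2 > 0)]

print("Elo1 beyond margin:", elo1_clear)
print("Elo2 beyond margin:", elo2_clear)
print("Elo1 and Elo2 agree in sign:", same_sign)
```

By this rough check only studies 11 and 14 have an Elo2 beyond the margin, and Elo1 and Elo2 agree in sign for studies 8, 12, 13 and 15 only.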

References