Thanks to @toanth for helping me understand this tuner's behavior, for the minor fixes required to get it up and running and for helping me debugging my own code.
Regarding the tuner's behavior, in their words:
From my understanding of Gedas' tuner, it assumes that your evaluation is a linear sum of features: Each feature (such as having a knight) can occur several times (or not at all) in a given position, and the tuner updates weights (such as the value of a knight) based on how often a position with this feature led to a win.
The eval gets executed when you load the fens. After that, the tuner only uses the stored weights and doesn't execute the eval again.
One thing that kept tripping me up is that in an actual engine, you want to compute the score relative to the side to move, but in the tuner, you compute it relative to white.
Links to some training datasets:
-
'Stash data' transformed into a format comptatible with this tuner by cj5716
Instructions/examples about how to build and run the tuner are in run.sh. In short:
-
Make sure gcc v10 or higher is installed.
-
Create a
sources.csv
file undersrc/
with the following format:# Path, WDL from side playing, position limit #/home/edu/dev/texel-tuner/datasets/example.book,0,0
-
Create a
build/
directory undersrc/
-
Inside of
src/build/
, run:cmake ..
or
cmake -DCMAKE_BUILD_TYPE=Debug ..
-
Inside of
src/build/
, run:cmake --build .
-
Run the executable passing the path of
sources.csv
to it:./tuner ../sources.csv
Is the ported evaluation code correct?
is a tricky question to answer.
print_eval
variable has been added to config.h to answer it.
When it's enabled, no tuning happens, but positions are printed next to their eval in a format that matches Lynx's custom staticeval
UCI command output.
This can be used with subset of a tuning dataset (i.e. 1000 entries) and the output redirected to a file, in order to be able to compare it with your engine's actual evaluation and detect any bugs that were written while porting your evaluation code to the tuner.
i.e.
-
Tuner:
cmake --build . && ./tuner ../sources.csv > quiet-labeled-tuner-evals.epd
-
Engine:
dotnet run -c Release "staticeval quiet-labeled-subset-1_1000.epd" > quiet-labeled-evals.epd
-
Compare two files using your preferred diff tool and see if there's any difference in the evaluations.
This project is based on the linear evaluation ideas described in https://github.com/AndyGrant/Ethereal/blob/master/Tuning.pdf. The code is loosely based on an implementation in Weiss, which can be found at https://github.com/TerjeKir/weiss.
The project internally uses https://github.com/Disservin/chess-library
To add your own evaluation it is required to transform your evaluation into a linear system.
For each position in the training dataset, the evaluation should count the occurances of each evaluation term, and return a coefficients_t
object where each entry is the count of times an evaluation term has been used per-side.
For a new engine it's required to implement an evaluation class with 4 functions and 2 constexpr variables. More on them at Evaluation class
class YourEval
{
public:
constexpr static bool includes_additional_score = true;
constexpr static bool supports_external_chess_eval = true;
static parameters_t get_initial_parameters();
static EvalResult get_fen_eval_result(const std::string& fen);
static EvalResult get_external_eval_result(const Chess::Board& board);
static void print_parameters(const parameters_t& parameters);
};
Edit config.h
to point TuneEval
to your evaluation class. Edit thread_count to be equivalent to what you're comfortable with. If you're using a tapered evaluation, set #define TAPERED 1
in both base.h
and config.h
, otherwhise set both to #define TAPERED 0
.
Examples can be found in the engines
directory. ToyEval
and ToyEvalTapered
are very minimal examples, while Fourku
is a full example for the engine 4ku.
This parameter should be set to true if there are any terms in the evaluation which are not being tuned at the moment. If set to false
, any additional terms would be ignored comepletely. If set to true
, then the evaluation function should compute the score itself, and set it as score
when returning an EvalResult
from get_*_eval_result functions.
The purpose of this is that if set to false
, then the tuning can be faster, while if set to true
, the terms being tuned would be tuned around any other existing terms that are not being tuned, and likely being more accurate.
This parameter indicates whether or not the engine supports translating from a board structure defined in the external
directory. See more at get_external_eval_result
This function retrieves the initial parameters of the evaluation in a vector form. Each parameter is an entry in parameters_t
.
If the evaluation used is tapered, each entry is an std::array<double, 2>
, where the first value is the midgame value and the second value is the endgame value.
If the evaluationis not tapered, the entry is just the plain value of the parameter used in the evaluation.
This function gets the linear coefficients for each parameter given a position in a FEN form. Instead of counting a score, count how many times an evaluation term was used for each side in a position in the data set.
The input is a FEN string because it's unreasonable to expect each engine to have the same structure for a board representation, so FEN parsing is left to the engine implementation.
Additionaly, you may return the score, this is used to tune around other existing parameters.
Similar to get_fen_eval_result, but instead of a FEN it gets a Chess::Board
as a base parameter. Support for it is not required, but is recommended if tuning with qsearch enabled, because it will greatly increase the data loading speed.
This function prints the results of the tuning, the input is given as a vector of the tuned parameters, and it's up to the engine to ptint it as as it desires.
Maximum number of how many threads various tuning operations will take. Recommended to set to the amount of physical cores on the system the tuner is being run on.
K
is a scaling parameter, the lower the K
, the higher the tuned evaluation scores will be overall. Setting preferred_k = 0
will make the tuner try to auto-determine the optimal K
in order to preserve the same scale as the existing eval terms.
Setting preferred_k = 0
is not compatible with retune_from_zero = true
.
Maximum limit of how many epochs (iterations over the whole dataset) to run. Could be useful if for example you only ever run 5000 epochs, to keep the tuning consistent.
If set to true
, tuning will start with all evaluation terms set to 0
. It is still needed to implement get_initial_parameters in the evaluation class, even it is set to true
. Setting it to false
will make the tuner start with the current evaluation terms.
If set to true
, will use quiescence search when loading each entry from the data set, in order to get to quiet positions (positions where the best move is not a capture). When tuning with already only quiet positions this will have a minimal effect on the tuning outcome.
If set to true
, data loading will be considerably slower. This can be mitigated by implementing get_external_eval_result in the evaluation class and setting supports_external_chess_eval to true
, however the data loading will still be slower.
If set to true
, will print information about each entry while loading the data set. Should only enable if debugging.
How often to print progress while loading data.
Cmake / make // TODO
This tuner does not provide data sources. Own data source must be used. Each data source is a file where each line is a FEN and a WDL. (WDL = Win/Draw/Loss).
Supported formats:
- Line containing WDL. Markers are
1.0
,0.5
, and0.0
for win, draw loss. Example:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; [1.0]
- Line containing match result. Markers are
1-0
,1/2-1/2
, and0-1
for win, draw loss. Example:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; [1-0]
- Line containing probability to win. Pretty much equivalent to option nr.1, except it doesn't need to be stricly win/draw/loss scores . Example:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1; [0.6]
The brackets are not necessary, the WDL only has to be found somewhere in the line.
Create a csv formatted file with data sources. #
marks a comment line.
Columns:
- Path to data file.
- Whether or not the WDL is from the side playing. 1 = yes, 0 = no,
- Limit of how may FENs to load from this data source (sequentially). 0 = unlimited
Example:
# Path, WDL from side playing, position limit
C:\Data1.epd,0,0
C:\Data2.epd,0,900000
Build the project and run tuner.exe sources.csv
where sources.csv is the data source file mentioned previously.