This is a description of all the parameters available when you are running examples in this repo:
- All xgboost parameters are supported.
  - Please use `camelCase`, e.g., `--treeMethod=gpu_hist`.
  - `lambda` is replaced with `lambda_`, because `lambda` is a keyword in Python.
- `--format=[csv|parquet|orc]`: The format of the data for training/transforming. Supported values are 'csv', 'parquet' and 'orc'. Required.
- `--mode=[all|train|transform]`: Controls the behavior of the sample app. Default is 'all' if not specified.
  - all: Do both training and transforming. The model is saved to 'modelPath' if specified.
  - train: Do training only. The model is saved to 'modelPath' if specified.
  - transform: Do transforming only. 'modelPath' is required to locate the model data to be loaded.
- `--trainDataPath=[path]`: Path to your training data file(s). Required when mode is NOT 'transform'.
- `--trainEvalDataPath=[path]`: Path to your data file(s) for training with evaluation. Optional.
- `--evalDataPath=[path]`: Path to your test (evaluation) data file(s). Required when mode is NOT 'train'.
- `--modelPath=[path]`: Path to save the model after training, or to load the model from when only transforming. Required only when mode is 'transform'.
- `--overwrite=[true|false]`: Whether to overwrite the current model data under 'modelPath'. Default is false. You may need to set it to true to avoid an IOException when saving the model to a path that already exists.
- `--hasHeader=[true|false]`: Whether your CSV file has a header row.
- `--asFloats=[true|false]`: Whether to cast the numerical schema to a float schema. Default is true.
- `--maxRowsPerChunk=[value]`: Maximum number of rows to read per chunk. Default is 2147483647.
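
The mode-dependent requirements above can be sketched as a small argument parser. This is only an illustration built on Python's standard `argparse` module, not the parser the examples in this repo actually use. The function names `str2bool`, `build_parser` and `parse_sample_args` are hypothetical, and the sketch assumes that any flag it does not recognize (such as `--treeMethod=gpu_hist`) is an XGBoost parameter to be passed along unchanged.

```python
import argparse


def str2bool(value):
    # Hypothetical helper: accepts the literal strings 'true' / 'false'.
    lowered = value.lower()
    if lowered not in ("true", "false"):
        raise argparse.ArgumentTypeError("expected true or false, got %r" % value)
    return lowered == "true"


def build_parser():
    # allow_abbrev=False keeps XGBoost flags such as --treeMethod from being
    # mistaken for abbreviations of the flags declared here.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--format", required=True, choices=["csv", "parquet", "orc"])
    parser.add_argument("--mode", default="all", choices=["all", "train", "transform"])
    parser.add_argument("--trainDataPath")
    parser.add_argument("--trainEvalDataPath")
    parser.add_argument("--evalDataPath")
    parser.add_argument("--modelPath")
    parser.add_argument("--overwrite", type=str2bool, default=False)
    # No default is documented for --hasHeader, so it is left unset (None) here.
    parser.add_argument("--hasHeader", type=str2bool)
    parser.add_argument("--asFloats", type=str2bool, default=True)
    parser.add_argument("--maxRowsPerChunk", type=int, default=2147483647)
    return parser


def parse_sample_args(argv):
    parser = build_parser()
    # Flags not declared above are returned separately, untouched (assumption:
    # these are the camelCase XGBoost parameters).
    args, xgboost_flags = parser.parse_known_args(argv)
    if args.mode != "transform" and not args.trainDataPath:
        parser.error("--trainDataPath is required when mode is not 'transform'")
    if args.mode != "train" and not args.evalDataPath:
        parser.error("--evalDataPath is required when mode is not 'train'")
    if args.mode == "transform" and not args.modelPath:
        parser.error("--modelPath is required when mode is 'transform'")
    return args, xgboost_flags
```

For example, calling `parse_sample_args(["--format=csv", "--mode=train", "--trainDataPath=/data/train", "--treeMethod=gpu_hist"])` passes validation, because 'train' mode needs neither `--evalDataPath` nor `--modelPath`. The same call with `--mode=transform` would exit with an error, since both of those paths are then required.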