Add Randomized Parameter Search #168
Conversation
We can just pass the list of model_parameters from the config file to this function.
This will make this piece of code easier to understand and test.
…rch setting

One of these tests is failing because we haven't implemented this logic in the _get_model_parameters() function yet.
The new training.model_parameter_search is a more flexible version of param_grid. We still support param_grid, but eventually we will want to completely switch over to model_parameter_search instead.
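As a sketch of the switch-over (the `strategy` key comes from `training.model_parameter_search.strategy`; the exact new syntax may differ):

```toml
# Deprecated form: a boolean that switches between explicit and grid search.
[training]
param_grid = true

# New, more flexible form: a named strategy under training.model_parameter_search.
[training.model_parameter_search]
strategy = "grid"
```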
- randint returns a random integer in an inclusive range
- uniform returns a random float in an inclusive range
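For example, a hypothetical `model_parameters` entry could mix both distribution types (the parameter names and bounds here are illustrative):

```toml
[[training.model_parameters]]
type = "random_forest"
# randint: a random integer in the inclusive range [3, 10]
maxDepth = {distribution = "randint", low = 3, high = 10}
# uniform: a random float in the inclusive range [0.1, 0.9]
subsamplingRate = {distribution = "uniform", low = 0.1, high = 0.9}
```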
This makes this code more flexible and easier to understand. It also handles a weird case where the toml library returns a subclass of dict in some situations, and built-in Python dicts in others.
…gy randomized

This lets users set some parameters to a particular value, and only sample others. It's mostly a convenience because previously you could get the same behavior by passing the parameter as a one-element list, like `maxDepth = [7]`. This commit introduces the extra convenience of just specifying the parameter as a value, like `maxDepth = 7`. So now you can do something like this:

```toml
[[training.model_parameters]]
type = "random_forest"
maxDepth = 7
numTrees = [1, 10, 20]
subsamplingRate = {distribution = "uniform", low = 0.1, high = 0.9}
```

maxDepth will always be 7, numTrees will be randomly sampled from the list 1, 10, 20, and subsamplingRate will be sampled uniformly from the range [0.1, 0.9].
Only the hyper-parameters to the model should be affected by training.model_parameter_search.strategy. thresholds and threshold_ratios should be passed through unchanged on each model.
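A sketch of that behavior (the values here are hypothetical): the hyper-parameters below are subject to sampling, while `thresholds` and `threshold_ratios` are copied to each generated model unchanged.

```toml
[[training.model_parameters]]
type = "random_forest"
numTrees = [1, 10, 20]        # hyper-parameter: sampled under the randomized strategy
thresholds = [0.5, 0.7]       # passed through unchanged on each model
threshold_ratios = [1.0, 1.2] # passed through unchanged on each model
```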
I haven't written user docs for this yet because I figure there will be a lot more changes to model exploration as well. We should make sure to write some good documentation once we have started narrowing in on how everything will work together.
I renamed `_get_model_parameters()`'s `training_config` argument to `training_settings` to match the changes made in v4-dev.
I read it over and understand the feature. At first I was confused about how the num_samples got implemented but I found that. Good to go.
The failing test is also failing on v4-dev. It's not related to randomized parameter search as far as I can tell.
This work is for issue #167, which we can close once the v4-dev branch is merged into main and released.
Previously, there were two strategies for searching for the best parameters to a model in model exploration, and the `training.param_grid` config option switched between them. When `param_grid` was false, which was the default, model exploration would take the contents of `training.model_parameters` and test them without any changes or transformations. I've named this strategy "explicit" because the user explicitly writes out each combination of parameters they would like to test. When `param_grid` was true, the user provided lists of possible values for each parameter in `model_parameters`. Then model exploration took this list and generated every possible combination of parameters, testing each combination in serial. This strategy is called a "grid" search because it generates a grid of possible parameter combinations.

This PR adds a third strategy, "randomized" search, which samples each parameter from a list or distribution to create a set number N of parameter combinations to test. This differs from grid search, in which every possible combination of parameters becomes a test case. Randomized search should speed up searches for parameters. Grid search may still be helpful in some situations when you need more precision and would like to test a range of values very thoroughly.

To add a third strategy, we have deprecated the `param_grid` option and replaced it with `training.model_parameter_search`. `param_grid` still works, but model exploration prints a warning message when you use it. `model_parameter_search.strategy` may be "explicit", "grid", or "randomized".
The explicit and grid strategies correspond exactly to the previous behavior with the `param_grid` option. The randomized strategy adds new behavior. When the strategy is randomized, each parameter in `model_parameters` may take one of three different forms:

- A single value, which is used unchanged in every sampled combination.
- A list of values, from which one value is randomly sampled for each combination.
- A table describing a distribution to sample from: `randint` returns a random integer in the inclusive range `[low, high]`, and `uniform` returns a random float in the inclusive range `[low, high]`.
.Here's an example configuration for randomized parameter search.
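The commit example above fits here; the following sketch combines it with an assumed `num_samples` setting (the `num_samples` name appears in this PR's implementation, but the value 25 is illustrative):

```toml
[training.model_parameter_search]
strategy = "randomized"
num_samples = 25  # number of parameter combinations to sample and test

[[training.model_parameters]]
type = "random_forest"
maxDepth = 7            # single value: always 7
numTrees = [1, 10, 20]  # randomly sampled from this list
subsamplingRate = {distribution = "uniform", low = 0.1, high = 0.9}  # sampled uniformly from [0.1, 0.9]
```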