[#183] Document that training.param_grid is deprecated
riley-harper committed Dec 19, 2024
1 parent 8bfe87e commit dc9b3d2
Showing 25 changed files with 108 additions and 56 deletions.
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 3d084ea912736a6c4043e49bc2b58167
+config: 346c22873853f51d4bd34095fc5e3354
tags: 645f666f9bcd5a90fca523b33c5a78b7
2 changes: 1 addition & 1 deletion docs/.buildinfo.bak
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: a706061ae4b2d0ec765440a2505ca382
+config: 3d084ea912736a6c4043e49bc2b58167
tags: 645f666f9bcd5a90fca523b33c5a78b7
28 changes: 23 additions & 5 deletions docs/_sources/config.md.txt
@@ -334,7 +334,7 @@ split_by_id_a = true
decision = "drop_duplicate_with_threshold_ratio"

n_training_iterations = 2
-param_grid = true
+model_parameter_search = {strategy = "grid"}
model_parameters = [
{ type = "random_forest", maxDepth = [7], numTrees = [100], threshold = [0.05, 0.005], threshold_ratio = [1.2, 1.3] },
{ type = "logistic_regression", threshold = [0.50, 0.65, 0.80], threshold_ratio = [1.0, 1.1] }
@@ -360,7 +360,7 @@ split_by_id_a = true
decision = "drop_duplicate_with_threshold_ratio"

n_training_iterations = 10
-param_grid = false
+model_parameter_search = {strategy = "explicit"}
model_parameters = [
{ type = "random_forest", maxDepth = 6, numTrees = 50, threshold = 0.5, threshold_ratio = 1.0 },
{ type = "probit", threshold = 0.5, threshold_ratio = 1.0 }
@@ -743,7 +743,6 @@ splits = [-1,0,6,11,9999]
* `decision` -- Type: `string`. Optional. Specifies which decision function to use to create the final prediction. The first option is `drop_duplicate_a`, which drops any links for which a record in the `a` data set has a predicted match more than one time. The second option is `drop_duplicate_with_threshold_ratio` which only takes links for which the `a` record has the highest probability out of any other potential links, and the second best link for the `a` record is less than the `threshold_ratio`.
* `threshold_ratio` -- Type: `float`. Optional. For use when `decision` is `drop_duplicate_with_threshold_ratio` . Specifies the smallest possible ratio to accept between a best and second best link for a given record. Can be used to specify a threshold ratio (beta threshold) to use for all models. Alternatively, unique threshold ratios can be specified in each individual `chosen_model` and `model_parameters` specification.
* `model_parameters` -- Type: `list`. Specifies models to test out in the `model_exploration` task. See the [models](models) section for more information on model specifications.
-* `param_grid` -- Type: `boolean`. Optional. If you would like to evaluate multiple hyper-parameters for a single model type in your `model_parameters` specification, you can give hyper-parameter inputs as arrays of length >= 1 instead of integers to allow one model per row specification with multiple model eval outputs.
* `score_with_model` -- Type: `boolean`. If set to false, will skip the `apply_model` step of the matching task. Use this if you want to use the `run_all_steps` command and are just trying to generate potential links, such as for the creation of training data.
* `n_training_iterations` -- Type: `integer`. Optional; default value is 10. The number of training iterations to use during the `model_exploration` task.
* `scale_data` -- Type: `boolean`. Optional. Whether to scale the data as part of the machine learning pipeline.
@@ -752,6 +751,25 @@ splits = [-1,0,6,11,9999]
* `feature_importances` -- Type: `boolean`. Optional. Whether to record
feature importances or coefficients for the training features when training
the ML model. Set this to true to enable training step 3.
+* Deprecated Attributes:
+* `param_grid` (*Deprecated in version 4.0.0*) -- Type: `boolean`. Optional.
+`param_grid` has been deprecated and will eventually be removed. Please use
+the more flexible `model_parameter_search` option by replacing `param_grid
+= false`
+
+with
+
+```toml
+model_parameter_search = {strategy = "explicit"}
+```
+
+and replacing `param_grid = true`
+
+with
+
+```toml
+model_parameter_search = {strategy = "grid"}
+```
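
The replacement described in the deprecation note above can be sketched as a small helper. This is a minimal illustration and not part of hlink: the function name `migrate_param_grid` is hypothetical, and leaving an already present `model_parameter_search` untouched is an assumption about the desired behavior. It takes the `training` section of a config as a plain dictionary and swaps the deprecated boolean for the matching strategy.

```python
import copy

# Mapping taken from the deprecation note:
# param_grid = true  -> strategy "grid"
# param_grid = false -> strategy "explicit"
STRATEGY_FOR_PARAM_GRID = {True: "grid", False: "explicit"}


def migrate_param_grid(training: dict) -> dict:
    """Return a copy of a training section without the deprecated param_grid key.

    Hypothetical helper for illustration only; hlink does not provide it.
    """
    migrated = copy.deepcopy(training)
    if "param_grid" not in migrated:
        # Nothing to migrate.
        return migrated

    param_grid = bool(migrated.pop("param_grid"))
    # Assumption: if the new option is already set, it wins over param_grid.
    migrated.setdefault(
        "model_parameter_search",
        {"strategy": STRATEGY_FOR_PARAM_GRID[param_grid]},
    )
    return migrated


old_training = {"n_training_iterations": 10, "param_grid": True}
new_training = migrate_param_grid(old_training)
print(new_training)
```

The helper returns a new dictionary and leaves its input unchanged, so the old and new sections can be compared before a config file is rewritten.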


```
@@ -769,7 +787,7 @@ feature_importances = true
decision = "drop_duplicate_with_threshold_ratio"

n_training_iterations = 10
-param_grid = false
+model_parameter_search = {strategy = "explicit"}
model_parameters = [
{ type = "random_forest", maxDepth = 6, numTrees = 50 },
{ type = "probit", threshold = 0.5}
@@ -805,7 +823,7 @@ score_with_model = true
feature_importances = true
decision = "drop_duplicate_with_threshold_ratio"

-param_grid = true
+model_parameter_search = {strategy = "grid"}
n_training_iterations = 10
model_parameters = [
{ type = "logistic_regression", threshold = [0.5], threshold_ratio = [1.1]},
2 changes: 1 addition & 1 deletion docs/_sources/use_examples.md.txt
@@ -88,7 +88,7 @@ However, when this training data set is used for other years, the program does n
score_with_model = true
feature_importances = false
decision = "drop_duplicate_with_threshold_ratio"
-param_grid = true
+model_parameter_search = {strategy = "grid"}
n_training_iterations = 10
model_parameters = [
{ type = "logistic_regression", threshold = [0.5], threshold_ratio = [1.0, 1.1]},
2 changes: 1 addition & 1 deletion docs/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
-VERSION: '3.8.0',
+VERSION: '4.0.0a1',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
4 changes: 2 additions & 2 deletions docs/column_mappings.html
@@ -5,11 +5,11 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Column Mappings &#8212; hlink 3.8.0 documentation</title>
+<title>Column Mappings &#8212; hlink 4.0.0a1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/basic.css?v=686e5160" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=27fed22d" />
-<script src="_static/documentation_options.js?v=948f11bf"></script>
+<script src="_static/documentation_options.js?v=f5d13bc6"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/comparison_features.html
@@ -5,11 +5,11 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Comparison Features &#8212; hlink 3.8.0 documentation</title>
+<title>Comparison Features &#8212; hlink 4.0.0a1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/basic.css?v=686e5160" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=27fed22d" />
-<script src="_static/documentation_options.js?v=948f11bf"></script>
+<script src="_static/documentation_options.js?v=f5d13bc6"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />
4 changes: 2 additions & 2 deletions docs/comparisons.html
@@ -5,11 +5,11 @@
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>Comparisons &#8212; hlink 3.8.0 documentation</title>
+<title>Comparisons &#8212; hlink 4.0.0a1 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=d1102ebc" />
<link rel="stylesheet" type="text/css" href="_static/basic.css?v=686e5160" />
<link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=27fed22d" />
-<script src="_static/documentation_options.js?v=948f11bf"></script>
+<script src="_static/documentation_options.js?v=f5d13bc6"></script>
<script src="_static/doctools.js?v=9bcbadda"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="genindex.html" />