[#158] Update the documentation for household comparisons
riley-harper committed Oct 30, 2024
1 parent a3f8b48 commit 319c60b
Showing 2 changed files with 15 additions and 9 deletions.
6 changes: 6 additions & 0 deletions sphinx-docs/comparisons.md
@@ -24,6 +24,12 @@ feature. During matching, only pairs of records with `namefrst_jw` greater than
or equal to 0.79 will be added to the potential matches table. Pairs of records
which do not satisfy the comparison will not be potential matches.

*Note: This page focuses on the `comparisons` section in particular, but the
household comparisons section `hh_comparisons` has the same structure. It
defines rules which hlink uses to filter record pairs after household blocking
in the hh_matching task. These rules are effectively filters on the output
`hh_potential_matches` table.*
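
As an illustration of that shared structure, a household comparison that mirrors
the `namefrst_jw` rule described above might be sketched as follows. This is only
a sketch: it assumes a `namefrst_jw` comparison feature is defined in
`comparison_features` and is available during `hh_matching`, and the 0.79 cutoff
is carried over from the `comparisons` example purely for illustration.

```
# Illustrative only: assumes `namefrst_jw` is defined in `comparison_features`.
# Household record pairs with namefrst_jw below 0.79 would be filtered out of
# `hh_potential_matches`.
[hh_comparisons]
comparison_type = "threshold"
feature_name = "namefrst_jw"
threshold_expr = ">= 0.79"
```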

## Comparison Types

Currently the only `comparison_type` supported for the `comparisons` section is
18 changes: 9 additions & 9 deletions sphinx-docs/config.md
@@ -653,19 +653,19 @@ either a single comparison or another set of sub-comparisons. Please see the
[comparisons documentation](comparisons.html#defining-multiple-comparisons) for
more details and examples.

## [Household Comparisons](comparison_types)
## [Household Comparisons](comparisons)

* Header name: `hh_comparisons`
* Description: A list of comparisons to threshold the household potential matches on. Also referred to as post-blocking filters, as all household potential matches are created, then only potential matches that pass the post-blocking filters will be kept for scoring. See [comparison types](comparison_types) for more information.
* Required: False
* Type: Object
* Attributes:
* `comparison_type` -- Type: `string`. Required. See [comparison types](comparison_types) for more information.
* `feature_name` -- Type: `string`. Required. The `comparison_feature` to use for the comparison threshold. A `comparison_feature` column by this name must be specified in the `comparison_features` section.

* Description: A set of comparisons which filter the household potential
matches. `hh_comparisons` has the same configuration structure as
`comparisons` and works in a similar way, except that it applies during the
`hh_matching` task instead of `matching`. You can read more about comparisons
[here](comparisons).

```
# Only household record pairs with an age difference <= 10 can be
# household potential matches.
[hh_comparisons]
# only keep household potential matches with an age difference less than or equal than ten years
comparison_type = "threshold"
feature_name = "byrdiff"
threshold_expr = "<= 10"