Commit

[#137] Add documentation for the new blocking.or_group feature
riley-harper committed Jun 17, 2024
1 parent 1383518 commit f569256
Showing 4 changed files with 36 additions and 1 deletion.
12 changes: 12 additions & 0 deletions docs/_sources/config.md.txt
@@ -568,6 +568,18 @@ expression = "sex == 1"
* `dataset` -- Type: `string`. Optional. Must be `a` or `b` and used in conjunction with `explode`. Will only explode the column from the `a` or `b` dataset when specified.
* `derived_from` -- Type: `string`. Used in conjunction with `explode = true`. Specifies an input column from the existing dataset to be exploded.
* `expand_length` -- Type: `integer`. When `explode` is used on a column that is an integer, this can be specified to create an array with a range of integer values from (`original_value` minus `expand_length`) to (`original_value` plus `expand_length`). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].
* `or_group` -- Type: `string`. Optional. The "OR group" to which this
blocking table belongs. Blocking tables that belong to the same OR group
are joined by OR in the blocking condition instead of AND. By default each
blocking table belongs to a different OR group. For example, suppose that
your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
record. If you don't provide OR groups when blocking on each BPL variable,
then you will get a blocking condition like `(a.BPL1 = b.BPL1) AND (a.BPL2
= b.BPL2) AND (a.BPL3 = b.BPL3)`. But if you set `or_group = "BPL"` for
each of the 3 variables, then you will get a blocking condition like this
instead: `(a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2 OR a.BPL3 = b.BPL3)`. Note
the parentheses around the entire OR group condition. Other OR groups would
be connected to the BPL OR group with an AND condition.


```
11 changes: 11 additions & 0 deletions docs/config.html
@@ -619,6 +619,17 @@ <h2>Blocking<a class="headerlink" href="#blocking" title="Link to this heading">
<li><p><code class="docutils literal notranslate"><span class="pre">dataset</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. Must be <code class="docutils literal notranslate"><span class="pre">a</span></code> or <code class="docutils literal notranslate"><span class="pre">b</span></code> and used in conjunction with <code class="docutils literal notranslate"><span class="pre">explode</span></code>. Will only explode the column from the <code class="docutils literal notranslate"><span class="pre">a</span></code> or <code class="docutils literal notranslate"><span class="pre">b</span></code> dataset when specified.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">derived_from</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Used in conjunction with <code class="docutils literal notranslate"><span class="pre">explode</span> <span class="pre">=</span> <span class="pre">true</span></code>. Specifies an input column from the existing dataset to be exploded.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">expand_length</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">integer</span></code>. When <code class="docutils literal notranslate"><span class="pre">explode</span></code> is used on a column that is an integer, this can be specified to create an array with a range of integer values from (<code class="docutils literal notranslate"><span class="pre">original_value</span></code> minus <code class="docutils literal notranslate"><span class="pre">expand_length</span></code>) to (<code class="docutils literal notranslate"><span class="pre">original_value</span></code> plus <code class="docutils literal notranslate"><span class="pre">expand_length</span></code>). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">or_group</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. The “OR group” to which this
blocking table belongs. Blocking tables that belong to the same OR group
are joined by OR in the blocking condition instead of AND. By default each
blocking table belongs to a different OR group. For example, suppose that
your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
record. If you don’t provide OR groups when blocking on each BPL variable,
then you will get a blocking condition like <code class="docutils literal notranslate"><span class="pre">(a.BPL1</span> <span class="pre">=</span> <span class="pre">b.BPL1)</span> <span class="pre">AND</span> <span class="pre">(a.BPL2</span> <span class="pre">=</span> <span class="pre">b.BPL2)</span> <span class="pre">AND</span> <span class="pre">(a.BPL3</span> <span class="pre">=</span> <span class="pre">b.BPL3)</span></code>. But if you set <code class="docutils literal notranslate"><span class="pre">or_group</span> <span class="pre">=</span> <span class="pre">&quot;BPL&quot;</span></code> for
each of the 3 variables, then you will get a blocking condition like this
instead: <code class="docutils literal notranslate"><span class="pre">(a.BPL1</span> <span class="pre">=</span> <span class="pre">b.BPL1</span> <span class="pre">OR</span> <span class="pre">a.BPL2</span> <span class="pre">=</span> <span class="pre">b.BPL2</span> <span class="pre">OR</span> <span class="pre">a.BPL3</span> <span class="pre">=</span> <span class="pre">b.BPL3)</span></code>. Note
the parentheses around the entire OR group condition. Other OR groups would
be connected to the BPL OR group with an AND condition.</p></li>
</ul>
</li>
</ul>
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions sphinx-docs/config.md
@@ -568,6 +568,18 @@ expression = "sex == 1"
* `dataset` -- Type: `string`. Optional. Must be `a` or `b` and used in conjunction with `explode`. Will only explode the column from the `a` or `b` dataset when specified.
* `derived_from` -- Type: `string`. Used in conjunction with `explode = true`. Specifies an input column from the existing dataset to be exploded.
* `expand_length` -- Type: `integer`. When `explode` is used on a column that is an integer, this can be specified to create an array with a range of integer values from (`original_value` minus `expand_length`) to (`original_value` plus `expand_length`). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].
* `or_group` -- Type: `string`. Optional. The "OR group" to which this
blocking table belongs. Blocking tables that belong to the same OR group
are joined by OR in the blocking condition instead of AND. By default each
blocking table belongs to a different OR group. For example, suppose that
your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
record. If you don't provide OR groups when blocking on each BPL variable,
then you will get a blocking condition like `(a.BPL1 = b.BPL1) AND (a.BPL2
= b.BPL2) AND (a.BPL3 = b.BPL3)`. But if you set `or_group = "BPL"` for
each of the 3 variables, then you will get a blocking condition like this
instead: `(a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2 OR a.BPL3 = b.BPL3)`. Note
the parentheses around the entire OR group condition. Other OR groups would
be connected to the BPL OR group with an AND condition.
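
The grouping rule described for `or_group` can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the library's actual implementation: the function name `build_blocking_condition` and the `column_name` key used to name each blocking column are assumptions made for the example.

```python
def build_blocking_condition(blocking):
    """Combine blocking entries into a single condition string.

    Hypothetical helper: entries that share an `or_group` are joined with OR
    inside one pair of parentheses, and the groups are joined with AND.
    """
    groups = {}  # dicts keep insertion order, so groups appear in config order
    for index, entry in enumerate(blocking):
        # An entry without `or_group` gets a unique tuple key, so it ends up
        # alone in its own group (the documented default).
        key = entry.get("or_group", ("default", index))
        column = entry["column_name"]  # assumed key holding the column name
        groups.setdefault(key, []).append(f"a.{column} = b.{column}")
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in groups.values()
    )


# One blocking entry without an OR group, plus three that share "BPL".
example = [
    {"column_name": "sex"},
    {"column_name": "BPL1", "or_group": "BPL"},
    {"column_name": "BPL2", "or_group": "BPL"},
    {"column_name": "BPL3", "or_group": "BPL"},
]
print(build_blocking_condition(example))
# (a.sex = b.sex) AND (a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2 OR a.BPL3 = b.BPL3)
```

Dropping the `or_group` keys from the BPL entries puts each one back in its own group, which yields the all-AND condition from the first half of the example above.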


```

0 comments on commit f569256