added verbose to dr estimator, readme updated
BERENZ committed Feb 24, 2025
1 parent d6ff09f commit 5529c7c
Showing 4 changed files with 85 additions and 11 deletions.
19 changes: 17 additions & 2 deletions R/nonprob_dr.R
@@ -65,6 +65,9 @@ nonprob_dr <- function(selection,
if (control_inference$vars_selection & control_inference$vars_combine) {

## estimate the mi
if (verbose) {
cat("MI variable selection in progress...\n")
}
results_mi <- nonprob_mi(outcome = outcome,
data = data,
svydesign = svydesign,
@@ -84,6 +87,9 @@ nonprob_dr <- function(selection,
se = FALSE,
pop_size_fixed=pop_size_fixed)

if (verbose) {
cat("IPW variable selection in progress...\n")
}
results_ipw <- nonprob_ipw(selection = selection,
target = reformulate(outcomes[[1]]),
data = data,
@@ -114,10 +120,19 @@ nonprob_dr <- function(selection,
mi_coefs_sel <- lapply(results_mi$outcome, coef)
dr_coefs_sel <- lapply(mi_coefs_sel, function(x) {
mi_cols <- names(x[abs(x)>0])
combined <- union(ipw_coefs_sel, mi_cols)
combined <- sort(base::union(ipw_coefs_sel, mi_cols))
combined[!grepl("Intercept", combined)]
})

if (verbose) {
cat("IPW vars selected:", ipw_coefs_sel, "\n")
cat("MI vars selected:\n")
print(lapply(mi_coefs_sel, function(x) names(x[abs(x)>0])))
cat("DR combined vars:\n")
print(dr_coefs_sel)
}


selection_vars <- all.vars(formula.tools::rhs(outcome))
outcome_vars <- all.vars(formula.tools::rhs(selection))
target_vars <- all.vars(formula.tools::lhs(outcome))
@@ -139,7 +154,7 @@ nonprob_dr <- function(selection,
control_inference_$vars_selection <- FALSE

for (o in outcomes$f) {

## this is not saved in the output list
results_ipw_combined[[o]] <- nonprob_ipw(data = X_nons,
target = reformulate(o),
selection = reformulate(dr_coefs_sel[[o]]),
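A minimal, self-contained sketch of how the third hunk above appears to combine the selected variables, offered as an illustration rather than the package's actual code path. The coefficient values and the names `x1` to `x4`, `y1` and `y2` are made up, and `ipw_coefs_sel` is assumed to be a character vector of selected variable names; in the package these objects come from fitted models.

```r
# Hypothetical inputs standing in for the results of variable selection.
# Assumption: ipw_coefs_sel holds the names of variables kept by the IPW step.
ipw_coefs_sel <- c("(Intercept)", "x2", "x1")

# Assumption: one named coefficient vector per outcome; zeros mean "dropped".
mi_coefs_sel <- list(
  y1 = c("(Intercept)" = 1.2, x1 = 0.5, x2 = 0, x3 = -0.3),
  y2 = c("(Intercept)" = 0.7, x1 = 0, x2 = 0, x4 = 0.9)
)

# Same logic as in the diff: keep non-zero MI coefficients, take the sorted
# union with the IPW selection, then drop the intercept term.
dr_coefs_sel <- lapply(mi_coefs_sel, function(x) {
  mi_cols <- names(x[abs(x) > 0])
  combined <- sort(base::union(ipw_coefs_sel, mi_cols))
  combined[!grepl("Intercept", combined)]
})

# Mirror the new verbose output for the combined set.
verbose <- TRUE
if (verbose) {
  cat("DR combined vars:\n")
  print(dr_coefs_sel)
}

# The combined names can then be turned into a one-sided formula,
# as the loop over outcomes does with reformulate().
selection_y1 <- reformulate(dr_coefs_sel[["y1"]])
```

Sorting the union makes the order of the combined variables independent of which model selected a variable first, which is presumably the reason `sort()` was added in this commit.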
4 changes: 2 additions & 2 deletions README.Rmd
@@ -45,7 +45,7 @@ population or probability sample is available:
- inverse probability weighting estimators with possible calibration
constraints [@chen2020],
- mass imputation estimators based on nearest neighbours [@yang2021],
predictive mean matching and regression imputation [@kim2021],
predictive mean matching [@chlebicki2025], non-parametric [@chen2022nonparametric] and regression imputation [@kim2021],
- doubly robust estimators [@chen2020] with bias minimization [@yang2020].

The package allows for:
@@ -56,7 +56,7 @@ The package allows for:
- estimation of variance using analytical and bootstrap approach (see
@wu2023),
- integration with the `survey` and `srvyr` packages when probability sample is
available [@Lumley2004, @Lumley2023],
available [@Lumley2004; @Lumley2023; @srvyr2024],
- different links for selection (`logit`, `probit` and `cloglog`) and
outcome (`gaussian`, `binomial` and `poisson`) variables.

42 changes: 35 additions & 7 deletions README.md
@@ -27,12 +27,14 @@ for non-probability samples when auxiliary information from the
population or probability sample is available:

- inverse probability weighting estimators with possible calibration
constraints ([Chen, Li, and Wu 2020](#ref-chen2020)),
constraints ([Y. Chen, Li, and Wu 2020](#ref-chen2020)),
- mass imputation estimators based on nearest neighbours ([Yang, Kim,
and Hwang 2021](#ref-yang2021)), predictive mean matching and
and Hwang 2021](#ref-yang2021)), predictive mean matching ([Chlebicki,
Chrostowski, and Beręsewicz 2025](#ref-chlebicki2025)), non-parametric
([S. Chen, Yang, and Kim 2022](#ref-chen2022nonparametric)) and
regression imputation ([Kim et al. 2021](#ref-kim2021)),
- doubly robust estimators ([Chen, Li, and Wu 2020](#ref-chen2020)) with
bias minimization ([Yang, Kim, and Song 2020](#ref-yang2020)).
- doubly robust estimators ([Y. Chen, Li, and Wu 2020](#ref-chen2020))
with bias minimization ([Yang, Kim, and Song 2020](#ref-yang2020)).

The package allows for:

@@ -42,7 +44,9 @@ The package allows for:
- estimation of variance using analytical and bootstrap approach (see Wu
([2023](#ref-wu2023))),
- integration with the `survey` and `srvyr` packages when probability
sample is available Lumley ([2023](#ref-Lumley2023)),
sample is available ([Lumley 2004](#ref-Lumley2004),
[2023](#ref-Lumley2023); [Freedman Ellis and Schneider
2024](#ref-srvyr2024)),
- different links for selection (`logit`, `probit` and `cloglog`) and
outcome (`gaussian`, `binomial` and `poisson`) variables.

@@ -449,7 +453,7 @@ sample_prob
#> flag_srs == 1))
```

or with `srvyr`
or with the `srvyr` package

``` r
sample_prob <- srvyr::as_survey_design(.data = subset(population, flag_srs == 1),
@@ -490,7 +494,7 @@ result_dr
#> - variance estimator: analytic
#> - population size fixed: false
#> - naive (uncorrected) estimator: 3.1817
#> - selected estimator: 2.95 (se=0.0414, ci=(2.8688, 3.0312))
#> - selected estimator: 2.95 (se=0.0415, ci=(2.8687, 3.0313))
```

Mass imputation estimator
@@ -553,6 +557,14 @@ Work on this package is supported by the National Science Centre, OPUS
<div id="refs" class="references csl-bib-body hanging-indent"
entry-spacing="0">

<div id="ref-chen2022nonparametric" class="csl-entry">

Chen, Sixia, Shu Yang, and Jae Kwang Kim. 2022. “Nonparametric Mass
Imputation for Data Integration.” *Journal of Survey Statistics and
Methodology* 10 (1): 1–24.

</div>

<div id="ref-chen2020" class="csl-entry">

Chen, Yilin, Pengfei Li, and Changbao Wu. 2020. “Doubly Robust Inference
@@ -562,6 +574,22 @@ Statistical Association* 115 (532): 2011–21.

</div>

<div id="ref-chlebicki2025" class="csl-entry">

Chlebicki, Piotr, Łukasz Chrostowski, and Maciej Beręsewicz. 2025. “Data
Integration of Non-Probability and Probability Samples with Predictive
Mean Matching.” <https://arxiv.org/abs/2403.13750>.

</div>

<div id="ref-srvyr2024" class="csl-entry">

Freedman Ellis, Greg, and Ben Schneider. 2024. *Srvyr: ’Dplyr’-Like
Syntax for Summary Statistics of Survey Data*.
<https://CRAN.R-project.org/package=srvyr>.

</div>

<div id="ref-kim2021" class="csl-entry">

Kim, Jae Kwang, Seho Park, Yilin Chen, and Changbao Wu. 2021. “Combining
31 changes: 31 additions & 0 deletions references.bib
@@ -87,3 +87,34 @@ @article{wu2023
url = {https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm},
langid = {en}
}

@Manual{srvyr2024,
title = {srvyr: 'dplyr'-Like Syntax for Summary Statistics of Survey Data},
author = {{Freedman Ellis}, Greg and Schneider, Ben },
year = {2024},
note = {R package version 1.3.0},
url = {https://CRAN.R-project.org/package=srvyr},
}


@article{chen2022nonparametric,
title={Nonparametric mass imputation for data integration},
author={Chen, Sixia and Yang, Shu and Kim, Jae Kwang},
journal={Journal of survey statistics and methodology},
volume={10},
number={1},
pages={1--24},
year={2022},
publisher={Oxford University Press}
}


@misc{chlebicki2025,
title={Data integration of non-probability and probability samples with predictive mean matching},
author={Piotr Chlebicki and Łukasz Chrostowski and Maciej Beręsewicz},
year={2025},
eprint={2403.13750},
archivePrefix={arXiv},
primaryClass={stat.ME},
url={https://arxiv.org/abs/2403.13750},
}
