Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External variable in logisticCoefs call not recognized when inside function call #220

Closed
moosterwegel opened this issue Mar 22, 2024 · 9 comments · Fixed by #221
Closed
Assignees
Labels

Comments

@moosterwegel
Copy link

First of all, thank you for providing an R package that makes it trivial to simulate some data with a certain marginal probability. Much appreciated.

I'm running into an issue with logisticCoefs when simulating p(x) p(m|x) p(y|x,m). I want to define the strength of the effect of x on m on the fly, but I can only get it to work outside of a function call. The variable that sets the strength of the effect is not recognized inside the function call. See the reprex below.

Any pointers to what's going wrong?

Thank you.

library(simstudy)
set.seed(2024)

# p(x) p(m) p(y|x,m)
f1 <- function() {
  n <- 2000
  prevalence <- 0.1
  coefs_y <- c(log(3), log(2))
  
  defs <- defData(
    defData(varname = "x", formula = 0, variance = 1, dist = "normal"), 
    varname = "m",
    formula = 0,
    dist = "normal", variance = 1)
  
  intercept <- logisticCoefs(defCovar = defs, coefs = coefs_y, popPrev = prevalence)[['B0']]
  
  defs <- defData(
    defs,
    varname = "y",
    formula = "t(c(..intercept, ..coefs_y)) %*% c(1, x, m)",
    dist = "binary", link = "logit")
  
  df <- genData(n, defs)
  glm(y ~ 1 + x + m, family = binomial(link = "logit"), data = df) |> coef() |> exp()
}

f1()
#> (Intercept)           x           m 
#>  0.05879635  2.87530784  1.83699935

# p(x) p(m|x) p(y|x,m)
f2 <- function() {
  n <- 2000
  prevalence <- 0.1
  coefs_y <- c(log(3), log(2))
  beta_x_m <- 2 
  
  defs <- defData(
    defData(varname = "x", formula = 0, variance = 1, dist = "normal"), 
    varname = "m",
    formula = "x * ..beta_x_m", # only difference with f1
    dist = "normal", variance = 1)
  
  intercept <- logisticCoefs(defCovar = defs, coefs = coefs_y, popPrev = prevalence)[['B0']]
  
  defs <- defData(
    defs,
    varname = "y",
    formula = "t(c(..intercept, ..coefs_y)) %*% c(1, x, m)",
    dist = "binary", link = "logit")
  
  df <- genData(n, defs)
  glm(y ~ 1 + x + m, family = binomial(link = "logit"), data = df) |> coef() |> exp()
}

f2()
#> Error: External variable(s) not defined or NA: beta_x_m

# the code from f2 works fine outside of a function
n <- 2000
prevalence <- 0.1
coefs_y <- c(log(3), log(2))
beta_x_m <- 2

defs <- defData(
  defData(varname = "x", formula = 0, variance = 1, dist = "normal"), 
  varname = "m",
  formula = "x * ..beta_x_m", 
  dist = "normal", variance = 1)

intercept <- logisticCoefs(defCovar = defs, coefs = coefs_y, popPrev = prevalence)[['B0']]

defs <- defData(
  defs,
  varname = "y",
  formula = "t(c(..intercept, ..coefs_y)) %*% c(1, x, m)",
  dist = "binary", link = "logit")

df <- genData(n, defs)
glm(y ~ 1 + x + m, family = binomial(link = "logit"), data = df) |> coef() |> exp()
#> (Intercept)           x           m 
#>  0.01726143  3.29607931  2.04816878
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] simstudy_0.7.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] backports_1.4.1     digest_0.6.31       fastmap_1.1.1      
#>  [4] xfun_0.42           glue_1.6.2          fastglm_0.0.3      
#>  [7] knitr_1.43          bigmemory_4.6.4     htmltools_0.5.5    
#> [10] rmarkdown_2.22      lifecycle_1.0.3     cli_3.6.1          
#> [13] reprex_2.1.0        data.table_1.14.8   withr_2.5.0        
#> [16] compiler_4.3.1      bigmemory.sri_0.1.8 rstudioapi_0.14    
#> [19] tools_4.3.1         evaluate_0.21       Rcpp_1.0.10        
#> [22] yaml_2.3.7          rlang_1.1.1         fs_1.6.2           
#> [25] uuid_1.1-0
@kgoldfeld
Copy link
Owner

Thanks for reaching out. Yes - you've found a bug that we need to fix related to the double-dot notation. Currently, that variable only gets evaulated if it exists in the "root" environment. So, when you create beta_x_m inside the function, the double dot functionality does not find it - it is looking for it in the calling environment. So, a workaround until we fix this is to define beta_x_m just before calling f2(), like this:

library(simstudy)
set.seed(2024)

# p(x) p(m|x) p(y|x,m)
f2 <- function() {
  n <- 2000
  prevalence <- 0.1
  coefs_y <- c(log(3), log(2))
  
  defs <- defData(
    defData(varname = "x", formula = 0, variance = 1, dist = "normal"), 
    varname = "m",
    formula = "x * ..beta_x_m", # only difference with f1
    dist = "normal", variance = 1)
  
  intercept <- logisticCoefs(defCovar = defs, coefs = coefs_y, popPrev = prevalence)[['B0']]
  
  defs <- defData(
    defs,
    varname = "y",
    formula = "t(c(..intercept, ..coefs_y)) %*% c(1, x, m)",
    dist = "binary", link = "logit")
  
  df <- genData(n, defs)
  glm(y ~ 1 + x + m, family = binomial(link = "logit"), data = df) |> coef() |> exp()
}

beta_x_m <- 2 
f2()
#> (Intercept)           x           m 
#>  0.01568302  5.22871219  1.60440634

Created on 2024-03-23 with reprex v2.1.0

You can do this with replications. Here is an example with 3 replications where beta_x_m is set to 1.5:

beta_x_m <- 1.5 
lapply(1:3, function(x) f2())
#> [[1]]
#> (Intercept)           x           m 
#>  0.02491957  3.83257775  1.63143803 
#> 
#> [[2]]
#> (Intercept)           x           m 
#>  0.02581665  2.26940339  2.24345931 
#> 
#> [[3]]
#> (Intercept)           x           m 
#>   0.0171123   3.6076806   1.9789129

Created on 2024-03-23 with reprex v2.1.0

I'll let you know when we've fixed the bigger issue.

@kgoldfeld kgoldfeld added the bug label Mar 23, 2024
@kgoldfeld
Copy link
Owner

Just to follow up - this bug is related specifically to function logisticCoefs. In general one can pass arguments to functions and refer to them in double dot notation:

library(simstudy)

fa <- function(a) {
  
  def <- 
    defData(varname = "x", formula = 4, variance = 2)  |>
    defData(varname = "z", formula = "x*..a", variance = 3)
  
  genData(1000, def)[, mean(z)]
}

set.seed(123123)

fa(5)
#> [1] 19.96077
fa(20)
#> [1] 80.67964

Created on 2024-03-23 with reprex v2.1.0

@kgoldfeld
Copy link
Owner

@assignUser I see the problem - in the logisticCoefs, the algorithm requires that a dataset using the specified data definition is generated. Of course, the double dot variable has not been passed as an argument in the call to logisticCoefs, resulting in the error message. The question is, is there a way to pass those variables (in this case beta_x_m) without the user explicitly specifying it. I'm guessing not - so the solution would require the user to attach the reference variables as an argument.

@kgoldfeld
Copy link
Owner

@assignUser I think I see a possible solution: the get function allows us to pull variables from the parent environment. So, if I can extract all the double dot references from the data definition, I can probably pull those variables into the environment.

@assignUser
Copy link
Collaborator

Yeah that sounds like it would work, I can look over the issue in a bit too!

@kgoldfeld
Copy link
Owner

I've pretty much got it working - will incorporate into logisticCoefs soon.

@assignUser
Copy link
Collaborator

Oh great, feel free to ping me for a review. I had to reset my dev environment (thanks windows...)

@kgoldfeld kgoldfeld self-assigned this Mar 24, 2024
@kgoldfeld kgoldfeld linked a pull request Mar 24, 2024 that will close this issue
@kgoldfeld
Copy link
Owner

This has been fixed - install from development. (No need for workaround provided above.)

@moosterwegel
Copy link
Author

That's quick, superb. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants