Conditional imputation (ifdo) related issue #548
Comments
Any chance that the visiting sequence may be the culprit here? i.e. that
Hi Gerko, Thanks for your quick reply and advice.
Can you create a reprex?
Sure, see below.
library(mice)
#> Warning: package 'mice' was built under R version 4.2.3
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(data.table)
my_dt <- readRDS("~/testdt.rds")
dummy_col <- c("employed")
my_dt[, (dummy_col) := lapply(.SD, factor), .SDcols = dummy_col]
ini <- mice(my_dt, predictorMatrix = quickpred(my_dt, minpuc = 0.9, mincor = 0.1), maxit = 0, seed = 101)
post <- ini$post
pred <- ini$pred
## ASSIGN METHODS
method <- ini$meth
method["age"] <- ""
cols_to_pmm <- c("edu", "height", "weight", "workinghours")
for (i in cols_to_pmm) {
  method[i] <- "pmm"
}
post["workinghours"] <- "imp[[j]][imp[[j]]$data$employed[!r[,j]]==0, i] <- 0"
pred[c("employed"), c("workinghours")] <- 0
visit_seq <- c("edu", "height", "weight", "employed", "workinghours")
imp <- mice(data = my_dt, maxit = 5, predictorMatrix = pred, post = post, method = method, m = 2, visitSequence = visit_seq)
#>
#> iter imp variable
#> 1 1 edu height weight employed workinghours
#> 1 2 edu height weight employed workinghours
#> 2 1 edu height weight employed workinghours
#> 2 2 edu height weight employed workinghours
#> 3 1 edu height weight employed workinghours
#> 3 2 edu height weight employed workinghours
#> 4 1 edu height weight employed workinghours
#> 4 2 edu height weight employed workinghours
#> 5 1 edu height weight employed workinghours
#> 5 2 edu height weight employed workinghours
completeimp <- complete(imp)
Created on 2023-04-17 with reprex v2.0.2
I just found that the
Could the problem be that complete only sets values to zero that are actually included as
I think I had already ruled out this scenario while cleaning and recoding the data. As you can see, I ran this line of code:
summary(my_dt[is.na(my_dt$employed),'workinghours'])
Created on 2023-04-17 with reprex v2.0.2
The result is this:
workinghours
I spent some more time looking into this, and I think that the error is in your specification of the post-processing arguments. As far as I know, there is no
Note that the use of
In your case, I think you could modify it to something like
I hope this solves the problem!
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
post <- make.post(nhanes)
post["hyp"] <- "imp[[j]][, i] <- ifelse(data[r[,j] == 0, 'bmi'] > 30, 10, imp[[j]][,i])"
imp <- mice(nhanes, post = post, seed = 1, printFlag = FALSE)
imp$imp
#> $age
#> [1] 1 2 3 4 5
#> <0 rows> (or 0-length row.names)
#>
#> $bmi
#> 1 2 3 4 5
#> 1 27.4 35.3 27.2 29.6 24.9
#> 3 28.7 27.2 35.3 30.1 29.6
#> 4 25.5 20.4 22.0 20.4 25.5
#> 6 24.9 21.7 22.7 24.9 27.4
#> 10 28.7 22.7 22.0 20.4 22.7
#> 11 30.1 29.6 35.3 22.7 33.2
#> 12 26.3 27.2 33.2 27.4 22.5
#> 16 27.2 27.2 35.3 22.7 24.9
#> 21 26.3 29.6 27.2 29.6 27.2
#>
#> $hyp
#> 1 2 3 4 5
#> 1 1 10 1 1 1
#> 4 1 1 1 1 2
#> 6 1 2 2 1 1
#> 10 1 1 1 1 1
#> 11 10 1 10 1 10
#> 12 1 1 10 2 1
#> 16 1 1 10 1 1
#> 21 1 1 1 1 1
#>
#> $chl
#> 1 2 3 4 5
#> 1 199 284 187 187 238
#> 4 218 118 184 187 238
#> 10 206 187 186 238 187
#> 11 218 206 186 131 187
#> 12 199 199 187 184 187
#> 15 206 186 229 229 206
#> 16 218 184 184 113 238
#> 20 199 218 206 184 187
#> 21 229 204 131 206 187
#> 24 206 238 284 199 218
complete(imp, "all")
#> $`1`
#> age bmi hyp chl
#> 1 1 27.4 1 199
#> 2 2 22.7 1 187
#> 3 1 28.7 1 187
#> 4 3 25.5 1 218
#> 5 1 20.4 1 113
#> 6 3 24.9 1 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 28.7 1 206
#> 11 1 30.1 10 218
#> 12 2 26.3 1 199
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 206
#> 16 1 27.2 1 218
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 199
#> 21 1 26.3 1 229
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 206
#> 25 2 27.4 1 186
#>
#> $`2`
#> age bmi hyp chl
#> 1 1 35.3 10 284
#> 2 2 22.7 1 187
#> 3 1 27.2 1 187
#> 4 3 20.4 1 118
#> 5 1 20.4 1 113
#> 6 3 21.7 2 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 22.7 1 187
#> 11 1 29.6 1 206
#> 12 2 27.2 1 199
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 186
#> 16 1 27.2 1 184
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 218
#> 21 1 29.6 1 204
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 238
#> 25 2 27.4 1 186
#>
#> $`3`
#> age bmi hyp chl
#> 1 1 27.2 1 187
#> 2 2 22.7 1 187
#> 3 1 35.3 1 187
#> 4 3 22.0 1 184
#> 5 1 20.4 1 113
#> 6 3 22.7 2 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 22.0 1 186
#> 11 1 35.3 10 186
#> 12 2 33.2 10 187
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 229
#> 16 1 35.3 10 184
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 206
#> 21 1 27.2 1 131
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 284
#> 25 2 27.4 1 186
#>
#> $`4`
#> age bmi hyp chl
#> 1 1 29.6 1 187
#> 2 2 22.7 1 187
#> 3 1 30.1 1 187
#> 4 3 20.4 1 187
#> 5 1 20.4 1 113
#> 6 3 24.9 1 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 20.4 1 238
#> 11 1 22.7 1 131
#> 12 2 27.4 2 184
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 229
#> 16 1 22.7 1 113
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 184
#> 21 1 29.6 1 206
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 199
#> 25 2 27.4 1 186
#>
#> $`5`
#> age bmi hyp chl
#> 1 1 24.9 1 238
#> 2 2 22.7 1 187
#> 3 1 29.6 1 187
#> 4 3 25.5 2 238
#> 5 1 20.4 1 113
#> 6 3 27.4 1 184
#> 7 1 22.5 1 118
#> 8 1 30.1 1 187
#> 9 2 22.0 1 238
#> 10 2 22.7 1 187
#> 11 1 33.2 10 187
#> 12 2 22.5 1 187
#> 13 3 21.7 1 206
#> 14 2 28.7 2 204
#> 15 1 29.6 1 206
#> 16 1 24.9 1 238
#> 17 3 27.2 2 284
#> 18 2 26.3 2 199
#> 19 1 35.3 1 218
#> 20 3 25.5 2 187
#> 21 1 27.2 1 187
#> 22 1 33.2 1 229
#> 23 1 27.5 1 131
#> 24 3 24.9 1 218
#> 25 2 27.4 1 186
#>
#> attr(,"class")
#> [1] "mild" "list"
Created on 2023-04-18 with reprex v2.0.2
Thank you for your suggestion! Unfortunately, it did not work so well on my data. I think there might be some confusion here due to my description. However, your solution does give me some hint. If I understood correctly, the code line from you
tries to mutate those (NA in var
In fact, I need to mutate the imputed values of
post["workinghours"] <- paste(sep = ";",
And the
Hi Stef,
I ran into a problem using the post argument of mice() when imputing some downstream variables in my dataset.
I have a data frame, my_dt (N x 40), with some missingness, and I would like to impute a variable called "employed" and its downstream variables, e.g. "workinghours", "workingdays", etc.
my_dt = df( ... ,
employed = c(0, NA, 1, 0, NA, 0, NA, 1,1, NA),
workinghours = c(0, NA, NA, 0, NA, 0, NA, 22, 30, NA),
... )
The issue is that if the imputed "employed" is 0, I would like to set all the downstream variables to 0 as well (an unemployed person should have 0 working hours, 0 working days, etc.).
I looked into your discussions of related issues and tried the code from #43, #125, #258, etc. So I set the post argument like this:
post["workinghours"] <- "imp[[j]][imp[[j]]$data$employed[!r[,j]]==0, i] <- 0"
It ran without any error, but it did not work as expected: there are non-zero imputations of "workinghours" for observations with "employed" == 0.
I'm wondering how to fix it. Thank you in advance for your time.
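For readers who want to try the rule described above, here is a minimal sketch that adapts the ifelse() pattern from the nhanes example in the comments to an "employed" / "workinghours" pair. The toy data frame, its column values, and the names toy, miss, and rule_holds are assumptions made up for illustration and are not the poster's data. The sketch also assumes that the post string is evaluated where data (the current working data), imp, r, j, and i are visible, as the nhanes example relies on, and that "employed" is visited before "workinghours".

```r
library(mice)

# Hypothetical toy data (NOT the poster's data): workinghours is 0 whenever
# employed is 0, and both are missing together for a random subset of rows.
set.seed(1)
n <- 200
age <- round(runif(n, 20, 60))
employed <- rbinom(n, 1, 0.6)
workinghours <- ifelse(employed == 1, pmax(1, round(rnorm(n, 35, 10))), 0)
toy <- data.frame(age, employed, workinghours)
miss <- sample(n, 50)
toy$employed[miss] <- NA
toy$workinghours[miss] <- NA
toy$employed <- factor(toy$employed)

# Post-processing rule: after workinghours is imputed, overwrite the imputed
# value with 0 for every row whose current employed value is 0.
# data[!r[, j], "employed"] picks employed for the rows where workinghours
# was missing, so it has the same length as imp[[j]][, i].
post <- make.post(toy)
post["workinghours"] <-
  "imp[[j]][, i] <- ifelse(data[!r[, j], 'employed'] == 0, 0, imp[[j]][, i])"

# Visit employed before workinghours so the rule sees the freshly imputed
# employed values within each iteration.
imp <- mice(toy, post = post,
            visitSequence = c("employed", "workinghours"),
            m = 2, maxit = 5, seed = 101, printFlag = FALSE)

# Check each completed data set for violations of the rule.
rule_holds <- sapply(seq_len(imp$m), function(k) {
  done <- complete(imp, k)
  all(done$workinghours[done$employed == 0] == 0)
})
print(rule_holds)
```

The key difference from the post string in the question is that the condition is read from data, the working data set inside the sampler, rather than from imp[[j]]$data. Whether this matches the behaviour needed on the real data depends on the missingness pattern, so the check at the end is worth running on every completed data set.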