-
Notifications
You must be signed in to change notification settings - Fork 0
/
immortal.Rmd
366 lines (287 loc) · 17.3 KB
/
immortal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
---
title: "Immortal time bias"
author:
- name: André Moser, CTU Bern, University of Bern
orcid_id: 0000-0001-7178-6539
output:
distill::distill_article:
toc: true
number_sections: true
toc_float: true
bibliography: immortal.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Review
This vignette has not been reviewed (yet) by other statisticians `r emo::ji("face screaming in fear")`. `r emo::ji("skull")` You read it on your own risk `r emo::ji("skull")`.
# Open tasks
- More realistic assumptions: People can die in treatment period. Start of time zero possible hospital admission? Other examples?
- Define specific time at end of treatment and not median time
- Decrease sample size
# Background
Immortal time bias might arise, if - for example - individuals are assigned to an exposure or treatment group based on covariate information after time zero (@hernan). Consequently, group assignment is not aligned with time zero and individuals inherently experience no outcome event for a specific time period (they are immortal when the outcome of interest is time to death). Immortal time bias is common when analyzing observational studies (@suissa, @levesque). Other terminology for immortal time bias is *time dependent bias* or *survival bias*. In this vignette we introduce several methods to address immortal time bias discussed in the articles by @zhou and @robins.
# Example
We consider a study population of 10,000 patients admitted to an emergency unit. Patients with a more severe admission status are treated more quickly. Clinicians are interested whether the time from admission to time to treatment affect survival and group patients into two treatment groups: Short and long time to treatment after emergency admission. Group assignment is based on the median time to treatment.
- Research question: **What is the effect of the time to treatment group assignment on survival?**
- Study design: Cohort study
- Outcome of interest: Time to death or end of follow-up
- Observation time: Admission to end of follow-up
- Predictor of interest: Time to treatment after admission
- Confounders: Severity
The used variables from the data are:
| Variable | Definition | Coding |
| --- | --- | --- |
| group | Group assignment | 1=Long (exposed), 0=Short (unexposed) |
| severity | Severity status | 1=Severe, 0=Non-severe |
| death | Death | 1=Death, 0=Alive |
| time_to_treatment | Time to treatment (days) | Non-negative number |
| fup | Follow-up time (days) | Non-negative number |
## Knowledge from the crystal ball
Because we simulated the data, we know that group assignment has no effect on mortality.
# Analysis strategy
@zhou, @karim, @maringe, @robins and @hernan discuss different analysis strategies for addressing (or not addressing) immortal time bias. In this vignette we summarize the investigated methods from the above mentioned articles:
| Method | Definition |
| --- | --- |
| 1 | Group assignment at time zero |
| 2 | Random time to treatment assignment |
| 3 | Matched time to treatment assignment |
| 4 | Time zero at end of treatment |
| 5 | Time dependent treatment |
| 6 | Cloning |
```{r, echo=T, message=F, warning=F}
# Required packages
library(tidyverse)
library(survey)
library(survival)
library(survminer)
```
```{r, echo=F, message=F, warning=F}
library(gtsummary)
```
```{r, echo=F, message=F, warning=F, results='asis'}
n <- 10000
set.seed(1)
# Time to treatment: Mean 10 days
time_to_treatment <- rweibull(n, 5, 10)
# Two equal groups based on time to treatment
quantile_info <- median(time_to_treatment)
group <- ifelse(time_to_treatment>median(time_to_treatment), 1, 0)
# More severe patients are treated more quickly
severity <- ifelse(runif(n)<plogis(qlogis(0.8)+log(0.2)*(group==1)), 1, 0)
# Hazard of dying: Independent of time to treatment and severity
haz <- 0.05
# Time to death: Exponential with rate=haz
time_to_death <- -log(runif(n))/haz
# Censoring: Mean 20 days
censored <- rweibull(n, 1, 20)
# Follow-up time: Patients do not die until they are treated
fup <- pmin(time_to_treatment+time_to_death, time_to_treatment+censored)
# Death indicator: Patients do not die until they are treated
death <- ifelse(time_to_treatment+time_to_death<=time_to_treatment+censored, 1, 0)
# Data
data <- data.frame(id=1:n, fup, time_to_treatment, censored, time_to_death, death, group, severity)
```
## Descriptive table
The table below shows a descriptive summary of the study population, by group assignment.
```{r, echo=F, message=F, warning=F, results='asis'}
data$group <- factor(data$group, levels=0:1, labels=c("Short", "Long"))
```
```{r, echo=F, message=F, warning=F, results='asis'}
data %>% select(!c(id, censored, time_to_death)) %>%
tbl_summary(by=group, label=list(group ~ "Group assignment"))
```
```{r, echo=F, message=F, warning=F, results='asis'}
data$group <- as.numeric(data$group)-1
```
```{r, echo=F, message=F, warning=F, results='asis'}
data$group_lab <- NULL
```
## Method 1: Group assignment at time zero
Time zero was time of emergency admission. Treatment was assigned according to whether
patients had a short or long time to treatment. No patients were excluded from the analysis.
```{r, echo=T, message=F, warning=F}
# Kaplan-Meier survival curves
mod <- survfit(Surv(fup, death)~factor(group), data=data)
ggsurvplot(mod, data=data, palette=c("#CC0000", "black"), censor=F, pval=T, xlab="Days from emergency admission")
```
## Method 2: Random time to treatment assignment
Treatment was assigned according to whether patients had a short or long time to treatment.
Time zero was set at start of treatment, but time to treatment for the short group was
randomly replaced by a time to treatment from the time to treatment range of the long group.
```{r, echo=T, message=F, warning=F}
data_method2 <- data
# New time to treament for short group
data_method2$time_to_treatment_new <- data_method2$time_to_treatment
data_method2$time_to_treatment_new[data_method2$group==0] <-
runif(sum(data_method2$group==0), median(time_to_treatment), max(time_to_treatment))
# Exclusion of patients:
# If original follow-up time of patients < random treatment time
data_method2$exclude <- 0
# Patients in the short group who died
id_short <- data_method2$group==0 & data_method2$death==1
data_method2$exclude[data_method2$group==0 & data_method2$death==1] <-
ifelse(data_method2$fup[id_short]<data_method2$time_to_treatment_new[id_short], 1, 0)
```
`r sum(data_method2$exclude)` (`r paste0(round(sum(data_method2$exclude)/nrow(data_method2)*100, 1), "%")`) individuals from the short group who died before time zero were excluded.
```{r, echo=T, message=F, warning=F}
data_method2 <- data_method2 %>% filter(exclude==0)
# Time zero: Start of treatment
# Difference between new random treatment time - original treatment time (=zero for long group)
data_method2$fup <- pmin(data_method2$time_to_treatment_new-data_method2$time_to_treatment+data_method2$time_to_death,
data_method2$time_to_treatment_new-data_method2$time_to_treatment+data_method2$censored)
# Replace death indicator
data_method2$death <- ifelse(data_method2$time_to_treatment_new-data_method2$time_to_treatment+data_method2$time_to_death<=
data_method2$time_to_treatment_new-data_method2$time_to_treatment+data_method2$censored, 1, 0)
# Kaplan-Meier survival curves
mod <- survfit(Surv(fup, death)~factor(group), data=data_method2)
ggsurvplot(mod, data=data_method2, palette=c("#CC0000", "black"), censor=F, pval=T, xlab="Days from start of treatment")
```
## Method 3: Matched time to treatment assignment
Treatment was assigned according to whether patients had a short or long time to treatment. Time zero was set at start of treatment, but time to treatment for the short group was randomly replaced by a time to treatment from the time to treatment range of the long group. This methods corrects for time to treatment imbalances between the two groups (in contrast to method 2).
```{r, echo=T, message=F, warning=F}
data_method3 <- data
# Get time to treatment from long group
sample_time <- data_method3$time_to_treatment[data_method3$group==1]
# New time to treatment for short group: Matching from long group
data_method3$time_to_treatment_new <- data_method3$time_to_treatment
data_method3$time_to_treatment_new[data_method3$group==0] <-
sample(sample_time, length(data_method3$time_to_treatment[data_method3$group==0]))
# Exclusion of patients
data_method3$exclude <- 0
data_method3$exclude[data_method3$group==0 & data_method3$death==1] <-
ifelse(data_method3$fup[data_method3$group==0 & data_method3$death==1]<
data_method3$time_to_treatment_new[data_method3$group==0 & data_method3$death==1], 1, 0)
```
`r sum(data_method3$exclude)` (`r paste0(round(sum(data_method3$exclude)/nrow(data_method3)*100, 1), "%")`) individuals from the short group who die before time zero were excluded.
```{r, echo=T, message=F, warning=F}
data_method3 <- data_method3 %>% filter(exclude==0)
# Time zero: Start of treatment
# Difference between new matched treatment time - original treatment time (=zero for long group)
data_method3$fup <- pmin(data_method3$time_to_treatment_new-data_method3$time_to_treatment+data_method3$time_to_death,
data_method3$time_to_treatment_new-data_method3$time_to_treatment+data_method3$censored)
data_method3$fup <- pmin(data_method3$time_to_death, data_method3$censored)
# Replace death indicator
data_method3$death <- ifelse(data_method3$time_to_treatment_new-data_method3$time_to_treatment+data_method3$time_to_death<=
data_method3$time_to_treatment_new-data_method3$time_to_treatment+data_method3$censored, 1, 0)
# Kaplan-Meier survival curves
mod <- survfit(Surv(fup, death)~factor(group), data=data_method3)
ggsurvplot(mod, data=data_method3, palette=c("#CC0000", "black"), censor=F,
pval=T, xlab="Days from start of treatment")
```
## Method 4: Time zero at end of treatment
Treatment was assigned according to whether patients had a short or long time to treatment. End of treatment was defined at the last observed time to treatment (`r round(max(time_to_treatment), 1)` days). Individuals were followed-up from the end of treatment duration (time zero).
```{r, echo=T, message=F, warning=F}
data_method4 <- data
data_method4$exclude <- ifelse(data_method4$fup<max(time_to_treatment), 1, 0)
```
`r sum(data_method4$exclude)` (`r paste0(round(sum(data_method4$exclude)/nrow(data_method4)*100, 1), "%")`) individuals from both groups with a follow-uo time smaller than end of treatment duration were excluded.
```{r, echo=T, message=F, warning=F}
data_method4 <- data_method4 %>% filter(exclude==0)
# Shift time to follow-up
data_method4$fup <- data_method4$fup-max(time_to_treatment)
# Kaplan-Meier survival curves
mod <- survfit(Surv(fup, death)~factor(group), data=data_method4)
ggsurvplot(mod, data=data_method4, palette=c("#CC0000", "black"), censor=F,
pval=T, xlab="Days from end of treatment duration")
```
## Method 5: Time-dependent treatment
Treatment assignment was 0 ("short") as long as a patient had a time to treatment
smaller than the median time to treatment, and 1 ("long") otherwise. No individuals were excluded.
```{r, echo=T, message=F, warning=F}
library(splitstackshape)
# For "smoother" curves: Fup*10
multiplier <- 10
data$fup2 <- data$fup*multiplier
data_discrete_surv <- expandRows(data, count="fup2", drop=F) %>% arrange(id)
# Count indicator: How many follow-up days per individuals
data_discrete_surv <- data_discrete_surv %>% group_by(id) %>% mutate(ind=1, time=cumsum(ind)-1)
data_discrete_surv$ind <- NULL
# Maximal follow-up day
data_discrete_surv <- data_discrete_surv %>% group_by(id) %>% mutate(max_time=max(time))
# Correct death
data_discrete_surv$death[data_discrete_surv$time < data_discrete_surv$max_time] <- 0
# Shift days
data_discrete_surv <- data_discrete_surv %>% group_by(id) %>% mutate(time2=lead(time))
# Fill last day
data_discrete_surv <- data_discrete_surv %>% group_by(id) %>% fill(time2)
data_discrete_surv$time2[data_discrete_surv$time==data_discrete_surv$max_time] <-
data_discrete_surv$time2[data_discrete_surv$time==data_discrete_surv$max_time]+1
# Create time-dependent treatment
data_discrete_surv <- data_discrete_surv %>% group_by(id) %>% mutate(group_timedependent=0)
data_discrete_surv$group_timedependent[data_discrete_surv$time>=
data_discrete_surv$time_to_treatment*multiplier & data_discrete_surv$group==1] <- 1
# Kaplan-Meier survival curves
mod_km <- survfit(Surv(time=time/multiplier, time2=time2/multiplier, event=death)~
group_timedependent, data=data_discrete_surv, cluster = id)
ggsurvplot(mod_km, data=data_discrete_surv, palette=c("#CC0000", "black"),
censor=F, pval=F, xlab="Days from emergency admission")
# Cox model
cox_fit <- coxph(Surv(time=time/multiplier, time2=time2/multiplier, event=death)~
group_timedependent, data=data_discrete_surv, cluster = id)
```
The p-value from a Cox proportional hazard model comparing the two survival curves was p=`r round(summary(cox_fit)$coef[, "Pr(>|z|)"], 3)` (p-value from Schoenfeld test p=`r round(cox.zph(cox_fit)$table[1,3], 3)`).
## Method 6: Cloning
Each patient received the actual observed exposure or treatment and was cloned, receiving the not observed exposure or treatment. This led to two pseudopopulations, each with a size of `r n`` patients. Patients in one pseudopopulation (those having a short time to treatment duration) were censored if the time to treatment is longer than the median time to treatment. Patients in the pseudopopulation with a long time to treatment duration were censored if patients were alive and had a time to treatment shorter than the median time to treatment. Censoring can be seen as a protocol deviation (@maringe). Artificial censoring is addressed by inverse probability weighting (@robins, @hernan). No patients were excluded from the analysis.
```{r, echo=F, message=F, warning=F}
# Short group clone
data_clone0 <- data
# Long group clone
data_clone1 <- data
# Short group clone
# Censor if follow-up time > median time to treatment
data_clone0$censored <- 0
data_clone0$censored[data_clone0$group==1] <- 1
data_clone0$death[data_clone0$group==1] <- 0
data_clone0$fup[data_clone0$group==1] <- median(time_to_treatment)
data_clone0$group <- 0
# Long group clone
# Censor if follow-up time < median time to treatment and alive
data_clone1$censored <- 0
data_clone1$censored[data_clone0$group==0 & data_clone0$death==0] <- 1
data_clone1$group <- 1
# Combine both cloning data sets
data_clone <- bind_rows(data_clone0, data_clone1)
# Construction of inverse probability weights (IPW) for censoring
mod <- glm(censored ~ severity, data=data_clone, family=binomial())
data_clone$ipw <- NA
# Probability of being censored
data_clone$ipw <- predict(mod, data=data_clone, type="response")
# Probability of being censored
data_clone$ipw[data_clone$severity==0] <- 1-predict(mod, data=data_clone, type="response")[data_clone$severity==0]
# Set survey design
svy_design <- svydesign(id=~1, weights=~ipw, data=data_clone)
# IPW adjusted Kaplan-Meier
km_fit <- svykm(Surv(fup, death) ~ group, design=svy_design)
km_df <- data.frame(time=km_fit$`1`$time, surv=km_fit$`1`$surv, strata="group=1")
km_df <- bind_rows(km_df, data.frame(time=km_fit$`0`$time, surv=km_fit$`0`$surv, strata="group=0"))
ggsurvplot_df(km_df, palette=c("#CC0000", "black"), censor=F, pval=F, xlab="Days from emergency admission")
# Cox model
cox_fit <- svycoxph(Surv(fup, death) ~ group, design=svy_design)
```
The p-value from a Cox proportional hazard model comparing the two survival curves was p=`r round(summary(cox_fit)$coef[, "Pr(>|z|)"], 3)` (p-value from Schoenfeld test p=`r round(cox.zph(cox_fit)$table[1,3], 3)`).
# Conclusion
Immortal time bias is common in time to event analysis of observational data. In the present vignette we presented several methods for addressing immortal time bias. Similar to @zhou we conclude that the investigated methods 1 and 2 did not adequately address immortal time bias. In contrast to @zhou we found that method 3 did not adequately address immortal time bias too, which was also highlighted in @karim. Methods 4 to 6 were able to address immortal time bias. We recommend to compare some of the suggested methods in a real world analysis in sensitivity analyses and to be aware of their limitations (for example, exclusion of patients which leads to selection bias).
# Data simulation
```{r, echo=T, message=F, warning=F, results='asis'}
n <- 10000
set.seed(1)
# Time to treatment: Mean 10 days
time_to_treatment <- rweibull(n, 5, 10)
# Two equal groups based on time to treatment
group <- ifelse(time_to_treatment>median(time_to_treatment), 1, 0)
# More severe patients are treated more quickly
severity <- ifelse(runif(n)<plogis(qlogis(0.8)+log(0.2)*(group==1)), 1, 0)
# Hazard of dying: Independent of time to treatment and severity
haz <- 0.05
# Time to death: Exponential with rate=haz
time_to_death <- -log(runif(n))/haz
# Censoring: Mean 20 days
censored <- rweibull(n, 1, 20)
# Follow-up time: Patients do not die until they are treated
fup <- pmin(time_to_treatment+time_to_death, time_to_treatment+censored)
# Death indicator: Patients do not die until they are treated
death <- ifelse(time_to_treatment+time_to_death<=time_to_treatment+censored, 1, 0)
# Data
data <- data.frame(id=1:n, fup, time_to_treatment, censored, time_to_death, death, group, severity)
```