We have explained how to implement TTE through manual coding for an active-comparator study design. In this tutorial, we focus on a placebo-controlled design, where individuals in the control group may meet the eligibility criteria multiple times (Hernán and Robins 2016). This makes it unclear how to define their time zero.
We simulated a dataset as an example to compare the effectiveness of ARB versus no antihypertensive medication in reducing the risk of cardiovascular disease (CVD) among subjects with hypertension, no history of chronic disease, and no use of ARB medications during the previous 2 years. Table 1 shows the protocol of the target trial that we wish to run, side by side with our plan for emulating it with observational data.
Recall the three key steps before implementing TTE:
We import the data we simulated.
obsdata1 <- readRDS("obsdata1trt.rds")
get_label <- function(x) {
  lbl <- attr(x, "label", exact = TRUE)
  if (is.null(lbl)) "" else as.character(lbl)
}
dict <- data.frame(
  Variable = names(obsdata1),
  Meaning = vapply(obsdata1, get_label, character(1)),
  check.names = FALSE
)
knitr::kable(dict, caption = "Data Dictionary", row.names = FALSE)

| Variable | Meaning |
|---|---|
| id | Patient ID |
| time | Time index for longitudinal records (months) |
| X1 | Non-ACEI or ARB antihypertensive medication use over time |
| X2 | Standardized systolic blood pressure over time |
| X3 | Biological sex (M=1, F=0) |
| X4 | Standardized diastolic blood pressure at baseline |
| age | Age over time (years) |
| A | Treatment indicator over time (ARB = 1, control = 0) |
| Y | Event indicator of cardiovascular disease |
| C | Indicator of early dropout / censoring |
Suppose treatment and covariate information is updated monthly in
our observational data, so we treat each month as a separate
enrollment period. For instance, the first enrollment period is
Jan. 2017, the second is Feb. 2017, and so on.
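To make the idea of monthly enrollment periods concrete, here is a minimal base R sketch. The `toy` data frame and its values are hypothetical and are not taken from `obsdata1`; only the column roles (`id`, `month`, `A`, `eligible`) follow the tutorial. It lists every person-month in which someone is eligible, since each of these is a candidate time zero.

```r
# Toy person-month records (hypothetical values, not taken from obsdata1).
toy <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2),
  month    = c(0, 1, 2, 0, 1, 2),
  A        = c(0, 0, 0, 0, 1, 1),  # ARB use in that month
  eligible = c(1, 1, 1, 1, 1, 0)   # meets the eligibility criteria in that month
)

# Every eligible person-month is a candidate time zero, i.e. an enrollment
# into the emulated trial that starts in that month.
enrollments <- toy[toy$eligible == 1, c("id", "month", "A")]
names(enrollments) <- c("id", "trial_period", "assigned_treatment")
enrollments

# Number of emulated trials each person enters.
table(enrollments$id)
```

In this toy data, person 1 never starts treatment and stays eligible, so they enter all three trials as a control. Person 2 enters the month 0 trial as a control and the month 1 trial as treated, and is no longer eligible in month 2 because of prior ARB use.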
In contrast to emulating a single trial, we need to construct a pooled dataset by stacking the separate data from each trial. Here are a few things to keep in mind during this process:
Before using the TrialEmulation R package, it is crucial to understand what the data_preparation function does. We explain the steps using a toy example where we only consider three enrollment times: the beginning of the overall study, month 1, and month 2.
library(dplyr)      # data manipulation verbs and %>%
library(slider)     # slide_dbl() for the rolling look-back window
library(lubridate)  # interval() and months() for the month index

# Eligible: age 50 or over, no history of CVD, and no ACEI or ARB treatment in the last 24 months.
obsdata2 <- obsdata1 %>%
  group_by(id) %>%
  mutate(eligible = as.integer(
    age >= 50 &
      cumsum(Y) == 0 &
      slide_dbl(A, sum,
                .before = 24,     # look back 24 rows
                .after = -1,      # exclude the current row
                .complete = TRUE  # NA unless all 24 prior rows exist
      ) == 0
  )) %>%
  mutate(eligible = ifelse(is.na(eligible), 0, eligible)) %>% # if unknown, then not eligible
  ungroup()

# Find the first date on which anyone becomes eligible; enrollment can start from this date.
start.date <- obsdata2 %>%
  filter(eligible == 1) %>%
  select(time) %>%
  unique() %>%
  summarise(start.date = min(time))

# For convenience, convert time to months elapsed since this date.
obsdata2 <- obsdata2 %>%
  mutate(month = interval(start.date$start.date, time) %/% months(1)) # month zero is the original study baseline for the 1st trial
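The look-back rule in the eligibility flag can be checked on a small vector. The sketch below is a base R stand-in for the rolling sum, written under our reading of the window arguments used above (sum the rows strictly before the current one, and return NA when the window is incomplete). The helper `lookback_sum()` and the vector `A` are hypothetical, and the window is shortened to 2 rows for brevity.

```r
# Sum of the k values strictly before position i; NA when fewer than k
# earlier values exist (an incomplete look-back window).
lookback_sum <- function(x, k) {
  vapply(seq_along(x), function(i) {
    if (i <= k) NA_real_ else sum(x[(i - k):(i - 1)])
  }, numeric(1))
}

A <- c(0, 1, 0, 0, 0, 0)             # hypothetical monthly treatment indicator
prior_use <- lookback_sum(A, k = 2)  # 2-row window instead of 24, for brevity
prior_use
#> [1] NA NA  1  1  0  0

# Unknown history is treated as not eligible, as in the tutorial code.
no_prior_use <- !is.na(prior_use) & prior_use == 0
as.integer(no_prior_use)
#> [1] 0 0 0 0 1 1
```

The first `k` records of each person can never satisfy the rule because their treatment history is incomplete, which is why the first enrollment date has to be located from the data rather than assumed to be the first record.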
# Select eligible individuals
iligible1 <- obsdata2 %>%
  filter(eligible == 1 & month == 0) %>% # time zero is the original study baseline for the 1st trial
  select(id, A, X1, X2, X3, X4, age) %>%
  rename(assigned_treatment = A, X1_0 = X1, X2_0 = X2, X3_0 = X3, X4_0 = X4, age_0 = age) # baseline covariates in the 1st trial are the same as the baseline covariates of the study
trial.1 <- obsdata2 %>%
  filter(id %in% iligible1$id & month >= 0) %>% # month zero is the original study baseline for the 1st trial
  mutate(trial = 0, # create an emulated trial indicator
         follow_up = month) %>% # no adjustment of follow-up time is needed since the 1st trial shares its baseline with the entire study
  left_join(iligible1)
#> Joining with `by = join_by(id)`

# Select eligible individuals
iligible2 <- obsdata2 %>%
  filter(eligible == 1 & month == 1) %>% # time zero is month 1 for the 2nd trial
  select(id, A, X1, X2, X3, X4, age) %>%
  rename(assigned_treatment = A, X1_0 = X1, X2_0 = X2, X3_0 = X3, X4_0 = X4, age_0 = age) # baseline covariates in the 2nd trial are the covariates at month 1
trial.2 <- obsdata2 %>%
  filter(id %in% iligible2$id & month >= 1) %>% # time zero is month 1 for the 2nd trial
  mutate(trial = 1, # create an emulated trial indicator
         follow_up = month - 1) %>% # adjust the follow-up time by decreasing by 1
  left_join(iligible2)
#> Joining with `by = join_by(id)`

# Select eligible individuals
iligible3 <- obsdata2 %>%
  filter(eligible == 1 & month == 2) %>% # time zero is month 2 for the 3rd trial
  select(id, A, X1, X2, X3, X4, age) %>%
  rename(assigned_treatment = A, X1_0 = X1, X2_0 = X2, X3_0 = X3, X4_0 = X4, age_0 = age) # baseline covariates in the 3rd trial are the covariates at month 2
trial.3 <- obsdata2 %>%
  filter(id %in% iligible3$id & month >= 2) %>% # time zero is month 2 for the 3rd trial
  mutate(trial = 2, # create an emulated trial indicator
         follow_up = month - 2) %>% # adjust the follow-up time by decreasing by 2
  left_join(iligible3)
#> Joining with `by = join_by(id)`

# Stack the three emulated trials and rename columns to match the data_preparation output
obsdata2.all.trials <- data.frame(rbind(trial.1, trial.2, trial.3)) %>%
  rename(trial_period = trial,
         followup_time = follow_up,
         treatment = A,
         outcome = Y)
head(obsdata2.all.trials, n = 10)
#> id time X1 X2 X3 X4 age treatment outcome C
#> 1 46 2000-01-01 0 0.86415249 1 -1.377567 68.93422 0 0 0
#> 2 46 2000-02-01 0 1.01366731 1 -1.377567 69.01756 0 0 0
#> 3 46 2000-03-01 1 2.68932942 1 -1.377567 69.10089 0 0 0
#> 4 46 2000-04-01 0 1.26360835 1 -1.377567 69.18422 0 0 0
#> 5 46 2000-05-01 0 1.55297235 1 -1.377567 69.26756 0 0 0
#> 6 46 2000-06-01 1 -2.70904796 1 -1.377567 69.35089 0 0 0
#> 7 46 2000-07-01 1 -2.04404444 1 -1.377567 69.43422 0 0 0
#> 8 46 2000-08-01 1 -2.53416818 1 -1.377567 69.51756 0 0 0
#> 9 46 2000-09-01 1 -2.48962305 1 -1.377567 69.60089 0 0 0
#> 10 46 2000-10-01 1 -0.09444011 1 -1.377567 69.68422 0 0 0
#> eligible month trial_period followup_time assigned_treatment X1_0 X2_0
#> 1 1 0 0 0 0 0 0.8641525
#> 2 1 1 0 1 0 0 0.8641525
#> 3 1 2 0 2 0 0 0.8641525
#> 4 1 3 0 3 0 0 0.8641525
#> 5 1 4 0 4 0 0 0.8641525
#> 6 1 5 0 5 0 0 0.8641525
#> 7 1 6 0 6 0 0 0.8641525
#> 8 1 7 0 7 0 0 0.8641525
#> 9 1 8 0 8 0 0 0.8641525
#> 10 1 9 0 9 0 0 0.8641525
#> X3_0 X4_0 age_0
#> 1 1 -1.377567 68.93422
#> 2 1 -1.377567 68.93422
#> 3 1 -1.377567 68.93422
#> 4 1 -1.377567 68.93422
#> 5 1 -1.377567 68.93422
#> 6 1 -1.377567 68.93422
#> 7 1 -1.377567 68.93422
#> 8 1 -1.377567 68.93422
#> 9 1 -1.377567 68.93422
#> 10 1 -1.377567 68.93422

We now use the data_preparation function to prepare the data for emulating a sequence of trials, focusing on the primary intention-to-treat estimand.
library(TrialEmulation)  # provides data_preparation()

prep_ITT_data <- data_preparation(
  data = obsdata2,
  id = "id",
  period = "month",
  treatment = "A",
  outcome = "Y",
  eligible = "eligible", # indicator of eligibility for the target trial at that visit/period
  estimand_type = "ITT",
  outcome_cov = ~ X1 + X2 + X3 + X4 + age,
  model_var = "assigned_treatment",
  use_censor_weights = FALSE,
  first_period = 0,
  last_period = 2,
  quiet = TRUE,
  control = list(maxit = 100))
dt <- data.frame(prep_ITT_data$data)
dt <- dt %>%
  rename(X1_0 = X1, X2_0 = X2, X3_0 = X3, X4_0 = X4, age_0 = age) %>%
  arrange(trial_period, id, followup_time)

Let us compare the data set we prepared by hand with the one produced by the data_preparation function.
table(dt$trial_period==obsdata2.all.trials$trial_period)
#>
#> TRUE
#> 1026
table(dt$id==obsdata2.all.trials$id)
#>
#> TRUE
#> 1026
table(dt$followup_time==obsdata2.all.trials$followup_time)
#>
#> TRUE
#> 1026
table(dt$treatment==obsdata2.all.trials$treatment)
#>
#> TRUE
#> 1026
table(dt$outcome==obsdata2.all.trials$outcome)
#>
#> TRUE
#> 1006
table(dt$age_0==obsdata2.all.trials$age_0)
#>
#> TRUE
#> 1026
table(dt$X1_0==obsdata2.all.trials$X1_0)
#>
#> TRUE
#> 1026
table(dt$X2_0==obsdata2.all.trials$X2_0)
#>
#> TRUE
#> 1026
table(dt$X3_0==obsdata2.all.trials$X3_0)
#>
#> TRUE
#> 1026
table(dt$X4_0==obsdata2.all.trials$X4_0)
#>
#> TRUE
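#> 1026

None of the comparisons returns FALSE, so the two data sets agree on every variable checked, though finer checks can be made. Note that the outcome comparison counts 1006 rows rather than 1026: table() drops NA by default, so 20 of those comparisons evaluated to NA, meaning the outcome is missing in at least one of the two data sets for those rows. Both data sets are now ready for downstream analyses.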
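As one possible finer check, the sketch below counts agreeing, disagreeing, and missing comparisons separately. The helper `compare_cols()` and the toy vectors `a` and `b` are hypothetical and only stand in for a pair of columns from the two data sets.

```r
# NA-aware comparison of two columns: counts agreeing values, disagreeing
# values, and positions where at least one side is missing.
compare_cols <- function(x, y) {
  c(equal   = sum(x == y, na.rm = TRUE),
    differ  = sum(x != y, na.rm = TRUE),
    missing = sum(is.na(x) | is.na(y)))
}

# Hypothetical toy vectors standing in for one column from each data set.
a <- c(0, 1, NA, 0, 1)
b <- c(0, 1, 0, 0, 1)

table(a == b)                   # the NA comparison is not counted
table(a == b, useNA = "ifany")  # the NA comparison gets its own cell
compare_cols(a, b)
```

With the tutorial objects, a call such as `compare_cols(dt$outcome, obsdata2.all.trials$outcome)` would be the analogous check, assuming both data sets have the same number of rows in the same order.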
The research reported in this publication was supported in part by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UM1 TR 004409. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.