knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(warn = -1)
## Load packages
library(TrialEmulation)
library(dplyr)
library(tidyr)
library(ggplot2)
library(survival)
library(survminer)
library(lubridate)
library(cobalt)
working_dir <- getwd()
knitr::opts_knit$set(root.dir = working_dir)
# Set seed for reproducibility
set.seed(412)
The intention-to-treat (ITT) analysis estimates the effects of initiating one treatment versus another on an outcome of interest regardless of the subsequent adherence to the assigned treatment. In contrast, the per-protocol (PP) analysis estimates the effects of initiating and continuously adhering to the initial treatment, which is discussed in the Per-protocol Analysis tutorial.
Counterfactual framework is the foundation of modern causal inference and it defines causal effects as the contrast in potential outcomes. Let us use the research question from Hernán and Robins (2016) as an example:
What is the effect of hormone therapy on the 5-year risk of breast cancer among postmenopausal women?
Note that this is a causal question rather than an associational question since we are interested in contrasting the outcomes under two hypothetical scenarios
At the individual level, the potential outcomes \(Y(1)\) and \(Y(0)\) under two treatment are the outcomes that would be observed if the subject had taken hormone therapy and had never taken the hormone therapy, respectively. The contrast in the potential outcomes, either on the absolute difference scale or relative scale, is referred to as individual causal effect (ITE)
In comparison, at the population level, the potential mean outcomes \(E[Y(1)]\) and \(E[Y(0)]\) under two treatment are the average of the outcomes that would be observed if all subjects had taken hormone therapy and had never taken the hormone therapy, respectively. The contrast in these potential outcomes is referred to as average causal effect (ATE)
For both ITE and ATE, if the potential outcomes under the two treatment are the same, then there is no causal effect; otherwise, this is a causal effect
For an individual, we can only observe the outcome under one scenario, either under hormone therapy or no hormone therapy, but not both. Although this is an obstacle at the individual level, we can resolve this barrier at the population level:
Consider an ideal randomized control trial (RCT) where patients are perfectly randomized to hormone therapy versus usual care treatment
Under this set up, it is reasonable to say that patients from either treatment group can represent all the patients in the study since there is no difference in patient characteristics between treatment groups, thanks to randomization
Given this, the mean outcome in the hormone therapy group equals to the potential outcome had everyone in the study taken hormone therapy. Similarly, the mean outcome in the usual care group equals to the potential outcome had everyone in the study never taken usual care
Thus, we can observe both potential outcomes at the population level under an RCT design and estimate causal effects. This is a main reason why RCT is the gold standard for comparative effectiveness research (CER)
Many researchers think observational data is messy and a more challenging place to find reliable evidence, so let us summarize the advantages and disadvantages of RCTs and observational studies on a list of key considerations of a study below:
| Randomized Control Trial | Observational Study | |
|---|---|---|
| Time Zero | Clear and straightforward: Randomization | Not straightforward |
| Selection Bias | Some is prevented, e.g., prevalent user bias | Exist if eligibility, treatment initiation, and start of follow up are misaligned |
| Immortal-time bias | No | Yes if eligibility, treatment initiation, and start of follow up are misaligned |
| Confounding Bias | No | Yes |
| Missing Data | Less | More |
| Measurement Error | Less | More |
| Statistical Power | Limited | Rich |
| Patient Information | Limited | Rich |
| Generalizability | Limited | More generalizable |
| Cost | Millions of dollars | Existing data |
| Follow-up Time | Limited: 0.5-5 years | Long-term: 10-20 years |
| Ethical Concerns | More | Much Less |
According to the summary above, I argue that high-quality observational studies can complement findings from RCTs and provide actionable evidence when conducted robustly. Here are the reasons:
To leverage observational data for estimating causal treatment effects, we need to make the following identifibility assumptions:
Various propensity score methods, including matching, weighting, stratification, regression, doubly robust methods Wager and Athey (2018), marginal structural models, are widely used to estimate ATEs and they well explained by Miguel; Robins (2020). We focus on the inverse probability (IP) of treatment weighting approach (Horvitz and Thompson 1952) here and defer other methods to future tutorials:
Recall our previous example that aims to compare the effectiveness of ARB versus ACEI anti-hypertensive medications on reducing the risk of cardiovascular disease (CVD).
We emulated the target trial specified in Table 1 and prepared the dataset in the single-trial emulation session. Thus, we directly load the pre-processed data.
obsdata5 <- readRDS("obsdata5.rds")
get_label <- function(x) {
lbl <- attr(x, "label", exact = TRUE)
if (is.null(lbl)) "" else as.character(lbl)
}
dict <- data.frame(
Variable = names(obsdata5),
Meaning = vapply(obsdata5, get_label, character(1)),
check.names = FALSE
)
knitr::kable(dict, caption = "Data Dictionary", row.names = FALSE)
| Variable | Meaning |
|---|---|
| id | Patient ID |
| time | Time index for longitudinal records (months) |
| X1 | Non-ACEI or ARB antihypertensive medication use over time |
| X2 | Standardized systolic blood pressure over time |
| X3 | Biological sex (M=1, F=0) |
| X4 | Standardized diastolic blood pressure at baseline |
| age | Age over time (years) |
| A | Treatment indicator over time (ARB = 1, ACEI = 0; NA = neither) |
| Y | Event indicator of cardiovascular disease |
| C | Indicator of early dropout / censoring |
| follow_up | |
| assigned_treatment | Treatment indicator over time (ARB = 1, ACEI = 0; NA = neither) |
| X1_0 | Baseline value of X1 |
| X2_0 | Baseline value of X2 |
| X3_0 | Biological sex (M=1, F=0) |
| X4_0 | Standardized diastolic blood pressure at baseline |
| age_0 | Baseline value of age |
We now convert the long format data (multiple rows per subject) to wide format (one row per subject), retaining baseline covariates, treatment variable, follow-up time, and outcome status
# Find event/censoring time (min time of the two) for subjects who experience it
# Find follow-up time (max time) for subjects who never experience event/censoring
obsdata6 <- obsdata5 %>%
group_by(id) %>%
summarise(
event_or_cens = any(Y == 1 | C == 1, na.rm = TRUE),
follow_up = ifelse(event_or_cens,
min(follow_up[Y == 1 | C == 1], na.rm = TRUE),
max(follow_up, na.rm = TRUE)),
outcome = ifelse(any(Y == 1 & C == 0, na.rm = TRUE), 1, 0),
.groups = "drop") %>%
left_join(obsdata5, by = c("id", "follow_up")) %>%
select(id, assigned_treatment, X1_0, X2_0, X3, X4, age_0, follow_up, outcome) %>%
arrange(id)
# Check if all subjects are included
setequal(unique(obsdata5$id), unique(obsdata6$id))
#> [1] TRUE
In a time-to-event outcome analysis, the censoring mechanism for individuals who dropped out early plays an important role in deciding the appropriate statistical analysis. We consider the two scenarios below:
Below, we demonstrate the analysis procedures step-by-step for each scenario.
# Fit a logistic regression
ps_model <- glm(assigned_treatment ~ X1_0 + X2_0 + X3 + X4 + age_0,
data = obsdata6,
family = binomial)
# Calculate propensity scores
p <- predict(ps_model, type = "response")
# Calculate inverse probability of treatment weights (IPTW)
obsdata6 <- obsdata6 %>%
mutate(
p = p,
trt_weight = if_else(
assigned_treatment == 1,
1/p, # 1/P(treatment) for treated subjects
1/(1-p) # 1/(1-P(treatment)) for control subjects
))
# Visualize overlap
ps_plot <- obsdata6 %>%
mutate(treatment_group = factor(assigned_treatment,
levels = c(0, 1),
labels = c("Control", "Treated"))) %>%
ggplot(aes(x = p, fill = treatment_group, group = treatment_group)) +
theme_classic() +
geom_density(data = . %>% filter(treatment_group == "Treated"),
aes(y = after_stat(density)), alpha = 0.6) +
geom_density(data = . %>% filter(treatment_group == "Control"),
aes(y = -after_stat(density)), alpha = 0.6) +
xlab("Propensity Score") +
ylab("Density") +
labs(fill = "Treatment Group",
title = "Distribution of Propensity Scores") +
theme(plot.title = element_text(size = 10, face = "bold"))
print(ps_plot)
# Save untruncated weights for later use
obsdata6 <- obsdata6 %>%
mutate(trt_weight_untrunc = trt_weight)
trt_weight_cutoff <- quantile(obsdata6$trt_weight, 0.99)
obsdata6 <- obsdata6 %>%
mutate(trt_weight = ifelse(trt_weight > trt_weight_cutoff,
trt_weight_cutoff, trt_weight))
saveRDS(obsdata6, "obsdata6.rds")
bal_s2 <- bal.tab(assigned_treatment ~ X1_0 + X2_0 + X3 + X4 + age_0,
data = obsdata6, estimand = "ATE", un = TRUE, # Include unadjusted balance
weights = obsdata6$trt_weight,
disp.means = TRUE, disp.sds = TRUE)
love.plot(bal_s2, var.order = "unadjusted", abs = TRUE, thresholds = c(m = 0.1),
stats = "mean.diffs", sample.names = c("Unweighted", "Weighted"),
title = "Covariate Balance Before and After Weighting",
stars = "raw",
var.names = c(X1_0 = "anti-HTN meds_0",
X2_0 = "SBP_0",
X3 = "Sex",
X4 = "DBP",
age_0 = "Age_0")) +
theme(plot.title = element_text(size = 10), plot.caption = element_text(hjust = 0, size = 8)) +
labs(caption = "Abbreviations: anti-HTW meds_0 = antihypertensive medication use at baseline;\nSBP_0 = Standardized systolic blood pressure at baseline;\nDBP = Standardized diastolic blood pressure at baseline")
# Fit weighted CPH model
weighted_cox_model <- coxph(Surv(follow_up, outcome) ~ assigned_treatment,
data = obsdata6, weights = trt_weight)
summary(weighted_cox_model)
#> Call:
#> coxph(formula = Surv(follow_up, outcome) ~ assigned_treatment,
#> data = obsdata6, weights = trt_weight)
#>
#> n= 245, number of events= 29
#>
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> assigned_treatment -1.1732 0.3094 0.3194 0.4369 -2.685 0.00725 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> assigned_treatment 0.3094 3.232 0.1314 0.7284
#>
#> Concordance= 0.615 (se = 0.052 )
#> Likelihood ratio test= 15.02 on 1 df, p=1e-04
#> Wald test = 7.21 on 1 df, p=0.007
#> Score (logrank) test = 14.92 on 1 df, p=1e-04, Robust = 6.18 p=0.01
#>
#> (Note: the likelihood ratio and score tests assume independence of
#> observations within a cluster, the Wald and robust score tests do not).
##### Interpretation:
# Hazard ratio (HR) = exp(coef)
# coef < 0 & HR < 1: treatment is associated with lower risk.
# coef > 0 & HR > 1: treatment is associated with higher risk.
# P-value <= alpha = 0.05 & 95% confidence interval excluding 1 suggest a statistically significant effect (not in this case).
The hazard ratio (HR) is the estimand.
# Schoenfeld Residuals plot: A flat line indicates that the proportional hazards assumption holds.
plot(cox.zph(weighted_cox_model))
# Log-log plot: Parallel lines indicate proportional hazards assumption is met.
obsdata6.nonzero <- obsdata6 %>% filter(follow_up > 0)
surv_fit <- survfit(Surv(follow_up, outcome) ~ assigned_treatment, data = obsdata6.nonzero)
ggsurvplot(surv_fit, data = obsdata6, fun = "cloglog",
title = "Log-log Plot", xlab = "log(Time)",
legend.title = "Treatment Group",
legend.labs = c("Control", "Treatment"),
ggtheme = theme(legend.text = element_text(size = 9),
legend.title = element_text(size = 10)))
We directly work with the long format data in this scenario since we need to explicitly model the censoring mechanism.
# Fit logistic regression to predict censoring based on treatment and covariates
# Skip baseline observations since baseline censoring weight is always 1
# Note C=1: censored, but we want to know the probability of not being censored (C_neg=0)
obsdata5 <- obsdata5 %>% mutate(C_neg = 1-C)
censor_model <- glm(C_neg ~ A + X1_0 + X2_0 + X3 + X4 + age_0,
data <- obsdata5 %>% filter(follow_up > 0), family = binomial)
obsdata5 <- obsdata5 %>%
mutate(
censor_probs = predict(censor_model, newdata = ., type = "response"), # Calculate censoring probabilities
censor_weight = ifelse(C_neg == 1, # Calculate censoring weights (unstablized)
1 / censor_probs, # Not censored
1 / (1 - censor_probs)) # Censored
)
obsdata5 <- obsdata5 %>%
arrange(id, follow_up) %>%
group_by(id) %>%
mutate(cumulative_censor_weight = cumprod(censor_weight)) %>%
ungroup()
# Calculate summary statistics for cumulative censoring weights by treatment group
censor_weight_summary <- obsdata5 %>%
mutate(treatment_group = factor(assigned_treatment,
levels = c(0, 1),
labels = c("Control", "Treated"))) %>%
group_by(treatment_group) %>%
summarise(
min = min(cumulative_censor_weight, na.rm = TRUE),
max = max(cumulative_censor_weight, na.rm = TRUE),
mean = mean(cumulative_censor_weight, na.rm = TRUE),
sd = sd(cumulative_censor_weight, na.rm = TRUE),
.groups = 'drop')%>%
mutate(across(c(min, max, mean, sd), ~ round(.x, 2)))
print(censor_weight_summary)
#> # A tibble: 2 × 5
#> treatment_group min max mean sd
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Control 1 1 1 0
#> 2 Treated 1 1 1 0
# Combined weight
obsdata5 <- obsdata5 %>%
left_join(obsdata6 %>% select(id, trt_weight_untrunc), by = "id") %>%
mutate(combi_weight = trt_weight_untrunc * cumulative_censor_weight)
comb_weight_cutoff <- quantile(obsdata5$combi_weight, 0.99, na.rm=T)
obsdata5 <- obsdata5 %>%
mutate(combi_weight = ifelse(combi_weight > comb_weight_cutoff, comb_weight_cutoff, combi_weight))
# Fit weighted pooled logistic regression (start from baseline time)
# Do not need to add X1_0 + X2_0 + X3 + X4 + age_0 as predictors for the unstablized weights case
weighted_iptw_ipcw_model <- glm(Y ~ assigned_treatment + follow_up + I(follow_up^2),
data = obsdata5 %>% filter(C != 1), # Exclude censored observations (C=1) because their outcomes are unknown
weights = combi_weight,
family = binomial)
summary(weighted_iptw_ipcw_model)
#>
#> Call:
#> glm(formula = Y ~ assigned_treatment + follow_up + I(follow_up^2),
#> family = binomial, data = obsdata5 %>% filter(C != 1), weights = combi_weight)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -5.102e+00 5.478e-01 -9.313 <2e-16 ***
#> assigned_treatment -1.077e+00 5.194e-01 -2.073 0.0382 *
#> follow_up 1.721e-02 8.089e-03 2.127 0.0334 *
#> I(follow_up^2) -3.791e-05 2.676e-05 -1.417 0.1566
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 189.84 on 860 degrees of freedom
#> Residual deviance: 178.86 on 857 degrees of freedom
#> (1040 observations deleted due to missingness)
#> AIC: 178.81
#>
#> Number of Fisher Scoring iterations: 8
gen_cf_data <- function(data, treatment_value) {
data_cf <- data %>% mutate(assigned_treatment = treatment_value)
return(data_cf)
}
data_all_treated <- gen_cf_data(obsdata5 %>% filter(C != 1), 1)
data_all_untreated <- gen_cf_data(obsdata5 %>% filter(C != 1), 0)
# Calculate conditional cumulative incidences (combined weights)
calc_condit_cum_inc <- function(outcome_model, data) {
pred_data <- data %>% select(id, follow_up, assigned_treatment)
# Predict probabilities
pred_data$prob_outcome <- predict(outcome_model, newdata = pred_data, type = "response")
# Calculate cumulative incidence for each subject
cum_inc <- pred_data %>%
arrange(id, follow_up) %>%
group_by(id) %>%
mutate(
survival_prob = cumprod(1 - prob_outcome), # P(alive up to current time)
survival_prob_lag = lag(survival_prob, default = 1), # P(alive up to last time)
death_prob = survival_prob_lag * prob_outcome, # P(dead at current time)
cumulative_incidence = cumsum(death_prob)
) %>% ungroup()
return(cum_inc)
}
condit_cum_inc_treated <- calc_condit_cum_inc(weighted_iptw_ipcw_model, data_all_treated)
condit_cum_inc_untreated <- calc_condit_cum_inc(weighted_iptw_ipcw_model, data_all_untreated)
marginal_res <- bind_rows(
condit_cum_inc_treated %>%
group_by(follow_up) %>%
summarise(mean_cum_inc = mean(cumulative_incidence, na.rm = TRUE), .groups = 'drop') %>%
mutate(treatment = "Treated"),
condit_cum_inc_untreated %>%
group_by(follow_up) %>%
summarise(mean_cum_inc = mean(cumulative_incidence, na.rm = TRUE), .groups = 'drop') %>%
mutate(treatment = "Untreated")
)
# Calculate incidence differences and ratios
inc_comp <- marginal_res %>%
select(follow_up, treatment, mean_cum_inc) %>%
pivot_wider(names_from = treatment, values_from = mean_cum_inc) %>%
mutate(incidence_diff = Treated - Untreated, incidence_ratio = Treated / Untreated)
# Marginal Cumulative Incidences at different Time Points
key_times <- c(182, 365) # 6 months and 1 year
#find the closest time points in the data stratified by treatment
results <- marginal_res %>%
tidyr::crossing(assess_time = key_times) %>%
filter(follow_up >= assess_time) %>%
group_by(treatment, assess_time) %>%
slice_min(follow_up, n = 1, with_ties = FALSE) %>%
ungroup() %>%
select(treatment, assess_time, mean_cum_inc)%>%
arrange(treatment, assess_time)%>%
pivot_wider(names_from = treatment, values_from = mean_cum_inc)
results
#> # A tibble: 2 × 3
#> assess_time Treated Untreated
#> <dbl> <dbl> <dbl>
#> 1 182 0.0512 0.142
#> 2 365 0.116 0.300
# Risk Differences and Ratios at different Time Points
results1 <- inc_comp %>%
crossing(assess_time = key_times) %>%
filter(follow_up >= assess_time) %>%
group_by(assess_time) %>%
slice_min(follow_up, n = 1, with_ties = FALSE) %>%select(assess_time,Treated, Untreated, incidence_diff, incidence_ratio )%>%
ungroup()
results1
#> # A tibble: 2 × 5
#> assess_time Treated Untreated incidence_diff incidence_ratio
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 182 0.0512 0.142 -0.0906 0.361
#> 2 365 0.116 0.300 -0.184 0.385
This work was supported by Utah Clinical & Translational Science Institute (CTSI) Translational Innovation Pilot (TIP) Program Award (NCATS UM1TR004409).