Causal Mediation in Additive Survival Models with the multimediate package

Tellez-Plaza, María

2025-10-08

We provide a short toy example that allows interested readers to reproduce the analysis from Ruiz-Hernandez et al. (2017) using the multimediate package which implements counterfactual multiple mediation analysis in the setting of additive survival.

We first download and install the package:

# Install from CRAN
install.packages("multimediate")

Now, we load the needed packages. We can verify that the package correctly uploaded by running packageVersion("multimediate"). Which should return [1] ‘0.1.4’.

suppressPackageStartupMessages(library(multimediate))
packageVersion("multimediate")
suppressPackageStartupMessages(library(timereg))

The toy dataset provided in this vignette is a subset of the study population used in the manuscript by Ruiz-Hernandez et al. (2017). The original dataset included imputation of missing completely at random urine cadmium values and blood lead values below the detection limit (for a deeper understanding of the imputation, see Ruiz-Hernandez et al. (2017). Here we provided the subset of actually measured (i.e. unimputed) urine cadmium values. Also we extended the followup for mortality data collection to 10 years.

Multiple mediation analysis with correlated=FALSE

In Ruiz-Hernandez et al. (2017), the estimated absolute reduction in cardiovascular deaths in the US comparing 1999-2004 to 1988-94 was 230.7 deaths/100,000 person-years in models adjusted for cardiovascular risk factors. The product of coefficients method estimated that, among those avoided deaths, 52.0 (95% CI 84.4, 96.7) and 19.4 (4.3, 36.4) deaths/100,000 person-years were attributable to changes in lead and cadmium, respectively.

The multimediate() function with correlated=FALSE option will reproduce the Ruiz-Hernandez et al. (2017) results, using instead the counterfactual frame, yielding fairly consistent findings.

# Please provide the corresponding path or place file in working directory.
nhanes <- read.csv("251003_Nhanes_Multimediate.csv")

# Multimediate requires that in the Aalen model, factors are recoded as dummy variables
nhanes$race.1 <- as.factor(ifelse(nhanes$race == 1, 1, 0))
nhanes$race.2 <- as.factor(ifelse(nhanes$race == 2, 1, 0))
nhanes$race.3 <- as.factor(ifelse(nhanes$race == 3, 1, 0))
nhanes$race.4 <- as.factor(ifelse(nhanes$race == 4, 1, 0))
nhanes$riagendr.1 <- as.factor(ifelse(nhanes$riagendr == 1, 1, 0))
nhanes$riagendr.2 <- as.factor(ifelse(nhanes$riagendr == 2, 1, 0))
nhanes$smoking.1 <- as.factor(ifelse(nhanes$smoking == 1, 1, 0))
nhanes$smoking.2 <- as.factor(ifelse(nhanes$smoking == 2, 1, 0))
nhanes$smoking.3 <- as.factor(ifelse(nhanes$smoking == 3, 1, 0))
nhanes$bmi_cat <- as.factor(I(nhanes$bmxbmi >= 30))
nhanes$lbxtc_cat <- as.factor(I(nhanes$lbxtc >= 200))
nhanes$lowhdl <- as.factor(nhanes$lowhdl)
nhanes$cholmed <- as.factor(nhanes$cholmed)
nhanes$hbp <- as.factor(nhanes$hbp)
nhanes$sedent <- as.factor(nhanes$sedent)
nhanes$diab <- as.factor(nhanes$diab)

# In this example the multiple mediators are continuous variables.
# We log-transform to improve normality.

nhanes$log.cr.ucd <- log(nhanes$cr.ucd) # Urine cadmium
nhanes$log.bpb <- log(nhanes$lbxbpb) # Blood lead

# Cautionary note: treatment variables must NOT BE factors
# In this example treatment is time period.

nhanes$wv.gr.1 <- ifelse(nhanes$wv.gr == 1, 1, 0) # Indicator for 1988-1994 survey
nhanes$wv.gr.2 <- ifelse(nhanes$wv.gr == 2, 1, 0) # Indicator for 1999-2004 survey

dim(nhanes)

# We first run the outcome model including indicator for 1999-2004 (treatment)
# the two mediators and confounders:

variables_y <- paste0(
  "Surv(ridageyr,peryr.age.10yr,cvd.10yr==1) ~ const(wv.gr.2) +
       const(log.bpb) + const(log.cr.ucd) + const(race.2) + 
       const(race.3)+const(race.4) + const(diab) + 
       const(bmi_cat) + const(riagendr.2) + const(smoking.2) + 
       const(smoking.3) + const(lbxtc_cat) + const(lowhdl) + 
       const(cholmed) + const(hbp)+const(sedent)"
)

model.y = aalen(
  as.formula(variables_y),
  start.time = 40,
  weights = nhanes$wtshm12yr,
  data = nhanes,
  robust = T
)

We subsequently run the mediator models. Cautionary note: the mediators must be entered in the same order as in the outcome model.

variables1 <- paste0("log.bpb ~ wv.gr.2 + riagendr.2 +  ridageyr +  race.2 + race.3 +
                           race.4 + smoking.2 + smoking.3")
variables2 <- paste0("log.cr.ucd ~ wv.gr.2 + riagendr.2 +  ridageyr + race.2 + race.3 +
                           race.4 + smoking.2 + smoking.3")
      
# NOTE: Here we have not adjusted one given mediator model for other correlated 
# mediators, since `multimediate` internally simulates marginal pair-wise correlations 
# between the multiple mediators and we want to compare results with correlated=TRUE
# option in next section. Generally, if we are to set the correlated=TRUE option it is 
# not recommended to adjust one given mediator for other correlated mediators.

modM1 = lm(variables1, data = nhanes)
modM2 = lm(variables2, data = nhanes)

lmodel.m=list(modM1, modM2)
      
est_1 <- as.data.frame(paste0(round(summary(modM1)$coef[2, 1], 2), ' (', round((
  summary(modM1)$coef[2, 1] - (summary(modM1)$coef[2, 2] * 1.96)
), 2), ', ', round((
  summary(modM1)$coef[2, 1] + (summary(modM1)$coef[2, 2] * 1.96)
), 2), ')'))

# Note that exp(est_1) will estimate the relative change in blood lead levels 
# comparing time periods

est_2 <- as.data.frame(paste0(round(summary(modM2)$coef[2,1],2), ' (', round((summary(modM2)$coef[2,1] -
                    (summary(modM2)$coef[2,2]*1.96)),2), ', ', round((summary(modM2)$coef[2,1] +
                    (summary(modM2)$coef[2,2]*1.96)),2), ')'))

# Note that exp(est_2) will estimate the relative change in urine cadmium levels 
# comparing time periods

Once we have fitted the outcome and the mediator models we can call the multimediate() function to estimate direct, indirect and total effects.

      
multim=multimediate(lmodel.m,correlated=FALSE,model.y,treat="wv.gr.2",treat.value=1,
                        control.value=0,J=1000,conf.level=0.95,data=nhanes)
     

sum <- suppressWarnings(summary(multim, option='avg'))
. Estimation IC.inf IC.sup P.val
ACME.joint.treat -94.3997 -147.0478 -42.2592 0.000
PM(treat) 0.5207 0.1863 1.1778 0.000
ACME.joint.control -94.3997 -147.0478 -42.2592 0.000
PM(control) 0.5207 0.1863 1.1778 0.000
ACME.treat.log.bpb -66.0982 -116.9757 -15.9155 0.006
PM(treat).log.bpb 0.3509 0.0724 0.8268 0.006
ACME.control.log.bpb -66.0982 -116.9757 -15.9155 0.006
PM(control).log.bpb 0.3509 0.0724 0.8268 0.006
ACME.treat.log.cr.ucd -28.3015 -48.3761 -9.3259 0.002
PM(treat).log.cr.ucd 0.1586 0.0388 0.3838 0.002
ACME.control.log.cr.ucd -28.3015 -48.3761 -9.3259 0.002
PM(control).log.cr.ucd 0.1586 0.0388 0.3838 0.002
ADE.treat -111.5327 -241.2674 19.1708 0.098
ADE.control -111.5327 -241.2674 19.1708 0.098
Total Effect -205.9324 -320.6998 -94.3693 0.000

Note that we set option='avg' for obtaining average mediation effects in the absence of exposure-mediator interactions. Also note the output table shows identical mediation effects for treatment and control because we have not included exposure-mediator terms in the outcome model.

We next run and outcome model without the mediator.


variables_y2 <- paste0("Surv(ridageyr,peryr.age.10yr,cvd.10yr==1) ~ const(wv.gr.2) +
              const(race.2)+const(race.3)+const(race.4)+const(diab)+const(bmi_cat)+const(riagendr.2)+
              const(smoking.2)+const(smoking.3)+const(lbxtc_cat)+const(lowhdl)+const(cholmed)+const(hbp)+const(sedent)")
 
model.y2 = aalen(as.formula(variables_y2), start.time=40,
                 weights=nhanes$wtshm12yr, data = nhanes, robust=T)
              
    m_y2 <- as.data.frame(as.matrix(paste0(round(model.y2$gamma[1,1]*100000,2), ' (', round((model.y2$gamma[1,1] - (sqrt(model.y2$robvar.gamma[1,1])*1.96))*100000,2), ', ', round((model.y2$gamma[1,1] + (sqrt(model.y2$robvar.gamma[1,1])*1.96))*100000,2), ')')))

We expect the coefficient for time period will equal the total effect in the multimediate output (sanity check). Happily it does as shown next:

V1
-201.65 (-317.7, -85.61)

Next, we show interesting extensions of multiple mediation that we could not carry out with the difference and product of coefficient methods in the original Ruiz-Hernandez et al. (2017) paper.

Multiple mediation analysis with correlated=TRUE

We will allow for correlation between multiple mediators (in this case blood lead and urine cadmium, which are exposure biomarkers), which may share common sources of exposure and metabolic pathways.

multim=multimediate(lmodel.m,correlated=TRUE,model.y,treat="wv.gr.2",treat.value=1,
                        control.value=0,J=1000,conf.level=0.95,data=nhanes)
     

sum <- suppressWarnings(summary(multim, option='avg'))
. Estimation IC.inf IC.sup P.val
ACME.joint.treat -95.5885 -146.3312 -44.0272 0.000
PM(treat) 0.5137 0.1805 1.1810 0.000
ACME.joint.control -95.5885 -146.3312 -44.0272 0.000
PM(control) 0.5137 0.1805 1.1810 0.000
ACME.treat.log.bpb -67.0958 -116.6899 -15.8575 0.004
PM(treat).log.bpb 0.3580 0.0759 0.8411 0.004
ACME.control.log.bpb -67.0958 -116.6899 -15.8575 0.004
PM(control).log.bpb 0.3580 0.0759 0.8411 0.004
ACME.treat.log.cr.ucd -28.4927 -48.7361 -9.3144 0.002
PM(treat).log.cr.ucd 0.1528 0.0390 0.3529 0.002
ACME.control.log.cr.ucd -28.4927 -48.7361 -9.3144 0.002
PM(control).log.cr.ucd 0.1528 0.0390 0.3529 0.002
ADE.treat -112.5341 -246.3659 17.7682 0.088
ADE.control -112.5341 -246.3659 17.7682 0.088
Total Effect -208.1226 -320.3532 -95.7705 0.000

We observed that, in this case, the estimated mediation effects do not substantially change after allowing for correlation between multiple mediators. Thus, supporting the correlation between mediators is not relevant in the original Ruiz-Hernandez et al. (2017) paper.

Multiple mediation analysis including exposure-mediator interactions

Interestingly, multimediate can also accommodate exposure-mediator interactions because the counterfactual framework enables so.


variables_y <- paste0("Surv(ridageyr,peryr.age.10yr,cvd.10yr==1) ~ const(wv.gr.2) +
              const(log.bpb) + const(log.cr.ucd)+const(wv.gr.2*log.bpb)  +const(race.2)+
              const(race.3)+ const(race.4)+ const(diab)+ const(bmi_cat)+ const(riagendr.2)+
              const(smoking.2)+const(smoking.3)+ const(lbxtc_cat)+const(lowhdl)+const(cholmed)+
                const(hbp)+const(sedent)")


model.y = aalen(as.formula(variables_y),start.time=40,
                weights=nhanes$wtshm12yr, data = nhanes, robust=T)
                      
                      
 
# Warning: at the current moment multimediate only takes one exposure-mediator 
# interaction at a time (work in progress to accomodate more). Note that the first 
# mediator model in the lmodel.m list must be the one with the interaction, and this 
# ordering must be consistent with the order mediators have been included in the outcome
# model.
        
variables1 <- paste0("log.bpb ~  wv.gr.2 + riagendr.2 +  ridageyr +  race.2 + race.3 +
                     race.4 + smoking.2 + smoking.3")
variables2 <- paste0("log.cr.ucd ~ wv.gr.2 + riagendr.2 +  ridageyr + race.2 + race.3 +
                     race.4 + smoking.2 + smoking.3")
      
    
modM1 = lm(variables1, data = nhanes)
modM2 = lm(variables2, data = nhanes)
lmodel.m=list(modM1, modM2)
      
    
multim=multimediate(lmodel.m,correlated=TRUE,model.y,treat="wv.gr.2",treat.value=1,
                             control.value=0,J=1000,conf.level=0.95,data=nhanes)

sum <- suppressWarnings(summary(multim, option='navg'))

Note the option option='navg' for obtaining average mediation effects in the presence of exposure-mediator interactions. Now, we obtain different mediation effects estimates for both NHANES periods.

. Estimation IC.inf IC.sup P.val
ACME.joint.treat -58.3744 -143.3592 24.1236 0.140
PM(treat) 0.3390 -0.0947 1.0828 0.140
ACME.joint.control -119.7350 -170.3306 -67.5683 0.000
PM(control) 0.6118 0.2874 1.2170 0.000
ACME.treat.log.bpb -30.1140 -108.2133 47.5489 0.476
PM(treat).log.bpb 0.1542 -0.2536 0.6806 0.476
ACME.control.log.bpb -91.4746 -141.8467 -40.3954 0.000
PM(control).log.bpb 0.4674 0.1645 1.0498 0.000
ACME.treat.log.cr.ucd -28.2603 -47.8338 -9.0230 0.002
PM(treat).log.cr.ucd 0.1508 0.0399 0.3480 0.002
ACME.control.log.cr.ucd -28.2603 -47.8338 -9.0230 0.002
PM(control).log.cr.ucd 0.1508 0.0399 0.3480 0.002
ADE.treat -97.2185 -217.6141 22.8343 0.126
ADE.control -158.5790 -333.0215 7.8943 0.072
Total Effect -216.9534 -337.8046 -100.3597 0.000

The estimated mediation effects are similar as in the model not including the interaction. Indeed, the interaction p-value in the outcome model is far from being statistically significant (equal to 0.96). Thus, our results do not suppor exposure-mediator interaction for cadmium models.

multimediate can do more!

Note that multimediate can also accommodate continuous and binary outcome models, as well as categorical or continuous exposures/treatments (Jérolon et al. (2020)). We could also think of modeling a non linear single mediator, by treating the mediator’s main and spline terms as correlated mediators. For the moment, multimediate should only be used for continuous mediators, but we have work in progress to accommodate categorical mediators and both.

References

Domingo-Relloso, Arce, Allan Jerolon, Maria Tellez-Plaza, and Jose D Bermudez. 2024. “Causal Mediation for Uncausally Related Mediators in the Context of Survival Analysis,” February. https://doi.org/10.1101/2024.02.16.24302923.
Jérolon, Allan, Laura Baglietto, Etienne Birmelé, Flora Alarcon, and Vittorio Perduca. 2020. “Causal Mediation Analysis in Presence of Multiple Mediators Uncausally Related.” The International Journal of Biostatistics 17 (2): 191–221. https://doi.org/10.1515/ijb-2019-0088.
Ruiz-Hernandez, Adrian, Ana Navas-Acien, Roberto Pastor-Barriuso, Ciprian M Crainiceanu, Josep Redon, Eliseo Guallar, and Maria Tellez-Plaza. 2017. “Declining Exposures to Lead and Cadmium Contribute to Explaining the Reduction of Cardiovascular Mortality in the US Population, 1988–2004.” International Journal of Epidemiology 46 (6): 1903–12. https://doi.org/10.1093/ije/dyx176.