Causal Mediation in Additive Survival Models with the multimediate package

Tellez-Plaza, María

2025-08-08

We provide a short toy example that allows interested readers to reproduce the analysis from Ruiz-Hernandez et al. (2017) using the multimediate package which implements counterfactual multiple mediation analysis in the setting of additive survival.

We first download and install the package:

# Install from CRAN
install.packages("multimediate")

Now, we load the needed packages. We can verify that the package correctly uploaded by running packageVersion("multimediate"). Which should return [1] ‘0.1.4’.

suppressPackageStartupMessages(library(multimediate))
packageVersion("multimediate")
suppressPackageStartupMessages(library(timereg))

The toy dataset provided in this vignette is a subset of the study population used in the manuscript by Ruiz-Hernandez et al. (2017). The original dataset included imputation of missing completely at random urine cadmium values and blood lead values below the detection limit (for a deeper understanding of the imputation, see Ruiz-Hernandez et al. (2017)). Here we provided the subset of actually measured (i.e. unimputed) urine cadmium values.

Multiple mediation analysis with correlated=FALSE

In Ruiz-Hernandez et al. (2017), the estimated absolute reduction in cardiovascular deaths in the US comparing 1999-2004 to 1988-94 was 230.7 deaths/100,000 person-years in models adjusted for cardiovascular risk factors. The product of coefficients method estimated that, among those avoided deaths, 52.0 (95% CI 84.4, 96.7) and 19.4 (4.3, 36.4) deaths/100,000 person-years were attributable to changes in lead and cadmium, respectively.

The multimediate() function with correlated=FALSE option will reproduce the Ruiz-Hernandez et al. (2017) results, using instead the counterfactual frame, yielding fairly consistent findings.

# Please provide the corresponding path or place file in working directory.
nhanes <- read.csv("250825_Nhanes_Multimediate.csv")

# Multimediate requires that in the Aalen model, factors are recoded as dummy variables
nhanes$race.1 <- as.factor(ifelse(nhanes$race == 1, 1, 0))
nhanes$race.2 <- as.factor(ifelse(nhanes$race == 2, 1, 0))
nhanes$race.3 <- as.factor(ifelse(nhanes$race == 3, 1, 0))
nhanes$race.4 <- as.factor(ifelse(nhanes$race == 4, 1, 0))
nhanes$riagendr.1 <- as.factor(ifelse(nhanes$riagendr == 1, 1, 0))
nhanes$riagendr.2 <- as.factor(ifelse(nhanes$riagendr == 2, 1, 0))
nhanes$smoking.1 <- as.factor(ifelse(nhanes$smoking == 1, 1, 0))
nhanes$smoking.2 <- as.factor(ifelse(nhanes$smoking == 2, 1, 0))
nhanes$smoking.3 <- as.factor(ifelse(nhanes$smoking == 3, 1, 0))
nhanes$bmi_cat <- as.factor(I(nhanes$bmxbmi >= 30))
nhanes$lbxtc_cat <- as.factor(I(nhanes$lbxtc >= 200))
nhanes$lowhdl <- as.factor(nhanes$lowhdl)
nhanes$cholmed <- as.factor(nhanes$cholmed)
nhanes$hbp <- as.factor(nhanes$hbp)
nhanes$sedent <- as.factor(nhanes$sedent)
nhanes$diab <- as.factor(nhanes$diab)

# In this example the multiple mediators are continuous variables.
# We log-transform to improve normality.

nhanes$log.cr.ucd <- log(nhanes$cr.ucd) # Urine cadmium
nhanes$log.bpb <- log(nhanes$lbxbpb) # Blood lead

# Cautionary note: treatment variables must NOT BE factors
# In this example treatment is time period.

nhanes$wv.gr.1 <- ifelse(nhanes$wv.gr == 1, 1, 0) # Indicator for 1988-1994 survey
nhanes$wv.gr.2 <- ifelse(nhanes$wv.gr == 2, 1, 0) # Indicator for 1999-2004 survey

#> dim(nhanes)
#[1] 15421   788


# We first run the outcome model including indicator for 1999-2004 (treatment)
# the two mediators and confounders:

variables_y <- paste0(
  "Surv(ridageyr,peryr.age.8yr,cvd.8yr==1) ~ const(wv.gr.2) +
       const(log.bpb) + const(log.cr.ucd) + const(race.2) + 
       const(race.3)+const(race.4) + const(diab) + 
       const(bmi_cat) + const(riagendr.2) + const(smoking.2) + 
       const(smoking.3) + const(lbxtc_cat) + const(lowhdl) + 
       const(cholmed) + const(hbp)+const(sedent)"
)

model.y = aalen(
  as.formula(variables_y),
  start.time = 40,
  weights = nhanes$wtshm12yr,
  data = nhanes,
  robust = T
)

We subsequently run the mediator models. Cautionary note: the mediators must be entered in the same order as in the outcome model.

variables1 <- paste0("log.bpb ~ wv.gr.2 + riagendr.2 +  ridageyr +  race.2 + race.3 +
                           race.4 + smoking.2 + smoking.3")
variables2 <- paste0("log.cr.ucd ~ wv.gr.2 + riagendr.2 +  ridageyr + race.2 + race.3 +
                           race.4 + smoking.2 + smoking.3")
      
# NOTE: Here we have not adjusted one given mediator model for other correlated 
# mediators, since `multimediate` internally simulates marginal pair-wise correlations 
# between the multiple mediators and we want to compare results with correlated=TRUE
# option in next section. Generally, if we are to set the correlated=TRUE option it is 
# not recommended to adjust one given mediator for other correlated mediators.

modM1 = lm(variables1, data = nhanes)
modM2 = lm(variables2, data = nhanes)

lmodel.m=list(modM1, modM2)
      
est_1 <- as.data.frame(paste0(round(summary(modM1)$coef[2, 1], 2), ' (', round((
  summary(modM1)$coef[2, 1] - (summary(modM1)$coef[2, 2] * 1.96)
), 2), ', ', round((
  summary(modM1)$coef[2, 1] + (summary(modM1)$coef[2, 2] * 1.96)
), 2), ')'))

# Note that exp(est_1) will estimate the relative change in blood lead levels 
# comparing time periods

est_2 <- as.data.frame(paste0(round(summary(modM2)$coef[2,1],2), ' (', round((summary(modM2)$coef[2,1] -
                    (summary(modM2)$coef[2,2]*1.96)),2), ', ', round((summary(modM2)$coef[2,1] +
                    (summary(modM2)$coef[2,2]*1.96)),2), ')'))

# Note that exp(est_2) will estimate the relative change in urine cadmium levels 
# comparing time periods

Once we have fitted the outcome and the mediator models we can call the multimediate() function to estimate direct, indirect and total effects.

      
multim=multimediate(lmodel.m,correlated=FALSE,model.y,treat="wv.gr.2",treat.value=1,
                        control.value=0,J=1000,conf.level=0.95,data=nhanes)
     

sum <- suppressWarnings(summary(multim, option='avg'))
. Estimation IC.inf IC.sup P.val
ACME.joint.treat -86.4713 -146.9897 -27.1545 0.000
PM(treat) 0.4090 0.1112 0.9347 0.004
ACME.joint.control -86.4713 -146.9897 -27.1545 0.000
PM(control) 0.4090 0.1112 0.9347 0.004
ACME.treat.log.bpb -59.2512 -116.1802 -6.0117 0.030
PM(treat).log.bpb 0.2904 0.0152 0.8072 0.034
ACME.control.log.bpb -59.2512 -116.1802 -6.0117 0.030
PM(control).log.bpb 0.2904 0.0152 0.8072 0.034
ACME.treat.log.cr.ucd -27.2201 -46.2698 -8.1830 0.004
PM(treat).log.cr.ucd 0.1275 0.0319 0.3117 0.008
ACME.control.log.cr.ucd -27.2201 -46.2698 -8.1830 0.004
PM(control).log.cr.ucd 0.1275 0.0319 0.3117 0.008
ADE.treat -141.3910 -274.6978 -4.3872 0.048
ADE.control -141.3910 -274.6978 -4.3872 0.048
Total Effect -227.8623 -359.0293 -96.4896 0.004

Note that we set option='avg' for obtaining average mediation effects in the absence of exposure-mediator interactions. Also note the output table shows identical mediation effects for treatment and control because we have not included exposure-mediator terms in the outcome model.

We next run and outcome model without the mediator.


variables_y2 <- paste0("Surv(ridageyr,peryr.age.8yr,cvd.8yr==1) ~ const(wv.gr.2) +
              const(race.2)+const(race.3)+const(race.4)+const(diab)+const(bmi_cat)+const(riagendr.2)+
              const(smoking.2)+const(smoking.3)+const(lbxtc_cat)+const(lowhdl)+const(cholmed)+const(hbp)+const(sedent)")
 
model.y2 = aalen(as.formula(variables_y2), start.time=40,
                 weights=nhanes$wtshm12yr, data = nhanes, robust=T)
              
    m_y2 <- as.data.frame(as.matrix(paste0(round(model.y2$gamma[1,1]*100000,2), ' (', round((model.y2$gamma[1,1] - (sqrt(model.y2$robvar.gamma[1,1])*1.96))*100000,2), ', ', round((model.y2$gamma[1,1] + (sqrt(model.y2$robvar.gamma[1,1])*1.96))*100000,2), ')')))

We expect the coeffcient for time period will equal the total effect in the multimediate output (sanity check). Happily it does as shown next:

V1
-219.02 (-350.49, -87.54)

We repeated the re-sampling with replacement 1000 times and we display the distribution of absolute and relative mediation effects:

Next, we show interesting extensions of multiple mediation that we could not carry out with the difference and product of coefficient methods in the original Ruiz-Hernandez et al. (2017) paper.

Multiple mediation analysis with correlated=TRUE

We will allow for correlation between multiple mediators (in this case blood lead and urine cadmium, which are exposure biomarkers), which may share common sources of exposure and metabolic pathways.

multim=multimediate(lmodel.m,correlated=TRUE,model.y,treat="wv.gr.2",treat.value=1,
                        control.value=0,J=1000,conf.level=0.95,data=nhanes)
     

sum <- suppressWarnings(summary(multim, option='avg'))
. Estimation IC.inf IC.sup P.val
ACME.joint.treat -85.2629 -140.5979 -30.6086 0.006
PM(treat) 0.4130 0.1325 0.9030 0.008
ACME.joint.control -85.2629 -140.5979 -30.6086 0.006
PM(control) 0.4130 0.1325 0.9030 0.008
ACME.treat.log.bpb -57.9979 -110.3694 -9.9669 0.020
PM(treat).log.bpb 0.2883 0.0368 0.7761 0.024
ACME.control.log.bpb -57.9979 -110.3694 -9.9669 0.020
PM(control).log.bpb 0.2883 0.0368 0.7761 0.024
ACME.treat.log.cr.ucd -27.2649 -45.9354 -9.2364 0.004
PM(treat).log.cr.ucd 0.1361 0.0351 0.3375 0.004
ACME.control.log.cr.ucd -27.2649 -45.9354 -9.2364 0.004
PM(control).log.cr.ucd 0.1361 0.0351 0.3375 0.004
ADE.treat -142.1245 -275.5487 -12.1045 0.036
ADE.control -142.1245 -275.5487 -12.1045 0.036
Total Effect -227.3874 -359.0347 -92.6641 0.002

We observed that, in this case, the estimated mediation effects do not substantially change after allowing for correlation between multiple mediators. Thus, supporting the correlation between mediators is not relevant in the original Ruiz-Hernandez et al. (2017) paper.

Multiple mediation analysis including exposure-mediator interactions

Interestingly, multimediate can also accommodate exposure-mediator interactions because the counterfactual framework enables so.


variables_y <- paste0("Surv(ridageyr,peryr.age.8yr,cvd.8yr==1) ~ const(wv.gr.2) +
              const(log.cr.ucd)+const(log.bpb) + const(wv.gr.2*log.cr.ucd)  +const(race.2)+
              const(race.3)+ const(race.4)+ const(diab)+ const(bmi_cat)+ const(riagendr.2)+
              const(smoking.2)+const(smoking.3)+ const(lbxtc_cat)+const(lowhdl)+const(cholmed)+
                const(hbp)+const(sedent)")


model.y = aalen(as.formula(variables_y),start.time=40,
                weights=nhanes$wtshm12yr, data = nhanes, robust=T)
                      
                      
 
# Warning: at the current moment multimediate only takes one exposure-mediator 
# interaction at a time (work in progress to accomodate more). Note that the first 
# mediator model in the lmodel.m list must be the one with the interaction, and this 
# ordering must be consistent with the order mediators have been included in the outcome
# model.
        
variables1 <- paste0("log.cr.ucd ~ wv.gr.2 + riagendr.2 +  ridageyr + race.2 + race.3 +
                     race.4 + smoking.2 + smoking.3")
variables2 <- paste0("log.bpb ~  wv.gr.2 + riagendr.2 +  ridageyr +  race.2 + race.3 +
                     race.4 + smoking.2 + smoking.3")
      
    
modM1 = lm(variables1, data = nhanes)
modM2 = lm(variables2, data = nhanes)
lmodel.m=list(modM1, modM2)
      
    
multim=multimediate(lmodel.m,correlated=TRUE,model.y,treat="wv.gr.2",treat.value=1,
                             control.value=0,J=1000,conf.level=0.95,data=nhanes)

sum <- suppressWarnings(summary(multim, option='navg'))

Note the option option='navg' for obtaining average mediation effects in the presence of exposure-mediator interactions. Now, we obtain different mediation effects estimates for both NHANES periods.

. Estimation IC.inf IC.sup P.val
ACME.joint.treat -79.5085 -143.4931 -13.7257 0.018
PM(treat) 0.4219 0.0531 1.2139 0.020
ACME.joint.control -86.7132 -144.5966 -26.8414 0.006
PM(control) 0.4236 0.1185 0.9782 0.008
ACME.treat.log.cr.ucd -20.9532 -65.4880 21.5909 0.354
PM(treat).log.cr.ucd 0.1017 -0.1043 0.4388 0.354
ACME.control.log.cr.ucd -28.1579 -48.6183 -7.5227 0.006
PM(control).log.cr.ucd 0.1420 0.0311 0.4004 0.006
ACME.treat.log.bpb -58.5553 -110.9135 -6.8066 0.024
PM(treat).log.bpb 0.3046 0.0206 0.7632 0.028
ACME.control.log.bpb -58.5553 -110.9135 -6.8066 0.024
PM(control).log.bpb 0.3046 0.0206 0.7632 0.028
ADE.treat -145.3458 -291.4826 -0.3968 0.050
ADE.control -152.5505 -316.6510 22.3366 0.090
Total Effect -232.0590 -365.0424 -85.0617 0.002

The results suggest that at the cadmium exposure levels observed in NHANES 1999-2004, the association of cadmium with cardiovascular mortality might be weaker compared to the association at cadmium exposure levels observed in 1988-1994. The interaction p-value in the outcome model, however, is suggestive (equal to 0.09).

multimediate can do more!

Note that multimediate can also accommodate continuous and binary outcome models, as well as categorical or continuous exposures/treatments (Jérolon et al. (2020)). We could also think of modeling a non linear single mediator, by treating the mediator’s main and spline terms as correlated mediators. For the moment, multimediate should only be used for continuous mediators, but we have work in progress to accommodate categorical mediators and both.

References

Domingo-Relloso, Arce, Allan Jerolon, Maria Tellez-Plaza, and Jose D Bermudez. 2024. “Causal Mediation for Uncausally Related Mediators in the Context of Survival Analysis,” February. https://doi.org/10.1101/2024.02.16.24302923.
Jérolon, Allan, Laura Baglietto, Etienne Birmelé, Flora Alarcon, and Vittorio Perduca. 2020. “Causal Mediation Analysis in Presence of Multiple Mediators Uncausally Related.” The International Journal of Biostatistics 17 (2): 191–221. https://doi.org/10.1515/ijb-2019-0088.
Ruiz-Hernandez, Adrian, Ana Navas-Acien, Roberto Pastor-Barriuso, Ciprian M Crainiceanu, Josep Redon, Eliseo Guallar, and Maria Tellez-Plaza. 2017. “Declining Exposures to Lead and Cadmium Contribute to Explaining the Reduction of Cardiovascular Mortality in the US Population, 1988–2004.” International Journal of Epidemiology 46 (6): 1903–12. https://doi.org/10.1093/ije/dyx176.