************STATA CODE - CONTINUOUS ASPES METHOD
/*
Variable definitions
y = Outcome of interest
Ma = Actual mediator
Mp = Predicted mediator
T = Treatment indicator (1 = treatment group member, 0 = control group member)
Mp_T = Interaction term between T and Mp (defined at analysis stage - Part 2)
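x* = Baseline student-level covariates (all variables whose names begin with x)
z* = Baseline student-level covariates (all variables whose names begin with z)
w* = Baseline student-level covariates (all variables whose names begin with w)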
*/
import excel using "mock_data.xlsx", firstrow clear
************************************************************************************************************************************
***PART 1: Predict Subgroup Membership (the Mediator) Using a Cross-Validation Strategy That Ensures Symmetric Prediction************************************************************************************************************************
/*The example Stata code below carries out the following steps:
(1) Randomly partition the sample into ten mutually exclusive cross-validation groups.
(2) Using treatment group members only, estimate an OLS regression of the mediator of interest on baseline characteristics ten times, each time leaving out one of the ten cross-validation groups and estimating the model on the remaining nine groups (roughly 90 percent of the treatment group).
(3) For each of the ten cross-validation groups, construct the predicted mediator for all of its members (treatment and control alike) using the coefficients from the model that excluded that group. Because no participant's prediction uses their own mediator value, predictions are formed symmetrically for the treatment and control groups, and each participant receives a continuous predicted mediator value based on baseline characteristics.
*/
*Create cross-validation groups (continuous ASPES)
set seed 123456
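quietly gen double random_number=runiform() //draw a random number between 0 and 1 for each observation; storing it in double precision makes ties (and hence a non-reproducible sort order) very unlikely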
sort random_number //sorts random_number from lowest to highest
count
scalar samplesize=r(N) //set "samplesize" equal to the number of observations
gen CV_group=1+floor((_n-1)/(samplesize*0.10)) //divide sample into 10 Cross Validation groups (denoted 1, 2, ..., 10) based on their value of random_number
summarize CV_group
local number_of_CV_groups=r(max) //number_of_CV_groups is set equal to 10 (the number of Cross Validation groups to which the sample is randomly assigned)
display `number_of_CV_groups'
***for each cross-validation group, generate out-of-sample predictions of the mediator***
quietly gen Mp=. //generate a predicted intermediate variable called Mp that is currently set to missing for all individuals
forval focalgroup=1/`number_of_CV_groups' {
reg Ma x* w* z* if CV_group!=`focalgroup' & T==1 //model the relationship between the actual value of the mediator of interest (Ma) and baseline covariates for all treatment group members not in the "focal group"
quietly predict Mp_focalgroup //predict the mediator for ALL observations (treatment and control) using the coefficients just estimated
quietly replace Mp=Mp_focalgroup if CV_group==`focalgroup' //keep these out-of-sample predictions only for members of the focal group, storing them in Mp
drop Mp_focalgroup //drop the temporary predictions before the next iteration
}
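*Optional check (an added illustration, not one of the original ASPES steps): each cross-validation group
*should hold about 10 percent of the sample, and every treatment AND control group member should now have
*an out-of-sample predicted mediator. Nonzero counts below typically point to missing baseline covariates.
tabulate CV_group
count if missing(Mp) & T==1
count if missing(Mp) & T==0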
*Assessing Prediction Performance*
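reg Ma Mp if T==1, noconstant //regress actual on predicted mediator values for the treatment group, with the line forced through the origin; a slope close to 1 suggests well-calibrated predictions (with noconstant, regress reports an uncentered R-squared)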
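twoway (qfitci Ma Mp if T==1) (scatter Ma Mp if T==1) //graph actual against predicted values of the mediator for the treatment group only, with a quadratic fit and its confidence interval (the scatter is drawn last so the CI band does not hide the points)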
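*Optional addition (an illustration, not part of the original code): the squared correlation between Ma and
*the out-of-sample predictions Mp in the treatment group gives a centered measure of prediction accuracy,
*unlike the uncentered R-squared that regress reports with noconstant. The scalar name oos_r2 is hypothetical.
quietly correlate Ma Mp if T==1
scalar oos_r2 = r(rho)^2
display "Squared correlation of Ma and Mp (treatment group): " %6.4f scalar(oos_r2)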
****************************************************************************************************************************
*PART 2: Estimate the Relationship between the Predicted Intermediate Variable and Effect Size
**********************************************************************************************************************************
*Generate treatment X Mp interaction
gen Mp_T = T*Mp
*Estimate the Relationship between the Predicted Intermediate Variable and Effect Size
/*
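We can interpret the model coefficients as follows (the estimated treatment effect for someone with predicted mediator value Mp is _b[T] + _b[Mp_T]*Mp):
_b[T] is the estimated treatment effect for individuals whose predicted mediator value is 0 (the direct effect);
_b[Mp_T] is the change in the treatment effect per one-unit increase in the predicted mediator, interpreted as the indirect effect of the treatment on the outcome operating through the mediator; and
_b[T] + _b[Mp_T] is the total effect of treatment on the outcome for individuals whose predicted mediator value is 1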
*/
regress y Mp T Mp_T x* w* z*
lincom _b[Mp_T]
lincom _b[T] + _b[Mp_T]
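*Optional addition (an illustration, not part of the original code): because the estimated treatment effect
*for someone with predicted mediator Mp is _b[T] + _b[Mp_T]*Mp, the model-implied average effect over the
*estimation sample is that combination evaluated at the sample mean of Mp. nlcom is used here because its
*expression parser accepts a negative mean; for a linear combination its standard error matches lincom's,
*but it reports normal-based (z) rather than t-based inference. The local name mean_Mp is hypothetical.
quietly summarize Mp if e(sample)
local mean_Mp = r(mean)
nlcom _b[T] + `mean_Mp'*_b[Mp_T]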