Intro
On a recent vacation to Oahu, taking a much-needed break from life and work, it wasn't long before my mind felt the itch to start thinking about statistics again. Before the trip, I had bombed a screening call with a recruiter who asked me about media mix models. I had never encountered media mix models, so I spewed what I knew about mixed-effects models instead. After the call, some Google Scholar searching made the difference clear.
On one hand, I was fascinated by the potential of media mix models, particularly the AdStock transformation. I appreciated the value of visualizing the carryover effect of various engagement channels on a longitudinal target variable.
On the other hand, I was underwhelmed not to find a model that accounted for random effects to handle within-individual variance of the target variable. Most current AdStock media mix models seem adept at handling situations like the following:
XYZ Retail Company is a mid-sized e-commerce business specializing in eco-friendly products. Over the past few years, they have invested heavily in various digital marketing channels, including social media advertising, email marketing, and search engine marketing. With the growth of their customer base, they noticed fluctuations in sales that they couldn’t fully understand or predict. Despite increasing their marketing spend, XYZ Retail struggled to attribute sales increases accurately to specific channels. This ambiguity made it difficult for the marketing team to allocate their budget effectively. The company’s leadership sought a reliable method to assess the long-term effectiveness of their marketing strategies and understand how previous marketing efforts influenced future sales. An AdStock media mix model was identified as an appropriate analytical approach to address XYZ Retail’s challenges. This model could help quantify the carryover effect of different marketing channels over time, taking into account both immediate impacts and longer-term influences on customer behavior.
I recently came across a scenario in the pharmaceutical industry where the goal was similar:
assess the long-term effectiveness of clinical site engagement strategies and understand how previous engagement efforts influence future patient recruitment.
However, instead of mass-deployed marketing channels, clinical study site engagements are individually tailored. Clinical Research Associates (CRAs) and Medical Science Liaisons (MSLs) have custom talking points targeted at a specific clinical study site and its current situation. For example, a site may be experiencing a barrier to patient recruitment, or a gap in understanding that could be fixed with a tailored knowledge share, or perhaps it is doing well at finding patients but needs vendor support.
Furthermore, in the clinical trial stages of drug development there are no sales to track; instead, we measure patient recruitment events (screens and enrollments), a count variable. So default media mix models that assume a normally distributed target variable are out of the question.
Motivating Example
Site-targeted engagements can occur at different frequencies, sometimes as often as once a week, other times not at all for months. Furthermore, screens can be a recurring event at a clinical study site. For example, here is a simulated clinical trial milestone swimlane chart of site-specific events:

Here, we assume sites cannot begin recruiting patients until they are activated, and site activations occur at different times throughout an ongoing trial. Furthermore, engagements (shown as phone calls and face-to-face visits) mainly occur at one site. Lastly, our target variable of interest, patient screens, is quite sparse.
The traditional media mix model approach would treat the above raw data as one time series:

Due to the sparsity of the target variable and the nature of the site-targeted engagements, it would appear from a modeling perspective that the single face-to-face visit was a significant contributor to the subsequent screening when we feed the data in as one time series. However, we know from the raw data that these two events were independent of each other.
This possibly contrived example is what led me, on vacation, to formulate a random-effects media mix model for this particular use case. Not to mention that random effects arise naturally in longitudinal data with repeated observations, as they do with patient screens per site over time in a clinical study.
Stan Model Code
Once I got up to speed on the current state of media mix models in practice, I realized I would need to rely on Bayesian regression via Stan to handle the mixed-effects and count-outcome requirements of my proof-of-concept model.
As a complete novice to Stan, I found this presentation to be the most helpful resource for getting up to speed on the underlying mechanisms and motivations of the Stan framework.
Next, I found this paper from Google particularly useful, as it provides sample Stan code implementing its single-time-series MMM use case on a normal target variable. This was an excellent starting point for tweaking the model and its priors to my use case. After some trial and iteration, I ended up with this Stan file:
// BASH_Poisson_LMM_no_hill.stan
functions {
// the Hill function
real Hill(real t, real ec, real slope) {
return 1 / (1 + (t / ec)^(-slope));
}
// the adstock transformation with a vector of weights
real unweighted_adstock(row_vector t, row_vector weights) {
return dot_product(t, weights);
}
}
// mixed model with Poisson likelihood
data {
// number of observations
int<lower=1> N;
// number of non-engagement columns in X matrix
int<lower=1> num_ctrl;
// number of individuals
int<lower=1> N_indivs;
// subscripts indexing individuals
int<lower=1> indiv[N];
// count of events
int<lower=0> y[N];
// X matrix without ones for intercept in column 1
matrix[N, num_ctrl] X_ctrl;
// the maximum duration of lag effect, in weeks
int<lower=1> max_lag;
// the number of media channels
int<lower=1> num_media;
// 3D array of media variables
// X: time [1:N]
// Y: engagement type [1:num_media]
// Z: lagged time [1:max_lag]
row_vector[max_lag] X_media[N, num_media];
}
transformed data {
// QR reparameterization for X_ctrl fixed matrix
matrix[N, num_ctrl] Q_ast;
matrix[num_ctrl, num_ctrl] R_ast;
matrix[num_ctrl, num_ctrl] R_ast_inverse;
// thin and scale the QR decomposition
Q_ast = qr_Q(X_ctrl)[, 1:num_ctrl] * sqrt(N - 1);
R_ast = qr_R(X_ctrl)[1:num_ctrl, ] / sqrt(N - 1);
R_ast_inverse = inverse(R_ast);
}
parameters {
real alpha; // population-level mean of the random intercepts
real < lower=0 > sigma_indiv; // SD of random effects
vector[num_ctrl] theta; // coefficients on Q_ast
vector[N_indivs] xi; // random effects with standard normal prior
// the coefficients for media variables
vector<lower=0>[num_media] beta_medias;
// the retention rate and delay parameter for the adstock transformation of
// each media
vector<lower=0,upper=1>[num_media] retain_rate;
vector<lower=0,upper=max_lag-1>[num_media] delay;
// // Hill parameters
// vector<lower=0,upper=1>[num_media] ec;
// vector<lower=0>[num_media] slope;
}
transformed parameters {
vector[N] eta = Q_ast * theta;
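// non-centered parameterization: individual intercept w[i] = alpha + sigma_indiv * xi[i]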
vector[N_indivs] w = alpha + xi * sigma_indiv;
matrix[num_media, max_lag] lag_weights; // precomputed lag weights
row_vector[num_media] cum_effects[N];
row_vector[num_media] cum_effects_hill[N];
// Precompute lag weights
for (media in 1:num_media) {
for (lag in 1:max_lag) {
lag_weights[media, lag] = pow(retain_rate[media],
(lag - 1 - delay[media]) ^ 2);
}
}
// Compute cumulative media effects
for (nn in 1:N) {
for (media in 1:num_media) {
cum_effects_hill[nn, media] = unweighted_adstock(X_media[nn,media],
lag_weights[media]);
// cum_effects_hill[nn, media] = Hill(cum_effects[nn, media], ec[media], slope[media]);
}
}
}
model {
// positive-truncated Cauchy prior; prior values taken from DIAL models
sigma_indiv ~ cauchy(0.7, 0.4);
alpha ~ normal(-3, 1);
theta ~ normal(0,1);
xi ~ normal(0, 1);
// slope ~ normal(1, 1);
// ec ~ beta(2,2);
beta_medias ~ normal(0, 1);
retain_rate ~ beta(4,2);
delay ~ uniform(0, max_lag - 1); // matches the declared upper bound
for(i in 1:N) {
y[i] ~ poisson_log(eta[i] +
dot_product(cum_effects_hill[i,], beta_medias)+ w[indiv[i]]);
}
}
generated quantities {
vector[num_ctrl] beta;
beta = R_ast_inverse * theta; // coefficients on x
}
Here is a quick summary of the key takeaways that led me to this final model:
- Hill Function: My fitted estimates did not match the simulated data when both the model and the simulation used the Hill(AdStock(x)) composite function as in the Google paper. The main difference here is that we are simulating sparse Poisson count data in which the majority of time points observe a count of 0. In my opinion, the sparsity of this data makes it hard for the model to converge to the correct Hill parameters. Therefore, I dropped the Hill function from the simulated carryover effect and stuck to the AdStock dot product with geometrically decaying weights.
- Unweighted AdStock: Similarly, with the Poisson-log model I was not able to get the fitted parameters to agree satisfactorily with the true simulation parameters when using the weighted-average AdStock function (which normalizes the dot product by the sum of the lag weights) from the Google paper’s sample code. Therefore, I modified the transformation to be just the unnormalized dot product.
- Poisson-log: I had previously run rudimentary Poisson generalized linear mixed models (GLMMs) on actual pharmaceutical data, assuming a fixed-effect carryover window (with length derived from the frequency of observed engagements) while controlling for fixed confounders and individual random effects. Diagnostic plots showed no sign of overdispersion, which made me comfortable developing this model with a Poisson-log likelihood.
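For reference, the model implied by the Stan file above can be written compactly as follows, using the same parameter names as the code (retain_rate $r_m$, delay $d_m$, max_lag $L$; the control coefficients are fit on the QR-reparameterized design, with beta = R_ast_inverse * theta recovered in generated quantities):

$$
\begin{aligned}
y_{it} &\sim \mathrm{Poisson}(\lambda_{it}),\\
\log \lambda_{it} &= \underbrace{\alpha + \sigma_{\mathrm{indiv}}\,\xi_i}_{w_i,\ \text{random intercept}}
\;+\; x_{it}^{\top}\beta
\;+\; \sum_{m=1}^{M} \beta^{\mathrm{media}}_{m} \sum_{l=1}^{L} r_m^{(l-1-d_m)^2}\, x^{\mathrm{media}}_{m,i,\,t-l+1},\\
\xi_i &\sim \mathcal{N}(0,1).
\end{aligned}
$$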
Simulation Data Code
The following function:
- creates N time series, each representing a weekly count series for a simulated individual clinical study site.
- Each time series has a specified min and max length (min_obs_per_indiv and max_obs_per_indiv, respectively).
- Each simulated site is given an integer id between 1 and N.
- The variable z_0 contains the random normal intercepts for each site, with standard deviation specified by individual_sd
- Two confounding variables are encoded for each site:
- days_since_activation: continuous temporal variable that tracks the time since activation for each week's worth of data
- confounder1: static binary variable with probability of success specified by prob_confounder1
- E.g. could represent whether site is in US or not
- Two types of engagement (CRA and MSL) are simulated with an independent Bernoulli draw each week, with each engagement type having its own probability of occurrence (cra_rate and msl_rate, respectively).
- The output target variable is simulated through a Poisson draw with a log link: the linear predictor adds the fixed confounding effects to the carryover effects from engagements
- beta_0 represents fixed intercept
- beta_time represents fixed slope of log(days_since_activation + 1)
- beta_US represents fixed slope of confounder1 variable
- beta_cra represents fixed slope of CRA AdStock carryover effect
- beta_msl represents fixed slope of MSL AdStock carryover effect
- sd_noise is the standard deviation of the Gaussian noise (residuals) added to the log-scale linear predictor
- cra_delay, msl_delay, cra_retain_rate, msl_retain_rate are AdStock parameters
- theta_scale_prior is the standard deviation for the Bayesian priors on the confounding-variable slopes in the Stan model (the Stan file shown above hard-codes theta ~ normal(0, 1), so it does not read this value)
- max_lag is a right bound on the AdStock carryover effect
- i.e. we do not expect the carryover of any engagement to have any sort of meaningful effect after max_lag weeks.
library(rstan)
library(dplyr)
simulate_BASH_LMM_data <- function(min_obs_per_indiv,
max_obs_per_indiv,
prob_confounder1,
cra_rate,
msl_rate,
N,
beta_0,
beta_cra,
beta_msl,
beta_time,
beta_US,
sd_noise,
cra_delay,
msl_delay,
cra_retain_rate,
msl_retain_rate,
theta_scale_prior,
max_lag,
individual_sd,
output_dir){
# the total number of observations per individual
n_i = sample(min_obs_per_indiv:max_obs_per_indiv,N,replace = T)
indiv = c()
id = 1
for(ni in n_i){
indiv = append(indiv, rep(id,ni))
id = id+1
}
#site ID and days_since_activation vector
id = c()
days_since_activation = c()
confounder1 = c()
for(i in 1:N){
id = append(id, rep(i,n_i[i]))
days_since_activation = append(days_since_activation, 1:n_i[i])
if (rnorm(1) < qnorm(prob_confounder1)){
confounder1 = append(confounder1, rep(1,n_i[i]))
}else{
confounder1 = append(confounder1, rep(0,n_i[i]))
}
}
print("Simulating engagements")
#engagement flag variables
x_cra = sample(0:1,sum(n_i),replace=TRUE,prob=c(1-cra_rate,cra_rate))
x_msl = sample(0:1,sum(n_i),replace=TRUE,prob=c(1-msl_rate,msl_rate))
#input data frame
X = data.frame(SITE_NUM = id,
time=days_since_activation,
ecdp_eng_clinical_flag = x_cra,
ecdp_eng_medical_flag = x_msl,US=confounder1)
#simulate output
#random intercept per individual
z_0 = rnorm(N,mean = 0, sd = individual_sd)
#STAN Input data
N = nrow(X)
N_indivs = length(unique(X$SITE_NUM))
indiv = X$SITE_NUM
control_vars = c("time", "US")
# Create a control variables data frame
X_ctrl = X %>%
select(time, US) %>%
mutate(time = log(time+1))
drop_names = c("SITE_NUM")
drop_names =append(drop_names,names(X_ctrl))
# Identify media variables
media_vars <- setdiff(names(X),drop_names)
num_media = length(media_vars)
#2d media array with grouping var
X_eng = X[,append("SITE_NUM",media_vars)]
# Setup 3D array for media variables with lags
X_media <- array(0, c(nrow(X_eng), num_media, max_lag))
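# slice l of the 3rd dimension will hold each engagement flag lagged by (l - 1) weeks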
# Fill the unlagged time series for media
X_media[, , 1] <- as.matrix(X_eng[, media_vars])
# Compute lagged values and store in the 3D array
for (lag in 1:(max_lag-1)){
for (media in media_vars){
X_eng = X_eng %>% group_by(SITE_NUM) %>%
mutate("{media}_lag_{lag}" := dplyr::lag(!!sym(media),n=lag)) %>%
replace(is.na(.), 0)
}
X_media[,,lag+1] = as.array(as.matrix(X_eng[,(ncol(X_eng)-num_media+1):(ncol(X_eng))]))
}
# Initialize vectors
y <- numeric(N)
epsilon <- rnorm(N,sd = sd_noise)
delay <- c(cra_delay, msl_delay)
retain_rate <- c(cra_retain_rate, msl_retain_rate)
# Initialize matrices for cumulative effects with Hill transformation
cum_effects_hill <- matrix(0, nrow = N, ncol = num_media)
# Placeholder coefficients for media and control variables
beta_medias <- c(beta_cra, beta_msl)
gamma_ctrl <- c(beta_time, beta_US)
# Hill function
Hill <- function(t, ec, slope) {
return(1 / (1 + (t / ec)^(-slope)))
}
# Adstock transformation with a vector of weights
Adstock <- function(t, weights) {
return(sum(t * weights))
}
print("Simulating response variable")
# simulate the response variable with a Poisson draw through a log link using the specified parameters
for (nn in 1:N) {
for (media in 1:num_media) {
lag_weights <- numeric(max_lag)
for (lag in 1:max_lag) {
lag_weights[lag] <- retain_rate[media]^((lag - 1 - delay[media])^2)
}
cum_effect <- Adstock(X_media[nn, media, ], lag_weights)
cum_effects_hill[nn, media] <- cum_effect
}
y[nn] <- rpois(1,exp(beta_0 +
sum(cum_effects_hill[nn, ] * beta_medias) +
sum(X_ctrl[nn,] * gamma_ctrl) + z_0[indiv[nn]] + epsilon[nn] ) )
}
write.csv(y, paste0(output_dir,"/y.csv"))
write.csv(X_ctrl, paste0(output_dir,"/X_ctrl.csv"))
write.csv(X_eng, paste0(output_dir,"/X_eng.csv"))
write.csv(cum_effects_hill, paste0(output_dir,"/cum_effects_hill.csv"))
write.csv(z_0[indiv], paste0(output_dir,"/indiv_effects.csv"))
write.csv(epsilon, paste0(output_dir,"/epsilon.csv"))
return(list("Y"=y,
"cum_effects_hill"=cum_effects_hill,
"N"=N,
"max_lag"=max_lag,
"num_media"=num_media,
"X_media"=X_media,
"X_eng"=X_eng,
"num_ctrl"=ncol(X_ctrl),
"X_ctrl"=X_ctrl,
"beta_medias"=beta_medias,
"gamma_ctrl"=gamma_ctrl,
"retain_rate"=retain_rate,
"delay"=delay,
"indiv"=indiv,
"N_indivs"=N_indivs
))
}
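Before moving on to sampling, it can help to sanity-check the shape of the delayed geometric decay used by both the simulator above and the Stan file. Here is a minimal standalone sketch; the parameter values mirror the CRA settings chosen later in this post:
# sketch: visualize the AdStock lag weights in isolation
max_lag     <- 13
retain_rate <- 0.5   # cra_retain_rate from the input_params below
delay       <- 1     # cra_delay from the input_params below
# weight applied to an engagement that happened (lag - 1) weeks ago
lag_weights <- retain_rate^(((1:max_lag) - 1 - delay)^2)
round(lag_weights, 3)
# the weight peaks delay weeks after the engagement and decays geometrically in the squared distance
plot(0:(max_lag - 1), lag_weights, type = "h",
     xlab = "weeks since engagement", ylab = "AdStock weight")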
Next, we have the function that initializes the Stan model and feeds in our data in the formats expected for Bayesian sampling to estimate the parameters of interest: the fixed slopes (note that beta_0 is represented by alpha in the Stan model), the AdStock parameters (cra_delay, msl_delay, cra_retain_rate, msl_retain_rate), and their associated betas.
The aforementioned parameters and the fitted model are saved into the specified output_dir.
BASH_LMM_sample <- function(gamma_ctrl,
beta_medias,
retain_rate,
delay,
X_ctrl,
N_indivs,
indiv,
priormean_icept,
theta_scale_prior,
y,
beta_0,
X_media,
num_media,
n_iter,
input_stan,
output_dir,
seed
){
time0 = Sys.time()
print("reading STAN file")
#poisson LMM
poisson_lmm_debug = stan_model(input_stan)
print("sampling begin")
options(mc.cores = parallel::detectCores())
fit <- sampling(poisson_lmm_debug,
list(N=nrow(X_ctrl),
num_ctrl=ncol(X_ctrl),
N_indivs=N_indivs,
indiv=indiv,
priormean_icept=priormean_icept,       # not declared in the data block of the Stan file above
theta_scale_icept = theta_scale_prior, # not declared in the data block of the Stan file above
y=y,
X_ctrl=X_ctrl,
max_lag = dim(X_media)[3],
num_media=num_media,
X_media=X_media
), iter=n_iter
)
# summary of selected params
selected_summary <- data.frame(summary(fit, pars = c("alpha", "beta","beta_medias","retain_rate","delay"),
probs = c(0.1, 0.9))$summary)
actual_df <- data.frame(actual = c(beta_0,gamma_ctrl, beta_medias, retain_rate, delay))
selected_summary = cbind(actual_df,selected_summary)
selected_summary$runtime <- difftime(Sys.time(), time0, units="secs")
write.csv(selected_summary,paste0(output_dir,"/summary_table.csv"))
saveRDS(fit, paste0(output_dir,"/fit.rds"))
}
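Once sampling finishes, the saved artifacts can be pulled back in for a quick check of the fit. Here is a minimal sketch, assuming the simulation_results output directory used later in this post:
library(rstan)
# load the fitted model and the fitted-vs-actual table written by BASH_LMM_sample()
fit <- readRDS("simulation_results/fit.rds")
summary_table <- read.csv("simulation_results/summary_table.csv")
print(summary_table)
# R-hat and effective sample sizes for the parameters of interest
print(fit, pars = c("alpha", "beta", "beta_medias", "retain_rate", "delay"))
traceplot(fit, pars = c("retain_rate", "delay"))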
And lastly, the following function combines the two prior functions (simulation and sampling) into a single call that takes a list of input_params:
simulate_and_sample_BASH_poisson <- function(input_params){
set.seed(input_params$seed)
dir.create(input_params$output_dir)
write.csv(unlist(input_params),paste0(input_params$output_dir,"/input_params.csv"))
BASH_sim_data = simulate_BASH_LMM_data(min_obs_per_indiv=input_params$min_obs_per_indiv,
max_obs_per_indiv=input_params$max_obs_per_indiv,
prob_confounder1=input_params$prob_confounder1,
cra_rate=input_params$cra_rate,
msl_rate=input_params$msl_rate,
N=input_params$N,
beta_0=input_params$beta_0,
beta_cra=input_params$beta_cra,
beta_msl=input_params$beta_msl,
beta_time=input_params$beta_time,
beta_US=input_params$beta_US,
sd_noise=input_params$sd_noise,
cra_delay=input_params$cra_delay,
msl_delay=input_params$msl_delay,
cra_retain_rate=input_params$cra_retain_rate,
msl_retain_rate=input_params$msl_retain_rate,
theta_scale_prior=input_params$theta_scale_prior,
max_lag=input_params$max_lag,
individual_sd=input_params$individual_sd,
output_dir=input_params$output_dir)
fit <- BASH_LMM_sample( gamma_ctrl=BASH_sim_data$gamma_ctrl,
beta_medias=BASH_sim_data$beta_medias,
retain_rate=BASH_sim_data$retain_rate,
delay=BASH_sim_data$delay,
X_ctrl=BASH_sim_data$X_ctrl,
N_indivs=BASH_sim_data$N_indivs,
indiv=BASH_sim_data$indiv,
priormean_icept=input_params$priormean_icept,
theta_scale_prior=input_params$theta_scale_prior,
y=BASH_sim_data$Y,
beta_0=input_params$beta_0,
X_media=BASH_sim_data$X_media,
num_media=BASH_sim_data$num_media,
n_iter=input_params$n_iter,
input_stan=input_params$input_stan,
output_dir=input_params$output_dir,
seed=input_params$seed)
}
Here are the input params I used to simulate a realistic synthetic dataset of clinical study-site engagement data, along with the accompanying sampling parameters and the final function call:
input_params <- list(min_obs_per_indiv=15,
max_obs_per_indiv=400,
prob_confounder1=0.5,
cra_rate=0.4,
msl_rate=0.2,
N=100,
beta_0=-1.5,
beta_cra=0.4,
beta_msl=0.2,
beta_time=-0.14,
beta_US=-0.33,
sd_noise=0.4,
cra_delay=1,
msl_delay=3,
cra_retain_rate=0.5,
msl_retain_rate=0.8,
priormean_icept=-1.5,
theta_scale_prior=0.5,
max_lag=13,
individual_sd=0.7,
n_iter=2000,
seed=9,
input_stan="BASH_Poisson_LMM_no_hill.stan",
output_dir = "simulation_results")
simulate_and_sample_BASH_poisson(input_params)
Here is the estimated vs actual (simulated) overall carryover effect for the first type of engagement (CRA):

Here we see the delay and geometric decay rate properly captured!
And for the second type of engagement (MSL):

The model was able to properly capture the retention rate and the longer delay period despite only half as many MSL engagements being simulated as CRA engagements!
Lastly, the estimates for the intercept and confounding variables were not too far off either:

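For anyone wanting to recreate these comparison plots, here is roughly how an estimated-versus-actual carryover curve can be reconstructed from the saved fit. This is a minimal sketch using posterior means (plotting per-draw curves would propagate uncertainty better); the "actual" values are copied from the input_params above:
library(rstan)
fit  <- readRDS("simulation_results/fit.rds")
post <- rstan::extract(fit, pars = c("beta_medias", "retain_rate", "delay"))
max_lag <- 13
lag <- 1:max_lag
# posterior-mean carryover curve for the first engagement type (CRA)
est_curve  <- mean(post$beta_medias[, 1]) *
  mean(post$retain_rate[, 1])^((lag - 1 - mean(post$delay[, 1]))^2)
# true curve from the simulation parameters (beta_cra = 0.4, cra_retain_rate = 0.5, cra_delay = 1)
true_curve <- 0.4 * 0.5^((lag - 1 - 1)^2)
matplot(lag - 1, cbind(est_curve, true_curve), type = "b", pch = 1:2,
        xlab = "weeks since engagement", ylab = "carryover effect",
        main = "CRA carryover: estimated vs actual")
legend("topright", legend = c("estimated", "actual"), pch = 1:2, col = 1:2)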
Conclusion
I hope the success of the model on a realistic simulation can be repeated in other scenarios. With this approach, we are able to visualize the AdStock carryover effect of individually targeted engagements while controlling for confounding variables and individual random effects (namely, a random intercept) regressed against a count variable. The model, as designed, can handle additional media and control variables in the respective design matrices. And with some slight tweaks to the Stan code, the model design could be adjusted further, for example by adding a random slope or by switching to a negative binomial likelihood if overdispersion is present.