Counterfactual estimation on nonstationary data, be careful!!!

By Gabriel Vasconcelos

In a recent paper, which can be downloaded here, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (when I say non-stationary it means integrated) is a tricky task. It is intuitive that the models will not work properly in the absence of cointegration (spurious case), but what the authors show is that even with cointegration, the average treatment effect (ATE) converges to a non-standard distribution. As a result, standard tests on the ATE will identify treatment effects in several cases when there is no effect at all.

For those who are not familiar with counterfactual models: normally, these models have a treated unit (we want to know the effects of the treatment) and several untreated units that we call peers. These units may be cities, countries, companies, etc. Assuming that only one unit was treated and that there is no contamination, we can use information from the peers to project the behaviour of the treated unit as if it was not treated, which is the counterfactual. The models should also be able to identify if the treatment had no effects on the treated unit.

Simulation tests

Here I am going to generate some cointegrated data with no treatment to show the behaviour of the Synthetic Control and the ArCo (Other techniques will have the same features). The data is very simple and it is based on the mentioned article’s example. We are going to generate 100 observations of 10 time-series: 9 random walks and one that is the sum of these random walks plus an error. This last variable is the treated unit. However, I will not include any treatment in the data and I want the models capture this feature. The hypothesis to be tested is that at $t_0=51$ we had some treatment in the treated unit.

library(ArCo)
library(glmnet)
library(Synth)

T=100 # Number of Observations
n=10 # Number of Time Series
set.seed(1) # Seed for replication

# Generate random walk peers
peers=matrix(0,T,n-1)
for(i in 2:T){
peers[i,]=peers[i-1,]+rnorm(n-1)
}

# Generate the treated unit
y1=rowSums(peers)+rnorm(T)

# Put all the TS together
Y=cbind(y1,peers)

# Plot all TS. The black line is the "treated unit"
matplot(Y,type="l")


First, I am going to estimate the ArCo. The $delta show the average treatment effect with 95% the confidence interval. As you can see, the test indicated that the treatment was statistically significant (the zero is very far from the confidence interval). The plot also show a dislocated counterfactual (blue line). # Estimate the ArCo in the non-stationary data arco=fitArCo(list(Y),cv.glmnet,predict,treated.unity = 1,t0=51) arco$delta

##                  LB     delta        UB
## Variable1 -6.157713 -5.236705 -4.315697

plot(arco)


Now I am going to estimate the Synthetic Control. The Synth package does not have an implementation of a formal test. However, we can have a good idea from the counterfactual plot, which also indicates that the treatment was effective (even though there was no treatment).

# Estimate the Synthetic Control in the nonstationary data

# Put the data in the Synth package notation
sY=as.vector(Y)
timeid=rep(1:T,ncol(Y))
unitid=sort(rep(1:ncol(Y),T))
synthdata=data.frame(time=timeid,unit=unitid,Y=sY)
synthdataprep&amp;lt;-
dataprep(
foo = synthdata,
predictors = c("Y"),
predictors.op = "mean",
dependent = "Y",
unit.variable = "unit",
time.variable = "time",
treatment.identifier = 1,
controls.identifier = c(2:10),
time.predictors.prior = c(1:50),
time.optimize.ssr = c(1:50),
time.plot = 1:100
)
SC=synth(synthdataprep)

path.plot(SC,synthdataprep)
abline(v=50,col="blue",lty=2)


As you can see, both the ArCo and the Synthetic Control wrongly rejected the null hypothesis of no treatment. The best way to solve the problem is to take the first difference of all non-stationary time-series. However, there is an important thing to point out here. Imagine the random walk:

$\displaystyle y_t = y_{t-1} + \varepsilon_t$

Suppose we have a treatment in $t=t_0$ that simply makes $y_{t_0} = c + y_{t_0-1}+\varepsilon_{t_0}$ only for $t=t_0$. For any $t\neq t_0$ we will have $\Delta y_t=\varepsilon_t$ and for $t=t_0$ we will have $\Delta y_{t_0}=c+\varepsilon_{t_0}$. Therefore, if we take the first difference of $y_t$ the treatment will have an impact only at $t=t_0$, which makes it impossible for any counterfactual model to identify an intervention. In this example, we would need an intervention that changed the drift of the Random Walk in a way that for all $t\geq t_0$ we have:

$\displaystyle y_t=d+y_{t-1}+\varepsilon_t$

This will be better illustrated in an example. The plot below shows the difference between the two treatments. The red line is the case we can’t identify if we take the first difference. The Blue line changes the drift of the Random Walk, and therefore we can identify. It is impossible, with the tools I am using here, to identify if we are in the black line or in the red line.

y=rep(0,T)
yt1=yt2=y
d=1
c=10
for(i in 2:T){
e=rnorm(1)
y[i]=y[i-1]+e
if(i==51){
yt1[i]=c+yt1[i-1]+e
}else{
yt1[i]=yt1[i-1]+e
}
if(i&amp;gt;=51){
yt2[i]=d+yt2[i-1]+e
}else{
yt2[i]=yt2[i-1]+e
}
}
plot(yt2,type="l",col=4,ylab="")
lines(yt1,col=2)
lines(y,col=1)
legend("topleft",legend=c("untreated","treated by a constant in t0","treated in the drift"),col=c(1,2,4),bty="n",seg.len=0.5,cex=0.8,lty=1)


Now let’s get back to the first example. The results clearly indicated an intervention where there was none. Now I will show what happens if we take the first difference.

# Take the first difference
dY=diff(Y,1)
# Estimate the ArCo
darco=fitArCo(list(dY),cv.glmnet,predict,treated.unity = 1,t0=50)
darco$delta  ## LB delta UB ## Variable1 -0.6892162 0.02737802 0.7439722  plot(darco)  # Adjust the data to the Synth and estimate the model dsY=as.vector(dY) timeid=rep(2:T,ncol(Y)) unitid=sort(rep(1:ncol(Y),T-1)) dsynthdata=data.frame(time=timeid,unit=unitid,Y=dsY) dsynthdataprep&amp;lt;- dataprep( foo = dsynthdata, predictors = c("Y"), predictors.op = "mean", dependent = "Y", unit.variable = "unit", time.variable = "time", treatment.identifier = 1, controls.identifier = c(2:10), time.predictors.prior = c(2:50), time.optimize.ssr = c(2:50), time.plot = 2:100 ) dSC=synth(dsynthdataprep)  path.plot(dSC,dsynthdataprep) abline(v=50,col="blue",lty=2)  As you can see, the $delta in the ArCo showed that the treatment effect was statistically zero, which is true. The Synthetic Control plot show similar results even though the model adjustment was not very good.

Finally, note that we have two scenarios. If there is no intervention and we do not take the first difference we will probably wrongly reject the null hypothesis of no intervention. However, if we are in the “red line” case of the plot and we take the first difference, we will probably not reject the null when it may be false.

References

Carvalho, Carlos, Ricardo Masini, and Marcelo C. Medeiros. “The perils of counterfactual analysis with integrated processes.” (2016).