Counterfactual estimation on nonstationary data, be careful!!!

By Gabriel Vasconcelos

In a recent paper, which can be downloaded here, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (when I say non-stationary it means integrated) is a tricky task. It is intuitive that the models will not work properly in the absence of cointegration (spurious case), but what the authors show is that even with cointegration, the average treatment effect (ATE) converges to a non-standard distribution. As a result, standard tests on the ATE will identify treatment effects in several cases when there is no effect at all.

For those who are not familiar with counterfactual models: normally, these models have a treated unit (we want to know the effects of the treatment) and several untreated units that we call peers. These units may be cities, countries, companies, etc. Assuming that only one unit was treated and that there is no contamination, we can use information from the peers to project the behaviour of the treated unit as if it was not treated, which is the counterfactual. The models should also be able to identify if the treatment had no effects on the treated unit.

Simulation tests

Here I am going to generate some cointegrated data with no treatment to show the behaviour of the Synthetic Control and the ArCo (Other techniques will have the same features). The data is very simple and it is based on the mentioned article’s example. We are going to generate 100 observations of 10 time-series: 9 random walks and one that is the sum of these random walks plus an error. This last variable is the treated unit. However, I will not include any treatment in the data and I want the models capture this feature. The hypothesis to be tested is that at t_0=51 we had some treatment in the treated unit.


T=100 # Number of Observations
n=10 # Number of Time Series
set.seed(1) # Seed for replication

# Generate random walk peers
for(i in 2:T){

# Generate the treated unit

# Put all the TS together

# Plot all TS. The black line is the "treated unit"

plot of chunk unnamed-chunk-16

First, I am going to estimate the ArCo. The $delta show the average treatment effect with 95% the confidence interval. As you can see, the test indicated that the treatment was statistically significant (the zero is very far from the confidence interval). The plot also show a dislocated counterfactual (blue line).

# Estimate the ArCo in the non-stationary data
arco=fitArCo(list(Y),cv.glmnet,predict,treated.unity = 1,t0=51)
##                  LB     delta        UB
## Variable1 -6.157713 -5.236705 -4.315697

plot of chunk unnamed-chunk-17

Now I am going to estimate the Synthetic Control. The Synth package does not have an implementation of a formal test. However, we can have a good idea from the counterfactual plot, which also indicates that the treatment was effective (even though there was no treatment).

# Estimate the Synthetic Control in the nonstationary data

# Put the data in the Synth package notation
    foo = synthdata,
    predictors = c("Y"),
    predictors.op = "mean",
    dependent = "Y",
    unit.variable = "unit",
    time.variable = "time",
    treatment.identifier = 1,
    controls.identifier = c(2:10),
    time.predictors.prior = c(1:50),
    time.optimize.ssr = c(1:50),
    time.plot = 1:100

plot of chunk unnamed-chunk-18

As you can see, both the ArCo and the Synthetic Control wrongly rejected the null hypothesis of no treatment. The best way to solve the problem is to take the first difference of all non-stationary time-series. However, there is an important thing to point out here. Imagine the random walk:

\displaystyle y_t = y_{t-1} + \varepsilon_t

Suppose we have a treatment in t=t_0 that simply makes y_{t_0} = c + y_{t_0-1}+\varepsilon_{t_0} only for t=t_0. For any t\neq t_0 we will have \Delta y_t=\varepsilon_t and for t=t_0 we will have \Delta y_{t_0}=c+\varepsilon_{t_0}. Therefore, if we take the first difference of y_t the treatment will have an impact only at t=t_0, which makes it impossible for any counterfactual model to identify an intervention. In this example, we would need an intervention that changed the drift of the Random Walk in a way that for all t\geq t_0 we have:

\displaystyle y_t=d+y_{t-1}+\varepsilon_t

This will be better illustrated in an example. The plot below shows the difference between the two treatments. The red line is the case we can’t identify if we take the first difference. The Blue line changes the drift of the Random Walk, and therefore we can identify. It is impossible, with the tools I am using here, to identify if we are in the black line or in the red line.

for(i in 2:T){
legend("topleft",legend=c("untreated","treated by a constant in t0","treated in the drift"),col=c(1,2,4),bty="n",seg.len=0.5,cex=0.8,lty=1)

plot of chunk unnamed-chunk-19

Now let’s get back to the first example. The results clearly indicated an intervention where there was none. Now I will show what happens if we take the first difference.

# Take the first difference
# Estimate the ArCo
darco=fitArCo(list(dY),cv.glmnet,predict,treated.unity = 1,t0=50)
##                   LB      delta        UB
## Variable1 -0.6892162 0.02737802 0.7439722

plot of chunk unnamed-chunk-20

# Adjust the data to the Synth and estimate the model

    foo = dsynthdata,
    predictors = c("Y"),
    predictors.op = "mean",
    dependent = "Y",
    unit.variable = "unit",
    time.variable = "time",
    treatment.identifier = 1,
    controls.identifier = c(2:10),
    time.predictors.prior = c(2:50),
    time.optimize.ssr = c(2:50),
    time.plot = 2:100


plot of chunk unnamed-chunk-21

As you can see, the $delta in the ArCo showed that the treatment effect was statistically zero, which is true. The Synthetic Control plot show similar results even though the model adjustment was not very good.

Finally, note that we have two scenarios. If there is no intervention and we do not take the first difference we will probably wrongly reject the null hypothesis of no intervention. However, if we are in the “red line” case of the plot and we take the first difference, we will probably not reject the null when it may be false.


Carvalho, Carlos, Ricardo Masini, and Marcelo C. Medeiros. “The perils of counterfactual analysis with integrated processes.” (2016).

This entry was posted in R and tagged , , , , , . Bookmark the permalink.

3 Responses to Counterfactual estimation on nonstationary data, be careful!!!

  1. Pingback: Counterfactual estimation on nonstationary data, be careful!!! – Cloud Data Architect

  2. Pingback: Counterfactual estimation on nonstationary data, be careful!!! – Mubashir Qasim

  3. Pingback: Counterfactual estimation on nonstationary data, be careful!!! | A bunch of data

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s