## Combining and comparing models using Model Confidence Set

By Gabriel Vasconcelos

In many cases, especially if you are dealing with forecasting models, it is natural to use a large set of models to forecast the same variable and then select the best model using some error measure. For example, you can break the sample into a training sample (in-sample) and a test sample (out-of-sample), estimate all models in the training sample and see how the perform in the test sample. You could compare the models using the root mean squared error (RMSE) or the mean absolute error (MAE).

If you want to make things a little more formal, then you can use some statistical test such as Diebold-Mariano to compare the models pairwise. This test will tell you if model A is better than model B. If you have $n$ models to compare you will need to perform $n(n-1)/2$ tests. If  $n$ is large you will have a lot of test results to look at and reporting the results will become a problem (imagine that you have 100 models, you will have 4950 p-values to report).

Fortunately, Hansen, Lunde and Nason proposed (here) a suitable solution in an article in 2011. Their solution is called Model Confidence Set (MCS). The MCS has an interpretation similar to a confidence interval, i. e. the $MCS_{1-\alpha}$ contains the best model with $100(1-\alpha)\%$ confidence. The number of models in the MCS increases as we decrease $\alpha$ just like the size of a confidence interval. You only need to perform the MCS procedure once to compare all models and you have to report only the p-value for each model, which is an output of the test.

## Application

Before going to the application, let us define a function that calculates the RMSE for several models at the same time, which we will be using a lot.

rmse=function(real,...){
forecasts=cbind(...)
sqrt(colMeans((real-forecasts)^2))
}


To the application!! We are going to generate some data with the following design:

• 300 observations ($t=1,\dots,300$) and 8 variables ($n=1,\dots,8$),
• All variables are generated from $y_{i,t}=\phi y_{i,t-1}+\lambda_if_t +\varepsilon_t$ with $\phi=0.6$, $\lambda$ is sampled from a uniform(-1,1) distribution and $\varepsilon_{i,t}$ is Normal with mean equals 0 and variance equals 4,
• $f_t$ is a common factor generated from $f_{t}=\rho f_{t-1}+\epsilon_t$ with $\rho=0.75$ and $\epsilon_t$ is Normal with mean equals 0 and variance equals 4,
• I generated two independent factors: four variables are generated from factor 1 and the remaining four variables are generated from factor 2.
• We assume that we do not know which factor generates each variable.
library(reshape2)
library(grid)
library(gridExtra)
library(ggplot2)
library(boot)
T = 300 # = number of observations = #
n = 8 # = Number of Variables = #
rho = 0.75 # = factors AR coefficient = #
phi = 0.6 # = Variables AR coefficient = #

set.seed(1) # = seed for replication = #

# = Generate two autoregressive factors = #
f = matrix(0, T, 2)
for(i in 2:T){
f[i, ] = rho * f[i-1, ] + rnorm(2)
}

# = Generate variables = #
Y = matrix(0, T, n)
for(i in 2:T){
# = Based on the first factor = #
Y[i, 1:4] = phi * Y[i-1, 1:4] + lambda[1:4] * f[i, 1] + rnorm(4, 0, 2)
# = Based on the second factor = #
Y[i, 5:8] = phi * Y[i-1, 5:8] + lambda[5:8] * f[i,2] + rnorm(4, 0, 2)
}

# = Organize variable lags = #
# = y is the first (dependent) variable = #
Y = as.data.frame(embed(Y, 2)[ , c(1, 9:16)])
# = y is the first (dependent) variable = #
colnames(Y) = c("y", paste("x", 1:n, sep = ""))


Now that we have our data, we are going to estimate all possible combinations of linear regressions using our 8 variables. The first variable will be the dependent variable, defined as $y$. The explanatory variables are the first lags of all eight variables (including the first), defined as $x_i, i=1,\dots,8$. This design results in 255 models to be estimated and compared.

# = First we will generate formulas for all the 255 equations = #
variables = colnames(Y)[-1]
combinations = lapply(1:n, function(x)combn(1:n, x))
formulas = lapply(combinations, function(y){
apply(y, 2, function(x){
paste("~", paste(variables[x], collapse = " + "), sep=" ")
})
})
formulas = unlist(formulas)

# = Now we break the sample into in-sample and out-of-sample = #
in_sample = Y[1:100, ]
out_of_sample = Y[-c(1:100), ]

# = Estimate the models = #
models = lapply(formulas, function(x) lm(paste("y", x), data = in_sample))
# = Compute forecasts = #
forecasts = Reduce(cbind, lapply(models, function(x) predict(x, out_of_sample)))
colnames(forecasts) = paste("model", 1:ncol(forecasts), sep = "")

# RMSES (Mean of all models, model with all variables)
rmse(out_of_sample$y,rowMeans(forecasts),forecasts[,ncol(forecasts)])  ## [1] 2.225175 2.322508  As you can see, the average forecast is more accurate than the model using all 8 variables. Now we are going to compute the MCS using the function below. If you want more details on the algorithm, just look at the original article here. mcs=function(Loss,R,l){ LbarB=tsboot(Loss,colMeans,R=R,sim="fixed",l=l)$t
Lbar=colMeans(Loss)
zeta.Bi=t(t(LbarB)-Lbar)

save.res=c()
for(j in 1:(ncol(Loss)-1)){
Lbardot=mean(Lbar)

t.stat=(Lbar-Lbardot)/sqrt(vard)
t.max=max(t.stat)
model.t.max=which(t.stat==t.max)

t.max.b=apply(t.stat.b,1,max)

p=length(which(t.max&lt;t.max.b))/R

save.res=c(save.res,p)
names(save.res)[j]=names(model.t.max)

Lbar=Lbar[-model.t.max]
zeta.Bi=zeta.Bi[,-model.t.max]

}
save.res=c(save.res,1)
names(save.res)[j+1]=names(Lbar)
save.p=save.res
for(i in 2:(length(save.res)-1)){
save.p[i]=max(save.res[i-1],save.res[i])
}
aux=match(colnames(Loss),names(save.p))
save.p=save.p[aux]
save.res=save.res[aux]
return(list(test=save.p,individual=save.res))
}


The MCS is computed in a single line of code, but first we are going to break the out-of-sample into two sub-samples. We are going to compute the MCS in the first sub-sample and compare the models in the second sub-sample. The MCS function returns the p-value needed to include each model in the set. For example, if model1 has a p-value=0.1, it will be included in the MCS of 90\% confidence.

# = generate 2 subsamples = #
out1 = out_of_sample[1:100, ]
out2 = out_of_sample[-c(1:100), ]
f1 = forecasts[1:100, ]
f2 = forecasts[-c(1:100),]

# = Compute MCS = #
set.seed(123) # = Seed for replication = #
MCS=mcs( Loss=(f1-out1$y)^2, R=1000, l=3 )$test # = R block bootstrap samples with block length = 3  = #
# = If you want to see the p-values write (print(sort(MCS)) = #
# = Selected models in the 50% MCS  = #
selected=which(MCS&gt;0.5)

# RMSE (Mean of all models, model with all variables, MCS models)
rmse(out2$y,rowMeans(f2),f2[,ncol(f2)],rowMeans(f2[,selected]))  ## [1] 2.142932 2.240110 2.013547  As you can see, if we combine only the models in the MCS the RMSE becomes smaller. This is not always the case, but it happens a lot. The combined results are better when the forecasts do not have a very high positive correlation. The MCS is not useful only to combine forecasts. Even if the combinations are not better than individual models, you still have a confidence set that includes the best model with the desired significance. If some model is excluded from the confidence set you have strong evidence that this model is worst than the ones in the set (depending on the confidence level, of course). Naturally, if you increase the confidence level, the number of models in the set also increases just like in confidence intervals. Finally, we can find some interesting relationships in the plots below. The first plot shows the forecasting errors as function of the confidence level. The optimal confidence level is around 55\%. If we increase it more the MCS starts to select a lot o bad models and the forecast becomes less accurate. The second plot shows the number of models as a function of the confidence level and the third plot shows the errors as a function of the number of models. alpha = seq(0,1,0.01) nmodels = sapply(alpha, function(x) length(which(MCS &gt;= x))) errors = sapply(alpha, function(x) rmse(out2$y, rowMeans(as.matrix(f2[ , which(MCS &gt;= x)]))))

df = data.frame(confidence = 1 - alpha, errors = errors, nmodels = nmodels)
dfmelt = melt(df, id.vars = "errors")

g1=ggplot(data=df) + geom_point(aes(x=confidence,y=errors))
g2=ggplot(data=df) + geom_point(aes(x=confidence,y=nmodels))
g3=ggplot(data=df) + geom_point(aes(x=nmodels,y=errors))
grid.arrange(g1,g2,g3,ncol=2)


## References

Hansen, Peter R., Asger Lunde, and James M. Nason. “The model confidence set.” Econometrica 79.2 (2011): 453-497.