## When the LASSO fails?

By Gabriel Vasconcelos

The LASSO has two important uses: forecasting and variable selection. This post is about the second. The goal of variable selection is to recover the correct set of variables that generate the data, or at least the best approximation given the candidate variables. The LASSO has attracted a lot of attention lately because it allows us to estimate a linear regression with thousands of variables and lets the model select the right ones for us. However, what many people overlook is when the LASSO fails.

## Non-Gaussian time series, let’s handle them with score-driven models!

By Henrique Helfer

## Motivation

Until very recently, only a very limited class of feasible non-Gaussian time series models was available. For example, one could use extensions of state space models to non-Gaussian environments (see, for example, Durbin and Koopman (2012)), but extensive Monte Carlo simulation is required to numerically evaluate the conditional densities that define the estimation process of such models.

## Complete Subset Regressions, simple and powerful

By Gabriel Vasconcelos

Complete subset regressions (CSR) is a forecasting method proposed by Elliott, Gargano and Timmermann in 2013. It is a very simple but powerful technique. Suppose you have a set of variables and you want to forecast one of them using information from the others. If your variables are highly correlated and the variable you want to predict is noisy, you will have collinearity problems and in-sample overfitting, because the model will try to fit the noise.

## Bagging, the perfect solution for model instability

By Gabriel Vasconcelos

## Motivation

The name bagging comes from bootstrap aggregating. It is a machine learning technique proposed by Breiman (1996) to increase stability in potentially unstable estimators. For example, suppose you want to run a regression with a few variables in two steps. First, you run the regression with all the variables in your data and select the significant ones. Second, you run a new regression using only the selected variables and compute the predictions.

## Problems of causal inference after selecting controls

By Gabriel Vasconcelos

## Inference after model selection

In many cases, when we want to estimate a causal relationship between two variables, we have to solve the problem of selecting the right control variables. If we fail, our results will be very fragile and the estimator potentially biased, because we left some important control variables out. This problem is known as omitted variable bias. It happens because some variables correlated with our variable of interest were left out and ended up in the error term, making the errors correlated with the variable of interest.
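As a small worked illustration (the symbols below are hypothetical and not part of the original post), suppose the true model has one variable of interest $x_i$ and one control $z_i$, with an error $u_i$ that is uncorrelated with both:

$y_i=\beta x_i+\gamma z_i+u_i$

If we regress $y_i$ on $x_i$ alone (with an intercept), $z_i$ moves into the error term, and the OLS slope converges to

$\hat{\beta}\rightarrow\beta+\gamma\frac{Cov(x_i,z_i)}{Var(x_i)}$

The bias vanishes only if the omitted control is irrelevant ($\gamma=0$) or uncorrelated with the variable of interest ($Cov(x_i,z_i)=0$).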

## Really, Really Big VARs

By Gabriel Vasconcelos

## Overview

If you have studied Vector Autoregressive (VAR) models you are probably familiar with the “curse of dimensionality” (CD). It is very frustrating to see how ordinary least squares (OLS) fails to produce reliable results even for moderate-size VARs. For those who are new to VARs, the CD means that the number of parameters to estimate grows very fast with the size of the model. Consider the VAR(1):

$y_t=c+Ay_{t-1}+\varepsilon_t$
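To make the growth concrete, here is a minimal sketch of the parameter count. It assumes $y_t$ stacks n variables, so that $c$ is an n-vector of intercepts and $A$ is an n-by-n coefficient matrix. The function name `var_param_count` and the extension to several lags are illustrative assumptions, not something taken from the post, and the error covariance matrix is not counted.

```python
def var_param_count(n_vars: int, n_lags: int = 1) -> int:
    """Count the mean-equation parameters of a VAR (hypothetical helper).

    Assumes one intercept per equation and one n_vars x n_vars
    coefficient matrix per lag; the error covariance is excluded.
    """
    intercepts = n_vars
    lag_coefficients = n_lags * n_vars ** 2
    return intercepts + lag_coefficients


# The count grows quadratically in the number of variables.
for n in (3, 10, 50):
    print(n, var_param_count(n))
```

Under these assumptions a VAR(1) with 3 variables has 12 parameters, one with 10 variables has 110, and one with 50 variables has 2550, which is easily more than the number of observations in a typical macroeconomic sample.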

## New Publication: Real-time inflation forecasting with high-dimensional models: the case of Brazil

Check out our new publication on forecasting inflation using large datasets and statistical learning techniques.

Real-time inflation forecasting with high-dimensional models: The case of Brazil
International Journal of Forecasting (2017)
Márcio Garcia, Marcelo C. Medeiros, Gabriel F. R. Vasconcelos