By Gabriel Vasconcelos and Yuri Fonseca
We are happy to introduce our new machine learning method called Boosting Smooth Trees (BooST) (full article here). This model is joint work with professors Marcelo Medeiros and Álvaro Veiga. BooST uses a different type of regression tree that allows us to estimate the derivatives of very general nonlinear models. In other words, the model is differentiable and has an analytical solution. As a consequence, we can now estimate the partial effect of a characteristic on the response variable, which is much easier to interpret than traditional importance measures.
The ArCo (Artificial Counterfactual) package is now fully described in a paper in the R Journal (click here). There you can find details about the model, examples and applications on simulated and real data, and a comparison with the Synthetic Control method.
By Henrique Helfer Hoeltgebaum
I am happy to introduce the package HCmodelSets, which is now available on CRAN. This package implements the methods proposed by Cox, D.R. and Battey, H.S. (2017). In particular it performs the reduction, exploratory and model selection phases given in the aforementioned reference. The software supports linear regression, likelihood-based fitting of generalized linear regression models and the proportional hazards model fitted by partial likelihood.
In this previous post I discussed some of the parameters we have to tune to estimate a boosting model using the xgboost package. In this post I will discuss the two parameters that were left out of part I: gamma and min_child_weight. These two parameters are much less obvious to understand, but they can significantly change the results. Unfortunately, the best way to set them changes from dataset to dataset, and we have to test a few values to select the best model. Note that there are many other parameters in the xgboost package. I am only showing the ones I use most.
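To make the role of these two parameters more concrete, here is a minimal sketch in Python. It is not the xgboost source code: it follows the split-gain formula from the XGBoost paper, and the function names (split_gain, split_allowed) are hypothetical. It assumes a squared-error loss, where each observation has a hessian of 1, so the hessian sum of a child is simply its number of observations. The library itself may scale the gain differently and applies gamma during pruning.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split, following the formula in the XGBoost paper.

    g_* are sums of gradients and h_* are sums of hessians in each child.
    gamma is subtracted as a fixed penalty for adding one more leaf.
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def split_allowed(g_left, h_left, g_right, h_right,
                  reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Hypothetical helper: accept the split only if both rules pass."""
    # min_child_weight: each child needs a large enough hessian sum.
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    # gamma: the penalized gain must still be positive.
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# Three observations per child, gradient sums of -6 and +6.
print(split_gain(-6.0, 3.0, 6.0, 3.0))                            # 9.0
print(split_allowed(-6.0, 3.0, 6.0, 3.0, gamma=10.0))             # False
print(split_allowed(-6.0, 3.0, 6.0, 3.0, min_child_weight=4.0))   # False
```

In this sketch the same split is accepted with the default values, rejected when gamma exceeds the raw gain of 9, and rejected when min_child_weight exceeds the three observations in each child. Both parameters therefore make the trees more conservative, but through different rules.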
In the previous post about pricing optimization (link here), we briefly discussed linear demand and how to estimate optimal prices in that case. In this post we are going to compare three different types of demand models for homogeneous products and show how to find the optimal price for each of them.
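As a reminder of the linear case, here is a minimal sketch in Python. The symbols are assumptions for illustration and are not taken from the original post: demand is q = a - b * p, and there is a constant marginal cost c. Profit is (p - c) * (a - b * p), and setting its derivative to zero gives the optimal price p* = (a + b * c) / (2 * b).

```python
def linear_demand(price, a, b):
    """Quantity sold at a given price under the assumed demand q = a - b * p."""
    return a - b * price


def profit(price, a, b, cost):
    """Profit with a constant marginal cost (an assumption of this sketch)."""
    return (price - cost) * linear_demand(price, a, b)


def optimal_price(a, b, cost):
    """Closed-form maximizer of the profit function above."""
    return (a + b * cost) / (2 * b)


# Illustrative numbers only, not estimated from any data.
a, b, cost = 100.0, 2.0, 10.0
p_star = optimal_price(a, b, cost)
print(p_star)                      # 30.0
print(profit(p_star, a, b, cost))  # 800.0
print(profit(29.0, a, b, cost))    # 798.0
print(profit(31.0, a, b, cost))    # 798.0
```

Prices one unit above or below the optimum both yield a lower profit, which is a quick check that the closed form is a maximum for these values.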
By Gabriel Vasconcelos
Before we begin, I would like to thank Anuj for kindly including our blog in his list of the top 40 R blogs! Check out the full list at his page, FeedSpot.
Tuning a boosting algorithm for the first time can be very confusing. There are many parameters to choose, and each one affects the results differently. Also, the best choice may depend on the data. Every time I get a new dataset I learn something new. A good understanding of classification and regression trees (CART) is also helpful because we will be boosting trees. You can start here if you have no idea what a CART is.
There are several ways to do portfolio optimization out there, each with its advantages and disadvantages. We already discussed some techniques here. Today I am going to show another method of portfolio optimization that works very well on large datasets because it produces very robust weights, which results in good out-of-sample performance. This technique is called Parametric Portfolio Policies (PPP) and it was proposed by Brandt, Santa-Clara and Valkanov in 2009 (click here to read the full article).