By Gabriel Vasconcelos & Yuri Fonseca
This post is the second in a series of examples of the BooST (Boosting Smooth Trees) model. You can see an introduction to the model here and the first example here. Our objective in this post is to use the derivatives of the BooST to obtain the prices that maximize profit for a given set of products. We will use a very simple setup in which we know the true optimal prices, so we can compare them with the estimated prices. The tricky part here is that the demand functions we defined are for substitute products: if we increase the price of product A, the demand for product B increases.
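The idea can be sketched with hypothetical linear demands (these are not the demand functions used in the post, just an illustration of substitute products and profit maximization). Note the positive cross-price terms: raising one product's price increases demand for the other.

```r
# Hypothetical linear demands for two substitute products
# (illustrative only; the post uses different demand functions).
q1 <- function(p1, p2) 10 - 2 * p1 + 0.5 * p2
q2 <- function(p1, p2) 10 - 2 * p2 + 0.5 * p1

# Total profit, assuming zero costs for simplicity
profit <- function(p) p[1] * q1(p[1], p[2]) + p[2] * q2(p[1], p[2])

# Maximize profit numerically (optim minimizes, so we negate)
opt <- optim(c(1, 1), function(p) -profit(p))
opt$par  # estimated optimal prices
```

In this symmetric example the first-order conditions give p1 = p2 = 10/3, which the numerical optimizer recovers; the BooST replaces the known analytical derivatives with estimated ones.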
This will be a short post about a simple but very important concept that can drastically increase the speed of poorly written code. It is very common to see R loops written as follows:
v = NULL
n = 1e5
for(i in 1:n) v = c(v, i)
This seems like a natural way to write such a task: at each iteration, we grow the vector v by one more element.
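The problem is that growing a vector forces R to copy it at every iteration, which makes the loop quadratic in n. A minimal sketch of the fix, preallocating the vector once and filling it in place:

```r
# Slow: v is copied and extended at every iteration (O(n^2) overall)
slow <- function(n) {
  v <- NULL
  for (i in 1:n) v <- c(v, i)
  v
}

# Fast: preallocate the full vector once, then fill it in place
fast <- function(n) {
  v <- integer(n)
  for (i in 1:n) v[i] <- i
  v
}
```

Comparing the two with `system.time()` makes the difference obvious even for moderate n; of course, for this particular task the vectorized `1:n` is faster still.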
This is the first of a series of posts on the BooST (Boosting Smooth Trees). If you missed the post introducing the model, click here, and if you want to see the full article, click here. The BooST is a model that uses Smooth Trees as base learners, which makes it possible to approximate the derivative of the underlying model. In this post, we will show examples on generated data of how the BooST approximates the derivatives, and we will also discuss why the BooST may be a better choice than the usual discrete Regression Trees when dealing with smooth functions.
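To give some intuition for why the derivative is available at all, here is a sketch of a logistic transition function of the kind used by smooth trees (an assumption based on the post's description; the exact BooST internals may differ). Instead of a hard split I(x > c), each observation goes to the right child with weight s(x) and to the left with 1 - s(x), so the fitted function is differentiable everywhere:

```r
# Smooth (logistic) split: differentiable replacement for I(x > c0).
# gamma controls how sharp the transition is; large gamma approaches
# the discrete tree's step function.
logistic_split <- function(x, c0, gamma) {
  1 / (1 + exp(-gamma * (x - c0)))
}

x      <- seq(-2, 2, by = 0.01)
hard   <- as.numeric(x > 0)          # discrete tree: step, no derivative at 0
smooth <- logistic_split(x, 0, 3)    # smooth tree: differentiable weights

# The derivative of the smooth split exists in closed form:
ds <- 3 * smooth * (1 - smooth)      # d/dx of the logistic function
```

Because every split has this closed-form derivative, derivatives of the whole ensemble can be accumulated analytically rather than by finite differences.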
By Gabriel Vasconcelos and Yuri Fonseca
We are happy to introduce our new machine learning method called Boosting Smooth Trees (BooST) (full article here). This model is joint work with Professors Marcelo Medeiros and Álvaro Veiga. The BooST uses a different type of regression tree that allows us to estimate the derivatives of very general nonlinear models. In other words, the model is differentiable and its derivatives have an analytical expression. The consequence is that we can now estimate the partial effects of a characteristic on the response variable, which provides much more interpretability than traditional importance measures.
The ArCo (Artificial Counterfactual) package is now fully described in a paper in the R Journal (click here). There you can find details about the model, examples and applications on simulated and real data, and a comparison with the Synthetic Control.
By Henrique Helfer Hoeltgebaum
I am happy to introduce the package HCmodelSets, which is now available on CRAN. This package implements the methods proposed by Cox, D.R. and Battey, H.S. (2017). In particular, it performs the reduction, exploratory, and model selection phases given in the aforementioned reference. The software supports linear regression, likelihood-based fitting of generalized linear regression models, and the proportional hazards model fitted by partial likelihood.
In a previous post I discussed some of the parameters we have to tune when estimating a boosting model with the xgboost package. In this post I will discuss the two parameters that were left out of part I: gamma and min_child_weight. These two parameters are much less obvious to understand, but they can significantly change the results. Unfortunately, the best way to set them changes from dataset to dataset, and we have to test a few values to select the best model. Note that there are many other parameters in the xgboost package; I am only showing the ones I use most often.
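For reference, a minimal sketch of where these two parameters sit in an xgboost parameter list (the values 0.1 and 5 are purely illustrative, not recommendations):

```r
# Hypothetical parameter list for the R xgboost package.
# gamma: minimum loss reduction required to make a further split;
# min_child_weight: minimum sum of instance weights (hessian) in a child.
# Larger values of either make the model more conservative.
params <- list(
  eta              = 0.1,  # learning rate
  max_depth        = 4,
  gamma            = 0.1,  # illustrative value only
  min_child_weight = 5     # illustrative value only
)

# Fitting would then look like (requires the xgboost package and data):
# model <- xgboost::xgboost(data = X, label = y, params = params,
#                           nrounds = 100, verbose = 0)
```

Since the best values are dataset-dependent, in practice one would try a grid of values for both parameters and pick the combination with the best cross-validated error.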