Writing Julia functions in R with examples

By Gabriel Vasconcelos

The Julia programming language is growing fast, and its efficiency and speed are now well known. Even though I think R is the best language for data science, sometimes we just need more. Modelling is an important part of data science, and sometimes you may need to implement your own algorithms or adapt existing models to your problems.

Posted in R | 8 Comments

Uber assignment with lpSolve

By Yuri Fonseca

In this post we are going to simulate an Uber assignment problem and use the simulation to calculate some waiting-time metrics.
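As a rough illustration of what an assignment problem looks like, here is a minimal sketch in Python. It is not the lpSolve formulation the post title refers to: it swaps in brute-force enumeration over all driver-to-passenger matchings, which is only feasible for very small instances. The coordinates, the helper names (`distance`, `best_assignment`) and the use of straight-line pickup distance as a stand-in for waiting time are all assumptions made for this example.

```python
from itertools import permutations
import math

# Hypothetical positions on a plane (made up for illustration).
drivers = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
passengers = [(0.0, 1.0), (4.0, 1.0), (1.0, 3.0)]


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def best_assignment(drivers, passengers):
    """Try every one-to-one matching and keep the one with the
    smallest total pickup distance (brute force, O(n!))."""
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(len(passengers))):
        # perm[i] is the passenger assigned to driver i
        cost = sum(distance(drivers[i], passengers[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


assignment, total_distance = best_assignment(drivers, passengers)
# Mean pickup distance, used here as a simple proxy for waiting time.
mean_pickup_distance = total_distance / len(drivers)
```

For realistic numbers of drivers and passengers, enumeration becomes infeasible very quickly, which is one reason to hand the same problem to a linear programming solver instead.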

Posted in R | 3 Comments

How do Random Forests improve simple Regression Trees?

By Gabriel Vasconcelos

Regression Trees

In this post I am going to discuss some features of Regression Trees and Random Forests. Regression Trees are known to be very unstable; in other words, a small change in your data may drastically change your model. The Random Forest turns this instability into an advantage through bagging (you can see details about bagging here), resulting in a very stable model.

Posted in R | 5 Comments

R Course in Rio de Janeiro

We are preparing an R for Data Science course (direct link here) in partnership with the IBPAD (Brazilian Institute of Research and Data Analysis). It is a great course for those who want to have a solid start in R. No prior knowledge of statistics, calculus or programming is required.

Posted in R | 4 Comments

Combining and comparing models using Model Confidence Set

By Gabriel Vasconcelos

In many cases, especially if you are dealing with forecasting models, it is natural to use a large set of models to forecast the same variable and then select the best model using some error measure. For example, you can break the sample into a training sample (in-sample) and a test sample (out-of-sample), estimate all models in the training sample and see how they perform in the test sample. You could compare the models using the root mean squared error (RMSE) or the mean absolute error (MAE).
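The train/test comparison described above can be sketched in a few lines of Python. This only illustrates ranking models by RMSE and MAE, not the Model Confidence Set procedure itself. The toy series, the split point and the two naive forecasters (`mean_forecast`, `last_value_forecast`) are hypothetical and chosen just to keep the example small.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy series, split into training and test samples.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
split = 6
train, test = series[:split], series[split:]


# Two deliberately naive "models" (assumptions for illustration).
def mean_forecast(train, horizon):
    """Forecast every future point with the training mean."""
    return [sum(train) / len(train)] * horizon


def last_value_forecast(train, horizon):
    """Forecast every future point with the last observed value."""
    return [train[-1]] * horizon


models = {"mean": mean_forecast, "last_value": last_value_forecast}

# Fit on the training sample, score on the test sample.
scores = {}
for name, model in models.items():
    forecast = model(train, len(test))
    scores[name] = {"rmse": rmse(test, forecast), "mae": mae(test, forecast)}

# Pick the model with the lowest out-of-sample RMSE.
best = min(scores, key=lambda name: scores[name]["rmse"])
```

Picking the single lowest-error model this way says nothing about whether the differences between models are statistically meaningful.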

Posted in R | 3 Comments

Pricing Optimization: How to find the price that maximizes your profit

By Yuri Fonseca

Basic idea

In this post we will briefly discuss pricing optimization. The main idea behind this problem is the following question: as the manager of a company or store, how much should I charge in order to maximize my revenue or profit?
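To make the question concrete, here is a minimal sketch under a strong assumption that is not taken from the post: demand is linear in price, q(p) = a - b*p, and each unit costs a fixed amount c. Profit is then (p - c) * (a - b*p), and setting its derivative a - 2*b*p + b*c to zero gives the candidate price p* = (a + b*c) / (2*b). The parameter values and function names below are hypothetical.

```python
def demand(price, a=100.0, b=2.0):
    """Assumed linear demand curve, floored at zero units."""
    return max(a - b * price, 0.0)


def profit(price, unit_cost=10.0, a=100.0, b=2.0):
    """Profit = margin per unit times units sold."""
    return (price - unit_cost) * demand(price, a, b)


def optimal_price(unit_cost=10.0, a=100.0, b=2.0):
    """Closed-form maximizer of profit under the linear-demand assumption."""
    return (a + b * unit_cost) / (2.0 * b)


best_price = optimal_price()
best_profit = profit(best_price)

# Sanity check: a coarse grid search over integer prices 0..50
# should land on the same price as the closed-form solution.
grid_best = max(range(0, 51), key=profit)
```

In practice the demand curve is not known and has to be estimated from data, so the closed form here is only as good as the assumed demand model.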

Posted in R | 5 Comments

Treating your data: The old school vs tidyverse modern tools

By Gabriel Vasconcelos

When I first started using R there was no such thing as the tidyverse. Although some of the tidyverse packages were available independently, I learned to treat my data mostly by brute force, combining pieces of information I had from several sources. It is very interesting to compare this old-school programming with tidyverse-style code that uses the magrittr package. Even if you want to stay old school, the tidyverse is here to stay, and it is the first tool taught in many data science courses based on R.

Posted in R | 15 Comments