Using the tuber package to analyse a YouTube channel

By Gabriel Vasconcelos

I decided to take a quick look at the tuber package, which extracts YouTube data in R. My cousin is a singer (a hell of a good one) and he has a YouTube channel (dan vasc), which I strongly recommend, where he posts his covers. I will focus my analysis on his channel. The tuber package is very friendly: it uses the YouTube API to download statistics on comments, views, likes and more straight into R.

Posted in R | 7 Comments

A crazy day in the Bitcoin World

By Gabriel Vasconcelos

Today, November 29, 2017, was a crazy day in the Bitcoin world, and the craziness is still going on as I write this post. The price swung by thousands of dollars within a few hours. Bitcoin was the main topic today in all the discussion groups I participate in. Some people believe we are in the middle of a giant bubble and are very skeptical about Bitcoin's intrinsic value, while others believe cryptocurrencies are the future and are already counting on a price of hundreds of thousands of dollars in a few years. I am no expert and I have no idea which group is right, but I hope it is the second, because I really like the idea of Bitcoin as the money of the future.

Posted in R | 8 Comments

Formal ways to compare forecasting models: Rolling windows

By Gabriel Vasconcelos

Overview

When working with time-series forecasting, we often have to choose between a few candidate models, and the best way to do so is to test each model in pseudo-out-of-sample estimations. In other words, we simulate a forecasting situation in which we drop some data from the estimation sample to see how each model performs.
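As a rough illustration of the idea, the sketch below runs a rolling-window comparison in base R. Everything in it is an assumption made for the example and is not taken from the full post: the series is simulated, the window has 100 observations, and the two competing models are an AR(1) fitted by OLS and a simple window-mean benchmark.

```r
set.seed(1)

# Simulated AR(1) series (illustrative data, not from the post)
n <- 200
y <- as.numeric(arima.sim(list(ar = 0.6), n = n))

window <- 100          # size of each estimation window (assumed)
n_fc <- n - window     # number of one-step-ahead forecasts

fc_ar <- numeric(n_fc)    # forecasts from an AR(1) fitted by OLS
fc_mean <- numeric(n_fc)  # forecasts from the window mean (benchmark)

for (i in seq_len(n_fc)) {
  yw <- y[i:(i + window - 1)]        # observations in the current window
  fit <- lm(yw[-1] ~ yw[-window])    # regress y_t on y_{t-1}
  b <- coef(fit)
  fc_ar[i] <- b[1] + b[2] * yw[window]  # forecast of the next observation
  fc_mean[i] <- mean(yw)
}

# Observations that were held out of each window
actual <- y[(window + 1):n]

rmse <- function(e) sqrt(mean(e^2))
errors <- c(ar1 = rmse(actual - fc_ar), mean = rmse(actual - fc_mean))
print(errors)
```

Each pass of the loop re-estimates both models on the most recent 100 observations only and forecasts the next point, so every forecast is made without seeing the value it is compared against.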

Posted in R | 7 Comments

Writing Julia functions in R with examples

By Gabriel Vasconcelos

The Julia programming language is growing fast, and its efficiency and speed are now well known. Even though I think R is the best language for Data Science, sometimes we just need more. Modelling is an important part of Data Science, and sometimes you may need to implement your own algorithms or adapt existing models to your problems.

Posted in R | 8 Comments

Uber assignment with lpSolve

By Yuri Fonseca

In this post we are going to simulate an Uber assignment problem and use the simulation to calculate some waiting-time metrics.
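A minimal sketch of what such an assignment could look like is given below. It assumes the lpSolve package is installed and uses its `lp.assign()` function. The setup is hypothetical and not taken from the full post: five cars and five passengers placed at random on a unit square, with straight-line distance standing in for waiting time.

```r
library(lpSolve)  # assumes lpSolve is installed

set.seed(1)
n <- 5
cars <- matrix(runif(2 * n), ncol = 2)        # hypothetical car positions
passengers <- matrix(runif(2 * n), ncol = 2)  # hypothetical passenger positions

# cost[i, j] = Euclidean distance from car i to passenger j
all_dist <- as.matrix(dist(rbind(cars, passengers)))
cost <- all_dist[1:n, (n + 1):(2 * n)]

# Solve the assignment problem: one car per passenger, minimum total distance
sol <- lp.assign(cost, direction = "min")

# Passenger assigned to each car (position of the 1 in each row)
assigned <- apply(sol$solution, 1, which.max)

# Distance each passenger waits for, used here as a proxy for waiting time
wait <- cost[cbind(1:n, assigned)]
print(summary(wait))
```

Repeating this over many random draws of positions would give a distribution of waiting distances, which is one way to turn a single assignment into simulation-based metrics.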

Posted in R | 3 Comments

How do Random Forests improve simple Regression Trees?

By Gabriel Vasconcelos

Regression Trees

In this post I am going to discuss some features of Regression Trees and Random Forests. Regression Trees are known to be very unstable: a small change in your data may drastically change your model. The Random Forest turns this instability into an advantage through bagging (you can see details about bagging here), resulting in a very stable model.
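The sketch below illustrates the bagging step only, using the rpart package that ships with standard R installations. It is not a full Random Forest, which additionally samples candidate variables at each split. The data, the number of bootstrap samples and the prediction grid are all assumptions made for the example.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)

# Illustrative data: a noisy sine curve (not from the post)
n <- 200
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = sin(2 * x) + rnorm(n, sd = 0.3))
grid <- data.frame(x = seq(-2, 2, length.out = 50))

# One regression tree fitted on the full sample
single_tree <- rpart(y ~ x, data = dat)
pred_single <- predict(single_tree, newdata = grid)

# Bagging: fit one tree per bootstrap sample, then average the predictions
B <- 100  # number of bootstrap samples (assumed)
boot_preds <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  predict(rpart(y ~ x, data = dat[idx, ]), newdata = grid)
})
pred_bagged <- rowMeans(boot_preds)

# How much the individual trees disagree at each grid point
tree_sd <- apply(boot_preds, 1, sd)
print(summary(tree_sd))
```

The columns of `boot_preds` show the instability directly: each bootstrap sample yields a somewhat different tree. Averaging across columns smooths those differences out, which is the stabilising effect the paragraph refers to.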

Posted in R | 5 Comments

R Course in Rio de Janeiro

We are preparing an R for Data Science course (direct link here) in partnership with IBPAD (Brazilian Institute of Research and Data Analysis). It is a great course for those who want a solid start in R. No prior knowledge of statistics, calculus or programming is required.

Posted in R | 4 Comments