The Julia programming language is growing fast and its efficiency and speed is now well-known. Even-though I think R is the best language for Data Science, sometimes we just need more. Modelling is an important part of Data Science and sometimes you may need to implement your own algorithms or adapt existing models to your problems.
If performance is not essential and the complexity of your problem is small, R alone is enough. However, if you need to run the same model several times on large datasets and available implementations are not suit to your problem, you will need to go beyond R. Fortunately, you can go beyond R in R, which is great because you can do your analysis in R and call complex models from elsewhere. The book “Extending R” from John Chambers presents interfaces in R for C++, Julia and Python. The last two are in the XRJulia and in the XRPython packages, which are very straightforward.
XRJulia (Small example)
Now I will show how to write a small function in Julia and call it from R. You can also call existing Julia functions and run Julia packages (see the book and the package documentation for more information).
First let’s see if there is an existing Julia application in the system:
library(XRJulia) findJulia(test = TRUE)
## [1] TRUE
Great! My Julia is up and running. A good way to define your Julia functions is though the JuliaEval (you can also use JuliaSource to run scripts). This function evaluates a Julia code and returns an object. However, this does not allow us to call the Julia function. Next we need to tell R that this object is a function with JuliaFunction. Note that everything I am doing here is in Julia 0.5.2 and there may be some differences if you use older or newer versions.
# = Define a function in Julia for a linear regression = # regjl <- juliaEval(" function reg(x,y) n=size(x,1) xreg=hcat(ones(size(x)[1],1),x) k=size(xreg,2) p1=((xreg'xreg)^(-1)) b=p1*xreg'y r=y-xreg*b sig=(r'r)/(n-k) vmat=sig[1]*p1 sigb=sqrt(diag(vmat)) t=b./sigb return (b,t) end ") # = Tell R regjl is a function = # regjl_function=JuliaFunction(regjl)
Now we can just call the Julia function as if it was an R function. The data will automatically be converted for you by the package. However, there are some possible pitfalls. For example, Julia’s coercion is different from R. It does not understand that a round float may be interpreted as an integer. If, for example, one of the arguments in the Julia function is used to index variables, it needs to be integer and you may have to convert in the Julia function or use L in R (0L, 1L).
Let us generate some goofy data just to run a regression. We have 30 explanatory variables. Once we call the Julia regression function we will get an object that prints something like this: Julia proxy object Server Class: Array{Array{Float64,1},1}; size: 2.
# = Generate Data = # set.seed(1) x=matrix(rnorm(9000),300,30) y=rnorm(300) # = Run Julia Regression = # test=regjl_function(x,y) # = Convert Julia object to R = # JLreg=juliaGet(test)[[1]] # = Run regression using lm and compare = # Rreg=coef(lm(y~x)) cbind(Rreg,JLreg)
## Rreg JLreg ## (Intercept) 0.039077527 0.039077527 ## x1 -0.007701335 -0.007701335 ## x2 -0.056821632 -0.056821632 ## x3 0.105108373 0.105108373 ## x4 0.067983306 0.067983306 ## x5 0.008417105 0.008417105 ## x6 0.078409799 0.078409799 ## x7 0.125869518 0.125869518 ## x8 -0.083682827 -0.083682827 ## x9 0.071740308 0.071740308 ## x10 -0.005219314 -0.005219314 ## x11 -0.031737503 -0.031737503 ## x12 -0.059554921 -0.059554921 ## x13 -0.006208602 -0.006208602 ## x14 -0.020050620 -0.020050620 ## x15 0.008620698 0.008620698 ## x16 -0.030422210 -0.030422210 ## x17 -0.070605635 -0.070605635 ## x18 0.085396818 0.085396818 ## x19 -0.039982253 -0.039982253 ## x20 0.066555544 0.066555544 ## x21 -0.059725849 -0.059725849 ## x22 -0.037880127 -0.037880127 ## x23 -0.009742695 -0.009742695 ## x24 0.074308087 0.074308087 ## x25 0.067308613 0.067308613 ## x26 0.049848935 0.049848935 ## x27 -0.020548783 -0.020548783 ## x28 -0.014468850 -0.014468850 ## x29 -0.038258085 -0.038258085 ## x30 0.102003013 0.102003013
A big Example: Performance
If you just needed an example to adapt to your own problems you do not need to read the rest of this post. However, it is interesting to see how a complex Julia call performs in R. The main question is: is it worthy to use Julia in R even with all the data conversions we need to do from R to Julia and Julia to R? I will answer with an example!!!
The model I will estimate is the Complete Subset Regression (CSR). Basically, it is a combination of many (really many!!) linear regressions. If you want more information on the CSR click here. In this design, we will need to estimate 4845 regressions. The Julia function for the CSR is in the chunk below and the R function can be downloaded from my github here or using install_github(“gabrielrvsc/HDeconometrics”) (devtools must be loaded).
# = Load Combinatorics package in Julia. You need to install it. = # # = In julia, type: Pkg.add("Combinatorics") comb=juliaEval("using Combinatorics") # = Evaluate the CSR function = # csrjl <- juliaEval(" function csr(x,y,k,K,fixed_controls) fixed_controls=floor(Int,fixed_controls) if fixed_controls!=0 w=x[:,fixed_controls] nonw=setdiff(1:size(x,2),fixed_controls) end if fixed_controls[1]==0 nonw=1:size(x,2) end save_stat=zeros(size(x,2),2) save_stat[:,2]=1:size(x,2) for i in nonw if fixed_controls[1]==0 (betas,tstat)=reg(x[:,i],y) save_stat[i,1]=abs(tstat[2]) end if fixed_controls!=0 (betas,tstat)=reg(hcat(w,x[:,i]),y) save_stat[i,1]=abs(tstat[end]) end end t_ord = sortperm(save_stat[:,1],rev=true) save_stat=save_stat[t_ord,:] selected=save_stat[1:K,2] comb=collect(Base.combinations(selected,k)) m=size(comb)[1] final_coef=zeros(m,size(x,2)) final_const=zeros(m) if fixed_controls!=0 for i in 1:m xr=hcat(x[:,fixed_controls],x[:,floor(Int,comb[i])]) (model,tstat)=reg(xr,y) final_const[i]=model[1] model1=model[2:end] if length(fixed_controls)>1 final_coef[i,fixed_controls]=model1[1:(size(fixed_controls)[1])] model2=model1[size(fixed_controls)[1]+1:end] else final_coef[i,fixed_controls]=model1[1] model2=model1[2:end] end final_coef[i,floor(Int,comb[i])]=model2 end elseif fixed_controls[1]==0 for i in 1:m (model,tstat)=reg(x[:,floor(Int,comb[i])],y) final_const[i]=model[1] final_coef[i,floor(Int,comb[i])]=model[2:end] end end aux=hcat(final_const,final_coef) result=[mean(aux[:,i]) for i in 1:size(aux)[2]] end ") # = Define it as a function to R = # csrjl_function=JuliaFunction(csrjl)
Finally, lets compare the R implementation and the Julia implementation. The R CSR is not as good as it could be, I still have some improvements to do. However, the Julia CSR is also far from perfect given my skills in Julia.
# = Run Julia CSR = # t1=Sys.time() test=csrjl_function(x,y,4L,20L,0) t2=Sys.time() t2-t1
## Time difference of 0.5599275 secs
# = Run R CSR = # t1=Sys.time() testR=HDeconometrics::csr(x,y) t2=Sys.time() t2-t1
## Time difference of 5.391584 secs
As you can see, the Julia implementation is nearly 10 times faster than the R implementation. Most fast machine learning functions implemented in R such as glmnet or RandomForest are not written in R. Instead, these functions calls other languages like C++ and Fortran. Now you can do these things yourself with Julia and Python, which are much easier to learn. The last chunk below just shows that the two functions return the same result.
cbind("R"=colMeans(coef(testR)),"Julia"=juliaGet(test))
## R Julia ## intersect 0.033839276 0.033839276 ## 0.000000000 0.000000000 ## -0.010053924 -0.010053924 ## 0.020620605 0.020620605 ## 0.011568725 0.011568725 ## 0.011807227 0.011807227 ## 0.010009517 0.010009517 ## 0.028981984 0.028981984 ## -0.008858019 -0.008858019 ## 0.013655475 0.013655475 ## 0.000000000 0.000000000 ## -0.008555735 -0.008555735 ## -0.014146991 -0.014146991 ## 0.000000000 0.000000000 ## 0.000000000 0.000000000 ## 0.000000000 0.000000000 ## 0.000000000 0.000000000 ## -0.010730430 -0.010730430 ## 0.015761635 0.015761635 ## -0.009543594 -0.009543594 ## 0.015337589 0.015337589 ## -0.008913380 -0.008913380 ## -0.009944097 -0.009944097 ## 0.000000000 0.000000000 ## 0.008933383 0.008933383 ## 0.010226204 0.010226204 ## 0.000000000 0.000000000 ## 0.000000000 0.000000000 ## 0.000000000 0.000000000 ## -0.007886220 -0.007886220 ## 0.021236322 0.021236322
Pingback: Writing Julia functions in R with examples – Cloud Data Architect
Pingback: Writing Julia functions in R with examples | A bunch of data
Pingback: Writing Julia functions in R with examples – Mubashir Qasim
Cutting and pasting your code in my machine gives:
I am running Debian stretch 64bits with package
julia 0.4.7-6+b3 amd64
LikeLike
Hi Fernando. It seems like, in your case, the Julia regression is repeating the intercept. I get the deprecated warning here too, on ubuntu. The problem is probably in My Julia regression function. You may try smaller functions if you like to check if they work.
LikeLike
It just hangs. And doesn’t return regjl. I am using Windows 7 Microsoft Open R 3.4.1
LikeLike
Hi ZJ. I have not really tested in in Windows. Would you send a print of the error?
LikeLike
Pingback: Distilled News | Data Analytics & R
Hi, this is great. The first example works for me in Windows 10 (after I got Julia installed and on the Path) but the second example gives different answers in R and Julia. It might be because Combinatorics is not installed properly? What should the value of
comb
be?LikeLike
Hi. It is hard to know. It seems like a Julia problem and it might be the package. Julia is relatively new and it changes a lot. It is hard to know for how long a code will remain working.
LikeLike
What packages do I need to run your code?
LikeLike
What package is reg() in?
LikeLike
Oops I see it now
LikeLike
Got it working now. Had to change sqrt() to sqrt.() and Base.combinations() to Combinatorics.combinations(). Julia was about 3 times faster than R. Thanks!
LikeLike
Gabriel, how would I modify this if the Julia code is in a separate text file, instead of string literals embedded in the R script? Thanks
LikeLike
You need a text editor such as Atom or visual Studio. Ir you installed Julia pro you should already have atom.
LikeLike
Hi, yes I mean, if the Julia code is in the file called “insightr.jl” which has something like
module insightr
using Combinatoris
function rep() …
end
function csr()…
end
end
how do I modify the R code, especially the calls the XRJulia? I can’t get it to work. Thanks.
LikeLike
got it working now
library(XRJulia)
library(HDeconometrics)
findJulia(test=TRUE) # checks that Julia exists
create Julia evaluator and get module combin
ev <- RJulia()
ev$AddToPath(getwd())
ev$Using(“combin”)
= Generate Data =
set.seed(1)
x=matrix(rnorm(90000),3000,30)
y=rnorm(3000)
call reg(x,y)
test=ev$Call(“reg”, x, y)
JLreg=juliaGet(test)[[1]]
run regression using lm and compare
Rreg=coef(lm(y~x))
check1 <- cbind(Rreg,JLreg)
more serious example
call Julia version of csr()
t1=Sys.time()
test=ev$Call(“csr”,x,y,4L,20L,0)
t2=Sys.time()
j_time <- as.numeric(t2-t1)
j_results <- juliaGet(test)
call R version of csr
t1=Sys.time()
testR=HDeconometrics::csr(x,y)
t2=Sys.time()
r_time <- as.numeric(t2-t1)
r_results <- colMeans(coef(testR))
check2 <- cbind(“R”=r_results,”Julia”=j_results)
speedup <- r_time/j_time
LikeLike
The problem with julia is there isn’t a clear position on what framework to use to contain your data and compatible with libraries.
There are too many, tables, tabletraits, dataframes, dataframes, iteratetables, queryverse, staticarrays, juliadb, dataframemeta, indexedtables and many many more.
The user would like to learn just one or two.
LikeLike