Writing Julia functions in R with examples

By Gabriel Vasconcelos

The Julia programming language is growing fast and its efficiency and speed is now well-known. Even-though I think R is the best language for Data Science, sometimes we just need more. Modelling is an important part of Data Science and sometimes you may need to implement your own algorithms or adapt existing models to your problems.

If performance is not essential and the complexity of your problem is small, R alone is enough. However, if you need to run the same model several times on large datasets and available implementations are not suit to your problem, you will need to go beyond R. Fortunately, you can go beyond R in R, which is great because you can do your analysis in R and call complex models from elsewhere. The book “Extending R” from John Chambers presents interfaces in R for C++, Julia and Python. The last two are in the XRJulia and in the XRPython packages, which are very straightforward.

XRJulia (Small example)

Now I will show how to write a small function in Julia and call it from R. You can also call existing Julia functions and run Julia packages (see the book and the package documentation for more information).

First let’s see if there is an existing Julia application in the system:

library(XRJulia)
findJulia(test = TRUE)
## [1] TRUE

Great! My Julia is up and running. A good way to define your Julia functions is though the JuliaEval (you can also use JuliaSource to run scripts). This function evaluates a Julia code and returns an object. However, this does not allow us to call the Julia function. Next we need to tell R that this object is a function with JuliaFunction. Note that everything I am doing here is in Julia 0.5.2 and there may be some differences if you use older or newer versions.

# = Define a function in Julia for a linear regression = #
regjl <- juliaEval("
  function reg(x,y)
    n=size(x,1)
    xreg=hcat(ones(size(x)[1],1),x)
    k=size(xreg,2)
    p1=((xreg'xreg)^(-1))
    b=p1*xreg'y
    r=y-xreg*b
    sig=(r'r)/(n-k)
    vmat=sig[1]*p1
    sigb=sqrt(diag(vmat))
    t=b./sigb

    return (b,t)
  end
")

# = Tell R regjl is a function = #
regjl_function=JuliaFunction(regjl)

Now we can just call the Julia function as if it was an R function. The data will automatically be converted for you by the package. However, there are some possible pitfalls. For example, Julia’s coercion is different from R. It does not understand that a round float may be interpreted as an integer. If, for example, one of the arguments in the Julia function is used to index variables, it needs to be integer and you may have to convert in the Julia function or use L in R (0L, 1L).

Let us generate some goofy data just to run a regression. We have 30 explanatory variables. Once we call the Julia regression function we will get an object that prints something like this: Julia proxy object Server Class: Array{Array{Float64,1},1}; size: 2.

# = Generate Data = #
set.seed(1)
x=matrix(rnorm(9000),300,30)
y=rnorm(300)

# = Run Julia Regression = #
test=regjl_function(x,y)
# = Convert Julia object to R = #
JLreg=juliaGet(test)[[1]]

# = Run regression using lm and compare = #
Rreg=coef(lm(y~x))
cbind(Rreg,JLreg)
##                     Rreg        JLreg
## (Intercept)  0.039077527  0.039077527
## x1          -0.007701335 -0.007701335
## x2          -0.056821632 -0.056821632
## x3           0.105108373  0.105108373
## x4           0.067983306  0.067983306
## x5           0.008417105  0.008417105
## x6           0.078409799  0.078409799
## x7           0.125869518  0.125869518
## x8          -0.083682827 -0.083682827
## x9           0.071740308  0.071740308
## x10         -0.005219314 -0.005219314
## x11         -0.031737503 -0.031737503
## x12         -0.059554921 -0.059554921
## x13         -0.006208602 -0.006208602
## x14         -0.020050620 -0.020050620
## x15          0.008620698  0.008620698
## x16         -0.030422210 -0.030422210
## x17         -0.070605635 -0.070605635
## x18          0.085396818  0.085396818
## x19         -0.039982253 -0.039982253
## x20          0.066555544  0.066555544
## x21         -0.059725849 -0.059725849
## x22         -0.037880127 -0.037880127
## x23         -0.009742695 -0.009742695
## x24          0.074308087  0.074308087
## x25          0.067308613  0.067308613
## x26          0.049848935  0.049848935
## x27         -0.020548783 -0.020548783
## x28         -0.014468850 -0.014468850
## x29         -0.038258085 -0.038258085
## x30          0.102003013  0.102003013

A big Example: Performance

If you just needed an example to adapt to your own problems you do not need to read the rest of this post. However, it is interesting to see how a complex Julia call performs in R. The main question is: is it worthy to use Julia in R even with all the data conversions we need to do from R to Julia and Julia to R? I will answer with an example!!!

The model I will estimate is the Complete Subset Regression (CSR). Basically, it is a combination of many (really many!!) linear regressions. If you want more information on the CSR click here. In this design, we will need to estimate 4845 regressions. The Julia function for the CSR is in the chunk below and the R function can be downloaded from my github here or using install_github(“gabrielrvsc/HDeconometrics”) (devtools must be loaded).

# = Load Combinatorics package in Julia. You need to install it. = #
# = In julia, type: Pkg.add("Combinatorics")
comb=juliaEval("using Combinatorics")

# = Evaluate the CSR function = #
csrjl <- juliaEval("
function csr(x,y,k,K,fixed_controls)
fixed_controls=floor(Int,fixed_controls)
if fixed_controls!=0
w=x[:,fixed_controls]
nonw=setdiff(1:size(x,2),fixed_controls)
end
if fixed_controls[1]==0
nonw=1:size(x,2)
end

save_stat=zeros(size(x,2),2)
save_stat[:,2]=1:size(x,2)
for i in nonw
if fixed_controls[1]==0
(betas,tstat)=reg(x[:,i],y)
save_stat[i,1]=abs(tstat[2])
end
if fixed_controls!=0
(betas,tstat)=reg(hcat(w,x[:,i]),y)
save_stat[i,1]=abs(tstat[end])
end
end

t_ord = sortperm(save_stat[:,1],rev=true)
save_stat=save_stat[t_ord,:]

selected=save_stat[1:K,2]

comb=collect(Base.combinations(selected,k))
m=size(comb)[1]
final_coef=zeros(m,size(x,2))
final_const=zeros(m)

if fixed_controls!=0
for i in 1:m
xr=hcat(x[:,fixed_controls],x[:,floor(Int,comb[i])])
(model,tstat)=reg(xr,y)
final_const[i]=model[1]
model1=model[2:end]
if length(fixed_controls)>1
final_coef[i,fixed_controls]=model1[1:(size(fixed_controls)[1])]
model2=model1[size(fixed_controls)[1]+1:end]
else
final_coef[i,fixed_controls]=model1[1]
model2=model1[2:end]
end

final_coef[i,floor(Int,comb[i])]=model2
end
elseif fixed_controls[1]==0
for i in 1:m
(model,tstat)=reg(x[:,floor(Int,comb[i])],y)
final_const[i]=model[1]
final_coef[i,floor(Int,comb[i])]=model[2:end]
end
end
aux=hcat(final_const,final_coef)
result=[mean(aux[:,i]) for i in 1:size(aux)[2]]

end
")
# = Define it as a function to R = #
csrjl_function=JuliaFunction(csrjl)

Finally, lets compare the R implementation and the Julia implementation. The R CSR is not as good as it could be, I still have some improvements to do. However, the Julia CSR is also far from perfect given my skills in Julia.

# = Run Julia CSR = #
t1=Sys.time()
test=csrjl_function(x,y,4L,20L,0)
t2=Sys.time()
t2-t1
## Time difference of 0.5599275 secs
# = Run R CSR = #
t1=Sys.time()
testR=HDeconometrics::csr(x,y)
t2=Sys.time()
t2-t1
## Time difference of 5.391584 secs

As you can see, the Julia implementation is nearly 10 times faster than the R implementation. Most fast machine learning functions implemented in R such as glmnet or RandomForest are not written in R. Instead, these functions calls other languages like C++ and Fortran. Now you can do these things yourself with Julia and Python, which are much easier to learn. The last chunk below just shows that the two functions return the same result.

cbind("R"=colMeans(coef(testR)),"Julia"=juliaGet(test))
##                      R        Julia
## intersect  0.033839276  0.033839276
##            0.000000000  0.000000000
##           -0.010053924 -0.010053924
##            0.020620605  0.020620605
##            0.011568725  0.011568725
##            0.011807227  0.011807227
##            0.010009517  0.010009517
##            0.028981984  0.028981984
##           -0.008858019 -0.008858019
##            0.013655475  0.013655475
##            0.000000000  0.000000000
##           -0.008555735 -0.008555735
##           -0.014146991 -0.014146991
##            0.000000000  0.000000000
##            0.000000000  0.000000000
##            0.000000000  0.000000000
##            0.000000000  0.000000000
##           -0.010730430 -0.010730430
##            0.015761635  0.015761635
##           -0.009543594 -0.009543594
##            0.015337589  0.015337589
##           -0.008913380 -0.008913380
##           -0.009944097 -0.009944097
##            0.000000000  0.000000000
##            0.008933383  0.008933383
##            0.010226204  0.010226204
##            0.000000000  0.000000000
##            0.000000000  0.000000000
##            0.000000000  0.000000000
##           -0.007886220 -0.007886220
##            0.021236322  0.021236322
This entry was posted in R and tagged , , , , . Bookmark the permalink.

19 Responses to Writing Julia functions in R with examples

  1. Pingback: Writing Julia functions in R with examples – Cloud Data Architect

  2. Pingback: Writing Julia functions in R with examples | A bunch of data

  3. Pingback: Writing Julia functions in R with examples – Mubashir Qasim

  4. Fernando TUSELL says:

    Cutting and pasting your code in my machine gives:

    test=regjl_function(x,y)
    WARNING: [a,b] concatenation is deprecated; use [a;b] instead
    in depwarn at ./deprecated.jl:73
    while loading no file, in expression starting on line 0

    = Convert Julia object to R =

    JLreg=juliaGet(test)[[1]]

    = Run regression using lm and compare =

    Rreg=coef(lm(y~x))
    cbind(Rreg,JLreg)
    Rreg JLreg
    (Intercept) 0.039077527 0.03907753
    x1 -0.007701335 0.03907753
    x2 -0.056821632 0.03907753
    …….

    I am running Debian stretch 64bits with package
    julia 0.4.7-6+b3 amd64

    Like

    • insightr says:

      Hi Fernando. It seems like, in your case, the Julia regression is repeating the intercept. I get the deprecated warning here too, on ubuntu. The problem is probably in My Julia regression function. You may try smaller functions if you like to check if they work.

      Like

  5. ZJ Dai says:

    It just hangs. And doesn’t return regjl. I am using Windows 7 Microsoft Open R 3.4.1

    Like

  6. Pingback: Distilled News | Data Analytics & R

  7. manukajoe says:

    Hi, this is great. The first example works for me in Windows 10 (after I got Julia installed and on the Path) but the second example gives different answers in R and Julia. It might be because Combinatorics is not installed properly? What should the value of comb be?

    test=csrjl_function(x,y,4L,20L,0)
    Error: Evaluating Julia expression: R_1_29(reshape([-0.626453810742333,0.183643324222082,-0.835628612410047,…..

    Like

  8. insightr says:

    Hi. It is hard to know. It seems like a Julia problem and it might be the package. Julia is relatively new and it changes a lot. It is hard to know for how long a code will remain working.

    Like

    • manukajoe says:

      What packages do I need to run your code?

      Like

    • manukajoe says:

      Gabriel, how would I modify this if the Julia code is in a separate text file, instead of string literals embedded in the R script? Thanks

      Like

      • insightr says:

        You need a text editor such as Atom or visual Studio. Ir you installed Julia pro you should already have atom.

        Like

      • manukajoe says:

        Hi, yes I mean, if the Julia code is in the file called “insightr.jl” which has something like

        module insightr
        using Combinatoris
        function rep() …
        end
        function csr()…
        end
        end

        how do I modify the R code, especially the calls the XRJulia? I can’t get it to work. Thanks.

        Like

      • manukajoe says:

        got it working now

        library(XRJulia)
        library(HDeconometrics)

        findJulia(test=TRUE) # checks that Julia exists

        create Julia evaluator and get module combin

        ev <- RJulia()
        ev$AddToPath(getwd())
        ev$Using(“combin”)

        = Generate Data =

        set.seed(1)
        x=matrix(rnorm(90000),3000,30)
        y=rnorm(3000)

        call reg(x,y)

        test=ev$Call(“reg”, x, y)
        JLreg=juliaGet(test)[[1]]

        run regression using lm and compare

        Rreg=coef(lm(y~x))
        check1 <- cbind(Rreg,JLreg)

        more serious example

        call Julia version of csr()

        t1=Sys.time()
        test=ev$Call(“csr”,x,y,4L,20L,0)
        t2=Sys.time()
        j_time <- as.numeric(t2-t1)
        j_results <- juliaGet(test)

        call R version of csr

        t1=Sys.time()
        testR=HDeconometrics::csr(x,y)
        t2=Sys.time()
        r_time <- as.numeric(t2-t1)
        r_results <- colMeans(coef(testR))

        check2 <- cbind(“R”=r_results,”Julia”=j_results)
        speedup <- r_time/j_time

        Like

  9. juan says:

    The problem with julia is there isn’t a clear position on what framework to use to contain your data and compatible with libraries.
    There are too many, tables, tabletraits, dataframes, dataframes, iteratetables, queryverse, staticarrays, juliadb, dataframemeta, indexedtables and many many more.
    The user would like to learn just one or two.

    Like

Leave a comment