Accelerating R

code
analysis
Author
Affiliation

University of Toronto Scarborough

Published

July 21, 2023

One of my favourite types of data to work with is real estate transactions. There is always something interesting to be learned from using hedonic models to tease out or quantify the implicit preferences of homebuyers (and renters) with respect to the old adage of “location, location, location!”.

Such data tend to exhibit strong spatial autocorrelation and its associated ills. In response, I often turn to spatial econometric models that take spatial relationships among observations into account. R has a great ecosystem of packages for spatial econometrics, including the {spatialreg} package that I use often.

However, running these models with many observations can be a slow process in R.

This is due to the computational costs associated with the manipulation of (potentially very large) spatial weights matrices. When using R’s default math libraries, this can lead to models taking hours or even days to run. Moreover, there isn’t much benefit to just getting better computer hardware: open up task manager on Windows or Activity Monitor on the Mac and you will see that R is only using one CPU core.

Accelerating R

Figure 1: Moar fasteR!

Under the hood, R uses two libraries for performing common mathematical operations: Basic Linear Algebra Subprograms (BLAS) and LAPACK (Linear Algebra Package). R ships with “reference” versions of BLAS and LAPACK that will return identical results across any installation of R, but are otherwise un-optimized (see Eddelbuettel, 2016).

Thankfully, there are several solutions out there for replacing R’s reference libraries with more optimized ones that can lead to dramatic reductions in computational time. Since I use both Windows (Intel processors) and macOS (Intel and M-series processors) in my work, below you can find instructions for speeding up R in those two environments. AFAIK Linux users and those with AMD processors can utilize OpenBLAS to acheve the same results, but I don’t have any experience in linking these libraries with R.

Warning

These solutions require some knowledge of the file system and administrator privileges (Windows) and the Terminal (macOS). Like R itself, I offer no warranty or support of any kind for these changes! Indeed, customizing your R install might (and has in the past - see below) broken some R packages. Moreover, and as the R team cautions, “Note that fast BLAS implementations may give different (and often slightly less accurate) results than the reference BLAS included in R”. It is pretty easy to revert these changes if you need to, but certainly stop here if you aren’t comfortable with the risk.

Windows and Intel

My initial search for ways to speed up R led to me adopting of Microsoft’s R Open, which, by default, used Intel’s Math Kernel Library (MKL). The MKL uses versions of the BLAS and LAPACK libraries that are optimized for fast multithreaded performance on Intel’s processors. Sure enough, this led to a dramatic reduction in computational time associated with running spatial econometric models. However, Microsoft discontinued the R Open project with the last release using R 4.0.2 in 2020.

As I further scoured the web for solutions, I came across this stackoverflow thread, which had details for how to manually link Intel’s MKL with more recent versions of R. The write up is a bit complex, with some more recent working solutions now found within comments to the original answers in the thread. So here is a more streamlined version.

First, you will need an install of R on your PC. At the time of writing, the current version is 4.3.1, so my file paths below will use that example. Second, download and install Intel’s MKL, which is now part of their larger oneAPI product. I used the offline installer for Windows found here. This downloads oneMKL version 2023.2.0 (the new 2024 version has a very different file structure that I will need to troubleshoot). With these two software packages installed, now we have to do some manual steps (accepting the administrator rights prompts along the way):

  1. copy all the files inside the C:\Program Files (x86)\Intel\oneAPI\mkl\2023.2.0\redist\intel64 folder and paste them inside C:\Program Files\R\R-4.3.1\bin\x64
  2. copy all the files inside the C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\windows\redist\intel64_win\compiler folder and paste them inside C:\Program Files\R\R-4.3.1\bin\x64
  3. in the C:\Program Files\R\R-4.3.1\bin\x64 folder, rename the existing Rblas.dll to something like Rblas_ref.dll
  4. rename the existing Rlapack.dll to something like Rlapack_ref.dll
  5. make two copies of mkl_rt.2.dll
  6. rename one of the copies Rblas.dll
  7. rename the other copy Rlapack.dll

Large math operations should now use all of your CPU’s logical cores.

Issues and Reversion

The one issue I have encountered in the past with linking the MKL to R is that the {igraph} package (and those depending on it) stopped working. According to a recent answer on the original stackoverflow thread that inspired this, the issue has recently been addressed.

To undo these customizations, you would have to either rename (or delete) the two MKL files we renamed to be Rblas.dll and Rlapack.dll and substitute the original reference files back in by renaming them to their original names.

macOS

Accelerating R on a (newer) Mac is more straightforward, and instructions for doing so are actually contained within the official R for macOS FAQ. R for macOS ships with vecLib, which is part of Apple’s Accelerate framework of optimizations for Apple hardware. All you have to do is tell R to use the vecLib BLAS rather than the reference one.

To do so, open Terminal and run:

  • cd /Library/Frameworks/R.framework/Resources/lib

followed by:

  • ln -sf libRblas.vecLib.dylib libRblas.dylib

On an Intel Mac, this will result in optimized math functions using all available CPU cores. On M-series processors, it looks like this behaves a bit differently, as optimized math routines are instead handled by the specialized AMX co-processor, so you will not see full CPU utilization.

Issues and Reversion

I have not run into any compatibility issues using R with Apple’s Accelerate framework. To revert the changes, run the following two commands in Terminal to point R back to the original reference BLAS:

  • cd /Library/Frameworks/R.framework/Resources/lib
  • ln -sf libRblas.0.dylib libRblas.dylib

Checking that it works

A simple check to ensure R is using the optimized libraries is to run the following code from this answer on stackoverflow.

m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (S <- svd (A,nu=0,nv=0))
   user  system elapsed 
  2.314   0.167   1.875 

On my PC with an 18-core Intel Xeon processor, this takes about 40 seconds without using the MKL. After linking the libraries, the computation time drops to just over 3 seconds! My MacBook Pro with an M2 processor is even faster at about 2 seconds. In effect, this drops my modelling time for some large spatial econometric models from hours to minutes.