One of my favourite types of data to work with is real estate transactions. There is always something interesting to be learned from using hedonic models to tease out or quantify the implicit preferences of homebuyers (and renters) with respect to the old adage of “location, location, location!”.
Such data tend to exhibit strong spatial autocorrelation and its associated ills. In response, I often turn to spatial econometric models that take spatial relationships among observations into account. R has a great ecosystem of packages for spatial econometrics, including the {spatialreg} package that I use often.
However, running these models with many observations can be a slow process in R.
This is due to the computational costs associated with the manipulation of (potentially very large) spatial weights matrices. When using R’s default math libraries, this can lead to models taking hours or even days to run. Moreover, there isn’t much benefit to just getting better computer hardware: open up task manager on Windows or Activity Monitor on the Mac and you will see that R is only using one CPU core.
Accelerating R
Under the hood, R uses two libraries for performing common mathematical operations: Basic Linear Algebra Subprograms (BLAS) and LAPACK (Linear Algebra Package). R ships with “reference” versions of BLAS and LAPACK that will return identical results across any installation of R, but are otherwise un-optimized (see Eddelbuettel, 2016).
Thankfully, there are several solutions out there for replacing R’s reference libraries with more optimized ones that can lead to dramatic reductions in computational time. Since I use both Windows (Intel processors) and macOS (Intel and M-series processors) in my work, below you can find instructions for speeding up R in those two environments. AFAIK Linux users and those with AMD processors can utilize OpenBLAS to acheve the same results, but I don’t have any experience in linking these libraries with R.
These solutions require some knowledge of the file system and administrator privileges (Windows) and the Terminal (macOS). Like R itself, I offer no warranty or support of any kind for these changes! Indeed, customizing your R install might (and has in the past - see below) broken some R packages. Moreover, and as the R team cautions, “Note that fast BLAS implementations may give different (and often slightly less accurate) results than the reference BLAS included in R”. It is pretty easy to revert these changes if you need to, but certainly stop here if you aren’t comfortable with the risk.
Windows and Intel
My initial search for ways to speed up R led to me adopting of Microsoft’s R Open, which, by default, used Intel’s Math Kernel Library (MKL). The MKL uses versions of the BLAS and LAPACK libraries that are optimized for fast multithreaded performance on Intel’s processors. Sure enough, this led to a dramatic reduction in computational time associated with running spatial econometric models. However, Microsoft discontinued the R Open project with the last release using R 4.0.2 in 2020.
As I further scoured the web for solutions, I came across this stackoverflow thread, which had details for how to manually link Intel’s MKL with more recent versions of R. The write up is a bit complex, with some more recent working solutions now found within comments to the original answers in the thread. So here is a more streamlined version.
First, you will need an install of R on your PC. At the time of writing, the current version is 4.3.1, so my file paths below will use that example. Second, download and install Intel’s MKL, which is now part of their larger oneAPI product. I used the offline installer for Windows found here. This downloads oneMKL version 2023.2.0 (the new 2024 version has a very different file structure that I will need to troubleshoot). With these two software packages installed, now we have to do some manual steps (accepting the administrator rights prompts along the way):
- copy all the files inside the
C:\Program Files (x86)\Intel\oneAPI\mkl\2023.2.0\redist\intel64
folder and paste them insideC:\Program Files\R\R-4.3.1\bin\x64
- copy all the files inside the
C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\windows\redist\intel64_win\compiler
folder and paste them insideC:\Program Files\R\R-4.3.1\bin\x64
- in the
C:\Program Files\R\R-4.3.1\bin\x64
folder, rename the existingRblas.dll
to something likeRblas_ref.dll
- rename the existing
Rlapack.dll
to something likeRlapack_ref.dll
- make two copies of
mkl_rt.2.dll
- rename one of the copies
Rblas.dll
- rename the other copy
Rlapack.dll
Large math operations should now use all of your CPU’s logical cores.
Issues and Reversion
The one issue I have encountered in the past with linking the MKL to R is that the {igraph} package (and those depending on it) stopped working. According to a recent answer on the original stackoverflow thread that inspired this, the issue has recently been addressed.
To undo these customizations, you would have to either rename (or delete) the two MKL files we renamed to be Rblas.dll
and Rlapack.dll
and substitute the original reference files back in by renaming them to their original names.
macOS
Accelerating R on a (newer) Mac is more straightforward, and instructions for doing so are actually contained within the official R for macOS FAQ. R for macOS ships with vecLib, which is part of Apple’s Accelerate framework of optimizations for Apple hardware. All you have to do is tell R to use the vecLib BLAS rather than the reference one.
To do so, open Terminal and run:
cd /Library/Frameworks/R.framework/Resources/lib
followed by:
ln -sf libRblas.vecLib.dylib libRblas.dylib
On an Intel Mac, this will result in optimized math functions using all available CPU cores. On M-series processors, it looks like this behaves a bit differently, as optimized math routines are instead handled by the specialized AMX co-processor, so you will not see full CPU utilization.
Issues and Reversion
I have not run into any compatibility issues using R with Apple’s Accelerate framework. To revert the changes, run the following two commands in Terminal to point R back to the original reference BLAS:
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf libRblas.0.dylib libRblas.dylib
Checking that it works
A simple check to ensure R is using the optimized libraries is to run the following code from this answer on stackoverflow.
<- 10000
m <- 2000
n <- matrix (runif (m*n),m,n)
A system.time (S <- svd (A,nu=0,nv=0))
user system elapsed
2.314 0.167 1.875
On my PC with an 18-core Intel Xeon processor, this takes about 40 seconds without using the MKL. After linking the libraries, the computation time drops to just over 3 seconds! My MacBook Pro with an M2 processor is even faster at about 2 seconds. In effect, this drops my modelling time for some large spatial econometric models from hours to minutes.