The easiest way to implement R in parallel is to use some or all of the cores on one node. Obviously, this means you can only scale to the number of cores on a node, and memory availability can also become an issue. Because cores are hyperthreaded on many HPC clusters, the detectCores() function in the parallel package reports the maximum number of allowable tasks (hardware threads), not the number of physical cores (though this is not the case on Frontera, as noted in the Frontera documentation). For example, on a node of an older, Xeon Phi-based cluster that has only 68 physical cores in total, we see that we would not want to use all of the available "hardware threads":
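A minimal check along the following lines illustrates the point; the printed value is illustrative, assuming a 68-core Xeon Phi node with four hardware threads per core:

    # The parallel package ships with base R
    library(parallel)

    # detectCores() counts hardware threads, not physical cores
    detectCores()
    # [1] 272    (illustrative: 68 physical cores x 4 hardware threads each)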

It is fairly straightforward to do multicore processing in R. First you need to invoke the R parallel library, then you replace functions with their multicore equivalents wherever possible. Coding changes are minimal.

mclapply

The following is a trivial example using mclapply() instead of lapply(). The lapply() function, whose name stands for "list apply", performs an operation on each element of a list or vector; mclapply() is its parallel, multicore implementation. If performance or memory is an issue with mclapply(), you can set mc.cores to specify a number of "cores" up to some maximum.
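A sketch of the kind of comparison intended here follows; the workload (summing random draws) and the core counts are illustrative, not taken from the original example:

    library(parallel)

    # A toy task that takes a noticeable amount of time per element
    slow_task <- function(i) sum(rnorm(1e6))

    # Serial list apply
    system.time(res_serial <- lapply(1:272, slow_task))

    # Multicore equivalent; mc.cores sets the number of forked workers
    # (mclapply() relies on fork(), so it runs serially on Windows)
    system.time(res_mc32  <- mclapply(1:272, slow_task, mc.cores = 32))

    # Oversubscribing all 272 hardware threads can be counterproductive
    system.time(res_mc272 <- mclapply(1:272, slow_task, mc.cores = 272))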

Note that performance with 272 "cores" is worse than running with just one! Even 32 "cores" does better than 68 (and both are better than running on one). You may need to experiment to find the best specification.

OpenMP and MKL

There is a completely different way to take advantage of multiple cores on a node. R can be built and linked with Intel MKL, which provides high-performance implementations of BLAS and LAPACK routines. If your R program makes heavy use of matrix computations, MKL offers a way to multithread such work through OpenMP. Often you don't need to change your code at all for MKL to take advantage of the multiple cores in the hardware; it can be as easy as setting the OMP_NUM_THREADS environment variable.
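As a sketch (assuming an R build linked against MKL; the matrix size and thread count are arbitrary choices for illustration), a plain matrix multiply picks up MKL's threading with no code changes:

    # In the batch script, before launching R (illustrative thread count):
    #   export OMP_NUM_THREADS=8

    # Ordinary R code; %*% calls whatever BLAS R was linked against
    n <- 4000
    a <- matrix(rnorm(n * n), nrow = n)
    b <- matrix(rnorm(n * n), nrow = n)

    # With MKL and OMP_NUM_THREADS > 1, this multiply runs multithreaded
    system.time(ab <- a %*% b)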

Note that unless otherwise specified, OMP_NUM_THREADS will generally default to the number of hardware threads on a node. On Frontera, it is preset to 1 on compute nodes accessed via batch; on the TACC Analysis Portal, or on the compute nodes of other clusters, it may be set to some other value. Therefore, it is safest to set OMP_NUM_THREADS explicitly yourself. You will probably have to run some tests to find a good value, and beware that MPI tasks (such as those outlined later) could then each spawn additional threads via MKL.
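You can inspect what the environment already provides from within R; this snippet assumes the OpenMP convention for the variable name:

    # Returns "" if OMP_NUM_THREADS is not set in the current environment
    Sys.getenv("OMP_NUM_THREADS")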

 