Christopher Cameron, Adam Brazier, Linda Woodard (original author)
Cornell Center for Advanced Computing

Revisions: 10/2021, 5/2018, 7/2014 (original)

This topic is a brief introduction to running R in parallel. It covers two basic strategies on TACC's Stampede3 and Frontera supercomputers: multi-core processing within a node and multi-node parallelism.

Most R packages use a single core, but there are several ways to run R in parallel. The most obvious is embarrassingly parallel execution, in which the same R script is invoked with different inputs. In a shared-memory environment (a single node), you can take advantage of R's built-in multithreaded functions, in an analogous fashion to using OpenMP. You can also use libraries such as Rmpi, pbdR, or snow that are built on top of MPI. Of these, snow requires the least knowledge of MPI to use.
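As a taste of the snow-style approach, here is a minimal sketch using the base parallel package, which incorporates snow's cluster interface; the worker count and the toy computation are illustrative choices, not requirements.

```r
# Minimal sketch of snow-style cluster parallelism using the base
# 'parallel' package (bundled with R), which provides snow's cluster API.
library(parallel)

cl <- makeCluster(4)                            # start 4 worker processes on one node
squares <- parLapply(cl, 1:8, function(x) x^2)  # apply the function across the workers
stopCluster(cl)                                 # always shut the workers down

unlist(squares)   # 1 4 9 16 25 36 49 64
```

The same `parLapply` call works unchanged whether the cluster's workers are local processes on one node or MPI-spawned processes spanning many nodes, which is why snow requires so little MPI knowledge.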

Objectives

After you complete this topic, you should be able to:

  • Run parallel jobs in R
  • Explain the use of multithreaded functions in R
  • Describe multi-core processing in R on Stampede3 and Frontera
  • Explain how to use snow in a batch job

Prerequisites

This topic assumes the reader has no prior experience with R. The exercises and examples assume some familiarity with statistical analysis; working through the exercises on Stampede3 or Frontera requires a basic knowledge of Linux and the ability to access these systems via SSH.

Carrying out activities on Stampede3 or Frontera will require an appropriate TACC allocation. As an alternative, some activities could be carried out on a local installation of R, and others on another HPC resource with Slurm and R installed.
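On any Slurm-based resource, an R script is typically run non-interactively from a batch script along the following lines. This is only a sketch: the queue, allocation, module, and script names are placeholders, and the exact values differ by system (consult the relevant user guide).

```shell
#!/bin/bash
#SBATCH -J r-test            # job name
#SBATCH -o r-test.o%j        # stdout file (%j expands to the job ID)
#SBATCH -N 1                 # number of nodes
#SBATCH -n 1                 # number of tasks
#SBATCH -t 00:10:00          # wall-clock time limit
#SBATCH -p normal            # queue/partition name (system-specific)
#SBATCH -A myproject         # allocation/project name (placeholder)

module load Rstats           # module name varies by system; often simply 'R'

Rscript my_analysis.R        # run the R script in batch mode
```

Submitting the script with `sbatch` (e.g., `sbatch r-test.slurm`) queues the job; the same skeleton is reused later for multi-core and snow-based runs by adjusting the node and task counts.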

©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement