The snow library is intended for parallelizing R workloads on clusters. This is implied in the name "snow", which stands for Simple Network Of Workstations. For many applications, however, it is sufficient just to parallelize the workload across the cores on a single node. Whether on one node or many, snow is a good choice for embarrassingly parallel applications that can use more than one core. It relies on a manager/worker model where one process (the manager) controls the rest of the processes which do the actual work. These workers communicate their results back to the manager process, which may do more work on the aggregated data.
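As a minimal sketch of this model (using standard snow functions and a socket-based cluster on the local machine), the manager starts a few workers, farms out a computation, gathers the results, and shuts the workers down:

library(snow)
cl <- makeCluster(4, type = "SOCK")                 # manager launches 4 local workers
squares <- clusterApply(cl, 1:4, function(x) x^2)   # each worker handles one element
print(unlist(squares))                              # manager collects: 1 4 9 16
stopCluster(cl)                                     # shut the workers down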

snow on a single node

Frontera has a generous number of cores on each compute node, so we shall start with single-node usage. You will note also that in the examples to follow, we do not request as many tasks as a node supports. This is simply to make the output, which involves each task printing to a file, easier to read. You should feel free to increase the number of requested tasks.

TACC provides convenient environment modules that help you to create the proper setup for R and other applications. On Frontera, you just do module load Rstats, and you'll have access to a fairly recent version of R plus a number of useful libraries. However, the snow library is not included in any module, so you will need to install it as a package on Frontera before you can start to use it. Here's how to do it:
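A typical installation session looks something like the following sketch, run from a login node (if you lack write permission for the system library, R will offer to create a personal library in your home directory, and you should accept):

login1$ module load Rstats
login1$ R
> install.packages("snow", repos = "https://cloud.r-project.org")
> library(snow)      # quick check that the new package loads
> q()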

You may also need to perform the above steps if you are running your R code on a different system, or if you start using a different version of R on Frontera. Please refer to the Additional Packages page for more details on how to install packages such as snow.

The following short example illustrates how to use snow in an interactive session on Frontera. It is taken from a presentation titled "Introduction to R" by David Walling at TACC (who kindly provided a few of the examples on this page). The goal of the code is to simulate a classic problem from probability theory: if there are n people in a room, how likely is it that two or more of them share the same birthday?

Below is the birthday.R code. It includes snow functions that serve to distribute the major calculations to an intra-node cluster of workers.
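The original listing is not reproduced here, but its structure resembles the sketch below (the function name pbirthdaysim and the worker count are our own choices, not necessarily those of the original):

# birthday.R (sketch): Monte Carlo estimate of the probability that at
# least two of n people share a birthday, for group sizes n = 1..100
library(snow)

pbirthdaysim <- function(n, nsim = 10000) {
  # draw n birthdays nsim times and count how often a duplicate occurs
  hits <- sum(replicate(nsim,
              any(duplicated(sample(1:365, n, replace = TRUE)))))
  hits / nsim
}

cl <- makeCluster(8, type = "SOCK")           # 8 workers; up to 56 fit on a Frontera node
probs <- parSapply(cl, 1:100, pbirthdaysim)   # distribute the 100 group sizes
stopCluster(cl)

print(probs)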

Here is how to run the code interactively on Frontera, after loading the Rstats module (as described above, as well as previously on the Setting up R on Frontera page of the R on HPC Systems topic):
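The exact commands are not reproduced here, but the sequence resembles this sketch (the idev options are illustrative; adjust the queue and time limit to suit):

login1$ idev -p development -N 1 -t 00:30:00   # request one interactive compute node
c123-456$ module load Rstats
c123-456$ R
> source("birthday.R", echo = TRUE)            # echo = TRUE prints each line as it runs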

Note that, per the default guidance for Frontera, the value of 56 used here corresponds to the number of cores on a Frontera compute node. If you don't wish to view the line-by-line progress of the code, you can instead run the script non-interactively, as shown below.
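One common way to do this (not necessarily the exact command in the original) is to invoke the script with Rscript:

c123-456$ Rscript birthday.R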

snow on multiple nodes with Rmpi

What if you want to run workers on more than one node? This is where the Rmpi library comes in. snow's multi-node functionality is built on top of Rmpi, and you don't need to be familiar with MPI itself to use Rmpi and snow on multiple nodes. On many HPC systems, Rmpi may be preinstalled for you; if not, it can be installed just like snow, through R's usual methods for installing packages.

Frontera users will encounter a slight obstacle, though, as the standard Rmpi package lacks built-in support for Intel MPI or Intel compilers. Moreover, TACC does not include Rmpi in any of the modules that they currently provide. Nevertheless, you can install a working Rmpi library for yourself on Frontera by specifying a couple of paths and pretending that Intel MPI is no different from Open MPI. Use the following commands to install your own personal Rmpi library (note that the exact paths to MPI include files and libraries may differ in other Intel software releases):
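The commands resemble the sketch below; the two Intel MPI paths are placeholders that you must point at the include and library directories of the impi release you actually have loaded:

login1$ module load Rstats
login1$ R
> install.packages("Rmpi",
      configure.args = c("--with-Rmpi-include=/path/to/impi/include",
                         "--with-Rmpi-libpath=/path/to/impi/lib/release",
                         "--with-Rmpi-type=OPENMPI"),  # treat Intel MPI as Open MPI
      repos = "https://cloud.r-project.org")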

Once Rmpi is installed, the pair of scripts below should let you try snow in batch mode. They should also give you an idea of how to run snow across multiple nodes on other HPC clusters where Rmpi is supported. Construct the two files given below, then submit the job to Slurm with sbatch.
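The original listings are not reproduced here; the batch script might look roughly like this sketch (the job name, the allocation, the file name snow_test.R, and the single-task mpirun launch are all placeholders to adapt):

#!/bin/bash
#SBATCH -J snowtest                # job name
#SBATCH -o snowtest.o%j            # standard output file
#SBATCH -N 2                       # two Frontera nodes
#SBATCH --ntasks-per-node=12       # 12 tasks per node, 24 in all
#SBATCH -p development             # queue (partition)
#SBATCH -t 00:10:00                # wall-clock limit
#SBATCH -A myproject               # placeholder: your allocation

module load Rstats

# start one manager process; the R script spawns its own workers via Rmpi
mpirun -np 1 Rscript snow_test.R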

The batch script appears above. Next, we have the R script that it launches:
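Again, this is a sketch rather than the original file; it spawns 24 MPI workers and reports which node each one landed on:

# snow_test.R (sketch): create an MPI-based snow cluster and show placement
library(Rmpi)
library(snow)

cl <- makeCluster(24, type = "MPI")   # spawn 24 workers, matching the 24 Slurm tasks

# ask each worker for its hostname so we can see how they were distributed
hosts <- clusterCall(cl, function() Sys.info()[["nodename"]])
print(table(unlist(hosts)))

stopCluster(cl)
mpi.quit()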

In the output, notice that there are initially only 11 workers on the first node, even though in the batch script we specified 12 on that node and 12 on the second node. This is because the manager process takes up the first core on the first node. After snow assigns 23 workers to the remaining 23 cores, it returns to the first node to place the last worker there. This is not necessarily a problem, and it can even be good, assuming that the manager process is relatively idle.

Let's suppose for a moment that we were on a system supporting the alternative cluster-creation function, and that we called it instead of the one used in the script above. Then snow would allow only 23 workers in total, because it assumes that the manager process really does require a full core to fulfill its role. By using the function in our script, however, we are able to start as many workers as we want, up to (or even beyond) the full number of cores if desired.

snowfall

You should be aware that there is a second package called snowfall, which provides an even friendlier wrapper around snow. It can make developing parallel R code somewhat easier. To get access to it on Frontera, install the snowfall package in R using the install.packages() function, just like you did for snow.

The following batch script serves as an example of how to use snowfall together with Rmpi on two Frontera nodes. You can call the file whatever you like; in the sketch below we use the placeholder name snowfall.sh. Notice the use of the Unix "here document" (<<) shell syntax to combine the batch script and the R script into one file.
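Here is a sketch of what such a combined file could contain (the Slurm settings mirror the earlier example, and the R portion is illustrative rather than a copy of the original):

#!/bin/bash
#SBATCH -J sftest                  # job name
#SBATCH -o sftest.o%j              # standard output file
#SBATCH -N 2                       # two Frontera nodes
#SBATCH --ntasks-per-node=12       # 24 tasks in all
#SBATCH -p development             # queue (partition)
#SBATCH -t 00:10:00                # wall-clock limit
#SBATCH -A myproject               # placeholder: your allocation

module load Rstats

# everything between <<EOF and EOF (the "here document") is fed to R,
# so the batch directives and the R code live in a single file
mpirun -np 1 R --no-save <<EOF
library(snowfall)
sfInit(parallel = TRUE, cpus = 24, type = "MPI")   # MPI workers via Rmpi
hosts <- sfClusterApplyLB(1:24, function(i) Sys.info()[["nodename"]])
print(table(unlist(hosts)))
sfStop()
EOF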

Just submit the job as normal:
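Assuming the placeholder file name chosen above:

login1$ sbatch snowfall.sh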

The same script will work with Slurm parameters that specify only one node. In that case, the script can be simplified further by removing its MPI-related parts: the MPI launch command in the shell portion, and the MPI-specific settings in the R portion. The latter changes are especially helpful if you don't have Rmpi installed and are therefore limited to a single node; a sketch of the simplified R portion follows.
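For instance, a single-node version of the R portion, using snowfall's default socket-based workers (no Rmpi needed), might reduce to a sketch like this:

library(snowfall)
sfInit(parallel = TRUE, cpus = 12)            # socket workers; up to 56 fit on one Frontera node
results <- sfLapply(1:100, function(i) i^2)   # any sfLapply/sfSapply workload
sfStop()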

You can find more information on snowfall at CRAN.
