Consider these suggestions to make your R code faster and more efficient.

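  • Avoid explicit loops where you can. Use one of the apply functions instead (lapply, sapply, mclapply, etc.); these are still loops internally, so the gain is mainly in clarity and in not growing objects by hand.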
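  • Avoid data frames when you do not need them; they are only necessary when columns hold different data types, and a matrix is usually faster when every column has the same type. For tabular data, the data.table package is a performant substitute that can work on large datasets of tens of gigabytes and more.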
  • Installing the tidyverse gives you a set of well-tested, mutually compatible packages.
  • Think vectors; use vectorized versions of R functions whenever possible.
  • Even better, think parallel: use parallel versions of R functions whenever possible (for example, mclapply or parLapply from the parallel package).
  • Be aware of memory management; remove objects that you are no longer using (rm). R wants to allocate memory in contiguous chunks; making and removing matrices willy-nilly can leave your memory like Swiss cheese. It is much better to pre-allocate your vectors and matrices than to let them grow dynamically.
  • Convert CPU-intensive tasks to C code and call these functions from R.
  • Search for optimized packages that compute what you need; one of R's strengths is the large number of packages available. Use CRAN task views to determine what is available.
  • Profile your code before you optimize it. Rprof and summaryRprof are standard R functions (in the utils package). The profr and proftools packages provide graphical representations of profiling output.
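
Several of the suggestions above (pre-allocation, the apply functions, vectorization, and profiling) can be illustrated with one small sketch. It assumes a toy workload, squaring the elements of a numeric vector; the function names grow, prealloc, apply_version, and vectorized are made up for this illustration and do not come from any package. Timings depend on the machine, so none are quoted here.

```r
# Toy workload (an assumption for illustration): square each element of x.
n <- 20000
x <- runif(n)

# 1. Growing the result inside a loop: every c() call copies the whole vector.
grow <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)
  out
}

# 2. Same loop, but the result is pre-allocated once and filled in place.
prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# 3. An apply-family version; vapply checks that each result is one numeric.
apply_version <- function(x) vapply(x, function(v) v^2, numeric(1))

# 4. Fully vectorized: the loop runs in compiled code inside R.
vectorized <- function(x) x^2

# Time each variant (elapsed seconds) and keep the results for comparison.
timings <- c(
  grow       = system.time(r1 <- grow(x))[["elapsed"]],
  prealloc   = system.time(r2 <- prealloc(x))[["elapsed"]],
  vapply     = system.time(r3 <- apply_version(x))[["elapsed"]],
  vectorized = system.time(r4 <- vectorized(x))[["elapsed"]]
)
print(timings)

# All four variants should agree on the answer.
stopifnot(isTRUE(all.equal(r1, r4)),
          isTRUE(all.equal(r2, r4)),
          isTRUE(all.equal(r3, r4)))

# Profile the slowest variant with the standard Rprof/summaryRprof pair.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling every 10 ms
invisible(grow(x))
Rprof(NULL)                         # stop profiling
# try() guards against a run too short to record any samples.
try(print(head(summaryRprof(prof_file)$by.self)))
unlink(prof_file)

# Remove large objects once they are no longer needed.
rm(r1, r2, r3)
```

On most machines the growing loop should come out clearly slowest and the vectorized version fastest, but the gap depends on n; measure with your own data rather than relying on this sketch.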
 
© Cornell University | Center for Advanced Computing