Performance Tips
Consider these suggestions to make your R code faster and more efficient.
- Avoid explicit loops. Use one of the apply functions instead (lapply, sapply, mclapply from the parallel package, etc.).
- Avoid data frames whenever possible; they are necessary only when columns have different data types, and a matrix is enough otherwise. When you do need tabular data, the data.table package is a performant substitute that can handle large datasets of tens of gigabytes or more.
- Installing the Tidyverse will give you a set of well-tested, mutually compatible packages.
- Think vectors; use vectorized versions of R functions whenever possible.
- Even better, think parallel; use parallel versions of R functions whenever possible.
- Be aware of memory management; purge objects that you are not using. R wants to allocate memory in contiguous chunks; making and removing matrices willy-nilly can leave your memory like Swiss cheese. It is much better to pre-allocate your matrices rather than allowing them to grow dynamically.
- Convert CPU-intensive tasks to C code and call these functions from R.
- Search for optimized packages that compute what you need; one of R's strengths is the large number of packages available. Use CRAN task views to determine what is available.
- Profile your code. Rprof and summaryRprof are standard R functions. The profr and proftools packages provide graphical representations of profiling output.
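The tips on loops, vectorization, and pre-allocation can be compared side by side. The following is a minimal sketch, not a benchmark: the function names (grow, prealloc, applied, vectorized) and the problem size are made up for illustration, and the timings you see will depend on your machine and R version.

```r
# Compute the squares of 1..n four different ways.
n <- 1e4
x <- seq_len(n)

# 1. Growing the result inside a loop: every c() call copies the whole vector.
grow <- function(x) {
  out <- numeric(0)
  for (i in seq_along(x)) out <- c(out, x[i]^2)
  out
}

# 2. Pre-allocating the result, then filling it in place.
prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# 3. An apply-family call; vapply also checks the type of each result.
applied <- function(x) vapply(x, function(v) v^2, numeric(1))

# 4. The vectorized form: one call that operates on the whole vector.
vectorized <- function(x) x^2

# Time each version (elapsed seconds are machine dependent).
print(system.time(r1 <- grow(x)))
print(system.time(r2 <- prealloc(x)))
print(system.time(r3 <- applied(x)))
print(system.time(r4 <- vectorized(x)))

# All four versions should agree.
stopifnot(isTRUE(all.equal(r1, r4)),
          isTRUE(all.equal(r2, r4)),
          isTRUE(all.equal(r3, r4)))
```

On most systems the growing version is expected to be the slowest and the vectorized version the fastest, but it is worth measuring on your own data before rewriting code.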
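As an illustration of the "think parallel" tip, the sketch below swaps lapply for mclapply from the parallel package, which ships with R. The worker function slow_square and the choice of two cores are assumptions made for this example. Because mclapply relies on forking, it cannot use more than one core on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# Hypothetical stand-in for an expensive, independent computation.
slow_square <- function(v) {
  Sys.sleep(0.01)  # simulate work
  v^2
}

inputs <- 1:20

# mclapply forks the R process, which is not supported on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Serial version.
serial_time <- system.time(
  serial_result <- lapply(inputs, slow_square)
)

# Parallel version: same call shape, plus the number of cores.
parallel_time <- system.time(
  parallel_result <- mclapply(inputs, slow_square, mc.cores = n_cores)
)

print(serial_time)
print(parallel_time)

# The two approaches should return the same values in the same order.
stopifnot(identical(serial_result, parallel_result))
```

Parallel execution carries startup and communication overhead, so it tends to pay off only when each task does a meaningful amount of work.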
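The profiling tip can be tried with the Rprof and summaryRprof functions mentioned above. This is a minimal sketch: the workload function sort_many is invented for the example and exists only to give the sampling profiler enough running time to collect data.

```r
# Hypothetical workload: repeatedly generate and sort random numbers.
sort_many <- function(reps, size) {
  total <- 0
  for (i in seq_len(reps)) {
    total <- total + sort(runif(size))[1]
  }
  total
}

# Write profiling samples to a temporary file.
profile_file <- tempfile(fileext = ".out")

Rprof(profile_file)             # start profiling
result <- sort_many(20, 5e5)
Rprof(NULL)                     # stop profiling

# Summarize where time was spent.
prof <- summaryRprof(profile_file)
print(head(prof$by.self))       # time spent in each function itself
print(head(prof$by.total))      # time including the functions it calls
```

The by.self table usually shows where the hot spots are, while by.total shows which higher-level calls lead to them.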
© Cornell University | Center for Advanced Computing
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)