Performance Tips
Consider these suggestions to make your R code faster and more efficient.
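- Avoid explicit loops where a vectorized alternative exists. Where none does, the apply functions (lapply, sapply, vapply, or mclapply from the parallel package) are concise and allocate their results for you, although they are not automatically faster than a well-written for loop.
- Avoid data frames when a matrix will do; a data frame is only necessary when columns hold different data types. For large tables, the data.table package is a performant substitute that can work on datasets of tens of gigabytes and more.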
- Installing the Tidyverse will give you a set of well-tested, mutually compatible packages.
- Think vectors; use vectorized versions of R functions whenever possible.
- Even better, think parallel; use parallel versions of R functions whenever possible (for example, mclapply or parLapply from the parallel package).
- Be aware of memory management; remove objects that you are no longer using with rm(). R wants to allocate memory in contiguous chunks; making and removing matrices willy-nilly can leave your memory like Swiss cheese. It is much better to pre-allocate your matrices than to let them grow dynamically.
- Convert CPU-intensive tasks to C code and call these functions from R (for example, through the .Call interface, or the Rcpp package for C++).
- Search for optimized packages that compute what you need; one of R's strengths is the large number of packages available. Use the CRAN Task Views to see what is available.
- Profile your code. Rprof and summaryRprof are standard R functions. The profr and proftools packages provide graphical representations of profiling output.
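
Several of the tips above (pre-allocating, vectorizing, running in parallel, and profiling) can be illustrated with one small sketch. The workload (squaring the integers 1 to n) and the function names grow, prealloc, and vectorized are made up for illustration. Timings will differ between machines, and the mclapply call only runs in parallel on Unix-like systems, so the sketch falls back to a single core on Windows. It also assumes an R build with profiling support, which is the usual case.

```r
# Hypothetical workload: square the integers 1..n in three different ways.
n <- 2e4

# 1. Growing the result inside a loop: every c() call copies the whole vector.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# 2. Pre-allocating the result once, then filling it element by element.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# 3. Vectorized arithmetic: no explicit loop at all.
vectorized <- function(n) seq_len(n)^2

# All three approaches must produce the same values.
stopifnot(identical(grow(n), prealloc(n)),
          identical(prealloc(n), vectorized(n)))

# Compare elapsed times (the numbers depend on the machine).
print(system.time(grow(n))["elapsed"])
print(system.time(prealloc(n))["elapsed"])
print(system.time(vectorized(n))["elapsed"])

# Parallel version with mclapply from the parallel package.
# Forking is not available on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
par_squares <- unlist(parallel::mclapply(1:4, function(i) i^2,
                                         mc.cores = cores))

# Profile the slowest version with Rprof and summarize the samples.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
invisible(grow(n))
Rprof(NULL)

# summaryRprof() may fail if the run was too short to record any samples,
# so guard the call instead of assuming that samples exist.
prof <- tryCatch(summaryRprof(prof_file)$by.self,
                 error = function(e) NULL)
if (!is.null(prof)) print(head(prof))
unlink(prof_file)
```

On most machines the growing version should be noticeably slower than the other two, and the gap widens as n increases. For a task this small the parallel call is shown only for its syntax; the cost of starting worker processes outweighs any gain.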