Timers
Basic timing can be obtained by measuring the entire run time of a code. This is useful in assessing the impact of performance enhancements on overall runtime. The basic Unix time command reports the total run time of its argument process, including any child processes or threads it may spawn. The time command reports three separate timing values (in seconds) to the Unix standard error stream.
Real refers to the total time elapsed from program start to termination. Since this does not record the consumption of any specific system resource, it is often referred to as "wall clock time" (or "wall time" for short): it is equivalent to measuring execution time by looking at a wall clock instead of using accounting information kept by the system. Ultimately, wall time is the measurement most important to a researcher executing a scientific code. It is the unit of account for most HPC systems (including Frontera) and it determines the turnaround time for job execution.
User refers to the total time spent by the CPU(s) processing the instructions contained in your program. If your code is multithreaded, this time will be the sum of the CPU time taken by all the threads of the parent process. It does not include time when the operating system scheduled other processes/threads to run on the same cores, or when your program was waiting for services from the operating system, even if the OS was itself using CPU time to handle a request from your program. Combined with "real" time, "user" time is a useful metric for gauging the effectiveness of optimizations performed to increase computational efficiency. For example, the impact of utilizing an optimized and/or multithreaded math library will be seen in this category.
Sys refers to the total time spent by the CPU processing service requests (known as "system calls") for your program from the operating system. These service requests most often are I/O calls requesting disk, network, or terminal access. This does not include time waiting for such processes to complete if the operating system is not required to poll them.
Because a program often spends at least some of its time waiting for I/O requests to complete, at least some portion of its wall time is not attributable to "user" (your code) or "sys" (actual processing by the operating system). Therefore, for a serial code, it is expected that "real" will be greater than the sum of "user" and "sys." This unlisted waiting time serves as a de facto third timing category that is very informative in targeting code optimizations. A process that spends most of its time waiting can often improve its efficiency by overlapping its computation with communication or I/O using asynchronous methods.
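The relationship between the three values and the unlisted waiting time can be illustrated from inside a program. The following is a minimal Python sketch, not part of the time command itself: the function names measure and waiting_time are hypothetical, and the standard-library calls os.times() and time.perf_counter() stand in for the accounting that time performs on its argument process.

```python
import os
import time


def measure(work):
    """Run work() and return (real, user, sys) in seconds.

    user and sys include the CPU time of child processes that were waited for.
    """
    cpu_before = os.times()
    wall_before = time.perf_counter()
    work()
    wall_after = time.perf_counter()
    cpu_after = os.times()

    real = wall_after - wall_before
    user = (cpu_after.user - cpu_before.user) + (
        cpu_after.children_user - cpu_before.children_user
    )
    sys_time = (cpu_after.system - cpu_before.system) + (
        cpu_after.children_system - cpu_before.children_system
    )
    return real, user, sys_time


def waiting_time(real, user, sys_time):
    """Wall time not attributable to user or sys (meaningful for serial code)."""
    return real - (user + sys_time)


if __name__ == "__main__":
    # Sleeping consumes wall time but almost no CPU time, so nearly all of
    # "real" shows up as waiting time.
    real, user, sys_time = measure(lambda: time.sleep(0.5))
    print(f"real {real:.2f}  user {user:.2f}  sys {sys_time:.2f}")
    print(f"waiting {waiting_time(real, user, sys_time):.2f}")
```

A workload dominated by computation would instead show "user" close to "real" and a waiting time near zero.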
On Frontera, time may be used for parallel processes, but only the "real" result will be relevant because the time process will attach only to the TACC MPI launcher (ibrun) and not to each parallel task.
Still, "real" is often the desired metric; e.g., it is used for calculating parallel speedup.
Accordingly, the parallel run can be executed as:
for Bourne shells (like bash), and
for C shells (like tcsh).
Note that it is necessary to supply the full path to the time command, as many shells have built-in timing routines. Although these routines typically report similar information, their format is not always consistent. The -p option (which selects the POSIX output format) is a further guarantee of consistency: results are stated in seconds, where the number of decimals reflects the accuracy of the timer.
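Because the POSIX format always consists of three lines labeled real, user, and sys, it is easy to post-process. The following Python sketch shows one way to collect the timings and compute a parallel speedup from two "real" values. The helper names parse_time_p, run_timed, and speedup are hypothetical, and the default path /usr/bin/time is an assumption that should be adjusted to the system at hand.

```python
import re
import subprocess

# Matches the lines of the POSIX "time -p" format, e.g. "real 1.23".
_TIME_LINE = re.compile(
    r"^(real|user|sys)\s+([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE
)


def parse_time_p(stderr_text):
    """Extract {'real': ..., 'user': ..., 'sys': ...} in seconds from stderr text.

    The timed program may also write to stderr, so the last match per label wins.
    """
    timings = {}
    for label, seconds in _TIME_LINE.findall(stderr_text):
        timings[label] = float(seconds)
    missing = {"real", "user", "sys"} - timings.keys()
    if missing:
        raise ValueError(f"missing timing lines: {sorted(missing)}")
    return timings


def run_timed(command, time_path="/usr/bin/time"):
    """Run command (a list of strings) under 'time -p' and return parsed timings.

    time_path is an assumption; change it to wherever time is installed.
    """
    result = subprocess.run(
        [time_path, "-p", *command],
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_time_p(result.stderr)


def speedup(real_serial, real_parallel):
    """Parallel speedup: ratio of serial wall time to parallel wall time."""
    return real_serial / real_parallel
```

For instance, run_timed(["ibrun", "./my_app"]) would time a placeholder executable ./my_app through the launcher, and passing the "real" values of a serial run and a parallel run to speedup gives the speedup. Only "real" is meaningful in the parallel case, for the reason given above.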