
Timers

Basic timing can be obtained by measuring the entire run time of a code. This is useful in assessing the impact of performance enhancements on overall runtime. The basic Unix time command reports the total run time of its argument process, including any child processes or threads it may spawn. The time command reports three separate timing values (in seconds) to the Unix standard error stream.

Real refers to the total time elapsed from the start of program execution to its termination. Because this does not record the consumption of any specific system resource, it is often called "wall clock time" (or "wall time" for short): it is equivalent to timing the run by watching a clock on the wall rather than consulting the accounting information kept by the system. Ultimately, wall time is the measurement that matters most to a researcher running a scientific code. It is the basis for accounting on most HPC systems (including Frontera), and it determines the turnaround time for job execution.

User refers to the total time spent by the CPU(s) processing the instructions contained in your program. If your code is multithreaded, this time will be the sum of the CPU time taken by all the threads of the parent process. It does not include time when the operating system scheduled other processes/threads to run on the same cores, or when your program was waiting for services from the operating system, even if the OS was itself using CPU time to handle a request from your program. Combined with "real" time, "user" time is a useful metric for gauging the effectiveness of optimizations performed to increase computational efficiency. For example, the impact of using an optimized and/or multithreaded math library will be seen in this category.

Sys refers to the total time spent by the CPU processing service requests (known as "system calls") that the operating system handles on behalf of your program. These service requests are most often I/O calls requesting disk, network, or terminal access. It does not include time spent waiting for such requests to complete if the operating system is not required to poll them.

Because a program often spends at least some of its time waiting for I/O requests to complete, at least some portion of its wall time is not attributable to "user" (your code) or "sys" (actual processing by the operating system). Therefore, for a serial code, it is expected that "real" will be greater than the sum of "user" and "sys." This unlisted waiting time serves as a de facto third timing category that is very informative in targeting code optimizations. A process that spends most of its time waiting can often improve its efficiency by overlapping its computation with communication or I/O using asynchronous methods.
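The relationship between the three categories and the unlisted waiting time can be illustrated from inside a program. The following is a minimal sketch in Python, not part of the original workshop material: the helper names `measure` and `workload` are hypothetical, and the sleep merely stands in for a blocking I/O request. It assumes a single-threaded process.

```python
import os
import time


def measure(func):
    """Run func() and return (real, user, sys) in seconds for this process."""
    cpu_start = os.times()            # CPU accounting kept by the system
    wall_start = time.perf_counter()  # wall clock
    func()
    wall_end = time.perf_counter()
    cpu_end = os.times()
    return (
        wall_end - wall_start,
        cpu_end.user - cpu_start.user,
        cpu_end.system - cpu_start.system,
    )


def workload():
    # CPU-bound part: accumulates "user" time.
    total = 0
    for i in range(200_000):
        total += i * i
    # Waiting part (stand-in for blocking I/O): adds to "real" only.
    time.sleep(0.2)
    return total


real, user, sys_time = measure(workload)
waiting = real - (user + sys_time)
print(f"real {real:.2f}  user {user:.2f}  sys {sys_time:.2f}  waiting {waiting:.2f}")
```

Here most of the wall time should show up as waiting, because the sleep consumes essentially no CPU time.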

On Frontera, time may be used for parallel processes, but only the "real" result will be relevant because the time process will attach only to the TACC MPI launcher (ibrun) and not to each parallel task. Still, "real" is often the desired metric; e.g., it is used for calculating parallel speedup. Accordingly, the parallel run can be executed as:

for Bourne shells (like bash), and

for C shells (like tcsh).
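As a small illustration of how the "real" values are typically used, speedup can be computed as the ratio of serial to parallel wall time. The sketch below is an assumption-laden example, not taken from the workshop: the function names are hypothetical, the wall times are made-up numbers, and parallel efficiency is included only as a commonly used companion metric.

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: serial wall time divided by parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_tasks):
    """Speedup per task; 1.0 would mean ideal scaling."""
    return speedup(t_serial, t_parallel) / n_tasks


# Hypothetical "real" values in seconds, as reported by two timed runs.
t_serial = 120.0    # 1 task
t_parallel = 20.0   # 8 tasks

print("speedup:", speedup(t_serial, t_parallel))
print("efficiency:", efficiency(t_serial, t_parallel, 8))
```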

Note that it is necessary to supply the full path to the time command, as many shells have built-in timing routines. Although these routines typically report similar information, their format is not always consistent. The -p option (which stands for "POSIX") is a further guarantee of consistency: results are stated in seconds, where the number of decimals reflects the accuracy of the timer.
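One practical benefit of the consistent -p format is that the results are easy to post-process. With -p, each value appears on its own line as a label ("real", "user", or "sys") followed by a number of seconds. The following is a minimal Python sketch of a parser for that layout; the function name `parse_time_p` is hypothetical and the sample text is made up for illustration, not output from an actual run.

```python
def parse_time_p(stderr_text):
    """Extract real/user/sys seconds from POSIX-format `time -p` output.

    The timed program may also write to standard error, so lines that
    do not match the expected "label seconds" layout are skipped. If a
    label appears more than once, the last occurrence wins, since the
    timing lines are written after the program finishes.
    """
    timings = {}
    for line in stderr_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("real", "user", "sys"):
            try:
                timings[parts[0]] = float(parts[1])
            except ValueError:
                continue  # label matched, but the value was not a number
    return timings


# Made-up sample of captured standard error.
sample = "warning: message from the program itself\nreal 12.34\nuser 10.10\nsys 0.52\n"

timings = parse_time_p(sample)
waiting = timings["real"] - (timings["user"] + timings["sys"])
print(timings)
print(f"time not attributed to user or sys: {waiting:.2f} s")
```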
