Profiling
Profiling is a process for characterizing the number of function calls, and the run times associated with those function calls, in all or part of a program. As such, it can be invaluable for identifying code hotspots and suggesting prime targets for performance optimization. If 80% of the run time is being spent in one function, it is clear that you should prioritize the optimization of that particular function. (And if you are surprised that your program is actually spending 80% of the run time in that one function, you might want to rethink the algorithms you are using.) Profiling has long been used in conjunction with compiled languages; typically, compiler flags are provided to direct the instrumentation of code for assessing function calls and run times, and an associated profiling program (e.g., gprof) is called to assemble those raw data into summary form for users.
Similar functionality is provided for use within Python, with an even simpler integration into the workflow. The core functionality of the Python profiling process is provided by the cProfile and profile modules, which are included in the Python Standard Library. These are both deterministic profilers, in which every function call is tracked through instrumentation (as compared with statistical profilers, which sample the state of an uninstrumented executable at regular intervals).
Both modules provide the same interface, and cProfile is recommended for most users (with profile being more useful for those wishing to extend the profiler in some way); we demonstrate here just the functionality of cProfile. Within the enhanced IPython interpreter, the "magic" command %prun offers a convenient interface to the functionality provided by the cProfile module. In addition to the modules in the Python Standard Library, the line_profiler module can also be installed, providing line-by-line profiling counts for functions of interest.
Profiling involves executing some code and getting statistics on function calls and run times for that executed code. Execution can be done either at the level of an entire program via the command-line, or at the level of a Python expression within an interpreter. Profiling statistics can be written either directly to the screen, or output to a file which can be processed separately by the pstats module in the standard library. The simplest use is the profiling of a single expression within Python. Let's revisit code we previously looked at to integrate the Lorenz model using SciPy:
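A minimal sketch of such code (assuming standard Lorenz parameter values; the original listing may differ in its details, but the function names integrate_lorenz() and lorenz() are the ones used in the discussion that follows):

```python
from scipy.integrate import solve_ivp

def lorenz(t, xyz, sigma=10.0, beta=8.0/3.0, rho=28.0):
    # Evaluate the instantaneous time derivatives of the Lorenz system
    x, y, z = xyz
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def integrate_lorenz(t_max=100.0, xyz0=(1.0, 1.0, 1.0)):
    # Integrate the Lorenz equations from t=0 to t=t_max
    return solve_ivp(lorenz, (0.0, t_max), xyz0)
```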
The main function here, integrate_lorenz(), can be called to integrate the Lorenz equations in time and produce a dynamical trajectory; at each time step, the lorenz() function is called to evaluate the instantaneous time derivatives of the state variables. This code can be analyzed with the use of the %prun magic function in IPython and Jupyter, which will run the integrate_lorenz() function and print out profiling statistics:
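```python
%prun integrate_lorenz()
```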
which produces output like the following (abridged and illustrative; exact numbers will vary by machine and SciPy version, but the key figures match the discussion below):
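```
         60102 function calls (59192 primitive calls) in 0.257 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9950    0.015    0.000    0.022    0.000 <ipython-input>:3(lorenz)
      ...
        1    0.001    0.001    0.255    0.255 ivp.py:161(solve_ivp)
        1    0.000    0.000    0.257    0.257 <ipython-input>:8(integrate_lorenz)
        1    0.000    0.000    0.257    0.257 <string>:1(<module>)
```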
The profiling report provides a function-by-function breakdown, listing for each function:
- the number of times it was called (ncalls)
- the total time spent in the function itself, excluding calls to subfunctions (tottime)
- the time spent per call (percall)
- the cumulative time spent in the function and all subfunctions called by it (cumtime)
- the cumulative time per call (the second percall column)
- the filename, line number, and function name (filename:lineno(function))
We see that integrate_lorenz() is called once and takes a cumulative time of 0.257 seconds, calling the solve_ivp function once, which uses almost all of that time (0.255 seconds). solve_ivp in turn calls the time derivative function lorenz() 9950 times, which takes a cumulative time of 0.022 seconds. (The time spent by solve_ivp that is not spent in recurring calls to lorenz() is used by solve_ivp itself to carry out the time integration with the information contained in the instantaneous time derivatives.)
The cProfile module can be used in a similar fashion, allowing for profiling statistics to be optionally output to a file for further processing.
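A sketch of this, with lorenz.prof as an arbitrary output filename for this example, and using the pstats module from the standard library to read the results back:

```python
import cProfile
import pstats

# Profile the expression; when a filename is given, the raw statistics
# are written to that file rather than printed to the screen
cProfile.run('integrate_lorenz()', filename='lorenz.prof')

# Load and inspect the saved statistics with pstats
stats = pstats.Stats('lorenz.prof')
stats.sort_stats('cumulative').print_stats(10)  # top 10 entries by cumulative time
```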
The cProfile documentation emphasizes that the instrumentation required to profile Python code introduces some discrepancies in timing results for code written in Python versus C-level functions called from Python, namely that "the profilers introduce overhead for Python code, but not for C-level functions, and so the C code would seem faster than any Python one."
To profile an entire Python program rather than a single expression, the following command can be executed in the shell:
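```
python -m cProfile [-o output_file] [-s sort_order] myscript.py
```

(Here myscript.py stands in for the program to be profiled.)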
where:
- -o writes the profile results to a file instead of to stdout
- -s specifies one of the sort_stats() sort values to sort the output by
The -s sort_order option applies only when profiling results are printed directly to the screen, that is, when an output_file is not specified with the -o option.
The sort_order used in conjunction with -s can be the name of any of the valid statistics to sort by (e.g., ncalls, tottime, etc.); the full list of valid sort keys is given in the sort_stats() documentation in the pstats module.
As mentioned, the line_profiler module can be used to provide line-by-line profiling results. Within IPython/Jupyter, we can call this interactively on our code to profile the lorenz() function, using the %lprun magic function provided by that package.
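A minimal invocation, assuming line_profiler is installed and our functions are defined in the current session, is to load the extension and then name the function to be profiled with the -f flag:

```python
%load_ext line_profiler
%lprun -f lorenz integrate_lorenz()
```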
When we do so, we get output like this (again abridged and illustrative; absolute timings will vary, but the percentages match the discussion below):
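```
Timer unit: 1e-06 s

Total time: 0.04 s
File: <ipython-input>
Function: lorenz at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           def lorenz(t, xyz, sigma=10.0, beta=8.0/3.0, rho=28.0):
     4                                               # Evaluate the instantaneous time derivatives of the Lorenz system
     5      9950      15560.0      1.6     38.9      x, y, z = xyz
     6      9950      24440.0      2.5     61.1      return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]
```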
The line-by-line profiling of the lorenz() function tells us, perhaps somewhat surprisingly, that 38.9% of the time spent in that function goes just to unpacking the 3-element array xyz into 3 separate scalars x, y, z. We've done this to make the code more readable, so that we can write a function that looks similar to the underlying mathematics of the model, but we're paying a performance price for that convenience. If performance is an issue, we might choose instead not to unpack the array, and just refer to elements xyz[i] as needed in the function.
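A sketch of that alternative:

```python
def lorenz(t, xyz, sigma=10.0, beta=8.0/3.0, rho=28.0):
    # Avoid the tuple-unpacking step by indexing directly into xyz,
    # trading some readability for a small per-call saving
    return [sigma * (xyz[1] - xyz[0]),
            xyz[0] * (rho - xyz[2]) - xyz[1],
            xyz[0] * xyz[1] - beta * xyz[2]]
```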