Configuring for Performance
As well-optimized, bundled distributions of Python and associated libraries become increasingly available, it is less likely that you will build those libraries yourself. When you do, however, you will want to ensure that they are compiled and linked for good performance. In this section, we address a variety of issues associated with configuring and compiling Python for performance.
Linking to BLAS and LAPACK
While NumPy and SciPy provide a gateway to numerical and scientific computing, it is important that they be linked against optimized versions of lower-level BLAS and LAPACK libraries, since failure to do so can impact performance significantly. On Frontera at TACC, the installed versions of NumPy and SciPy are compiled specifically to take best advantage of the system's processors and instruction sets, and outperform even the highly optimized, but more generically configured, Intel Distribution for Python. On other Intel-based systems, there are several freely available Python distributions with versions of NumPy and SciPy that are linked against the Intel Math Kernel Library (MKL).
To determine what libraries numpy and scipy are linked against, the following commands can be used within the Python interpreter:
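A minimal sketch of those commands, assuming NumPy and SciPy are installed. The helper name show_build_config is hypothetical; show_config() is the function that both packages provide for reporting their build information.

```python
import importlib


def show_build_config(package_name):
    """Print the build configuration of a package, if it is installed."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        print(f"{package_name} is not installed in this environment")
        return False
    # numpy.show_config() and scipy.show_config() report the BLAS/LAPACK
    # libraries that were found when the package was built.
    module.show_config()
    return True


for name in ("numpy", "scipy"):
    show_build_config(name)
```

In an interactive session, importing numpy and calling numpy.show_config() directly is enough; the wrapper only guards against a missing package.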
On Frontera, for example, the numpy configuration looks something like this:
On other systems, if you have chosen not to install a Python distribution that already does so, you might need to install libraries that provide higher-performance implementations of the BLAS and LAPACK interfaces. Some freely available choices include:
- Intel Math Kernel Library (MKL): highly optimized, for use on Intel processors
- ATLAS (Automatically Tuned Linear Algebra Software): portable performance across different platforms, by tuning algorithm parameters on a per-machine basis
- OpenBLAS: an optimized BLAS library based on GotoBLAS2
Multithreading for linear algebra with NumPy
NumPy supports a variety of operations in linear algebra (matrix and vector multiplications, eigenvalue computations, singular value decompositions, etc.), using the underlying BLAS and LAPACK libraries for the heavy lifting. On a multicore machine, NumPy can make use of those cores through multithreading to do linear algebra, controlled simply by environment variables that specify the number of threads to use:
- MKL_NUM_THREADS=N # if using Intel MKL
- OMP_NUM_THREADS=N # if libraries support OpenMP
Not all NumPy operations can take advantage of this parallelism, however. Elementwise array operations (e.g., array additions, application of ufuncs, etc.) do not appear to benefit from this multithreading.
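The sketch below shows one way to set these variables from within a script and to compare a BLAS-backed operation with an elementwise one. The thread count of 4 and the 1000 x 1000 array size are arbitrary assumptions, and the variables are set before NumPy is imported because threading libraries typically read them when they are first loaded. Rerunning with different thread counts should show whether each timing responds.

```python
import os

# Hypothetical thread count; choose a value suited to your machine.
N_THREADS = "4"

# Set the variables before NumPy is imported, since threading libraries
# typically read them when they are first loaded.
os.environ["MKL_NUM_THREADS"] = N_THREADS  # if using Intel MKL
os.environ["OMP_NUM_THREADS"] = N_THREADS  # if libraries support OpenMP

import time


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call of func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


try:
    import numpy as np
except ImportError:
    np = None  # NumPy is not available; skip the timings

if np is not None:
    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)
    # Matrix multiplication is handed to BLAS and may use several threads.
    print(f"matrix product:  {time_call(np.dot, a, b):.4f} s")
    # Elementwise addition is not handed to BLAS.
    print(f"elementwise add: {time_call(np.add, a, b):.4f} s")
```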
Further information on configuring installations with setup.py
Many times when you download and install a Python package, it gets compiled on the fly. How do you ensure that these packages, especially performance-sensitive packages like NumPy, are compiled with the fastest compilers and the right options? This section explains how to manipulate the Python setup.py installation process.
Most Python packages come with a file called setup.py, and you install the package with the "install" command defined in the setup.py file. The typical command is "python setup.py install".
That's what the standard README tells you to do. But if you want to select optional compilers and libraries, how do you control that?
The setup.py script specifies the subdirectories of Python code to install and, more importantly, the extension modules, which are libraries of C code to compile and install. For the latter, the setup.py script relies on a Python module called distutils. The setup.py file defines the targets, and distutils figures out how to build those targets. You can affect this process in a few ways:
- Command-line options to setup.py
- Environment variables
- Modifications to the code within distutils (if you use virtualenv)
To start, you can ask the setup.py file what command-line options it recognizes by running "python setup.py --help".
The --help option will mention commands called build and install. Usually there are other setup commands besides these, such as config; the --help-commands option gives a much more complete listing. Some of the commands are actually subcommands of the config and build commands. A complicated setup might have many subcommands for config and build, arranged in a hierarchy like this:
- config
  - config_cc
  - config_fc
- build
  - build_src
  - build_clib
  - build_ext
    - build_src
- install
Some subcommands may not be listed, but you can figure them out by watching the output of the build process. It helps to record the installation process to a file while it’s underway:
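One way to do this from Python is sketched below: the helper runs a command, echoes each line of output to the screen, and writes the same line to a log file. The function name run_and_log and the file name install.log are hypothetical, and a harmless stand-in command is used so that the sketch runs anywhere.

```python
import subprocess
import sys


def run_and_log(argv, log_path):
    """Run a command, echoing its output to the screen and to a log file."""
    with open(log_path, "w") as log:
        process = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same stream
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
        return process.wait()


# Stand-in command so that the sketch runs anywhere; for a real build you
# might pass [sys.executable, "setup.py", "install"] instead.
status = run_and_log([sys.executable, "-c", "print('building...')"], "install.log")
print("exit status:", status)
```

Afterwards, the log file can be searched for the names of the subcommands that were run.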
In general, subcommands have their own options to make them separately configurable. While the higher-level commands do configure the lower-level subcommands, the lower-level ones may offer more detailed configuration choices. So look through them, too, to see what extra options they recognize. As a rule, each one comes with its own help information, shown by following the subcommand name with --help, as in "python setup.py build_ext --help".
Finally, there is a special help option that tells you which compilers are recognized by a given setup.py; with standard distutils this is typically "python setup.py build_ext --help-compiler".
The whole installation process, including all subcommands and options, can be specified on a single command line. Each option follows the name of its command or subcommand. If the setup recognized "intelem" as one of its known compilers, we could run this one-line build:
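As a sketch of how such a command line is laid out, the snippet below assembles one from a list of (command, options) pairs, so that each option directly follows the command it belongs to. The helper name build_setup_command is hypothetical, and the particular subcommands chosen here (config, build_clib, build_ext) are an assumption; which ones a given setup.py accepts depends on the package.

```python
import shlex


def build_setup_command(steps, python="python"):
    """Assemble a setup.py command line from (command, options) pairs."""
    argv = [python, "setup.py"]
    for command, options in steps:
        argv.append(command)   # the command or subcommand name...
        argv.extend(options)   # ...followed immediately by its own options
    return argv


# Assumed sequence of steps, each told to use the "intelem" compiler.
steps = [
    ("config", ["--compiler=intelem"]),
    ("build_clib", ["--compiler=intelem"]),
    ("build_ext", ["--compiler=intelem"]),
    ("install", []),
]

print(shlex.join(build_setup_command(steps)))
```

The printed line is what would be typed at the shell prompt in the package's top-level directory.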
The subcommand config_cc, when it is defined in setup.py, allows you to pass architecture-specific optimization options to the C compiler. As always, optimization options (and everything else you want to know about the compilers on your system) are described in the compilers' man pages and user guides.
If the command line does not specify a compiler, the distutils module will try to incorporate standard environment variables, such as $CC and $INCLUDE. But the Intel compilers define their own set of variables with which you can refine language and optimization choices. On Frontera, the $MKLROOT environment variable points to where the Intel compilers and libraries reside; there is additional information about compilers and available options in $MKLROOT/../documentation.
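To illustrate, the sketch below builds a modified copy of the current environment that could be handed to a build process (for example, through the env argument of subprocess.run). The helper name build_environment is hypothetical, and the values "icc" and "-O2" are placeholders; CFLAGS is used here as an assumption about which variables your build honors.

```python
import os


def build_environment(overrides):
    """Return a copy of the current environment with the given overrides."""
    env = dict(os.environ)
    env.update(overrides)
    return env


# Hypothetical values: "icc" and "-O2" are placeholders for your own choices.
env = build_environment({"CC": "icc", "CFLAGS": "-O2"})

# Show the variables a build is likely to consult.
for name in ("CC", "CFLAGS", "MKLROOT"):
    print(f"{name} = {env.get(name, '<not set>')}")
```

Working on a copy leaves the environment of the running interpreter untouched, which makes it easy to compare builds made with different settings.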
If you run into real problems, look inside distutils. If you use virtualenv, the distutils files will be located within your own environment directory. In any case, Python can tell you where the package is located:
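A minimal sketch of that lookup follows; the helper name locate_module is hypothetical. Note that distutils was removed from the standard library in Python 3.12, so on newer interpreters the lookup may return None unless setuptools supplies its own copy.

```python
import importlib.util


def locate_module(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(locate_module("distutils"))
```

The directory containing the reported file is where the rest of the package, including the compiler classes, can be found.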
Be aware that distutils is a complex library, because it must discover the state of whatever machine it runs on. The most likely place to look is the directory of classes that define the compilers and their options; you can modify these by hand.