NumPy
NumPy is Python's core numerical library. Central to NumPy are n-dimensional arrays (formally, objects of type numpy.ndarray). NumPy arrays are multidimensional analogues of Python lists, but are restricted to elements of a homogeneous type such as int, float or complex. NumPy arrays have an underlying C representation, and the functions that operate on those arrays are implemented in C or Fortran. For scientific code, these compiled libraries provide efficient numerical representations of floating-point data and the operations on them, including statistical functions as well as an extensive collection of random number routines. Arrays in NumPy can even be backed by memory-mapped files so that they can grow beyond the bounds of available RAM.
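As a small illustration of the homogeneous-type restriction mentioned above (the array name and values here are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3.5])   # mixed int/float input is promoted to a single dtype
a.dtype                      # dtype('float64'): every element shares this type
```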
NumPy itself makes use of lower-level libraries such as BLAS and LAPACK to support operations arising in linear algebra, and can be configured to link to optimized versions of those libraries if they exist on the system (see the page on Configuring for Performance for more information on this topic). The NumPy library lies in a module named numpy, but numpy is conventionally imported via the statement import numpy as np, and many examples of numpy code use the shorthand name np. (The danger of such a shorthand name, however, is a greater risk of name clashes: be sure not to create a variable named np and overwrite the numpy module you have imported.)
NumPy provides a rich syntax for operating on arrays in a compact and efficient manner, without explicit looping and indexing into the array. For example, one can add two arrays of the same size and shape element-wise with a single operation.
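Sketched with two small illustrative arrays a and b (the names are placeholders), this looks like:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# element-wise sum, computed in a single call into NumPy's compiled routines
c = a + b
```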
which is more or less equivalent to:
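```python
# a roughly equivalent explicit loop over the same arrays a and b
c = np.empty_like(a)
for i in range(len(a)):
    c[i] = a[i] + b[i]
```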
The virtue of the first form of array addition is not only that it is more compact and readable, but also that it is more efficient: the looping, indexing and addition are done in a compiled C library with a single call from the Python interpreter, rather than in multiple calls from the Python interpreter, similar to what was illustrated schematically in panels (b) and (c) of a previous figure. One downside of this type of array syntax, as described in more detail below, involves the construction of temporary arrays in complicated expressions.
The simple array addition above only scratches the surface of the set of compact and efficient functionality available with numpy arrays. Many functions in numpy (so-called "ufuncs") act efficiently in an element-wise fashion over arrays (e.g., arithmetic operations, np.exp(a), np.sin(b), etc.). Elaborate computations can sometimes be encoded in a single line of numpy code; consider a statement that uses random number generator functionality in numpy to create an adjacency matrix for an Erdös-Renyi random graph consisting of 100 nodes, where each node is connected to every other node with probability p=0.05.
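One plausible form of such a statement (a sketch, split over a few lines for readability; the strict upper triangle is kept and mirrored so that edges are undirected with no self-loops):

```python
p, n = 0.05, 100
# draw every candidate edge at once, keep the strict upper triangle, then mirror it
A = np.triu(np.random.random((n, n)) < p, k=1)
A = (A | A.T).astype(int)    # symmetric 0/1 adjacency matrix with a zero diagonal
```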
Of course, just because you can easily create the adjacency matrix above does not mean that it is the best way to encode such a graph, especially since for a small connection probability like p=0.05 the matrix will be quite sparse.
Array-level operations of the sort provided by numpy are found in other programming languages, such as MATLAB and Fortran 90. These array operations are sometimes characterized as a type of "vectorization", in the sense that one can specify a high-level operation on a data collection that is then processed efficiently through some other mechanism (which, in the case of numpy, involves iterated computations in a compiled library). Although conceptually somewhat similar, this should not be confused with what is more generally thought of as vectorization, whereby mathematical operations found in tight loops in scientific code are executed in parallel on special vector hardware found in CPUs and coprocessors.
Experiment with some of the numpy array operations. Is there a computation that you are currently doing using for-loops and array indexing that could be carried out more compactly and efficiently using array operations?
It is worth noting that, like the array addition example above, array multiplication is also element-wise; it does not perform a matrix multiplication, as one might otherwise expect. If a matrix multiplication is desired, the @ operator can be used (e.g., X @ Y to multiply two 2D arrays X and Y), or the np.matmul function can be used.
function can be used. For those with prior experience working with MATLAB, the guide to NumPy for MATLAB users provides a useful resource to accessing similar sorts of functionality in the two environments.
More complex operations among arrays can be carried out, sometimes involving sophisticated indexing, slicing, filtering, and broadcasting operations. Sometimes the necessary NumPy array operation is obvious, and sometimes work and testing are required to figure out how to express a complicated calculation in a set of array-level operations rather than resorting to explicit looping and indexing.
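A few representative one-liners (x here is just an illustrative array):

```python
x = np.arange(10)                      # 0, 1, ..., 9
x[2:7]                                 # slicing
x[x % 2 == 0]                          # boolean filtering: the even entries
x[:, np.newaxis] + x[np.newaxis, :]    # broadcasting: a 10x10 table of pairwise sums
```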
NumPy, for example, provides a function named outer to compute the multiplicative outer product of two vectors (1D arrays): a 2D array is constructed such that the (i, j) entry in the array is the product a[i] * b[j], for vectors a and b. In numpy this can be done without any explicit looping over pairs of vector elements.
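For two small illustrative vectors a and b, this might look like:

```python
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
M = np.outer(a, b)    # M has shape (3, 2), with M[i, j] == a[i] * b[j]
```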
The downside: Temporary creation
One downside to the convenience and conciseness of expression provided by these high-level array operations in NumPy is the hidden creation of temporary arrays, which can impact numerical performance. Consider this partial differential equation (PDE) describing the spatiotemporal dynamics of a field u:
\[\frac{\partial u}{\partial t} = D \nabla^2 u + u (1-u)\]
If we were integrating this PDE using a finite-difference method, for example, we could represent the field u with a NumPy array, write a function named Del2 that computes a finite-difference Laplacian of the data in an array, and do time-stepping with a forward Euler step that closely resembles the underlying PDE.
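A sketch of what such a step might look like (the Del2 shown here is just one possible implementation, assuming a uniform grid spacing dx and periodic boundaries; u, dt, and D are the field, time step, and diffusion coefficient from the PDE):

```python
def Del2(u, dx=1.0):
    # illustrative five-point finite-difference Laplacian with periodic boundaries
    return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
            np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u) / dx**2

# forward Euler step mirroring the PDE: du/dt = D * Laplacian(u) + u * (1 - u)
u = u + dt * (D * Del2(u) + u * (1 - u))
```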
In doing so, however, a series of temporary arrays will be created and destroyed as the right-hand side of this expression is evaluated in stages, slowing down performance. Performance can also suffer from inefficient use of the cache when operating on multiple temporary arrays. These temporaries include:
- Del2(u)
- D * Del2(u)
- 1 - u
- u * (1 - u)
- D * Del2(u) + u * (1 - u)
- dt * (D * Del2(u) + u * (1 - u))
Compilation frameworks such as numba and cython, which will be addressed later in this tutorial, can achieve performance gains over numpy by unrolling these sorts of operations into explicit loops that avoid the temporaries. For typical sorts of operations, these frameworks can increase performance substantially (e.g., run times decreased by a factor of 2 to 10).
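As a rough sketch of the idea (the function and argument names here are illustrative, and the details of numba are deferred to the later discussion), the forward Euler update above can be written as one explicit, fused loop that creates no intermediate arrays:

```python
import numpy as np
import numba

@numba.njit
def euler_step(u, dt, D, dx):
    # the Laplacian, growth term, and update are fused into a single pass over u
    ny, nx = u.shape
    out = np.empty_like(u)
    for i in range(ny):
        for j in range(nx):
            lap = (u[(i + 1) % ny, j] + u[(i - 1) % ny, j] +
                   u[i, (j + 1) % nx] + u[i, (j - 1) % nx] - 4.0 * u[i, j]) / dx**2
            out[i, j] = u[i, j] + dt * (D * lap + u[i, j] * (1.0 - u[i, j]))
    return out
```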
Aside from the issue of increased run time, the creation of temporary arrays has an impact on the overall memory footprint of a CPython process. Because of the complex memory management system used by the CPython interpreter, however, it is difficult to predict what the overall impact on that footprint will be. Even with multiple temporary arrays created, as in the example above, in some cases CPython appears to be able to garbage collect a substantial amount of that memory as an expression is evaluated, such that the overall memory footprint does not grow to the extent implied by the number and size of temporaries created. We return to the issue of memory management in more detail later in this tutorial.