Algorithms and Performance
You might not be interested in computational performance for its own sake, but rather as a means to an end, namely a shorter timeline to scientific productivity. Many factors contribute to that overall timeline:
1. your effort developing and debugging code, and analyzing results
2. the intrinsic efficiency of the algorithms you are using
3. the performance and optimization of the code you have developed
Many scientists have chosen to use Python because it facilitates step (1): the rapid development of complex programs and interrogation of data. They then skip ahead to step (3) and start looking for resources (such as this topic) to help mitigate the performance drawbacks of an interpreted language. But it is important not to overlook step (2). The choice of appropriate algorithms is crucial in any context, and you should not focus on optimizing your implementation until you understand the implications of the algorithmic choices you have made. Furthermore, the many compiled numerical libraries available for Python mean that efficient implementations of the algorithms you need might already exist, relieving you of the burden of developing them yourself. We will look at some examples of the power of already-developed software in the section on third-party libraries.
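To make this point concrete, here is a minimal sketch (assuming NumPy, one such compiled library, is installed; `py_sum` is a name coined for this illustration) contrasting an explicit interpreted loop with the equivalent call into precompiled library code:

```python
import numpy as np

def py_sum(values):
    """Sum the elements with an explicit interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total

data = np.random.rand(1_000_000)

# np.sum dispatches to a compiled loop over the underlying buffer and is
# typically faster than the interpreted loop above by orders of magnitude.
assert np.allclose(py_sum(data), np.sum(data))
```

Timing the two calls on typical hardware often shows a difference of two orders of magnitude or more, with no development effort on your part.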
An important aspect of the analysis of algorithms is to characterize their asymptotic complexity, that is, how run time (or other computational resources) scales with the size of a problem, in the asymptotic limit of infinitely large problem sizes. This scaling is then summarized using "big O" notation; for example, an algorithm with time complexity O(N) requires a time that grows linearly with problem size N for large N. In general, an algorithm whose run time grows more slowly with N is preferable, especially if large problems are to be tackled; we'll see some concrete examples of this in an upcoming section. Asymptotic complexity analysis, however, says nothing about the prefactors to these scaling relations: one algorithm might be more efficient asymptotically than another, but might only provide shorter run times for extremely large problem sizes that are not encountered in practice.
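As a concrete example (a small sketch using only the standard library), consider membership testing: `x in some_list` scans the list linearly, O(N), while `x in some_set` uses hashing, O(1) on average. The set pays a constant hashing cost per lookup, so for very small collections the asymptotically worse list can still be competitive, illustrating the role of prefactors:

```python
import timeit

for n in (100, 10_000, 1_000_000):
    data_list = list(range(n))
    data_set = set(data_list)
    target = n - 1  # worst case for the linear scan through the list

    # Time 100 membership tests against each container
    t_list = timeit.timeit(lambda: target in data_list, number=100)
    t_set = timeit.timeit(lambda: target in data_set, number=100)
    print(f"N = {n:>9,}: list {t_list:.2e} s, set {t_set:.2e} s")
```

As N grows, the list lookup time grows proportionally while the set lookup time stays roughly constant.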
Even with an appropriate choice of algorithms and libraries, a number of factors can affect the run time performance of your code, especially as algorithms and computer architectures become increasingly complex. Computations involve the interplay of many calls to functions and methods, and the overall run time depends both on the time spent in each function and on the number of times those functions are called. It is therefore important to take an empirical perspective on run time optimization, complementing the insights offered by algorithmic complexity analysis. This can be achieved through tools for performance assessment, such as those providing profiling and timing information, which we will address in more detail in a later section. For now, we only mention a very useful magic function in the IPython interpreter named %timeit, which we will use repeatedly throughout this tutorial to get timing information on different code snippets.
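For instance, in an IPython session you can time a statement like this (the reported numbers will of course depend on your machine):

```python
In [1]: data = list(range(1_000_000))

In [2]: %timeit sum(data)
```

%timeit adaptively chooses how many times to run the statement and reports the mean and standard deviation of the per-loop time, making it far more reliable than a single manual measurement.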
Finally, an important characteristic of algorithms is whether they are parallelizable. Some algorithms might have better run time or complexity on a single processor, but cannot easily be extended to make efficient use of the multiple cores or processors of a high-performance computational resource. While this topic is not focused on the parallelization of algorithms, we will highlight some Python tools that enable a variety of styles of parallel computation.
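As a brief preview, here is a sketch of one such style: an embarrassingly parallel map over independent work items using the standard library's multiprocessing module (the `simulate` function is a hypothetical stand-in for your own per-item computation):

```python
import random
from multiprocessing import Pool

def simulate(seed):
    """Hypothetical independent work item, e.g. one Monte Carlo replica."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100_000))

if __name__ == "__main__":
    # Pool() defaults to one worker process per available CPU core.
    with Pool() as pool:
        results = pool.map(simulate, range(8))
    print(results)
```

Because each work item is independent, the speedup from adding cores is nearly linear for this style of computation; algorithms whose steps depend on one another are much harder to parallelize.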