Cornell Virtual Workshop: Python for High Performance

Roadmap: Python for High Performance


Python is a very popular programming language for scientific computing, due to both the expressiveness of the language itself and the rich ecosystem of packages, tools, and libraries that the community has developed to support a wide array of computational tasks. Python is an interpreted language, however, so programs written in pure Python generally run slower than equivalent programs written in a compiled language. This roadmap introduces packages, tools, and strategies that are useful for achieving high computational performance with Python, both on workstations and on multiprocessor clusters.
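As a rough illustration of the interpreter overhead described above, the sketch below times an explicit Python loop against the built-in sum function, which is implemented in compiled C in the standard CPython interpreter. The function name python_loop_sum, the list size, and the repeat count are arbitrary choices made for this example and do not come from the workshop materials. Actual timings depend on the machine and the Python version.

```python
import timeit


def python_loop_sum(values):
    """Sum values with an explicit loop executed by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total


# Hypothetical test data: the integers 0 through 99,999.
values = list(range(100_000))

# Both approaches must produce the same result.
assert python_loop_sum(values) == sum(values)

# Time 20 repetitions of each approach.
loop_time = timeit.timeit(lambda: python_loop_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)

print(f"explicit loop: {loop_time:.4f} s")
print(f"built-in sum:  {builtin_time:.4f} s")
```

On most systems the built-in call is expected to finish noticeably faster, because the loop over the list runs in compiled code rather than in interpreted bytecode. The same idea, pushing the inner loop into compiled code, underlies many of the tools covered in this roadmap.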

Objectives

After you complete this workshop, you should be able to:

Call efficient, third-party libraries for number-crunching from the Python scientific computing ecosystem.
Compile parts of your custom code to be callable from Python, allowing for increased performance.
Write faster pure Python code by understanding performance implications of data structures, lazy evaluation, and memory management.
Use tools for executing parallel computations in Python.
Use profiling and timing tools in Python to assess performance of different algorithms and implementations.
Run Python programs on the Frontera supercomputer at TACC.

Prerequisites

This tutorial assumes the reader has some prior experience programming in Python. A working knowledge of UNIX/Linux and general programming concepts is assumed. Some of the material assumes prior exposure to parallel programming, especially in MPI, but much of the other material will be useful to people without such exposure, and additional information is available to those who need to come up to speed on that topic. The target audience is scientists and engineers who are already programming in Python, and are interested in achieving improved computational performance, both on personal workstations and on high performance computing systems. If additional introductory material about Python is needed, readers can consult An Introduction to Python as well as the documentation on the python.org website.

Requirements

Python runs on computers ranging from typical laptops to the most powerful High Performance Computing (HPC) systems. To run the code examples in this roadmap, you will need either to install Python and the related packages on your local machine or to have access to a managed system where those packages are already installed. If you wish to run Python on the Frontera system at TACC, you will need an allocation on that system.