Python Distributions
Python is a general-purpose programming language that is installed by default in many OS distributions and is used for many systems administration activities. Because system tools may depend on that default installation, it is generally advisable to leave it alone, especially since the default version might not be updated as frequently as users might want. Fortunately, multiple Python installations can happily coexist on a single system, since each installation keeps track of the libraries installed for use with that version. With an alternate installation of Python, you can configure the system with the additional libraries needed to support your research. Below, we describe various approaches to such configuration.
Most functionality tailored toward numerical computing and data science is provided by third-party libraries of the sort described in Key Packages. The number and functionality of packages available for data science in Python continue to grow at an impressive rate. The downside is the potential headache of installing additional packages and managing the dependencies among them. Fortunately, a number of tools and utilities have been developed to assist with that sort of package management. Better still, several freely available Python distributions bundle together most if not all of the packages one might need to carry out a scientific computation or analysis. These include Python distributions produced by Anaconda, Enthought, ActiveState, and Intel. In addition to providing convenient support for installation of the Python scientific computing ecosystem, many of these distributions link to highly optimized numerical libraries, such as the Intel Math Kernel Library (MKL). So if you want to get up and running quickly, you might want to start with one of these distributions.
If you use a shared system, check whether a suitable Python ecosystem has already been set up for your use. On managed computational clusters, for example, different Python installations might be installed and managed through an environment module system, such as the Lmod utilities on the clusters supported by the Texas Advanced Computing Center (TACC) and the San Diego Supercomputer Center (SDSC).
Environments
While multiple Python installations can happily coexist on a single system, that happiness can begin to dissolve if there are other dependencies that need to be kept separate, or if you need to maintain separate installations for separate projects. The best way to address this sort of situation is to construct and manage separate environments, such as those supported through Python virtual environments or conda environments. If you are using the Anaconda Python distribution, or have installed the Miniconda system, you will want to create conda environments and use the conda package manager to coordinate things. Otherwise, creating Python virtual environments is the way to go, using the pip package manager to install packages once you have created a new environment.
Python virtual environments
For Python 3, the venv module, part of the Python Standard Library, is used to create virtual environments. One or more virtual environments can be created, with packages installed into them, and any one of those environments can be activated within a shell at any time. The pip package manager is used to install packages, which are available through the Python Package Index (PyPI) at pypi.org. To create an environment named "myenv" and install a few packages, you would create it with python3 -m venv myenv, activate it with source myenv/bin/activate (on Linux or macOS), and then run pip install followed by the package names.
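The same steps can also be scripted. What follows is a minimal sketch, not a definitive recipe: it uses the standard-library venv module and calls pip through the new environment's own interpreter, so the environment does not need to be activated first. The function names create_env and install_packages are hypothetical, chosen here for illustration.

```python
import os
import subprocess
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    # Roughly equivalent to running: python3 -m venv myenv
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in bin/ on Linux and macOS, Scripts/ on Windows.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe


def install_packages(env_python, packages):
    """Install packages from PyPI into the environment that owns env_python."""
    # Roughly equivalent to activating the environment and running:
    #   pip install <packages>
    subprocess.run(
        [str(env_python), "-m", "pip", "install", *packages],
        check=True,
    )
```

Under these assumptions, install_packages(create_env("myenv"), ["numpy", "scipy"]) would create the environment and install two packages into it; the package names are placeholders, and the install step needs network access to PyPI.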
Conda environments
The conda package manager is part of both the Anaconda Python distribution and the smaller Miniconda system. The Anaconda distribution installs a large number of packages by default, getting you up to speed quickly for many different sorts of tasks. The Miniconda system installs the conda package manager but not much else, enabling you to efficiently build up new custom environments for different projects. In either case, a new conda environment is created with conda create --name followed by the environment name and any packages to install, and is then activated with conda activate followed by the environment name.
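As with virtual environments, these steps can be scripted. The sketch below builds the conda create command line and runs it only if conda is found on the PATH. The function names conda_create_command and create_conda_env are hypothetical, and the optional python_version argument is an illustrative assumption rather than something this guide prescribes.

```python
import shutil
import subprocess


def conda_create_command(env_name, packages, python_version=None):
    """Build the argument list for creating a conda environment."""
    # Roughly equivalent to typing: conda create --yes --name myenv <packages>
    cmd = ["conda", "create", "--yes", "--name", env_name]
    if python_version:
        # Pin the interpreter version inside the new environment.
        cmd.append(f"python={python_version}")
    cmd.extend(packages)
    return cmd


def create_conda_env(env_name, packages, python_version=None):
    """Create the environment, failing early if conda is not installed."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    subprocess.run(
        conda_create_command(env_name, packages, python_version),
        check=True,
    )
```

Once the environment exists, running conda activate myenv in a shell switches to it. Activation changes the state of the current shell, which is why it is left out of the script.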
It is worth noting that conda-based systems also come with pip installed, which is useful if you need to install packages that are not available directly through conda.
setup.py
Packages not installable via pip might provide support for installation through a file named setup.py in the package directory. Typical commands for building and installing a package via setup.py might look like:
# first, cd to the package directory
python setup.py build
python setup.py install --user    # installs under $HOME/.local
# or, alternatively, install to a custom location
# (again from the package directory)
python setup.py build
python setup.py install --prefix="$INSTALLDIR"    # then add its site-packages directory to PYTHONPATH
There are multiple packages available that coordinate the configuration and compilation of other Python packages. The old standard is distutils, which was part of the Python standard library until its removal in Python 3.12 and is used under the covers by setup.py in a wide range of legacy packages. The newer standard is setuptools, which pip typically relies on for package configuration. Both distutils and setuptools support, for example, specifying different compiler and linker options should those be necessary.
sys.path
Any given Python installation will know where its associated standard and third-party libraries are installed, and will add those locations to the sys.path variable that the Python interpreter consults when looking for modules to import. sys.path can contain a mixture of directories storing both Python libraries intended for all users of the system and user-specific libraries (typically stored in the user's own directory space). The Python standard library is typically stored in ${PREFIX}/lib/pythonX.Y, where ${PREFIX} refers to the root of the Python installation and X.Y refers to the major and minor version numbers of the installation. Third-party libraries intended for all users on the system are typically installed in ${PREFIX}/lib/pythonX.Y/site-packages. Some installations provide a default local directory for user-specific libraries, but users can always set an environment variable PYTHONPATH that specifies a list of additional directories that are added to sys.path.
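To see how these locations resolve for a particular installation, a short script along the following lines can help. It is a sketch using only the standard library. The function names describe_paths and sys_path_with_pythonpath are hypothetical, and the second function starts a fresh interpreter because PYTHONPATH is read at startup.

```python
import json
import os
import site
import subprocess
import sys
import sysconfig


def describe_paths():
    """Report where the running interpreter keeps its libraries."""
    return {
        "prefix": sys.prefix,                            # root of this installation
        "stdlib": sysconfig.get_path("stdlib"),          # standard library
        "site_packages": sysconfig.get_path("purelib"),  # third-party libraries
        "user_site": site.getusersitepackages(),         # per-user libraries
    }


def sys_path_with_pythonpath(extra_dir):
    """Return sys.path as seen by a fresh interpreter with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    for name, location in describe_paths().items():
        print(f"{name:>14}: {location}")
```

Calling sys_path_with_pythonpath with a directory of your own should show that directory in the returned list, alongside the standard and site-packages locations.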