Python Ecosystem

The Python ecosystem for scientific computing consists of third-party libraries, such as NumPy and SciPy, that can be called to support particular algorithms or applications, as well as frameworks and tools that facilitate the development of numerically efficient programs; the latter are discussed in subsequent sections. NumPy and SciPy form the core of the Python scientific computing ecosystem and have been in wide use for several years, but the ecosystem continues to grow and mature, with support for applications in data science, machine learning, image processing, visualization, and network analysis, to name a few. Many of these packages use NumPy arrays as a common data structure, so data can often be passed from one package to another without conversion. Good tools not only spare you from having to implement the relevant methods yourself, but also let you leverage the expertise of others in these diverse fields. Below is a deliberately incomplete list of packages that you might consider using in your own research.

Data Structures and Algorithms

The Python Standard Library: modules distributed with Python itself (the documentation is versioned, so select the version of Python you are using)
NumPy: multi-dimensional arrays, array-level mathematical operations, linear algebra, random numbers, etc.
SciPy: fundamental algorithms for scientific computing, including optimization, integration, interpolation, linear algebra, signal processing, sparse matrices, and statistics
Pandas: dataframes and series for representing tabular data, and a rich set of operations for analyzing such data
H5py: support for HDF5-formatted data files
Dask: distributed arrays and dataframes, scheduling of distributed workflows
SymPy: symbolic mathematics
NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks
igraph: a network analysis library with a core written in C and interfaces for Python, R, and Mathematica
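As a small illustration of the array-level operations and linear algebra that NumPy provides, the sketch below solves a made-up 3-by-3 linear system and checks the result without writing an explicit Python loop. It assumes only that NumPy is installed; the matrix values and variable names are arbitrary choices for the example.

```python
import numpy as np

# A small linear system A x = b, stored as NumPy arrays (values are arbitrary)
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

# Solve with NumPy's linear algebra module, which calls compiled LAPACK routines
x = np.linalg.solve(A, b)

# Array-level operation: the residual for every row is computed at once,
# with no explicit Python loop
residual = A @ x - b
print(np.allclose(residual, 0.0))  # True

# A pure-Python loop and the equivalent vectorized NumPy expression
squares_loop = [v * v for v in range(5)]
squares_vec = np.arange(5) ** 2
print(squares_vec.tolist() == squares_loop)  # True
```

The same arrays can be handed directly to many of the other packages listed here, such as SciPy or pandas, which is part of why NumPy arrays work well as a common data structure.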

Interpreters, Notebooks and Development Environments

IPython: powerful interactive Python shell
Jupyter: interactive notebooks integrating code, results, graphics and documentation
Spyder: integrated development environment (IDE) combining editing, analysis, debugging, and profiling functionality

Data Visualization

Matplotlib: 2D plotting library that makes easy things easy and hard things possible
Seaborn: data visualization library based on matplotlib, providing a high-level interface for informative statistical graphics
Bokeh: interactive visualization library that targets modern web browsers for presentation
Plotly: interactive visualization library that targets modern web browsers for presentation

Relational Databases

SQLAlchemy: Python SQL toolkit and Object Relational Mapper, providing access to SQL databases
sqlite3: Python interface to SQLite, a C library that provides a lightweight disk-based SQL database
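Because `sqlite3` ships with Python's standard library, a database can be created and queried with no extra installation. The minimal sketch below uses an in-memory database; the table name `runs`, its columns, and the timing values are hypothetical, made up purely for illustration.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM;
# pass a file name instead to keep the data on disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of job runs: node count and wall-clock seconds
cur.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, nodes INTEGER, seconds REAL)")

# Parameterized inserts ("?" placeholders) avoid quoting bugs and SQL injection
rows = [(1, 4, 120.5), (2, 8, 64.0), (3, 16, 35.2)]
cur.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)
conn.commit()

# Aggregate query with a filter, again using a placeholder for the parameter
cur.execute("SELECT COUNT(*), MIN(seconds) FROM runs WHERE nodes >= ?", (8,))
count, fastest = cur.fetchone()
print(count, fastest)  # 2 35.2

conn.close()
```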

Scientific Computing and Statistics

Statsmodels: classes and functions for the estimation of statistical models, conducting statistical tests, and statistical data exploration
Scikits: add-on packages for SciPy, developed and hosted independently of the main SciPy distribution, providing more specialized functionality in a large number of topic areas
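Linear regression by ordinary least squares (OLS) is one of the models Statsmodels estimates. The sketch below shows only the core computation using NumPy, fitting a straight line to synthetic, noise-free data; it does not use Statsmodels itself, and the data and variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic, noise-free data generated from y = 2 + 3x (illustrative only)
x = np.linspace(0.0, 1.0, 11)
y = 2.0 + 3.0 * x

# Design matrix: a column of ones for the intercept and a column for the slope
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find beta minimizing ||X @ beta - y||^2
beta, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # intercept close to 2.0, slope close to 3.0
```

With Statsmodels installed, `statsmodels.api.OLS(y, statsmodels.api.add_constant(x)).fit()` should fit the same model, and calling `.summary()` on the result adds standard errors, confidence intervals, and test statistics.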

Machine Learning

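Scikit-learn: a wide range of classical machine learning methods (classification, regression, clustering, dimensionality reduction, model selection) behind a consistent interface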
TensorFlow: Deep Learning in Python (an end-to-end open source platform for machine learning)
PyTorch: open source machine learning framework that accelerates the path from research prototyping to production deployment
Keras: high-level neural networks API, written in Python; Keras 3 runs on top of JAX, TensorFlow, or PyTorch, while earlier versions also supported CNTK and Theano
Caffe: deep learning framework made with expression, speed, and modularity in mind

Image Processing

Scikit-image: collection of algorithms for image processing
Pillow: a fork of PIL, the Python Imaging Library

Natural Language Processing

Natural Language Toolkit (NLTK): platform for building Python programs to work with human language data
spaCy: industrial-strength natural language processing in Python
TextBlob: Python library for processing textual data
python-Levenshtein: for computing string similarities and edit distances
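The edit (Levenshtein) distance that python-Levenshtein computes is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The plain-Python sketch below uses the standard dynamic-programming recurrence to show what is being computed; it does not call the library, which provides much faster compiled routines, and the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j];
    # before any characters of a are processed, that is just j insertions
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match if equal
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the dynamic-programming table reduces memory from proportional to len(a) * len(b) to proportional to len(b), while the running time stays proportional to len(a) * len(b).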

Systems for Big Data

PySpark: Python interface to the Spark programming model

© | Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)