Python Ecosystem
The Python ecosystem for scientific computing consists of third-party libraries, such as NumPy and SciPy, that can be called to support particular algorithms or applications, as well as frameworks and tools that facilitate the development of numerically efficient programs; the latter are discussed in subsequent sections. NumPy and SciPy form the core of the Python Scientific Computing Ecosystem and have been in wide use for many years, but the ecosystem continues to grow and mature, with support for a large number of applications in data science, machine learning, image processing, visualization, and network analysis, to name a few. Many of these packages use NumPy arrays as a common data structure, which makes it easier to pass data from one package to another. Good tools not only spare you from having to implement relevant methods yourself, but also let you leverage the expertise of others in these diverse fields. Below is a very incomplete list of some of the available packages that you might consider using in your own research.
Data Structures and Algorithms
- The Python Standard Library: modules distributed with Python itself (documented separately for each Python version)
- NumPy: multi-dimensional arrays, array-level mathematical operations, linear algebra, random numbers, etc.
- SciPy: fundamental algorithms for scientific computing, including optimization, integration, interpolation, signal processing, linear algebra, and statistics
- Pandas: dataframes and series for representing tabular data, and a rich set of operations for analyzing such data
- h5py: support for HDF5-formatted data files
- Dask: distributed arrays and dataframes, scheduling of distributed workflows
- SymPy: symbolic mathematics
- NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks
- igraph: a network analysis library, written in C, with front-ends in Python, R, and Mathematica
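As a minimal sketch of the array-level operations, linear algebra, and random-number support listed for NumPy above, the following assumes only that NumPy is installed; the particular matrix, right-hand side, and seed are arbitrary illustrative values.

```python
import numpy as np

# Array-level (vectorized) arithmetic: the expression is applied to every
# element of the array at once, with no explicit Python loop.
x = np.linspace(0.0, 1.0, 5)          # [0.0, 0.25, 0.5, 0.75, 1.0]
y = 3.0 * x**2 + 1.0                  # elementwise on the whole array

# Linear algebra: solve the 2x2 system A @ b = rhs for the unknown vector b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
b = np.linalg.solve(A, rhs)           # expected solution: [2.0, 3.0]

# Random numbers from a seeded generator, so results are reproducible.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(y)
print(b)
print(samples.mean(), samples.std())
```

Using `np.random.default_rng` with an explicit seed is the recommended way to get a reproducible stream of random numbers in current NumPy versions.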
Interpreters, Notebooks and Development Environments
- IPython: powerful interactive Python shell
- Jupyter: interactive notebooks integrating code, results, graphics and documentation
- Spyder: integrated development environment (IDE) combining editing, analysis, debugging, and profiling functionality
Data Visualization
- Matplotlib: 2D plotting library that makes easy things easy and hard things possible
- Seaborn: data visualization library based on matplotlib, providing a high-level interface for informative statistical graphics
- Bokeh: interactive visualization library that targets modern web browsers for presentation
- Plotly: interactive visualization library that targets modern web browsers for presentation
Relational Databases
- SQLAlchemy: Python SQL toolkit and Object Relational Mapper, providing access to SQL databases
- sqlite3: Python interface to SQLite, a C library that provides a lightweight disk-based SQL database
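Because sqlite3 ships with the Python Standard Library, it is an easy way to try SQL without setting up a database server. The sketch below uses an in-memory database and a made-up table of solver timings (the `runs` table and its contents are hypothetical illustration data) to show table creation, parameterized inserts, and an aggregate query.

```python
import sqlite3

# ":memory:" creates a temporary database that exists only in RAM;
# passing a filename instead gives a persistent, disk-based database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical example table: one row per timed solver run.
cur.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, solver TEXT, seconds REAL)")

# Parameterized inserts ("?" placeholders) avoid quoting bugs and SQL injection.
rows = [("cg", 12.5), ("gmres", 9.75), ("cg", 11.5)]
cur.executemany("INSERT INTO runs (solver, seconds) VALUES (?, ?)", rows)
conn.commit()

# Let the database do the aggregation: average runtime per solver.
cur.execute("SELECT solver, AVG(seconds) FROM runs GROUP BY solver ORDER BY solver")
averages = cur.fetchall()
print(averages)  # [('cg', 12.0), ('gmres', 9.75)]

conn.close()
```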
Scientific Computing and Statistics
- Statsmodels: classes and functions for the estimation of statistical models, conducting statistical tests, and statistical data exploration
- Scikits: add-on packages for SciPy, hosted and developed separately and independently from the main SciPy distribution, providing more specialized functionality in a large number of topic areas
Machine Learning
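- Scikit-learn: a broad range of machine learning methods (classification, regression, clustering, dimensionality reduction, model selection, and preprocessing) built on NumPy and SciPy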
- TensorFlow: Deep Learning in Python (an end-to-end open source platform for machine learning)
- PyTorch: open source machine learning framework that accelerates the path from research prototyping to production deployment
- Keras: high-level neural networks API, written in Python; current versions (Keras 3) run on top of JAX, TensorFlow, or PyTorch, while older releases also supported CNTK and Theano
- Caffe: deep learning framework made with expression, speed, and modularity in mind
Image Processing
- Scikit-image: collection of algorithms for image processing
- Pillow: a fork of PIL, the Python Imaging Library
Natural Language Processing
- Natural Language Toolkit (NLTK): platform for building Python programs to work with human language data
- spaCy: industrial-strength natural language processing in Python
- TextBlob: Python library for processing textual data
- python-Levenshtein: for computing string similarities and edit distances
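To make "edit distance" concrete, here is a plain-Python sketch of the classic dynamic-programming algorithm for the Levenshtein distance. It is not the python-Levenshtein API; as far as we know, that package exposes a compiled function `Levenshtein.distance(a, b)` that computes the same quantity much faster, so this version is only meant to show what is being computed. The helper name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper: minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j];
    # initially that prefix is "", so turning it into b[:j] costs j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to "" is i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table reduces memory from O(len(a) * len(b)) to O(len(b)), while the running time stays O(len(a) * len(b)).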
Systems for Big Data
- PySpark: Python interface to Apache Spark and its distributed programming model
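The Spark programming model expresses a computation as a chain of transformations over distributed collections. As an illustration only, the sketch below mimics a word count in plain Python, with no Spark involved; each step is labeled with the name of the RDD transformation it imitates. In PySpark the same pipeline would look roughly like `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(operator.add)`, where `sc` is an existing SparkContext and `path` is a placeholder.

```python
from collections import Counter
from functools import reduce

# Plain-Python imitation of Spark's flatMap -> map -> reduceByKey pattern.
# No PySpark is used here; the input lines are made-up sample data.
lines = [
    "to be or not to be",
    "that is the question",
]

# "flatMap": split every line into words and flatten into one sequence.
words = [w for line in lines for w in line.split()]

# "map": turn each word into a (key, value) pair.
pairs = [(w, 1) for w in words]


# "reduceByKey": combine the values that share a key with an associative function.
def merge(acc: Counter, pair: tuple) -> Counter:
    key, value = pair
    acc[key] += value
    return acc


counts = reduce(merge, pairs, Counter())
print(counts.most_common(2))  # [('to', 2), ('be', 2)]
```

Spark can distribute the reduce step because the combining function (addition here) is associative, so partial counts computed on different partitions can be merged in any grouping.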