
The Python ecosystem for data science includes several key libraries and packages. The list below is far from complete.

The Python Standard Library: modules distributed with Python itself (the documentation is specific to each Python version)
NumPy: multi-dimensional arrays, array-level mathematical operations, linear algebra, random numbers, etc.
SciPy: algorithms for scientific computing built on NumPy, including optimization, integration, interpolation, signal processing, and statistics
Pandas: dataframes and series for representing tabular data, and a rich set of operations for analyzing such data
H5py: support for HDF5-formatted data files
Dask: distributed arrays and dataframes, scheduling of distributed workflows
SymPy: symbolic mathematics
NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks
igraph: a network analysis library with a core written in C and front-ends in Python, R, and Mathematica
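
As a minimal sketch of how the array and dataframe libraries above fit together, the following uses NumPy for an array-level operation and Pandas to label the same data as a table. The numbers and column names are invented for illustration, and both packages are assumed to be installed.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, invented purely for illustration.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Array-level operation: column means computed without an explicit loop.
col_means = values.mean(axis=0)

# Wrap the same data in a DataFrame with labeled columns.
df = pd.DataFrame(values, columns=["a", "b"])

# Column arithmetic is applied element-wise across all rows.
df["total"] = df["a"] + df["b"]
largest_total = df["total"].max()

print(col_means)
print(df)
```

The DataFrame here reuses the NumPy array as its underlying data, which is the usual division of labor: NumPy for numerical arrays, Pandas for labeled tabular views of them.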

IPython: powerful interactive Python shell
Jupyter: interactive notebooks integrating code, results, graphics and documentation
Spyder: integrated development environment (IDE) combining editing, analysis, debugging, and profiling functionality

Matplotlib: 2D plotting library that makes easy things easy and hard things possible
Seaborn: data visualization library based on matplotlib, providing a high-level interface for informative statistical graphics
Bokeh: interactive visualization library that targets modern web browsers for presentation
Plotly: interactive graphing library built on plotly.js, rendering charts in web browsers and notebooks

SQLAlchemy: Python SQL toolkit and Object Relational Mapper, providing access to SQL databases
sqlite3: Python interface to SQLite, a C library that provides a lightweight disk-based SQL database
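
Because sqlite3 ships with the standard library, it is a convenient way to try SQL from Python without installing anything. The sketch below uses an in-memory database. The table name and rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical rows, invented purely for illustration.
rows = [("numpy", "arrays"), ("pandas", "dataframes"), ("sympy", "symbolic math")]

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE libraries (name TEXT PRIMARY KEY, topic TEXT)")
conn.executemany("INSERT INTO libraries VALUES (?, ?)", rows)
conn.commit()

# Parameterized query: the ? placeholder keeps values out of the SQL string.
cursor = conn.execute("SELECT topic FROM libraries WHERE name = ?", ("pandas",))
topic = cursor.fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM libraries").fetchone()[0]
conn.close()

print(topic, count)
```

Replacing ":memory:" with a file path would persist the same table to disk.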

Statsmodels: classes and functions for the estimation of statistical models, conducting statistical tests, and statistical data exploration
SciKits (SciPy Toolkits): add-on packages for SciPy, hosted and developed independently of the main SciPy distribution, providing more specialized functionality in a large number of topic areas

Scikit-learn: many different machine learning methods in Python
TensorFlow: end-to-end open source platform for machine learning, widely used for deep learning in Python
PyTorch: open source machine learning framework that accelerates the path from research prototyping to production deployment
Keras: high-level neural networks API, written in Python; earlier versions ran on top of TensorFlow, CNTK, or Theano, while Keras 3 supports TensorFlow, JAX, and PyTorch backends
Caffe: deep learning framework made with expression, speed, and modularity in mind

Natural Language Toolkit (NLTK): platform for building Python programs to work with human language data
spaCy: industrial-strength natural language processing in Python
TextBlob: Python library for processing textual data
python-Levenshtein: for computing string similarities and edit distances
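
To make "edit distance" concrete, here is a pure-Python sketch of the Levenshtein distance using the classic dynamic-programming recurrence. It is a stand-in written for illustration, not the python-Levenshtein API (that package provides a compiled implementation), and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

The sketch keeps only two rows of the distance table at a time, so memory use grows with the length of the second string rather than with the product of both lengths.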