At its core, Python is a programming language with a defined syntax and specification. It is a general-purpose language, meaning that it was developed to support a broad spectrum of applications, not just those in data science or the related fields of statistical and numerical computing. Python is interpreted, which allows for rapid prototyping of algorithms and interrogation of datasets, aided by an expressive syntax and a variety of high-level built-in data types. Python is also object-oriented, providing support for defining new types of data and their associated behaviors; as such, it is well suited to capturing the abstractions needed in complex scientific and numerical applications.
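As a brief illustration of what "defining new types of data and associated behaviors" can look like, here is a minimal sketch of a user-defined type. The class `Vector2D` and its methods are hypothetical, chosen only for illustration; the sketch uses the standard-library `dataclasses` module to generate the constructor, equality test, and printed representation, and then attaches behavior such as what `+` means for the new type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2D:
    """Hypothetical immutable 2-D vector type, used only as an illustration."""

    x: float
    y: float

    def __add__(self, other: "Vector2D") -> "Vector2D":
        # Component-wise addition defines what `+` means for this type.
        return Vector2D(self.x + other.x, self.y + other.y)

    def scale(self, factor: float) -> "Vector2D":
        # Multiply both components by a scalar and return a new vector.
        return Vector2D(self.x * factor, self.y * factor)

    def norm(self) -> float:
        # Euclidean length of the vector.
        return math.hypot(self.x, self.y)


v = Vector2D(1.0, 2.0) + Vector2D(2.0, 2.0)
print(v, v.norm())  # prints: Vector2D(x=3.0, y=4.0) 5.0
```

Because the new type behaves like any built-in value (it can be added, compared, and printed), code that uses it can read much like the mathematics it represents.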

Python also provides support for defining external libraries that can be imported into and called from within Python programs, including libraries written in other programming languages. Much of Python's power in specific application domains, such as data science, comes from these external third-party libraries, which draw on the expertise of developers from a variety of fields. External libraries written in compiled languages such as C/C++ and Fortran provide numerically efficient computational kernels that can be called from within the Python interpreter, combining high-level control with low-level performance. The scientific computing community was an early and enthusiastic adopter of Python, and has driven the development of a wide range of tools and packages to support numerical computing and data science.
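To make the idea of calling compiled code from the interpreter concrete, the sketch below uses the standard-library `ctypes` module to call the C `sqrt` function from the system math library. It assumes a POSIX system (Linux or macOS) where the math library can be located by `ctypes.util.find_library("m")`, or where `sqrt` is already loaded into the running Python process; the wrapper name `c_sqrt` is hypothetical. Production scientific libraries typically use more elaborate binding tools, so treat this as an illustration of the mechanism rather than a recommended approach.

```python
import ctypes
import ctypes.util

# Locate the C math library; on Linux this is typically "libm.so.6".
# find_library returns None if the library cannot be found.
libm_name = ctypes.util.find_library("m")

# Fall back to symbols already loaded into the running process (POSIX only).
libm = ctypes.CDLL(libm_name) if libm_name else ctypes.CDLL(None)

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x: float) -> float:
    """Hypothetical wrapper that calls the compiled C sqrt function."""
    return libm.sqrt(x)


print(c_sqrt(16.0))  # prints: 4.0
```

Declaring `argtypes` and `restype` matters here: without them, `ctypes` would assume integer arguments and an integer return value, and the result would be wrong.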

Key Python packages and tools for use in data science are listed in the Key Packages section of this tutorial. Information on installing and configuring Python distributions, many of which are geared toward data science and come with these key packages preinstalled, is provided below in the section on Python Distributions. Further related topics (writing numerically efficient Python programs, integrating interpreted Python with compiled code, and core libraries for scientific computing and parallel processing) are described in greater detail in our companion tutorial on Python for High Performance.

 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement