Chris Myers (CAC), Jeff Sale (SDSC)
Cornell Center for Advanced Computing and San Diego Supercomputer Center

Revisions: 6/2023, 1/2021 (original)

Data science integrates a variety of techniques for processing and analyzing many different types of data. Python has emerged as one of the key technologies supporting data science, in large part because of its rich ecosystem of tools and libraries that facilitate both research and production workflows. In this topic, we introduce some key elements of the field of data science and of the Python ecosystem that supports that work.

Objectives

After you complete this roadmap, you should be able to:

  • Identify different facets of data science, and different forms of data
  • Describe the characteristics of Python distributions and the Python data science ecosystem
  • Identify some of the key packages for carrying out data science using Python

Prerequisites

This tutorial assumes the reader has some working knowledge of general programming concepts, even if not directly with the Python programming language. The target audience is scientists and engineers who are already programming in Python and are interested in using Python tools and packages to analyze datasets. Readers who need additional introductory material about Python can consult An Introduction to Python as well as the documentation on the python.org website.

 
© Cornell University | Center for Advanced Computing