Datasets
Chris Myers (CAC), Jeff Sale (SDSC)
Cornell Center for Advanced Computing and San Diego Supercomputer Center
Revisions: 6/2023, 2/2020 (original)
Throughout this tutorial, we will work with real-world datasets to illustrate concepts, methods, and tools supported by the Python ecosystem. Our goal is to provide an overview of some of these tools, with links to more detailed information elsewhere, along with concrete examples of how they can be used in data science applications. Rather than discussing tools and methods in the abstract, we introduce them in the context of these specific datasets, in the hope that the approaches are broadly applicable.
Objectives
After you complete this segment, you should be able to:
- Describe the different datasets being used to illustrate various concepts and techniques
- Access the datasets and Jupyter notebooks being used here
Prerequisites
If you would like to access this roadmap's data files and associated Jupyter notebooks, some familiarity with Git and Jupyter will be helpful.