Facets of Data Science

Data science is focused on making sense of complex datasets and on building predictive models from those data. As such, it encompasses a wide array of activities, from the upstream processes of data acquisition, cleaning, and integration to the downstream processes of data analysis, modeling, and prediction. There are many facets of data science, including:

Identifying the structure of data
Accessing and importing data
Cleaning, filtering, reorganizing, augmenting, and aggregating data
Visualizing data
Data analysis, statistics, and modeling
Machine learning
Assembling data processing pipelines to link these steps
Leveraging high-end computational resources for large-scale problems

Often, different tools address different parts of this process. Therefore, interoperability among tools, based on common data structures and interfaces, is an important element in enabling the construction of complex, multifaceted data analysis pipelines. It is in this sense that we can talk about an ecosystem for data science. For any particular application, you might need only a subset of these operations.
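
As a minimal sketch of how a common data structure lets separate steps be linked into a pipeline, the example below uses only the Python standard library. The data, the column names, and the function names (load, clean, aggregate, run_pipeline) are all hypothetical and chosen for illustration; a list of dictionaries stands in for the shared data structure that real tools in the ecosystem would exchange.

```python
import csv
import io
import statistics

# Hypothetical raw data; the column names and values are made up for illustration.
RAW_CSV = """site,temperature
A,20.5
A,21.5
B,
B,18.0
"""

def load(text):
    """Access and import: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(records):
    """Clean and filter: drop rows with a missing temperature, convert to float."""
    return [
        {"site": r["site"], "temperature": float(r["temperature"])}
        for r in records
        if r["temperature"].strip()
    ]

def aggregate(records):
    """Aggregate and analyze: compute the mean temperature per site."""
    by_site = {}
    for r in records:
        by_site.setdefault(r["site"], []).append(r["temperature"])
    return {site: statistics.mean(values) for site, values in by_site.items()}

def run_pipeline(data, steps):
    """Link the steps: feed each step's output into the next one."""
    for step in steps:
        data = step(data)
    return data

summary = run_pipeline(RAW_CSV, [load, clean, aggregate])
print(summary)
```

Because load and clean both produce the same record format, either step could be swapped for a different implementation without changing the rest of the pipeline, and an application that needs only a subset of the operations can simply pass a shorter list of steps.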