Data science focuses on making sense of complex datasets and on building predictive models from those data. As such, it encompasses a wide array of activities, from the upstream processes of data acquisition, cleaning, and integration to the downstream processes of data analysis, modeling, and prediction. Data science has many facets, including:

  • Identifying the structure of data
  • Accessing and importing data
  • Cleaning, filtering, reorganizing, augmenting, and aggregating data
  • Visualizing data
  • Analyzing data, computing statistics, and building models
  • Applying machine learning
  • Assembling data processing pipelines to link these steps
  • Leveraging high-end computational resources for large-scale problems

Often, different tools address different parts of this process. Therefore, interoperability among tools, based on common data structures and interfaces, is an important element in enabling the construction of complex, multifaceted data analysis pipelines. It is in this sense that we can talk about an ecosystem for data science. For any particular application, you might need only a subset of these operations.
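As a rough illustration of steps linked through a common data structure, here is a minimal sketch that uses only the Python standard library. The inline data, the column names, and the function names (`load`, `clean`, `aggregate`, `run_pipeline`) are all hypothetical and chosen for this example. The first two steps exchange a list of row dictionaries, and the last step reduces those rows to a per-site summary.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for an external file or database.
RAW_CSV = """site,temperature_c
A,21.5
A,22.1
B,
B,19.4
B,18.6
"""


def load(text):
    """Access and import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean and filter: drop rows with a missing reading, convert strings to floats."""
    return [
        {"site": r["site"], "temperature_c": float(r["temperature_c"])}
        for r in rows
        if r["temperature_c"].strip()
    ]


def aggregate(rows):
    """Aggregate: group readings by site and compute the mean of each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r["site"], []).append(r["temperature_c"])
    return {site: statistics.mean(values) for site, values in groups.items()}


def run_pipeline(text, steps):
    """Pipeline: feed the output of each step into the next one."""
    data = text
    for step in steps:
        data = step(data)
    return data


summary = run_pipeline(RAW_CSV, [load, clean, aggregate])
print(summary)
```

Because each step accepts what the previous step returns, a step can be swapped out or dropped without touching the others, as long as the replacement keeps the same input and output shape. That is the role a shared data structure plays among real tools as well.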

 
© Cornell University | Center for Advanced Computing