Machine Learning
Chris Myers (CAC), Jeff Sale (SDSC) 
Cornell Center for Advanced Computing and San Diego Supercomputer Center
Revisions: 6/2023, 1/2021 (original)
Machine learning (ML) involves the use of algorithms that learn patterns in data without being explicitly instructed about the details of those patterns. As such, machine learning straddles the fields of artificial intelligence and data science, and connects to many other types of algorithms, such as those for statistical modeling, optimization, and inference. In this topic, we describe some tools in the Python ecosystem for solving machine learning problems.
Objectives
After you complete this segment, you should be able to:
- Distinguish between supervised and unsupervised machine learning
- Use sklearn to build a classifier, cluster data, or reduce the dimensionality of a dataset
- Integrate machine learning methods with other tools in the Python ecosystem to analyze data
- Understand connections among machine learning, deep learning, and big data
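As a preview of the first two objectives, the following minimal sketch contrasts supervised and unsupervised learning in sklearn. The choice of the Iris dataset and of the LogisticRegression, KMeans, and PCA estimators is an assumption made here for illustration only; the tutorial itself may use different data and models.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative dataset (an assumption): 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)

# Supervised learning: the labels y guide the training of a classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of held-out samples classified correctly

# Unsupervised learning (clustering): only X is used, the labels are ignored.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

# Unsupervised learning (dimensionality reduction): project 4 features onto 2.
X_2d = PCA(n_components=2).fit_transform(X)

print(f"Held-out classification accuracy: {accuracy:.2f}")
print(f"Number of clusters found: {len(set(cluster_labels))}")
print(f"Shape after PCA: {X_2d.shape}")
```

Note that the classifier is the only step that sees the labels y; the clustering and dimensionality reduction steps work from the features alone, which is the distinction between supervised and unsupervised methods.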
Prerequisites
This tutorial assumes the reader has some working knowledge of general programming concepts, even if not specifically in the Python programming language. The target audience is scientists and engineers who already program in Python and are interested in using Python tools and packages to analyze datasets. Readers who need additional introductory material about Python can consult An Introduction to Python as well as the documentation on the python.org website.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)