Cornell Virtual Workshop > Python for Data Science

Machine Learning

Chris Myers (CAC), Jeff Sale (SDSC)
Cornell Center for Advanced Computing and San Diego Supercomputer Center

Revisions: 6/2023, 1/2021 (original)

Machine learning (ML) involves the use of algorithms that can learn about patterns in data without being explicitly instructed about the details of those patterns. As such, machine learning straddles the fields of artificial intelligence and data science, and it connects to many types of algorithms, such as those for statistical modeling, optimization, and inference. In this topic, we describe some tools in the Python ecosystem for tackling common problems in machine learning.

Objectives

After you complete this segment, you should be able to:

Distinguish between supervised and unsupervised machine learning
Use sklearn (scikit-learn) to build a classifier, cluster data, or carry out dimensionality reduction on data
Integrate machine learning methods with other tools in the Python ecosystem to analyze data
Understand connections among machine learning, deep learning, and big data
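To make the supervised/unsupervised distinction in the objectives above concrete, here is a minimal sketch, assuming scikit-learn is installed. The Iris dataset, the k-nearest-neighbors classifier, k-means clustering, and parameter values such as n_neighbors=5 and n_clusters=3 are illustrative choices of ours, not methods prescribed by this tutorial. The key difference: the classifier is fit with the labels y, while clustering and PCA see only the features X.

```python
# A minimal sketch contrasting supervised and unsupervised learning in scikit-learn.
# Assumption: scikit-learn is installed; the bundled Iris dataset is used for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # X: 150 samples x 4 features; y: species labels

# Supervised: the classifier learns from features X together with known labels y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of held-out samples classified correctly

# Unsupervised clustering: k-means sees only X; the labels y are never used.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_  # one cluster index (0, 1, or 2) per sample

# Unsupervised dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

print(f"classifier test accuracy: {accuracy:.2f}")
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
print("reduced shape:", X_2d.shape)
```

Note that every estimator follows the same fit/predict (or fit/transform) pattern, which is what makes it straightforward to swap one algorithm for another in scikit-learn.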

Prerequisites

This tutorial assumes the reader has some working knowledge of general programming concepts, even if not directly with the Python programming language. The target audience is scientists and engineers who already program in Python and are interested in using Python tools and packages to analyze datasets. If additional introductory material about Python is needed, readers can consult An Introduction to Python as well as the documentation on the python.org website.