AI in Science
While AI receives much of its public attention through applications that can generate text, images, and other content, the underlying suite of tools and techniques from machine learning (ML) and deep learning (DL) is also increasingly being integrated into the fabric of scientific research in a number of interesting and innovative ways. This includes the use of specific ML and DL techniques to address important problems in science and engineering, as well as broader methodological efforts grouped under terms such as "Scientific Machine Learning", "Scientific AI", and "AI for Science".
Many uses of machine learning in the physical sciences were summarized in a comprehensive 2019 review article. A more recent review summarized important trends concerning scientific discovery in the age of artificial intelligence. This involves not just using these tools to analyze data and build predictive models (which we discuss further on the following pages), but also integrating them into the scientific research process, broadly construed. That second review emphasized the following elements:
- AI-aided data collection and curation for scientific research
  - Data selection
  - Data annotation
  - Data generation
  - Data refinements
- Learning meaningful representations of scientific data
  - Geometric priors
  - Geometric deep learning
  - Self-supervised learning
  - Language modelling
  - Transformer architectures
  - Neural operators
- AI-based generation of scientific hypotheses
  - Black-box predictors of scientific hypotheses
  - Navigating combinatorial hypothesis spaces
  - Optimizing differentiable hypothesis spaces
- AI-driven experimentation and simulation
  - Efficient evaluation of scientific hypotheses
  - Deducing observables from hypotheses using simulations
The integration of AI into the scientific process will surely continue. In many cases, insights and techniques from one problem domain can be productively recast for use in another. For example, researchers have leveraged techniques from Natural Language Processing (NLP) to build protein language models and chemical language models.
When a protein chain folds into a 3D structure, interactions and associations are introduced between amino acids that are potentially separated by large distances in the linear protein chain. This is reminiscent of the way that long-range associations develop between words in natural language, and how the self-attention mechanisms introduced in Transformer architectures can learn and capture those long-range associations. And in fact, Transformers trained on large databases of protein sequences are able to learn those associations as part of protein structure prediction.
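To make the analogy concrete, the following sketch computes a single head of scaled dot-product self-attention over a toy amino-acid sequence. All of the names, dimensions, and randomly initialized matrices here are illustrative assumptions rather than part of any actual protein language model; the point is only that every residue attends to every other residue, no matter how far apart they sit in the linear chain, which is exactly the kind of long-range coupling that trained models learn from large sequence databases.

```python
import numpy as np

# Toy amino-acid "sentence": each residue is treated as a token (illustrative only).
sequence = list("MKTAYIAKQR")           # 10 residues
vocab = sorted(set(sequence))
token_ids = np.array([vocab.index(aa) for aa in sequence])

rng = np.random.default_rng(0)
d_model = 16                            # embedding dimension (arbitrary choice)

# Random embeddings and projections stand in for learned model parameters.
embed = rng.normal(size=(len(vocab), d_model))
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

X = embed[token_ids]                    # (seq_len, d_model)
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product self-attention: each residue attends to every other residue,
# regardless of their separation in the linear chain.
scores = (Q @ K.T) / np.sqrt(d_model)   # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
contextualized = weights @ V            # residue representations mixing information
                                        # from the whole sequence

print("Attention paid by the first residue to the last:", weights[0, -1])
```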
Similarly, there is great interest in generating novel candidate small-molecule compounds for potential use as new drugs in medical therapies, and in computationally predicting the properties of compounds without needing to experimentally test every possible candidate. Chemical language models are used as part of such a process, building on an underlying "language" of chemical structure such as the SMILES or Canonical SMILES formats. Those languages encode not just the atomic components of a molecular compound, but also the chemical bond structure connecting those atoms. The review article cited above on scientific discovery in the age of artificial intelligence briefly describes how such approaches are used in conjunction with encoder-decoder architectures to generate candidate molecular compounds, as summarized in the figure below.

Image credit: Wang et al., "Scientific discovery in the age of artificial intelligence", https://www.nature.com/articles/s41586-023-06221-2
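As a small, concrete illustration of SMILES as a chemical "language", the snippet below parses two different SMILES spellings of the same molecule (aspirin, chosen here purely as an example) and converts each to canonical SMILES using the open-source RDKit toolkit. This is a minimal sketch that assumes RDKit is installed; it is not tied to any particular chemical language model.

```python
from rdkit import Chem  # open-source cheminformatics toolkit (assumed installed)

# Two different but chemically equivalent SMILES strings for aspirin.
smiles_variants = [
    "CC(=O)Oc1ccccc1C(=O)O",
    "O=C(O)c1ccccc1OC(C)=O",
]

for smi in smiles_variants:
    mol = Chem.MolFromSmiles(smi)        # parse the SMILES "sentence" into a molecule
    canonical = Chem.MolToSmiles(mol)    # emit RDKit's canonical SMILES
    print(f"{smi:25s} -> {canonical}")

# Both inputs map to the same canonical string, which is why canonical SMILES
# provides an unambiguous vocabulary for training chemical language models.
```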