Abuzz with AI
AI has been a recognized scientific field of study for many decades, and has long figured in popular culture through works of science fiction, often associated with ominous threats to humanity and human life. Since late 2022, however, AI has generated a very loud new buzz, with many stories about new advances, new opportunities, and new concerns. The buzz is most closely associated with the release of ChatGPT, an application developed by the company OpenAI to respond to textual prompts and to generate new and often convincing text in response. In reality, however, many factors are contributing to the recent surge of interest and excitement about AI and its possibilities. Let's try to dig into this a bit more.
Generative AI applications, with natural language interfaces, using trained models, creating new opportunities
Much excitement centers around new capabilities in Generative AI, which refers to the ability of tools to generate new content (such as text, images, video, code, molecular sequences, mathematical proofs, etc.) rather than just to analyze existing content. Much of the core ML and DL functionality deployed previously has focused on learning patterns in data, either to better understand the structure of those data or to make predictions about new and unseen data. Generative AI models take those processes a step further, leveraging what has been learned about data in order to generate new content with similar structure. It has been said that ChatGPT and related text-generation tools have, by analyzing enormous amounts of text, developed an ability to predict the next word in a sentence, in a manner that is often convincing and plausible. That process of text generation is open-ended. In contrast, the computer vision system in a self-driving car needs to be good at identifying all of the relevant objects in its field of view (and needs to continue to do so as its field of view changes), but that process is not generative: it is merely perceiving and identifying existing content. In more technical terms, generative models do not just represent the data samples themselves. Instead, they estimate probability distributions in some latent space, which enables them to generate new samples that are statistically similar to the observed data without being constrained to the discrete data samples themselves. By building internal representations of the "space between" the data, generative models can create content beyond what they have already seen, even if that sometimes leads to weird creations that seem disconnected from the data they were trained on.
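To make that distinction concrete, the toy sketch below fits a simple probability distribution to observed data and then samples new points from it. This is only a minimal stand-in for the generative idea (it uses scikit-learn's GaussianMixture rather than a deep generative model), but it captures the core pattern: estimate a distribution, then sample from it.

```python
# A minimal sketch of the generative idea: fit a probability distribution
# to observed data, then draw new samples that are statistically similar
# to (but not copies of) the training points. A toy stand-in for deep
# generative models, not a production example.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# "Observed" data: two clusters of 2D points
data = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                  rng.normal(5.0, 1.0, (100, 2))])

model = GaussianMixture(n_components=2, random_state=0).fit(data)
new_samples, _ = model.sample(5)   # generate points the model has never seen
print(new_samples)
```

The sampled points resemble the training clusters statistically, but none of them is a copy of an observed data point, which is precisely what distinguishes a generative model from one that merely memorizes or classifies its inputs.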
Central to this recent explosion of interest is the availability of applications with natural language interfaces, such as ChatGPT, that can be used without needing to write code or understand what technologies are used under the hood to implement those applications. Other applications have also garnered widespread attention, such as DALL-E and Stable Diffusion for the generation of images and art based on textual prompts, GitHub Copilot and other tools for automatically generating computer code based on descriptions or code text, and AlphaFold for predicting protein structure from a given amino acid sequence. These applications all attain their power by leveraging the tools of machine learning (ML) and deep learning (DL), but users do not need to know about those implementation details in order to make use of them. Furthermore, because many of these applications operate using natural language, they are broadly accessible to users without requiring specific technical training or the ability to write code to interact through an Application Programming Interface (API).
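For comparison, interacting with such a model through an API does require a little code, although not much. The sketch below is a hypothetical example using OpenAI's Python client; the model name is illustrative and may change, and an API key is assumed to be configured in the environment.

```python
# A minimal sketch of interacting with a text-generation model through an
# API rather than a chat interface. Assumes the `openai` Python package
# (v1.x) is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; substitute any available model
    messages=[{"role": "user", "content": "Explain overfitting in one sentence."}],
)
print(response.choices[0].message.content)
```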
Because natural language can be vague, imprecise, and elaborate, users of these NLP-enabled applications have developed different strategies for phrasing prompts to provide to Generative AI tools, as part of a broadly defined field known as prompt engineering. Engineering useful prompts can involve:
- providing information about the structure of intended outputs
- providing examples that can be used as a basis for reasoning and output generation
- providing Chain-of-Thought (CoT) prompts that suggest a series of sub-steps to be used in responding to a complex query
- using Retrieval-Augmented Generation (RAG), whereby additional information is retrieved (say, through a web search) to augment and provide context to a tool that is responding to a prompt
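As a rough illustration, the snippet below assembles a prompt that combines two of these strategies: few-shot examples and a Chain-of-Thought instruction. The task and prompt text are invented for illustration; in practice, the resulting string would be sent to whatever model or API is in use.

```python
# A hypothetical sketch of two common prompt-engineering patterns:
# few-shot examples followed by a Chain-of-Thought instruction.
examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]

# Few-shot: show the model worked examples before the real query
prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += "Review: The plot dragged, but the acting was superb.\n"

# Chain-of-Thought: ask for intermediate reasoning steps
prompt += "Think step by step, then give the final sentiment.\n"

print(prompt)  # in practice, this string would be sent to a model
```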
Despite the broad appeal of natural language interfaces, those interested in writing code to support work in AI can draw on a number of useful open-source software packages, such as scikit-learn, PyTorch, and TensorFlow, to implement ML and DL workflows.
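As a small taste of what such packages look like in practice, the sketch below uses PyTorch to define a tiny neural network and run a single training step on random data. It is a minimal illustration of the workflow, not a meaningful model.

```python
# A minimal sketch of a DL training step in PyTorch, one widely used
# open-source package. The data are random; only the mechanics matter.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 4)            # a batch of 8 random feature vectors
targets = torch.randint(0, 2, (8,))   # random class labels, 0 or 1

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()                        # backpropagate gradients
optimizer.step()                       # update the weights
print(f"loss: {loss.item():.3f}")
```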
Yet another important advance is the support for using trained models, such as Large Language Models and other sorts of Foundation Models. A model, in the context of AI, ML, and DL, is a specific "machine" (a system, a program, a device, a neural network) that has been trained to operate in some defined problem space, and is thus capable of responding to inputs and producing outputs in that problem space. Those outputs might be the labels that a computer vision system assigns to different parts of an image it receives, the 3D geometric structure that a protein structure prediction algorithm produces from an input sequence, or the next word in a sentence based on a text prompt. Different models are trained to do different tasks and to work with different sorts of input data. Earlier in the history of ML and DL, models were developed and trained to perform specific, "narrow" tasks. A handwriting recognition program might be trained to recognize and classify handwritten letters and digits, but it typically would not do a very good job identifying objects in the field of view of a self-driving car, nor would it typically be able to paint a picture of letters and digits.
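A narrow model of exactly this sort can be built in a few lines. The sketch below trains a classifier on scikit-learn's bundled handwritten-digits dataset; the resulting model does that one task reasonably well, and nothing else.

```python
# A minimal sketch of a "narrow" model: a classifier trained only to
# recognize handwritten digits, using scikit-learn's bundled dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)            # 8x8 grayscale digit images
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
print(f"digit accuracy: {clf.score(X_test, y_test):.2f}")  # typically ~0.96
```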
As more training data have become available, and as computational systems have become sufficiently powerful to make use of those data, foundation models have emerged to serve as the basis for a broader spectrum of tasks. Computer vision models trained on large numbers of natural images, but not specifically on letters and digits, might develop an internal representation of images effective enough to say something useful about letters and digits, perhaps without much additional work or training. Large Language Models (LLMs) are foundation models built from extremely large collections of text spanning many different subject areas, such that they can process and generate text arising in many different contexts. These capabilities are used for developing chatbots, summarizing large collections of documents, and drafting documents based on prompts.
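Pre-trained language models of this sort can be downloaded and run locally in a few lines, for example via the Hugging Face transformers library. GPT-2 is used in the sketch below because it is small and freely available; it is far less capable than current LLMs, but the usage pattern is the same.

```python
# A minimal sketch of using a pre-trained foundation model for text
# generation via the Hugging Face `transformers` library. The first run
# downloads the GPT-2 weights; larger models follow the same pattern.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Foundation models are", max_new_tokens=20)
print(result[0]["generated_text"])
```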
Foundation models are named as such because one can often use an existing foundation model as a starting point for some further downstream processing. Common types of downstream processes using foundation models include:
- In-context learning: whereby a system is capable of carrying out a task simply by receiving a prompt, even if it was not specifically trained to carry out that task
- Transfer learning: whereby a model trained for one set of tasks can be applied to other tasks, due to the broad training of a foundation model
- Fine-tuning: whereby additional small-scale training, starting from a pre-trained foundation model, can be used for more specialized or domain-specific tasks (see the code sketch after this list)
- Zero-shot, one-shot, few-shot learning: whereby outputs can be elicited from a model by providing only one or a few examples of a task (or, in the zero-shot case, none at all)
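The sketch below illustrates the transfer learning and fine-tuning ideas using PyTorch and torchvision: a model pre-trained on ImageNet is frozen, and its final layer is replaced for a hypothetical new 10-class task. The weight identifier follows current torchvision conventions and may differ across versions.

```python
# A minimal sketch of transfer learning / fine-tuning: start from a
# pre-trained torchvision model, freeze its learned features, and attach
# a new output layer for a hypothetical 10-class task.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained starting point
for param in model.parameters():
    param.requires_grad = False                   # freeze existing features

model.fc = nn.Linear(model.fc.in_features, 10)    # new head for the new task
# Training now updates only model.fc, using any standard PyTorch loop.
```

Freezing the pre-trained layers means only the small new head needs training, which is why fine-tuning from a foundation model typically requires far less data and compute than training from scratch.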
Of course, much of the recent enthusiasm involves how AI developments are creating new opportunities. These include opportunities for making lots of money and defining new markets (hence the overwhelming amount of corporate interest), for producing novel content (of broad interest for both professional and personal reasons), and for doing science. Looking beyond short-term opportunities, this explosion of tools for Generative AI, and the availability of pre-trained Foundation Models to power them, has led to some enthusiasm that the field is beginning to move beyond what is referred to as Narrow AI (or Weak AI), whereby machines are capable of carrying out only narrowly defined tasks (even if they are able to carry out those tasks extremely well). Artificial General Intelligence (AGI) would represent a next major step in the evolution of machine intelligence, whereby machines could autonomously learn to carry out new tasks for which they were not explicitly built or trained, or perhaps even think about and comprehend the world around them, much like humans. Finally, Super AI, or Artificial Superintelligence (ASI), refers to a machine intelligence that would surpass human intelligence in any task. Super AI is still the stuff of science fiction, but where training ends, and where intelligence and cognition begin, are active areas of study.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)