Abuzz with AI
AI has been a recognized scientific field of study for many decades, and has long figured in popular culture through works of science fiction, often associated with ominous threats to humanity and human life. Since late 2022, however, AI has generated a very loud new buzz, with many stories about new advances, new opportunities, and new concerns. The buzz is most closely associated with the release of ChatGPT, an application developed by the company OpenAI to respond to textual prompts and to generate new and often convincing text in response. In reality, however, many factors are contributing to the recent surge of interest and excitement about AI and its possibilities. Let's try to dig into this a bit more.
Generative AI applications, with natural language interfaces, using trained models, creating new opportunities
Much excitement centers around new capabilities in Generative AI, which refers to the ability of tools to generate new content (such as text, images, video, code, molecular sequences, mathematical proofs, etc.) rather than just to analyze existing content. Much of the core ML and DL functionality deployed previously has focused on learning patterns in data, either to better understand the structure of those data or to make predictions about new and unseen data. Generative AI models take those processes a step further, leveraging what has been learned about data in order to generate new content with similar structure. It has been said that ChatGPT and related text-generation tools have, by analyzing enormous amounts of text, developed an ability to predict the next word in a sentence, in a manner that is often convincing and plausible. That process of text generation is open-ended. In contrast, the computer vision system in a self-driving car needs to be good at identifying all of the relevant objects in its field of view (and needs to continue to do so as its field of view changes), but that process is not generative: it is merely perceiving and identifying existing content. In more technical terms, generative models do not just represent the data samples themselves. Instead, they estimate probability distributions in some latent space, which enables them to generate new samples that are statistically similar to the observed data without being constrained to the discrete data samples themselves. By building internal representations of the "space between" the data, generative models can create content beyond what they have already seen, even if that sometimes leads to weird creations that seem disconnected from the data they were trained on.
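To make that distinction concrete, the toy sketch below fits a simple probability distribution to observed data and then samples new points from it. This is only a minimal stand-in for the generative idea (it uses scikit-learn's GaussianMixture rather than a deep generative model), but it captures the core pattern: estimate a distribution, then sample from it.

```python
# A minimal sketch of the generative idea: fit a probability distribution
# to observed data, then draw new samples that are statistically similar
# to (but not copies of) the training points. A toy stand-in for deep
# generative models, not a production example.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# "Observed" data: two clusters of 2D points
data = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
                  rng.normal(5.0, 1.0, (100, 2))])

model = GaussianMixture(n_components=2, random_state=0).fit(data)
new_samples, _ = model.sample(5)   # generate points the model has never seen
print(new_samples)
```

The sampled points resemble the training clusters statistically, but none of them is a copy of an observed data point, which is precisely what distinguishes a generative model from one that merely memorizes or classifies its inputs.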
Central to this recent explosion of interest is the availability of applications with natural language interfaces, such as ChatGPT, that can be used without needing to write code or understand what technologies are used under the hood to implement those applications. Other applications have also garnered widespread attention, such as DALL-E and Stable Diffusion for the generation of images and art based on textual prompts, GitHub Copilot and other tools for automatically generating computer code based on descriptions or code text, and AlphaFold for predicting protein structure from a given amino acid sequence. These applications all attain their power by leveraging the tools of machine learning (ML) and deep learning (DL), but users do not need to know about those implementation details in order to make use of them. Furthermore, because many of these applications operate using natural language, they are broadly accessible to users without requiring specific technical training or the ability to write code to interact through an Application Programming Interface (API).
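For comparison, interacting with such a model through an API does require a little code, although not much. The sketch below is a hypothetical example using OpenAI's Python client; the model name is illustrative and may change, and an API key is assumed to be configured in the environment.

```python
# A minimal sketch of interacting with a text-generation model through an
# API rather than a chat interface. Assumes the `openai` Python package
# (v1.x) is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; substitute any available model
    messages=[{"role": "user", "content": "Explain overfitting in one sentence."}],
)
print(response.choices[0].message.content)
```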
Because natural language can be vague, imprecise, and elaborate, users of these NLP-enabled applications have developed different strategies for phrasing prompts to provide to Generative AI tools, as part of a broadly defined field known as prompt engineering. Engineering useful prompts can involve:
- providing information about the structure of intended outputs
- providing examples that can be used as a basis for reasoning and output generation
- providing Chain-of-Thought (CoT) prompts that suggest a series of sub-steps to be used in responding to a complex query
- using Retrieval-Augmented Generation (RAG), whereby additional information is retrieved (say, through a web search) to augment and provide context to a tool that is responding to a prompt
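As a rough illustration, the snippet below assembles a prompt that combines two of these strategies: few-shot examples and a Chain-of-Thought instruction. The task and prompt text are invented for illustration; in practice, the resulting string would be sent to whatever model or API is in use.

```python
# A hypothetical sketch of two common prompt-engineering patterns:
# few-shot examples followed by a Chain-of-Thought instruction.
examples = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]

# Few-shot: show the model worked examples before the real query
prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += "Review: The plot dragged, but the acting was superb.\n"

# Chain-of-Thought: ask for intermediate reasoning steps
prompt += "Think step by step, then give the final sentiment.\n"

print(prompt)  # in practice, this string would be sent to a model
```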
Despite the broad appeal of natural language interfaces, those interested in writing code to support work in AI can draw on a number of useful open-source software packages, such as scikit-learn, PyTorch, and TensorFlow, to implement ML and DL workflows.
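As a small taste of what such packages look like in practice, the sketch below uses PyTorch to define a tiny neural network and run a single training step on random data. It is a minimal illustration of the workflow, not a meaningful model.

```python
# A minimal sketch of a DL training step in PyTorch, one widely used
# open-source package. The data are random; only the mechanics matter.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 4)            # a batch of 8 random feature vectors
targets = torch.randint(0, 2, (8,))   # random class labels, 0 or 1

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()                        # backpropagate gradients
optimizer.step()                       # update the weights
print(f"loss: {loss.item():.3f}")
```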
Yet another important advance is the support for using trained models, such as Large Language Models and other sorts of Foundation Models. A model, in the context of AI, ML, and DL, is a specific "machine" (a system, a program, a device, a neural network) that has been trained to operate in some defined problem space, and is thus capable of responding to inputs and producing outputs in that problem space. Those outputs might be the labels that a computer vision system assigns to different parts of an image it receives, the 3D geometric structure that a protein structure prediction algorithm produces from an input sequence, or the next word in a sentence based on a text prompt. Different models are trained to do different tasks and to work with different sorts of input data. Earlier in the history of ML and DL, models were developed and trained to perform specific, "narrow" tasks. A handwriting recognition program might be trained to recognize and classify handwritten letters and digits, but it typically would not do a very good job identifying objects in the field of view of a self-driving car, nor would it typically be able to paint a picture of letters and digits.
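A narrow model of exactly this sort can be built in a few lines. The sketch below trains a classifier on scikit-learn's bundled handwritten-digits dataset; the resulting model does that one task reasonably well, and nothing else.

```python
# A minimal sketch of a "narrow" model: a classifier trained only to
# recognize handwritten digits, using scikit-learn's bundled dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)            # 8x8 grayscale digit images
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
print(f"digit accuracy: {clf.score(X_test, y_test):.2f}")  # typically ~0.96
```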
As more training data have become available, and as computational systems have become sufficiently powerful to make use of those data, foundation models have emerged to serve as the basis for a broader spectrum of tasks. Computer vision models trained on large numbers of natural images, but not specifically on letters and digits, might develop an internal representation of images effective enough to say something useful about letters and digits, perhaps without much additional work or training. Large Language Models (LLMs) are foundation models built from extremely large collections of text spanning many different subject areas, such that they can process and generate text arising in many different contexts. These capabilities are used for developing chatbots, summarizing large collections of documents, and drafting documents based on prompts.
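Pre-trained language models of this sort can be downloaded and run locally in a few lines, for example via the Hugging Face transformers library. GPT-2 is used in the sketch below because it is small and freely available; it is far less capable than current LLMs, but the usage pattern is the same.

```python
# A minimal sketch of using a pre-trained foundation model for text
# generation via the Hugging Face `transformers` library. The first run
# downloads the GPT-2 weights; larger models follow the same pattern.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Foundation models are", max_new_tokens=20)
print(result[0]["generated_text"])
```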
Foundation models are named as such because one can often use an existing foundation model as a starting point for some further downstream processing. Common types of downstream processes using foundation models include:
- In-context learning: whereby a system is capable of carrying out a task simply by receiving a prompt, even if it was not specifically trained to carry out that task
- Transfer learning: whereby a model trained for one set of tasks can be applied to other tasks, due to the broad training of a foundation model
- Fine-tuning: whereby additional small-scale training, starting from a pre-trained foundation model, can be used for more specialized or domain-specific tasks (see the code sketch after this list)
- Zero-shot, one-shot, few-shot learning: whereby outputs can be elicited from a model by providing only one or a few examples of a task (or, in the zero-shot case, none at all)
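The sketch below illustrates the transfer learning and fine-tuning ideas using PyTorch and torchvision: a model pre-trained on ImageNet is frozen, and its final layer is replaced for a hypothetical new 10-class task. The weight identifier follows current torchvision conventions and may differ across versions.

```python
# A minimal sketch of transfer learning / fine-tuning: start from a
# pre-trained torchvision model, freeze its learned features, and attach
# a new output layer for a hypothetical 10-class task.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained starting point
for param in model.parameters():
    param.requires_grad = False                   # freeze existing features

model.fc = nn.Linear(model.fc.in_features, 10)    # new head for the new task
# Training now updates only model.fc, using any standard PyTorch loop.
```

Freezing the pre-trained layers means only the small new head needs training, which is why fine-tuning from a foundation model typically requires far less data and compute than training from scratch.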
Of course, much of the recent enthusiasm involves how AI developments are creating new opportunities. These include opportunities for making lots of money and defining new markets (hence the overwhelming amount of corporate interest), for producing novel content (of broad interest for both professional and personal reasons), and for doing science. Looking beyond short-term opportunities, this explosion of tools for Generative AI, and the availability of pre-trained Foundation Models to power them, has led to some enthusiasm that the field is beginning to move beyond what is referred to as Narrow AI (or Weak AI), whereby machines are capable of carrying out only narrowly defined tasks (even if they are able to carry out those tasks extremely well). Artificial General Intelligence (AGI) would represent a next major step in the evolution of machine intelligence, whereby machines could autonomously learn to carry out new tasks for which they were not explicitly built or trained, or perhaps even think about and comprehend the world around them, much like humans. Finally, Super AI, or Artificial Superintelligence (ASI), refers to a machine intelligence that would surpass human intelligence in any task. Super AI is still the stuff of science fiction, but where training ends, and where intelligence and cognition begin, are active areas of study.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)