It’s been only a year since ChatGPT was first released, making generative AI widely available to the public. Yet in Amazon’s most recent earnings call, Chief Executive Officer Andy Jassy said he believes it’s an opportunity that will already mean “tens of billions of dollars of revenue for AWS over the next several years.” “Our generative AI business,” he noted, “is growing very, very quickly.”
Amazon is not alone. Microsoft has been a leading investor in what Bill Gates has called “the most important advance in technology since the graphical user interface.”
Share prices of Microsoft and Alphabet, another major AI investor, have surged this year, with Alphabet up 57% and Microsoft up 38%. And it’s not just lighting up Wall Street: AI, and especially generative AI, is everywhere, with Google searches for terms like “chat gpt” up 3,150% over the past year.
It’s been likened to magic and the atomic bomb, but how, exactly, does it work? This article aims to answer that question for anyone who isn’t already an expert in the field.
What do we mean by AI?
By AI (artificial intelligence), we mean a constructed intelligence (software or a machine) that can learn, reason, and solve problems. It can be broken down further into weak AI, an intelligence with a narrow focus (like systems built to master chess, or self-driving cars), and strong AI, which is expected to be more robust, or diverse, capable of tackling problems it wasn’t specifically built to address.
Machine learning is a subset of AI that’s focused on learning from data and experience; check out this PTP article for more on how it works and how it differs from AI in general.
Deep learning is a subset of machine learning (we’ll get to this below).
For an overview of AI in general, check out this PTP article.
What about generative AI?
Generative AI, specifically, means an AI capable of creating something from an input, or prompt (such as a text request you give it).
OpenAI’s ChatGPT and DALL-E are examples of generative AIs that take text from users and return text or images as output. (Note that text isn’t the only possible input. Some generative AIs accept images or audio, for example, and can transcribe, extract from, or transform them into something else entirely.)
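The same prompt-in, content-out exchange is available to developers in code. Here’s a minimal sketch using OpenAI’s Python SDK; it assumes you’ve installed the openai package and set an OPENAI_API_KEY environment variable, and the model name and prompt are just examples:

```python
# A minimal sketch of prompting a generative AI from code.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # picks up the API key from the environment

response = client.chat.completions.create(
    model="gpt-4",  # example model name
    messages=[{"role": "user", "content": "write a poem about snowfall on Halloween"}],
)

print(response.choices[0].message.content)  # the generated text
```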
If you haven’t done so, give GPT-4 a try on Bing for free right now.
As a demonstration, I just entered “write a poem about snowfall on Halloween” and it gave me back four stanzas, including:
The children dressed in costumes,
Trick or treating in the snow,
Their laughter echoes through the streets,
As they watch the snowflakes glow.
The pumpkins carved with care,
Now covered in a blanket of white,
The snowflakes fall like confetti,
On this eerie Halloween night.
This demonstrates knowledge of what Halloween is (costumes, trick or treating, pumpkins, eerie) and snowfall (a blanket of white, flakes like confetti). It even created a poetic structure with a rhyming pattern, and did it all in seconds.
The arrival of ChatGPT and DALL-E has caused enormous waves (see articles from Harvard Business Review and the New York Times for examples), because they seem leaps and bounds beyond what we’ve known. Suddenly, these complicated systems are spitting out original text, artwork, photography, and video, all from simple suggestions.
How does it do it?
All intelligence begins with learning, and one big reason we’re seeing this surge in generative AI now is the incredible volume of data available for it. Via the internet, you can find the vast troves of text, video, audio, and images you’d need, tied to text cues that help give them meaning. (For a sense of scale, every day 95 million new photos are uploaded to Instagram and more than 720,000 hours of content are added to YouTube.)
Data used to teach an AI is broken down to find structures and patterns, such as by grouping words into clusters by meaning and context, or by representing images across many dimensions. For a computer’s use, it must all be reduced to numbers, and user requests are likewise converted as they come in. From there it’s easier to see how an AI can make connections between requests and comparable outputs: taking in our words, finding similar words within its clusters, or retrieving the images it has clustered alongside them.
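To make that concrete, here’s a toy sketch of the idea. The three-dimensional “embedding” vectors below are made up (real systems use hundreds or thousands of dimensions), but they show how words reduced to numbers can be compared: cosine similarity measures how close two words sit in the shared space.

```python
# Toy "words as numbers": made-up 3-dimensional embeddings, where
# related words end up pointing in similar directions.
import numpy as np

embeddings = {
    "halloween": np.array([0.9, 0.1, 0.3]),
    "pumpkin":   np.array([0.8, 0.2, 0.4]),
    "snowfall":  np.array([0.1, 0.9, 0.5]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; near 0 means unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["halloween"], embeddings["pumpkin"]))   # ~0.98, closely related
print(cosine_similarity(embeddings["halloween"], embeddings["snowfall"]))  # ~0.33, more distant
```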
Still, it’s a gigantic leap from that to knowing what is meant by Halloween, winter, and poetry well enough to create the poem above.
When we have to answer a question we don’t know the answer to, we may have to just guess. The same is true of AIs: they begin by guessing. Like an infant’s first learnings, their initial outputs may be nowhere near the kind of success we expect to see from generative AIs.
This is where training comes into play. Through feedback on which of its guesses are successful and which are not, adjustments are made within the AI so that its guesses improve each time. This is a part of deep learning, a kind of machine learning.
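Here’s a bare-bones sketch of that guess-and-adjust loop, learning the simple rule y = 2x. The data and learning rate are made up for illustration; real training does essentially this, but with billions of adjustable numbers instead of one.

```python
# Guess, get feedback, adjust: learning the rule y = 2x by nudging
# a single weight to shrink the error on each pass over the data.
data = [(1, 2), (2, 4), (3, 6)]  # (input, correct answer) pairs
w = 0.0                          # the model's initial guess
learning_rate = 0.05

for step in range(100):
    for x, target in data:
        guess = w * x
        error = guess - target          # feedback: how wrong was the guess?
        w -= learning_rate * error * x  # nudge w in the direction that reduces the error

print(round(w, 3))  # ends up very close to 2.0
```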
You may have heard that AIs use neural networks, modeled on the human brain. These networks are made up of individual neurons (sometimes called perceptrons or nodes) linked together across numerous layers. Beginning from the initial inputs, these neurons connect to larger groups of neurons as they progress through so-called hidden layers (the black-box core), then back out to fewer neurons again, which provide the desired outputs. At their most basic, these neurons hold a numerical value between 0 and 1, and each connection between them (as in a graph) typically carries a value, or weight, of its own.
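As a sketch of what that looks like in computation, here’s one “forward pass” through a tiny network: two inputs feed three hidden neurons, which feed a single output. The weights here are made up; in a real network there would be millions or billions of them, set by the training described above.

```python
# A tiny forward pass: 2 inputs -> 3 hidden neurons -> 1 output.
import numpy as np

def sigmoid(x):
    # squashes any number into the 0-1 range neurons hold
    return 1 / (1 + np.exp(-x))

inputs = np.array([0.5, 0.8])

w_hidden = np.array([[0.2, -0.4,  0.7],
                     [0.6,  0.1, -0.3]])  # weight on each input->hidden connection
w_output = np.array([0.5, -0.2,  0.9])    # weight on each hidden->output connection

hidden = sigmoid(inputs @ w_hidden)  # each hidden neuron: weighted sum, then squash
output = sigmoid(hidden @ w_output)  # same step again for the output neuron

print(output)  # a single value between 0 and 1
```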
These arrangements help break down problems into tiny pieces (such as representations of a word for language, or individual pixels for images). They also capture context (such as word order and proximity). Generative AIs use neural network architectures (such as the influential transformer model, the T in GPT) to process as much of the request, and of the content they’re generating, as they can at once, in parallel, speeding things up and adding to the miraculous appearance of creative power.
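The transformer’s central mechanism is called attention. As a rough sketch (using random numbers as stand-ins for the learned projections a real model would compute), every position in a sequence scores its relevance to every other position at once, then blends the results, which is what lets these models process so much in parallel:

```python
# Scaled dot-product attention, the transformer's core operation,
# on a toy 4-token sequence with 8 numbers per token.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of every position to every other
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: turn scores into proportions
    return weights @ V                              # blend the values by relevance

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # 4 tokens, 8 dimensions each (real models use far more)
Q = rng.normal(size=(seq_len, d_model))  # the "queries", "keys", and "values" would
K = rng.normal(size=(seq_len, d_model))  # normally be learned projections of the
V = rng.normal(size=(seq_len, d_model))  # token embeddings, not random numbers

print(attention(Q, K, V).shape)  # (4, 8): one updated representation per token
```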
Though generative AIs often begin as small models (weaker AIs), such as image generators that create only faces, they’ve grown into vast (stronger, deeper) models, working across a full spectrum of images or text outputs that can be generated from a single input.
Conclusion
The speed of generative AI’s explosion onto the scene is both exciting and terrifying, and there are certainly risks it can pose, as considered in this PTP piece. A November 8 hack of ChatGPT showed that generative AI is an appealing target, and clearly not immune to the same security vulnerabilities as other technologies. Sensitive personal data that’s been shared online could also be swept into the data that fuels a generative AI, further increasing the need to safeguard whatever data an individual or organization exposes online.
No matter how you view it, one thing is for sure: generative AI is here to stay. Given its undeniable impact on the world now and to come, it benefits all of us to learn as much as we can about how it works.