Judging by the buzz on social media and in the news, AI appears to have just been invented. A keen eye, however, would point to IBM's Deep Blue, which defeated world chess champion Garry Kasparov in 1997. So what is AI?
Strong Versus Weak AI
Broadly classified, AI is either strong or weak. Before the advent of generative AI models (strong), there was weak AI: systems designed to perform specific tasks without possessing generalized reasoning abilities. Strong AI, the kind that has the world talking, learns, reasons, perceives, and understands, much like a human brain.
A chess engine is weak AI: it references a finite set of legal moves on a chess board and uses a predictive algorithm to come up with moves in new positions. Give a chess engine a game of checkers, however, and it might fail completely.
ChatGPT is strong AI. You could ask it to play chess and checkers, even at the same time. Afterward, you could tell it to write up a synopsis of the game in the style of Abraham Lincoln.
Merriam-Webster Abridged
There are likely hundreds of new terms coined or suddenly relevant within the last year, but the following list enumerates the most common and useful terms. If a word is bolded, I define it in this blog. If a definition uses a word you don’t know, come back to it after you’ve read further, and it should make more sense.
OpenAI
Most famously, the company behind ChatGPT and DALLE.
LLM
Large Language Model. ChatGPT, Bard, and Claude are LLMs. This is the primary expression of generative AI (strong AI). You may also hear the following terms when discussing LLMs specifically.
Input
This is the text you submit to the chatbot. Keep in mind that any service you use may also add its own text to whatever you submit.
Output
This is the text the chatbot returns.
Prompting
The act of crafting an input to evoke a certain response. If "write me an email to my boss that says thanks" is too colloquial, "write me a professional email…" might produce better results; refining the input this way is prompting.
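A quick sketch of the idea: prompting is just rewriting the input text before it reaches the model. The `build_prompt` helper below is hypothetical, not part of any chatbot API; it simply shows how a casual request can be wrapped with explicit instructions.

```python
# Prompting sketch: the same request, prefixed with explicit
# instructions, is more likely to yield a professional reply.
# build_prompt is a hypothetical helper, not a real chatbot API.

def build_prompt(request: str, tone: str = "professional") -> str:
    """Wrap a casual request with instructions that steer the output."""
    return f"Respond in a {tone} tone. Request: {request}"

casual = "write me an email to my boss that says thanks"
print(build_prompt(casual))
```

The model still sees plain text either way; the only thing that changed is how carefully that text was worded.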
Token
A token is the smallest unit of information an input or output can be broken down into. Check out this tokenizer for a cool visual on how this works. Roughly speaking, one token equals about four English characters.
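Using that rule of thumb, you can estimate how many tokens a piece of text will consume. This is only an approximation; real tokenizers split text according to learned rules, so actual counts will differ.

```python
# Rough rule of thumb: one token is about 4 English characters.
# Real tokenizers use learned rules, so this is only an estimate.

def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Estimate the token count of a string from its length."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello, how are you today?"))  # 25 chars -> ~6 tokens
```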
Context Window
The maximum number of tokens an LLM can process at once; when a conversation exceeds it, the earliest tokens fall out of the model's memory.
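One simple way to picture this: once the conversation outgrows the window, the oldest tokens are dropped so the newest ones fit. The sliding-window truncation below is an illustrative sketch (real chat services use more sophisticated strategies, and the window size here is arbitrary).

```python
# Sketch: when a conversation exceeds the context window, older
# tokens are dropped so the newest ones fit. Window size is arbitrary.

def fit_to_context(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens."""
    return tokens[-window:]

history = ["Hi", "there", "how", "are", "you", "today", "?"]
print(fit_to_context(history, window=4))  # the first 3 tokens are forgotten
```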
Multimodal
This refers to the kinds of input and output an LLM can work with. A chatbot that can only take in text and produce text is unimodal. ChatGPT-4 is soon to be capable of image input and output, while conversing about them through text, making it multimodal.
Generative AI
Refers to the type of AI behind LLMs and, more broadly, to the ability of machines to generate new content rather than follow static rules. Companies like Runway use generative AI to make videos from images and prompts, in the same way OpenAI's ChatGPT makes words and code from prompts.
ChatGPT
ChatGPT is an LLM; the most capable model is ChatGPT-4, a multimodal LLM with a 32,000-token context window.
DALLE
DALLE is an image-generation tool: it takes input text and outputs images.
Copilot
On its own, you need context to figure out which product it refers to. Microsoft has released Windows Copilot, GitHub Copilot, Sales Copilot, and more, so it's typically safe to assume it refers to one of those products. Generically, it represents the idea of generative AI working alongside the user, not replacing them.
From Hardware to ChatGPT
Why Is AI Costly?
When ChatGPT was released, it became obvious that AI would be extremely useful if applied correctly. However, developing advanced AI models like ChatGPT requires specialized computing hardware, and that hardware is expensive. Why do we need such setups?
Generative AI models use embeddings to perform their human-like abilities. Embeddings turn raw data (text, images, sound) into vectors.
Vectors are math. Graphics cards are really good at math. Graphics cards are expensive. But why vectors?
What are Embeddings?
Embeddings allow a chatbot to take in what you say and compare it to the dataset it was trained on, determining how 'mathematically close' your words are to words in its dataset. It does this across many possible dimensions of similarity. Imagine you ask a computer scientist and a Taylor Swift fan how close the words "red" and "Taylor" are. You'll get two different answers, and the vector will store them both.
Translating text into a vector, comparing two vectors, and then generating a response vector all require heavy numerical computation, the kind graphics cards excel at.
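To make 'mathematically close' concrete, here's a toy example using cosine similarity, a standard way to compare two vectors. The 2-dimensional vectors and their values are invented for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Toy illustration of 'mathematically close': cosine similarity
# between hand-made 2D embeddings. Real embeddings have hundreds of
# dimensions; these numbers are invented for the example.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

red     = [0.9, 0.1]    # axis 0: 'color-ness', axis 1: 'pop-music-ness'
taylor  = [0.2, 0.95]
crimson = [0.85, 0.05]

print(cosine_similarity(red, crimson))  # high: both behave like colors
print(cosine_similarity(red, taylor))   # lower: they appear in different contexts
```

To the Taylor Swift fan's point, a real embedding would also carry a dimension where "red" and "Taylor" score close together.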
What is “developing” AI?
Developing AI is the process of teaching it the correlations between data, e.g. "man" and "guy" are closer than "man" and "woman", and a photo of a bird is better described as 'animal' than 'mountain'.
To generate embeddings, models are trained on massive text corpora. The models self-organize embeddings purely from contextual similarity, not from manual labels declaring which words are related. This means you need a lot of text for the statistical correlations to emerge naturally.
For example, GPT-3 was trained on hundreds of billions of tokens of text.
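Here's a toy sketch of learning from context alone: words that show up next to the same neighbors end up with similar co-occurrence counts, with no manual labels involved. The three-sentence corpus is invented for the example; real training uses vastly more text and far more sophisticated statistics.

```python
from collections import Counter

# Toy sketch of context-only learning: words with the same neighbors
# get similar co-occurrence counts, no labels needed. The corpus is
# invented for this example.

corpus = [
    "the man walked the dog",
    "the guy walked the dog",
    "the woman read the book",
]

def context_counts(word: str) -> Counter:
    """Count the words appearing immediately before or after `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                if i > 0:
                    counts[tokens[i - 1]] += 1
                if i < len(tokens) - 1:
                    counts[tokens[i + 1]] += 1
    return counts

print(context_counts("man"))    # identical neighbors to 'guy'
print(context_counts("guy"))
print(context_counts("woman"))  # different neighbors
```

In this tiny corpus, "man" and "guy" share identical contexts while "woman" does not, which is exactly the kind of statistical signal that pushes their embeddings together.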
Fine-tuning then adapts these embeddings to specific tasks, which is where 'labeled data' (data that says "this is a good response, that one is less good") comes into play.
Closing
This is Part 1 of my Learn AI series! There's lots to cover, and I'm excited to showcase code, use cases, and user-friendly explanations of everything gen-AI. If you want to get into AI through a practical, hands-on, knowledge-driven approach, check out my other writing and follow this series.