HelloAI
L0 Chapter 1 🥚 🕒 12 min

AI, ML, DL, LLM — How Are They Related?

The four most confused terms in tech today. After this article, no buzzword salad can fool you again.

HelloAI Editors
5/27/2026

If you’ve gotten interested in AI only recently, you’ve probably heard these words flying around: Artificial Intelligence, Machine Learning, Deep Learning, Neural Networks, Large Models, LLM, Generative AI, Transformer, ChatGPT.

What are they, really?

Ask three different people and you’ll get three conflicting answers. Some say “AI is machine learning”; some say “deep learning is just a kind of ML”; some say “large models aren’t AI, they’re LLMs”. You hear enough of this and you give up and decide it’s just industry jargon you’ll never understand.

Don’t give up. The terminology is actually crystal clear—it’s just that everyone uses it loosely. In 12 minutes, you’ll have it all sorted out. It’s a Russian nesting doll.

💡 The TL;DR

Artificial Intelligence (AI) ⊃ Machine Learning (ML) ⊃ Deep Learning (DL) ⊃ Large Models (incl. LLMs). Each inner layer is one approach to achieving the layer outside it, not the only one.

1. Artificial Intelligence — the outermost ambition

The term “Artificial Intelligence” was coined by John McCarthy for the 1956 Dartmouth Workshop. From day one, it was a goal, not a technique:

“Make machines exhibit some form of ‘intelligent’ behavior.”

Notice it never specifies how.

So historically, “AI” has been an umbrella for wildly different methods:

  • Symbolic reasoning (1950s–1970s): logic engines that “think”
  • Expert systems (1960s–1980s): humans write rules by hand
  • Search algorithms (through the 1990s): brute-force game-tree exploration (Deep Blue, 1997)
  • Machine learning (mainstream from the 1990s): let machines figure rules out from data

So the bottom line is: anything that makes a machine “appear intelligent” is AI. A calculator computing 1+1 isn’t (no apparent intelligence). AlphaGo is. ChatGPT obviously is.

AI is an aspirational word. It tells you what we want; it doesn’t tell you how to do it.

2. Machine Learning — letting machines figure it out

By the 1980s–90s, researchers realized writing rules by hand doesn’t scale.

Example. Want a computer to recognize whether an image contains a cat? Old approach:

# Traditional rule-based approach.
# The has_* / is_* helpers are hypothetical hand-written detectors, not real library functions.
def is_cat(image):
    if has_pointy_ears(image) and has_whiskers(image) and is_furry(image):
        return True
    # ... but what about long-haired cats? hairless cats? cats from behind?
    # ... you can never enumerate all cases.
    return False

Dead end. So someone said: forget rules. Show the computer a million pre-labeled cat/not-cat images. Let it figure out the patterns itself.

That’s Machine Learning (ML).

The core triad:

  1. Data: thousands to millions of labeled samples (input, label)
  2. Model: a function f(input) → output with tunable parameters, initially random
  3. Training: repeatedly nudge the parameters to reduce the function’s mistakes on the training data

🔬 Why 'training' is such a good word

It’s exactly like training a dog: you don’t explain “the rules” of fetch, you give feedback (right/wrong), and the dog’s internal parameters gradually shift in the right direction.
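The data, model, and training triad described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how production libraries do it: the toy data (points on the line y = 2x + 1), the learning rate, the step count, and the names `predict` and `train` are all assumptions made for this example.

```python
import random

# 1. Data: labeled samples (input, label). Toy assumption: labels follow y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

# 2. Model: f(x) = w * x + b, with two tunable parameters.
def predict(w, b, x):
    return w * x + b

def train(data, steps=2000, learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    # The parameters start out random.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        # 3. Training: compute the gradient of the mean squared error,
        #    then nudge w and b a small step in the direction that reduces it.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / len(data)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train(data)
print(f"learned w={w:.2f}, b={b:.2f}")
```

Nobody tells the program that the rule is “multiply by 2 and add 1”. It only receives feedback on how wrong its guesses are, and the parameters should drift toward w = 2 and b = 1 on their own.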

ML had its golden age in the 1990s–2000s. It includes things like:

  • Linear regression—fit a line
  • Decision trees—a chain of yes/no questions
  • K-nearest neighbors—classify by neighborhood
  • Support Vector Machines (SVM)—find the best dividing hyperplane
  • Random forests—ensemble of decision trees

All of these are ML. None of them are deep learning.
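To make one of the classic methods above concrete, here is a rough sketch of k-nearest neighbors in plain Python. The tiny dataset (animal weight and ear length) and the function name `knn_classify` are invented for illustration; real projects would normally reach for a library rather than hand-roll this.

```python
import math
from collections import Counter

# Hypothetical toy dataset: (weight_kg, ear_length_cm) -> label.
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 7.0), "cat"),
    ((5.0, 6.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((25.0, 12.0), "dog"),
    ((30.0, 10.0), "dog"),
]

def knn_classify(point, training_data, k=3):
    """Label a point by majority vote among its k closest training samples."""
    by_distance = sorted(training_data, key=lambda sample: math.dist(point, sample[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_classify((4.5, 6.2), training_data))
print(knn_classify((22.0, 11.5), training_data))
```

There is no neural network anywhere in this code: the “model” is just the stored samples plus a distance measure, which is why it counts as ML but not DL.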

3. Deep Learning — the heavyweight branch of ML

Around 2010, researchers noticed that one specific ML method, neural networks (dating back to Rosenblatt’s perceptron in 1958), had been underperforming for decades, largely because the data and compute weren’t ready.

When GPUs became cheap and image datasets became huge, neural networks exploded.

The “deep” doesn’t mean “philosophical”—it means many layers.

A traditional neural net might have one or two hidden layers. Deep learning stacks them to tens, hundreds, even thousands of layers. Each layer transforms its input once, and the next layer builds on that result, so more layers generally let the model learn more complex patterns.
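As a rough sketch of what “stacking layers” means, the snippet below pushes an input through several layers in a row, each one transforming the previous layer’s output. Everything here is an illustrative assumption: the layer sizes, the tanh nonlinearity, and the helper names `make_layer`, `apply_layer`, and `forward`. The network is untrained, so its output is meaningless; only the structure matters.

```python
import math
import random

def make_layer(n_inputs, n_outputs, rng):
    """One layer: a weight matrix (random) and a bias vector (zeros)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def apply_layer(layer, inputs):
    """Transform the inputs once: weighted sum per output unit, then tanh."""
    weights, biases = layer
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(layers, inputs):
    """Feed the inputs through every layer in order."""
    for layer in layers:
        inputs = apply_layer(layer, inputs)
    return inputs

rng = random.Random(0)
layer_sizes = [4, 8, 8, 8, 2]  # 4 inputs, three hidden layers of 8 units, 2 outputs
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

output = forward(layers, [0.5, -0.2, 0.1, 0.9])
print(len(layers), "layers ->", output)
```

Making the network “deeper” in this sketch is just a matter of adding more entries to `layer_sizes`; training all of those layers well is the hard part that data and GPUs eventually made practical.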

2012 — the AlexNet moment

In 2012, Geoffrey Hinton’s team at the University of Toronto entered an 8-layer deep network called AlexNet into the ImageNet competition. The top-5 error rate dropped from 25.8% (previous year’s winner) to 16.4%.

The deep-learning era had begun. Within a few years, it swept:

  • Image recognition (CNNs, 2012)
  • Speech recognition (RNNs, 2013–2015)
  • Go (AlphaGo, 2016)
  • Translation (Seq2Seq + Attention, 2014–2017)

💡 Why deep learning is so powerful

Traditional ML needs humans to design features by hand (e.g., to recognize a cat, first tell the model to “look at ear shape”). Deep learning learns the features automatically—from raw pixels it discovers “edges → eyes → cat faces” without being told.

So: deep learning is a subset of machine learning, specifically the part that uses multi-layer neural networks. Other ML methods (regression, decision trees, etc.) still exist and remain common for small or tabular datasets, but deep learning now gets most of the attention.

4. Large Models & LLMs — the giant branch of DL

In 2017, Google researchers published a paper with a brash title:

“Attention Is All You Need”

It introduced a new neural network architecture: the Transformer. It was originally designed for translation, but researchers soon found something extraordinary:

As long as you give Transformers more data, more parameters, and more compute, their capabilities keep improving, and no clear ceiling has been found so far.

This insight changed everything.

  • 2018: OpenAI’s GPT-1 (117M parameters)
  • 2019: GPT-2 (1.5B parameters)
  • 2020: GPT-3 (175B parameters)
  • 2022: ChatGPT launches
  • 2023–2026: GPT-4, Claude, Gemini, Llama… ever larger

Neural networks at this scale, from billions to hundreds of billions of parameters, are collectively called Large Models.
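To get a feel for these numbers, here is a small back-of-the-envelope calculation using the parameter counts from the timeline above. The 2 bytes per parameter figure is an assumption (16-bit weights); other precisions change the result proportionally, and the helper name `weight_size_gb` is made up for this sketch.

```python
# Parameter counts as listed in the timeline above.
PARAMETER_COUNTS = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

def weight_size_gb(n_params, bytes_per_param=2):
    """Storage for the weights alone. Assumption: 2 bytes (16 bits) per parameter."""
    return n_params * bytes_per_param / 1e9

previous = None
for name, n_params in PARAMETER_COUNTS.items():
    line = f"{name}: {n_params / 1e9:g}B parameters, about {weight_size_gb(n_params):g} GB of weights"
    if previous is not None:
        line += f", roughly {n_params / previous:.0f}x the previous model"
    print(line)
    previous = n_params
```

Under that assumption, GPT-3’s weights alone take about 350 GB, far more than a single consumer GPU can hold, which is part of why “large” is treated as its own category.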

LLM stands for Large Language Model: a large model that handles text. GPT, Claude, and Llama are LLMs. Large models that handle images are sometimes called LVMs (large vision models). Models that handle multiple modalities are multimodal large models.

⚠️ A common confusion

“Large model” is a size descriptor, not an architecture. A large model is always deep learning, almost always Transformer-based, and has enough parameters (typically 1B+) to qualify as “large”.

5. Putting it all together

You can now draw the containment diagram:

┌────────────────────────────────────────────────┐
│ Artificial Intelligence (AI), coined 1956      │
│                                                │
│  ┌──────────────────────────────────────────┐  │
│  │ Machine Learning (ML)                    │  │
│  │                                          │  │
│  │  ┌────────────────────────────────────┐  │  │
│  │  │ Deep Learning (DL)                 │  │  │
│  │  │                                    │  │  │
│  │  │  ┌──────────────────────────────┐  │  │  │
│  │  │  │ Large Models                 │  │  │  │
│  │  │  │   ├── LLM (text)             │  │  │  │
│  │  │  │   ├── LVM (vision)           │  │  │  │
│  │  │  │   └── Multimodal             │  │  │  │
│  │  │  └──────────────────────────────┘  │  │  │
│  │  └────────────────────────────────────┘  │  │
│  │  Also: decision trees, SVM, RF, ...      │  │
│  └──────────────────────────────────────────┘  │
│  Also: expert systems, search, rules, ...      │
└────────────────────────────────────────────────┘

Quick check—you should now be able to answer:

Q1: Is ChatGPT AI? ML? DL? LLM?
A: Yes to all! It lives in the innermost box, so it inherits all the outer labels.

Q2: Is a decision tree AI? Is it DL?
A: It’s AI and ML, but not DL, because it isn’t a deep neural network.

Q3: Is a hand-coded chatbot AI?
A: It’s AI (a rule-based system, like the expert systems above), but not ML, because it doesn’t learn from data.
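The containment diagram and the quick check can also be written as a tiny data structure, which makes the “inherits all the outer labels” rule mechanical. This is just one way to model it: the `PARENT` table, the `labels_for` function, and the placement of each example are this sketch’s own naming and assumptions.

```python
# Each category points to the category that directly contains it (None = outermost).
PARENT = {
    "AI": None,
    "ML": "AI",
    "DL": "ML",
    "Large Model": "DL",
    "LLM": "Large Model",
}

def labels_for(category):
    """Return the category plus every category that contains it, innermost first."""
    labels = []
    while category is not None:
        labels.append(category)
        category = PARENT[category]
    return labels

# Where each example from the quick check sits (its innermost box).
EXAMPLES = {
    "ChatGPT": "LLM",
    "decision tree": "ML",
    "hand-coded chatbot": "AI",
}

for name, innermost in EXAMPLES.items():
    print(f"{name}: {', '.join(labels_for(innermost))}")
```

Labels only flow outward: ChatGPT picks up every label on the way out, while a decision tree never gains “DL” because that box lies inside its own.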

6. Why people get confused

The words are misused:

  • Media says “AI is here”—they almost always mean LLMs
  • VCs say “we invest in AI companies”—they almost always mean deep learning startups
  • A product says “powered by AI”—it might just wrap OpenAI’s API

This is unlikely to change. But in your head, you should translate: when an outsider says “AI” they usually mean an LLM; when an engineer says “AI” they might mean ML or DL; when a researcher says “AI” they probably mean it in the 1956 sense.

You’ll be doing a lot of silent translation in conversations.

7. What should you learn next?

If you came here to “understand how ChatGPT works”, congratulations: you’ve just done one lap around the AI knowledge map. From here you can:

  • Finish L0—read the remaining 11 articles for user-facing understanding
  • Jump to L4—dive straight into how LLMs work
  • Walk L1–L3—build mathematical foundations from scratch

Personal suggestion: finish L0 first. It needs no prerequisites and gives you the full map.

📝 A small ask

If this article helped, please share with a friend who’s confused. HelloAI is free, ad-free, and forever open-source—a share is the best support.

Next up: “AI History in One Sitting: From Turing to ChatGPT”