L2 Chapter 1 🐣 🕒 11 min

Supervised / Unsupervised / Reinforcement: The Three Worldviews of ML

Every ML algorithm belongs to one of these three. Understand the taxonomy, and any new algorithm finds its home.

HelloAI Editors

6/24/2026

L1’s math block is done. Congrats—you have the “internal kung fu” of ML.

Now we walk L2—classical machine learning.

L2 isn’t as “sexy” as L3 or L4 (no Transformers, no ChatGPT), but these are the algorithms industry actually uses day to day. An ML engineer who knows only deep learning is poorly equipped for a large share of real business problems (roughly 60%, as a ballpark).

This first L2 article sorts all algorithms into three worldviews.

The Three Worldviews

ML is split by “data shape” into 3 major categories:

              ┌── Supervised (you have labels)
              │
ML ──────────┼── Unsupervised (no labels)
              │
              └── Reinforcement (only reward signal)

Each corresponds to a world assumption—what your data looks like, what you’re training for.

Supervised Learning: With a “Teacher”

Data form: every sample has a label.

(x, y)
(image, "cat" / "dog")
(email text, "spam" / "normal")
(house features, price)
(patient metrics, "diagnosed" / "not")

Goal: learn a function $f(x) \to y$.

This is the most widely used paradigm: as a rule of thumb, roughly 80% of industrial ML applications are supervised.

Two Sub-Types

By type of $y$:

Type	What is $y$	Examples	Typical algorithms
Classification	Discrete category	Spam or not, what disease	Logistic regression, decision tree, SVM, random forest, neural networks
Regression	Continuous number	House price, sales, temperature	Linear regression, decision tree, XGBoost, neural networks

Note that many algorithms handle both tasks (decision trees, neural networks, XGBoost). What changes is the loss function (for trees, the split criterion) and, for neural networks, the output layer.
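To make the "same algorithm, two tasks" point concrete, here is a minimal sketch using scikit-learn's decision trees. The toy data is made up for illustration; only the estimator class (and with it the split criterion) changes between the two tasks.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data (made up for illustration): one feature, eight samples.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])

# Classification: y is a discrete category (0 or 1).
y_class = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_class)

# Regression: y is a continuous number.
y_reg = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.8])
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_reg)

queries = np.array([[2.2], [6.8]])
pred_class = clf.predict(queries)  # discrete labels
pred_reg = reg.predict(queries)    # continuous values
print(pred_class, pred_reg)
```

The classifier returns category labels, while the regressor returns the mean target value of the leaf each query falls into.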

The “Soul” of Supervised Learning

Success/failure = label quality.

Wrong labels → wrong model
Few labels → poor generalization
Expensive labels → expensive project

Real pain: labeling one medical CT scan takes about 10 minutes of specialist time and costs about USD 30. At that rate, 10,000 labels cost USD 300,000, often more than the compute.

L4 covers semi-supervised and self-supervised learning—designed to dodge the “labels are expensive” problem.

Unsupervised Learning: No “Teacher”, Figure It Out

Data form: only $x$, no $y$.

[A pile of points (no labels)]
[A pile of user behavior logs (no labels)]
[A pile of articles (no topics)]

Goal: discover internal structure.

Three Main Tasks

1. Clustering

Group similar samples together.

E.g., an e-commerce site has 1M users—auto-group into 5 segments:

“High-frequency high-value”
“Low-frequency high-value”
“High-frequency low-value”
“New users”
“Churned users”

No labels needed—algorithm finds the structure. Typical algorithms: K-Means, DBSCAN, hierarchical clustering.
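A minimal K-Means sketch of the segmentation idea, under stated assumptions: the two features (orders per month, average order value) and the synthetic users are hypothetical, and three segments are used instead of five to keep it short. Features are standardized first because K-Means is distance-based.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical user features: [orders per month, average order value].
# Three synthetic segments of 50 users each.
centers = np.array([
    [20.0, 200.0],  # high-frequency, high-value
    [2.0, 200.0],   # low-frequency, high-value
    [20.0, 20.0],   # high-frequency, low-value
])
X = np.vstack([rng.normal(loc=c, scale=[1.0, 10.0], size=(50, 2)) for c in centers])

# Put both features on the same scale so neither dominates the distance.
X_scaled = StandardScaler().fit_transform(X)

# No labels are passed in: the algorithm only sees X.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_scaled)

print(np.bincount(segments))  # number of users assigned to each segment
```

The cluster ids themselves (0, 1, 2) are arbitrary; naming a segment "high-frequency high-value" is still a human step after inspecting each cluster.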

2. Dimensionality Reduction

Squeeze a 768-dim vector into 2D for visualization.

Or: compress 100 features to 20 with minimal info loss, making downstream models faster and more stable.

Typical algorithms: PCA, t-SNE, UMAP, autoencoders.

The Embedding visualization is a t-SNE example—high-dim word vectors squeezed to 2D.
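The "100 features to 20" case can be sketched with PCA. The data here is synthetic and constructed (as an assumption for the demo) so that 100 observed features are driven by 20 hidden factors plus a little noise; real data will rarely compress this cleanly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 500 samples, 100 features generated from 20 latent factors.
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 100))

# Compress 100 features down to 20 principal components.
pca = PCA(n_components=20)
X_small = pca.fit_transform(X)

# Fraction of the original variance kept by the 20 components.
retained = pca.explained_variance_ratio_.sum()
print(X_small.shape, retained)
```

Checking `explained_variance_ratio_` is one way to quantify "minimal info loss" before handing `X_small` to a downstream model.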

3. Anomaly Detection

Identify samples that look unlike most of the others.

E.g., credit card fraud, machine failures, network intrusion—anomalies are always rare.

Typical algorithms: Isolation Forest, One-Class SVM, autoencoder reconstruction error.
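A minimal Isolation Forest sketch, assuming made-up transaction amounts: 500 ordinary values plus three extreme ones. The `contamination` value (the expected share of anomalies) is a guess that must be supplied by you; the algorithm never sees labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Made-up transaction amounts: most are ordinary, three are extreme.
normal = rng.normal(loc=50.0, scale=10.0, size=(500, 1))
extreme = np.array([[500.0], [800.0], [1200.0]])
X = np.vstack([normal, extreme])

# contamination = assumed fraction of anomalies in the data.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)

pred = model.predict(X)  # +1 = looks normal, -1 = flagged as anomaly
print("flagged:", X[pred == -1].ravel())
```

Because about 1% of points are flagged by construction, a few ordinary but slightly unusual amounts are flagged alongside the three extreme ones.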

💡 The pain of unsupervised

Unsupervised’s hardest problem isn’t the algorithm—it’s knowing whether you did well.

Supervised learning has labels to compare predictions against; unsupervised learning doesn’t. So evaluation often comes down to “is it useful for the business”, which is very subjective.

Reinforcement Learning: Learn from Rewards

Data form: an agent acts in an environment and receives rewards or penalties for its actions.

Environment (chess board / game / robot world)
     ↓ observation
Agent → select action → reward + new observation
     ↑___________________________|

Goal: learn a “policy” $\pi(\text{action} | \text{state})$ that maximizes cumulative long-term reward.
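One common way to make “cumulative long-term reward” precise is the discounted return. The reward symbol $r_t$ and the discount factor $\gamma$ are notation assumed here, not taken from the text above:

$$
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma < 1
$$

The policy is then chosen to maximize the expected value of $G_t$. A $\gamma$ close to 1 makes far-future rewards count almost as much as immediate ones; a small $\gamma$ makes the agent short-sighted.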

Examples

Chess: state = current board, action = next move, reward = win/loss
Robot: state = camera frame, action = joint angles, reward = +1 on task completion
Recommender systems: state = user history, action = what to recommend, reward = click/buy
AlphaGo: hybrid of supervised (human game records) + RL (self-play)

Why It’s Hard

Dimension	Supervised	Reinforcement
Feedback	Immediate (label per sample)	Delayed (50 chess moves later you find out)
Data	Collected statically	Generated by agent actions
Explore vs Exploit	Doesn’t exist	Core problem
Training stability	Stable	Very unstable
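The explore-vs-exploit row can be illustrated with the simplest RL setting, a multi-armed bandit, and an epsilon-greedy rule. This is a sketch under assumptions: the three win rates are invented, and epsilon-greedy is just one basic strategy, not the method used by any particular system.

```python
import random

def run_epsilon_greedy(true_win_rates, epsilon=0.1, steps=5000, seed=0):
    """Play a bandit: each arm pays 1 with its (hidden) win rate, else 0."""
    rng = random.Random(seed)
    n_arms = len(true_win_rates)
    counts = [0] * n_arms    # how often each arm was pulled
    values = [0.0] * n_arms  # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to keep learning about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1.0 if rng.random() < true_win_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward

# Hypothetical arms; the agent does not know these win rates.
counts, values, total = run_epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total)
```

Note how the agent generates its own data: which arm it has estimates for depends on which arms it chose to pull. With `epsilon=0` it can lock onto the first arm that ever pays out and never discover a better one.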

Reality: classical RL isn’t used much in industry—outside games and robotics, most companies don’t need it.

But RLHF (reinforcement learning from human feedback) is the exception: the training pipelines of ChatGPT, Claude, and similar models all include an RL stage. That is arguably RL’s biggest industrial application.

Cross-Paradigm “Hybrids”

In practice, pure supervised / pure unsupervised projects are rare—hybrids are most common:

Semi-Supervised

A small labeled set plus a large unlabeled set. Pre-train on the unlabeled data, then fine-tune on the labeled data.

LLM training illustrates this well: self-supervised pre-training on internet text, then a comparatively small amount of human-labeled data for RLHF fine-tuning.

Self-Supervised

The data constructs its own labels. E.g., mask random words and have the model guess them; this is how BERT is trained (masked language modeling).
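How data “constructs its own labels” can be sketched in a few lines of plain Python. This is a simplified illustration: the helper name `make_masked_example` is hypothetical, whitespace splitting stands in for a real tokenizer, and BERT’s actual masking recipe has extra details that are omitted here.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Turn raw tokens into (inputs, labels) without any human labeling."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the word from the model
            labels.append(tok)    # the hidden word becomes the label
        else:
            inputs.append(tok)
            labels.append(None)   # position is not scored
    return inputs, labels

tokens = "the cat sat on the mat and looked at the dog".split()
inputs, labels = make_masked_example(tokens, mask_prob=0.3)
print(inputs)
print(labels)
```

Every sentence on the internet can be turned into training pairs this way, which is why no annotation budget is needed for pre-training.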

Foundation models such as LLMs, CLIP, and SAM rely, fully or in part, on self-supervised pre-training. This is a core reason for AI’s second boom: it dodged the “labels are expensive” problem.

Transfer Learning

Pre-train on one task, fine-tune on a related one.

Take BERT pre-trained on Wikipedia and transfer it to your company’s email classification: around 1,000 labeled samples is often enough to make it work. Without this, the vast majority of today’s AI apps (95%, as a ballpark) would not be feasible.

How to Choose

Given a new business problem, ask 4 questions:

Question	Answer → Path
Do I have labels?	Yes → supervised / No → unsupervised or self-supervised
Is $y$ a category or a number?	Category → classification / Number → regression
Does it involve “decisions in environment”?	Yes → reinforcement
Are labels very expensive?	Yes → self-supervised / semi-supervised / transfer

The answers tell you roughly which class of algorithms to reach for.
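The four questions can be written down as a small decision function. This is only a sketch: the function name, its parameters, and the order in which the questions take precedence are assumptions made for illustration; the table itself does not rank them.

```python
def suggest_paradigm(has_labels, target_is_category=None,
                     decisions_in_environment=False, labels_expensive=False):
    """Map the four questions to a rough family of algorithms."""
    # Assumed precedence: sequential decision-making first, then label cost.
    if decisions_in_environment:
        return "reinforcement"
    if labels_expensive:
        return "self-supervised / semi-supervised / transfer"
    if not has_labels:
        return "unsupervised or self-supervised"
    if target_is_category is None:
        return "supervised"
    return "classification" if target_is_category else "regression"

# Spam filter with labeled emails.
print(suggest_paradigm(has_labels=True, target_is_category=True))
# House-price prediction with historical sale prices.
print(suggest_paradigm(has_labels=True, target_is_category=False))
# Raw user behavior logs with no labels.
print(suggest_paradigm(has_labels=False))
```

Real projects often answer “yes” to several questions at once, so treat the output as a starting point, not a verdict.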

L2 Path Overview

We’ll go deep on the most important classical algorithms:

Article	Topic
L2-02	Linear Regression
L2-03	Logistic Regression & Classification
L2-04	Decision Trees
L2-05	Random Forest + Ensemble Learning
L2-06	K-Means Clustering
L2-07	Evaluation + Overfitting + Regularization
L2-08	SVM
L2-09	Optimizers (already written)
L2-10	Feature Engineering
L2-11	End-to-end ML project (Kaggle hands-on)

By L2’s end, you can solve tons of real business problems with scikit-learn—without any deep learning.

🔬 Your L2 capabilities

Solve Kaggle entry-level (Titanic, House Prices)
Explain “why we should use XGBoost not neural net here” to colleagues
Read 90% of sklearn docs
Use ML in real work (not just call ChatGPT API)

Next: “Linear Regression: The Simplest and Most Profound ML Model”