HelloAI
L2 Chapter 1 🐣 🕒 11 min

Supervised / Unsupervised / Reinforcement: The Three Worldviews of ML

Every ML algorithm belongs to one of these three. Understand the taxonomy, and any new algorithm finds its home.

HelloAI Editors
6/24/2026

L1’s math block is done. Congrats—you have the “internal kung fu” of ML.

Now we walk L2—classical machine learning.

L2 isn’t as “sexy” as L3, L4 (no Transformers, no ChatGPT). But every algorithm here is what industry actually uses. An ML engineer who knows only deep learning misses 60% of real business problems.

L2 article 1: classify all algorithms into three worldviews.

The Three Worldviews

ML is split by “data shape” into 3 major categories:

             ┌── Supervised (you have labels)
ML ──────────┼── Unsupervised (no labels)
             └── Reinforcement (only reward signal)

Each corresponds to a world assumption—what your data looks like, what you’re training for.

Supervised Learning: With a “Teacher”

Data form: every sample has a label.

(x, y)
(image, "cat" / "dog")
(email text, "spam" / "normal")
(house features, price)
(patient metrics, "diagnosed" / "not")

Goal: learn a function $f(x) \to y$.

This is the most widely used paradigm: as a rough estimate, about 80% of industrial ML applications are supervised.

Two Sub-Types

By type of $y$:

| Type | What is $y$ | Examples | Typical algorithms |
| --- | --- | --- | --- |
| Classification | Discrete category | Spam or not, what disease | Logistic regression, decision tree, SVM, random forest, neural networks |
| Regression | Continuous number | House price, sales, temperature | Linear regression, decision tree, XGBoost, neural networks |

Notice: many algorithms do both (decision tree, neural network, XGBoost). The difference lies in the loss function (or split criterion) and, for neural networks, the output layer.
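To make this concrete, here is a minimal sketch, assuming scikit-learn is installed. The synthetic data from `make_classification` and `make_regression` and the `max_depth=3` setting are illustrative choices, not something the article prescribes. The same tree family handles both sub-types; what changes is the kind of target and the criterion the tree optimizes.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: y is a discrete category (here 0 or 1).
X_cls, y_cls = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_cls, y_cls)
cls_pred = clf.predict(X_cls[:3])      # predicted class ids

# Regression: y is a continuous number.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_reg, y_reg)
reg_pred = reg.predict(X_reg[:3])      # predicted real values

# Same algorithm family, different objective:
# the classifier minimizes class impurity, the regressor minimizes an error measure.
print("classifier criterion:", clf.criterion)
print("regressor criterion:", reg.criterion)
print("class predictions:", cls_pred)
print("numeric predictions:", reg_pred)
```

The classifier returns class ids while the regressor returns floats, even though both are decision trees underneath.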

The “Soul” of Supervised Learning

Success/failure = label quality.

  • Wrong labels → wrong model
  • Few labels → poor generalization
  • Expensive labels → expensive project

Real pain: a medical CT scan label requires 10 minutes of specialist time + $30. 10,000 labels = $300,000, often more than the compute cost.

L4 covers semi-supervised and self-supervised learning—designed to dodge the “labels are expensive” problem.

Unsupervised Learning: No “Teacher”, Figure It Out

Data form: only $x$, no $y$.

[A pile of points (no labels)]
[A pile of user behavior logs (no labels)]
[A pile of articles (no topics)]

Goal: discover internal structure.

Three Main Tasks

1. Clustering

Group similar samples together.

E.g., an e-commerce site has 1M users—auto-group into 5 segments:

  • “High-frequency high-value”
  • “Low-frequency high-value”
  • “High-frequency low-value”
  • “New users”
  • “Churned users”

No labels needed—algorithm finds the structure. Typical algorithms: K-Means, DBSCAN, hierarchical clustering.
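A minimal K-Means sketch of this kind of segmentation, assuming scikit-learn and NumPy. The two features (orders per month, average order value) and the three synthetic groups are hypothetical stand-ins for real user data, and the cluster count is chosen by hand here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical user features: [orders per month, average order value].
# Three synthetic groups stand in for real behavior logs.
frequent_high = rng.normal(loc=[20, 200], scale=[2, 20], size=(100, 2))
rare_high = rng.normal(loc=[2, 200], scale=[0.5, 20], size=(100, 2))
frequent_low = rng.normal(loc=[20, 20], scale=[2, 5], size=(100, 2))
X = np.vstack([frequent_high, rare_high, frequent_low])

# Scale first: K-Means uses Euclidean distance, so the feature with
# larger raw values would otherwise dominate.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X_scaled)   # one cluster id per user, no labels used

print("users per segment:", np.bincount(segments))
```

The cluster ids themselves carry no meaning; naming a segment "high-frequency high-value" is still a human step after inspecting each cluster.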

2. Dimensionality Reduction

Squeeze a 768-dim vector into 2D for visualization.

Or: compress 100 features to 20 with minimal info loss, making downstream models faster and more stable.

Typical algorithms: PCA, t-SNE, UMAP, autoencoders.

The Embedding visualization is a t-SNE example—high-dim word vectors squeezed to 2D.
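The "100 features to 20" case can be sketched with PCA, assuming scikit-learn and NumPy. The data here is synthetic and deliberately built from 20 hidden factors, so the low information loss is a property of this toy setup rather than a general guarantee.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 100 observed features driven by 20 hidden factors plus small noise.
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 100))

pca = PCA(n_components=20)
X_small = pca.fit_transform(X)             # shape (500, 20)

# Fraction of the total variance retained by the 20 components.
kept = float(pca.explained_variance_ratio_.sum())
print(X_small.shape, round(kept, 4))
```

On real data, `explained_variance_ratio_` is a practical way to check how much is lost before committing to a target dimension.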

3. Anomaly Detection

Identify samples that look unlike most of the others.

E.g., credit card fraud, machine failures, network intrusion—anomalies are always rare.

Typical algorithms: Isolation Forest, One-Class SVM, autoencoder reconstruction error.
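As a rough illustration, here is an Isolation Forest sketch, assuming scikit-learn and NumPy. The planted outliers and the `contamination=0.01` value are assumptions made for this toy data; in practice the expected anomaly rate has to be estimated from the domain.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 500 "normal" points near the origin plus 5 planted outliers far away.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outliers = np.array([[9.0, 9.0], [-9.0, 8.0], [8.0, -9.0], [-8.0, -8.0], [0.5, 10.0]])
X = np.vstack([normal, outliers])

# contamination is the assumed share of anomalies in the data.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)      # +1 = inlier, -1 = anomaly

print("flagged as anomalies:", int((labels == -1).sum()))
print("labels of the planted outliers:", labels[-5:])
```

No label was provided for any point; the model flags samples that are easy to isolate with a few random splits.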

💡 The pain of unsupervised

Unsupervised’s hardest problem isn’t the algorithm—it’s knowing whether you did well.

Supervised has labels to compare; unsupervised doesn’t. So evaluation often depends on “is it useful for business”—very subjective.

Reinforcement Learning: Learn from Rewards

Data form: an agent acts in an environment and receives rewards or penalties for its actions.

Environment (chess board / game / robot world)
     ↓ observation
Agent → select action → reward + new observation
     ↑___________________________|

Goal: learn a “policy” $\pi(\text{action} \mid \text{state})$ that maximizes cumulative long-term reward.
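A minimal sketch of this loop, using tabular Q-learning with epsilon-greedy exploration (one classical RL method among many). The five-cell corridor environment, the `step` helper, and all hyperparameters are hypothetical choices made for illustration; only NumPy is assumed.

```python
import numpy as np

# Toy environment (an assumption for illustration): a corridor of 5 cells.
# The agent starts in cell 0; reaching cell 4 ends the episode with reward +1.
N_STATES, GOAL = 5, 4
LEFT, RIGHT = 0, 1

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))        # estimated long-term reward per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(300):
    state, done = 0, False
    while not done:
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = int(rng.integers(2))
        else:
            # Exploit: take the best-known action, breaking ties at random.
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted future value.
        target = reward + (0.0 if done else gamma * Q[next_state].max())
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

policy = Q[:GOAL].argmax(axis=1)   # greedy action in each non-terminal state
print("greedy policy (0 = left, 1 = right):", policy)
```

The reward only arrives at the final cell, yet the update rule propagates it backward so that earlier states also learn to prefer moving right.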

Examples

  • Chess: state = current board, action = next move, reward = win/loss
  • Robot: state = camera frame, action = joint angles, reward = task completion +1
  • Recommender systems: state = user history, action = what to recommend, reward = click/buy
  • AlphaGo: hybrid of supervised (human game records) + RL (self-play)

Why It’s Hard

| Dimension | Supervised | Reinforcement |
| --- | --- | --- |
| Feedback | Immediate (label per sample) | Delayed (50 chess moves later you find out) |
| Data | Collected statically | Generated by agent actions |
| Explore vs Exploit | Doesn’t exist | Core problem |
| Training stability | Stable | Very unstable |

Reality: classical RL isn’t used much in industry—outside games and robotics, most companies don’t need it.

But RLHF (reinforcement learning from human feedback) is the exception: the training pipelines of ChatGPT, Claude, and similar models all include an RL stage. That’s RL’s biggest industrial application.

Cross-Paradigm “Hybrids”

In practice, pure supervised / pure unsupervised projects are rare—hybrids are most common:

Semi-Supervised

A small labeled set plus a large unlabeled set. Pre-train on the unlabeled data, then fine-tune on the labeled data.

LLM training illustrates the same pattern: self-supervised pre-training on internet text, then a comparatively small amount of human-labeled data for RLHF fine-tuning.

Self-Supervised

The data constructs its own labels. E.g., mask random words and have the model guess them; this is how BERT is trained (masked language modeling).

LLMs, CLIP, SAM and most current foundation models rely heavily on self-supervised (or closely related weakly supervised) pre-training. This is the core reason for AI’s second boom: it dodged the “labels are expensive” problem.
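The masking idea can be sketched in plain Python. This is a simplified illustration, not BERT's actual preprocessing: real BERT works on subword tokens and sometimes keeps or randomly replaces a selected token instead of masking it. The helper name `make_masked_example` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, mask_ratio=0.15, seed=0):
    """Turn raw tokens into (inputs, labels) with no human annotation.

    labels maps each masked position to the original token the model must guess.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = inputs[pos]   # the "label" comes from the text itself
        inputs[pos] = MASK
    return inputs, labels

sentence = "the cat sat on the mat and looked at the dog".split()
inputs, labels = make_masked_example(sentence, mask_ratio=0.2)
print(" ".join(inputs))
print(labels)
```

Every raw sentence yields training pairs for free, which is why this approach scales to internet-sized corpora.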

Transfer Learning

Pre-train on one task, fine-tune on a related one.

Take BERT, pre-trained on Wikipedia and BooksCorpus, and transfer it to your company’s email classification: with around 1,000 labeled samples it can already work well. Without this, 95% of today’s AI apps don’t run.

How to Choose

Given a new business problem, ask 4 questions:

| Question | Answer → Path |
| --- | --- |
| Do I have labels? | Yes → supervised / No → unsupervised or self-supervised |
| Is $y$ a category or a number? | Category → classification / Number → regression |
| Does it involve “decisions in environment”? | Yes → reinforcement |
| Are labels very expensive? | Yes → self-supervised / semi-supervised / transfer |

Answers tell you the rough class of algorithms.
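The four questions can be written down as a small decision helper. This is only a sketch of the checklist: the function `suggest_paradigms` and its parameter names are hypothetical, and real projects often combine several of the returned paths.

```python
def suggest_paradigms(has_labels, target_is_category=None,
                      sequential_decisions=False, labels_expensive=False):
    """Map the four questions to rough algorithm families (hypothetical helper)."""
    suggestions = []
    if sequential_decisions:
        suggestions.append("reinforcement learning")
    if has_labels:
        if target_is_category is None:
            suggestions.append("supervised learning")
        elif target_is_category:
            suggestions.append("supervised classification")
        else:
            suggestions.append("supervised regression")
    else:
        suggestions.append("unsupervised or self-supervised learning")
    if labels_expensive:
        suggestions.append("self-supervised / semi-supervised / transfer learning")
    return suggestions

# Spam filter with plenty of labeled emails.
spam = suggest_paradigms(has_labels=True, target_is_category=True)
# House price prediction where each appraisal label is costly.
prices = suggest_paradigms(has_labels=True, target_is_category=False,
                           labels_expensive=True)
# Raw user logs with no labels at all.
logs = suggest_paradigms(has_labels=False)

print(spam)
print(prices)
print(logs)
```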

L2 Path Overview

We’ll go deep on the most important classical algorithms:

| Article | Topic |
| --- | --- |
| L2-02 | Linear Regression |
| L2-03 | Logistic Regression & Classification |
| L2-04 | Decision Trees |
| L2-05 | Random Forest + Ensemble Learning |
| L2-06 | K-Means Clustering |
| L2-07 | Evaluation + Overfitting + Regularization |
| L2-08 | SVM |
| L2-09 | Optimizers (already written) |
| L2-10 | Feature Engineering |
| L2-11 | End-to-end ML project (Kaggle hands-on) |

By L2’s end, you can solve tons of real business problems with scikit-learn—without any deep learning.

🔬 Your L2 capabilities
  • Solve entry-level Kaggle competitions (Titanic, House Prices)
  • Explain “why we should use XGBoost rather than a neural net here” to colleagues
  • Read 90% of the sklearn docs
  • Use ML in real work (not just call the ChatGPT API)

Next: “Linear Regression: The Simplest and Most Profound ML Model”