Supervised / Unsupervised / Reinforcement: The Three Worldviews of ML
Nearly every ML algorithm fits one of these three, or a hybrid of them. Understand the taxonomy, and any new algorithm finds its home.
L1’s math block is done. Congrats—you have the “internal kung fu” of ML.
Now we walk L2—classical machine learning.
L2 isn’t as “sexy” as L3, L4 (no Transformers, no ChatGPT). But every algorithm here is what industry actually uses. An ML engineer who knows only deep learning misses 60% of real business problems.
L2 article 1: classify all algorithms into three worldviews.
The Three Worldviews
ML is split by “data shape” into 3 major categories:
             ┌── Supervised (you have labels)
             │
ML ──────────┼── Unsupervised (no labels)
             │
             └── Reinforcement (only reward signal)
Each corresponds to a world assumption—what your data looks like, what you’re training for.
Supervised Learning: With a “Teacher”
Data form: every sample has a label.
(x, y)
(image, "cat" / "dog")
(email text, "spam" / "normal")
(house features, price)
(patient metrics, "diagnosed" / "not")
Goal: learn a function f that maps x to y, i.e. f(x) ≈ y.
Most widely-used paradigm. 80% of ML industry applications are supervised.
Two Sub-Types
By type of y:
| Type | What y is | Examples | Typical algorithms |
|---|---|---|---|
| Classification | Discrete category | Spam or not, what disease | Logistic regression, decision tree, SVM, random forest, neural networks |
| Regression | Continuous number | House price, sales, temperature | Linear regression, decision tree, XGBoost, neural networks |
Notice: many algorithms do both (decision tree, neural network, XGBoost). Difference is loss function + output layer.
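To make "difference is loss function + output layer" concrete, here is a minimal sketch using only the Python standard library. It is not how scikit-learn implements anything; `train_linear`, `predict`, and the toy data are hypothetical names and values chosen for illustration. The same linear model `w*x + b` is trained twice: once with an identity output and squared error (regression), once with a sigmoid output and log loss (classification).

```python
import math

def predict(x, w, b, task):
    """Output layer: identity for regression, sigmoid for classification."""
    z = w * x + b
    return z if task == "regression" else 1.0 / (1.0 + math.exp(-z))

def train_linear(xs, ys, task, lr=0.1, epochs=5000):
    """Fit w*x + b with plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For 0.5 * squared error with identity output, and for log loss
            # with sigmoid output, the gradient w.r.t. z is (prediction - y).
            err = predict(x, w, b, task) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]

# Regression: targets are continuous numbers (here generated by y = 2x + 1).
reg_w, reg_b = train_linear(xs, [1.0, 3.0, 5.0, 7.0], task="regression")

# Classification: targets are discrete categories (0 or 1).
clf_w, clf_b = train_linear(xs, [0, 0, 1, 1], task="classification")
clf_labels = [1 if predict(x, clf_w, clf_b, "classification") >= 0.5 else 0 for x in xs]
```

The training loop is identical in both runs; only the `task` switch changes the output function, and with it the loss being minimized.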
The “Soul” of Supervised Learning
Success/failure = label quality.
- Wrong labels → wrong model
- Few labels → poor generalization
- Expensive labels → expensive project
Real pain: a medical CT scan label requires 10 minutes of specialist time + 300,000—often more than the compute cost.
L4 covers semi-supervised and self-supervised learning—designed to dodge the “labels are expensive” problem.
Unsupervised Learning: No “Teacher”, Figure It Out
Data form: only x, no y.
[A pile of points (no labels)]
[A pile of user behavior logs (no labels)]
[A pile of articles (no topics)]
Goal: discover internal structure.
Three Main Tasks
1. Clustering
Group similar samples together.
E.g., an e-commerce site has 1M users—auto-group into 5 segments:
- “High-frequency high-value”
- “Low-frequency high-value”
- “High-frequency low-value”
- “New users”
- “Churned users”
No labels needed—algorithm finds the structure. Typical algorithms: K-Means, DBSCAN, hierarchical clustering.
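As a rough sketch of how K-Means finds groups without labels, the following standard-library Python alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. The `kmeans` function, the naive "first k points" initialization, and the six toy users (orders per month, average order value) are all assumptions made for illustration; real libraries use smarter initialization and would normally scale features first.

```python
def kmeans(points, k, iters=20):
    """Tiny K-Means: returns (centroids, clusters)."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical users: (orders per month, average order value).
users = [(1, 10), (9, 95), (2, 12), (10, 100), (1, 11), (11, 98)]
centroids, clusters = kmeans(users, k=2)
```

No label ever enters the function: the two segments emerge purely from distances between points.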
2. Dimensionality Reduction
Squeeze a 768-dim vector into 2D for visualization.
Or: compress 100 features to 20 with minimal info loss, making downstream models faster and more stable.
Typical algorithms: PCA, t-SNE, UMAP, autoencoders.
The Embedding visualization is a t-SNE example—high-dim word vectors squeezed to 2D.
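The idea of "compress with minimal info loss" can be sketched with the first principal component of PCA. This example is a simplified assumption-laden version: it uses power iteration on the covariance matrix in pure Python (production code would typically rely on an SVD routine), reduces toy 2D data to 1D instead of 768 dimensions to 2, and `first_principal_component` is a hypothetical helper name.

```python
import math

def first_principal_component(rows, iters=100):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center the data: PCA looks at variance around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiplying by cov pulls the vector toward
    # the direction of largest variance.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance along the start direction")
        v = [x / norm for x in w]
    # Each sample is now described by a single coordinate along v.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections

# Toy data lying exactly on the line y = 2x, so one dimension keeps everything.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projections = first_principal_component(rows)
```

Because the toy points are perfectly collinear, the single projected coordinate loses no information; on real data the discarded directions carry the "info loss".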
3. Anomaly Detection
Identify “looks unlike most” samples.
E.g., credit card fraud, machine failures, network intrusion—anomalies are always rare.
Typical algorithms: Isolation Forest, One-Class SVM, autoencoder reconstruction error.
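For intuition only, here is a much simpler stand-in than the algorithms listed above: a robust z-score based on the median and the median absolute deviation (MAD). It is not Isolation Forest or One-Class SVM; it just illustrates "looks unlike most". The `flag_anomalies` name, the 3.5 threshold, and the transaction amounts are assumptions for this sketch.

```python
import statistics

def flag_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # when the bulk of the data is roughly normal.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]

# Hypothetical card transaction amounts with one suspicious charge.
amounts = [12, 15, 11, 14, 13, 12, 950, 16]
suspicious = flag_anomalies(amounts)
```

Using the median instead of the mean matters here: the rare extreme value would otherwise inflate the very statistics used to judge it.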
Unsupervised’s hardest problem isn’t the algorithm—it’s knowing whether you did well.
Supervised has labels to compare; unsupervised doesn’t. So evaluation often depends on “is it useful for business”—very subjective.
Reinforcement Learning: Learn from Rewards
Data form: agent acts in environment, gets reward or penalty per action.
Environment (chess board / game / robot world)
↓ observation
Agent → select action → reward + new observation
↑___________________________|
Goal: learn a “policy” that maximizes cumulative long-term reward.
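The loop above can be sketched with tabular Q-learning, one classical RL algorithm among many. Everything specific here is an assumption for illustration: a five-cell corridor as the environment, a reward of 1 only on reaching the last cell, and the hyperparameters `alpha`, `gamma`, and `epsilon`. The agent sees only rewards, never a label saying which move was correct.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def step(state, action_idx):
    """Environment: returns (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_idx]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, a)
            # Delayed reward flows backward through the discounted target.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

q_table = q_learning()
policy = ["left" if q_table[s][0] > q_table[s][1] else "right"
          for s in range(N_STATES - 1)]
```

The learned `policy` is read off the table by picking the higher-valued action in each cell. The `epsilon` branch is the explore-versus-exploit trade-off in its simplest form.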
Examples
- Chess: state = current board, action = next move, reward = win/loss
- Robot: state = camera frame, action = joint angles, reward = task completion +1
- Recommender systems: state = user history, action = what to recommend, reward = click/buy
- AlphaGo: hybrid of supervised (human game records) + RL (self-play)
Why It’s Hard
| Dimension | Supervised | Reinforcement |
|---|---|---|
| Feedback | Immediate (label per sample) | Delayed (50 chess moves later you find out) |
| Data | Collected statically | Generated by agent actions |
| Explore vs Exploit | Doesn’t exist | Core problem |
| Training stability | Stable | Very unstable |
Reality: classical RL isn’t used much in industry—outside games and robotics, most companies don’t need it.
But RLHF is the exception: ChatGPT, Claude, etc.—their training pipelines all have RL (learn from human feedback). That’s RL’s biggest industrial application.
Cross-Paradigm “Hybrids”
In practice, pure supervised / pure unsupervised projects are rare—hybrids are most common:
Semi-Supervised
Small labels + large unlabeled data. Pre-train on unlabeled, fine-tune on labeled.
LLM training perfectly illustrates this: self-supervised pre-training on internet text, then small human labels for RLHF fine-tuning.
Self-Supervised
Data constructs its own labels. E.g., mask random words and have model guess—this is BERT (masked language modeling).
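LLMs and most other current foundation models lean on self-supervised pre-training (CLIP learns from image-text pairs collected from the web, and SAM builds on a self-supervised MAE image encoder). This is a core reason for AI’s second boom: it dodged the “labels are expensive” problem.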
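"Data constructs its own labels" can be shown in a few lines. This sketch builds one masked-language-modeling training example from raw text. It is a simplification: whitespace splitting stands in for a real tokenizer, and BERT's actual recipe also replaces some selected tokens with random words or leaves them unchanged. `make_masked_example` and its parameters are hypothetical names.

```python
import random

def make_masked_example(sentence, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Turn raw text into (inputs, labels) with no human annotation."""
    rng = random.Random(seed)
    tokens = sentence.split()  # stand-in for a real tokenizer
    inputs, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            labels[i] = tok          # the hidden word becomes the label
        else:
            inputs.append(tok)
    if not labels:
        # Guarantee at least one prediction target per example.
        i = rng.randrange(len(tokens))
        labels[i] = tokens[i]
        inputs[i] = mask_token
    return inputs, labels

sentence = "the cat sat on the mat and looked at the dog"
masked, targets = make_masked_example(sentence)
```

The model's job would be to predict `targets` from `masked`. Every sentence on the internet becomes a labeled example this way, which is how the labeling cost disappears.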
Transfer Learning
Pre-train on one task, fine-tune on a related one.
Take BERT pre-trained on Wikipedia, transfer to your company’s email classification—just 1000 labeled samples and it works. Without this, 95% of today’s AI apps don’t run.
How to Choose
Given a new business problem, ask 4 questions:
| Question | Answer → Path |
|---|---|
| Do I have labels? | Yes → supervised / No → unsupervised or self-supervised |
| Is y a category or a number? | Category → classification / Number → regression |
| Does it involve “decisions in environment”? | Yes → reinforcement |
| Are labels very expensive? | Yes → self-supervised / semi-supervised / transfer |
Answers tell you the rough class of algorithms.
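The four questions can be written down as a small helper, purely as a mnemonic. `suggest_paths` and its argument names are hypothetical, and the output is only the rough algorithm class from the table, not a recommendation engine.

```python
def suggest_paths(has_labels, target_kind=None,
                  decisions_in_environment=False, labels_expensive=False):
    """Map the four questions to the rough paths from the table."""
    paths = []
    if decisions_in_environment:
        paths.append("reinforcement")
    if has_labels:
        if target_kind == "category":
            paths.append("supervised: classification")
        elif target_kind == "number":
            paths.append("supervised: regression")
        else:
            paths.append("supervised")
    else:
        paths.append("unsupervised or self-supervised")
    if labels_expensive:
        paths.append("self-supervised / semi-supervised / transfer")
    return paths

spam_filter = suggest_paths(has_labels=True, target_kind="category")
user_segments = suggest_paths(has_labels=False)
ct_scans = suggest_paths(has_labels=True, target_kind="category",
                         labels_expensive=True)
```

A problem can land on more than one path, as the CT scan case does, which matches the point that hybrids are common in practice.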
L2 Path Overview
We’ll go deep on the most important classical algorithms:
| Article | Topic |
|---|---|
| L2-02 | Linear Regression |
| L2-03 | Logistic Regression & Classification |
| L2-04 | Decision Trees |
| L2-05 | Random Forest + Ensemble Learning |
| L2-06 | K-Means Clustering |
| L2-07 | Evaluation + Overfitting + Regularization |
| L2-08 | SVM |
| L2-09 | Optimizers (already written) |
| L2-10 | Feature Engineering |
| L2-11 | End-to-end ML project (Kaggle hands-on) |
By L2’s end, you can solve many real business problems with scikit-learn, without any deep learning. You will be able to:
- Solve Kaggle entry-level (Titanic, House Prices)
- Explain “why we should use XGBoost not neural net here” to colleagues
- Read 90% of sklearn docs
- Use ML in real work (not just call ChatGPT API)
Next: “Linear Regression: The Simplest and Most Profound ML Model”