Turn abstract concepts into toys you can play with

Tokenizer Playground

See how GPT, Claude, and Llama tokenize the same text differently, and why token counts drive LLM API pricing.

🐣 🕒 5 min

🧠 Deep learning

Attention in real time

Type a sentence, watch 4 attention head patterns: neighbor, global, similarity, syntax.

🐣 🕒 10 min

📐 Math/optim

Gradient descent climbers

Click anywhere on the map and watch SGD, Momentum, and Adam race down the hill.

🐣 🕒 12 min
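
For readers who want to see the three update rules next to each other, here is a minimal sketch. The loss surface `f(x, y) = x^2 + 10*y^2`, the starting point, and all hyperparameters are assumptions chosen for illustration; they are not taken from the demo.

```python
import math

def loss(p):
    # Assumed toy surface: an elongated bowl with its minimum at (0, 0).
    x, y = p
    return x * x + 10 * y * y

def grad(p):
    x, y = p
    return (2 * x, 20 * y)

def run_sgd(start, lr=0.05, steps=200):
    p = list(start)
    for _ in range(steps):
        p = [pi - lr * gi for pi, gi in zip(p, grad(p))]
    return p

def run_momentum(start, lr=0.05, mu=0.9, steps=200):
    p, v = list(start), [0.0, 0.0]
    for _ in range(steps):
        # Velocity accumulates past gradients, then drives the step.
        v = [mu * vi + gi for vi, gi in zip(v, grad(p))]
        p = [pi - lr * vi for pi, vi in zip(p, v)]
    return p

def run_adam(start, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    p, m, v = list(start), [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        g = grad(p)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        # Bias-corrected first and second moment estimates.
        m_hat = [mi / (1 - b1 ** t) for mi in m]
        v_hat = [vi / (1 - b2 ** t) for vi in v]
        p = [pi - lr * mh / (math.sqrt(vh) + eps)
             for pi, mh, vh in zip(p, m_hat, v_hat)]
    return p

start = (3.0, 2.0)
for name, run in [("SGD", run_sgd), ("Momentum", run_momentum), ("Adam", run_adam)]:
    print(name, loss(run(start)))
```

Each optimizer starts from the same point, so the printed final losses give a rough sense of who reaches the bottom first under these particular settings.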

Embedding space walk

50 words in 2D space. Play with word arithmetic: king − man + woman = ?

🐣 🕒 7 min
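
The word arithmetic above can be sketched in a few lines. The 2D vectors below are hypothetical, hand-picked so that one axis loosely tracks "royalty" and the other "gender"; real embeddings are learned and have hundreds of dimensions.

```python
import math

# Hypothetical 2D embeddings, for illustration only.
EMBEDDINGS = {
    "king":  (0.9, 0.8),
    "queen": (0.9, -0.8),
    "man":   (0.1, 0.8),
    "woman": (0.1, -0.8),
    "apple": (-0.7, 0.1),
}

def cosine(a, b):
    """Cosine similarity between two 2D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = tuple(
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    )
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("king", "man", "woman"))
```

Excluding the input words from the candidates is a common convention for this kind of query; without it, the nearest neighbor is often one of the inputs.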

🧠 Deep learning

CNN convolution scan

Drag a kernel across an image, see different kernels extract edges, blur, sharpen.

🥚 🕒 8 min
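
As a rough sketch of what the kernel scan computes, the function below slides a kernel over a small grayscale image with no padding and stride 1. The image, the Sobel-style edge kernel, and the function name `convolve2d` are assumptions for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it is cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (lists of rows); no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Assumed 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]

# Sobel-style kernel that responds to left-to-right brightness changes.
edge_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# Box blur: every output pixel is the mean of its 3x3 neighborhood.
blur_kernel = [[1 / 9] * 3 for _ in range(3)]

print(convolve2d(image, edge_kernel))
print(convolve2d(image, blur_kernel))
```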

LLM sampling

Adjust temperature / top-k / top-p, see candidate probabilities, draw the next word.

🐣 🕒 6 min
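
The three knobs can be sketched as a small pipeline: temperature rescales the logits, top-k and top-p prune the candidate list, and the next token is drawn from what remains. The function names and the example logits are assumptions for illustration, not the demo's actual code.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_candidates(probs, top_k=None, top_p=None):
    """Keep the top-k and/or nucleus (top-p) candidates, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break  # smallest prefix whose mass reaches top_p
        order = kept
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token index from the filtered distribution."""
    kept = filter_candidates(softmax(logits, temperature), top_k, top_p)
    indices = list(kept)
    return rng.choices(indices, weights=[kept[i] for i in indices], k=1)[0]

logits = [2.0, 1.0, 0.0]  # assumed scores for three candidate tokens
print(filter_candidates(softmax(logits), top_k=2))
print(sample_next(logits, temperature=0.7, top_p=0.9))
```

Applying top-k before top-p is one common ordering; implementations differ on this detail.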

🎨 Multimodal

Diffusion denoising

From pure noise to image in 50 steps. The core idea behind Stable Diffusion / Sora.

🐥 🕒 10 min

RAG pipeline

Type a question, watch chunking → embedding → retrieval → rerank → generation animate.

🐣 🕒 12 min

KV Cache

How much does a KV cache speed up generation? Compare O(n²) recomputation without the cache against O(n) with it, side by side.

🐥 🕒 8 min
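
One way to read the O(n²) vs O(n) comparison is to count how many key/value projections are computed while generating n tokens. The sketch below does only that counting; the function names are hypothetical, and the cost of the attention scores themselves is deliberately not modeled.

```python
def kv_projections_without_cache(n_tokens):
    """Each step recomputes keys/values for the whole prefix."""
    total = 0
    for step in range(1, n_tokens + 1):
        total += step  # prefix length at this step
    return total

def kv_projections_with_cache(n_tokens):
    """Each step computes keys/values for the new token only."""
    cache = []
    total = 0
    for step in range(1, n_tokens + 1):
        cache.append(step)  # stand-in for the new token's (key, value) pair
        total += 1
    return total

for n in (10, 100, 1000):
    print(n, kv_projections_without_cache(n), kv_projections_with_cache(n))
```

Without the cache the count grows as n(n+1)/2; with it, the count is exactly n, at the price of keeping every past key and value in memory.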

Pipeline parallelism

4 GPUs training a huge model. Compare Naive / GPipe / 1F1B bubble efficiency.

🐥 🕒 10 min

MoE routing

8 experts working together — each token picks top 2. Click tokens to see routing.
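
A minimal sketch of top-2 routing for a single token, assuming the router produces one score per expert. The scores below are made up, and renormalizing with a softmax over only the two selected experts is one common choice, not necessarily what the demo does.

```python
import math

def route_top2(router_logits):
    """Pick the 2 highest-scoring experts and return {expert: weight}."""
    top2 = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:2]
    m = max(router_logits[i] for i in top2)
    exps = {i: math.exp(router_logits[i] - m) for i in top2}
    total = sum(exps.values())
    # Weights sum to 1 across the two chosen experts.
    return {i: e / total for i, e in exps.items()}

# Assumed router scores for one token across 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]
print(route_top2(logits))
```

The token's output would then be the weighted sum of the two chosen experts' outputs; the other six experts do no work for this token.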

Speculative decoding

A small model drafts tokens and a big model verifies them, for roughly 2-4× faster decoding. See how the acceptance rate affects total time.

🐥 🕒 8 min
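
To get a feel for how acceptance rate drives the speedup, here is a back-of-the-envelope model. It assumes each drafted token is accepted independently with the same probability, and that one round costs `draft_len` small-model passes plus one big-model pass. The function names and the example numbers are illustrative assumptions, not measurements.

```python
def expected_tokens_per_round(accept_rate, draft_len):
    """Expected tokens produced per verify round (geometric series)."""
    if accept_rate >= 1.0:
        return draft_len + 1  # every draft accepted, plus one bonus token
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)

def estimated_speedup(accept_rate, draft_len, draft_cost_ratio):
    """Tokens per round divided by round cost, in big-model passes.

    draft_cost_ratio: assumed cost of one small-model pass relative
    to one big-model pass (for example 0.1).
    """
    round_cost = draft_len * draft_cost_ratio + 1.0
    return expected_tokens_per_round(accept_rate, draft_len) / round_cost

for rate in (0.5, 0.7, 0.8, 0.9):
    print(rate, round(estimated_speedup(rate, draft_len=4, draft_cost_ratio=0.1), 2))
```

Under this model, a low acceptance rate can push the estimate below 1×, since the drafting cost is paid whether or not the drafts are accepted.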

Beam search

Greedy / Beam k=2 / Beam k=4 / Sampling — four decoding strategies side by side.

Knowledge distillation

How does a big model "teach" a small one? Tune temperature to see soft vs hard labels.

Vocabulary map

What's in GPT-4's vocab? 300 tokens clustered by category + frequency — English, Chinese, code, weird tokens.

🧠 Deep learning

Backpropagation

A forward and backward pass through a 2-layer network, in 5 steps showing how the chain rule propagates the loss gradient back to each parameter.

🎨 Multimodal

GAN training

Generator vs discriminator training in real time — watch red (generated) points converge toward blue (real). 4 distributions to try.

🐥 🕒 10 min

🛡️ AI safety

RLHF reward signal

4 prompts × 2 candidate answers × reward model scores. See how human preferences become a training signal.