Overfitting vs Underfitting Explorer

TriWei AI Lab

Stress-test polynomial models, compare train and test behavior, then control complexity with regularization.

Topics: Bias–variance tradeoff · Model capacity
Illustration of decision and model boundary regions.
How to play + what to look for
  • Goal: see how polynomial degree controls underfitting (too simple) vs overfitting (too wiggly).
  • Adjust the Degree slider (1–12). Compare train loss (dashed) vs test loss (solid) across degrees.
  • Increase L2 λ to penalize large weights and reduce overfitting.
  • Keyboard: F=Fit, X=Regenerate, G=Gradient check.

This demo uses gradient descent, not a closed-form solver. Real workflows tune λ via validation.
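To make the "tune λ via validation" point concrete, here is a minimal sketch of picking λ by validation MSE. All data and the candidate λ grid are hypothetical, and for brevity it uses the closed-form ridge solution rather than the demo's gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)   # hypothetical noisy target

# Degree-9 monomial features (plain x**j here, not the demo's 1/j! scaling)
Phi = np.stack([x**j for j in range(10)], axis=1)
tr, va = slice(0, 40), slice(40, 60)          # train / validation split

def val_mse(lam):
    # Closed-form ridge fit on the training split, scored on validation
    w = np.linalg.solve(Phi[tr].T @ Phi[tr] + lam * np.eye(10),
                        Phi[tr].T @ y[tr])
    return np.mean((Phi[va] @ w - y[va]) ** 2)

lams = [0.0, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
best_lam = min(lams, key=val_mse)
print("best lambda:", best_lam)
```

The same selection loop works unchanged if the ridge solve is replaced by a gradient-descent fit.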

Learning objectives

  • Concept focus: explore how model complexity and regularization affect underfitting and overfitting.
  • Core definition: polynomial regression approximates functions with basis functions \(\phi_j(x)=x^j/j!\) and is trained using gradient descent on MSE plus L2 penalty.
  • Common mistake: increasing the degree without enough data leads to wildly oscillating curves that do not generalize.
  • Why it matters: understanding the bias–variance tradeoff and regularization is crucial for building generalizable models.
  • Toy disclaimer: the factorial scaling and small sample sizes are for stability and visualization; real models use feature scaling and cross-validation.
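The factorial-scaled basis from the core definition above can be sketched in a few lines; the function name `poly_features` is my own, not from the demo:

```python
import numpy as np
from math import factorial

def poly_features(x, degree):
    """Factorial-scaled basis: phi_j(x) = x**j / j! (as described above)."""
    return np.stack([x**j / factorial(j) for j in range(degree + 1)], axis=1)

# The scaling tames high-degree columns: at x = 2, x**10 = 1024,
# but x**10 / 10! is roughly 0.00028.
phi = poly_features(np.array([2.0]), 10)
print(phi[0, 10])
```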

Generate a noisy dataset, fit a polynomial, and watch what happens to train vs test loss as you change model complexity. This is a toy setting. Use it to build intuition, not to predict the universe.

Fit

Train MSE:
Test MSE:

Train vs Test loss across degree

This curve often shows the classic “overfitting” pattern: training error drops with more capacity, but test error can rise after a point.
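A sweep like the one behind this curve can be reproduced offline. The sketch below uses NumPy's least-squares `polyfit` as a stand-in for the demo's gradient-descent fit, on a hypothetical noisy sine target:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)  # hypothetical target

x_tr, y_tr = make_data(20)    # small train set, as in the demo
x_te, y_te = make_data(200)   # larger test set

train_mses, test_mses = [], []
for d in range(1, 13):
    c = P.polyfit(x_tr, y_tr, d)  # least-squares fit of degree d
    mse = lambda x, y: np.mean((P.polyval(x, c) - y) ** 2)
    train_mses.append(mse(x_tr, y_tr))
    test_mses.append(mse(x_te, y_te))
    print(f"degree {d:2d}  train {train_mses[-1]:.4f}  test {test_mses[-1]:.4f}")
```

Because each degree's feature set nests the previous one, training MSE can only go down as the degree grows; test MSE is under no such obligation.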

Math + Sources

We model \(\hat y(x)=\sum_{j=0}^{d} w_j \, \phi_j(x)\) with \(\phi_j(x)=x^j/j!\) for stability. Loss: \(L=\frac{1}{n}\sum_i(\hat y_i-y_i)^2 + \lambda\sum_j w_j^2\). Gradient: \(\partial L/\partial w_j = \frac{2}{n}\sum_i(\hat y_i-y_i)\phi_j(x_i) + 2\lambda w_j\).
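The model, loss, and gradient above translate directly into a gradient-descent loop. This is a self-contained sketch, not the demo's actual JavaScript; the learning rate and step count are illustrative defaults:

```python
import numpy as np
from math import factorial

def features(x, degree):
    # phi_j(x) = x**j / j!, matching the basis defined above
    return np.stack([x**j / factorial(j) for j in range(degree + 1)], axis=1)

def fit_gd(x, y, degree, lam=0.0, lr=0.1, steps=5000):
    """Minimize (1/n) sum_i (yhat_i - y_i)^2 + lam * sum_j w_j^2."""
    Phi = features(x, degree)        # shape (n, degree + 1)
    w = np.zeros(degree + 1)
    n = len(x)
    for _ in range(steps):
        resid = Phi @ w - y          # yhat_i - y_i
        # gradient: (2/n) sum_i (yhat_i - y_i) phi_j(x_i) + 2 lam w_j
        w -= lr * ((2.0 / n) * Phi.T @ resid + 2.0 * lam * w)
    return w

# Sanity check: fit y = 1 + 2x with degree 1 (phi_0 = 1, phi_1 = x)
x = np.linspace(-1, 1, 21)
w = fit_gd(x, 1 + 2 * x, degree=1)
```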

Under/overfitting intuition and polynomial example: Dive into Deep Learning – underfit/overfit. D2L also notes scaling monomials by \(1/j!\) to avoid huge values for large degrees.

Collaboration Credits

These interactive labs are the result of a close collaboration between a human author and an AI assistant (ChatGPT). The AI contributed algorithmic refinements, numerical safeguards and visual improvements, while the human designed the pedagogical structure, reviewed all code, and ensured educational accuracy. Mathematical formulas and derivations are referenced to reputable course notes and textbooks. All code runs entirely in the browser; no data is sent to any server.