Every model sits between two failure modes: underfitting (too simple, misses real patterns, high bias) and overfitting (too complex, memorizes noise, high variance). The bias-variance tradeoff describes how the two move in opposite directions as model complexity grows: bias typically falls while variance rises. On a curved signal, a degree-1 polynomial underfits, degree-15 overfits, and degree-3 is about right. The same framework underlies regularization, dropout, early stopping, and the cross-validation loops you'll write.
Polynomial degree is one of the cleanest knobs for trading bias against variance. A degree-1 fit is too rigid to capture a sine curve (high bias). A degree-15 fit bends to chase the noise in the training points and generalizes badly (high variance). Degree-3 sits in the productive middle. Plotting train and test MSE together as degree increases makes the familiar U-shaped test-error curve concrete. Training error keeps falling, while test error falls and then rises again. The bottom of the U marks where the model moves from underfitting to overfitting.
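One way to trace that U-shaped curve without committing to a single split is scikit-learn's validation_curve. It refits the pipeline at every degree from 1 to 15 and reports cross-validated train and validation MSE. This is a sketch built on choices not in the lesson: 5-fold cross-validation over all 50 points, with folds shuffled by KFold. The data is sorted by x, so unshuffled folds would turn validation into extrapolation.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import KFold, validation_curve

# Same synthetic data as the demo: 50 noisy samples of sin(2*pi*x) on [0, 1]
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 50)).reshape(-1, 1)
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, 50)

degrees = list(range(1, 16))
model = make_pipeline(PolynomialFeatures(), LinearRegression())

# X is sorted, so shuffle the folds; otherwise each validation fold
# is a contiguous slice of x and the check becomes extrapolation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# make_pipeline names steps after their lowercased class names,
# so the degree is addressed as "polynomialfeatures__degree".
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="polynomialfeatures__degree",
    param_range=degrees,
    cv=cv,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSEs (higher is better): flip the sign, average over folds.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"deg={d:2d} train={tr:.3f} val={va:.3f}")

best = degrees[int(np.argmin(val_mse))]
print("lowest cross-validated MSE at degree", best)
```

Plotting train_mse and val_mse against degrees (for example with matplotlib) gives the picture described above. The printed table already shows where validation error bottoms out.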
test_size=0.7). Does overfitting get worse? This shows that overfitting depends on data size, not just model complexity.
Replace LinearRegression() with Ridge(alpha=1.0) at degree=9 and compare the train/test MSE gap with and without regularization. Increase alpha to 10 and 100 to see underfitting emerge.

Use these three prompts in order. Each builds on the one before.
In one paragraph, explain overfitting in plain language: what does 'memorizing training data' mean, and why does 99% train accuracy with 70% test accuracy indicate a problem?
Walk me through the bias-variance decomposition: Expected MSE = Bias² + Variance + Irreducible noise. Define each term for the polynomial example. Why does increasing complexity reduce bias but increase variance?
My validation loss plateaus at epoch 12 while training loss keeps falling to epoch 50. Name three fixes (not 'get more data'), and for each: the mechanism, the hyperparameter to tune, and the sign that you've overcorrected into underfitting.
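To check the decomposition from the second prompt numerically, one option is a Monte Carlo simulation:

1. Redraw many training sets from the same noisy sine.
2. Refit each degree on every training set.
3. Measure how far the average prediction sits from the true curve. That gap is bias².
4. Measure how much individual fits scatter around that average. That spread is the variance.

This is a sketch under assumptions not taken from the lesson:

- 30 training points per draw, roughly the demo's 60% share of 50.
- 200 repeats.
- An evaluation grid on [0.05, 0.95] that avoids the edges, where high-degree fits swing most.

The noise term is known exactly here because the data is synthetic: σ = 0.3, so σ² = 0.09.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
NOISE_SD = 0.3    # same noise level as the demo
N_TRAIN = 30      # assumption: about the demo's training-set size
N_REPEATS = 200   # assumption: number of independent training sets

def true_f(x):
    """The noise-free signal the models are trying to recover."""
    return np.sin(2 * np.pi * x)

# Fixed evaluation points, kept away from the edges of [0, 1]
x_grid = np.linspace(0.05, 0.95, 100).reshape(-1, 1)
f_grid = true_f(x_grid.squeeze())

results = {}
for degree in [1, 3, 9]:
    preds = np.empty((N_REPEATS, len(x_grid)))
    for r in range(N_REPEATS):
        # Draw a fresh training set from the same data-generating process
        x = rng.uniform(0, 1, N_TRAIN).reshape(-1, 1)
        y = true_f(x.squeeze()) + rng.normal(0, NOISE_SD, N_TRAIN)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds[r] = model.fit(x, y).predict(x_grid)

    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_grid) ** 2)  # squared bias, averaged over the grid
    variance = np.mean(preds.var(axis=0))         # spread across training sets, averaged over the grid
    results[degree] = (bias_sq, variance)
    print(f"deg={degree:2d} bias^2={bias_sq:.4f} variance={variance:.4f} "
          f"noise={NOISE_SD**2:.4f} expected MSE~{bias_sq + variance + NOISE_SD**2:.4f}")
```

Degree 15 is left out on purpose. With only 30 points per draw, its variance can be dominated by a few wild fits, which makes the averages unstable.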
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
rng = np.random.default_rng(0)
# 50 noisy samples of sin(2*pi*x) on [0, 1]; noise standard deviation 0.3
X = np.sort(rng.uniform(0, 1, 50)).reshape(-1, 1)
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, 50)
# Hold out 40% of the points as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# Fit one polynomial model per degree and compare train vs test error
for degree in [1, 3, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    tr = mean_squared_error(y_train, model.predict(X_train))
    te = mean_squared_error(y_test, model.predict(X_test))
    # Labels are fixed by degree for this demo, not computed from the errors
    tag = "underfit" if degree == 1 else "good" if degree == 3 else "overfit"
    print(f"deg={degree:2d} train={tr:.3f} test={te:.3f} gap={te-tr:.3f} ({tag})")

Run with: python3 main.py
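The Ridge item from the try-it list can be sketched by reusing the demo's data and split. The sketch swaps LinearRegression() for scikit-learn's Ridge at degree 9 and sweeps alpha over 1, 10, and 100. The helper name train_test_mse is hypothetical, introduced only for this example. Features are left unscaled, as in the demo, so alpha's effect depends on the raw magnitudes of the polynomial terms x through x^9.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same data and split as the demo
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 50)).reshape(-1, 1)
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

def train_test_mse(regressor, degree=9):
    """Hypothetical helper: fit polynomial features + regressor, return (train MSE, test MSE)."""
    model = make_pipeline(PolynomialFeatures(degree), regressor)
    model.fit(X_train, y_train)
    return (mean_squared_error(y_train, model.predict(X_train)),
            mean_squared_error(y_test, model.predict(X_test)))

# Unregularized baseline first, then increasingly strong L2 penalties
candidates = [("OLS", LinearRegression())] + [
    (f"Ridge(alpha={a})", Ridge(alpha=a)) for a in [1.0, 10.0, 100.0]
]

results = {}
for name, reg in candidates:
    tr, te = train_test_mse(reg)
    results[name] = (tr, te)
    print(f"{name:18s} train={tr:.3f} test={te:.3f} gap={te - tr:.3f}")
```

Training MSE can only rise as alpha grows, because the penalty pulls the fit away from the least-squares optimum. The question to watch is whether test MSE improves first, as variance drops, before train and test error climb together, which signals underfitting.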