Interactions and nonlinear terms

Part 4 — Linear regression, done seriously

Learning objectives

Distinguish a main effect from an interaction effect
Add and interpret interaction terms x_1·x_2 in a linear-in-parameters model
Add polynomial and basis-function terms for nonlinear responses
Apply splines for smooth flexible relationships
Avoid common interaction-modelling pitfalls: not centring covariates, ignoring main effects, misinterpreting coefficients

The linear-regression machinery of §§4.1–4.5 was nominally linear, but "linear" means "linear in the PARAMETERS", not in the original covariates. By adding INTERACTION TERMS (products of covariates), POLYNOMIAL TERMS (powers), and BASIS FUNCTIONS (splines, indicators), we can fit curved and multiplicative relationships while keeping the OLS and inference machinery intact. §4.6 develops these extensions and the standard interpretive pitfalls.

Interaction effects

An interaction is present when the effect of one covariate DEPENDS on the value of another. The classic example: drug efficacy may depend on age. Model:

$$Y = \beta_0 + \beta_1 \text{drug} + \beta_2 \text{age} + \beta_3 (\text{drug} \times \text{age}) + \varepsilon.$$

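The MAIN EFFECTS $\beta_1, \beta_2$ tell you the effect of each variable when the OTHER is zero. The INTERACTION $\beta_3$ tells you HOW MUCH the drug effect changes per unit of age: the drug effect at a given age is $\beta_1 + \beta_3 \cdot \text{age}$. By symmetry, $\beta_3$ is also how much the age slope changes when the drug is given.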
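A minimal sketch of fitting the interaction model by OLS, assuming NumPy and a simulated trial. The data, the sample size, and the "true" coefficients are hypothetical, chosen only to illustrate that the product drug × age enters the design matrix as one more column.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical simulated trial: drug is a 0/1 indicator, age is in years.
drug = rng.integers(0, 2, size=n).astype(float)
age = rng.uniform(20, 80, size=n)

# Assumed "true" coefficients (beta_0, beta_1, beta_2, beta_3), for illustration only.
beta_true = np.array([10.0, 2.0, 0.1, 0.05])

# Design matrix: intercept, drug, age, and the product column drug * age.
X = np.column_stack([np.ones(n), drug, age, drug * age])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares; the model is still linear in the parameters.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)


def drug_effect(beta, at_age):
    """Estimated drug effect at a given age: beta_1 + beta_3 * age."""
    return beta[1] + beta[3] * at_age


print("estimated coefficients:", np.round(beta_hat, 3))
print("drug effect at age 30:", round(drug_effect(beta_hat, 30.0), 2))
print("drug effect at age 70:", round(drug_effect(beta_hat, 70.0), 2))
```

The difference between the two printed drug effects is 40 times the estimated interaction coefficient, which is the "change per unit of age" reading of $\beta_3$.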

Centring covariates makes main effects interpretable

Without centring, $\beta_1$ is "the drug effect AT AGE = 0" — which is often meaningless. CENTRE the covariates (subtract the mean) before adding interactions, and $\beta_1$ becomes "the drug effect AT THE MEAN AGE" — interpretable.
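The effect of centring can be checked numerically. Substituting $\text{age} = \text{age}_c + \overline{\text{age}}$ into the model shows that the centred drug coefficient equals $\beta_1 + \beta_3 \overline{\text{age}}$, while $\beta_3$ itself is unchanged. The sketch below assumes NumPy and hypothetical simulated data, and centres only age because drug is a 0/1 indicator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical data; the generating coefficients are assumptions for illustration.
drug = rng.integers(0, 2, size=n).astype(float)
age = rng.uniform(20, 80, size=n)
y = 10 + 2 * drug + 0.1 * age + 0.05 * drug * age + rng.normal(0.0, 1.0, size=n)


def fit(drug, age, y):
    """OLS fit of y on intercept, drug, age, and drug * age."""
    X = np.column_stack([np.ones_like(age), drug, age, drug * age])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b_raw = fit(drug, age, y)               # age on its original scale
b_cen = fit(drug, age - age.mean(), y)  # age centred at its sample mean

print("interaction, raw vs centred:", b_raw[3], b_cen[3])
print("drug coefficient, raw (effect at age 0):", b_raw[1])
print("drug coefficient, centred (effect at mean age):", b_cen[1])
print("raw beta_1 + beta_3 * mean(age):", b_raw[1] + b_raw[3] * age.mean())
```

The two fits describe the same fitted surface; only the point at which the drug effect is reported has moved.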

Polynomial terms

If Y has a nonlinear relationship with x, add $x^2, x^3$ etc.:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon.$$

Still linear-in-parameters — OLS works unchanged. Caveats:

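Polynomial extrapolation EXPLODES outside the data range, because the highest power dominates as |x| grows. Restrict predictions to the data domain.
High-degree polynomials are numerically unstable: the raw powers $x, x^2, x^3, \ldots$ are strongly correlated, so the design matrix becomes ill-conditioned. Use orthogonal polynomials (e.g., Legendre) or SPLINES for stability.
The quadratic term $\beta_2 x^2$ is the first to introduce curvature; degree 3 and above is rarely needed.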
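The first two caveats can be illustrated with a small sketch, assuming NumPy and a noise-free toy response $\sin(x)$ on $[0, 3]$. The degrees, the grid, and the evaluation point $x = 6$ are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial, legendre

# Toy data: a smooth response observed on [0, 3], with no noise.
x = np.linspace(0.0, 3.0, 40)
y = np.sin(x)

# Degree-5 least-squares polynomial (Polynomial.fit rescales x internally).
p = Polynomial.fit(x, y, deg=5)

# Error inside the data range versus error at a point well outside it.
err_inside = np.max(np.abs(p(x) - np.sin(x)))
x_out = 6.0
err_outside = abs(p(x_out) - np.sin(x_out))
print("max error inside [0, 3]:", err_inside)
print("error at x = 6:", err_outside)

# Conditioning: raw powers versus a Legendre basis on x rescaled to [-1, 1].
deg = 10
raw = np.vander(x, deg + 1, increasing=True)           # columns 1, x, ..., x^10
t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0    # map data range to [-1, 1]
leg = legendre.legvander(t, deg)                       # columns P_0(t), ..., P_10(t)
cond_raw = np.linalg.cond(raw)
cond_leg = np.linalg.cond(leg)
print("condition number, raw powers:", cond_raw)
print("condition number, Legendre basis:", cond_leg)
```

Both bases span the same set of degree-10 polynomials, so the fitted curve is the same in exact arithmetic; the difference lies in how reliably the coefficients can be computed.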

Splines: piecewise polynomials

Better than high-degree polynomials: cubic SPLINES. Divide the x range into intervals at KNOTS $k_1 < k_2 < \ldots$; fit a cubic polynomial within each interval; constrain the pieces to be CONTINUOUS and SMOOTH at the knots (matching value, first derivative, and second derivative).

In practice, library routines compute the B-spline basis functions from x and the knots; you then regress Y on these basis columns like ordinary regressors. R: splines::bs(x, df = 5); Python: patsy.dmatrix("bs(x, df=5)", data).
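To make "regress Y on basis functions" concrete without depending on a spline library, here is a sketch that builds a cubic spline basis by hand. It uses the truncated-power basis rather than B-splines: with the same knots, the two span the same set of cubic splines, but the truncated-power form is easier to read and less well-conditioned. The helper name cubic_spline_basis, the knot placement at quartiles, and the simulated data are assumptions for illustration.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power basis: 1, x, x^2, x^3, and (x - k)^3_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    # Each knot adds one column that is zero to the left of the knot and cubic
    # to the right, so the fit stays continuous up to the second derivative.
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=200))
y = np.sin(x) + rng.normal(0.0, 0.1, size=200)   # hypothetical noisy response

# Three interior knots at the quartiles of x (an arbitrary but common choice).
knots = np.quantile(x, [0.25, 0.5, 0.75])

# The spline fit is ordinary least squares on the basis columns.
B = cubic_spline_basis(x, knots)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

# Evaluate the fitted curve on a grid inside the data range only.
grid = np.linspace(x.min(), x.max(), 400)
fitted = cubic_spline_basis(grid, knots) @ coef
rmse = np.sqrt(np.mean((fitted - np.sin(grid)) ** 2))
print("basis shape:", B.shape)
print("RMSE against the noise-free curve:", rmse)
```

For real analyses the library B-spline routines named above are the better choice, since they handle knot placement and conditioning for you.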

Pitfalls to avoid

Interaction without main effect: never fit $x_1 x_2$ without including $x_1$ and $x_2$ separately. Without them, the product coefficient absorbs the omitted main effects, and the fit changes whenever a covariate's origin is shifted.
Forgetting to centre: the main effects are evaluated at covariate = 0, which is often outside the data. Centre.
Misinterpreting "no main effect, only interaction": this is rarely meaningful. It usually means that the main-effect coefficient is not significantly different from zero at the reference value (zero) of the other covariate, not that the variable has no effect.
Extrapolation: polynomial fits diverge, and spline fits become unreliable, outside the data range. Restrict predictions.
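The first pitfall can be demonstrated by shifting the origin of one covariate. In the sketch below (NumPy, hypothetical simulated data, helper name rss is an assumption), the full model gives the same residual sum of squares before and after centring $x_2$, whereas the interaction-only model does not, so its fit depends on an arbitrary choice of origin.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical data generated with both main effects and an interaction.
x1 = rng.normal(0.0, 1.0, size=n)
x2 = rng.normal(5.0, 1.0, size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 1.0, size=n)


def rss(columns, y):
    """Residual sum of squares of an OLS fit on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


x2_shifted = x2 - x2.mean()  # same covariate, different origin

# Full model: main effects plus interaction.
full_raw = rss([x1, x2, x1 * x2], y)
full_shifted = rss([x1, x2_shifted, x1 * x2_shifted], y)

# Interaction-only model: product term without its main effects.
only_raw = rss([x1 * x2], y)
only_shifted = rss([x1 * x2_shifted], y)

print("full model RSS, raw vs shifted:", full_raw, full_shifted)
print("interaction-only RSS, raw vs shifted:", only_raw, only_shifted)
```

The invariance of the full model follows because $x_1 (x_2 - c) = x_1 x_2 - c\,x_1$ stays inside the span of the columns $1, x_1, x_2, x_1 x_2$; once $x_1$ is dropped, that is no longer true.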

Try it

Set the interaction term to its maximum positive value. Watch the surface twist: the slope of Y in x_1 NOW DEPENDS on x_2. In the drug example, efficacy that increases with age is a positive x_1·x_2 interaction.
Now set interaction to ZERO. The surface is a tilted PLANE — no curvature, no twist. This is the "no interaction" main-effects-only case. Compare residual sum of squares with vs without interaction at the bottom of the widget.
Toggle "centre covariates" on and off. Note: the β̂_int (interaction coefficient) is THE SAME either way. But β̂_1 and β̂_2 (main effects) change dramatically. With centring, β̂_1 is "the effect of x_1 at the mean of x_2" — interpretable. Without, it's "the effect of x_1 at x_2 = 0" — usually meaningless.
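Set degree = 3 on the polynomial slider with a small dataset. Watch the fit start oscillating wildly at the extremes: polynomial fits are unstable near and beyond the edges of the data. Now switch to splines with similar df: the fit is much smoother because each cubic piece is fitted locally, so a wiggle in one region does not propagate across the whole range. Spline fits still should not be trusted beyond the boundary knots.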
Try removing the main effect for x_1 while keeping the interaction x_1·x_2 in the model. The fit becomes UNINTERPRETABLE — you cannot describe what the interaction means without anchors. ALWAYS keep main effects when including interactions ("marginality principle").

Your colleague reports: "I included age × treatment interaction in my model. The main effect of treatment has p = 0.6, but the interaction has p = 0.01. The interaction is real, so I'll drop the non-significant main effect to simplify the model." What is WRONG with their reasoning, and what would you advise?

What you now know

Linear regression accommodates interactions, polynomials, and splines while staying linear-in-parameters, so all the OLS machinery (§§4.1–4.5) applies. Centring covariates makes main effects interpretable. Splines beat high-degree polynomials for flexibility and stability. §4.7 turns to MODEL SELECTION: how to choose among the many possible specifications (which interactions to include, which polynomial degree, how many knots).

References

Aiken, L.S., West, S.G. (1991). Multiple Regression: Testing and Interpreting Interactions. Sage. (The canonical reference for interaction-effect interpretation.)
Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer. (Chapter 5 covers splines and basis-function methods.)
Wood, S.N. (2017). Generalized Additive Models: An Introduction with R, 2nd ed. Chapman & Hall. (Modern treatment of splines and GAMs.)
Gelman, A., Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. (Practical guidance on interactions in applied research.)