Nonlinear Regression: When a Straight Line Isn't Enough
OLS regression in its standard form draws a straight line through your data. But what if the true relationship curves? A drug that works better and better up to a certain dose, then becomes dangerous? Wages that rise with experience — but at a decreasing rate as you age? These aren't quirks. They're the norm in economics, medicine, and most real-world data.
The good news: you don't need a completely new method. You can extend ordinary OLS by adding squared terms or cross-products (interactions) to your model. The math stays the same — only the interpretation changes.
Part 1: Quadratic Terms — Modelling Curves
The Big Picture: Why Go Quadratic?
In a standard OLS model, the slope of $Y$ with respect to $X$ is always $\beta_1$, a fixed constant. Every extra unit of $X$ has the exact same effect, no matter where you start. But if the true relationship curves, a single constant slope overstates the effect in some regions of $X$ and understates it in others.
Adding a quadratic term $X^2$ lets the slope vary with $X$: a larger $X$ shifts the slope up or down, allowing you to model diminishing returns, saturation effects, or U-shaped relationships. The quadratic model is:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$$
$X_i$ = the predictor variable
$X_i^2$ = $X_i$ squared — you create this new column in your data and include it as a separate predictor
$\beta_0$ = intercept (value of $E(Y)$ when $X = 0$)
$\beta_1$ = linear part of the slope (effect when $X$ is near 0)
$\beta_2$ = curvature coefficient — determines whether the parabola opens upward or downward
$u_i$ = error term
Important: although the model contains $X^2$, it's still estimated by OLS without any modification. You simply treat $X$ and $X^2$ as two separate predictor columns. The linearity assumption in OLS refers to linearity in the parameters ($\beta$s), not in the variables.
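To make the "two separate predictor columns" idea concrete, here is a minimal sketch using NumPy. The data are hypothetical and noise-free, generated from $Y = 20 + 3X - 0.2X^2$ purely for illustration, so the fit should recover those coefficients almost exactly.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 20 + 3X - 0.2X^2.
x = np.linspace(0.0, 10.0, 50)
y = 20.0 + 3.0 * x - 0.2 * x**2

# Design matrix: a constant, X, and X^2 as three ordinary columns.
design = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares via NumPy's least-squares solver.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = beta_hat
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Nothing in the estimation step knows that the third column is the square of the second; the model is nonlinear in $X$ but linear in the coefficients.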
How $\beta_2$ Shapes the Curve
The sign of $\beta_2$ tells you the shape of the relationship:
If $\beta_2 < 0$: the parabola opens downward. With $\beta_1 > 0$, the effect of $X$ on $Y$ is initially positive, diminishes, and eventually turns negative. Think of a worker's productivity peaking in mid-career, or a drug dose that helps at low levels but causes harm at high levels.
If $\beta_2 > 0$: the parabola opens upward. With $\beta_1 < 0$, the effect is initially negative and eventually turns positive; with $\beta_1 \geq 0$, the effect is positive for all $X \geq 0$ and keeps growing. Think of environmental costs that are negligible at low pollution levels but accelerate sharply at high ones.
The Marginal Effect: Slope Is No Longer Constant
In a standard linear model, the slope is always $\beta_1$. In a quadratic model, the slope depends on the current value of $X$. We calculate it by taking the derivative of $E(Y|X)$ with respect to $X$:
$$\frac{\partial E(Y|X)}{\partial X} = \beta_1 + 2\beta_2 X$$
$\beta_1 + 2\beta_2 X$ = the slope at the point $X$ — this changes as $X$ changes
$\partial$ (partial derivative symbol) = indicates we're changing only $X$ while holding everything else fixed
This is the key difference from a standard OLS model: the effect of one extra unit of $X$ depends on where you currently are. At low values of $X$, the marginal effect is $\approx \beta_1$. As $X$ grows, the term $2\beta_2 X$ kicks in and changes the slope — accelerating it if $\beta_2 > 0$, decelerating it if $\beta_2 < 0$.
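A short sketch of how the slope moves with $X$. The coefficients below are hypothetical, chosen only to show a slope that starts positive and turns negative.

```python
def marginal_effect(beta1: float, beta2: float, x: float) -> float:
    """Slope of E(Y|X) = b0 + b1*X + b2*X^2 evaluated at the point x."""
    return beta1 + 2.0 * beta2 * x


# Hypothetical coefficients with diminishing returns (beta2 < 0).
beta1, beta2 = 3.0, -0.2
slopes = {x: marginal_effect(beta1, beta2, x) for x in (0.0, 5.0, 10.0)}
for x, slope in slopes.items():
    print(f"X = {x:4.1f}: slope = {slope:+.2f}")
```

At $X = 0$ the slope equals $\beta_1$; by $X = 10$ the term $2\beta_2 X$ has pulled it below zero.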
Testing Whether the Quadratic Term Is Needed
Should you include $X^2$ at all? If the true relationship is actually linear, then $\beta_2 = 0$ and the model reduces to the standard linear case. So the test is simple: run the OLS regression with $X^2$ included, and look at the t-test of $H_0: \beta_2 = 0$.
If $\hat{\beta}_2$ is not significantly different from zero (p-value > 0.05 at the conventional 5% level), you don't have evidence of curvature, so stick with the linear model. If the t-test rejects $H_0: \beta_2 = 0$, the curvature is statistically significant and you should keep the quadratic term.
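The test can be sketched by hand with NumPy. Several things here are assumptions for illustration: the data are simulated, the standard errors are the classical homoskedastic ones, and the cutoff 1.96 is the large-sample approximation to the two-sided 5% critical value rather than an exact t quantile.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 200
x = rng.uniform(0.0, 10.0, size=n)
# Simulated data with genuine curvature (true beta2 = -0.2) plus noise.
y = 20.0 + 3.0 * x - 0.2 * x**2 + rng.normal(0.0, 1.0, size=n)

design = np.column_stack([np.ones(n), x, x**2])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Classical OLS covariance: sigma^2 * (X'X)^{-1}, with sigma^2 = RSS / (n - k).
residuals = y - design @ beta_hat
k = design.shape[1]
sigma2 = residuals @ residuals / (n - k)
cov = sigma2 * np.linalg.inv(design.T @ design)
se = np.sqrt(np.diag(cov))

# t-statistic for H0: beta2 = 0.
t_beta2 = beta_hat[2] / se[2]
reject_linear = abs(t_beta2) > 1.96  # approximate 5% two-sided cutoff
print(f"beta2_hat = {beta_hat[2]:.3f}, t = {t_beta2:.2f}, reject H0: {reject_linear}")
```

In practice a regression package reports this t-statistic and its p-value directly; the manual version only shows where the number comes from.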
The Vertex: Where the Slope Crosses Zero
The vertex (or "turning point") of the parabola is the value of $X$ where the slope equals zero — where the function switches from increasing to decreasing (or vice versa). It's the peak (if $\beta_2 < 0$) or the trough (if $\beta_2 > 0$).
Setting $\frac{\partial E(Y|X)}{\partial X} = \beta_1 + 2\beta_2 X = 0$ and solving for $X$ gives the vertex:
$$X^* = -\frac{\beta_1}{2\beta_2}$$
If $X^*$ lies outside the range of your data, the parabola is monotone in your data range — it only goes up or only goes down, and you won't observe the turning point
The vertex is practically important: it tells you, for example, at what years of experience wages peak, or at what level of advertising spending additional spending stops raising the outcome. Always check whether $X^*$ falls within your observed data range; if not, your data only shows one side of the parabola and the turning point is an extrapolation.
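A minimal sketch of the vertex calculation together with the range check. The function names and the coefficient values are hypothetical, introduced only for this example.

```python
def vertex(beta1: float, beta2: float) -> float:
    """Turning point X* = -beta1 / (2 * beta2) of a quadratic fit."""
    if beta2 == 0:
        raise ValueError("beta2 = 0: the model is linear and has no vertex")
    return -beta1 / (2.0 * beta2)


def vertex_in_range(beta1: float, beta2: float, x_min: float, x_max: float) -> bool:
    """True if the turning point lies inside the observed range of X."""
    return x_min <= vertex(beta1, beta2) <= x_max


# Hypothetical coefficients: the slope 3 - 0.5*X reaches zero at X = 6.
x_star = vertex(3.0, -0.25)
print(x_star, vertex_in_range(3.0, -0.25, 0.0, 10.0), vertex_in_range(3.0, -0.25, 0.0, 5.0))
```

With data observed only on $[0, 5]$, the same coefficients would describe a curve that rises throughout the sample, and the peak at 6 would never be observed.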
Interactive figure: Quadratic Regression Explorer
The red curve shows $E(Y|X) = 20 + \beta_1 X + \beta_2 X^2$. The dashed vertical line marks the vertex $X^* = -\beta_1 / (2\beta_2)$. The tangent line at the marker shows the marginal effect at that point.
Part 2: Interaction Terms — When Effects Depend on Context
The Big Picture: Why Interactions?
A standard regression assumes the effect of $X_1$ on $Y$ is the same regardless of the value of $X_2$. But is that realistic? Does education always have the same return on wages, whether you live in a rural area or a major city? Does a drug always have the same effect, regardless of age?
Interaction terms let you answer: does the effect of $X_1$ on $Y$ depend on the level of $X_2$? If it does, a pure additive model misses the story entirely.
The Model with Interaction
An interaction term is simply the product of two variables, $X_1 \cdot X_2$, added to the model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \cdot X_{2i}) + u_i$$
$\beta_1$ = the effect of $X_1$ on $Y$ when $X_2 = 0$
$\beta_2$ = the effect of $X_2$ on $Y$ when $X_1 = 0$
$\beta_3$ = the interaction coefficient — by how much does the effect of $X_1$ change for each one-unit increase in $X_2$?
(equivalently: by how much does the effect of $X_2$ change for each one-unit increase in $X_1$)
The Marginal Effect with Interactions
Taking the derivative with respect to $X_1$:
$$\frac{\partial E(Y|X_1, X_2)}{\partial X_1} = \beta_1 + \beta_3 X_2$$
$\beta_1$ = base effect of $X_1$ (when $X_2 = 0$)
$\beta_3 X_2$ = how much the effect of $X_1$ changes as $X_2$ increases
If $\beta_3 > 0$: higher $X_2$ amplifies the effect of $X_1$
If $\beta_3 < 0$: higher $X_2$ dampens the effect of $X_1$
If $\beta_3 = 0$: no interaction; the effect of $X_1$ is the same at every level of $X_2$, and the model is purely additive
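The same mechanics in a small sketch: build the product column, fit by OLS, and evaluate the effect of $X_1$ at different levels of $X_2$. The data are hypothetical and noise-free, generated from $Y = 1 + 2X_1 + 0.5X_2 + 0.3X_1X_2$, and the helper `effect_of_x1` is a name made up for this example.

```python
import numpy as np

# Hypothetical noise-free data from Y = 1 + 2*X1 + 0.5*X2 + 0.3*X1*X2.
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 10.0, size=100)
x2 = rng.uniform(0.0, 5.0, size=100)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2

# The interaction is just one more column: the product X1 * X2.
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2, b3 = beta_hat


def effect_of_x1(x2_value: float) -> float:
    """Marginal effect of X1 on E(Y), evaluated at a given X2."""
    return b1 + b3 * x2_value


for level in (0.0, 2.5, 5.0):
    print(f"X2 = {level:3.1f}: effect of X1 = {effect_of_x1(level):.2f}")
```

Because $\beta_3 > 0$ in this made-up example, the effect of $X_1$ grows from 2.0 at $X_2 = 0$ to 3.5 at $X_2 = 5$.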
Dummy Variables with Interaction Terms
A particularly common and powerful application is interacting a continuous variable with a dummy variable. Recall from Dummy Variables that a dummy $D_i \in \{0, 1\}$ shifts the intercept of the regression line up or down for different groups. But what if the slope also differs between groups?
Interacting $X$ with $D$ allows each group to have a completely different regression line, with its own intercept and its own slope:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \cdot D_i) + u_i$$
$X_i \cdot D_i$ = interaction term: zero for the base group ($D=0$), equals $X_i$ for the other group ($D=1$)
$\beta_2$ = intercept shift for $D=1$ group (vertical gap between the two lines when $X = 0$)
$\beta_3$ = slope shift for $D=1$ group (how much steeper or flatter the line is for group $D=1$ compared to $D=0$)
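Substituting $D = 0$ and $D = 1$ separately makes the model concrete:
$$D_i = 0: \quad E(Y_i | X_i) = \beta_0 + \beta_1 X_i$$
$$D_i = 1: \quad E(Y_i | X_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_i$$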
For the base group, the model is a simple line with intercept $\beta_0$ and slope $\beta_1$. For the $D=1$ group, the intercept is shifted by $\beta_2$ and the slope is shifted by $\beta_3$. Whenever $\beta_3 \neq 0$ the two lines have different slopes, so they cross at $X = -\beta_2 / \beta_3$; whether you see the crossing depends on whether that point falls inside the observed range of $X$.
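A small sketch of the two group lines and their crossing point. The coefficient values and function names are hypothetical, picked so that the $D=1$ group starts higher but has a flatter slope.

```python
def group_line(beta0, beta1, beta2, beta3, d):
    """Return (intercept, slope) of the regression line for group d in {0, 1}."""
    return beta0 + beta2 * d, beta1 + beta3 * d


def crossing_point(beta2, beta3):
    """X at which the two group lines intersect, or None if they are parallel."""
    if beta3 == 0:
        return None
    return -beta2 / beta3


# Hypothetical coefficients: D=1 has a higher intercept but a flatter slope.
beta0, beta1, beta2, beta3 = 10.0, 2.0, 4.0, -0.5
base = group_line(beta0, beta1, beta2, beta3, d=0)   # line for D = 0
other = group_line(beta0, beta1, beta2, beta3, d=1)  # line for D = 1
x_cross = crossing_point(beta2, beta3)
print(base, other, x_cross)
```

Here the $D=1$ group is ahead for small $X$, and the base group overtakes it beyond the crossing point.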
Interactive — Dummy Variable Interaction
Red = base group (D=0), slope = $\beta_1$. Blue = D=1 group, slope = $\beta_1 + \beta_3$. The vertical gap at $X=0$ is $\beta_2$. If the lines cross, the two groups' advantage reverses at that point.
Worked Example: Salary, Experience, and Education
Worked Example — Labour Economics
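A labour economist models annual salary (€) as a function of years of experience ($\text{Exp}$) and whether the worker holds a college degree ($\text{Edu} = 1$) or not ($\text{Edu} = 0$). The estimated model is:
$$\widehat{\text{Salary}} = 18{,}000 + 1{,}800 \cdot \text{Exp} - 45 \cdot \text{Exp}^2 + 600 \cdot (\text{Exp} \cdot \text{Edu})$$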
Step 1: What Does Each Coefficient Mean?
$\hat{\beta}_1 = 1{,}800$: at the start of a career ($\text{Exp} \approx 0$), each extra year of experience adds €1,800 to salary for non-graduates. But this effect shrinks as experience grows (because $\hat{\beta}_2 = -45 < 0$).
$\hat{\beta}_2 = -45$: the parabola opens downward — wages rise with experience but at a diminishing rate, eventually peaking and (in theory) declining.
$\hat{\beta}_3 = 600$: college graduates benefit more from each year of experience. Their salary slope is $1{,}800 + 600 = 2{,}400$ at the start of their career, compared to $1{,}800$ for non-graduates.
Step 2: Find the Vertex (Peak Salary) for Each Group
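For non-graduates ($\text{Edu} = 0$), the slope is $1{,}800 + 2 \cdot (-45) \cdot \text{Exp}$. Setting this to zero:
$$1{,}800 - 90 \cdot \text{Exp}^* = 0 \quad \Rightarrow \quad \text{Exp}^* = \frac{1{,}800}{90} = 20 \text{ years}$$
For college graduates ($\text{Edu} = 1$), the effective linear slope is $\beta_1 + \beta_3 = 1{,}800 + 600 = 2{,}400$:
$$2{,}400 - 90 \cdot \text{Exp}^* = 0 \quad \Rightarrow \quad \text{Exp}^* = \frac{2{,}400}{90} \approx 26.7 \text{ years}$$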
College graduates reach their salary peak later and at a higher level — because their slope is steeper, it takes longer to be overcome by the $-45 \cdot \text{Exp}^2$ term.
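The two peaks can be checked with a few lines of Python. The slope coefficients come from the example; the intercept of 18,000 and the absence of a separate $\text{Edu}$ main effect are assumptions, chosen because they are consistent with the peak salaries of €36,000 and €50,000 quoted in this example.

```python
# Coefficients reported in the worked example.
B_EXP = 1_800.0      # coefficient on Exp
B_EXP2 = -45.0       # coefficient on Exp^2
B_INTERACT = 600.0   # coefficient on Exp * Edu
# Assumption: intercept of 18,000 and no separate Edu main effect,
# consistent with the peak salaries quoted in the example.
B0_ASSUMED = 18_000.0


def linear_slope(edu: int) -> float:
    """Slope on Exp at Exp = 0 for the given group (edu is 0 or 1)."""
    return B_EXP + B_INTERACT * edu


def peak_experience(edu: int) -> float:
    """Vertex of the salary parabola: -slope / (2 * curvature)."""
    return -linear_slope(edu) / (2.0 * B_EXP2)


def predicted_salary(exp: float, edu: int) -> float:
    return B0_ASSUMED + linear_slope(edu) * exp + B_EXP2 * exp**2


for edu, label in ((0, "non-graduate"), (1, "graduate")):
    exp_star = peak_experience(edu)
    print(f"{label}: peak at {exp_star:.1f} years, salary {predicted_salary(exp_star, edu):,.0f}")
```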
Step 3: Predict Salary and Marginal Effects at Exp = 10
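For a non-graduate with 10 years of experience:
$$\widehat{\text{Salary}} = 18{,}000 + 1{,}800 \cdot 10 - 45 \cdot 10^2 = 31{,}500$$
For a college graduate with 10 years of experience:
$$\widehat{\text{Salary}} = 18{,}000 + (1{,}800 + 600) \cdot 10 - 45 \cdot 10^2 = 37{,}500$$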
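The marginal effect of one more year of experience at $\text{Exp} = 10$:
$$\text{Non-graduate: } 1{,}800 + 2 \cdot (-45) \cdot 10 = 900 \qquad \text{Graduate: } 2{,}400 + 2 \cdot (-45) \cdot 10 = 1{,}500$$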
At exactly 20 years of experience, the non-graduate's marginal effect reaches zero: $1{,}800 + 2 \cdot (-45) \cdot 20 = 1{,}800 - 1{,}800 = 0$. That's the wage peak for non-graduates. Any more experience beyond 20 years actually decreases predicted salary for this group.
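The Step 3 arithmetic can be reproduced as follows. As before, the slope coefficients are taken from the example, while the intercept of 18,000 (with no separate $\text{Edu}$ main effect) is an assumption consistent with the peak salaries the example reports.

```python
# Slope coefficients from the worked example; the intercept is an assumption
# consistent with the peak salaries quoted in the example.
B0_ASSUMED = 18_000.0
B_EXP = 1_800.0
B_EXP2 = -45.0
B_INTERACT = 600.0


def predicted_salary(exp: float, edu: int) -> float:
    """Predicted salary for a given experience level and degree status."""
    return B0_ASSUMED + B_EXP * exp + B_EXP2 * exp**2 + B_INTERACT * exp * edu


def marginal_effect(exp: float, edu: int) -> float:
    """Derivative of predicted salary with respect to Exp."""
    return B_EXP + 2.0 * B_EXP2 * exp + B_INTERACT * edu


for edu, label in ((0, "non-graduate"), (1, "graduate")):
    salary = predicted_salary(10, edu)
    slope = marginal_effect(10, edu)
    print(f"{label}, Exp = 10: salary = {salary:,.0f}, marginal effect = {slope:,.0f}")
```

Evaluating `marginal_effect(20, 0)` returns zero, which matches the non-graduate wage peak at 20 years.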
Step 4: Interpret the Full Story
The model reveals a nuanced labour market story. Both groups see diminishing returns to experience — each extra year adds less and less salary — but college graduates maintain a stronger momentum: their salary keeps rising until 26.7 years of experience, reaching a peak of €50,000, versus the non-graduate peak of €36,000 at 20 years. The interaction term ($\hat{\beta}_3 = 600$) captures exactly this difference in momentum: college education doesn't just shift wages upward uniformly — it fundamentally changes how much people gain from experience.
Salary curves for non-graduates (red, D = 0) and graduates (blue, D = 1). The vertex markers show the experience level at which salary peaks for each group. The interaction term β₃ = 600 means each extra year of experience is worth €600 more for graduates — the widening gap between the two curves.