28 Differential calculus
The instantaneous rate of change
Your car’s GPS records your average speed over the entire trip: 62 km/h. But your speedometer, right now, reads 90 km/h. These are different quantities. The GPS average looks back over the whole journey; the speedometer is telling you something about this instant. The question “how fast am I going right now?” is not the same question as “how fast did I go on average?” — and answering it requires a fundamentally different tool.
Here is a sharper version of the same problem. A drug is injected into the bloodstream. A doctor monitors its concentration — measured in milligrams per litre — every hour. She can compute the average rate of clearance over a two-hour window. But what determines the dosing schedule is the instantaneous rate of clearance right now: how fast is the concentration dropping at this moment? If that rate is too slow, the drug accumulates to toxic levels. Too fast, and the dose is ineffective before the next one can be given. The decision is made on the basis of a rate at a point in time, not an average across an interval.
Both situations share the same mathematical structure. In chapter 1 you met limits: the value that a function approaches as the input gets arbitrarily close to some point. The speedometer reading is a limit — the limit of average speeds over shorter and shorter time intervals. This chapter builds that limit into a systematic tool.
28.1 The limit definition of the derivative
The derivative has a precise definition as a limit. Start with the average rate of change of \(f\) over the interval from \(x\) to \(x + h\):
\[\frac{f(x+h) - f(x)}{h}\]
The numerator \(f(x+h) - f(x)\) is the change in output. The denominator \(h\) is the change in input. Their ratio is the slope of the secant line connecting the two points \((x,\, f(x))\) and \((x+h,\, f(x+h))\) on the graph.
Now let \(h \to 0\). The second point slides toward the first. The secant line tilts and, in the limit, becomes the tangent line at \(x\). The slope of that tangent line is the derivative:
\[\boxed{f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}}\]
This is the formal definition. Every differentiation rule derived later is a consequence of it. The definition itself is the limit you saw in chapter 1, applied to a specific quotient.
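To see the limit at work numerically, here is a minimal sketch. The function names `difference_quotient` and `square` are illustrative choices, not anything defined earlier in the text. It evaluates the secant slope of \(f(x) = x^2\) at \(x = 3\) for shrinking values of \(h\).

```python
def difference_quotient(f, x, h):
    """Average rate of change of f over the interval from x to x + h."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# Shrink h and watch the secant slope settle toward a single value.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(square, 3.0, h))
```

The printed slopes should close in on 6, the value the limit definition gives for \(f'(3)\). Floating-point rounding means very small \(h\) eventually makes the estimate worse, which is one reason the exact rules in the following sections matter.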
28.2 What the notation is saying
Two notations for the derivative are standard. You will need both.
Prime notation (Lagrange): \(f'(x)\), read f prime of x. Clean and fast to write. Used when there is no ambiguity about which variable you’re differentiating with respect to.
Leibniz notation: \(\dfrac{dy}{dx}\), read dy by dx. More verbose but carries structural information — it names both the output variable (\(y\)) and the input variable (\(x\)). When a problem involves several variables, or when you need to keep track of units, Leibniz notation makes the algebra clearer. Most science and engineering texts default to it.
Both mean the same thing. If \(y = f(x)\), then:
\[f'(x) = \frac{dy}{dx} = \frac{d}{dx}\bigl[f(x)\bigr]\]
The expression \(\dfrac{d}{dx}[\,\cdot\,]\) is an operator — it takes a function as input and produces its derivative as output. Think of \(d/dx\) as an instruction: differentiate with respect to \(x\).
The tangent line
The derivative \(f'(a)\) is the slope of the tangent line to the graph of \(f\) at the point \((a, f(a))\) — the straight line through \((a, f(a))\) with slope \(f'(a)\), which is the best linear approximation to \(f\) near \(x = a\).
Contrary to the informal picture of a line that just touches the curve, a tangent line can cross it: the tangent to \(f(x) = x^3\) at the origin is the \(x\)-axis, which passes straight through the curve. What defines a tangent is its slope, not whether it crosses.
This is why a function fails to be differentiable at a corner or cusp: no unique tangent line exists there.
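A corner can be detected numerically by comparing secant slopes taken from the left and from the right. The sketch below is an illustration under stated assumptions: the helper name `one_sided_slopes` and the step size are choices made here, and a small finite step only approximates the limit.

```python
def one_sided_slopes(f, a, h=1e-6):
    """Return (left, right) secant slopes of f at a, using a small step h."""
    right = (f(a + h) - f(a)) / h
    left = (f(a) - f(a - h)) / h
    return left, right


def cube(x):
    return x ** 3


print(one_sided_slopes(abs, 0.0))   # slopes disagree: a corner at x = 0
print(one_sided_slopes(cube, 0.0))  # both slopes near 0: a tangent exists
```

For \(\lvert x\rvert\) the two slopes stay near \(-1\) and \(+1\) however small \(h\) becomes, so no single tangent slope exists at the origin. For \(x^3\) both slopes shrink toward 0, consistent with a horizontal tangent there.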
28.3 The method — differentiation rules
Computing every derivative from the limit definition would be exhausting. Fortunately, the limit has been worked out once for each class of function, packaged into rules, and those rules can be applied mechanically. The skill in differentiation is knowing which rule to reach for.
28.3.1 Power rule
For any function \(f(x) = x^n\), where \(n\) is a real number:
\[\frac{d}{dx}\bigl[x^n\bigr] = n x^{n-1}\]
This is the most-used rule in calculus. It applies to positive integers, negative integers, fractions — any real exponent.
Derivation for positive integer \(n\). Start from the limit definition:
\[f'(x) = \lim_{h \to 0} \frac{(x+h)^n - x^n}{h}\]
To make progress, we need to expand \((x+h)^n\). For small \(n\) you can do this by multiplying out, but the pattern for general \(n\) comes from the binomial theorem — covered in Vol 3 polynomials. For \(n = 2\): \((x+h)^2 = x^2 + 2xh + h^2\). For \(n = 3\): \((x+h)^3 = x^3 + 3x^2h + 3xh^2 + h^3\). In general:
\[(x+h)^n = x^n + n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n\]
Subtract \(x^n\) and divide by \(h\):
\[\frac{(x+h)^n - x^n}{h} = n x^{n-1} + \frac{n(n-1)}{2} x^{n-2} h + \cdots + h^{n-1}\]
Every term except the first contains at least one factor of \(h\). As \(h \to 0\), all those terms vanish:
\[f'(x) = \lim_{h \to 0} \left( n x^{n-1} + \text{terms with } h \right) = n x^{n-1}\]
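The rule is proved for positive integer \(n\). The same formula holds for every real exponent, although the proof of the general case needs tools beyond the binomial expansion. Notice what happened: the limit killed every term except the leading one. This is why the power rule is so clean.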
Examples:
| \(f(x)\) | \(f'(x)\) |
|---|---|
| \(x^5\) | \(5x^4\) |
| \(x^{-2}\) | \(-2x^{-3}\) |
| \(\sqrt{x} = x^{1/2}\) | \(\tfrac{1}{2} x^{-1/2}\) |
| \(1 = x^0\) | \(0\) |
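The last row says that the derivative of the constant \(1\) is zero; combined with the constant multiple rule in the next subsection, the derivative of any constant is zero. That makes sense: a constant function has zero slope everywhere.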
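The table entries can be spot-checked numerically, including the negative and fractional exponents that the derivation above does not cover. This is a rough sketch rather than a proof. The helper names `central_difference` and `power_rule`, the test point \(x = 2\), and the step size are all assumptions made for illustration.

```python
def central_difference(f, x, h=1e-5):
    """Symmetric secant slope, a numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def power_rule(n, x):
    """Derivative of x**n predicted by the power rule."""
    return n * x ** (n - 1)


x = 2.0
for n in (5, -2, 0.5, 0):
    numeric = central_difference(lambda t: t ** n, x)
    print(n, numeric, power_rule(n, x))
```

Each row should show two nearly equal numbers. Agreement at one point is evidence, not a derivation, but it is a quick way to catch a sign or exponent slip.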
28.3.2 Sum/difference and constant multiple rules
These follow directly from the limit definition, because the limit of a sum is the sum of the limits and a constant factor passes through a limit.
Constant multiple rule: \[\frac{d}{dx}\bigl[c \cdot f(x)\bigr] = c \cdot f'(x)\]
Sum/difference rule: \[\frac{d}{dx}\bigl[f(x) \pm g(x)\bigr] = f'(x) \pm g'(x)\]
Together these mean you can differentiate a polynomial term by term. For \(p(x) = 3x^4 - 2x^2 + 5x - 1\):
\[p'(x) = 3 \cdot 4x^3 - 2 \cdot 2x + 5 \cdot 1 - 0 = 12x^3 - 4x + 5\]
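Term-by-term differentiation is mechanical enough to write as a few lines of code. In this sketch a polynomial is stored as a list of coefficients from the constant term upward; that representation and the name `differentiate_polynomial` are assumptions chosen for the example.

```python
def differentiate_polynomial(coefficients):
    """Differentiate a polynomial given as [a0, a1, a2, ...].

    Each term a_k * x**k becomes k * a_k * x**(k - 1), so the constant
    term drops out and every other coefficient shifts down one place.
    """
    return [k * a for k, a in enumerate(coefficients)][1:]


# p(x) = 3x^4 - 2x^2 + 5x - 1, constant term first
p = [-1, 5, -2, 0, 3]
print(differentiate_polynomial(p))  # [5, -4, 0, 12], i.e. 12x^3 - 4x + 5
```

The list comprehension applies the power rule and the constant multiple rule to each term, and the sum rule is what justifies treating the terms independently.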
28.3.3 Product rule
The derivative of a product is not the product of the derivatives. That failure is easy to verify: \(\tfrac{d}{dx}[x \cdot x] = \tfrac{d}{dx}[x^2] = 2x\), but \(\tfrac{d}{dx}[x] \times \tfrac{d}{dx}[x] = 1 \times 1 = 1\). Different answers.
The correct rule: if \(f\) and \(g\) are both differentiable, then:
\[\frac{d}{dx}\bigl[f(x)\cdot g(x)\bigr] = f'(x)\cdot g(x) + f(x)\cdot g'(x)\]
A useful mnemonic: derivative of first times second, plus first times derivative of second. The rule accounts for how both factors are changing simultaneously.
Check: \(\tfrac{d}{dx}[x \cdot x] = 1 \cdot x + x \cdot 1 = 2x\). Correct.
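One way to see the product rule as a bookkeeping rule is to carry a value and its derivative together as a pair. The sketch below does this with a small class named `Dual`, a hypothetical name introduced only for this illustration; it implements just the sum and product rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A function value paired with its derivative at the same point."""
    value: float
    deriv: float

    def __add__(self, other):
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: (f g)' = f' g + f g'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


x = Dual(3.0, 1.0)  # the variable x at x = 3, where dx/dx = 1
print(x * x)        # Dual(value=9.0, deriv=6.0), matching d/dx[x^2] = 2x
```

Multiplying again, `x * x * x` carries the derivative 27 at \(x = 3\), which agrees with \(3x^2\) from the power rule.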
28.3.4 Chain rule
The chain rule handles composite functions — functions of functions. If \(y = f(u)\) and \(u = g(x)\), so that \(y = f(g(x))\), then:
\[\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}\]
In prime notation: \(\bigl(f \circ g\bigr)'(x) = f'\bigl(g(x)\bigr) \cdot g'(x)\).
This is the most important and most commonly misapplied rule in calculus. The key skill is identifying the outer and inner functions before differentiating.
Example. Differentiate \(y = \sin(x^2)\).
- Outer function: \(f(u) = \sin(u)\), so \(\dfrac{dy}{du} = \cos(u)\).
- Inner function: \(g(x) = x^2\), so \(\dfrac{du}{dx} = 2x\).
- Chain rule: \(\dfrac{dy}{dx} = \cos(x^2) \cdot 2x = 2x\cos(x^2)\).
The notation \(\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}\) looks like the \(du\)’s cancel — and you can use it as if they do. The reason it works is multiplicative structure in the limit, not literal fraction cancellation, but the practical result is the same: set up the product and compute.
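The multiplicative structure can be checked numerically for the example \(y = \sin(x^2)\): estimate \(dy/du\) and \(du/dx\) separately, multiply them, and compare with a direct estimate of \(dy/dx\). The names `slope`, `inner`, `outer` and the test point are assumptions made for this sketch.

```python
import math


def slope(f, x, h=1e-6):
    """Symmetric secant slope, a numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def inner(x):
    return x ** 2          # u = g(x)


def outer(u):
    return math.sin(u)     # y = f(u)


x0 = 1.3
du_dx = slope(inner, x0)
dy_du = slope(outer, inner(x0))   # note: evaluated at u = g(x0), not at x0
dy_dx = slope(lambda x: outer(inner(x)), x0)

print(dy_du * du_dx, dy_dx, 2 * x0 * math.cos(x0 ** 2))
```

All three printed numbers should agree closely. The comment on `dy_du` marks the most common mistake: the outer derivative must be evaluated at the inner function's value.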
28.3.5 Derivatives of key functions
These are stated here without proof. Each can be derived from the limit definition, but the derivations involve limits of trigonometric quotients and the definition of \(e\) — both of which are established results.
| Function | Derivative | Read the derivative as |
|---|---|---|
| \(\sin x\) | \(\cos x\) | Cosine is the rate of change of sine — maximum slope at \(x=0\), zero slope at the peaks |
| \(\cos x\) | \(-\sin x\) | Sine with a sign flip: cosine is decreasing wherever sine is positive, and has zero slope at its own peaks |
| \(e^x\) | \(e^x\) | The exponential is its own derivative — the rate of growth equals the current value |
| \(\ln x\) | \(\dfrac{1}{x}\) | Valid for \(x > 0\) — the domain of \(\ln x\). For \(x < 0\), use \(\ln\lvert x\rvert\), whose derivative is also \(\frac{1}{x}\) |
The self-referential property of \(e^x\), that it equals its own derivative, is why \(e\) appears everywhere in physics and engineering. Any system whose rate of change is proportional to its current value is governed by an exponential: the differential equation describing it is \(\dfrac{dy}{dx} = ky\) for some constant \(k\), with solutions \(y = Ce^{kx}\). The case \(k = 1\) gives \(e^x\) itself.
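The phrase "rate of change proportional to current value" can be tested directly: for an exponential, the ratio of slope to value should be the same at every point. In this sketch the constants `k = 0.5` and the prefactor `3.0` are arbitrary illustrative choices, and `slope` is a helper defined here.

```python
import math


def slope(f, x, h=1e-6):
    """Symmetric secant slope, a numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


k = 0.5  # hypothetical growth constant, chosen only for illustration


def y(x):
    return 3.0 * math.exp(k * x)


for x in (0.0, 1.0, 2.0):
    print(x, slope(y, x) / y(x))  # each ratio should be close to k
```

The chain rule explains the result: differentiating \(e^{kx}\) brings down a factor of \(k\), so the slope is always \(k\) times the value.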
28.4 Higher derivatives
The derivative of \(f\) is itself a function. You can differentiate it again. The second derivative \(f''(x)\) (or \(\dfrac{d^2y}{dx^2}\)) is the rate of change of the rate of change.
The clearest physical example: if \(s(t)\) is position at time \(t\), then \(s'(t)\) is velocity and \(s''(t)\) is acceleration. Acceleration is how quickly velocity is changing.
The second derivative also carries geometric information:
- \(f''(x) > 0\): the curve is concave up at \(x\) — the slope is increasing, the curve bends upward like a bowl.
- \(f''(x) < 0\): the curve is concave down at \(x\) — the slope is decreasing, the curve bends downward like an arch.
This is what the second derivative test uses.
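A second derivative can be estimated from three nearby function values. The sketch below uses the standard central second difference on a hypothetical position function \(s(t) = t^3\), chosen because its acceleration \(s''(t) = 6t\) changes sign; the function names and step size are assumptions made here.

```python
def second_difference(f, x, h=1e-3):
    """Central estimate of f''(x): (f(x+h) - 2 f(x) + f(x-h)) / h**2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def s(t):
    return t ** 3  # hypothetical position function; s''(t) = 6t


for t in (-1.0, 1.0):
    a = second_difference(s, t)
    print(t, a, "concave up" if a > 0 else "concave down")
```

At \(t = -1\) the estimate is negative (concave down) and at \(t = 1\) it is positive (concave up), matching the sign rules in the list above.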
28.5 Applications: optimisation
One of the most immediate uses of calculus is finding where a function reaches its maximum or minimum value.
Critical points. If \(f\) is differentiable and has a local maximum or minimum at \(x = c\), then the tangent line at \(c\) must be horizontal — a tilted tangent would mean the function is still rising or falling, so \(c\) couldn’t be an extreme point. Therefore:
\[f \text{ has a local extremum at } c \implies f'(c) = 0\]
Any point \(c\) where \(f'(c) = 0\) is called a critical point.
Not every critical point is a maximum or minimum. It could be a stationary point of inflection, sometimes loosely called a saddle point (like \(x = 0\) for \(f(x) = x^3\), where the tangent is horizontal at that instant but the function keeps increasing on both sides). The second derivative usually distinguishes the cases.
Second derivative test. If \(f'(c) = 0\):
- If \(f''(c) > 0\): concave up at \(c\), so the tangent is at the bottom of a bowl — local minimum.
- If \(f''(c) < 0\): concave down at \(c\), so the tangent is at the top of an arch — local maximum.
- If \(f''(c) = 0\): the test is inconclusive; \(c\) may be a maximum, a minimum, or neither (compare \(x^4\), \(-x^4\), and \(x^3\) at \(x = 0\)). In that case, check the sign of \(f'\) on either side of \(c\) directly: a change from positive to negative marks a local maximum, a change from negative to positive marks a local minimum, and no change of sign means neither.
Worked procedure. Find the local extrema of \(f(x) = x^3 - 6x^2 + 9x + 1\).
Step 1. Differentiate: \(f'(x) = 3x^2 - 12x + 9\).
Step 2. Set \(f'(x) = 0\): \(3x^2 - 12x + 9 = 0 \implies x^2 - 4x + 3 = 0 \implies (x-1)(x-3) = 0\).
Critical points: \(x = 1\) and \(x = 3\).
Step 3. Compute \(f''(x) = 6x - 12\).
Step 4. Test each critical point:
- \(f''(1) = 6(1) - 12 = -6 < 0\): local maximum at \(x = 1\). Value: \(f(1) = 1 - 6 + 9 + 1 = 5\).
- \(f''(3) = 6(3) - 12 = 6 > 0\): local minimum at \(x = 3\). Value: \(f(3) = 27 - 54 + 27 + 1 = 1\).
The function rises to a local maximum of 5 at \(x = 1\), falls to a local minimum of 1 at \(x = 3\), then rises again.
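The four steps translate almost line for line into code. This is a sketch for this particular cubic, with the derivatives entered by hand from Steps 1 and 3; the function names are illustrative.

```python
import math


def f(x):
    return x**3 - 6 * x**2 + 9 * x + 1


def f_double_prime(x):
    return 6 * x - 12


# Step 2: solve f'(x) = 3x^2 - 12x + 9 = 0 with the quadratic formula.
a, b, c = 3.0, -12.0, 9.0
root = math.sqrt(b * b - 4 * a * c)
critical_points = [(-b - root) / (2 * a), (-b + root) / (2 * a)]


def classify(x):
    """Second derivative test at a critical point x."""
    curvature = f_double_prime(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


# Step 4: report each critical point, its value, and its type.
for x in critical_points:
    print(x, f(x), classify(x))
```

The output lists \(x = 1\) with value 5 as a local maximum and \(x = 3\) with value 1 as a local minimum, in agreement with the hand calculation.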
28.6 Worked examples
Example 1 (Computing/data). Differentiate \(f(x) = 3x^4 - 2x^2 + 5x - 1\).
Apply the power rule term by term, with the constant multiple and sum rules:
\[f'(x) = 3 \cdot 4x^3 - 2 \cdot 2x^1 + 5 \cdot 1 - 0\]
\[f'(x) = 12x^3 - 4x + 5\]
This is a polynomial, and differentiating a non-constant polynomial always produces a polynomial of degree one lower. The constant term \(-1\) vanishes: a constant has zero slope everywhere.
Example 2 (Science). Differentiate \(f(x) = x^2 \sin(x)\).
This is a product. Call the first factor \(u(x) = x^2\) and the second \(v(x) = \sin(x)\), so that \(u'(x) = 2x\) and \(v'(x) = \cos(x)\). Apply the product rule:
\[\frac{d}{dx}\bigl[x^2 \sin x\bigr] = 2x \cdot \sin x + x^2 \cdot \cos x\]
\[= 2x\sin x + x^2 \cos x\]
A function like this can model an oscillation whose amplitude grows quadratically with \(x\), for example with distance from a source.
Example 3 (Engineering). Differentiate \(f(x) = e^{3x^2}\).
This is a composite function. Identify the outer and inner functions:
- Outer: \(y = e^u\), which has derivative \(\dfrac{dy}{du} = e^u\).
- Inner: \(u = 3x^2\), which has derivative \(\dfrac{du}{dx} = 6x\).
Chain rule:
\[\frac{d}{dx}\bigl[e^{3x^2}\bigr] = e^{3x^2} \cdot 6x = 6x\,e^{3x^2}\]
Exponentials of a quadratic appear, usually with a negative coefficient as in \(e^{-x^2/2}\), in Gaussian distributions, heat diffusion kernels, and signal-processing windows. The chain rule is the standard way to differentiate them.
Example 4 (Real-world optimisation). A farmer has 80 m of fencing and wants to enclose a rectangular paddock. One side is a riverbank, which needs no fence. What dimensions maximise the enclosed area?
Set up. Let the side perpendicular to the river have length \(x\) (metres). Two of these sides are needed. The side parallel to the river, call it \(y\), uses the remaining fencing:
\[2x + y = 80 \implies y = 80 - 2x\]
The area is:
\[A(x) = x \cdot y = x(80 - 2x) = 80x - 2x^2\]
Differentiate and find critical points:
\[A'(x) = 80 - 4x\]
Setting \(A'(x) = 0\): \(80 - 4x = 0 \implies x = 20\).
Classify: \(A''(x) = -4 < 0\) everywhere, so \(x = 20\) is a local (and global) maximum.
Dimensions: \(x = 20\) m, \(y = 80 - 2(20) = 40\) m.
Maximum area: \(A = 20 \times 40 = 800\text{ m}^2\).
The shape that maximises the area is twice as wide as it is deep — a result that appears in architectural and agricultural optimisation problems wherever one boundary is free.
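The calculus answer can be cross-checked by brute force: evaluate the area on a grid of candidate values of \(x\) and keep the largest. The grid spacing of 0.5 m and the function name `area` are choices made for this sketch, and a grid search only finds the best candidate on the grid.

```python
def area(x, fencing=80.0):
    """Enclosed area when each side perpendicular to the river has length x."""
    return x * (fencing - 2 * x)


# Try x = 0.0, 0.5, ..., 40.0 and keep the candidate with the largest area.
candidates = [i * 0.5 for i in range(81)]
best_x = max(candidates, key=area)

print(best_x, 80.0 - 2 * best_x, area(best_x))  # 20.0 40.0 800.0
```

The search agrees with \(x = 20\) m, \(y = 40\) m and \(800\text{ m}^2\). The derivative reaches the same answer in one line and also shows that no value between grid points can do better.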
28.7 Where this goes
Integral calculus (ch03): Integration inverts differentiation. Where this chapter asked “given a function, find its rate of change,” the next chapter asks “given the rate of change, recover the original function.” The Fundamental Theorem of Calculus — the deepest result in the subject — shows these two operations are inverses of each other. Every calculation of area, volume, work, and accumulated change runs through it.
Ordinary differential equations (Vol 7): A differential equation is a relationship between a function and its own derivative — an equation that says something like “the rate of change of this quantity is proportional to the quantity itself.” Everything in this chapter is prerequisite: you need to know what a derivative is, how to compute one, and how the chain rule works, before a differential equation makes sense. The entire Vol 7 ODE sequence begins here.
Where this shows up
- Velocity and acceleration. If \(s(t)\) is the position of an object at time \(t\), then \(s'(t)\) is its velocity and \(s''(t)\) is its acceleration. Every problem in Newtonian mechanics is stated in these terms.
- Marginal analysis. In economics, marginal cost is the derivative of total cost with respect to quantity. Profit is maximised where marginal revenue equals marginal cost — a critical point of the profit function.
- Newton’s law of cooling. The rate of heat loss of an object is proportional to the difference between the object’s temperature and the ambient temperature. That “rate of heat loss” is a derivative. The law is \(\tfrac{dT}{dt} = -k(T - T_\text{ambient})\) — a differential equation that lives in Vol 7.
- Gradient descent. The optimisation algorithm that trains neural networks computes the derivative of the loss function with respect to each model parameter, then moves the parameters a small step in the direction that decreases the loss fastest. Backpropagation is the procedure that computes those derivatives by applying the chain rule layer by layer through the network.
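The gradient descent idea can be shown in a few lines for a single parameter. Everything specific here is an assumption made for illustration: the loss \((w - 3)^2\), the starting guess, the learning rate, and the number of steps. Real training loops involve many parameters and derivatives computed automatically.

```python
def loss(w):
    return (w - 3.0) ** 2      # hypothetical one-parameter loss


def loss_gradient(w):
    return 2.0 * (w - 3.0)     # its derivative, by the power and chain rules


w = 0.0                        # arbitrary starting guess
learning_rate = 0.1            # illustrative step size
for _ in range(100):
    # Step against the derivative: downhill on the loss curve.
    w -= learning_rate * loss_gradient(w)

print(w, loss(w))
```

After the loop, `w` should sit very close to 3, the critical point where the derivative of the loss is zero. Each update shrinks the distance to 3 by a constant factor, which is why the method converges here.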
28.8 Exercises
These are puzzles. Each has a clean answer, but the interesting part is choosing the right rule and executing the steps carefully before you reach for the result.
Exercise 1. Differentiate \(f(x) = 4x^5 - 3x^3 + 7x - 2\) using the power rule.
Exercise 2. Differentiate \(f(x) = x^3 \cos(x)\) using the product rule.
Exercise 3. Differentiate \(f(x) = (2x^3 + 1)^5\) using the chain rule.
Exercise 4. Find all critical points of \(f(x) = x^3 - 3x^2 - 9x + 2\) and classify each as a local maximum, local minimum, or neither.
Exercise 5. A manufacturer’s total cost in dollars for producing \(q\) units per day is \(C(q) = q^3 - 6q^2 + 15q + 50\). At what production level is the marginal cost minimised? (Marginal cost is \(C'(q)\).)
Exercise 6. A stone is dropped from a bridge. Its height above the water at time \(t\) seconds is \(h(t) = 45 - 4.9t^2\) metres. At what speed (in m/s) is it falling at the instant it hits the water?