Matheno - Learn Well and Excel

C.1 Chain Rule – Introduction

You might have noticed that so far in this “Calculating Derivatives” chapter, we have only found the derivative of simple functions like $x^2,$ $\sin x,$ $e^x,$ and products and quotients of such functions. We have not yet found the derivative of a function like $\sin (2x),$ or $e^{x^2}.$ There’s a reason for this: we need a new tool to be able to do so. That tool is called “the Chain Rule,” which we will spend this entire Section exploring because it’s that important. We’ll of course also provide lots of practice so you can use it comfortably and with confidence.

Note: Some students initially find abstract discussion of the Chain Rule difficult to understand. If you’re one of them, we encourage you to jump down to the Check Questions at the bottom of this screen to see how easy the Chain Rule actually is to use in practice, and then proceed to the next screen to develop your problem-solving skills further. Once you see how to use the Chain Rule routinely, you may find the discussion of why it works the way it does easier to follow.
Quick examples of why we need a new rule

Let’s quickly consider two examples to illustrate why we need a new rule.

First, consider the function $f(x) = (2x)^3.$ Viewing this function as $f(x) = 8x^3,$ we know from the Power Rule that its derivative is $f'(x) = 24x^2.$

Naively, looking at $f(x) = (2x)^3$ to find the derivative you might simply bring the 3 down in front of the parentheses and change the power to 2:
\begin{align*}
f'(x) = \big[(2x)^3\big]\,' \, &\overbrace{=}^{?} 3(2x)^2 &&\text{[Does this naive approach work??]} \\[8px] &\overbrace{=}^{?} 3 \left(4x^2 \right) \\[8px] &\overbrace{=}^{?} 12x^2 \ne 24x^2 \quad \xmark &&\color{red}{\left[\text{No!} \quad \big[(2x)^3\big]\,' \ne 3(2x)^2 \right]}
\end{align*}
Ack: our naive approach gives a result that is off by a factor of 2 from the correct answer.

As a second example, consider the function $p(x) = (3x - 1)^2.$ If we view this function as $p(x) = 9x^2 - 6x + 1,$ then we know immediately from the Power Rule that its derivative is $p'(x) = 18x - 6.$

Again thinking naively, you might simply bring the power of 2 down in front of the parentheses:
\begin{align*}
p'(x) = \big[(3x - 1)^2\big]\,' \, &\overbrace{=}^{?} 2(3x-1) &&\text{[Does this naive approach work??]} \\[8px] &\overbrace{=}^{?} 6x - 2 \ne 18x - 6 \quad \xmark &&\color{red}{\left[\text{No!} \quad 6x - 2 \ne 18x - 6 \right]}
\end{align*}
Again the naive approach doesn’t work, and this time we’re off by a factor of 3. (Hmmm.)
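
If you’d like to check these numbers yourself, here is a minimal Python sketch that compares both naive guesses against a numerical estimate of each derivative. The helper name `central_difference`, the step size `h`, and the sample point `x0 = 1.5` are our own choices for illustration; they are not part of the discussion above.

```python
def central_difference(func, x, h=1e-6):
    """Estimate func'(x) numerically with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return (2 * x) ** 3


def p(x):
    return (3 * x - 1) ** 2


x0 = 1.5  # arbitrary sample point (an assumption for illustration)

f_numeric = central_difference(f, x0)
f_naive = 3 * (2 * x0) ** 2      # naive guess: 3(2x)^2
f_correct = 24 * x0 ** 2         # from expanding to 8x^3 first

p_numeric = central_difference(p, x0)
p_naive = 2 * (3 * x0 - 1)       # naive guess: 2(3x - 1)
p_correct = 18 * x0 - 6          # from expanding to 9x^2 - 6x + 1 first

print("f:", f_numeric, "naive:", f_naive, "correct:", f_correct)
print("p:", p_numeric, "naive:", p_naive, "correct:", p_correct)
```

At any sample point you try, the numerical estimate should land on the “correct” value, with the naive guess too small by the same factor of 2 (for $f$) or 3 (for $p$).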

As you’ll see below, the Chain Rule resolves this discrepancy, and will let us — easily, with practice! — find the derivatives of functions that are quite complicated.

In order to understand the Chain Rule, we first need to make sure we’re clear about compound functions.

Compound (Composite) Functions Review

Recall that a compound function, also known as a composite function, is a function built by placing one function inside another.

For instance, $\left(x^2+1\right)^7$ is comprised of the inner function $x^2 + 1$ inside the outer function $(\boxed{\phantom{\cdots}})^7.$

As another example, $e^{\sin x}$ is comprised of the inner function $\sin x$ inside the outer function $e^{\boxed{\phantom{\cdots}}}.$

As yet another example, $\ln{(t^3 - 2t^2 +5)}$ is comprised of the inner function $t^3 - 2t^2 +5$ inside the outer function $\ln(\boxed{\phantom{\cdots}}).$


“How can I tell what the inner and outer functions are?”
Here’s a foolproof method: Imagine calculating the value of the function for a particular value of x and identify the steps you would take, because you’ll always automatically start with the inner function and work your way out to the outer function.

For example, imagine computing $\left(x^2+1\right)^7$ for $x=3.$ Without thinking about it, you would first calculate $x^2 + 1$ (which equals $3^2 +1 =10$), so that’s the inner function, guaranteed. Then you would next calculate $10^7,$ and so $(\boxed{\phantom{\cdots}})^7$ is the outer function.

This imaginary computational process works every time to identify correctly what the inner and outer functions are.
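
The same “compute it step by step” idea can be sketched in a few lines of Python. The function names `inner` and `outer` are just labels we chose for this illustration.

```python
def inner(x):
    """The step you do first: x^2 + 1."""
    return x ** 2 + 1


def outer(u):
    """The step you do last: raise whatever you were handed to the 7th power."""
    return u ** 7


x = 3
u = inner(x)       # first calculation: 3^2 + 1
value = outer(u)   # second calculation: (that result)^7

print(u, value)
```

The order in which the two lines run is exactly the order that identifies the inner and outer functions.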


Compound Functions Example 1: Identify Inner and Outer Functions

Each function below can be thought of as a composition of functions, $f\Big(g(x)\Big),$ where the “inside” function $g(x)$ serves as the input to the “outside” function $f.$ In each case, identify an inside and an outside function that, when composed, are equivalent to the given function.

Note: Often there is more than one way to define the inside and outside functions, and even to determine how many “layers deep” the functions go. Our solution below may not be the only correct possibility.

  (a) $p(x)=(3x-1)^2.$
  (b) $s(x) = \dfrac{1}{1+e^{-x}}.$ (We’ll view this as being comprised of three functions.)
  (c) $f(t) = A \cos(bt).$

Solution.
In the tables below we present three different ways of describing each function’s decomposition: verbal description, “box notation,” and more common “function notation” using x, u, t and such.

(a) Given $p(x)=(3x-1)^2,$ if you were to compute $p(4)$ you would first calculate $3 \cdot 4 - 1 = 11,$ so $3x-1$ is the inner function. You would then square that value of 11, and so $(\Box)^2$ is the outer function.

| | Inside | Outside |
|---|---|---|
| description | multiply the input by 3, and subtract 1 | square the input |
| boxes | $3\Box-1$ | $\Box^2$ |
| function notation | $g(x) = 3x-1$ | $f(u) = u^2$ |

So, composing our outside and inside functions, we get $f\Big(g(x)\Big) = (3x-1)^2$. Another way to say it is that we take all of the “stuff” from our inside function, and put it into the input box of the outside function.

(b) We rewrote $s(x) = \dfrac{1}{1+e^{-x}}$ as a composition of three functions. You could certainly choose a different way to view this decomposition that would also be correct: if you did it differently, you can check your answer by recomposing your functions, and see whether the function you get is the same as $s(x).$

The way we thought about it: starting with input x, (1) negate that value; (2) raise e to that value and add 1; and then (3) take the reciprocal of that value.

| | Further Inside | Inside | Outside |
|---|---|---|---|
| description | negate the input | raise e to the power of the input, and add 1 | take the reciprocal of the input |
| boxes | $-\Box$ | $1+e^{\Box}$ | $\dfrac{1}{\Box} = \Box^{-1}$ |
| function notation | $h(x) = -x$ | $g(w) = 1+e^w$ | $f(u) = \dfrac{1}{u} = u^{-1}$ |

So, composing our “Further Inside” function with the “Inside” function, we get $g\big(h(x)\big)=1+e^{-x}$. Then, composing this resulting function with the “Outside” function, we get $f\Big(g\big(h(x)\big)\Big)=\dfrac{1}{1+e^{-x}}$.

Another way to say it is that we take all of the “stuff” from “Further Inside,” and put it into the input box of “Inside.” And then we take all of the “stuff” from “Inside,” and put it into the input box of “Outside.”
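
As suggested above, you can check a decomposition by recomposing it. Here is a small Python sketch that does so for our three-layer version of $s(x)$; the names `s_composed` and `s_direct` and the test inputs are our own choices for illustration.

```python
import math


def h(x):
    """Further Inside: negate the input."""
    return -x


def g(w):
    """Inside: raise e to the power of the input, and add 1."""
    return 1 + math.exp(w)


def f(u):
    """Outside: take the reciprocal of the input."""
    return 1 / u


def s_composed(x):
    return f(g(h(x)))


def s_direct(x):
    return 1 / (1 + math.exp(-x))


for x in (-2.0, 0.0, 3.0):  # arbitrary test inputs
    print(x, s_composed(x), s_direct(x))
```

If you chose a different decomposition, swapping your own pieces into `h`, `g`, and `f` gives you the same kind of check.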

(c) $f(t) = A \cos(bt).$

| | Inside | Outside |
|---|---|---|
| description | multiply the input by a factor of $b$ | take the cosine of the input, and multiply it by a factor $A$ |
| boxes | $b\Box$ | $A\cos(\Box)$ |
| function notation | $g(t) = bt$ | $f(u) = A\cos(u)$ |

So, composing our “Inside” function with the “Outside” function, we get $f\Big(g(t)\Big)=A\cos(bt)$. Another way to say it is that we take all of the “stuff” from inside, $b\Box,$ and place it in the input of “Outside”: $A\cos(b\Box)$.

Alternate notation: You may see the compound (or composite) function $f\big(g(x)\big)$ written instead as $(f \circ g)(x).$ These mean exactly the same thing, and both are said as “f of g of x.” Most students prefer the former notation, as do we, and so we’ll almost always use it as we did above. However, since you may encounter or be asked about $(f \circ g)(x),$ please know it’s simply a different way of writing $f\big(g(x)\big).$

Now please set aside this quick review of compound functions. It’ll be important again in a bit, but first we’re going to develop an intuitive understanding of the Chain Rule before we present it formally.

Developing a Conceptual Understanding of the Chain Rule: A balloon ascends and cools

Before we present the Chain Rule, let’s consider an everyday scenario that illustrates the core idea.

Imagine a balloon that travels straight upward at a rate given in m/s.

As you may know, as you move upward away from the Earth’s surface, the temperature of the air decreases. Specifically, the air around the balloon gets cooler and cooler as the balloon rises, at a rate given in $^{\circ}\text{F/m}.$ For simplicity, let’s imagine the balloon’s temperature always matches that of the surrounding air.

To keep our focus on the key point here, we’re going to pretend that the two rates are constant:
The balloon travels straight upward such that its elevation, E, changes at a constant rate with respect to time:
\[\dfrac{d(\text{ elevation })}{d(\text{ time })} = \dfrac{dE}{dt} = 0.004 \, \dfrac{\text{m}}{\text{s}}\] Hence, for instance in 1 second the balloon ascends 0.004 meters.

As the balloon travels upward away from the Earth’s surface, its temperature, T, changes at a constant rate with respect to elevation:
\[\dfrac{d(\text{ temperature })}{d(\text{ elevation })} = \dfrac{dT}{dE} = -0.01\, \dfrac{^\circ\text{F}}{\text{m}}\] For instance, when the balloon gains 1 meter in elevation, its temperature changes by $-0.01\, ^\circ\text{F}.$

[Figure: A hot-air balloon ascending. Left label: the balloon ascends at the time-rate $dE/dt = 0.004$ m/s. Right label: the air temperature changes at the elevation-rate $dT/dE = -0.01\,^\circ\text{F/m}.$]

Here’s the question: What is $\dfrac{d(\text{ temperature })}{d(\text{ time })}?$
For instance, in 1 second, how much does the balloon’s temperature change?

[Do you have an answer in mind? If not, please stop and develop one for yourself. In particular, imagine what happens over 1 second: the balloon travels upward ___ m, which means its temperature changes by . . . . ]

If your instinct was simply to multiply the two rates, then great! Hold onto that intuition, because it is perfectly correct and is at the core of the Chain Rule.

If not, think about what happens over the course of a 1-second time change. Given $dE/dt = 0.004 \, \tfrac{\text{m}}{\text{s}},$ over 1 second the balloon’s elevation increases by 0.004 meters.

Now shift focus. When the balloon’s elevation increases by 0.004 meters, what temperature change does it experience? Since the temperature rate of change is $dT/dE = -0.01^\circ\text{F/m},$ then over the 1-second elevation increase of 0.004 meters the balloon’s temperature changes by
\[\Delta T = (0.004 \text{ m }) \left( -0.01\, \tfrac{^\circ\text{F}}{\text{m}}\right) = -0.00004^\circ\text{F}\] So for a time change of 1 second, our temperature changes by $-0.00004^\circ\text{F}.$

Having focused on what happens in 1 second, let’s return to the time-rate at which the balloon’s temperature changes, $\dfrac{dT}{dt}.$ To find the small change in temperature, dT, relative to a small change in time, dt, we simply multiply the rate at which elevation changes with time (dE/dt) by the rate at which the temperature changes with elevation (dT/dE):
\begin{align*}
\dfrac{dT}{dt} &= \dfrac{dT}{dE} \cdot \dfrac{dE}{dt} \\[8px] &= \left(-0.01\, \dfrac{^\circ\text{F}}{\cancel{\text{m}}}\right) \left( 0.004 \, \dfrac{\cancel{\text{m}}}{\text{s}}\right) = -0.00004\,\dfrac{^\circ\text{F}}{\text{s}}
\end{align*}
Notice that the units cancel in this calculation as we would expect.
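
Here is the balloon arithmetic as a short Python sketch, in case seeing it step by step helps. The variable names are our own, and the rates are the two constant values given above.

```python
import math

dE_dt = 0.004   # m/s: rate of change of elevation with respect to time
dT_dE = -0.01   # deg F per m: rate of change of temperature with respect to elevation

# Step-by-step view: what happens during a 1-second interval?
delta_t = 1.0                 # s
delta_E = dE_dt * delta_t     # m gained in that second
delta_T = dT_dE * delta_E     # deg F change over that elevation gain

# Rate view: multiply the two rates directly.
dT_dt = dT_dE * dE_dt         # (deg F / m) * (m / s) = deg F / s

print(delta_E, delta_T, dT_dt)
print(math.isclose(delta_T / delta_t, dT_dt))
```

Both views give the same temperature change per second (up to floating-point rounding), which is the whole point of the example.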

If that all makes sense, you have the fundamental idea behind the Chain Rule.

Recast in terms of functions

To see how this scenario relates to compound functions, let’s recast the balloon’s temperature change in function notation.

We know that variations in time result in corresponding variations in elevation, so elevation is a function of time. We denote this functional dependence by writing $E = E(t)$.

Similarly, variations in elevation result in corresponding variations in temperature, so temperature is a function of elevation. We denote this by writing $T = T(E).$

Putting the pieces together, we can write temperature T as a function of time t as
\[T(t) = T\Big(E(t)\Big)\] So really, from the beginning we could have said this scenario considers a composition of the functions T and $E.$ $E(t)$ was just a pesky intermediate function that we had to go through to see the relationship between the balloon’s temperature and time.

Now let’s return to the general compound function $f\Big(g(x)\Big)$, which for the moment we’ll write as $y\Big(u(x)\Big).$ Your quick calculation above shows that if you are interested in the derivative of y with respect to x, but there’s a pesky intermediate function u between y and x, you can still easily find the derivative: take the derivative of the outside function with respect to the inside function, $\dfrac{dy}{du},$ and then multiply by the derivative of the inside function with respect to the input variable x, $\dfrac{du}{dx}.$ That is,
\[\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx} \] This is exactly the process you probably landed on intuitively when you multiplied the two rates and computed $\dfrac{dT}{dt} = \dfrac{dT}{dE} \cdot \dfrac{dE}{dt}$ above.

The key thing to notice in all of this: to find the rate-of-change of the overall function $T(t) = T\Big(E(t)\Big)$ with respect to the inner variable t, you automatically multiplied the rate-of-change of the outer function by the rate-of-change of the inner function. That’s the Chain Rule.

Chain Rule

Here is a formal statement of the Chain Rule:

Chain Rule (with prime notation):
For two functions $f(x)$ and $g(x)$, if $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$, then
\begin{align*}
\Big[f\big(g(x)\big) \Big]' &= f'\big(g(x)\big)\cdot g'(x) \\[8px] &= [\text{derivative of the outer function, evaluated at the inner function}] \\[8px] &\qquad \cdot [\text{derivative of the inner function}] \end{align*}
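
To make the prime-notation statement concrete, here is a minimal Python sketch. The helper `chain_rule` is a hypothetical name of our own, and we apply it to $\left(x^2+1\right)^7$ from the review above, using the Power Rule for each piece; the numerical comparison at `x0 = 1` is only a sanity check.

```python
def chain_rule(f_prime, g, g_prime):
    """Return a function computing [f(g(x))]' = f'(g(x)) * g'(x)."""
    def composite_derivative(x):
        return f_prime(g(x)) * g_prime(x)
    return composite_derivative


def g(x):            # inner function
    return x ** 2 + 1


def g_prime(x):      # derivative of the inner function
    return 2 * x


def f(u):            # outer function
    return u ** 7


def f_prime(u):      # derivative of the outer function
    return 7 * u ** 6


composite_derivative = chain_rule(f_prime, g, g_prime)

# Sanity check against a numerical estimate at an arbitrary point.
x0, h = 1, 1e-6
numeric = (f(g(x0 + h)) - f(g(x0 - h))) / (2 * h)

print(composite_derivative(x0), numeric)
```

Notice that `f_prime` is handed `g(x)`, not `x`: that is the “evaluated at the inner function” part of the rule.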

The Chain Rule is often written with Leibniz notation instead, because it’s easy to remember and is one case where thinking of the derivative as a fraction actually comes in handy.

Chain Rule (with Leibniz notation):
If $y$ is a differentiable function of $u,$ and $u$ is a differentiable function of $x,$ then
\[\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}\]

Using the alternate notation $(f \circ g)(x)$ for the compound function $f\big(g(x)\big),$ the Chain Rule is

Chain Rule (alternate notation):
\[ \dfrac{d}{dx} (f \circ g)(x) = \dfrac{df}{dg}\cdot \dfrac{dg}{dx}\]

If we think of the derivative as a fraction of vanishing quantities (as Leibniz did), then the statement of the Chain Rule seems almost obvious. As you can see, when we multiply two fractions that share a common factor in the numerator and denominator ($dg$), then that factor “cancels” and we are left with
\[\dfrac{df}{\cancel{dg}} \cdot \dfrac{\cancel{dg}}{dx} = \dfrac{df}{dx}\] This is probably what you did in your head when you considered the balloon scenario above: $\dfrac{dT}{dt} = \dfrac{dT}{\cancel{dE}} \cdot \dfrac{\cancel{dE}}{dt}.$ As we noted above, the units cancel correctly, as they must, giving us the result we were after for the rate of change of temperature T with respect to time t: $\dfrac{dT}{dt} = \left(-0.01\, \dfrac{^\circ\text{F}}{\cancel{\text{m}}}\right) \left( 0.004 \, \dfrac{\cancel{\text{m}}}{\text{s}}\right) = -0.00004\,\dfrac{^\circ\text{F}}{\text{s}}.$

WARNING: While for first derivatives we can think of differentials as canceling in this way, we cannot extend this reasoning to second- and other higher-order derivatives. We also cannot apply other properties of fractions to derivatives. Indeed, even this notion of canceling these differentials was quite controversial until the 1960s (quite late for the development of Calculus since Newton and Leibniz were alive in the 1600s).

Quick example of derivatives not behaving as fractions

This isn’t crucial: if you’re first learning this material, you can skip this example. If you’re curious about why we can’t always treat derivatives as if they’re fractions, read on:

If you’re working with a fraction of numbers like $\dfrac{4}{9},$ then certainly
\[\left(\frac{4}{9} \right)^{1/2} = \frac{4^{1/2}}{9^{1/2}} = \frac{2}{3}\] By contrast, writing
\[\left(\frac{dy}{dx}\right)^{1/2} \overbrace{=}^{?} \dfrac{(dy)^{1/2}}{(dx)^{1/2}} \quad \text{?!?}\] makes no sense. We have no way to interpret such a statement: neither $\sqrt{dx}$ nor $\sqrt{dy}$ has any evident meaning. We therefore cannot generally think of $\dfrac{dy}{dx}$ as simply dy divided by dx and treat it like we would any other fraction.

We could certainly generate other examples where treating a derivative as a fraction, using Leibniz notation, leads to nonsensical results.

In more advanced classes you may dive more deeply into the meaning of the derivative and its component parts, but for now let’s just say that while it’s safe to treat differentials as canceling when you use the Chain Rule for first derivatives, this is the only safe time to do so.
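
For the curious, here is a short worked sketch of why the “canceling” picture fails for second derivatives. It assumes $y$ is a twice-differentiable function of $u$ and $u$ is a twice-differentiable function of $x,$ and it uses the Product Rule together with the Chain Rule:

```latex
\begin{align*}
\dfrac{d^2y}{dx^2} &= \dfrac{d}{dx}\left(\dfrac{dy}{du} \cdot \dfrac{du}{dx}\right) \\[8px]
&= \dfrac{d}{dx}\left(\dfrac{dy}{du}\right) \cdot \dfrac{du}{dx} + \dfrac{dy}{du} \cdot \dfrac{d^2u}{dx^2} \\[8px]
&= \dfrac{d^2y}{du^2} \left(\dfrac{du}{dx}\right)^2 + \dfrac{dy}{du} \cdot \dfrac{d^2u}{dx^2}
\end{align*}
```

The result has two terms, so it is not simply $\dfrac{d^2y}{du^2} \cdot \dfrac{d^2u}{dx^2},$ which is what naive “fraction canceling” might suggest.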


Using the Chain Rule

Enough with the abstract; let’s get to some Examples to show how we use the Chain Rule routinely in practice. We’ll begin by resolving the first of the quick examples we introduced at the top of the page to illustrate why we need the Chain Rule at all.

Chain Rule Example 1: $f(x) = (2x)^3$

Use the Chain Rule to differentiate $f(x) = (2x)^3.$
Note: As we saw above without using the Chain Rule, since $f(x) = 8x^3$ we know immediately that the answer is $f'(x) = 3 \cdot 8x^{3-1} = 24x^2.$

Solutions.
We’ll solve this using three different approaches — but we encourage you to become comfortable with the third approach as quickly as possible, because that’s the one you’ll use to compute derivatives quickly as the course progresses.

• Solution 1.
Let’s use the first form of the Chain Rule above:
\begin{align*}\Big[ f\Big(g(x)\Big)\Big]' &= f'\Big(g(x)\Big) \cdot g'(x) \\[5px] &=\text{[derivative of the outer function, evaluated at the inner function] } \\[5px] &\qquad \times \text{ [derivative of the inner function]}
\end{align*}
We have the outer function $f(u) = u^3,$ and the inner function $u = g(x) = 2x.$

Then $f'(u) = 3u^2,$ and $g'(x) = 2.$ ($\leftarrow$ Notice that factor of 2!)
Hence
\begin{align*}
\Big[f\big(g(x)\big)\Big]' &= 3u^2 \cdot 2 \\[8px] &= 3(2x)^2 \cdot 2 \\[8px] &= 3(4x^2) \cdot 2 = 24x^2 \quad \cmark
\end{align*}
Ah: our naive approach at the start of the page was missing that very factor of 2 that comes from the Chain Rule as the derivative of the inner function. As we said, the Chain Rule makes this all work easily.

• Solution 2.

Let’s use the second form of the Chain Rule above:
\[\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx} \] We have $y = u^3$ and $u = 2x.$

Then $\dfrac{dy}{du} = 3u^2,$ and $\dfrac{du}{dx} = 2$ (again, there’s that factor of 2!). Hence
\begin{align*}
\dfrac{dy}{dx} &= 3u^2 \cdot 2 \\[8px] &= 3(2x)^2 \cdot 2 \\[8px] &= 3(4x^2) \cdot 2 = 24x^2 \quad \cmark
\end{align*}

• Solution 3.

With some experience, you won’t introduce a new variable like $u = \dots$ as we did in Solutions 1 and 2.

Instead, you’ll think something like: “The function is some stuff to the 3rd power. So the derivative is 3 times that same stuff to the 2nd power, times the derivative of that stuff.”
\[\dfrac{df}{dx} = \left[\dfrac{df}{d\text{(stuff)}}\text{, with the same stuff inside} \right] \times \dfrac{d}{dx}\text{(stuff)}\] \begin{align*}
f(x) &= (\text{stuff})^3; \quad \text{stuff} = 2x \\[12px] \text{Then}\phantom{f(x)= }\\
\frac{df}{dx} &= 3(\text{stuff})^2 \cdot \left(\frac{d}{dx}(2x)\right) \\[8px] &= 3(2x)^2 \cdot 2 = 24x^2 \quad \cmark
\end{align*}
Note: You’d never actually write “stuff = ….” Instead just hold in your head what that “stuff” is, and proceed to write down the required derivatives.
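
The same steps also resolve the second quick example from the top of the page, $p(x) = (3x-1)^2.$ As a brief worked sketch, using the decomposition $f(u) = u^2$ and $g(x) = 3x-1$ from the Compound Functions Example:

```latex
\begin{align*}
p'(x) &= f'\big(g(x)\big) \cdot g'(x) \\[8px]
&= 2(3x-1) \cdot 3 \\[8px]
&= 18x - 6
\end{align*}
```

The factor of 3 that the naive approach was missing is exactly $g'(x),$ the derivative of the inner function $3x - 1.$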

Let’s consider next one of the functions from our Compound Functions Example above.

Chain Rule Example 2: $f(t) = A\cos(bt)$

Given $f(t) = A\cos(bt)$, find $\dfrac{df}{dt}$.

Solution.
We’ll again solve this using three different approaches, and again encourage you to become comfortable with the third approach as quickly as possible.

• Solution 1.
Let’s use the first form of the Chain Rule above:
\begin{align*}\Big[ f\Big(g(x)\Big)\Big]' &= f'\Big(g(x)\Big) \cdot g'(x) \\[5px] &=\text{[derivative of the outer function, evaluated at the inner function] } \\[5px] &\qquad \times \text{ [derivative of the inner function]}
\end{align*}
In Compound Functions Example 1 above, we recast this function as the composition $f\Big(g(t)\Big)$ where the outer function $f(u)=A\cos(u)$ and the inside function $u = g(t)=bt.$

Then $f'(u) = -A\sin(u)$ and $g'(t) = b.$
Hence
\begin{align*}
\dfrac{df}{dt} = \Big[ f\Big(g(t)\Big)\Big]' &= -A\sin(u) \cdot b \\[8px] &= -Ab \sin(bt) \quad \cmark
\end{align*}

• Solution 2.
Let’s use the second form of the Chain Rule above:
\[\dfrac{dy}{dt} = \dfrac{dy}{du} \cdot \dfrac{du}{dt} \] We have $y = A\cos(u)$ and $u = bt.$

Then $\dfrac{dy}{du} = -A\sin(u)$ and $\dfrac{du}{dt} = b.$ Hence
\begin{align*}
\dfrac{dy}{dt} &= -A\sin(u) \cdot b \\[8px] &= -Ab\sin(bt) \quad \cmark
\end{align*}

• Solution 3.
With some experience, you won’t introduce a new variable like $u = \dots$ as we did above.

Instead, you’ll think something like: “The function is $A\cos(\text{some stuff}).$ The derivative is thus $-A\sin(\text{that same stuff}),$ times the derivative of that stuff.”
\[\dfrac{df}{dt} = \left[\dfrac{df}{d\text{(stuff)}}\text{, with the same stuff inside} \right] \times \dfrac{d}{dt}\text{(stuff)}\] \begin{align*}
f(t) &= A\cos(\text{stuff}); \quad \text{(stuff)} = bt \\[8px] \text{Then}\phantom{f(t)= }\\
\dfrac{df}{dt} &= -A\sin(\text{stuff}) \cdot \dfrac{d}{dt}(bt) \\[8px] &= -A\sin(bt) \cdot b \\[8px] &= -Ab\sin(bt) \quad \cmark
\end{align*}
Note: You’d never actually write “stuff = ….” Instead just hold in your head what that “stuff” is, and proceed to write down the required derivatives. We’ll make this more formal immediately below this Example.
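
If you want to convince yourself numerically, here is a small Python sketch. The values `A = 2.5` and `b = 3.0`, the sample time `t0`, and the function names are all hypothetical choices made only for this check.

```python
import math

A, b = 2.5, 3.0  # hypothetical constants, chosen only for illustration


def f(t):
    return A * math.cos(b * t)


def df_dt(t):
    """Chain Rule result: -A b sin(bt)."""
    return -A * b * math.sin(b * t)


def df_dt_missing_term(t):
    """What you get if you forget the derivative of the inner function."""
    return -A * math.sin(b * t)


t0, h = 0.7, 1e-6
numeric = (f(t0 + h) - f(t0 - h)) / (2 * h)  # symmetric difference quotient

print("numeric estimate:  ", numeric)
print("with Chain Rule:   ", df_dt(t0))
print("missing the factor:", df_dt_missing_term(t0))
```

The numerical estimate should agree with the Chain Rule result, while the version without the factor of $b$ is visibly off.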

What’s chained in the Chain Rule?

Let’s use the preceding Example to both explain why the Chain Rule has the name it does, and also to justify our use of “stuff” in quickly reasoning our way through finding the derivative of even the most complex functions.

First, the “Chain Rule” has the name it does because compositions of functions can be thought of as “chains” of functions, and the Chain Rule provides our way to differentiate these functions.

Consider the example from immediately above. The function $f(t)=A\cos(bt)$ can be thought of as a chain of functions:
[Figure: The function $f(t) = A\cos(bt)$ drawn as three links in a chain. First link: “input something,” labeled $t.$ Second link: “multiply it by b,” labeled $f_1(t) = bt.$ Third link: “take the cosine of that, and then multiply by A,” labeled $f_2(u) = A\cos(u).$]

The third, final link in this chain is the most “outside” procedure we apply, whereas the first link is the innermost piece of the function. We can write the entire procedure we are using with function and box notation as $f(\Box) = f_2\Big(f_1(\Box)\Big)$ where $f_1(\Box) = b\Box,$ and $f_2(\Box) = A\cos(\Box).$ We typically describe functions like this as “nested” – reminiscent of nesting dolls known as Matryoshka – with one function inside another (potentially inside another, inside another …).

[Image: nested dolls, from Wikimedia Commons.]

To differentiate this chain, we start from the outside and work our way inward until we hit something that can be considered a function. We’ll use a downward pointing arrow to indicate our focus as we work our way along:
\[f(t) = \buildrel \downarrow \over {A} \cos(bt)\] We don’t have to worry about the constant multiple A on the very outside here because of the derivative property $\Big(kf(x)\Big)' = kf'(x)$. That is, we know that A will simply appear in front of the derivative in the same way it appears in front of the original function here.

So let’s move the arrow to the right.
\[f(t) = A \buildrel \downarrow \over {\cos}(bt)\] Ah, now we are looking at something where taking the derivative actually changes the function: cosine changes to -sine when we take the derivative. So, let’s imagine covering the stuff up on the inside of this function so we can apply the derivative rules that we know. In the following equations we literally use a gray box to “cover up” what’s inside, as you can imagine doing in your head. If you tap each box you’ll reveal what’s underneath, and then tap again to hide it. But the key point is that for the moment what’s underneath doesn’t matter and so we can leave it covered-up:
\[ A\cos'(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle) = -A\sin(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle) \times \dots\] Here’s a crucial point: whether we use $\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle$ or call the inner function g, or hold it in our minds as “stuff,” according to the Chain Rule we first take the initial derivative of the outer function with respect to that inner function:
\begin{align*}
f(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle) = A\cos(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle) &\implies f'(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle)= \dfrac{df}{d\,\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle} = -A\sin(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle) \\[8px] f\big(g(t)\big) = A\cos\big(g(t)\big) &\implies f'\big(g(t)\big) = \dfrac{df}{dg} = -A\sin\big(g(t)\big) \\[8px] f(\text{stuff}) = A\cos(\text{stuff}) &\implies f'(\text{stuff}) = \dfrac{df}{d(\text{stuff})} = -A\sin(\text{stuff})
\end{align*}
That is, the first term of the Chain Rule is always the derivative of the outside function with respect to whatever is inside it, no matter what you call it. That’s why we can “get away with” just holding it as stuff (or however you want to think about it) in our heads. And the argument of this outer function remains unchanged: $\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle,$ or g(t), or “stuff” remains unchanged and just gets plugged back in after we take this first derivative.

The crux of the Chain Rule is what happens next: now we must multiply this first Chain Rule term by the derivative of the inside function, meaning the stuff we covered up with $\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle$. In this case we covered up $bt.$ So we just continue using our arrow method, now looking at $bt$:
\[\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle = \buildrel \downarrow \over {b} t\] This b is just a constant, so it just carries along and we can move the arrow over:
\[\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle = b \buildrel \downarrow \over {t}\] Now when we take the derivative of t with respect to t, we just get 1. So the derivative of our inside “stuff” covered up by the box is just b. So, applying this Chain Rule term:
\[f'(t) = -A\sin(\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle)\cdot b\] We’re not quite done, since the original function didn’t have a $\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle$ in it and so we need to uncover that: $\toggle{\style{background-color:gray}{\Box}}{bt}\endtoggle = bt.$
\[f'(t) = -Ab\sin(bt) \quad \cmark \] However you approach using the Chain Rule, you should get the same result. The arrow / box covering method starts to come in handy when functions are deeply nested, since as you’ll see you just work your way down the chain. With enough practice, you’ll be able to differentiate quite complex-looking functions with just one or two lines of work. (We’ll get to that in a few screens.)
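
The idea of working your way along the chain, collecting one factor per link, can be sketched in Python as follows. This is only an illustration under our own assumptions: `differentiate_chain` is a hypothetical helper, the constants `A` and `b` are made-up values, and the code walks from the innermost link outward (the order in which values get computed) rather than outside-in as we did by hand. The product of the factors is the same either way.

```python
import math


def differentiate_chain(links, x):
    """Differentiate a chain of functions at x.

    `links` is a list of (function, derivative) pairs ordered from the
    innermost link to the outermost one. Each link contributes one factor:
    its own derivative, evaluated at the value fed into that link.
    """
    value = x
    derivative = 1.0
    for func, func_prime in links:
        derivative *= func_prime(value)  # this link's Chain Rule factor
        value = func(value)              # output becomes the next link's input
    return derivative


A, b = 2.5, 3.0  # hypothetical constants, chosen only for illustration

# f(t) = A cos(bt): first multiply by b, then take A cos of the result.
cosine_links = [
    (lambda t: b * t, lambda t: b),
    (lambda u: A * math.cos(u), lambda u: -A * math.sin(u)),
]

# s(x) = 1 / (1 + e^(-x)): negate, then form 1 + e^w, then take the reciprocal.
sigmoid_links = [
    (lambda x: -x, lambda x: -1.0),
    (lambda w: 1 + math.exp(w), lambda w: math.exp(w)),
    (lambda u: 1 / u, lambda u: -1 / u ** 2),
]

t0 = 0.7
print(differentiate_chain(cosine_links, t0), -A * b * math.sin(b * t0))
print(differentiate_chain(sigmoid_links, 0.0))
```

A longer chain just means a longer list of links; the loop itself does not change.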


Remember the Chain Rule term!
As you might imagine, the most common beginning student error is to forget to multiply by the derivative of the inner function. Indeed, at this very moment all over the world teachers and tutors are saying “Chain Rule!” to beginning students who’ve forgotten this term. (It’s also why many students before an exam write “Chain Rule!” on one hand to remind themselves to include this factor.) The “naive calculations” at the top of this screen illustrate this very error; with practice, you’ll start catching yourself when you make it, and quickly remember to multiply by this “missing” factor.


Quick practice, and what’s to come

On the next screen we’ll provide you with lots of basic practice with the Chain Rule. Since the rest of the course will depend on your ability to quickly compute correct derivatives, practicing now (and making as many errors as you need to, and you will almost certainly initially make some) is super-important. On the screen after that, we’ll address more complex problems: we simply extend the chain, and just keep going as we did above to find the derivative of quite complicated-looking functions.

To end this screen, let’s do some quick work on your ability to immediately notice where a Chain-Rule term is missing. We’ll treat similar problems in more depth on the next screen; we mean these to be fast, just so you can get used to basic usage of the Chain Rule.


The Upshot

  1. A compound (or composite) function is comprised of an outer function and an inner function.
  2. When we take the derivative of a compound function, we must use the Chain Rule.
    In Prime notation:
    \begin{align*}\Big[ f\Big(g(x)\Big)\Big]' &= f'\Big(g(x)\Big) \cdot g'(x) \\[5px] &=\text{[derivative of the outer function, evaluated at the inner function] } \\[5px] &\qquad \times \text{ [derivative of the inner function]}
    \end{align*}
    In Leibniz notation:
    \[\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx} \]
    In alternate notation:
    \[ \dfrac{d}{dx} (f \circ g)(x) = \dfrac{df}{dg}\cdot \dfrac{dg}{dx}\]
    And informally, the way you may quickly come to think about it:
    \[\dfrac{df}{dx} = \left[\dfrac{df}{d\text{(stuff)}}\text{, with the same stuff inside} \right] \times \dfrac{d}{dx}\text{(stuff)}\]


On the next screen, you’ll get lots of practice with basic Chain Rule problems before we move on to more complex ones.


Questions or comments about what’s on this screen, or any other Calculus questions? Visit the Forum and we’d love to help!