C.2 The Chain Rule

On the preceding screen we reviewed compound functions and developed a conceptual understanding of the Chain Rule using the ascending balloon scenario. In particular, we saw that to find the rate of change of the overall function $T(t) = T(E(t))$ with respect to the inner variable $t$, we simply multiplied the rate of change of the outer function $T(E)$ by the rate of change of the inner function $E(t)$. That straightforward multiplication is the Chain Rule.

Let's formalize that process, and then we'll work some initial problems to see how easy the Chain Rule is to use.

Formal Statement of the Chain Rule

Here is a formal statement of the Chain Rule:

Chain Rule, with prime notation:

For two functions $f(x)$ and $g(x)$, if $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$, then

$$[f(g(x))]' = f'(g(x)) \cdot g'(x) = \left[\text{derivative of the outer function, evaluated at the inner function}\right] \times \left[\text{derivative of the inner function}\right]$$
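As a quick numerical sanity check of the prime-notation statement, the sketch below computes $f'(g(x)) \cdot g'(x)$ and compares it with a central-difference estimate of the slope of $f(g(x))$. The choice $f = \sin$, $g(x) = x^2$, the point `x0`, the step size `h`, and the helper names are illustrative assumptions, not part of the text.

```python
import math

def chain_rule(f_prime, g, g_prime, x):
    """Outer derivative evaluated at the inner function, times the inner derivative."""
    return f_prime(g(x)) * g_prime(x)

def central_difference(func, x, h=1e-6):
    """Estimate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative choice (not from the text): outer f(u) = sin(u), inner g(x) = x**2.
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

x0 = 1.3
exact = chain_rule(math.cos, g, g_prime, x0)                 # cos(x0**2) * 2*x0
approx = central_difference(lambda x: math.sin(g(x)), x0)   # slope of sin(x**2)
print(exact, approx)   # the two values agree closely
```

The difference quotient knows nothing about the Chain Rule; it only measures the slope of the composed function, which is why agreement with `chain_rule` is a meaningful check.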

The Chain Rule is often written with Leibniz notation instead, because it's easy to remember and is one case where thinking of the derivative as a fraction actually comes in handy.

Chain Rule, with Leibniz notation:

If $y$ is a differentiable function of $u$ and $u$ is a differentiable function of $x$, then

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
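The "fraction" intuition is exact for finite differences: whenever $\Delta u \neq 0$, the difference quotients satisfy $\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}$, because $\Delta u$ cancels. The sketch below checks this with the functions used in Example #1 ($y = u^3$, $u = 2x$); the point `x0`, the step `dx`, and the function names are illustrative assumptions. Shrinking `dx` drives each quotient toward the corresponding derivative.

```python
def y_of_u(u):
    return u ** 3

def u_of_x(x):
    return 2 * x

x0, dx = 1.0, 1e-3
du = u_of_x(x0 + dx) - u_of_x(x0)                    # change in the inner variable
dy = y_of_u(u_of_x(x0 + dx)) - y_of_u(u_of_x(x0))    # change in the output

lhs = dy / dx                  # Δy/Δx
rhs = (dy / du) * (du / dx)    # (Δy/Δu)·(Δu/Δx): Δu "cancels"
print(lhs, rhs)                # equal up to rounding; both approach 24*x0**2 as dx shrinks
```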

Using the alternate notation $(f \circ g)(x)$ for the compound function $f(g(x))$, the Chain Rule is

Chain Rule, alternate notation:

$$\frac{d}{dx}(f \circ g)(x) = \frac{df}{dg} \cdot \frac{dg}{dx}$$

If we think of the derivative as a fraction of vanishing quantities (as Leibniz did), then the statement of the Chain Rule seems almost obvious: when we multiply two fractions where the same factor ($dg$) appears in the numerator of one and the denominator of the other, that factor "cancels," and we are left with

$$\frac{df}{dg} \cdot \frac{dg}{dx} = \frac{df}{dx}$$

This is probably what you did in your head when you considered the ascending balloon scenario on the preceding screen: $\frac{dT}{dt} = \frac{dT}{dE} \cdot \frac{dE}{dt}$. The units cancel correctly, as they must, giving us the result we were after for the rate of change of temperature $T$ with respect to time $t$:

$$\frac{dT}{dt} = \left(0.004\ \frac{\text{m}}{\text{s}}\right)\left(-0.01\ \frac{{}^\circ\text{F}}{\text{m}}\right) = -0.00004\ \frac{{}^\circ\text{F}}{\text{s}}$$
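The same bookkeeping can be checked numerically. The linear models below are hypothetical stand-ins chosen only to reproduce the two rates quoted above (elevation rising at $0.004$ m/s, temperature changing by $-0.01\ {}^\circ$F per meter); the starting values `E0` and `T0` are invented for illustration and do not affect the rate.

```python
# Hypothetical linear models matching the quoted rates (E0 and T0 are made up).
E0, T0 = 0.0, 70.0     # starting elevation (m) and temperature (°F)
dE_dt = 0.004          # rate of climb, m/s
dT_dE = -0.01          # temperature change per meter, °F/m

def E(t):
    """Elevation (m) after t seconds."""
    return E0 + dE_dt * t

def T_of_E(e):
    """Temperature (°F) at elevation e meters."""
    return T0 + dT_dE * (e - E0)

def T(t):
    """Temperature as a function of time: the composition T(E(t))."""
    return T_of_E(E(t))

chain = dT_dE * dE_dt                      # Chain Rule: dT/dE · dE/dt, in °F/s
numeric = (T(101.0) - T(100.0)) / 1.0      # difference quotient of the composition
print(chain, numeric)                      # both about -0.00004 °F/s
```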

WARNING: While for first derivatives we can think of differentials as canceling in this way, we cannot extend this reasoning to second- and other higher-order derivatives. Nor can we apply other properties of fractions to derivatives. Indeed, even this notion of canceling differentials remained controversial until the 1960s, quite late in the development of Calculus, given that Newton and Leibniz were alive in the 1600s.
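A small worked example shows why the canceling trick fails for second derivatives. Take, as an illustrative choice not drawn from the text, $y = u^2$ with $u = x^2$, so that $y = x^4$ directly:

$$
\begin{aligned}
\frac{d^2y}{dx^2} &= \frac{d^2}{dx^2}\left(x^4\right) = 12x^2, \\
\frac{d^2y}{du^2} \cdot \left(\frac{du}{dx}\right)^2 &= 2 \cdot (2x)^2 = 8x^2 \neq 12x^2.
\end{aligned}
$$

Naively "canceling" the $du^2$ would predict $8x^2$. The correct second-derivative formula contains an extra term that the fraction picture misses, $\frac{d^2y}{dx^2} = \frac{d^2y}{du^2}\left(\frac{du}{dx}\right)^2 + \frac{dy}{du}\,\frac{d^2u}{dx^2}$, and here that term is $2x^2 \cdot 2 = 4x^2$, restoring $8x^2 + 4x^2 = 12x^2$.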

Using the Chain Rule

Enough with the abstract; let's get to some Examples to show how we use the Chain Rule routinely in practice. We'll begin by resolving the first quick examples we introduced at the top of the preceding page to illustrate why we need the Chain Rule at all.

Chain Rule Example #1: $f(x) = (2x)^3$

Use the Chain Rule to differentiate $f(x) = (2x)^3$.

Note: As we saw on the preceding screen without using the Chain Rule, since $f(x) = (2x)^3 = 8x^3$, we know immediately that the answer is $f'(x) = 3 \cdot 8x^{3-1} = 24x^2$.

We'll solve this using three different approaches — but we encourage you to become comfortable with the third approach as quickly as possible, because that's the one you'll use to compute derivatives quickly as the course progresses.

• Solution 1.

Let's use the first form of the Chain Rule above:

$$[f(g(x))]' = f'(g(x)) \cdot g'(x) = \left[\text{derivative of the outer function, evaluated at the inner function}\right] \times \left[\text{derivative of the inner function}\right]$$

We have the outer function

$$f(u) = u^3,$$

and the inner function

$$u = g(x) = 2x.$$

Then $f'(u) = 3u^2$ and $g'(x) = 2$. ($\leftarrow$ Notice that factor of 2!)

Hence

$$f'(x) = 3u^2 \cdot 2 = 3(2x)^2 \cdot 2 = 3(4x^2) \cdot 2 = 24x^2 \quad \checkmark$$

Ah: our naive approach at the start of the preceding page was missing that very factor of 2 that comes from the Chain Rule as the derivative of the inner function. As we said, the Chain Rule makes this all work easily.
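A quick numerical comparison makes the missing factor of 2 concrete. The sketch below compares the naive derivative $3(2x)^2$, the Chain Rule result $3(2x)^2 \cdot 2$, and a central-difference estimate of the slope of $(2x)^3$; the point `x0` and the helper name `central_difference` are illustrative assumptions.

```python
def f(x):
    return (2 * x) ** 3

def central_difference(func, x, h=1e-6):
    """Symmetric difference-quotient estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
naive = 3 * (2 * x0) ** 2        # forgets the derivative of the inner function
chain = 3 * (2 * x0) ** 2 * 2    # outer derivative at 2x, times (2x)' = 2
numeric = central_difference(f, x0)

print(naive, chain, numeric)     # naive is half of chain; numeric agrees with chain
```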

• Solution 2.

Let's use the second form of the Chain Rule above:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$

We have $y = u^3$ and $u = 2x$.

Then $\frac{dy}{du} = 3u^2$ and $\frac{du}{dx} = 2$ (again, there's that factor of 2!). Hence

$$\frac{dy}{dx} = 3u^2 \cdot 2 = 3(2x)^2 \cdot 2 = 3(4x^2) \cdot 2 = 24x^2 \quad \checkmark$$

• Solution 3.

With some experience, you won't introduce a new variable like $u = \dots$ as we did in Solutions 1 and 2. Instead, you'll think something like: "The function is some stuff to the 3rd power. So the derivative is 3 times that same stuff to the 2nd power, times the derivative of that stuff."

$$\frac{df}{dx} = \left[\frac{df}{d(\text{stuff})}, \text{ with the same stuff inside}\right] \times \frac{d}{dx}(\text{stuff})$$

$f(x) = (\text{stuff})^3$, where stuff $= 2x$. Then

$$f'(x) = \frac{df}{dx} = 3(\text{stuff})^2 \cdot \frac{d}{dx}(2x) = 3(2x)^2 \cdot 2 = 24x^2 \quad \checkmark$$

Note: You'd never actually write "stuff = ...." Instead just hold in your head what that "stuff" is, and proceed to write down the required derivatives.


Next, let's consider one of the functions from Compound Functions Example 1 on the preceding screen.

Chain Rule Example #2: $f(t) = A\cos(bt)$

Given $f(t) = A\cos(bt)$, find $\frac{df}{dt}$.

Solution.

We'll again solve this using three different approaches, and again encourage you to become comfortable with the third approach as quickly as possible.

• Solution 1.

Let's use the first form of the Chain Rule above:

$$[f(g(x))]' = f'(g(x)) \cdot g'(x) = \left[\text{derivative of the outer function, evaluated at the inner function}\right] \times \left[\text{derivative of the inner function}\right]$$

In Compound Functions Example 1, we recast this function as the composition $f(g(t))$, where the outer function is $f(u) = A\cos(u)$ and the inner function is $u = g(t) = bt$.

Then $f'(u) = -A\sin(u)$ and $g'(t) = b$.

Hence

$$\frac{df}{dt} = [f(g(t))]' = -A\sin(u) \cdot b = -Ab\sin(bt) \quad \checkmark$$

• Solution 2.

Let's use the second form of the Chain Rule above:

$$\frac{dy}{dt} = \frac{dy}{du} \cdot \frac{du}{dt}$$

We have $y = A\cos(u)$ and $u = bt$.

Then $\frac{dy}{du} = -A\sin(u)$ and $\frac{du}{dt} = b$. Hence

$$\frac{dy}{dt} = -A\sin(u) \cdot b = -Ab\sin(bt) \quad \checkmark$$

• Solution 3.

With some experience, you won't introduce a new variable like $u = \dots$ as we did above. Instead, you'll think something like: "The function is $A\cos(\text{some stuff})$. The derivative is thus $-A\sin(\text{that same stuff})$, times the derivative of that stuff."

$$\frac{df}{dt} = \left[\frac{df}{d(\text{stuff})}, \text{ with the same stuff inside}\right] \times \frac{d}{dt}(\text{stuff})$$

$f(t) = A\cos(\text{stuff})$, where stuff $= bt$. Then

$$f'(t) = \frac{df}{dt} = -A\sin(\text{stuff}) \cdot \frac{d}{dt}(bt) = -A\sin(bt) \cdot b = -Ab\sin(bt) \quad \checkmark$$

Note: You'd never actually write "stuff = ...." Instead just hold in your head what that "stuff" is, and proceed to write down the required derivatives. We'll make this more formal immediately below this Example.
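To double-check Example #2 numerically, the sketch below compares the Chain Rule result $-Ab\sin(bt)$ with a central-difference estimate of the slope of $A\cos(bt)$. The values of `A`, `b`, and `t0` are arbitrary illustrative choices.

```python
import math

A, b = 2.0, 3.0   # illustrative constants

def f(t):
    return A * math.cos(b * t)

def f_prime(t):
    """Chain Rule result: outer derivative -A sin(bt), times the inner derivative b."""
    return -A * b * math.sin(b * t)

def central_difference(func, t, h=1e-6):
    """Symmetric difference-quotient estimate of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

t0 = 0.7
print(f_prime(t0), central_difference(f, t0))   # agree to many decimal places
```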

What's chained in the Chain Rule?

Let's use the preceding Example both to explain why the Chain Rule has the name it does and to justify our use of "stuff" in quickly reasoning our way to the derivative of even the most complex functions.

First, the "Chain Rule" has the name it does because compositions of functions can be thought of as "chains" of functions, and the Chain Rule provides the means to differentiate these functions.

Consider the example from immediately above. The function $𝑓 (𝑡) = 𝐴 c o s (𝑏 𝑡)$ can be thought of as a chain of functions:

[Figure: The Chain Rule illustrated as links in a chain for the function $f(t) = A\cos(bt)$. Link 1: "input something" (the input $t$). Link 2: "multiply it by $b$," labeled $f_1(t) = bt$. Link 3: "take the cosine of that, and then multiply by $A$," labeled $f_2(u) = A\cos(u)$.]

[Figure: Images of nesting dolls, from Wikimedia Commons.]

The third and final link in this chain is the most "outside" procedure we apply, whereas the first link is the innermost piece of the function. Using function and box notation, we can write the entire procedure as $f(\square) = f_2(f_1(\square))$, where $f_1(\square) = b\,\square$ and $f_2(\square) = A\cos(\square)$. We typically describe compound functions like this as "nested," reminiscent of the nesting dolls known as Matryoshka, with one function inside another (potentially inside another, inside another, ...).
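The "links" picture translates directly into code: each link is a function, and the compound function applies them from the innermost link outward. In this sketch the names `f1`, `f2`, `compose`, and the constants `A`, `b` are illustrative assumptions.

```python
import math
from functools import reduce

A, b = 2.0, 3.0   # illustrative constants

def f1(t):
    """First (innermost) link: multiply the input by b."""
    return b * t

def f2(u):
    """Outer link: take the cosine, then multiply by A."""
    return A * math.cos(u)

def compose(*links):
    """Chain the links, applying the first one listed to the input first."""
    return lambda x: reduce(lambda value, link: link(value), links, x)

f = compose(f1, f2)                  # f(t) = f2(f1(t)) = A cos(bt)
t0 = 0.7
print(f(t0), A * math.cos(b * t0))   # identical values
```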

To differentiate this chain, we start from the outside and work our way inward until we reach something that is actually a function of its input, rather than a constant multiple. We'll use a downward-pointing arrow to mark our focus as we work our way along:

$$f(t) = {\downarrow}A\cos(bt)$$

We don't have to worry about the constant multiple $A$ on the very outside because of the derivative property $(kf(x))' = kf'(x)$. That is, $A$ will simply appear in front of the derivative just as it appears in front of the original function.

So let's move the arrow to the right:

$$f(t) = A\,{\downarrow}\cos(bt)$$

Ah, now we are looking at something where taking the derivative actually changes the function: cosine changes to negative sine when we take the derivative. So let's imagine covering up the stuff on the inside of this function so we can apply the derivative rules we know. In the following equations we use a gray box $\blacksquare$ to "cover up" what's inside (here, $bt$), as you can imagine doing yourself as you proceed. The key point is that for the moment what's underneath doesn't matter, so we can leave it covered up:

$$A\cos'(\blacksquare) = -A\sin(\blacksquare) \times \dots$$

Here's a crucial point: whether we use $\blacksquare$, call the inner function $g$, or hold it in our minds as "stuff," according to the Chain Rule we first take the derivative of the outer function with respect to that inner function. You can view the process in any of the following ways:

$$f(\blacksquare) = A\cos(\blacksquare) \implies f'(\blacksquare) = \frac{df}{d\blacksquare} = -A\sin(\blacksquare)$$

$$f(g(t)) = A\cos(g(t)) \implies f'(g(t)) = \frac{df}{dg} = -A\sin(g(t))$$

$$f(\text{stuff}) = A\cos(\text{stuff}) \implies f'(\text{stuff}) = \frac{df}{d(\text{stuff})} = -A\sin(\text{stuff})$$

That is, the first term of the Chain Rule is always the derivative of the outside function with respect to whatever is inside it, no matter what you call it. That's why we can "get away with" just holding it as "stuff" (or however you want to think about it) in our heads. And the argument of this outer function remains unchanged: $\blacksquare$, or $g(t)$, or "stuff" just gets plugged back in after we take this first derivative.

The crux of the Chain Rule is what happens next: we must multiply this first Chain Rule term by the derivative of the inside function, meaning the stuff we covered up with the gray box $\blacksquare$. In this case we covered up $bt$, so we just continue using our arrow method, now looking at $bt$:

$$\blacksquare = {\downarrow}bt$$

This $b$ is just a constant, so it carries along and we can move the arrow over:

$$\blacksquare = b\,{\downarrow}t$$

Now when we take the derivative of $t$ with respect to $t$, we just get 1. So the derivative of the inside "stuff" covered up by the box is just $b$. Applying this Chain Rule term:

$$f'(t) = -A\sin(\blacksquare) \cdot b$$

We're not quite done, since the original function didn't have a $\blacksquare$ in it, so we need to uncover it: $\blacksquare = bt$. Hence

$$f'(t) = -Ab\sin(bt) \quad \checkmark$$

Whichever approach you use to apply the Chain Rule, you must get the same result. The arrow/box covering method starts to come in handy when functions are deeply nested, since, as you'll see, you just work your way down the chain. With enough practice, you'll be able to differentiate quite complex-looking functions with just one or two lines of work. (We'll get to that in a few screens.)
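The "work your way down the chain" idea can be sketched in code for a chain of any length: walk the links from the inside out, and at each link multiply the running derivative by that link's derivative evaluated at whatever is currently inside it (the covered-up box). This minimal sketch assumes each link is supplied as a (function, derivative) pair; the helper name `differentiate_chain` and the constants are hypothetical, and the example reuses $A\cos(bt)$, splitting the last link of the figure into "take the cosine" and "multiply by $A$."

```python
import math

A, b = 2.0, 3.0   # illustrative constants

# Each link is a (function, derivative) pair, listed from innermost to outermost.
links = [
    (lambda t: b * t, lambda t: b),       # multiply by b
    (math.cos, lambda u: -math.sin(u)),   # take the cosine
    (lambda v: A * v, lambda v: A),       # multiply by A
]

def differentiate_chain(links, x):
    """Return (value, derivative) of the chained function at x."""
    value, derivative = x, 1.0
    for func, func_prime in links:
        derivative *= func_prime(value)   # this link's derivative, evaluated at what is inside it
        value = func(value)               # this link's output becomes the next link's input
    return value, derivative

t0 = 0.7
value, slope = differentiate_chain(links, t0)
print(slope, -A * b * math.sin(b * t0))   # the two numbers agree
```

Adding more links to the list handles deeper nesting without changing the loop, which mirrors how the arrow method simply continues down the chain.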


Remember the Chain Rule term!

For most beginning students, the most common error on exams is to forget to multiply by the derivative of the inner function.

Indeed, at this very moment all over the world teachers and tutors are saying "Chain Rule!" to beginning students who've forgotten this term. (It's also why, right before an exam, many students will write "Chain Rule!" on one hand to remind themselves to include this factor.) The "naive calculations" at the start of the preceding page illustrate this very error. With practice, you'll start catching yourself when you make it, and quickly remember to multiply by this "missing" Chain Rule factor.

Quick practice, and what's to come

On the next screen we'll give you lots of basic practice with the Chain Rule. Since the rest of the course depends on your ability to compute correct derivatives quickly, practicing now (and making as many errors as you need to; you'll almost certainly make some at first) is super-important. On the screen after that, we'll address more complex problems: we simply extend the chain and keep going as we did above to find the derivatives of quite complicated-looking functions.

To end this screen, let's do some quick work on your ability to notice immediately where a Chain Rule term is missing. We'll treat similar problems in more depth on the next screen; these are meant to be fast, just so you can get used to basic usage of the Chain Rule.

CHECK QUESTION 1: $f(x) = \sin(5x)$

CHECK QUESTION 2: $f(t) = e^{t^5}$

CHECK QUESTION 3: $g(w) = (w^3 - 6w^2 + 5w)^9$

The Upshot

A compound (or composite) function is composed of an outer function and an inner function.
When we take the derivative of a compound function, we must use the Chain Rule.
In prime notation:
$$[f(g(x))]' = f'(g(x)) \cdot g'(x) = \left[\text{derivative of the outer function, evaluated at the inner function}\right] \times \left[\text{derivative of the inner function}\right]$$
In Leibniz notation:
$$\frac{dy}{dt} = \frac{dy}{du} \cdot \frac{du}{dt}$$
In alternate notation:
$$\frac{d}{dx}(f \circ g)(x) = \frac{df}{dg} \cdot \frac{dg}{dx}$$
And informally, the way you may quickly come to think about it:
$$\frac{df}{dx} = \left[\frac{df}{d(\text{stuff})}, \text{ with the same stuff inside}\right] \times \frac{d}{dx}(\text{stuff})$$

On the next screen, you'll get lots of practice with basic Chain Rule problems before we move on to more complex ones.
