C.2 The Chain Rule

On the preceding screen we reviewed compound functions and developed a conceptual understanding of the Chain Rule using the ascending balloon scenario. In particular, we saw that to find the rate-of-change of the overall function 𝑇(𝑡) =𝑇(𝐸(𝑡)) with respect to the inner variable t, we simply multiplied the rate-of-change of the outer function 𝑇(𝐸) by the rate-of-change of the inner function 𝐸(𝑡). That straightforward multiplication is the Chain Rule.

Let's formalize that process, and then we'll work some initial problems to see how easy the Chain Rule is to use.

Formal Statement of the Chain Rule

Here is a formal statement of the Chain Rule:

Chain Rule, with prime notation:

For two functions 𝑓(𝑥) and 𝑔(𝑥), if 𝑔 is differentiable at 𝑥 and 𝑓 is differentiable at 𝑔(𝑥), then

$$[f(g(x))]' = f'(g(x))\,g'(x) = [\text{derivative of the outer function, evaluated at the inner function}] \times [\text{derivative of the inner function}]$$

The Chain Rule is often written with Leibniz notation instead, because it's easy to remember and is one case where thinking of the derivative as a fraction actually comes in handy.

Chain Rule, with Leibniz notation:

For two differentiable functions 𝑦(𝑥) and 𝑢(𝑥),

$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}$$

Using the alternate notation $(f \circ g)(x)$ for the compound function $f(g(x))$, the Chain Rule is

Chain Rule, alternate notation:

$$\frac{d}{dx}(f \circ g)(x) = \frac{df}{dg}\cdot\frac{dg}{dx}$$

If we think of the derivative as a fraction of vanishing quantities (as Leibniz did), then the statement of the Chain Rule seems almost obvious. As you can see, when we multiply two fractions that share a common factor in the numerator and denominator (𝑑𝑔), then that factor "cancels" and we are left with

$$\frac{df}{dg}\cdot\frac{dg}{dx} = \frac{df}{dx}$$

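This is probably what you did in your head when you considered the ascending balloon scenario on the preceding screen: $\dfrac{dT}{dt} = \dfrac{dT}{dE}\cdot\dfrac{dE}{dt}$. As we noted above, the units cancel correctly, as they must, giving us the result we were after for the rate of change of temperature $T$ with respect to time $t$: $\dfrac{dT}{dt} = \left(0.004\,\tfrac{\text{m}}{\text{s}}\right)\left(0.01\,\tfrac{^\circ\text{F}}{\text{m}}\right) = 0.00004\,\tfrac{^\circ\text{F}}{\text{s}}$.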
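As a quick arithmetic check of that product, here is a minimal Python sketch. The variable names `dE_dt` and `dT_dE` are our own labels, assigned by reading the units in the text (m/s for elevation per unit time, F/m for temperature per unit elevation); they are not part of the lesson.

```python
import math

# Rates quoted in the text for the ascending-balloon scenario.
# The names are illustrative assumptions, matched to the units given.
dE_dt = 0.004  # rate of change of elevation E with time t, in m/s
dT_dE = 0.01   # rate of change of temperature T with elevation E, in F per m

# Chain Rule: dT/dt = (dT/dE) * (dE/dt).
# The meters cancel, leaving temperature units per second.
dT_dt = dT_dE * dE_dt

print(f"dT/dt = {dT_dt:.5f} F per second")
```

The multiplication is all there is to it: one rate times the other, with the shared unit canceling.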

WARNING: While for first derivatives we can think of differentials as canceling in this way, we cannot extend this reasoning to second- and other higher-order derivatives. We also cannot apply other properties of fractions to derivatives. Indeed, even this notion of canceling differentials was quite controversial until the 1960s, which is remarkably late in the development of Calculus given that Newton and Leibniz were alive in the 1600s.
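To see concretely why the "canceling" picture breaks down for second derivatives, here is a small worked check. The functions $y = u^2$ and $u = x^2$ are our own illustrative choice, not taken from the lesson, and the final line uses the Product Rule, which you may not have met yet.

$$y = u^2, \quad u = x^2 \quad\Rightarrow\quad y = x^4, \quad \frac{d^2y}{dx^2} = 12x^2$$

$$\text{Naive "canceling": } \frac{d^2y}{du^2}\cdot\frac{d^2u}{dx^2} = 2 \cdot 2 = 4 \neq 12x^2$$

$$\text{Correct: } \frac{d^2y}{dx^2} = \frac{d^2y}{du^2}\left(\frac{du}{dx}\right)^2 + \frac{dy}{du}\cdot\frac{d^2u}{dx^2} = 2(2x)^2 + (2x^2)(2) = 12x^2$$

The fraction-like shortcut gives the wrong answer here, which is why we restrict it to first derivatives.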

Using the Chain Rule

Enough with the abstract; let's get to some Examples to show how we use the Chain Rule routinely in practice. We'll begin by resolving the first quick examples we introduced at the top of the preceding page to illustrate why we need the Chain Rule at all.

Chain Rule Example #1: $f(x) = (2x)^3$

Use the Chain Rule to differentiate $f(x) = (2x)^3$.

Note: As we saw above without using the Chain Rule, since $f(x) = 8x^3$ we know immediately that the answer is $f'(x) = 3 \cdot 8x^{3-1} = 24x^2$.

We'll solve this using three different approaches — but we encourage you to become comfortable with the third approach as quickly as possible, because that's the one you'll use to compute derivatives quickly as the course progresses.

• Solution 1.

Let's use the first form of the Chain Rule above:

$$[f(g(x))]' = f'(g(x))\,g'(x) = [\text{derivative of the outer function, evaluated at the inner function}] \times [\text{derivative of the inner function}]$$

We have the outer function $f(u) = u^3$, and the inner function $u = g(x) = 2x$.

Then $f'(u) = 3u^2$, and $g'(x) = 2$. (Notice that factor of 2!)

Hence

$$f'(x) = 3u^2 \cdot 2 = 3(2x)^2 \cdot 2 = 3(4x^2) \cdot 2 = 24x^2$$

Ah: our naive approach at the start of the preceding page was missing that very factor of 2 that comes from the Chain Rule as the derivative of the inner function. As we said, the Chain Rule makes this all work easily.

• Solution 2.

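Let's use the second form of the Chain Rule above:

$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}$$

We have $y = u^3$ and $u = 2x$.

Then $\dfrac{dy}{du} = 3u^2$, and $\dfrac{du}{dx} = 2$ (again, there's that factor of 2!). Hence

$$\frac{dy}{dx} = 3u^2 \cdot 2 = 3(2x)^2 \cdot 2 = 3(4x^2) \cdot 2 = 24x^2$$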

• Solution 3.

With some experience, you won't introduce a new variable like $u = 2x$ as we did in Solutions 1 and 2. Instead, you'll think something like: "The function is some stuff to the 3rd power. So the derivative is 3 times that same stuff to the 2nd power, times the derivative of that stuff."

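$$\frac{df}{dx} = \left[\frac{df}{d(\text{stuff})}\text{, with the same stuff inside}\right] \times \frac{d}{dx}(\text{stuff})$$

$$f(x) = (\text{stuff})^3; \quad \text{stuff} = 2x$$

Then

$$f'(x) = \frac{df}{dx} = 3(\text{stuff})^2 \cdot \left(\frac{d}{dx}(2x)\right) = 3(2x)^2 \cdot 2 = 24x^2$$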

Note: You'd never actually write "stuff = ...." Instead just hold in your head what that "stuff" is, and proceed to write down the required derivatives.
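If you'd like to convince yourself numerically that the factor of 2 really belongs there, the following Python sketch compares a finite-difference estimate of the slope of $(2x)^3$ with the Chain Rule result and with the "naive" result that omits the factor. The helper names (`central_difference`, `chain_rule_derivative`, `naive_derivative`) and the sample points are our own, chosen only for illustration.

```python
def f(x):
    """The compound function from Example #1: f(x) = (2x)^3."""
    return (2 * x) ** 3


def central_difference(func, x, h=1e-5):
    """Estimate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def chain_rule_derivative(x):
    """3 * (stuff)^2 times the derivative of the stuff, with stuff = 2x."""
    return 3 * (2 * x) ** 2 * 2


def naive_derivative(x):
    """The same expression with the Chain Rule factor of 2 left out."""
    return 3 * (2 * x) ** 2


# Columns: x, numerical slope, Chain Rule result, naive result.
for x in (0.5, 1.0, 2.0):
    print(x, central_difference(f, x), chain_rule_derivative(x), naive_derivative(x))
```

At each sample point the numerical slope should agree closely with the Chain Rule column, while the naive column comes out at half that value.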


Let's consider next one of the functions from our Compound Functions Example 1 on the preceding screen.

Chain Rule Example #2: $f(t) = A\cos(bt)$

Given $f(t) = A\cos(bt)$, find $\dfrac{df}{dt}$.

Solution.

We'll again solve this using three different approaches, and again encourage you to become comfortable with the third approach as quickly as possible.

• Solution 1.

Let's use the first form of the Chain Rule above:

$$[f(g(x))]' = f'(g(x))\,g'(x) = [\text{derivative of the outer function, evaluated at the inner function}] \times [\text{derivative of the inner function}]$$

In Compound Functions Example 1, we recast this function as the composition $f(g(t))$, where the outer function is $f(u) = A\cos(u)$ and the inside function is $u = g(t) = bt$.

Then $f'(u) = -A\sin(u)$ and $g'(t) = b$.

Hence

$$\frac{df}{dt} = [f(g(t))]' = -A\sin(u) \cdot b = -Ab\sin(bt)$$

• Solution 2.

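Let's use the second form of the Chain Rule above:

$$\frac{dy}{dt} = \frac{dy}{du}\cdot\frac{du}{dt}$$

We have $y = A\cos(u)$ and $u = bt$.

Then $\dfrac{dy}{du} = -A\sin(u)$ and $\dfrac{du}{dt} = b$. Hence

$$\frac{dy}{dt} = -A\sin(u) \cdot b = -Ab\sin(bt)$$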

• Solution 3.

With some experience, you won't introduce a new variable like $u = bt$ as we did above. Instead, you'll think something like: "The function is $A\cos(\text{some stuff})$. The derivative is thus $-A\sin(\text{that same stuff})$, times the derivative of that stuff."

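$$\frac{df}{dt} = \left[\frac{df}{d(\text{stuff})}\text{, with the same stuff inside}\right] \times \frac{d}{dt}(\text{stuff})$$

$$f(t) = A\cos(\text{stuff}); \quad \text{stuff} = bt$$

Then

$$f'(t) = \frac{df}{dt} = -A\sin(\text{stuff}) \cdot \frac{d}{dt}(bt) = -A\sin(bt) \cdot b = -Ab\sin(bt)$$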

Note: You'd never actually write "stuff = ...." Instead just hold in your head what that "stuff" is, and proceed to write down the required derivatives. We'll make this more formal immediately below this Example.
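The same kind of numerical sanity check works for this Example. In the Python sketch below, the values of `A` and `b`, the sample times, and the helper `central_difference` are arbitrary choices of ours made only so there is something concrete to compute.

```python
import math

A, b = 3.0, 2.0  # arbitrary sample constants, assumed only for this check


def f(t):
    """The compound function from Example #2: f(t) = A cos(bt)."""
    return A * math.cos(b * t)


def df_dt(t):
    """Chain Rule result: -A sin(bt), times the inner derivative b."""
    return -A * b * math.sin(b * t)


def central_difference(func, t, h=1e-6):
    """Estimate func'(t) with a symmetric difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)


# Columns: t, numerical slope, Chain Rule result.
for t in (0.0, 0.7, 1.5):
    print(t, central_difference(f, t), df_dt(t))
```

The two columns of slopes should match to several decimal places; dropping either the minus sign or the factor of `b` from `df_dt` makes the mismatch obvious.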

What's chained in the Chain Rule?

Let's use the preceding Example to both explain why the Chain Rule has the name it does, and also to justify our use of "stuff" in quickly reasoning our way through finding the derivative of even the most complex functions.

First, the "Chain Rule" has the name it does because compositions of functions can be thought of as "chains" of functions, and the Chain Rule provides the means to differentiate these functions.

Consider the example from immediately above. The function 𝑓(𝑡) =𝐴cos(𝑏𝑡) can be thought of as a chain of functions:

[Figure: links in a chain for the function $f(t) = A\cos(bt)$. First link: "input something," with the input $t$ beneath it. An arrow points to the second link: "multiply it by $b$," labeled $f_1(t) = bt$. An arrow points to the third link: "take the cosine of that, and then multiply by $A$," labeled $f_2(u) = A\cos(u)$.]

[Image: nested dolls, from Wikimedia Commons.]

The third, final link in this chain is the most "outside" procedure we apply, whereas the first link is the innermost piece of the function. We can write the entire procedure we are using with function and box notation as $f(\square) = f_2(f_1(\square))$, where $f_1(\square) = b\,\square$ and $f_2(\square) = A\cos(\square)$. We typically describe compound functions like this as "nested," reminiscent of the nesting dolls known as Matryoshka, with one function inside another (potentially inside another, inside another ...).

To differentiate this chain, we start from the outside and work our way inward until we hit something that can be considered a function. We'll use a downward-pointing arrow to indicate our focus as we work our way along:

$$f(t) = \overset{\downarrow}{A}\cos(bt)$$

We don't have to worry about the constant multiple $A$ on the very outside here because of the derivative property $(k\,f(x))' = k\,f'(x)$. That is, we know that $A$ will simply appear in front of the derivative in the same way it appears in front of the original function here.

So let's move the arrow to the right.

$$f(t) = A\,\overset{\downarrow}{\cos}(bt)$$

Ah, now we are looking at something where taking the derivative actually changes the function: cosine changes to -sine when we take the derivative. So, let's imagine covering up the stuff on the inside of this function so we can apply the derivative rules that we know. In the following equations we draw a box around what's inside to mark it as "covered up," as you can imagine doing yourself as you proceed along. The key point is that for the moment what's inside the box doesn't matter, so we can leave it covered up:

$$\left[A\cos\left(\boxed{bt}\right)\right]' = -A\sin\left(\boxed{bt}\right) \times \cdots$$

Here's a crucial point: whether we use $\boxed{bt}$, or call the inner function $g$, or hold it in our minds as "stuff," according to the Chain Rule we first take the derivative of the outer function with respect to that inner function. You can view the process in any of the following ways:

$$f\left(\boxed{bt}\right) = A\cos\left(\boxed{bt}\right) \quad\Rightarrow\quad f'\left(\boxed{bt}\right) = \frac{df}{d\,\boxed{bt}} = -A\sin\left(\boxed{bt}\right)$$

$$f(g(t)) = A\cos(g(t)) \quad\Rightarrow\quad f'(g(t)) = \frac{df}{dg} = -A\sin(g(t))$$

$$f(\text{stuff}) = A\cos(\text{stuff}) \quad\Rightarrow\quad f'(\text{stuff}) = \frac{df}{d(\text{stuff})} = -A\sin(\text{stuff})$$

That is, the first term of the Chain Rule is always the derivative of the outside function with respect to whatever is inside it, no matter what you call it. That's why we can "get away with" just holding it as stuff (or however you want to think about it) in our heads. And the argument of this outer function remains unchanged: $\boxed{bt}$, or $g(t)$, or "stuff" just gets plugged back in after we take this first derivative.

The crux of the Chain Rule is what happens next: now we must multiply this first Chain Rule term by the derivative of the inside function, meaning the stuff we covered up with $\boxed{bt}$. In this case we covered up $bt$. So we just continue using our arrow method, now looking at $bt$:

$$\boxed{bt} = \overset{\downarrow}{b}\,t$$

This $b$ is just a constant, so it just carries along and we can move the arrow over:

$$\boxed{bt} = b\,\overset{\downarrow}{t}$$

Now when we take the derivative of $t$ with respect to $t$, we just get 1. So the derivative of our inside "stuff" covered up by the box is just $b$. So, applying this Chain Rule term:

$$f'(t) = -A\sin\left(\boxed{bt}\right) \cdot b$$

We're not quite done, since the original function didn't have a box in it, and so we need to uncover it: $\boxed{bt} = bt$.

$$f'(t) = -Ab\sin(bt)$$

Whichever approach you use to apply the Chain Rule, you must get the same result. The arrow and box-covering method starts to come in handy when functions are deeply nested, since as you'll see you just work your way down the chain. With enough practice, you'll be able to differentiate quite complex-looking functions with just one or two lines of work. (We'll get to that in a few screens.)
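The "work your way along the chain" idea can also be sketched in code. The hypothetical helper `differentiate_chain` below is our own illustration, not part of the lesson: it takes each link as a (function, derivative) pair and multiplies the link derivatives together. One difference from the discussion above is that the code walks from the innermost link outward, because it needs each inner value before it can evaluate the next link; since the Chain Rule terms are simply multiplied, the order in which we collect them doesn't change the result. The constants `A` and `b` and the sample time are arbitrary.

```python
import math


def differentiate_chain(links, t):
    """Differentiate a nested function at the point t.

    `links` lists (function, derivative) pairs from the innermost
    link to the outermost one. Returns (value, derivative) of the chain.
    """
    value = t
    derivative = 1.0
    for func, dfunc in links:
        # Derivative of this link, evaluated at what is currently
        # "inside" it, multiplied onto the product accumulated so far.
        derivative *= dfunc(value)
        value = func(value)
    return value, derivative


A, b = 3.0, 2.0  # arbitrary sample constants, assumed for illustration

links = [
    (lambda t: b * t, lambda t: b),                           # f1(t) = bt
    (lambda u: A * math.cos(u), lambda u: -A * math.sin(u)),  # f2(u) = A cos(u)
]

value, slope = differentiate_chain(links, 0.7)
print(value, slope)
print(-A * b * math.sin(b * 0.7))  # closed-form derivative, for comparison
```

Adding a third or fourth pair to `links` extends the chain without changing the helper, which mirrors how longer chains are handled by hand: one more link, one more factor.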

Remember the Chain Rule term!

For most beginning students, the most common error on exams is to forget to multiply by the derivative of the inner function.

Indeed, at this very moment all over the world teachers and tutors are saying "Chain Rule!" to beginning students who've forgotten this term. (It's also why, right before an exam, many students will write "Chain Rule!" on one hand to remind themselves to include this factor.) The "naive calculations" at the start of the preceding page illustrate this very error. With practice, you'll start catching yourself when you make it, and quickly remember to multiply by this "missing" Chain Rule factor.

Quick practice, and what's to come

On the next screen we'll provide you with lots of basic practice with the Chain Rule. Since the rest of the course will depend on your ability to quickly compute correct derivatives, practicing now (and making as many errors as you need to, and you will almost certainly initially make some) is super-important. On the screen after that, we'll address more complex problems: we simply extend the chain, and just keep going as we did above to find the derivative of quite complicated-looking functions.

To end this screen, let's do some quick work on your ability to immediately notice where a Chain-Rule term is missing. We'll treat similar problems in more depth on the next screen; we mean these to be fast, just so you can get used to basic usage of the Chain Rule.

CHECK QUESTION 1: 𝑓(𝑥) =sin(5𝑥)

CHECK QUESTION 2: 𝑓(𝑡) =𝑒𝑡5

CHECK QUESTION 3: $g(w) = (w^3 - 6w^2 + 5w)^9$

The Upshot

  1. A compound (or composite) function is composed of an outer function and an inner function.
  2. When we take the derivative of a compound function, we must use the Chain Rule.
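    In prime notation: $[f(g(x))]' = f'(g(x))\,g'(x) = [\text{derivative of the outer function, evaluated at the inner function}] \times [\text{derivative of the inner function}]$
    In Leibniz notation: $\dfrac{dy}{dt} = \dfrac{dy}{du}\cdot\dfrac{du}{dt}$
    In alternate notation: $\dfrac{d}{dx}(f \circ g)(x) = \dfrac{df}{dg}\cdot\dfrac{dg}{dx}$
    And informally, the way you may quickly come to think about it: $\dfrac{df}{dx} = \left[\dfrac{df}{d(\text{stuff})}\text{, with the same stuff inside}\right] \times \dfrac{d}{dx}(\text{stuff})$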

On the next screen, you'll get lots of practice with basic Chain Rule problems before we move on to more complex ones.
