Why the gradient is the direction of steepest ascent

319,590 views

Khan Academy

8 years ago

The way we compute the gradient seems unrelated to its interpretation as the direction of steepest ascent. Here you can see how the two relate.
About Khan Academy: Khan Academy offers practice exercises, instructional videos, and a personalized learning dashboard that empower learners to study at their own pace in and outside of the classroom. We tackle math, science, computer programming, history, art history, economics, and more. Our math missions guide learners from kindergarten to calculus using state-of-the-art, adaptive technology that identifies strengths and learning gaps. We've also partnered with institutions like NASA, The Museum of Modern Art, The California Academy of Sciences, and MIT to offer specialized content.
For free. For everyone. Forever. #YouCanLearnAnything
Subscribe to KhanAcademy: kzfaq.info_...

Comments: 186
@stevemanus6740
@stevemanus6740 4 жыл бұрын
I was still struggling with the intuition of this and I think I have come up with another simple way to conceptualize the gradient and its steepest-ascent property.

Start by remembering that the gradient is composed of the partial derivatives of the function in question. If you think of each partial derivative as a simple rise-over-run problem, then each one tells you how much the output changes (rise) as that one input is increased (run). Let's use Grant's 3-dimensional example, so the inputs are x and y and the output is z. Because the slopes vary with location on the x-y grid, we need to pick a starting point; let's say (1, 1). It doesn't matter.

Look at the x-z 2-dimensional problem first. Say the partial derivative with respect to x is 4x, so at (1, 1) each 1-unit increase in x increases z by 4 units. Since x can move in only one dimension, the only choice of direction we have is whether the change in x is positive or negative. Obviously, if we move x by -1, then z will decrease by 4 units. So if we want to increase z by moving x, we move in the positive direction. Now do the same for y and z: say the partial derivative with respect to y is 3y, so at (1, 1) a 1-unit increase in y results in a 3-unit increase in z. Again, to increase z by moving y, increase y, don't decrease it.

Now put the two variables together. We now have a choice of directions. It's no longer enough to say that we should increase x and increase y (though that is half the battle); we also have to decide the relative value of increasing x versus increasing y. When we choose a direction, we are making a trade-off between the movements in each of the basis (x, y) directions. Say we are allowed to move 5 units in any direction (if you haven't noticed yet, I like to keep my Pythagorean theorem problems simple!). The question is: in what direction can I move 5 units to maximize the increase in z? That direction is the direction of steepest ascent.

If we spend all 5 units in the x direction, the step is the vector [5, 0], and since the increase in z is 4x + 3y, the total increase in z is 20 (5·4). If instead we spend all 5 units in the y direction, [0, 5], the total increase in z is 15 (5·3). But the beauty of vector geometry is that we can spend our 5 units in a direction that gives us a movement of 4 in x and 3 in y at the same time: we get 7 units of axis movement for the price of 5, because that's the hypotenuse of our 3-4-5 right triangle. Following the vector [4, 3], z increases by 4·4 from the x movement and 3·3 from the y movement, for a total increase of 25! Any other direction produces a smaller increase. And, of course, the vector [4, 3] is exactly our gradient.

It's also interesting to think about what happens when one component of the gradient is negative. Suppose the partial derivative with respect to y is -3y instead of 3y, so a positive movement in y produces a decrease in z. Remembering that y can only move along one dimension, if we move y downward then of course z will increase. So the gradient vector [4, -3] produces the same 25-unit increase in z for a 5-unit move, provided we move y in the negative (downward) direction. Just follow the vector! Of course this is calculus, so these 3-, 4-, and 5-unit moves are really very, very small. 😊
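A quick numeric sketch of Steve's 3-4-5 example (a minimal sketch in plain Python; the slopes 4 and 3 and the 5-unit budget are the hypothetical numbers from the comment above, not anything from the video):

import math

grad = (4.0, 3.0)   # assumed local slopes at the chosen point: dz/dx = 4, dz/dy = 3
budget = 5.0        # total step length we are allowed in the xy-plane

def gain(direction):
    # linear estimate of the change in z after stepping `budget` units along `direction`
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    return budget * (grad[0] * ux + grad[1] * uy)

print(gain((1, 0)))   # all 5 units along x:        20.0
print(gain((0, 1)))   # all 5 units along y:        15.0
print(gain((4, 3)))   # along the gradient [4, 3]:  25.0

# sweep directions every 0.1 degrees: nothing beats the gradient direction
angles = [k * math.pi / 1800 for k in range(3600)]
best = max(angles, key=lambda t: gain((math.cos(t), math.sin(t))))
print(math.degrees(best), gain((math.cos(best), math.sin(best))))  # ~36.9 deg, ~25.0

The sweep lands on the direction of [4, 3] (about 36.9 degrees), matching the arithmetic above.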
@rafaemuhammadabdullah6904
@rafaemuhammadabdullah6904 4 жыл бұрын
Why 4*4? Wouldn't it be 5*4 + 5*3 = 35? Why not use 5, since 5 is the limit?
@electric_sand
@electric_sand 4 жыл бұрын
Your explanation is awesome Steve
@ahmadizzuddin
@ahmadizzuddin 4 жыл бұрын
@@rafaemuhammadabdullah6904 I think what Steve meant is that 5 is the size (length) of the step vector v = [a, b]. So the increase is [4, 3] · [a, b] = 4a + 3b, maximized subject to ||v|| = sqrt(a^2 + b^2) = 5.
@ujjwal2912
@ujjwal2912 3 жыл бұрын
BAM !!!
@rajinish0
@rajinish0 3 жыл бұрын
You could make it easier by starting with 2-dimensional calculus: a positive slope tells you to move right to ascend, whereas a negative slope tells you to move left to ascend. Now in 3 dimensions, the partial with respect to x gives you the steepest ascent purely in the x direction, and the same for y. So you can say, generally, steepest ascent = (partial of f with respect to x) i + (partial of f with respect to y) j, where i and j are the unit vectors in the x and y directions.
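A minimal sketch of that assembly step (plain Python; the function x^2*y + y^3 and the sample point are assumptions for illustration, not taken from the video):

def f(x, y):
    return x * x * y + y ** 3      # assumed example function

def gradient(x, y, h=1e-6):
    # the gradient is just the two one-dimensional slopes stacked into a vector
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # slope if only x moves
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # slope if only y moves
    return (dfdx, dfdy)

print(gradient(1.0, 2.0))   # roughly (4.0, 13.0); analytically (2xy, x^2 + 3y^2)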
@alvapeng1474
@alvapeng1474 5 жыл бұрын
the gradient is not just a vector, it's a vector that loves to be dotted with other things. /best
@coopercarr9407
@coopercarr9407 3 жыл бұрын
bruh that sentence had me laughing, what a thought lol
@instinct736
@instinct736 3 жыл бұрын
@@coopercarr9407 😀
@Olivia-by2vm
@Olivia-by2vm 2 жыл бұрын
just like EVERY vector!!!!
@dalisabe62
@dalisabe62 2 жыл бұрын
@@coopercarr9407 Yeah, that was a pretty intriguing expression.
@priyankkharat7407
@priyankkharat7407 5 жыл бұрын
Thank you so much Grant! Simplicity is the most difficult thing to achieve.
@sahilbabbar8859
@sahilbabbar8859 4 жыл бұрын
Grant is amazing dude
@simplepuppy
@simplepuppy 3 ай бұрын
"simplicity is difficult to achieve" i'll remember this quote
@jacobvandijk6525
@jacobvandijk6525 5 жыл бұрын
You can't climb a mountain f(x,y) in the fastest way possible by moving only in the x- or the y-direction (using partial derivatives). Most of the time you have to go in a direction that's a combination of the x- and y-directions (using directional derivatives)!
@TranquilKr
@TranquilKr 8 жыл бұрын
Beautiful! Didn't think of it that way. Thanks a lot!
@pratibhas2468
@pratibhas2468 Жыл бұрын
Lucky that I found these intuitive explanations. It truly feels great when you understand what's actually going on when we use a formula.
@arijitdas4504
@arijitdas4504 3 жыл бұрын
Learning this concept was no less than a sense of accomplishment itself! Grant is Grand! Cheers!
@brandonquintanilla411
@brandonquintanilla411 6 жыл бұрын
The moment I heard the voice of 3B1B, I knew this was going to be a great video.
@ahmadizzuddin
@ahmadizzuddin 4 жыл бұрын
My takeaway from this is to reduce it to one dimension to understand what each component is doing to increase the "steepness". Say *f(x) = -x^3*; then *df/dx = -3x^2*. Since this derivative is negative (for x ≠ 0), *x* needs to move in the negative direction along the number line to increase the output of *f(x)*. Once you know which way along the number line to go for each component, the amount you move in each component is proportional to the size of that component relative to the whole gradient vector. Anyways, thanks, great explanation Grant :)
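A tiny sketch of that sign rule in one dimension, using the f(x) = -x^3 from the comment (the step size 0.01 is an arbitrary small number):

def f(x):
    return -x ** 3

def dfdx(x):
    return -3 * x ** 2   # negative for every x != 0

x, step = 1.0, 0.01
direction = 1.0 if dfdx(x) > 0 else -1.0   # step the same way the derivative points
print(f(x), f(x + direction * step))       # -1.0 -> about -0.9703, so f increased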
@allyourcode
@allyourcode 3 жыл бұрын
Thanks! Here is how I would very concisely explain it: the problem of finding the direction of steepest ascent is exactly the problem of maximizing the directional derivative. The directional derivative is a dot product. When you are trying to maximize a dot product, you choose the direction to make it parallel to the other vector. Since in this case the other vector is given to be the gradient of f at the point, THAT IS the direction of steepest ascent. For me, the basic intuition comes from the dot product. The part that is not so obvious to me is that gradient(f) dot v is the "right" formula for (the definition of) the directional derivative.
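A finite-difference sanity check of exactly that not-so-obvious step (a sketch; the function and the point are made-up examples): the rate of change along a unit vector v, computed directly from the definition, comes out the same as gradient(f) dot v.

import math

def f(x, y):
    return math.sin(x) * y + x * y ** 2      # assumed example function

def grad(x, y, h=1e-6):
    # centered finite-difference estimates of the two partial derivatives
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

p = (0.7, 1.3)
v = (0.6, 0.8)                               # an arbitrary unit direction

g = grad(*p)
via_dot = g[0] * v[0] + g[1] * v[1]          # the claimed formula: grad(f) . v

h = 1e-6                                     # slope along v, straight from the definition
direct = (f(p[0] + h * v[0], p[1] + h * v[1]) - f(*p)) / h

print(via_dot, direct)                       # the two numbers agree to several digits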
@poojaagarwal2000
@poojaagarwal2000 2 жыл бұрын
Thanks for this
@andresnamm982
@andresnamm982 8 ай бұрын
You are awesome!
@leonhardolaye-felix8811
@leonhardolaye-felix8811 Жыл бұрын
For anyone confused, this is how I see it, if it helps at all:

If you're at a point (a, b), then the directional derivative at that point in the direction v (where ||v|| = 1) is given by ∇f(a, b) • v. When we ask, "What is the direction of steepest ascent?", what we are really asking is, "In what direction do I move to produce the largest directional derivative?" In other words, we want to maximise ∇f(a, b) • v.

Given that we are at a single point, ∇f(a, b) is a constant, since it is evaluated using only the particular values a and b. v is the only variable here, so the only way to maximise ∇f(a, b) • v is to alter the vector v. We know that the dot product of two vectors is maximised when they point in the same direction as each other (see the proof at the bottom). Using this, and the fact that we are not varying ∇f(a, b), we can conclude that to maximise the directional derivative (given by ∇f(a, b) • v) we must choose v to point in the same direction as ∇f(a, b).

And that's it: we've shown that when the vector v points in the same direction as the gradient ∇f(a, b), its output (the directional derivative) is maximised. Saying that v is in the same direction as ∇f(a, b) is to say that v = (1/k) × ∇f(a, b), where k is the magnitude of ∇f(a, b); this is because v is a unit vector, as previously stated. These two vectors (v and ∇f(a, b)) point in the same direction, so we can conclude that ∇f(a, b) also points in the direction of steepest ascent.

Proof that the dot product is maximised when the vectors are parallel: consider two vectors a and b. The angle between a and b satisfies cos θ = (a • b) / (|a| × |b|), which we can rearrange to a • b = |a||b| cos θ. To maximise the left-hand side, which is what we want, we must maximise cos θ. Its maximum value is 1, which occurs when θ = 0; we can visualise this as the vectors being parallel and overlapping each other. So we can conclude that a • b is maximised when a is parallel to and overlapping b, and vice versa.
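The whole argument above compressed into one line of LaTeX (a sketch, assuming f is differentiable at (a, b) and the gradient there is nonzero):

% directional derivative as a dot product, maximized over unit vectors v
\nabla_{\vec v} f(a,b) = \nabla f(a,b)\cdot\vec v = \|\nabla f(a,b)\|\,\|\vec v\|\cos\theta = \|\nabla f(a,b)\|\cos\theta \le \|\nabla f(a,b)\|
% with equality exactly when \theta = 0, i.e. \vec v = \nabla f(a,b) / \|\nabla f(a,b)\|

So the maximum value of the directional derivative is the length of the gradient, attained only by the unit vector pointing along the gradient.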
@nafisaparveen9759
@nafisaparveen9759 Жыл бұрын
loved ur explanation sir!
@leonhardolaye-felix8811
@leonhardolaye-felix8811 Жыл бұрын
@@nafisaparveen9759 happy to hear it ✌🏽
@blyatmanmarkeson708
@blyatmanmarkeson708 6 жыл бұрын
This is so easy to follow! I love it.
@AbDmitry
@AbDmitry 4 жыл бұрын
Thanks a lot Grant! It was a great pleasure to see you here. I am a big fan of 3B1B.
@liabraga4641
@liabraga4641 7 жыл бұрын
Beautiful and elucidating
@wontpower
@wontpower 6 жыл бұрын
This helps so much, thank you!
@charusingh2159
@charusingh2159 3 жыл бұрын
I always wonder how Grant has developed such a great understanding of maths; he does magic with maths!!!
@b_rz
@b_rz 2 жыл бұрын
Thanks man. That was the best explanation ever. Simple and sweet. I was very confused but you saved me :) Thanks
@gustavomello2207
@gustavomello2207 8 жыл бұрын
Amazing video. Matchs perfectly with your Linear Algebra series.
@iustinianconstantinescu5498
@iustinianconstantinescu5498 7 жыл бұрын
Gustavo Mello matches* Great video!!!
@desavera
@desavera 3 жыл бұрын
Excellent exposition ... thanks a lot !
@senri-
@senri- 8 жыл бұрын
Great video helps a lot thanks :)
@NzeDede
@NzeDede 2 жыл бұрын
It's like my mind just got illuminated!! I've always underestimated the power of the del operator. Not only does this operator give you the slope of a scalar field in the direction of any vector (via the dot product), it also points in the direction of the unit vector with the maximum slope, and its magnitude tells you the size of that maximum slope. It's crazy how this new revelation changes your understanding of vector calculus. Thanks a lot 🙏🏽🙏🏽🙏🏽🙏🏽🙏🏽
@jatinsaini7790
@jatinsaini7790 3 жыл бұрын
The best explanation of the gradient on the Internet!
@mireazma
@mireazma 7 жыл бұрын
I'd like to add my two cents on this, as I couldn't relate some things at the beginning, but after some reflection I figured them out:
1. The gradient is the direction of steepest ascent because the gradient encompasses all of the possible change (the "d") of the function. This is how:
- The ubiquitous one-dimensional derivative (the ordinary, regular derivative) means the change of the function, in its entirety, that is, the maximum possible change;
- The gradient "owns" the derivatives along all possible angles. It suffices to have snapshots of the change from all orthogonal directions (2 in our case). As a note, the dot product is known for measuring how much of one vector another vector is (roughly speaking). So dotting the direction vector of a directional derivative with the gradient is merely asking how much of the gradient (the entire change) that vector is. And of course, to get a maximum you want to dot two parallel vectors.
2. Question: is the gradient built from the partials of x and y the only possible vector to have the steepest-ascent direction? Well, I thought: why not make one by taking any 2 orthogonal vectors (in the xy-plane) and getting the directional derivatives along these? The two resulting derivatives could be the components of another gradient. I feel I'm missing something here, but I'll get to the bottom of it.
@shiphorns
@shiphorns 7 жыл бұрын
If you do as you propose, and take any 2 orthogonal vectors in the XY plane, it is just a change of basis. If they are unit vectors, you're just rotating your reference frame around the Z axis. Your new vector of two directional-derivative coefficients will be the same vector as before, just represented in your new basis. You don't need orthogonal vectors either, they can be linearly dependent, they just can't be colinear as they need to span the XY plane.
@Raikaska
@Raikaska 7 жыл бұрын
Wow, I still don't get it, but I think you people's comments should be included in the video. I often think about taking derivatives in two orthogonal directions, but the thing is, the function's output is determined by "x" and "y", that is, already the two directions the gradient looks at...
@dereksmetzer2039
@dereksmetzer2039 6 жыл бұрын
Adam Smith, a little late to the party, but I think you mean you'd need 2 linearly independent vectors. Any two vectors which span a plane would suffice and would necessarily be linearly independent. Choosing an orthogonal basis just makes the computations prettier.
@dereksmetzer2039
@dereksmetzer2039 6 жыл бұрын
Additionally, if you visualize the gradient as vectors on a contour map of the function, vectors perpendicular to the contour lines are oriented in the direction of greatest increase; i.e., small 'nudges' along these directions result in the largest changes in the function. Vectors which are orthogonal to the gradient point along contour lines, so a change in that direction keeps the function at a constant value, and therefore the rate of change (directional derivative) along these lines is zero.
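A small numeric sketch of that contour picture (the function x^2 + 3y^2 and the point are assumptions; the contour direction is built by rotating the gradient 90 degrees):

import math

def f(x, y):
    return x ** 2 + 3 * y ** 2          # assumed function; its contours are ellipses

def grad(x, y, h=1e-6):
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

p = (1.0, 0.5)
gx, gy = grad(*p)
n = math.hypot(gx, gy)
along = (gx / n, gy / n)                # unit vector along the gradient
contour = (-gy / n, gx / n)             # unit vector along the contour line

h = 1e-5
for name, (ux, uy) in [("along gradient", along), ("along contour", contour)]:
    change = f(p[0] + h * ux, p[1] + h * uy) - f(*p)
    print(name, change / h)             # about |grad| in the first case, about 0 in the second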
@daniloespinozapino4865
@daniloespinozapino4865 3 жыл бұрын
that last explanation kinda blew my mind a bit, nice!
@grinfacelaxu
@grinfacelaxu 2 ай бұрын
Thank you!
@Postermaestro
@Postermaestro 6 жыл бұрын
so good!
@trivialstuff2384
@trivialstuff2384 5 жыл бұрын
Thank you
@niroshas1790
@niroshas1790 6 жыл бұрын
I liked it. I would request that you put up some lectures on the Riemann-Stieltjes integral and the difference between it and the Riemann integral.
@scholar-mj3om
@scholar-mj3om 6 ай бұрын
Marvellous💯
@foerfoer
@foerfoer 5 жыл бұрын
Honestly, thank you
@Niharika-uz6xl
@Niharika-uz6xl 5 жыл бұрын
SIMPLY AWESOME.
@poiuwnwang7109
@poiuwnwang7109 4 жыл бұрын
The derivative of f in the direction of del f being equal to the magnitude of del f gives a lot of intuition. Nice!
@mohdzikrya5396
@mohdzikrya5396 Жыл бұрын
Thanks
@muratcan__22
@muratcan__22 5 жыл бұрын
The most critical video for understanding the gradient's relation to the steepest ascent.
@meghan______669
@meghan______669 6 ай бұрын
I’m still processing everything (I’m not going to ace an exam any time soon) but I’m excited that I’ve been able to follow this logic. Thank you!
@tinku-n8n
@tinku-n8n 7 ай бұрын
Thanks grant ❤
@chrismarklowitz1001
@chrismarklowitz1001 5 жыл бұрын
Think about the gradient in one dimension. It is the biggest rate of change, since it's the only rate of change in one dimension. Think about the gradient in two dimensions. It combines the greatest rate of change if you could only move in y with the greatest rate of change if you could only move in x, to create the greatest rate of change overall.
@AA-tm3ew
@AA-tm3ew 5 жыл бұрын
great way to think about it
@dalisabe62
@dalisabe62 2 жыл бұрын
@steve manus, I like the way you broke this concept down, almost like a Lagrange multiplier problem, where we are asked to find the optimal value of some function f(x,y) subject to the constraint of another function g(x,y) in two dimensions. Of course, as you may know or expect, the concept of the gradient is incorporated into the solution. It is typically a scenario that involves balancing the independent variables so as to produce the maximum output of the function. Usually the optimal value lies between the extreme choices for the variables: extreme-x or extreme-y choices, as you noted, didn't produce the maximum output of f(x,y). I was hoping the video maker would stay away from the concept of the directional derivative to explain the geometric meaning of the gradient. In fact, I liked the mapping-to-a-straight-line explanation he started the video with; I wish he had finished that up.
@avadhoothede8392
@avadhoothede8392 3 жыл бұрын
Great
@danieljaszczyszczykoeczews2616
@danieljaszczyszczykoeczews2616 3 жыл бұрын
Thank you very much for the video!!! :D Cheers from Ukraine
@joaquincastillo4824
@joaquincastillo4824 4 жыл бұрын
I'm not nearly as advanced as you guys, but I'm a little unsure about the logic here. If we let nabla_f = [a, b], then there exists another vector -nabla_f = -[a, b] (the direction of fastest descent) such that dot(-nabla_f, -nabla_f) = dot(nabla_f, nabla_f) = max over V of dot(nabla_f, V), even though -nabla_f points in the exact opposite direction. Would it be possible that the condition "dot(nabla_f, nabla_f) = max over V of dot(nabla_f, V)" is a necessary but NOT sufficient condition to prove that nabla_f is the direction of fastest descent?
@xoppa09
@xoppa09 6 жыл бұрын
Great video. My only quibble is with the notation for the directional derivative. You have ∇_v f. I have seen the directional derivative written as D_v f, and ∂f/∂v, which seem to make sense. But the use of ∇_v f seems non standard and a bit confusing. How do we interpret ∇_v f? The "gradient in the direction of unit vector v" does not make sense, since the gradient is independent of v and is fixed for all intents and purposes.
@matheosxenakis8978
@matheosxenakis8978 5 жыл бұрын
So if I'm understanding this correctly, the argument he makes after he draws in the gradient line is that the vector which, dotted with the gradient, gives the maximum value of the directional derivative is the vector that is parallel to the gradient itself. But doesn't this argument only work if we already take it as true that the gradient *is* actually in the direction of max increase, so that a vector parallel to it is also in the direction of max increase? I still don't get why the gradient, as defined, inherently points in the direction of max increase??
@JaSamZaljubljen
@JaSamZaljubljen 5 жыл бұрын
I'm on your side buddy
@BigNWide
@BigNWide 5 жыл бұрын
The reasoning does feel circular.
@abdullahyasin9221
@abdullahyasin9221 5 жыл бұрын
No, it's not circular. He does not assume in this argument that the gradient is in the direction of steepest ascent. The starting points, or premises, of this argument are the definition of the directional derivative and the definition of the dot product. The directional derivative is just the rate of change of the function in the direction considered; there is no concept of a maximum rate of change built into the directional derivative, unlike the gradient. As for the dot product, the dot product of two vectors is a maximum when they are parallel. Combine these two concepts and you can see a beautiful proof emerge! 🙂
@adityaprasad465
@adityaprasad465 5 жыл бұрын
It helps to take a few steps back. Suppose I know that, for each unit I were to walk in the x direction, my function would increase by some amount x' (and for each unit of y, y'). Now suppose I *actually* walk *a* units in the x-dir and *b* units in the y-dir. How much does f increase? It increases by the weighted sum x'*a + y'*b. How do we find the (a, b) that maximizes this weighted sum (where (a, b) must be a unit vector -- no fair walking further in some direction than others)? One way is to notice that it's the dot product of vectors v=(x', y') and w=(a, b). We know that v dot w = |v||w|cos theta, and since v is fixed and |w|=1, this is maximized for theta=0 (so cos theta = 1).
@BigNWide
@BigNWide 5 жыл бұрын
@@adityaprasad465 Yes, when two vectors point in the same direction, their dot product is maximized, but that's not the issue of concern. The issue is that this argument is being used to justify the gradient being the maximum of all possible vectors, which is an invalid argument.
@himouryassine
@himouryassine 9 ай бұрын
Hello, can you please tell me what software you use to illustrate the functions?
@chainesanbuenaventura2874
@chainesanbuenaventura2874 7 жыл бұрын
Best video!
@yashawasthi242
@yashawasthi242 5 жыл бұрын
My question is: if the function is differentiable, shouldn't the change in the function be the same from every direction, like when we do complex analysis?
@BedrockBlocker
@BedrockBlocker 4 жыл бұрын
You just explained the Cauchy-Schwarz inequality at the end, didn't you?
@marat61
@marat61 7 жыл бұрын
How to extend this conclusion to complex space?
@farhanhyder7304
@farhanhyder7304 2 жыл бұрын
Thank you. It's been bothering me for a long time
@anonymoustraveller2254
@anonymoustraveller2254 6 жыл бұрын
Beauty man ! Beauty.
@frankzhang105
@frankzhang105 4 жыл бұрын
Thanks much, but I am still not clear on why the gradient direction gives the function f its steepest change. How does the gradient relate to the steepest change in the output of f? Thanks very much.
@danielyoo828
@danielyoo828 6 ай бұрын
Slope ≠ gradient. There is only one gradient (a vector) at a given point (a, b); the gradient is the same regardless of the applied direction vector at that point. However, there can be many slopes (scalars) at a given point: the slope depends on the applied direction. We can slice the graph with a plane along the applied direction, and we can do this in infinitely many ways, each resulting in a different slope value. Think of it as climbing a hill sideways (arbitrary direction) instead of directly up (following the gradient).
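A short sketch of that "slice the hill" picture (a made-up hill; the point and directions are arbitrary): each direction defines a 1-D slice g(t) = f(p + t*u), and only the slice along the gradient has the biggest slope.

import math

def f(x, y):
    return 4 - x ** 2 - 0.5 * y ** 2        # assumed hill

def slice_slope(p, u, h=1e-6):
    # slope at t = 0 of the 1-D slice g(t) = f(p + t*u)
    g = lambda t: f(p[0] + t * u[0], p[1] + t * u[1])
    return (g(h) - g(-h)) / (2 * h)

p = (1.0, 1.0)
gx, gy = -2 * p[0], -1.0 * p[1]             # analytic gradient of the hill: (-2x, -y)
n = math.hypot(gx, gy)

print(slice_slope(p, (gx / n, gy / n)))     # straight uphill (along the gradient): ~2.236
print(slice_slope(p, (1.0, 0.0)))           # sideways slice along x only: -2.0
print(slice_slope(p, (0.0, 1.0)))           # sideways slice along y only: -1.0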
@Elektrolite111
@Elektrolite111 Жыл бұрын
For me the easiest way is to think of a basic function f : R -> R. The derivative of f at a point a tells you which direction to walk (left or right along the x-axis) for the steepest ascent. It's the same idea in 2 dimensions.
@andrei-un3yr
@andrei-un3yr 4 жыл бұрын
Could you provide us a video explaining why a · b = |a| |b| cos(a, b)? I understand it for geometric vectors, but it's unclear to me how this can be scaled to n-dimensional vectors.
@sriyansh1729
@sriyansh1729 2 жыл бұрын
I think he made a video on his channel, 3Blue1Brown, explaining this.
@trendypie5375
@trendypie5375 4 жыл бұрын
I am wondering: if a vector V is dotted with any vector A other than the gradient vector, it still gives the max value when V is parallel to A. So the video still doesn't prove that the gradient is the steepest ascent. Correct me if I am wrong.
@kaustubhpandey1395
@kaustubhpandey1395 11 ай бұрын
When Grant first told us about the gradient giving the steepest ascent, I instantly imagined a graph where you have positive partial derivatives in the x and y directions, but a negative slope in a direction between them (e.g. along the vector (1,1)). This would make the gradient vector not be the steepest ascent; rather, the pure x or y direction (whichever has the larger slope) would be. But then I realised there must be a concept of multivariable differentiability, because in this case there would be a sharp point at that location!
@ImaybeaPlatypus
@ImaybeaPlatypus 7 жыл бұрын
Why isn't this linked to the video on the website?
@Festus2022
@Festus2022 3 ай бұрын
Why is the magnitude of the gradient vector said to be the RATE of maximum ascent? When I see "rate", I think slope. Why isn't the rate of ascent simply the partial of y divided by the partial of x? Isn't this the slope of the gradient, i.e. change in y over change in x? What am I missing? Thanks.
@ashita1130
@ashita1130 5 жыл бұрын
Wish you were my Prof.!!!!!
@abcdef2069
@abcdef2069 7 жыл бұрын
Let x^2 + y^2 + z^2 = 1, so that z = f(x,y):
z = +(1 - x^2 - y^2)^(1/2) for z > 0,
z = -(1 - x^2 - y^2)^(1/2) for z < 0,
and for z = 0 use anything that keeps it continuous.
1. Prove the max value of the gradient is at (x,y,z) = (0,0,1) when the initial point is (x,y,z) = (0,0,-1).
2. Find the gradient at (x,y,z) = (1, 0, 0) from problem 1, where the gradient becomes infinite. If you will, change my questions to make it work. Is it even possible to do a gradient on a closed surface?
@CREEPYassassin1
@CREEPYassassin1 3 жыл бұрын
I'm 5 videos in and my brain is on fire
@danaworks
@danaworks 3 ай бұрын
Correct me if I'm wrong, but the "proof" here seems to be a circular argument. Consider this: 1) The directional derivative could also be = (another vector that is NOT the gradient) dot (direction vector), isn't it? 2) Then with the argument presented here, wouldn't MAX(direction derivative) = (another vector) * (direction vector)? So the question remains: how do we know that projecting along the "gradient vector" gives a larger value than projecting along "another non-gradient vector"?
@ryderb.845
@ryderb.845 3 ай бұрын
I disagree with your first point. The directional derivative does have to be multiplied by the gradient because those are the actual slopes at that point. The direction vector just says we want to go more in the y or x or whatever direction, but it has to stay along that slope
@abcdef2069
@abcdef2069 7 жыл бұрын
At 2:10 I thought the same: the combination of derivatives gives you the steepest ascent and not the steepest descent.
f(x,y) = x(x-1) = x^2 - x. This function has its min at x = 0.5 and its max at infinity.
del f = (2x - 1) i + 0 j
When x = -1, del f = -3 i. This is correct: the -x direction will lead you toward the max.
When x = 1, del f = 1 i. This is correct: the +x direction will lead you toward the max.
When x = 0.5, del f = 0. Does this fail? It gives no direction; it doesn't know if this is a max or a min.
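A tiny sketch of that 1-D case (the commenter's f(x) = x(x - 1); the test points are arbitrary): the sign of the gradient picks the uphill direction, and at the critical point it gives no direction at all.

def f(x):
    return x * x - x          # f(x, y) = x(x - 1), which ignores y

def dfdx(x):
    return 2 * x - 1

for x in (-1.0, 1.0, 0.5):
    slope = dfdx(x)
    if slope > 0:
        print(x, "step in +x to increase f")
    elif slope < 0:
        print(x, "step in -x to increase f")
    else:
        # the gradient gives no direction here; the second derivative (= 2 > 0)
        # is what tells us x = 0.5 is a minimum rather than a maximum
        print(x, "critical point: gradient is zero, no ascent direction")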
@Raikaska
@Raikaska 7 жыл бұрын
SO THAT's WHY A VECTOR CAN ONLY POINT IN ONE DIRECTION WOW, THANK YOU!! don't know how i didnt realize so before
@wajidali-oi1wo
@wajidali-oi1wo Жыл бұрын
Kindly tell me: as we know, the gradient is (n-1)-dimensional compared to a scalar function of n dimensions. Keeping that in mind, does the gradient at a point mean a vector that locates the global maximum, in one dimension less than the scalar function? For example, if phi is a 3-D function, then del phi is a 2-D vector at a point, perpendicular to the level surface. Does del then mean a vector in 2-D that locates the maximum value of phi?
@andrei-un3yr
@andrei-un3yr 4 жыл бұрын
I don't understand why the directional derivative gives the slope. Firstly, if I have the slope of a function = df/dx, then I can only multiply it by dx if I expect the resulting change df to match the function's graph; otherwise it matches only the slope line. For directional derivatives, you mentioned the vector length should be 1 instead of infinitely small. That means a gradient component df/dx multiplied by the corresponding direction-vector x-component will result in a change that aligns with the slope line, but not with the graph. Can somebody shed light on this issue?
@moseslocke2084
@moseslocke2084 3 ай бұрын
It seems like we are saying that the gradient is not always the direction of steepest ascent? What if f = -(x^2) - (y^2)?
@mathalysisworld
@mathalysisworld 2 ай бұрын
Wow
@davidiswhat
@davidiswhat 6 жыл бұрын
I'm still confused about why it is. I can see from the dot product formula that, since the directional derivative is greatest and a positive number (due to taking the absolute value of the gradient), the gradient must represent the steepest ascent. I'm having trouble imagining the ascent part. Let's say the partial derivatives with respect to both x and y were both negative at a point. Wouldn't the directional derivative end up as a positive value and be referencing ascent still?
@LolForFun422
@LolForFun422 5 жыл бұрын
Thank you!
@ivanluthfi8832
@ivanluthfi8832 Жыл бұрын
I think for steepest descent you need to put a "-", which comes from cos(theta): theta = pi gives the minimum for the objective function. CMIIW.
@jameslow5738
@jameslow5738 7 жыл бұрын
Can anyone explain, at 6:55, whether the projected vector could also have a length larger than 1? I mean, it depends on the direction of projection too, right?
@euromicelli5970
@euromicelli5970 7 жыл бұрын
Y Low, no, the vector is length one already. Projecting it can only make it shorter. You can also see it from the more algebraic definition of the dot product, "(U dot V) = ||U|| * ||V|| * cos(theta)", where theta is the angle between the vectors. The only thing that changes is the angle, and the largest dot product happens when cos(theta) is largest, that is 1 (meaning the vectors are parallel).
@523101997
@523101997 7 жыл бұрын
Tilt your head 90 degrees to the right. You'll see it's a right-angle triangle with the direction vector as the hypotenuse. Therefore the other 2 sides must be smaller than one.
@nijatshukurov9022
@nijatshukurov9022 4 жыл бұрын
Thank you 3blue1brown
@Ayah_95
@Ayah_95 3 жыл бұрын
I've never had such difficulty understanding something in maths as I did with this 😂
@Trangnguyenbookclub
@Trangnguyenbookclub 3 жыл бұрын
me 2
@NoName-tj8dm
@NoName-tj8dm 2 жыл бұрын
Why is the length less than 1 at 5:50?
@jeffgalef121
@jeffgalef121 7 жыл бұрын
I'm having trouble reconciling the two views of f(x,y). On one hand, you show them as mapping a 2D space to a 1D number line. On the other hand, you show them as mapping a 2D space to a 3D space, as when you show a 3D graph. But, it is not really 3D, is it? The height, f, is just an interpretation of the dependent variable, correct? You could show a 2D graph with color instead of height, right? To me, that makes the gradient easier to understand why it's on a plane below a 3D shape. Thanks.
@shiphorns
@shiphorns 7 жыл бұрын
f(x,y) in the examples shown is only ever a mapping from 2D to 1D. It just happens that when you have 3 values, it's nice to visualize them in 3D as the points [x, y, f(x,y)]. You could certainly use color as the 3rd dimension; you just need to provide a key, since it's not as clear what is meant as when you use a 3rd spatial dimension.
@jeffgalef121
@jeffgalef121 7 жыл бұрын
Thank you for the confirmation, Adam.
@shiphorns
@shiphorns 7 жыл бұрын
You can also think about the surface as the solution to z=f(x,y)
@jeffgalef121
@jeffgalef121 7 жыл бұрын
Wouldn't that be the case if you integrated f(x,y)?
@eclipse-xl4ze
@eclipse-xl4ze 7 жыл бұрын
kekek you are a god
@vasundarakrishnan4093
@vasundarakrishnan4093 3 жыл бұрын
To those who are confused: the direction of steepest ascent is the direction in which the directional derivative is maximal. The directional derivative for any unit vector v is (gradient of f) · v, so we maximise (gradient · v) over v to maximise the directional derivative.
@timgoppelsroeder121
@timgoppelsroeder121 4 жыл бұрын
How can the gradient (which is a vector) dotted with the vector v equal the normalized version of the gradient vector???
@andrewmacarthur6063
@andrewmacarthur6063 3 жыл бұрын
I think there is a notational error at around 7:23 onwards. As written, the RHS of that equation should be a _number_ (the maximum value of grad f DOT v) rather than a _vector_. What Grant has written is the unit vector v which gives rise to that maximum, i.e. the one that points in the same direction as the gradient but is of length 1. This has happened because Grant wants to emphasise this fact as the main point of the video. Some precision has been lost in the notation. (Using 'argmax' rather than 'max' on the LHS would make this precise, but that might be less familiar and require explanation too.)
@winstonvpeloso
@winstonvpeloso 3 жыл бұрын
I think it's hilarious how when Grant does videos for KA he repeats out loud what he's writing on the screen like Sal does. Makes me laugh every time
@guidogaggl4020
@guidogaggl4020 5 жыл бұрын
Is this Grant from 3b1b?
@robertwilsoniii2048
@robertwilsoniii2048 Жыл бұрын
The way I've always seen it is that every directional derivative is a combination of the partials. Therefore the purest, most efficient path is the least scaled-up linear combination of the basis directions, so the path of least resistance and drag is the combination of just the two partial derivatives, i.e. the gradient vector. I'm pretty sure you could prove this with the triangle inequality: any other sum of multiples of the basis vectors will have a longer hypotenuse than the sum of just the partial derivatives, holding one side (like the height) constant in both. In other words, you'll waste energy traveling farther than necessary horizontally for the same vertical movement compared to the path of the pure partial derivatives. But you can't move faster than that, because you're limited by the physical shape of the surface you're on. You have no other choice; the constraints knock down other paths, physically or hypothetically in the case of imagined scenarios.
@jonathandobrowolski6941
@jonathandobrowolski6941 4 жыл бұрын
Yeah but why does the gradient point in the direction of steepest ascent? @ 5:18
@williambudd2850
@williambudd2850 5 жыл бұрын
Help!!! I think this guy just claimed that the direction of maximum change is in the direction of the gradient because it is in the direction of the gradient, which confuses me further.
@98danielray
@98danielray 3 жыл бұрын
Dude, no, the justification uses how the directional derivative was defined previously. Pay attention.
@Eng.Hamza-Kuwait
@Eng.Hamza-Kuwait Жыл бұрын
👌👌👌👌👌👌
@dominicellis1867
@dominicellis1867 4 жыл бұрын
Does another change w
@bakeqamza8907
@bakeqamza8907 5 жыл бұрын
It is a consequence rather than a reason.
@cooper7655
@cooper7655 5 жыл бұрын
TLDW: The directional derivative represents the rate of change of the function in a given direction. If you try all possible directions centered at a point, it happens that the directional derivative is largest when taken in the direction of the gradient. Therefore, we can conclude that the gradient points in the direction of steepest ascent.
@DougMamilor
@DougMamilor 4 жыл бұрын
This comment is by far clearer than the entire video. Thank you.
@Julie-ts9gi
@Julie-ts9gi 2 жыл бұрын
so, it seems purely coincidental to me. Is there any sort of explanation why? The video didn't really explain it.
@Festus2022
@Festus2022 3 жыл бұрын
I don't think the narrator ever really explained why the Gradient vector is ALWAYS in the direction of the steepest slope. As far as I could tell, he only explained how the directional unit vector interacts with the Gradient to reduce it or maintain it at its maximum. If every point on a 3D-surface has an infinite number of tangent lines, all with potentially different slopes, how can taking partial derivatives from just 2 directions (x and y) and combining them into a vector always point in the direction of maximum steepness?
@98danielray
@98danielray 3 жыл бұрын
He did. The inner product is largest when parallel to the vector. The partial derivatives are just the derivatives in the directions of the basis vectors, and the basis vectors generate all vectors in your vector space by linear combinations, hence that's all the information you need. That is why the directional derivative is brought up in the first place: linear combinations of partial derivatives correspond to derivatives in the directions of the vectors that are precisely those linear combinations of basis vectors (since the derivative is linear). Example: say you want the change in the direction (1,2), which is 1(1,0) + 2(0,1) written in the canonical basis; that'd correspond to 1 df/dx + 2 df/dy, or grad f at the point dotted with (1,2).
@hanju3250
@hanju3250 5 жыл бұрын
Is this video part of some course?
@alvingustavii4458
@alvingustavii4458 5 жыл бұрын
Yeah, multivariable calculus.
@46pi26
@46pi26 6 жыл бұрын
Terribly sorry to Sal, but I'm just too fond of Grant's voice to watch any of Sal's videos.
@syedrizvi597
@syedrizvi597 5 жыл бұрын
Then you're missing out
@CrankinIt43
@CrankinIt43 4 жыл бұрын
Sal has a pretty god-like voice too though
@niroshas1790
@niroshas1790 6 жыл бұрын
If possible, real analysis too.
@cauchyschwarz3295
@cauchyschwarz3295 2 жыл бұрын
I find this fact so confusing. If the gradient is the direction of steepest ascent, what is the direction of greatest net change? I always assumed the gradient points in the direction where the function changes the most.
@csmole1231
@csmole1231 4 жыл бұрын
I was initially confused because I was thinking of a situation where: along x axis and y axis the graph is kinda stable and mildly changing but in quadrant one there is a freaking big valley and ma poor little point is at the origin point😂 I was worried that no info about that valley is shown in gradient and ma point don't know where to go😂 then i realize i was outside the scope of this discussion
@csmole1231
@csmole1231 4 жыл бұрын
and at first i even totally ignored the fact that they are tiiiiiiiiiny steps, which means those steps happened in a little plane, not curvy at all
@csmole1231
@csmole1231 4 жыл бұрын
and ma point should follow the diagonal line hence the gradient direction
@abdijabesa8544
@abdijabesa8544 3 жыл бұрын
7:40 isn't he supposed to say "multiplying it" rather than "dividing"?
@Rafael-pi4md
@Rafael-pi4md 4 жыл бұрын
What if I want to find the least steep direction??
@R3nxt
@R3nxt 4 жыл бұрын
It's just the negative of the greatest ascent.
@rebeccap6609
@rebeccap6609 6 жыл бұрын
I understand why the gradient has the steepest slope of all the directional derivatives, but why can't it be in the direction of steepest DESCENT? Shouldn't there be a case where a pure step in x + a pure step in y lowers the value of the function?
@antonofka9018
@antonofka9018 6 жыл бұрын
Rebecca Peyser, it's a little bit deeper. I didn't get it at first either. Now I'll try to explain (hopefully I've got it right). Suppose you have a little nudge in x that changes the value of the function in the negative direction. The gradient then encodes just that (the change to the function) as its first component, which is negative. See? You fed it a positive nudge and its output is negative. That means you need to step in the negative direction in order to increase the function the most. Ponder that for a moment. Now for the y component: suppose a nudge in y changes the output more than the nudge in x does. Then the second component of your gradient is going to be larger, since it reflects the change caused by your nudge. What you finally get as your gradient is a direction vector that tells you to move in the negative direction for the x component and in the positive direction for y, and the change in x will be smaller than the change in y, since a change in y affects the function more strongly. It holds not the directions of the first nudges, but the changes those nudges made to the function, in the form of a new direction (a vector). If you get it now, I'm jubilant. If you don't, try to ponder it and write me a message. I'm open to explaining it again in more detail.
@98danielray
@98danielray 3 жыл бұрын
No, because the derivative being positive means the function is increasing. You could even say it is an arbitrary choice/convention to define the derivative as the limit of (f(x+h) - f(x))/h; it may as well have been (f(x) - f(x+h))/h, which would make a positive derivative mean the function is decreasing.
@JoaoVitorBRgomes
@JoaoVitorBRgomes 3 жыл бұрын
As a hiker I don't want the steepest ascent!
@samirelzein1978
@samirelzein1978 4 жыл бұрын
At 7:41, you are dividing it by "2" and not by half; actually you get half of it.
@andrewramos5619
@andrewramos5619 3 жыл бұрын
Good thing you corrected him. Was very confused until now🙏🙏
@actualBIAS
@actualBIAS 7 ай бұрын
This made it click for me.
Gradient and contour maps
6:17
Khan Academy
208K views
Directional derivatives and slope
8:50
Khan Academy
190K views
Intro to Gradient Descent || Optimizing High-Dimensional Equations
11:04
Dr. Trefor Bazett
64K views
Geometric Meaning of the Gradient Vector
14:51
Dr. Trefor Bazett
179K views
Laplacian intuition
5:31
Khan Academy
320K views
What's a Tensor?
12:21
Dan Fleisch
3.6M views
Gradient and graphs
6:11
Khan Academy
416K views
Why is the speed of light what it is? Maxwell equations visualized
13:19
Gradient Descent, Step-by-Step
23:54
StatQuest with Josh Starmer
1.3M views