Stochastic Gradient Descent vs Batch Gradient Descent vs Mini Batch Gradient Descent |DL Tutorial 14

156,749 views

codebasics

3 years ago

Stochastic gradient descent, batch gradient descent and mini-batch gradient descent are three flavors of the gradient descent algorithm. In this video I will go over the differences among these three and then implement them in Python from scratch using a housing price dataset. At the end of the video there is an exercise for you to solve.
🔖 Hashtags 🔖
#stochasticgradientdescentpython #stochasticgradientdescent #batchgradientdescent #minibatchgradientdescent #gradientdescent
Do you want to learn technology from me? Check codebasics.io/? for my affordable video courses.
Next Video: kzfaq.info/get/bejne/a9WXoKic2tDToXk.html
Previous video: kzfaq.info/get/bejne/hrdzeJx0zdutdI0.html
Code of this tutorial: github.com/codebasics/deep-learning-keras-tf-tutorial/blob/master/8_sgd_vs_gd/gd_and_sgd.ipynb
Exercise: Go to the end of the notebook at the above link to find the exercise description
Deep learning playlist: kzfaq.info/sun/PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist : kzfaq.info/sun/PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
Prerequisites for this series:
1: Python tutorials (first 16 videos): kzfaq.info/sun/PLeo1K3hjS3uv5U-Lmlnucd7gqF-3ehIh0
2: Pandas tutorials(first 8 videos): kzfaq.info/sun/PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy
3: Machine learning playlist (first 16 videos): kzfaq.info/sun/PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
#️⃣ Social Media #️⃣
🔗 Discord: discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: dhavalsays
📸 Instagram: codebasicshub
🔊 Facebook: codebasicshub
📝 Linkedin (Personal): www.linkedin.com/in/dhavalsays/
📝 Linkedin (Codebasics): www.linkedin.com/company/codebasics/
📱 Twitter: codebasicshub
🔗 Patreon: www.patreon.com/codebasics?fan_landing=true

Comments: 250
@codebasics
@codebasics 2 years ago
Do you want to learn technology from me? Check codebasics.io/ for my affordable video courses.
@user-zy8sf7tv2f
@user-zy8sf7tv2f 2 years ago
I followed your words and implemented the mini-batch gradient descent algorithm myself, and learned a lot after watching your implementation of it. Thank you very much.
@ryansafourr3866
@ryansafourr3866 1 year ago
The world is better with you in it!
@codebasics
@codebasics 1 year ago
Glad you liked it, Ryan, and thanks for the donation!
@kasyapdharanikota8570
@kasyapdharanikota8570 2 years ago
When you explain, I find deep learning very easy and interesting. Thank you sir!
@spiralni
@spiralni 2 years ago
When you understand the topic you can explain it easily, and you, sir, are a master. Thanks.
@girishtripathy275
@girishtripathy275 2 years ago
After so many videos I watched to learn ML (self-taught, I am a complete noob in ML currently), this playlist might be the best one I have found on YouTube! Kudos man. Much respect.
@kaiyunpan358
@kaiyunpan358 3 years ago
Thank you for your patient and easily understood explanation, which solved my question!!!
@sanjivkumar8187
@sanjivkumar8187 2 years ago
Hello Sir, I am following your tutorials from Germany. You made things so simple, better than Udemy, Coursera, etc. courses. I highly recommend them. Please take care of your health as well, and hopefully you will be fatter in the coming videos 🙂
@bestineouya5716
@bestineouya5716 1 year ago
I spent days trying to learn gradient descent and its types. Happy you cleared up the mess. Thanks again, teacher.
@zhaoharry4113
@zhaoharry4113 3 years ago
Love how you always put memes in your videos HAHA, great work!
@zhaoharry4113
@zhaoharry4113 3 years ago
And thank you for the videos Sir :3
@watch_tolearn
@watch_tolearn 3 months ago
You are the best teacher I have come across. You bring understanding in a humble way. Stay blessed.
@user-qi8xj8jh9m
@user-qi8xj8jh9m 10 months ago
This is called teaching. Love your teaching, sir!!
@tiyasachakraborty4786
@tiyasachakraborty4786 2 years ago
You are my best teacher. I am becoming a big fan of such a great teacher.
@siddharthsingh2369
@siddharthsingh2369 2 years ago
If someone is having trouble with the value of w_grad and b_grad, here is my explanation (please correct me if I am wrong anywhere). The error is calculated using the formula (y_predicted - y_true)**2, as shown at the start, so the total error is the mean of all the per-sample errors. When you differentiate the squared term, the 2 comes out in front (derivative of x**2), which is why the weight gradient has a 2 in front. The negative sign you are seeing is just because this video uses (y_true - y_predicted), whereas the previous video used (y_predicted - y_true). Also, if the transpose in the matrix implementation confuses you (the one shown here is a little different from the one in video 13), you can use the following code for w_grad and b_grad and it will give you the exact same values. # Same form as video 13 for finding w1, w2, bias: w_grad = (2 / total_samples) * np.dot(np.transpose(x), (y_predicted - y_true)); b_grad = 2 * np.mean(y_predicted - y_true).
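[Editor's note] A minimal numpy sketch of the two equivalent gradient formulations mentioned in the comment above, using hypothetical toy data in place of the notebook's housing features (the variable names here are assumptions, not the notebook's exact code):

import numpy as np

X = np.random.rand(20, 2)            # hypothetical scaled features: 20 samples, 2 columns
y_true = np.random.rand(20)          # hypothetical scaled prices
w, b = np.ones(2), 0.0
n = len(X)

y_predicted = np.dot(X, w) + b       # identical to np.dot(w, X.T) + b when w is a 1-D array

# gradient of cost = mean((y_true - y_predicted)**2), written both ways:
w_grad_neg = -(2 / n) * np.dot(X.T, (y_true - y_predicted))   # (y_true - y_predicted) form
w_grad_pos =  (2 / n) * np.dot(X.T, (y_predicted - y_true))   # (y_predicted - y_true) form
b_grad     =  2 * np.mean(y_predicted - y_true)
assert np.allclose(w_grad_neg, w_grad_pos)                    # same numbers, only where the sign is folded differs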
@NguyenNhan-yg4cb
@NguyenNhan-yg4cb 3 years ago
Lol, I do not want to go to sleep and I don't have enough money to watch Netflix, so I just take care of my career, sir.
@kumudr
@kumudr 2 years ago
Thanks, I finally understood gradient descent, SGD & mini-batch.
@malharlumbhani8700
@malharlumbhani8700 3 years ago
You teach absolutely brilliantly, sir. Top class :)))))
@yogeshbharadwaj6200
@yogeshbharadwaj6200 3 years ago
Thanks a lot for the detailed explanation... learned a lot...
@nahidakhter8646
@nahidakhter8646 3 years ago
Video was fun to watch and the jokes helped keep me focused. Thanks for this :)
@codebasics
@codebasics 3 years ago
Glad you enjoyed it!
@vincemegasonic
@vincemegasonic 2 years ago
Good day to you sir! I'm an undergraduate in Computer Science, currently working on a paper that uses this neural network. This tutorial helped me understand the neural network pretty quickly and helped me adjust our software to function how we intend it to. Please keep up the good work, and I hope other students like me can come across this and use it in their upcoming studies!! Godspeed on your future content!!
@codebasics
@codebasics 2 years ago
Best of luck! and I am happy this video helped
@yen__0515
@yen__0515 2 years ago
Sincerely appreciate your enriching content; it helps me a lot!
@codebasics
@codebasics 2 years ago
Thanks for the generous donation 🙏👍
@waseemabbas5078
@waseemabbas5078 2 years ago
Hi Sir! I am from Pakistan and I am following your tutorials. Thank you very much for such amazing guiding material.
@piyalikarmakar5979
@piyalikarmakar5979 2 years ago
Sir, your videos always answer all my queries around the topics... Thank you so much, sir.
@harshalbhoir8986
@harshalbhoir8986 1 year ago
Thank you so much, sir. Now I really don't have a problem with gradient descent, and the exercise at the end helps a lot!!
@raom2127
@raom2127 2 years ago
Great videos; the simplicity and detailed explanation with coding are super.
@jasonitsme
@jasonitsme 2 years ago
Thank you so much, sir! I think you taught way better than my university lecturer and helped me understand much better!
@codebasics
@codebasics 2 years ago
Glad I could help!
@optimizedintroverts668
@optimizedintroverts668 1 month ago
Hats off to you for making this topic easy to understand.
@otsogileonalepelo9610
@otsogileonalepelo9610 3 years ago
Great content and tutorials, thank you so much.🙏 But I have a few questions: When do you implement early stopping to prevent overfitting? Aren't you supposed to stop training the moment the loss function value increases compared to the last iteration? For instance, the zig-zag pattern of the loss displayed by SGD: is that just fine?
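[Editor's note] Not from the video, but one simple way to implement the early-stopping idea asked about here is to watch the recent cost history instead of a single noisy iteration. A sketch, assuming cost_history holds one cost value per epoch:

def should_stop(cost_history, patience=5, min_delta=1e-6):
    # Stop when the best cost of the last `patience` epochs no longer improves
    # on the best cost seen before that window. For SGD/mini-batch, whose
    # per-epoch cost zig-zags, this is usually applied to a validation cost or
    # a moving average rather than the raw noisy value.
    if len(cost_history) <= patience:
        return False
    best_recent = min(cost_history[-patience:])
    best_before = min(cost_history[:-patience])
    return best_recent > best_before - min_delta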
@prashantbhardwaj7041
@prashantbhardwaj7041 2 years ago
At about 14:43, a clarification may help someone as to why the transpose is required. For a matrix product, the rule of thumb is that the number of columns of the 1st matrix must equal the number of rows of the 2nd matrix. Since our "w" has 2 columns, "X_scaled" has to be transposed from a 22x2 matrix into a 2x22 matrix, and the product then yields one predicted value per sample, i.e. 22 values.
@mikeguitar-michelerossi8195
@mikeguitar-michelerossi8195 1 year ago
Why don't we use np.dot(scaled_X, w)? It should give the same result, without the transpose operation.
@ankitjhajhria7443
@ankitjhajhria7443 11 months ago
w.shape is (2, 1), meaning 1 column, and x_scaled.T has shape (2, 20), meaning 2 rows? Your rule doesn't seem to hold; why?
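[Editor's note] A small shape check may clear up this sub-thread. The sketch below assumes w is a 1-D array of length 2 and 22 scaled samples (hypothetical data, not the notebook's variables):

import numpy as np

X_scaled = np.random.rand(22, 2)         # 22 samples, 2 features
w = np.array([0.5, 0.5])                 # 1-D weight array, length 2
b = 1.0

p1 = np.dot(w, X_scaled.T) + b           # (2,) . (2, 22) -> (22,)
p2 = np.dot(X_scaled, w) + b             # (22, 2) . (2,)  -> (22,)
assert np.allclose(p1, p2)               # same 22 predictions either way

w_col = w.reshape(2, 1)                  # as a (2, 1) column vector instead
p3 = np.dot(X_scaled, w_col) + b         # (22, 2) . (2, 1) -> (22, 1)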
@rofiqulalamshehab8528
@rofiqulalamshehab8528 11 months ago
Your explanation is excellent. It would be great if you could make a computer vision playlist. Do you have any plans for it?
@md.muntasirulhoque8563
@md.muntasirulhoque8563 3 years ago
Sir, can you tell me why you use MinMaxScaler? Can't we use StandardScaler?
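[Editor's note] For reference, both scalers come from scikit-learn and either can be used before gradient descent: min-max scaling (the one asked about, used in the video) keeps every column in [0, 1], while standard scaling centres each column to mean 0 and unit variance. A quick sketch with made-up numbers:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[2600., 3.], [3000., 4.], [3200., 3.], [3600., 3.], [4000., 5.]])  # hypothetical area, bedrooms

X_minmax = MinMaxScaler().fit_transform(X)    # every column rescaled into [0, 1]
X_std    = StandardScaler().fit_transform(X)  # every column to mean 0, std 1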
@abhisheknagar9000
@abhisheknagar9000 3 years ago
Very nice explanation. Could you please let me know the parameter values for training (SGD, mini-batch and batch) using Keras?
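[Editor's note] In Keras the choice among the three flavors largely comes down to the batch_size argument of model.fit. A hedged sketch; the layer sizes and model here are illustrative, not the video's model:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])  # tiny regression model
model.compile(optimizer='sgd', loss='mse')

# model.fit(X, y, epochs=100, batch_size=len(X))  # batch gradient descent
# model.fit(X, y, epochs=100, batch_size=1)       # stochastic gradient descent
# model.fit(X, y, epochs=100, batch_size=32)      # mini-batch (32 is the Keras default)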
@priyajain6791
@priyajain6791 1 year ago
@codebasics Loving your videos so far. The way you present the examples and explanations, things really seem easy to understand. Thanks a lot for the thoughtful content! Just one request: can you please share the PPT you're using as well?
@tarunjnv1995
@tarunjnv1995 1 year ago
@codebasics Yes, your content is really outstanding. Also, for quick revision of all these concepts we need the PPT. Could you please provide it?
@rociodelarosa1549
@rociodelarosa1549 2 years ago
Excellent explanation, keep up the good work 👏
@sunilkumar-pp6eq
@sunilkumar-pp6eq 3 years ago
Your videos are really helpful. You are so good at coding that it takes time for me to understand, but thank you so much for making it simple!
@codebasics
@codebasics 3 years ago
I am happy this was helpful to you.
@1980chetansingla
@1980chetansingla 3 years ago
Sir, I tried this code with more than 2 inputs and it gives an "array with a sequence" error on the last line. What should I do?
@suenosn562
@suenosn562 2 years ago
You are a great teacher, thank you so much sir.
@fariya6119
@fariya6119 3 years ago
I think you have just made everything easy and clear. Thanks a lot. You have just allayed my fears about learning deep learning.
@codebasics
@codebasics 3 years ago
Glad to hear that
@ashimanazar1193
@ashimanazar1193 3 years ago
The explanation was very clear. What if the input data X has outliers? Then, with a small batch size, one can't just compare the last two values of theta or the cost function. What should the convergence condition be in that case? Please explain.
@vin-deep
@vin-deep 1 year ago
Super explanation skill that you have!!!
@satinathdebnath5333
@satinathdebnath5333 2 years ago
Thanks for uploading such informative and helpful videos. I am really enjoying them and looking forward to using them in my MS work. Please let me know where I can find the input data, i.e. the .csv file. I could not find it in the link provided in the description.
@ramimoustafa
@ramimoustafa 1 year ago
Thank you man for this perfect explanation
@chalmerilexus2072
@chalmerilexus2072 1 year ago
Lucid explanation. Thank you
@sahinmuratogur7556
@sahinmuratogur7556 2 years ago
I have a question: why do you calculate the cost for every epoch? If you would like to plot the costs every 5 or 10 steps, wouldn't it be logical to calculate the cost only at every 5th or 10th step?
@aryac845
@aryac845 1 year ago
I was following your playlist and it's very helpful. But where can I get the data you used, so that I can work on it?
@dutta.alankar
@dutta.alankar 3 years ago
Really well explained in simple terms!
@codebasics
@codebasics 3 years ago
😊👍
@fahadreda3060
@fahadreda3060 3 years ago
Thanks for the video, wish you all the best.
@codebasics
@codebasics 3 years ago
I am glad you liked it
@humourin144p
@humourin144p 1 year ago
Sir, big fan... best and simplest explanation.
@williammartin4416
@williammartin4416 4 months ago
Excellent lecture
@vikrantgsai7327
@vikrantgsai7327 1 year ago
For mini-batch gradient descent, can the samples for the mini-batch be picked in any order from the main batch?
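[Editor's note] A common convention, sketched below, is to shuffle the sample indices once per epoch and then walk through them in fixed-size chunks, so the order is random but every sample is still seen once per epoch (function and variable names here are illustrative, not the notebook's):

import numpy as np

def mini_batches(X, y, batch_size=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))             # random order of all sample indices
    for start in range(0, len(X), batch_size):
        chunk = idx[start:start + batch_size]
        yield X[chunk], y[chunk]              # one shuffled mini-batch at a time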
@Breaking_Bold
@Breaking_Bold 6 months ago
Great explanation !!!
@surabhisummi
@surabhisummi 1 year ago
One request: I am not able to find the CSV file which you have used here. Please attach that as well; it would be a great help. Again, thanks for teaching!
@very_nice_777
@very_nice_777 1 year ago
Thanks a lot sir. Love from Bangladesh!
@shuaibalghazali3405
@shuaibalghazali3405 7 months ago
Thanks for making this tutorial; I think I am getting somewhere.
@nasgaroth1
@nasgaroth1 3 years ago
Awesome teaching skills, nice work
@codebasics
@codebasics 3 years ago
Glad you think so!
@ashishmalhotra2230
@ashishmalhotra2230 7 months ago
Hi, why did you do "y_predicted = np.dot(w, X.T) + b"? Why is the X transpose required here?
@DigvijayAnand
@DigvijayAnand 3 years ago
How do I find R^2 for this algorithm?
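[Editor's note] R^2 can be computed from the final predictions of any of the three variants. A sketch; y_true and y_pred are assumed names, with y_pred coming from the trained weights:

import numpy as np

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot                        # sklearn.metrics.r2_score(y_true, y_pred) gives the same number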
@spicytuna08
@spicytuna08 2 years ago
Thanks, you are really good.
@shashisaini7919
@shashisaini7919 1 year ago
Thank you sir, good tutorial. ❣💯
@dimmak8206
@dimmak8206 3 years ago
You have a talent for teaching, cheers!
@codebasics
@codebasics 3 years ago
Glad you enjoyed it
@swaralipibose9731
@swaralipibose9731 3 years ago
You are truly talented in teaching
@codebasics
@codebasics 3 years ago
👍☺️
@alidakhil3554
@alidakhil3554 1 year ago
Very nice lesson
@shamikgupta2018
@shamikgupta2018 2 years ago
17:26 --> Sir, it looks like the derivative formulae for w1 and bias are different from what you showed in the previous video.
@palashsrivastav6748
@palashsrivastav6748 5 months ago
Sir, why did you use sigmoid_numpy() to calculate y_pred in the previous code but not in this code for batch gradient descent?
@JH-kj3xk
@JH-kj3xk 2 years ago
many thanks!
@harshalbhoir8986
@harshalbhoir8986 1 year ago
Thank you so much sir
@abhaydadhwal1521
@abhaydadhwal1521 2 years ago
Sir, I have a question... in stochastic you wrote -(2/total_samples) in the formula for w_grad and b_grad, but in mini-batch you wrote -(2/len(Xj)). Why the difference?
@shouyudu936
@shouyudu936 3 years ago
I have a question: why do we also need to divide by n in stochastic gradient descent? Isn't it the case that we are going through each individual point?
@r0cketRacoon
@r0cketRacoon 6 days ago
Same question; do you have an answer for that?
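[Editor's note] On the questions above about dividing by total_samples: a common convention, sketched below, is to divide by the size of whatever batch the gradient is computed over (the whole dataset for batch GD, the chunk size for mini-batch, 1 for SGD). Dividing a single-sample gradient by the full dataset size instead only rescales the step by a constant factor, which a larger learning rate can absorb, so the notebook's version can still converge. Names here are illustrative:

import numpy as np

def mse_gradients(X_batch, y_batch, w, b):
    m = len(X_batch)                     # m = n (batch GD), batch size (mini-batch), or 1 (SGD)
    y_pred = np.dot(X_batch, w) + b
    w_grad = -(2 / m) * np.dot(X_batch.T, (y_batch - y_pred))
    b_grad = -(2 / m) * np.sum(y_batch - y_pred)
    return w_grad, b_grad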
@mayurkumawatmusic
@mayurkumawatmusic 3 years ago
Great series.
@tinanajafpour7214
@tinanajafpour7214 1 year ago
Thank you for the video.
@spoonstraw7522
@spoonstraw7522 6 months ago
Thank you so much, and that cat trying to learn mini-batch gradient descent is so relatable. In fact, that's the reason I'm here. My cat is a nerd. We were partying, and then my cat, the party pooper he is, asked what mini-batch gradient descent is and kind of ruined the party. He always does this; last time he was annoying everyone by trying to explain what Boolean algebra is. What a nerd.
@sindhuswrp
@sindhuswrp 2 years ago
For SGD, isn't it supposed to be 'm' iterations per epoch? In the video it's only 1 iteration per epoch.
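[Editor's note] For comparison, a sketch of the "m updates per epoch" variant described in this comment: one pass over all samples per epoch, in shuffled order, updating after each sample. The function name and learning rate are assumptions, not the notebook's code:

import numpy as np

def sgd_epoch(X, y, w, b, learning_rate=0.01, seed=0):
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(X)):               # every sample once, in random order
        y_pred = np.dot(X[i], w) + b
        error = y[i] - y_pred
        w = w + learning_rate * 2 * X[i] * error    # gradient of (y_i - y_pred)**2 w.r.t. w is -2*x_i*error
        b = b + learning_rate * 2 * error
    return w, b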
@gokhanersoz5239
@gokhanersoz5239 2 years ago
Why didn't we pass the prediction through an activation function here? We did it in the previous example. Did I miss something? Is the (2/..) factor there because of the derivative? In the previous video we took it as (1/n). Is there a point I missed? I would be glad if you could help.
@siddharthsingh2369
@siddharthsingh2369 2 years ago
I was facing the same problem. I think the error is calculated using the formula (y_predicted - y_true)**2, if you notice at the start, so the total error in that case is the mean of all the errors found. However, when you differentiate the squared term, i.e. error**2, it also gives a 2 in front (derivative of x**2), which is why the weight gradient shows a 2 in front. The negative value you are seeing is just the reversal to (y_true - y_predicted) in this video, whereas in the previous video it was (y_predicted - y_true).
@prasanth123cet
@prasanth123cet 2 years ago
Why is there a total_samples term in -(2/total_samples) in the stochastic gradient descent function definition? We are taking the derivative of the square of a single error. I was wondering whether it should just be (-2) instead of -(2/total_samples). Please clarify.
@siddharthsingh2369
@siddharthsingh2369 2 years ago
I think the error is calculated using the formula (y_predicted - y_true)**2, if you notice at the start, so the total error in that case is the mean of all the errors found. However, when you differentiate the squared term, i.e. error**2, it also gives a 2 in front (derivative of x**2), which is why the weight gradient shows a 2 in front. The negative value you are seeing is just the reversal to (y_true - y_predicted) in this video, whereas in the previous video it was (y_predicted - y_true).
@ritik444
@ritik444 2 years ago
You are an actual legend
@ahmetcihan8025
@ahmetcihan8025 1 year ago
Thanks a lot.
@Chinmay4luv
@Chinmay4luv 3 years ago
Ha ha, this is a new style of teaching, liked it very much 😍 😍 😍 and I am definitely going to open the solution part; however, I already have vaccines for my computer in codebasics...
@codebasics
@codebasics 3 years ago
Ha ha. So Chinmay is the first one to invent the vaccine for corona 😊 You should get a Nobel prize, buddy 🤓 How is it going, by the way? Are you still in Orissa or back in Mumbai?
@Chinmay4luv
@Chinmay4luv 3 years ago
@@codebasics That prize will be dedicated to codebasics. I am in Odisha, continuing WFH...
@adityabhatt4173
@adityabhatt4173 1 year ago
Good bro, the way you used memes is exceptional. It makes learning fun.
@farrugiamarc0
@farrugiamarc0 4 months ago
Thank you for sharing your knowledge on the subject with a very good and detailed explanation. I have a question with reference to the slide shown at time 3:29. When configured to do batch gradient descent, with 2 features and 1 million samples, why is the total number of derivatives equal to 2 million? Isn't it 2 derivatives per epoch? After going through all 1 million samples you calculate the MSE and then do backpropagation to optimise W1 and W2. Am I missing something?
@GojoKamando
@GojoKamando 3 months ago
Why did we use mean squared error and not the log loss function?
@aniksarker6114
@aniksarker6114 3 years ago
Please provide the link to this dataset.
@mohdsyukur1699
@mohdsyukur1699 3 months ago
You are the best my boss
@010-haripriyareddy5
@010-haripriyareddy5 2 months ago
Can we say that with large training datasets, SGD converges faster compared to batch gradient descent?
@navid9495
@navid9495 2 years ago
Really useful, thank you.
@GaneshMuralidharan
@GaneshMuralidharan 2 years ago
Excellent bro.
@sandipansarkar9211
@sandipansarkar9211 3 years ago
Great session
@ISandrucho
@ISandrucho 3 years ago
Thanks for the video. I noticed one thing: in SGD you didn't change the partial derivative formula of the cost function (but the cost function had changed).
@r0cketRacoon
@r0cketRacoon 6 days ago
Same question. I wonder why we need the derivatives divided by the total number of samples when we only pick a single random sample? Have you figured out the answer?
@nicholasyuen9206
@nicholasyuen9206 9 months ago
Can someone please explain why, at 3:30, there will be 20 million derivatives computed in the first epoch? Shouldn't there be just 2 derivatives in the first epoch, since we are only solving for 2 partial derivatives (with respect to the 2 features) of the MSE computed from all 10 million samples? Thanks.
@mohamedyassinehaouam8956
@mohamedyassinehaouam8956 1 year ago
very interesting
@devashishnigam5971
@devashishnigam5971 3 years ago
Could you please explain y.to_numpy().reshape(-1,1), used to convert a pandas Series to a 2D array for scaler.fit_transform()?
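[Editor's note] A short sketch of what that line does, with made-up prices standing in for the real column:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

y = pd.Series([38.0, 120.0, 167.0])            # hypothetical prices
y_1d = y.to_numpy()                            # shape (3,): a 1-D array, which the scaler rejects
y_2d = y.to_numpy().reshape(-1, 1)             # shape (3, 1): -1 tells numpy to infer the row count
scaled_y = MinMaxScaler().fit_transform(y_2d)  # fit_transform expects a 2-D (n_samples, n_features) array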
@RAJIBLOCHANDAS
@RAJIBLOCHANDAS 2 years ago
Great video.
@MrBemnet1
@MrBemnet1 3 years ago
Question: why do you have to do 20 million derivatives for 10 million samples? The number of derivatives you have to do should be equal to the number of W's and B's.
@danielahinojosasada3158
@danielahinojosasada3158 2 years ago
Remember that there are multiple features. One sample --> multiple features. This means calculating multiple derivatives per sample.
@rahulnarayanan5152
@rahulnarayanan5152 2 years ago
@golden water Same question
@uttamagrahari
@uttamagrahari 1 year ago
Here in these 10 million samples there are 10 million weights and 10 million biases. So we have to do derivatives for every weight and bias, so we have to do 20 million derivatives while updating for the new weight and bias.
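[Editor's note] On the derivative count discussed in this thread: per epoch there are only as many gradients as there are weights and biases, but each weight gradient averages one derivative term per sample, so roughly n samples x d features per-sample terms get evaluated, which appears to be how the video counts "20 million". A small sketch, assuming 2 features and a scaled-down n:

import numpy as np

n, d = 1_000, 2                                    # stand-in for 10 million samples, 2 features
X, y = np.random.rand(n, d), np.random.rand(n)
w, b = np.zeros(d), 0.0

y_pred = np.dot(X, w) + b
per_sample_terms = -2 * X * (y - y_pred)[:, None]  # shape (n, d): one term per sample per weight
w_grad = per_sample_terms.mean(axis=0)             # only d numbers come out, but n*d terms went in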
@Mathmagician73
@Mathmagician73 3 years ago
Waiting 😍... Also, please make a video on optimizers.
@codebasics
@codebasics 3 years ago
👍😊
@abhaydadhwal1521
@abhaydadhwal1521 2 years ago
How can I get the dataset?
@VikramReddyAnapana
@VikramReddyAnapana 2 years ago
Wonderful as always.
@codebasics
@codebasics 2 years ago
Glad it was helpful!
@md.rahman56
@md.rahman56 1 year ago
Where can I find the dataset for the given solution here?
@AJAYKUMAR-gl1vx
@AJAYKUMAR-gl1vx 2 years ago
And my second doubt is in the SGD implementation: when we are taking only one random sample, why are you dividing the error by the total number of samples?
@prasanth123cet
@prasanth123cet 2 years ago
I have this doubt too. I reran it without the total number of samples in the denominator, and the values were different from what I got with batch gradient descent.
@vinny723
@vinny723 1 year ago
Great series of tutorials. I would like to know, for this tutorial (#14), why the implementations of stochastic gradient descent and batch gradient descent did not include an activation function? Thanks.
@r0cketRacoon
@r0cketRacoon 6 days ago
No need to do that because this is a regression task; only classification problems use sigmoid or softmax.
@benlyazid
@benlyazid 2 years ago
Good explanation, thank you for your effort, keep going bro ;)
@codebasics
@codebasics 2 years ago
I am happy this was helpful to you.
@tinanajafpour7214
@tinanajafpour7214 1 year ago
Sorry, could you please explain why you have put [0][0] in this line? return sy.inverse_transform([[scaled_price]])[0][0] It would really be appreciated.🙏🙏
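[Editor's note] A quick sketch of what inverse_transform returns, which is why the [0][0] indexing is needed. The fit values here are made up, with sy standing for the scaler fitted on the price column:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

sy = MinMaxScaler()
sy.fit(np.array([[38.0], [120.0], [167.0]]))   # hypothetical price column, shape (n, 1)

out = sy.inverse_transform([[0.5]])            # scikit-learn returns a 2-D array, here [[102.5]]
price = out[0][0]                              # [0][0] pulls the single number out of that 1x1 array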