Locally Weighted & Logistic Regression | Stanford CS229: Machine Learning - Lecture 3 (Autumn 2018)

534,971 views

Stanford Online


For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Andrew Ng
Adjunct Professor of Computer Science
www.andrewng.org/
To follow along with the course schedule and syllabus, visit:
cs229.stanford.edu/syllabus-au...
An outline of this lecture includes:
Linear Regression Recap
Locally Weighted Regression
Probabilistic Interpretation
Logistic Regression
Newton's method
00:00 Introduction - recap discussion on supervised learning
05:38 Locally weighted regression
05:53 Parametric learning algorithms and non-parametric learning algorithms
21:32 Probabilistic Interpretation
46:18 Logistic Regression
1:05:57 Newton's method
#aicourse #andrewng

Comments: 109
@raccoonious4038 2 months ago
The simplification of log likelihood function log(L(theta)) to give you back the cost function J(theta) has to be one of the most beautiful transformations I've seen in a while!
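For anyone who wants to see that transformation spelled out, here is the algebra under the lecture's Gaussian-noise assumption (standard CS229 notation):

```latex
\ell(\theta) = \log L(\theta)
= \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left(-\frac{\bigl(y^{(i)} - \theta^T x^{(i)}\bigr)^2}{2\sigma^2}\right)
= m \log \frac{1}{\sqrt{2\pi}\,\sigma}
  - \frac{1}{\sigma^2}\,\underbrace{\frac{1}{2}\sum_{i=1}^{m} \bigl(y^{(i)} - \theta^T x^{(i)}\bigr)^2}_{J(\theta)}
```

Since the first term and the 1/σ² factor do not depend on θ, maximizing ℓ(θ) is exactly minimizing J(θ).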
@manudasmd A year ago
Damn, this guy just explains concepts so clearly. Love this course.
@morespinach9832 10 months ago
Where else have you seen these techniques explained? It’s not that hard.
@hannukoistinen5329 7 months ago
Damn, a Chinese communist teaching at Stanford!!!
@manudasmd 7 months ago
@@hannukoistinen5329 Damn, cool joke bro!! You must really be a funny guy.
@ujjolchakrabarty9285 12 days ago
Do you know where we can get the practice sets for the course?
@shaksham.22 16 hours ago
@@ujjolchakrabarty9285 Trying to find the same thing. Can't access it from the website.
@MLLearner A month ago
0:28: 📚 The video discusses supervised learning, specifically linear regression, locally weighted regression, and logistic regression.
5:38: 📚 Locally weighted regression is a non-parametric learning algorithm that requires keeping data in computer memory.
13:05: 📊 Locally weighted regression is a method that assigns different weights to data points based on their distance from the prediction point.
19:01: 📚 Locally linear regression is a learning algorithm that may not have good results and is not great at extrapolation.
24:46: 🔍 The video discusses Gaussian density and its application in determining housing prices.
31:31: 💡 The likelihood of the parameters is the probability of the data given the parameters, assuming independent and identically distributed errors.
36:55: 📊 Maximum Likelihood Estimation (MLE) is a commonly used method in statistics to estimate parameters by maximizing the likelihood or log-likelihood of the data.
43:44: 📊 Applying linear regression to a binary classification problem is not a good idea.
49:22: 🎯 The video discusses the choice of hypothesis function in learning algorithms and why logistic regression is chosen as a special case of generalized linear models.
54:45: 📚 The video explains how to compress two equations into one line using a notational trick.
1:01:31: ✏ Batch gradient ascent is used to update the parameters in logistic regression.
1:07:52: 📚 The video explains how to use Newton's method to find the maximum or minimum of a function.
1:13:55: 💡 Newton's method is a fast algorithm for finding the place where the first derivative of a function is 0, using the first and second derivatives.
Recap by Tammy AI
@beautifulworld6163 29 days ago
you are making my day bra'
@carvalhoribeiro 8 months ago
Your clear explanation of these concepts is greatly appreciated. Thank you so much for sharing
@MLLearner A month ago
00:10 Today's discussion is about supervised learning and locally weighted regression.
07:48 Locally weighted regression focuses on fitting a straight line to the training examples close to the prediction value.
16:15 Locally weighted linear regression is a good algorithm for low-dimensional datasets.
22:30 Assumptions for housing price prediction.
29:45 Linear regression falls out naturally from the assumptions made.
36:36 Maximum Likelihood Estimation is equivalent to the least squares algorithm.
44:40 Linear regression is not a good algorithm for classification.
51:04 Logistic regression involves calculating the chance of a tumor being malignant or benign.
58:30 Logistic regression uses gradient ascent to maximize the log-likelihood.
1:05:36 Newton's method is a faster algorithm than gradient ascent for optimizing the value of theta.
1:12:40 Newton's method is a fast algorithm that converges rapidly near the minimum.
Crafted by Merlin AI.
@AyushGupta-zc4lh 6 months ago
Awesome lecture
@tomzhangg 2 years ago
A classic tradeoff in locally weighted models between training cost and accuracy, though it seems like the cost really comes from refitting for each x input during testing.
@adityachauhan7269 A year ago
Ohhh, so that's how it does it. Wouldn't this overfit? It's like the start of thinking towards "forest-like" methods, amazing.
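To make the refit-per-query point in this thread concrete, here is a minimal NumPy sketch of locally weighted linear regression (the function and variable names are mine, not from the lecture):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.8):
    """Fit theta for a single query point and return the prediction."""
    # Gaussian weights: training points near x_query dominate the local fit
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equations: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return x_query @ theta

# The whole training set is needed at prediction time (non-parametric),
# and theta is refit from scratch for every new query.
X = np.column_stack([np.ones(100), np.linspace(0, 10, 100)])
y = np.sin(X[:, 1]) + 0.1 * np.random.randn(100)
print(lwr_predict(X, y, np.array([1.0, 5.0])))
```

As for overfitting: the bandwidth tau plays that role; a tiny tau chases individual points, while a huge tau recovers ordinary least squares.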
@atalantinopieva 9 days ago
For anyone struggling with the concept: likelihood indicates how likely a particular population is to produce an observed sample under a particular distribution. For example, if we have data that should follow a Gaussian distribution with mean = 5 and variance = 0.1, but the ACTUAL data in my dataset are all 0.5, well... the likelihood that my data actually follow this distribution is very low! If each ACTUAL data point has a high probability density, the overall likelihood will be high!
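A quick numerical illustration of that example (a SciPy sketch; the numbers mirror the comment above, not the lecture):

```python
import numpy as np
from scipy.stats import norm

scale = np.sqrt(0.1)  # hypothesized population: N(mean=5, variance=0.1)

data = np.full(10, 0.5)  # the ACTUAL observations, all at 0.5
print(norm.logpdf(data, loc=5, scale=scale).sum())  # hugely negative: very unlikely

plausible = np.random.normal(5, scale, size=10)  # data the model could produce
print(norm.logpdf(plausible, loc=5, scale=scale).sum())  # much higher
```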
@nanunsaram 2 years ago
Thank you!
@ramankr0022 6 months ago
Very helpful.
@elonmusk4267 7 months ago
What a phenomenal lecture! So beautiful, so elegant, just looking like a wow
@user-hy7mz1og7s 10 months ago
Thank You Very Much
@fahimesokhangou3646 A year ago
I have a question about locally weighted regression. Imagine we want to calculate studentized residuals. We have a different hat matrix (projection matrix) for each observation, and each hat matrix is k by k, where k is the number of observations in the span. Now I would like to calculate the leverage. How do I determine the leverage for each observation?
@haoranlee8649 7 months ago
I like this guy's videos, they're amazing.
@bwmartin24 2 years ago
A lot of the links on the syllabus linked in the description aren't working. Is there an updated version with the class notes PDFs, etc.?
@aphievel A year ago
You can refer to the notes of the summer 2019 class. Though the topics were covered in a different order, the content is the same.
@ikrammaizi8678 A year ago
@@aphievel where?
@shakeelahmad3162 A year ago
docs.google.com/spreadsheets/d/18pHRegyB0XawIdbZbvkr8-jMfi_2ltHVYPjBEOim-6w/edit#gid=0
@malfuriosstormrage5218 4 months ago
What a concept. I just "wow"d when MLE was shown. Anyone here familiar with Power System State Estimation?
@liketheblue5082 A year ago
32:18 I have a question about this likelihood function. Can somebody help me with it? According to the IID assumption, the probability of all the observations is equal to the product of each probability. However, isn't the expression a density rather than a probability of a normal distribution? I am really confused. I think the probability should be the integral of the density function. If it's a density, what's the meaning of the product of densities?
@HamzaAsgharKhan A year ago
For i.i.d. variables, P(AB) = P(A)P(B). Your observation that it's the probability density of the Gaussian is correct. When we maximize it, we are trying to find the parameters under which the observed data are most probable, and a point with higher density has higher probability, so using the probability density function is correct in that regard. (I think you are confused by the fact that a density can take values outside [0, 1]; with a little thought about what I said, hopefully you'll see why normalizing the values to be between 0 and 1 doesn't really matter.) I don't know how much help this answer will be; I'm having a hard time articulating what I'm trying to say.
@liketheblue5082 A year ago
@@HamzaAsgharKhan Thank you very much! I didn't expect someone to give me such a detailed answer! That's exactly what I thought. The product of densities might not really have a meaning in statistics, and a density can be greater than 1, but it's enough for finding the maximum point. I appreciate it!!
@HamzaAsgharKhan A year ago
@@liketheblue5082 I'm glad it helped! 😊
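That exchange is also why implementations maximize the log-likelihood rather than the raw product of densities: a product of many densities underflows float64, while a sum of log-densities is numerically stable and has the same argmax because log is monotonic. A small sketch (mine, not from the lecture):

```python
import numpy as np
from scipy.stats import norm

residuals = np.random.normal(0, 1, size=1000)

print(np.prod(norm.pdf(residuals)))    # underflows to 0.0 in float64
print(np.sum(norm.logpdf(residuals)))  # well-behaved, same maximizer
```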
@user-js1ng7tl4d 6 months ago
soooooo goood
@user-fh1do9xb4n A year ago
I love the videos and Mr. Ng explains things clearly, but gosh, the markers he uses are so pale and hard to read.
@kaipingli-mh3mw 7 months ago
thx
@rushinshah4344 7 months ago
Where can I access the problem sets?
@glitchAI A month ago
He speaks with so much bass that I have to ramp up my volume.
@KipIngram A year ago
10:24 - How is this not just a form of interpolation using shape functions? That doesn't really seem like "learning" to me.
@DagmawiAbate A year ago
Okay.
@SalihBekri 6 months ago
I'm an EE student and we don't do anything with ML except a simple course in the final year, and I'm still taking this course. Wish me luck guys, because it's hard. Really hard.
@AadityaSaraf69 4 months ago
Good luck! And yes, it is very hard.
@moussadiallo6430 7 months ago
Great lecture. ML is fun with you 😀
@stanfordonline 7 months ago
Thanks for your comment and for watching!
@youssera6352 A year ago
Hi, I'm trying to follow this course in order to start reading papers for my PhD research/preparation, but I don't seem to understand most of the mathematical equations. Do I really need to understand them to achieve my goal, or do I just need to understand the concepts and memorize the formulas?
@Nett6799 A year ago
I have the same problem as you. What's your PhD research theme?
@jaimehernandezbascur8619 A year ago
Hi, it's highly recommended to have a background in probability and statistics, and in linear algebra, before studying machine learning. Personally I think a little knowledge of optimization is helpful but not strictly necessary.
@rayugamax183 11 months ago
The links given in the description don't include the class notes he keeps mentioning and tells us to read. Can anyone help? I mean, how do I get those?
@kinetic_kane9033 4 months ago
Same question. I think the notes are only available to Stanford students because they're on the intranet.
@prathmeshmishra4357 2 months ago
@@kinetic_kane9033 cs229.stanford.edu/lectures-spring2022/main_notes.pdf
@pavel.pavlov 8 months ago
He needs to get the IBM guy's blackboard.
@albertlei9249 2 years ago
Looks like in gradient ascent, if we replace the scalar learning rate alpha with the inverse Hessian H^{-1}, we get Newton's method.
@tomzhangg 2 years ago
Also remember that the partial derivative is replaced with the gradient vector, allowing for matrix multiplication.
@shubhamkumar-nw1ui 2 years ago
Can you guys help me out? I can't get my head around the likelihood-of-theta thing... why is it equal to the product of the probabilities of y?
@ZeroManifold A year ago
Newton's method uses a 2nd-order approximation, whereas gradient descent uses a 1st-order approximation; the rationale is quite similar.
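Putting the two observations in this thread together, here is a minimal NumPy sketch of Newton's method for logistic regression (the code is mine; the update rule theta := theta - H^{-1} grad follows the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iters=10):
    """Maximize the log-likelihood l(theta); X is (m, n), y has entries in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                      # gradient of l(theta)
        H = -(X.T * (h * (1 - h))) @ X            # Hessian: -X^T diag(h(1-h)) X
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta
```

Because l(theta) is concave, H is negative definite, so -H^{-1} grad points uphill; that is the sense in which H^{-1} plays the role of the scalar learning rate.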
@haitematik5832 3 months ago
ML for Goa'ulds
@basecode06791 3 months ago
Can't find the derivation of the MLE.
@bouazizzied5086 A year ago
Can someone tell me: after we derive the maximum likelihood of theta, how do we use it to update all our parameters theta?
@gautamgirotra3572 11 months ago
From the MLE setup we have the function to maximize, i.e. l(theta). Now use any optimization algorithm (like gradient ascent or Newton's method). For example, with gradient ascent: theta(new) = theta(old) + alpha * (partial derivative of l(theta)).
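As a concrete sketch of that update for logistic regression (a minimal NumPy version, assuming the lecture's batch gradient ascent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_ascent(X, y, alpha=0.01, n_iters=1000):
    """Maximize l(theta); X is (m, n), y has entries in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)          # current predictions
        theta += alpha * X.T @ (y - h)  # gradient of the log-likelihood
    return theta
```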
@mekuzeeyo A year ago
Then what is the difference between locally weighted regression and polynomial regression in application?
@closingtheloop2593 4 months ago
I have the same question. I'm sure it will become clear with more exposure. Polynomial regression and locally weighted regression echo, in concept, gain-scheduled control design for nonlinear systems. Same trick, different pony. I.e., how can we apply linear theory to nonlinear systems?
@karthikeyapervela3230 2 months ago
26:37 How is it being implied? We assume the error term is Gaussian, and from there we jump to the conditional distribution of y given x parameterized by theta. I did not understand this implication.
@suvamsivam9658 27 days ago
It's assumed that the error term is normally distributed.
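Spelling the implication out (this is the standard step from the lecture notes): since y^(i) = theta^T x^(i) + epsilon^(i) with epsilon^(i) ~ N(0, sigma^2), and theta^T x^(i) is a deterministic quantity once x^(i) is given, adding it merely shifts the Gaussian's mean:

```latex
y^{(i)} \mid x^{(i)}; \theta \sim \mathcal{N}\!\bigl(\theta^T x^{(i)},\, \sigma^2\bigr)
\quad\Longrightarrow\quad
p\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr)
= \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{\bigl(y^{(i)} - \theta^T x^{(i)}\bigr)^2}{2\sigma^2}\right)
```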
@sophiafunworldatthepark6740 5 months ago
I'm trying to find a way to use this to teach kids.
@surajyadav1033 5 months ago
At 1:16:59, shouldn't the formula have a negative sign before the Hessian inverse?
@neelabhsomani5129 10 days ago
We are trying to *maximize* the likelihood function. Hence the formula has a +ve sign instead of a negative sign.
@browndonkey A year ago
Are the class notes he mentions throughout the course available anywhere for download?
@agustinsalazar9351 A year ago
In the link to the syllabus in the description there are some lecture notes available, although many are dead links.
@patrickt.4121 A year ago
Google it and you'll find them. First hit.
@shashankrana977 A year ago
What I'm doing is following the current year's course page for the assignments, since those links mostly work. The lecture notes can be found on the course page given in the Lecture 1 description.
@stephendiopter2289 5 months ago
Can you share the link of the current year's course page? @@shashankrana977
@All_Kraft 4 months ago
Thank you for the explanation. I don't know why, but it's so annoying when the lecturer constantly erases and rewrites the same signs((
@stephendiopter2289 5 months ago
The course page has some problem sets and class notes provided by the prof, but they are inaccessible. Is there any way to get those? P.S. I just need the problem sets.
@stephendiopter2289 5 months ago
Never mind, got them.
@prienee 4 months ago
@@stephendiopter2289 How did you get them?
@sowaszpieg7528 3 months ago
@@stephendiopter2289 Where did you find them?
@durai5213 2 months ago
@@stephendiopter2289 Can you help me find the lecture notes?
@ras4884 11 months ago
Someone, buy this guy better markers!
@logeshwaran1537 4 months ago
Does anybody know how to get familiar with these concepts, like where to apply and practice them?
@YisneySoto 3 months ago
I'm thinking about ChatGPT. Ask it for exercises and to evaluate your responses.
@viharivemuri7202 4 months ago
While deriving the maximum likelihood for linear regression, the professor modeled a Gaussian error term. However, for logistic regression he did not use an error term. Does anyone know why that is?
@shaksham.22 16 hours ago
You don't need an error term for logistic regression. In linear regression you are predicting h(x), which can vary with some real-world phenomenon, so the Gaussian term models that noise. In classification (which is what logistic regression is used for), you are more or less trying to fit h(x) to a few defined classes of output, for instance the true or false of an occurrence, so an additive error term would have no effect on the outcome. In other words, the output in classification is discrete, so there's no need for an error term. I may be wrong about this, though.
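A complementary way to put it, matching the lecture's setup: in logistic regression the randomness is modeled directly by a Bernoulli output distribution rather than by an additive noise term:

```latex
P(y = 1 \mid x; \theta) = h_\theta(x), \qquad
P(y = 0 \mid x; \theta) = 1 - h_\theta(x)
\quad\Longrightarrow\quad
p(y \mid x; \theta) = h_\theta(x)^{\,y}\,\bigl(1 - h_\theta(x)\bigr)^{1 - y}
```

Taking the log of this over the training set gives exactly the log-likelihood the lecture maximizes with gradient ascent, so no separate epsilon is needed.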
@dr.owl_the_great 6 months ago
Where do we get the problem sets for this course?
@ujjolchakrabarty9285 12 days ago
Did you find the problem sets?
@vasudevrv7417 10 months ago
Can anyone explain where that x came from in the final equation of gradient ascent?
@morespinach9832 10 months ago
Since when did these basic statistics techniques become "machine learning"?
@closingtheloop2593 4 months ago
It's all marketing. To be fair, many of these results fall out of linear system theory without requiring any statistics, so the branding is somewhat subjective.
@ilpreterosso A year ago
What happened at 17:45?
@sanspapyrus683 A year ago
Probably a mic failure. Not sure though.
@ShubhamKumar-it2uy 3 months ago
Can anyone explain what happened to Andrew's voice at 19:32?
@mshoshan9698 2 months ago
It seems like they applied some audio distortion effect whenever a student asks a question (to preserve anonymity) that makes their voice sound very deep.
@meeqvin 4 months ago
In our case, is the parameter of the learning algorithm (theta) the cost of our house?
@shaksham.22 16 hours ago
Nope. x holds the features of the house, and the parameters (theta) are the weights on each feature that let you compute the corresponding h(x), the predicted price.
@namphan9281 7 months ago
Now I know why my university teaches optimization techniques in the CS program 💀
@vientios_talisman 10 months ago
1:02:02
@aramuradyan2138 11 months ago
Where are the lecture notes?
@lohitaksha244 8 months ago
Look up cs229 autumn 2018 on Google; you should find the repository maxim5/cs229-2018-autumn.
@sanatani_0228 10 months ago
Is it better than his course on Coursera, or is it the same?
@stanfordonline 10 months ago
Hi there, thanks for your comment! The material on coursera is more introductory level and this lecture is from the graduate course CS229 and covers more advanced topics.
@anuragsahu4527 11 days ago
@@stanfordonline So which should I prefer, this course or the one on Coursera?
@McAwesomeReaper 8 months ago
You know he's thought about just getting slightly shorter sleeves tailored, right?
@Lionsboy86 A year ago
Otu yo
@neelabhsomani5129 7 months ago
Checkpoint: 44:16
@malfuriosstormrage5218 4 months ago
Thanks. Can you explain what he meant by h(x) being different when using the logistic function? Is it because it's bounded to [0,1]?
@neelabhsomani5129 10 days ago
@@malfuriosstormrage5218 h(x) is just our hypothesis function, so depending on the task (classification or regression) it looks different. For example, for linear regression our h(x) was w0 + w1x1 + ... + wnxn (here w is the same as theta, the parameters), but h(x) looks different for logistic regression. Our hypothesis also depends on the kind of output we want: like you mentioned, h(x) looks different because we want to bound the output to [0,1]. Hope this helps.
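Concretely, in the lecture's notation the two hypotheses are:

```latex
\text{linear regression: } h_\theta(x) = \theta^T x \in \mathbb{R},
\qquad
\text{logistic regression: } h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} \in (0, 1)
```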
@Lalala_1701 24 days ago
Why does every student sound like a giant? 😂
@notsodope7227 A year ago
The way it started versus the way it is going :/ So much math.
@user-su9oj8sp7o 10 days ago
Toto site operators really make all kinds of things lol