As a CS student at Tsinghua, I would say this is the best ML course you can find out there.
@kilianweinberger698 4 years ago
Thanks! Please send my warmest regards to Prof. Gao Huang.
@matthieulin335 4 years ago
@kilianweinberger698 Will do when this virus ends!
@user-me2bw6ir2i 1 year ago
I'm incredibly grateful for your intuitive explanation of SVM; it really helped me understand this topic.
@raviraja2691 3 years ago
I really want to put my laptop away... But I'm watching Prof Kilian's awesome lectures... So can't help it!
@rajeshs2840 4 years ago
Thank you, Prof. After your videos, I started loving ML.
@StevenSarasin 11 months ago
log(cosh(x)) is such a clever idea: asymptotically linear and locally (near x=0) quadratic, a smooth version exactly analogous to the Huber idea of mixing the L1 and L2 norms. Worth checking out the Taylor expansion, which can be thought of as a microscope for functions: it tells you what polynomial a function looks like close to a point (typically 0). You will get that log(cosh(x)) = 0.5*x^2 + O(x^4), i.e. quadratic near 0.
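For anyone who wants to check this numerically, here is a quick sketch (my own code, not from the lecture). The rewritten form |r| + log((1 + e^(-2|r|))/2) is algebraically equal to log(cosh(r)) but avoids overflowing cosh for large residuals:

```python
import math

def log_cosh_loss(r):
    # log(cosh(r)) for a residual r = h(x) - y, computed in a
    # numerically stable form: cosh(r) itself overflows for large |r|.
    a = abs(r)
    return a + math.log((1.0 + math.exp(-2.0 * a)) / 2.0)

# Near 0 it matches the quadratic 0.5 * r^2 ...
print(log_cosh_loss(0.01) - 0.5 * 0.01 ** 2)         # ~0 (difference is O(r^4))
# ... and for large |r| it approaches the line |r| - log(2).
print(log_cosh_loss(10.0) - (10.0 - math.log(2.0)))  # ~0
```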
@sansin-dev 4 years ago
Brilliant lecture.
@in100seconds5 4 years ago
Boy this is wonderful
@8943vivek 3 years ago
Wow! CRISP!
@JoaoVitorBRgomes 3 years ago
@kilian weinberger, in the first 20 minutes of the lecture you say the derivative of the squared loss is the mean. But shouldn't it be the bias and variance? Or the intercept or the weights?
@omarjaafor6646 2 years ago
Where were you all these years?
@jachawkvr 4 years ago
I loved your visualization of l1 and l2 regularization. I had seen these before but never really understood what they meant. I have a question here: how would we optimize the objective function while using l1 regularization? I think gradient descent would not work well, since the function is not differentiable at some very key points.
@kilianweinberger698 4 years ago
Yes, good point. SGD gets a little tricky. If you use the full gradient (summed over all samples) you can use sub-gradient descent. As long as you make sure you reduce your step size, it should converge nicely.
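A toy illustration of this suggestion (my own one-dimensional sketch, not the professor's code): full-batch sub-gradient descent on a lasso objective, taking the sub-gradient 0 at w = 0 and shrinking the step size over time:

```python
import math

def lasso_subgradient_descent(xs, ys, lam=0.1, steps=2000):
    # Minimize (1/n) * sum_i (w*x_i - y_i)^2 + lam * |w| for a scalar w.
    # |w| is not differentiable at w = 0, so we use a sub-gradient there (0)
    # and decay the step size like 1/sqrt(t) so the iterates settle down.
    n = len(xs)
    w = 0.0
    for t in range(1, steps + 1):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        sub = lam * ((w > 0) - (w < 0))   # sign(w), with 0 at w = 0
        w -= (0.1 / math.sqrt(t)) * (grad + sub)
    return w

xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]            # data from y = 2x, no noise
w = lasso_subgradient_descent(xs, ys)
print(w)  # settles slightly below 2: the l1 penalty shrinks the weight
```

The decaying step size is the key point of the reply: with a constant step, the iterates keep bouncing around the kink of |w| instead of converging.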
@Theophila-FlyMoutain 4 months ago
Hi Professor, thank you for sharing the video. I am using Gaussian process regression (GPR) in the physics field. One thing I noticed is that even though specific loss functions exist for GPR, many people use root-mean-squared error as the loss function. Is there any rule for choosing the loss function and regularization?
@theflippedbit 4 years ago
Hi, Professor. I really like your way of explaining ML concepts. I wish there were assignments/quizzes on the related topics, where we could try out these learning algorithms and get more hands-on experience. I checked the course page but couldn't find any assignments.
@kilianweinberger698 4 years ago
Past 4780 exams are here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian past Exams.zip?dl=0
Past 4780 homeworks are here: www.dropbox.com/s/tbxnjzk5w67u0sp/Homeworks.zip?dl=0
Unfortunately, I cannot hand out the programming assignments from the Cornell class. There is an online version of the class (with interactive programming assignments and all that stuff), but the university does charge tuition: www.ecornell.com/certificates/technology/machine-learning/
@danielrudnicki88 3 years ago
@kilianweinberger698 Is there a current version of the link with exams? The one above has unfortunately expired. Thanks for these amazing lectures :)
@vishnuvardhanchakka1308 3 years ago
Sir, in the Plots of Common Regression Loss Functions, the x-axis should be h(Xi) - Yi, but on the course page it shows h(Xi) * Yi.
@sekfook97 3 years ago
I'm starting to understand why we optimize wTw instead of just w: wTw is a scalar while w is a vector, and I guess a scalar value is much easier to use as a constraint. Also, constraining wTw would leave a bigger space in which to search for the optimal w.
@kilianweinberger698 3 years ago
It is tricky to optimize over a vector like w. Imagine w is two-dimensional: which vector is more optimal, [1,2], [2,1], or [4,0]? When you optimize w’w you get a scalar, for which minimization and maximization are well defined.
@sekfook97 3 years ago
@@kilianweinberger698 thanks for the detailed explanation!
@hdang1997 4 years ago
Is using MAP estimation synonymous with regularizing?
@kilianweinberger698 4 years ago
No, not exactly, but in many settings the resulting parameter estimate is identical to what you would obtain with a specific regularizer (depending on the prior). The idea of enforcing a prior can be viewed as a form of regularization.
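To make that correspondence concrete, here is the standard derivation sketch for linear regression (my notation, not from the thread: σ² is the Gaussian noise variance, τ² the prior variance):

```latex
% MAP estimate with Gaussian likelihood (noise variance \sigma^2)
% and Gaussian prior w \sim N(0, \tau^2 I):
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \; \log P(D \mid w) + \log P(w)
  = \arg\min_{w} \; \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( w^\top x_i - y_i \right)^2
    + \frac{1}{2\tau^2} \, \lVert w \rVert_2^2 .
```

Up to a constant factor, this is squared loss plus an l2 (ridge) penalty with strength proportional to σ²/τ²; a Laplace prior would similarly yield an l1 penalty.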
@bnglr 3 years ago
Does this have anything to do with ERM?
@aloysiusgunawan7709 2 years ago
Hello Prof, if the constraint is w1^2 + w2^2 ≤ B, does B correspond to the radius of the circle?
@kilianweinberger698 2 years ago
Yes, here B is the squared radius.
@smsubham342 1 year ago
Why does the squared loss estimate the mean while the absolute loss estimates the median? I googled this but found no clear answer.
@kilianweinberger698 11 months ago
You can derive it pretty easily if you let your classifier be a constant predictor. Let's call your prediction p. What minimizes 1/n \sum_{i=1}^n (p - y_i)^2? If you take the derivative and equate it to zero, you will see that the optimum is when p is the mean of all y_i. You can prove a similar result for the median if it is the absolute loss. Hope this helps.
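A brute-force check of this point (my own sketch): grid-search the best constant predictor under each loss and compare it to the mean and median of the data.

```python
def avg_squared_loss(p, ys):
    # Average squared loss of the constant prediction p.
    return sum((p - y) ** 2 for y in ys) / len(ys)

def avg_absolute_loss(p, ys):
    # Average absolute loss of the constant prediction p.
    return sum(abs(p - y) for y in ys) / len(ys)

ys = [1.0, 2.0, 3.0, 4.0, 10.0]        # note the outlier at 10
grid = [i / 100.0 for i in range(0, 1201)]

best_sq = min(grid, key=lambda p: avg_squared_loss(p, ys))
best_abs = min(grid, key=lambda p: avg_absolute_loss(p, ys))

print(best_sq)   # 4.0 -> the mean (pulled toward the outlier)
print(best_abs)  # 3.0 -> the median (robust to the outlier)
```

The outlier makes the difference visible: the squared-loss minimizer gets dragged toward 10, the absolute-loss minimizer does not.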
@rodas4yt137 3 years ago
Never seen 0 dislikes on a 10k-view video before, though.
@KulvinderSingh-pm7cr 5 years ago
I need a little help: I am studying learning theory and need some good-quality material for developing intuition about it. It would be greatly helpful if the professor or anyone could point me to some resources. Thanks a lot in advance.
@kokonanahji9062 5 years ago
This series might help: kzfaq.info/love/R4_akQ1HYMUcDszPQ6jh8Qplaylists
@xenonmob 3 years ago
put your laptops away?
@kareemjeiroudi1964 5 years ago
What the heck? That's like the first 10 minutes of my lecture in Theoretical Concepts of ML. I wish my lecture were that easy.