Support Vector Machines Part 2: The Polynomial Kernel (Part 2 of 3)

  335,210 views

StatQuest with Josh Starmer

4 years ago

Support Vector Machines use kernel functions to do all the hard work, and this StatQuest dives deep into one of the most popular: the Polynomial Kernel. We talk about its parameter values and how it calculates high-dimensional relationships via the dot product.
NOTE: This StatQuest assumes you already know about...
Support Vector Machines: • Support Vector Machine...
Cross Validation: • Machine Learning Funda...
ALSO NOTE: This StatQuest is based on...
1) The description of Kernel Functions and associated concepts on pages 352 to 353 of An Introduction to Statistical Learning (with Applications in R): faculty.marshal...
2) The Polynomial Kernel is also based on the Kernel used by scikit-learn: scikit-learn.o...
For a complete index of all the StatQuest videos, check out:
statquest.org/...
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumr...
Paperback - www.amazon.com...
Kindle eBook - www.amazon.com...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshi...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer....
...or just donating to StatQuest!
www.paypal.me/...
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statquest #SVM #kernel
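The polynomial kernel described above maps onto scikit-learn's `SVC(kernel="poly")`, where `coef0` plays the role of r and `degree` the role of d. Below is a minimal sketch (not from the video; the one-feature dosage data is made up) in which mid-range dosages cure the disease and extreme dosages do not:

```python
# Hedged sketch: fitting a polynomial-kernel SVM on made-up 1-D "dosage" data.
import numpy as np
from sklearn.svm import SVC

dosage = np.array([[1], [2], [3], [8], [9], [10], [11], [16], [17], [18]], dtype=float)
cured = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])

# kernel="poly" computes (gamma * a.b + coef0)^degree, so coef0 is the
# video's r and degree is the video's d
clf = SVC(kernel="poly", degree=2, coef0=0.5, gamma=1.0)
clf.fit(dosage, cured)
print(clf.predict([[9.5], [1.5]]))
```

With `degree=2` the classifier can separate the mid-range "cured" dosages from the extremes, which a plain linear boundary on the 1-D data cannot do.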

Comments: 426
@statquest · 2 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@davidonwuteaka2642 · 1 year ago
How do I get it from Nigeria? I'd love to.
@statquest · 1 year ago
@@davidonwuteaka2642 Unfortunately I don't have distribution of physical (printed) copies in Nigeria, but you can get the PDF.
@davidonwuteaka2642 · 1 year ago
Yes, I have been trying to but the site kept rejecting my card. Thanks for your reply.
@statquest · 1 year ago
@@davidonwuteaka2642 Bummer! I'm sorry to hear that.
@RHONSON100 · 2 years ago
Your videos should be mandatory tutorials for Data Science/ML courses in all universities. Students throughout the world would benefit from watching the best ML videos. Hats off to you, great Josh Starmer!
@statquest · 2 years ago
Wow, thanks!
@rameshmitawa2246 · 2 years ago
Not mandatory, but my prof recommends this channel after every slide/lecture.
@statquest · 2 years ago
@@rameshmitawa2246 That's awesome!
@tinacole1450 · 1 year ago
I believe it's because most instructors don't teach it. They simply give information... Josh actually explains difficult concepts in a simple way.
@stanlukash33 · 3 years ago
I will make it easy for you guys: 3:38 - BAM 4:49 - DOUBLE BAM 5:54 - TRIPLE BAM
@statquest · 3 years ago
Just the hits! BAM! :)
@MrPikkabo · 3 years ago
Thanks I know statistics now
@madhuvarun2790 · 3 years ago
Dude, You are amazing. The best tutorial on SVM. I have searched the entire Internet to understand but couldn't. Please continue to make videos.
@statquest · 3 years ago
Thanks, will do!
@atharvapatil6003 · 11 months ago
Best machine learning playlist I have encountered on YouTube. The animations and your funny way of teaching make it easy to understand concepts. The amount of work you put into creating these videos deserves great appreciation. I would definitely recommend going through these videos to anyone reading this comment.
@statquest · 11 months ago
Glad you like them!
@jacobwalker6891 · 2 months ago
I have read and looked at most recommended books and videos on kernels, and whilst somewhat familiar with the math, I never truly understood the principles. StatQuest actually makes complex topics simple; arguably one of the best, if not the best, teachers on YouTube, and definitely the best stat explanations. Thanks Josh, much appreciated 👍
@statquest · 2 months ago
Thank you very much! :)
@marcoharfe9812 · 4 years ago
I want to thank you so much for all your videos. I was lost in a forest of vectors, matrices and Greek letters when I heard about these topics in lecture, and I did not understand a thing. As I was practising for the exam, I discovered your videos, and now I actually understand what is happening. Really love the practical, example-driven approach!
@statquest · 4 years ago
Awesome!!!! Good luck with your exam and let me know how it goes. :)
@itsfabiolous · 10 months ago
Bro, you're just a blessing. Never stop with the dry humor. Lots of love for you!
@statquest · 10 months ago
Thank you! Will do!
@priyangkumarpatel9317 · 4 years ago
This is one of the best explanation for support vector machines... If anyone is interested in why dot products are integral to the idea of SVM, please refer to Professor Wilson's MIT lecture on SVM... It is another great explanation for SVM...
@statquest · 4 years ago
Thanks! :)
@606Add · 4 years ago
Your videos are simply amazing! And the level of abstraction is right at the sweet spot! Thank you for the extremely thoughtful and precise illustrations!
@statquest · 4 years ago
Thank you very much! :)
@jonathannoll3386 · 4 years ago
My man. I'm so happy I have my presentation about SVM's after your uploads... Keep up the great work!
@statquest · 4 years ago
Awesome! :)
@deashehu2591 · 4 years ago
I have grown to love your little songs. They sound like Phoebe's songs!!! I have a little question: what do you use for visualization?
@statquest · 4 years ago
Thanks! I draw all the pictures in Keynote.
@gargidwivedi7700 · 4 years ago
That's exactly what my sister and I agreed just before we saw your comment! haha.
@statquest · 3 years ago
@Leila Mohammadzadeh Google "svm lagrange dual" and you will see how SVM uses the dot products to find the classifier.
@amalboussere9270 · 4 years ago
Thank you a lot, you are such a big help in this harsh student world. God bless you.
@statquest · 4 years ago
I'm glad you like my videos! :)
@palashchandrakar1112 · 4 years ago
@@statquest We don't just like them, we love your videos XOXO
@leif1075 · 3 years ago
@@statquest This doesn't show where on earth you derive that formula from. WHY do you multiply a times b and then add r? Why not multiply all three or add all three? See what I mean? I don't see how anyone could figure it out; there's not enough info here to derive it.
@hayskapoy · 4 years ago
Would love to see more math after seeing the big picture behind these algorithms 😄
@MrZidane1128 · 3 years ago
First of all, thanks for your explanation. After plugging two data points a and b into the polynomial kernel function and getting the value 16,002.25, you said we get a higher-dimensional relationship. Could you elaborate on what "relationship" you were referring to, based on the value 16,002.25? Sorry, I wasn't quite sure about that.
@statquest · 3 years ago
In some sense the "relationships" are similar to transforming the data to the higher dimension and calculating the distances between data points.
@vedgupta1686 · 2 years ago
@@statquest But the value 16002.25 alone is a 1-D data point. How do you suppose that helps us classify? Am I missing something?
@statquest · 2 years ago
@@vedgupta1686 Think of that number as a loss value that is used as input for an iterative optimization algorithm like gradient descent.
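The 16,002.25 value discussed in this thread can be checked by hand. Assuming r = 1/2 and d = 2 as in the video, any two dosages whose product is 126 give that value (the pair a = 9, b = 14 below is just a hypothetical example), and the kernel output equals the dot product of the points after the transformation x → (x, x², 1/2):

```python
# Hedged sketch: verify that (a*b + r)^d equals the dot product in the
# higher-dimensional space, without ever needing the high-dimensional
# coordinates during training.
a, b = 9.0, 14.0   # hypothetical dosages with a*b = 126
r, d = 0.5, 2

kernel_value = (a * b + r) ** d            # 16002.25

phi = lambda x: (x, x ** 2, 0.5)           # the explicit transformation
dot_product = sum(pa * pb for pa, pb in zip(phi(a), phi(b)))

print(kernel_value, dot_product)           # identical values
```

This is the kernel trick in miniature: the single number `kernel_value` already encodes the higher-dimensional relationship, so the transformation itself never has to be computed.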
@HeduAI · 2 years ago
I thought the whole point of using the kernel trick was to save on the computation cost. If we are using an iterative algorithm anyway, how is that better than transforming the data?
@statquest · 2 years ago
@@HeduAI Either way you would still have to use an iterative procedure. So that computation is fixed.
@kwok9298 · 2 years ago
I really appreciate the way it is explained. Please keep up the good job!
@statquest · 2 years ago
Thank you!
@ahming123 · 4 years ago
What do you mean by high dimension relationship??
@huhuboss8274 · 4 years ago
like the distance but in higher dimensions
@Actanonverba01 · 4 years ago
A synonym for "high dimension" is many features or variables. For "relationship", think connection(s). So if we have a high-D relationship, we have a set of many variables that are connected by some idea or mathematical formula. Does that help?
@BrandonSLockey · 4 years ago
watch first video (Part I)
@leif1075 · 3 years ago
@@Actanonverba01 That's what I thought, but that is irrelevant here because we only have one variable with two possible categories of values. But of course we can add more connections and variables, which I think is what you are alluding to.
@clapdrix72 · 2 years ago
@@leif1075 It's not actually what he means and it's not irrelevant. High-dimensional space means we take our original input feature space (in this case just X1) and transform it into higher-dimensional space by "making up" new dimensions that are functions of our original dimensions (X1) so that the data is linearly separable in that new space. The pairwise relationships (aka similarity) are the distances between the observations projected into that higher-dimensional space (usually referred to as latent space). So it doesn't matter how many features you have in your original dataset nor how many outcome classes you have - those are irrelevant to the SVM algorithm mechanics, they only change the scale.
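The "make up a new dimension so the data becomes linearly separable" idea in this thread can be demonstrated with a tiny experiment. The toy data below is made up, and `SVC` with a linear kernel stands in for a plain support vector classifier: in 1-D the classes cannot be separated by a single threshold, but after adding x² as a second dimension they can.

```python
# Hedged sketch: a linear classifier fails on 1-D data with one class in
# the middle, but succeeds after we add x^2 as a second, made-up dimension.
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0, 16.0, 17.0, 18.0])
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

flat = SVC(kernel="linear", C=1e6).fit(x.reshape(-1, 1), y)
lifted = SVC(kernel="linear", C=1e6).fit(np.column_stack([x, x ** 2]), y)

print(flat.score(x.reshape(-1, 1), y))                # a threshold can't reach 100%
print(lifted.score(np.column_stack([x, x ** 2]), y))  # linearly separable now
```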
@chenghuang4724 · 1 year ago
Sir, this is the best video for explaining the Kernel!
@statquest · 1 year ago
Glad you think so!
@rrrprogram8667 · 4 years ago
After a lonnnnggg waitttt..... MEGAA MEGAAA MEGAAAA BAMMM is back
@statquest · 4 years ago
Ha! Thank you! :)
@johnjung-studywithme · 1 year ago
This is how concepts should be introduced to students.. makes so much more sense
@statquest · 1 year ago
Thank you! :)
@billykristianto3818 · 7 months ago
Thank you very much, the explanation is easier to understand compared to my class!
@statquest · 7 months ago
Glad it helped!
@tymothylim6550 · 3 years ago
Thank you for this video! It was very helpful in terms of understanding the details of how the kernel function leads to certain equations that need to be solved to obtain the relevant Support Vector Classifier!
@statquest · 3 years ago
Bam! :)
@flaviodefalcao · 4 years ago
It is awesome and satisfying to be able to build intuition with these videos and then read a textbook and understand everything. THANKS
@statquest · 4 years ago
Awesome! I'm glad the videos are helpful! :)
@flaviodefalcao · 4 years ago
@@statquest BAM!!!
@evelillac9718 · 3 years ago
You literally saved my homework with your videos
@statquest · 3 years ago
Bam!
@trashantrathore4995 · 2 years ago
Earlier I had an incomplete intuition of all the algorithms, which I could not explain to others. The concepts are getting cleared up now. Thanks StatQuest team and Josh Starmer; I will contribute as soon as I get a job in the DS field.
@statquest · 2 years ago
bam! :)
@muhtasirimran · 2 years ago
Mr. Starmer almost unconsciously changing machine Learning's future 😀
@statquest · 2 years ago
:)
@axa3547 · 3 years ago
Machine learning algorithms!!! Is it just me, or do others also have to learn these again and again to fill the gaps in knowledge?
@statquest · 3 years ago
bam!
@thawinhart-rawung463 · 1 year ago
Good job Josh
@statquest · 1 year ago
:)
@technojos · 3 years ago
Thanksss Josh Starmer. I am fascinated by your videos. Please make a video about how 16002.25 is used, bam? Moreover, I think you could make a video playlist about how machine learning algorithms are coded, double bamm. Keep going man, we love you, triple bamm!!!
@statquest · 3 years ago
Great suggestions!
@kevinarmbruster2724 · 3 years ago
@@statquest How is the relationship of 16,002.25 to be interpreted? I understood that if we transfer everything to the higher dimension we can solve it, but I did not understand the part about relationships between the points and how they help.
@statquest · 3 years ago
@@kevinarmbruster2724 We plug the relationships into an algorithm that is similar to gradient descent and it can use them to find the optimal classifier. However, the details are pretty complex and would require another video.
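As a rough illustration of the reply above, the sketch below uses a kernel perceptron — a much simpler iterative algorithm than the actual SVM optimizer, but the same flavor: it finds a classifier using only the table of pairwise kernel values ("relationships"), never the high-dimensional coordinates. The toy dosage data is made up.

```python
# Hedged sketch: a kernel perceptron trained purely on pairwise polynomial
# kernel values, standing in for the (more complex) SVM optimizer.
import numpy as np

def poly_kernel(a, b, r=0.5, d=2):
    return (a * b + r) ** d

x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0, 1.6, 1.7, 1.8])  # toy dosages
y = np.array([-1, -1, -1, 1, 1, 1, -1, -1, -1])               # cured or not

K = poly_kernel(x[:, None], x[None, :])   # all pairwise "relationships"
alpha = np.zeros(len(x))
for epoch in range(10000):
    mistakes = 0
    for i in range(len(x)):
        if y[i] * np.sum(alpha * y * K[:, i]) <= 0:  # misclassified?
            alpha[i] += 1.0
            mistakes += 1
    if mistakes == 0:                      # converged: every point correct
        break

predictions = np.sign(K @ (alpha * y))
print((predictions == y).all())
```

The only inputs the loop ever touches are the entries of `K` — the same role the relationships play for the real SVM solver.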
@edmondkeogh4057 · 3 years ago
the beep boop thing was hilarious
@statquest · 3 years ago
:)
@sinarb2884 · 3 years ago
I could be wrong, but I think there is a slight mistake in this video. The kernel function should be of the form (ab-1/2)^2. This is because the support vector classifier is essentially thresholding based on whether x>y or not. Let me know please if I am wrong. And, thanks for your cool videos.
@statquest · 3 years ago
Most people define it the way I defined it in the video, (ab + r)^d. For more details, see: en.wikipedia.org/wiki/Polynomial_kernel and Page 352 of the Introduction to Statistical Learning in R.
@nightawaitsusall9607 · 4 years ago
You my friend are a champion. Yes.
@statquest · 4 years ago
Thank you! :)
@benardmwanjeya8371 · 4 years ago
God bless you Josh STARmer
@statquest · 4 years ago
Thank you very much! :)
@eric752 · 2 years ago
One suggestion: if, at the beginning, all the topics were listed in a logical way, it would be even better. Big thanks for the videos, really appreciate it 🙏
@statquest · 2 years ago
Thanks!
@eric752 · 2 years ago
@@statquest thank you
@aryamahima3 · 2 years ago
At 5:09, you said that we need to calculate the dot product between each pair of points. How do we use this dot product further? Could you please clarify? You are the only person on the whole internet who can clear this up. :D
@statquest · 2 years ago
We use it as input to an iterative optimization algorithm similar to gradient descent. For details on gradient descent, see: kzfaq.info/get/bejne/qaqmZ8ll2Ji3cmw.html
@aryamahima3 · 2 years ago
@@statquest thank u so much ☺️
@manasadevadas8685 · 3 years ago
First of all, thank you so much for explaining with such amazing illustrations. One doubt: how can we actually use the relationships between points to find the support vector classifier?
@statquest · 3 years ago
Unfortunately that's a difficult question to answer and I'd have to dedicate a whole video to it. However, the simple answer is that it uses a method like Gradient Descent to find the optimal values.
@manasadevadas8685 · 3 years ago
@@statquest Thanks for the response! Hopefully later you'd dedicate a whole video to it :)
@yulinliu850 · 4 years ago
Awesome! Josh is back.
@statquest · 4 years ago
:)
@dok3820 · 2 years ago
Thank you Josh. Just..thank you
@statquest · 2 years ago
:)
@temesgenaberaasfaw5076 · 4 years ago
Best tutorial for SVM, YOU DID IT, THANKS
@statquest · 4 years ago
Thank you! :)
@tuongminhquoc · 4 years ago
First comment! I have turned on notification for your videos. I love all of your videos!
@statquest · 4 years ago
Awesome! Thank you! :)
@NathanPhippsONeill · 4 years ago
Amazing vid! Thanks for helping me prepare for my Machine Learning exam 😁
@statquest · 4 years ago
Good luck and let me know how it goes. :)
@NathanPhippsONeill · 4 years ago
@@statquest It went well for a difficult exam. BUT I had a lot to write about thanks to this channel. Appreciate it ❤️
@statquest · 4 years ago
@@NathanPhippsONeill Hooray!!! That's awesome and congratulations. :)
@harithagayathri7185 · 4 years ago
Great explanation 👍 Thanks a ton Josh!! But I'm a bit confused here on how to calculate the appropriate 'r' coefficient for the equation. I understand that the 'd' value is chosen using Cross Validation.
@statquest · 4 years ago
'r' is also determined by cross validation, but I am under the impression that it doesn't have as much impact as 'd'. It basically scales things by a constant, rather than adding extra dimensions.
@thememace · 3 years ago
@@statquest What's the point of setting r anyway since it later gets completely ignored?🤔
@statquest · 3 years ago
@@thememace I'm not sure
@rohanpatel702 · 2 years ago
@@thememace it doesn't get completely ignored. When r=1/2, the math works out such that the x-axis doesn't get scaled at all. But when r=1, the x-axis gets scaled by sqrt(2). Even though the third element of the vectors combined by dot product is a constant (and thus ignored), the choice of r still affects how the dot product evaluates because of how it changes the first element of each vector.
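rohanpatel702's algebra is easy to verify numerically: expanding (ab + 1)² gives a²b² + 2ab + 1, which is the dot product of points mapped to (x², √2·x, 1) — so with r = 1 the original axis is scaled by √2. The values of a and b below are arbitrary examples:

```python
import math

# Hedged sketch: with r = 1 and d = 2, the implied transformation is
# x -> (x^2, sqrt(2)*x, 1), so the original axis is scaled by sqrt(2).
a, b = 3.0, 5.0   # arbitrary example dosages

kernel_value = (a * b + 1) ** 2
phi = lambda x: (x ** 2, math.sqrt(2) * x, 1.0)
dot_product = sum(pa * pb for pa, pb in zip(phi(a), phi(b)))

print(kernel_value, dot_product)   # equal (up to floating-point rounding)
```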
@tinacole1450 · 1 year ago
Does anyone laugh at how silly yet genius Josh is? Loved the robot... I rewound to do the robot.
@statquest · 1 year ago
You are my favorite! Thank you so much! I'm glad you enjoy the silly sounds.
@preeethan · 4 years ago
Amazing explanation :) We find the high-dimensional relationship between 2 points to be 16002.25. Practically, what do we do with this value? How do we find the Support Vector Classifier with this value?
@statquest · 4 years ago
It's quite complicated - way too complicated to be described in a comment.
@preeethan · 4 years ago
StatQuest with Josh Starmer Okay. I love all your videos, especially your intro songs! Great work, keep it going Josh :)
@sanjivgautam9063 · 4 years ago
I want this answer too!
@balasubramanian5232 · 3 years ago
@@statquest I want answers for the question. It'll be helpful if you could share links to resources on this
@statquest · 3 years ago
@@balasubramanian5232 Google "svm lagrange dual" and you will have lots and lots of resources.
@manaspatil4316 · 3 years ago
God bless you !!!
@statquest · 3 years ago
:)
@commentor93 · 2 years ago
I've understood more than I ever expected to understand in this topic all thanks to your videos. But now I've stumbled a bit: How do you solve a constant like the one in 5:50? Or what does solving mean in that context now that it isn't a formula? Could you please expand on that?
@statquest · 2 years ago
Think of it as a loss value, and it is something we try to optimize with an iterative algorithm that is similar to Gradient Descent: kzfaq.info/get/bejne/qaqmZ8ll2Ji3cmw.html
@shahbazsiddiqi74 · 4 years ago
waited too long... Thanks a ton
@L.-.. · 4 years ago
After we find the dot product, how do we use that value to decide whether the new sample belongs to the positive class or the negative class? Please clarify, Josh.
@statquest · 4 years ago
It's a little too much to put into a comment. The purpose of the video was only to give insight into how the kernel works, not derive the math.
@harshitsati · 3 years ago
Thank you angel
@statquest · 3 years ago
bam! :)
@alternativepotato · 3 years ago
I love you my man, you really are a life saver. Just because of that I am gonna buy a t-shirt.
@statquest · 3 years ago
BAM! Thank you very much! :)
@harshitamangal8861 · 4 years ago
Hi Josh, the explanation is amazing. I had a question: you said that the equation (a*b + r)^d is used for finding the relationship between two points; how is this relationship used for finding the Support Vector Classifier?
@statquest · 4 years ago
Unfortunately the details of how it is used would require a whole video and I can't cram it into a comment. However, making the video is on the to-do list.
@zheyuanzhou3165 · 4 years ago
Super clear tutorial. Thank you very much! But as a non-native English speaker, I am a little confused: what is BAM trying to express?
@statquest · 4 years ago
kzfaq.info/get/bejne/n5qZiNmb2K2nfZc.html
@zheyuanzhou3165 · 4 years ago
@@statquest A tut for BAM! cool lol
@sornamuhilan.s.p · 4 years ago
Josh Starmer, you are a genius sir!!
@statquest · 4 years ago
Thank you! :)
@vincent-paulvincentelli2627 · 3 years ago
Great video ! It would be very nice to have such an intuitive one for kernel PCA :)
@statquest · 3 years ago
I'll keep that in mind.
@TaylorSparks · 2 years ago
bam. love it homie. keep it up
@statquest · 2 years ago
Thank you!
@muhammadavimajidkaaffah7715 · 4 years ago
SVM for multiclass please, I like your video so much.
@ronitganguly3318 · 2 years ago
The high dimensional relationship you calculated at the end is a number which tells what exactly? How does it help to pseudo transform into higher dimensions?
@statquest · 2 years ago
Are you familiar with Gradient Descent? kzfaq.info/get/bejne/qaqmZ8ll2Ji3cmw.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25 as values that the algorithm is trying to optimize.
@rajdeepkumarnath8944 · 2 years ago
I once knew a kernel, whose name was Fred, But that's not the path we are gonna tread. (that's a better song Josh :D )
@statquest · 2 years ago
bam!!!
@MrWincenzo · 4 years ago
Since the kernel requires calculating the dot product for each pair of points, suppose we have 10 points. When we do it for each point with respect to the others and itself, we should obtain 10 different dot products for each single point. Which one of those 10 dot products becomes the new "y" dimension of the point?
@statquest · 4 years ago
None of them end up being the new "y" dimension. The kernel trick works without having to make that transformation. We use the transformation to give an intuition of how the process works, but the kernel trick itself bypasses the transformation. This is the "kernel trick", and I mention it in the first video in the series on SVMs: kzfaq.info/get/bejne/m8yCZKZnqNzMnXk.html
@MrWincenzo · 4 years ago
@@statquest Yes, I misunderstood before; now I get it: plugging the values into the polynomial expression is equivalent to calculating the dot product in higher dimensions. And since the SVM only depends on those dot products among points, we have just "improved" the classification by mimicking the dot product in higher dimensions, even infinitely many, as with RBF. Thank you for all your efforts and your gentle replies to our questions. Regards.
@beshosamir8978 · 2 years ago
Quick question: why is it useful to calculate the relationships between every two points, regardless of the dimension? How can it be useful for calculating the decision boundary?
@statquest · 2 years ago
SVM's are optimized using an iterative algorithm that is similar to Gradient Descent, and the relationship values are essentially the "loss" values and help move the SVC to the correct spot.
@beshosamir8978 · 2 years ago
@@statquest So how do I know which is the best dimension I'm looking for, according to the relationships between every two points?
@statquest · 2 years ago
@@beshosamir8978 www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/svm.pdf
@dimitrismarkopoulos3964 · 2 years ago
First of all, congratulations! Your videos are super explanatory! One question: does the equation of the polynomial kernel always have the same form?
@statquest · 2 years ago
As far as I know. However, the variables might have different names.
@berknoyan7594 · 4 years ago
Hi Josh, thanks for the video. You are helping me a lot. I have just one question: what do you mean by "high dimensional relationship"? It can be achieved by any 2 numbers whose product is 126, of which there are infinitely many. It's just a dot product of two 3-dimensional data points. Cross Validation uses the misclassification rate to select the best r and d, as far as I know. Does CV use these numbers in any calculation?
@statquest · 4 years ago
Cross Validation does not use these high-dimensional relationships. Instead, the algorithm that finds optimal fits, given constraints (like the number of misclassifications you will allow) uses them. Although the dot product seems like it would be too simple to use, it has a geometric interpretation related to how close the points are to each other. For more details, check out the Wikipedia article: en.wikipedia.org/wiki/Dot_product
@marijatosic217 · 4 years ago
Thank you for the video! And now, what does this number 16002.25 tell us? :D How will we know what the right dosage is?
@statquest · 4 years ago
That's just an example of the kind of values that are used by the kernel trick to determine the optimal placement of the support vector classifier.
@marcelocoip7275 · 2 years ago
Visually thinking about the last set of data: if you can draw a line to separate the data after you square each observation onto the y-axis, then you can draw a line independently of the scale/ratio of the x-axis. So what I see is that the only thing adding solving/math value is increasing the order of the x-axis to fit a hyperplane (the d value). What does r contribute to arriving at a better solution?
@statquest · 2 years ago
I don't think it adds much.
@donaldmahaya2689 · 4 years ago
I'm always left with the illusion that I understood what you just said.
@statquest · 4 years ago
:(
@donaldmahaya2689 · 4 years ago
@@statquest Re-watched it and I did get it after all. BAM!
@XoXkS · 4 years ago
Another great thing, besides the astonishingly easy explanations, is the way you talk. You talk slowly enough that I can watch the easy parts on 1.5x speed and the hard parts at normal speed. Most people who talk slowly do it by making long pauses between words, so watching them at a higher speed sounds very unnatural. You sound just fine at both normal and 1.5x speed!
@statquest · 4 years ago
bam!
@iisc2022 · 1 year ago
thank you
@statquest · 1 year ago
Welcome!
@DeepakSingh-fo2wm · 4 years ago
I am still not clear on what happens after finding a relationship in the higher dimension; for example, in the video, what happens after finding 16002.25? Can you please add a short video about this if possible?
@statquest · 4 years ago
It would be a long video, but it's on the to-do list.
@geo1997jack · 3 years ago
I did not understand what that 16000 value means or how it helps us. Could you please clarify? Everything else was crystal clear :)
@statquest · 3 years ago
It's used as a measure of the relationship between two points. Once we calculate the relationships between all of the points, they are used in a method similar to Gradient Descent to find the optimal classifier.
@tsunningwah3471 · 23 days ago
amazing
@statquest · 23 days ago
Thanks!
@p-niddy · 2 years ago
What does the "relationship" between two points actually signify? Based on this video, it looks like a number without much meaning that you can map onto the graph.
@statquest · 2 years ago
It has no use for us. However, the algorithm that finds the optimal support vector classifier can use those values to do its job.
@jhfoleiss · 4 years ago
Great explanation, thanks! One question: what happens when a and b are vectors? I understand that in this quest you wanted to give a simple example (with a single feature) to make things clear. If the answer to this question is in another quest, i'll gladly wait for it :)
@statquest · 4 years ago
If 'a' and 'b' are vectors (because you have measured more than one thing per observation), then you just multiply a^T b, where a^T = a transpose.
@primeprover · 4 years ago
@@statquest Doesn't that assume all the features have the same impact on the outcome? I would have thought that some form of weighting in the sums in the dot product of a and b would be necessary.
@statquest · 4 years ago
@@primeprover That's a good point. Like PCA, SVMs are sensitive to scale, so the first thing you would do is normalize all of the variables you've measured.
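The normalization point above can be sketched as a scikit-learn pipeline that standardizes each measured variable before the polynomial kernel sees it. The two-feature data below, with wildly different scales, is made up for illustration:

```python
# Hedged sketch: standardize features before an SVM, since the kernel is
# sensitive to each variable's scale.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0, 1, 200),      # feature measured on a small scale
    rng.normal(0, 1000, 200),   # feature measured on a huge scale
])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)  # both features matter equally

model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2, coef0=1.0))
model.fit(X, y)
print(model.score(X, y))
```

After `StandardScaler`, both features contribute on the same scale, so neither one dominates the dot products inside the kernel.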
@primeprover · 4 years ago
@@statquest Surely more than just normalization is needed? If you provide two normalized variables to a linear regression model they will each get their own coefficient. One could be 1 and the other 0.1. As far as I can see we seem to be giving all features a coefficient of 1 in the models you described? I would have thought that all but one of the additional features(the other would be 1) would need an extra model parameter to scale it in relation to the others.
@statquest · 4 years ago
​@@primeprover I think conceptualizing SVMs in terms of linear or logistic models can be a little misleading. The choice of the parameters for the kernels, unlike linear or logistic regression, do not represent a relationship between the data and the classification. All the SVM is doing is applying relatively arbitrary transformations to the data to increase the dimensionality in a way that might be helpful for separation.
@abrahamjacob7360 · 4 years ago
Josh, this is a great video. One question on the Polynomial Kernel derivation. The original problem was to find a classification point for drug dosage limits that cure or don't cure the disease. When we increased the value of d to 2, you mentioned it introduced a second dimension. I understood how squaring the value helped find a better marginal classifier line, but ideally there is no meaning to the y-axis here, right? Because the case still remains the same: we are just finding whether the drug dosage had a positive or negative impact. We could still use the y-axis to determine its efficacy, but if we increase the value to 3, what would the z-axis represent here? Sorry if the question was confusing.
@statquest · 4 years ago
The new dimensions don't mean anything at all - they are just extra dimensions that allow us to curve and bend the data so that we can separate it. The more dimensions, the more we can curve and bend the data.
@aaditstudent · 6 months ago
Hey guys, did any of you figure out why we only need to compute the dot product, and don't need to actually transform the data? Thanks in advance! :)
@statquest · 6 months ago
The kernel function itself is enough to give a metric of distance, which can be used for an iterative optimization procedure.
@leonugraha · 4 years ago
Thank you for SVM follow up video, by the way, do you maintain a Github account?
@statquest · 4 years ago
I should...
@chinzzz388 · 4 years ago
When we calculate relationships between 2 data points, do we calculate relationships between all the points w.r.t all the other points? Ex: if we have 4 data points (1,2,3,4) do we calculate relationship between (1,2) and (3,4) OR do we calculate relationship between (1,2),(1,3),(1,4),(2,3)...etc
@statquest · 4 years ago
We calculate all of the relationships.
@The_Mashrur · 2 years ago
When you say relationships between observations, what exactly do you mean? You didn't really go over how such relationships allow you to find an SVC in the higher dimension.
@statquest
@statquest 2 жыл бұрын
In the case of SVM, the relationship is a rather abstract metric of distance.
@hrdyam865
@hrdyam865 4 years ago
Thanks for the videos 😊. Can we use SVM for multinomial classification?
@statquest
@statquest 4 years ago
I believe you just create one SVM per classification, and each SVM compares one classification to all the others (i.e. a sample either has that classification or not).
@nick_g
@nick_g 3 years ago
I get the feeling some linear algebra might help with this stuff. I'm no expert here, but kernels remind me of how an extra column is added to a matrix in order to transform to a higher dimension without changing the original values, which I saw in a computerphile video: kzfaq.info/get/bejne/rLdmY9V33M6WmZs.html Also, there's a video I watched about factoring polynomials with matrices on the numberphile channel that might apply: kzfaq.info/get/bejne/rbqFht1erbnFcps.html
@statquest
@statquest 3 years ago
Noted
@rajatsankhla9261
@rajatsankhla9261 2 years ago
Hi Josh, could you help me understand how one should choose the value of r in the kernel function?
@statquest
@statquest 2 years ago
In theory, cross validation would work. This is not something I've done before but my guess is that it might not matter much.
@raktimnaskar2333
@raktimnaskar2333 8 months ago
Can anyone explain to me how the dot products of the feature vectors can find the separating hyperplane?
@statquest
@statquest 8 months ago
First, think of a dot product as a type of measure of similarity (the larger the absolute value, the more similar) and that similarity can be a proxy for closeness. Then those measures are plugged into an iterative algorithm, somewhat like gradient descent (see: kzfaq.info/get/bejne/qaqmZ8ll2Ji3cmw.html ), to find the optimal classifier.
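One way to make the "proxy for closeness" concrete (a sketch, again using the r = 1/2, d = 2 kernel): the squared distance between two points in the transformed space can be recovered from kernel values alone, via the standard identity ||phi(a) - phi(b)||^2 = K(a,a) + K(b,b) - 2*K(a,b).

```python
K = lambda a, b, r=0.5, d=2: (a * b + r) ** d     # polynomial kernel
phi = lambda x: (x ** 2, x, 0.5)                  # matching explicit transform

a, b = 1.0, 4.0                                   # two example observations

# Squared distance in the lifted space, from kernel values only...
dist_from_kernel = K(a, a) + K(b, b) - 2 * K(a, b)
# ...matches the squared distance between the explicitly transformed points.
dist_explicit = sum((pa - pb) ** 2 for pa, pb in zip(phi(a), phi(b)))

print(dist_from_kernel, dist_explicit)            # 234.0 234.0
```

So even though we never compute the high-dimensional coordinates, the optimizer effectively knows how far apart the transformed points are.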
@stoicism-101
@stoicism-101 2 years ago
Dear Sir, kernels are basically used for finding the relationship between two points using the formula. How do we then find the support vector classifier?
@statquest
@statquest 2 years ago
The SVC is found using an iterative process that is a lot like Gradient Descent, and the output from the kernels is like the "loss" values.
@harishh.s4701
@harishh.s4701 2 years ago
Hi, thanks a lot for your content. It is very easy to understand and I appreciate your way of explaining things. I have one doubt: can you please explain how cross-validation helps determine the optimal degree of the polynomial kernel used in SVMs?
@statquest
@statquest 2 years ago
I do that in this video: kzfaq.info/get/bejne/bqdnf5N42KjNfIU.html
@hamidomar3618
@hamidomar3618 2 years ago
Hey, great video, thanks! What happens after the transformation, though? I mean, how does the final result, i.e. a scalar corresponding to the relationship between each pair of observations, help in identifying an optimally classifying hyperplane?
@statquest
@statquest 2 years ago
The value is used in a way similar to how loss values are used in Gradient Descent. There is an iterative algorithm that uses the values to optimize the fit.
@lonandon
@lonandon 3 years ago
What does the result of the dot product mean when it represents the relationship between two points?
@statquest
@statquest 3 years ago
It's the input to an iterative algorithm, much like gradient descent, that can find the optimal classifier.
@gunamrit
@gunamrit 6 months ago
Hello, at 6:04 you said that why it is a dot product is beyond the scope of this video. Can you point me to materials that explain why it's a dot product and not a cross product? Maybe a book will do. Thanks.
@statquest
@statquest 6 months ago
Just google "support vector machine optimization"
@annusrivastava4425
@annusrivastava4425 4 years ago
To find the values of r and d, can we use GridSearchCV as well?
@statquest
@statquest 4 years ago
Yes. GridSearchCV is just a way to do CV.
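A hedged sketch of doing exactly that with scikit-learn (the data here is fabricated; in SVC's polynomial kernel, d is the `degree` parameter and r is `coef0`):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))                 # hypothetical 1-D dosages
y = ((X[:, 0] > 3) & (X[:, 0] < 7)).astype(int)      # "cured" only in the middle

# Cross-validation tries every (degree, coef0) combination and keeps the
# pair with the best average held-out accuracy.
search = GridSearchCV(
    SVC(kernel="poly"),
    param_grid={"degree": [1, 2, 3], "coef0": [0.0, 0.5, 1.0]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)                           # CV's pick for d and r
```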
@zeynabmousavi1736
@zeynabmousavi1736 4 years ago
How is overfitting evaluated in SVM? How do you check whether the output of an SVM is generalizable or not?
@statquest
@statquest 4 years ago
You compare the classifications made with the training dataset to classifications made with the testing dataset.
@zeynabmousavi1736
@zeynabmousavi1736 4 years ago
@@statquest Thank you. I should have mentioned that I have a small data set, so I take all data points as the training set and do 10-fold cross-validation. I am concerned about overfitting.
@abhishekanand5974
@abhishekanand5974 3 years ago
What exactly is meant by relationships between observations?
@statquest
@statquest 3 years ago
It's some metric of distance.
@pratyanshvaibhav
@pratyanshvaibhav A year ago
Respected Josh sir, thank you for such an amazing explanation. Sir, please help me with a doubt: do we take the dot products for every pair of points, i.e., the first red point with all the green points and so on, or just the first red point with the first green point and so on?
@statquest
@statquest A year ago
All pairs
@pratyanshvaibhav
@pratyanshvaibhav A year ago
Thank you sir
@digitalzoul57
@digitalzoul57 3 years ago
Hi StatQuest, you said 'a' and 'b' are two different observations. Does this mean that k(a, b) depends on the number of classes? For example, if I have 4 classes, does it become k(a, b, c, d)?
@statquest
@statquest 3 years ago
I'm not sure how this works with more than 2 classes. Usually when there are more than 2 classes, people create one classifier per class and do 1 vs all other classification. So each classifier is still only separating 2 classes.
@suyashmishra8821
@suyashmishra8821 2 years ago
Hello sir, in the above example it was clear that the new transformed axes were a and a^2, but the mechanism by which the classifier draws the line wasn't clear. Do we get the equation of that classification line from the kernel function, the dot product, or something related?
@statquest
@statquest 2 years ago
The output of the kernel function (the dot-products) is fed into an iterative algorithm (similar to gradient descent) to find the optimal support vector classifier.
@RowoonSamshu
@RowoonSamshu 3 months ago
I don't understand why we need to calculate the dot products at all. I have a basic idea that the loss function for an SVM includes the calculation of dot products between the observations, but I don't understand the intuition behind it, i.e., what the dot products (similarities) between observations actually do in finding the hyperplane that classifies the observations. Also, they say we have to minimize |w| to get the optimal hyperplane, but what is the geometric intuition behind minimizing |w|?
@hassanjb83
@hassanjb83 4 years ago
At 6:33 you mention that we need to determine the values of both r and d through cross-validation. If we have one-dimensional data, then shouldn't d = 2 be enough?
@statquest
@statquest 4 years ago
Why do you say that?
@hemersontacon3168
@hemersontacon3168 4 years ago
I think you got too attached to the example. Imagine the same example but with the two colors all mixed up. Then I think that d = 2 would not be enough to split things up!
@ccuny1
@ccuny1 4 years ago
@@hemersontacon3168 That's an insightful comment that actually opened my eyes. Thank you.
@hemersontacon3168
@hemersontacon3168 4 years ago
@@ccuny1 Glad to know and glad to help ^^
@slirpslirp
@slirpslirp 4 years ago
Awesome, so the dot product is equal to the result of the kernel function?
@statquest
@statquest 4 years ago
yep!
@utkarshagrawal4708
@utkarshagrawal4708 2 years ago
Any resources for understanding why the dot product is used?
@statquest
@statquest 2 years ago
I'm not sure I fully understand your question - but I'm guessing you are asking how the dot product leads to the optimized support vector classifier. Think of it as the loss function that we use for gradient descent.
@rishabhmalhotra127
@rishabhmalhotra127 3 years ago
This StatQuest isn't about a kernel named Fred coz it's about a kernel named Polynomial. Hilarious xD
@statquest
@statquest 3 years ago
Thanks! :)
@rezasaifuddin3967
@rezasaifuddin3967 3 years ago
Can someone explain what effect the choice of the r value has on our SVC?
@statquest
@statquest 3 years ago
I'm not sure it has a huge effect, as long as it is not 0.
@Han-ve8uh
@Han-ve8uh 3 years ago
1. Is there a relationship between the values of d and r in the polynomial kernel and the number of output dimensions? 2. At 3:27, why is the 3rd term ignored? Is this part of the kernel trick? Are the 3rd terms always the same no matter what d or r is used? 3. It seems that the dot product exists only because d=2, which after expansion allows the expression to be written as a dot product; if d=3, can we no longer express it as a dot product of 2 terms? 4. Does this whole video apply to other kernels too?
@statquest
@statquest 3 years ago
1) d ends up being the number of dimensions. 2) Regardless of the values for 'a' and 'b', the last dimension will always have the exact same value, 1/2. Thus, it will not help us establish how 'a' and 'b' are related. 3) If d=3 (with r=1), then we get (ab + 1)^3 = a^3b^3 + 3a^2b^2 + 3ab + 1 = the following dot product: (a^3, sqrt(3)a^2, sqrt(3)a, 1) dot (b^3, sqrt(3)b^2, sqrt(3)b, 1). 4) This video provides the background for understanding how the RBF kernel works. For details on that, see: kzfaq.info/get/bejne/h8llfNx9vMXMnqc.html
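A quick numeric check of the d = 3 case (r = 1; the values of a and b are arbitrary examples):

```python
import math

a, b = 2.0, 3.0
kernel = (a * b + 1) ** 3                  # polynomial kernel with r=1, d=3

# Explicit 4-D transform whose dot product reproduces the kernel:
phi = lambda x: (x ** 3, math.sqrt(3) * x ** 2, math.sqrt(3) * x, 1.0)
dot = sum(pa * pb for pa, pb in zip(phi(a), phi(b)))

print(kernel, round(dot, 9))               # 343.0 343.0
```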
@Han-ve8uh
@Han-ve8uh 3 years ago
@@statquest Thanks a lot. Now I get the idea of how, for 3), you can always use square roots to split up the constants and then put a and b into the two terms of the dot product. Also, for 1), I see the number of terms of (x+y)^n is n+1, but we always throw away the last term since it's a constant that splits into equal factors for both points, so we end up with n dimensions. How did people invent these kernels? Did the kernel trick come later as a hack to overcome computational constraints, or did it come first, before a whole class of kernels was discovered to be possible? Also, why is there an obsession with learning straight lines through the data (whether raw or dimension-raised)? Does this have to do with limitations of the optimization method (I think you mentioned in other comments that it uses gradient descent)? Because I'm thinking, if it could generate non-straight lines, then maybe there's no need to raise to higher dimensions?
@statquest
@statquest 3 years ago
@@Han-ve8uh I don't know how people came up with the kernel trick. However, straight lines are usually much easier to optimize than curved ones. That said, neural networks, which I describe here, fit curved lines to data: kzfaq.info/get/bejne/edd_mcxllrLKdKs.html
@shivakiranreddy4654
@shivakiranreddy4654 2 years ago
Hi Josh, I couldn't see how 16002.25 helps us draw the support vector classifier. In the comments below you mentioned: "In some sense the 'relationships' are similar to transforming the data to the higher dimension and calculating the distances between data points." Even this explanation did not help. If 16002.25 is one of the 2-dimensional relationships that we need to solve for the support vector classifier, what is the other one? How do we get the classifier?
@statquest
@statquest 2 years ago
Are you familiar with Gradient Descent? kzfaq.info/get/bejne/qaqmZ8ll2Ji3cmw.html SVMs use a different algorithm, but the idea is similar, and you can think of the numbers, like 16002.25 as values that the algorithm is trying to optimize.
@yancheeyang2918
@yancheeyang2918 3 years ago
Question: the video ends at getting the relationship. I wonder what it means and how we can get the optimal hyperplane from here. Thanks!
@statquest
@statquest 3 years ago
It's an iterative method that is like gradient descent.