7.5 Gradient Boosting (L07: Ensemble Methods)

12,169 views

Sebastian Raschka

3 years ago

In this video, we will take the concept of boosting a step further and talk about gradient boosting. Whereas AdaBoost uses per-example weights to boost the trees in the next round, gradient boosting uses the gradients of the loss to compute residuals, and the next tree in the sequence is fit to those residuals.
XGBoost paper mentioned in the video: dl.acm.org/doi/pdf/10.1145/29...
Link to the code: github.com/rasbt/stat451-mach...
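A minimal sketch of the residual-fitting loop described above, assuming scikit-learn's DecisionTreeRegressor and the squared error loss (illustrative only; this is not the linked course code, and the function names are made up for the example):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1):
    """Fit a sequence of shallow regression trees, each one on the
    residuals (negative gradients of the squared error loss) of the
    ensemble built so far."""
    f0 = np.mean(y)                               # initial prediction: a constant
    pred = np.full(len(y), f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                      # pseudo-residuals for squared error
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residuals)                    # next tree is fit to the residuals
        pred += learning_rate * tree.predict(X)   # add its (shrunken) contribution
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

The learning rate shrinks each tree's contribution, so the ensemble improves through many small corrections rather than a few large ones.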
-------
This video is part of my Introduction to Machine Learning course.
Next video: • 7.6 Random Forests (L0...
The complete playlist: • Intro to Machine Learn...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 19
@deltax7159 2 months ago
this is the first video of yours I have come across, and it's by far the best I have found on this topic. Will be binging everything you have to offer from now on. Thanks for all the content, man!
@yerhoam 7 months ago
Thank you for the great explanation! I liked the way you say "prediction" :)
@nazmuzzamankhan4764 3 years ago
I really liked the way you explained the steps with numbers. It helped me a lot to understand the notation in the equations.
@SebastianRaschka 3 years ago
glad to hear that it was useful!
@rohitgarg776 2 years ago
Thanks, explained very nicely
@hassandanamazraeh5975 1 year ago
A great course. Thank you very much.
@SebastianRaschka 1 year ago
Thanks for the kind words! Glad to hear it was useful!
@newbie8051 9 months ago
Well, I understood the gradient boosting part, as in we focus on the residuals and fit further trees to lower the loss of the previously built trees. But I couldn't grasp how XGBoost achieves this via parallel computation. Guess I'll have to read the paper :)
@just4onecomment 3 years ago
Hi Professor, thank you very much for the educational video! Do you have any thoughts on how this stepwise additive model compares to fitting a very large model with many parameters in a "stepwise" fashion based on gradient descent? For example, freezing and additively training subnetworks of a neural model.
@SebastianRaschka 3 years ago
Interesting question. There's something called layerwise pre-training in the context of neural networks. It's basically somewhat similar to what you describe, training one layer at a time. The difference is really the structure of the model though, because it's fully connected layers rather than tree-based. But yeah, it's an interesting thought
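As a rough illustration of that layerwise idea, here is a small sketch assuming PyTorch (the toy data and module names are made up for the example; this is not from the lecture): each block is trained in turn while the blocks trained before it stay frozen.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 20)                      # toy inputs
y = torch.randn(64, 1)                       # toy targets

blocks = nn.ModuleList([nn.Sequential(nn.Linear(20, 20), nn.ReLU()) for _ in range(3)])
head = nn.Linear(20, 1)
loss_fn = nn.MSELoss()

for i, block in enumerate(blocks):
    # Freeze all blocks trained in earlier stages.
    for prev in blocks[:i]:
        for p in prev.parameters():
            p.requires_grad_(False)
    optimizer = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=0.01)
    for _ in range(100):                     # a few epochs per stage
        out = X
        for b in blocks[: i + 1]:            # forward pass through the blocks added so far
            out = b(out)
        loss = loss_fn(head(out), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()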
@urthogie 5 months ago
Why does the tree in step 2 not have a third decision node to split Waunake and Lansing?
@asdf_600 2 years ago
Very nice video :) I was wondering why, for gradient boosting, we fit the derivative instead of the residual? Intuitively that's what I would do :/
@SebastianRaschka 2 years ago
Good question. If we consider the squared error loss "1/2(yhat-y)^2", its derivative with respect to yhat is "yhat-y", which is also what people refer to as the residual in a linear regression context. In other words, the derivative looks like the residual, so we basically are fitting the derivative. If the loss is not the squared error loss, the derivative may look different, which is why it is called a "pseudo residual" in general. We could also just call it the loss derivative and not use the term pseudo residual at all; I think it's just a convention in gradient boosting contexts to use the term pseudo residual.
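To make that concrete, here is a tiny numeric check (just an illustrative NumPy snippet, not code from the lecture): for the squared error loss 1/2*(yhat - y)^2, the derivative with respect to yhat is yhat - y, so the negative gradient is exactly the ordinary residual y - yhat.

import numpy as np

y    = np.array([30., 25., 42.])    # targets
yhat = np.array([28., 31., 40.])    # current model predictions

grad = yhat - y                      # d/dyhat of 1/2 * (yhat - y)**2
pseudo_residuals = -grad             # negative gradient

print(pseudo_residuals)              # [ 2. -6.  2.]
print(y - yhat)                      # identical: the ordinary residuals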
@muhammadlabib3744 1 year ago
I'm still wondering about minute 13:19: why did you choose age >= 30 as the root node? Is that from the residuals or something else?
@SebastianRaschka 1 year ago
Oh this was an arbitrary choice for this example