Kaggle's 30 Days Of ML (Competition Part-6): Model Stacking


Views: 11,066

Days ago

This video is a walkthrough of Kaggle's #30DaysOfML. In this video, we will learn what model stacking is and how to do it in a proper manner! Check part-5 for model blending!
Notebook:
Note: this video is not sponsored by #Kaggle!
Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)
To get my book, Approaching (Almost) Any Machine Learning Problem, for free, please visit: bit.ly/approac...
Follow me on:
Twitter: / abhi1thakur
LinkedIn: / abhi1thakur
Kaggle: kaggle.com/abh...

Comments: 52

@abhishekkrthakur 3 years ago

Notebook is here: www.kaggle.com/abhishek/competition-day-6-stacking Like, subscribe, and share to help me keep motivated to make more amazing videos like this one ;)

@channelname9332 3 years ago

lub u!

@channelname9332 3 years ago

I'm new to ML. I used your blending notebook with 3 LightGBM and 5 XGBoost models and got into the top 20. Thanks! I was reading an article where the author used stacking on around 6 models and added the stack to his blender along with the same models he used for blending.

@sarthaksingh2175 3 years ago

Best series on Kaggle competitions EVER

@israelMendonca 3 years ago

I have been in Kaggle for years, but I was always lost on how to start it, so I never did. So many competitions and the impostor syndrome was always there. This 30DaysOfML was a really good starting point. Thanks for the videos Abhishek and for the support to the community in general. I hope one day I can contribute as well. Gonna keep working on it!

@abhishekkrthakur 3 years ago

Best of luck and thank you for your kind words :)

@NatashaMugwe 3 years ago

This is an interesting concept. I didn't know one could use multiple models for a project.

@AdityaJha1 3 years ago

I like before I watch. Your content can never disappoint 👍👍👍

@mgpaingzinkyaw727 3 years ago

Not gonna lie, your channel, and especially this series, helped me a lot with how to start and how to think from scratch when building models. I'll check out your other videos later. Thank you so much for sharing your knowledge, sir.

@holsetymoon 3 years ago

Hello Abhishek, thanks for making these videos! They're super helpful for studying! I have a question regarding the difference between blending and stacking. Here's my understanding, please correct me where I'm wrong: stacking can have several levels of predictions on top of the base level (base as in the original models fitted with the original test and train dataset), whereas blending only has 1 level on top of the base level. I was able to implement blending thanks to your previous tutorial, and now I'm trying to wrap my head around stacking so I can give it a try.

@abhishekkrthakur 3 years ago

correct

@longqua69 3 years ago

You inspired and trained many data people. Hope you can manage time to release more practical tutorials.

@kiranchowdary8100 3 years ago

How often are stacking and blending used on real-life tabular datasets for real-life tasks?

@JeremyWhittakerAZ 3 years ago

Got a couple of questions if this is your last video. What further reading material or tutorials would you recommend to keep pushing this competition? Or would you end here? It would be cool if you could do another video series breaking down some other dataset competition on kaggle like you did for this one. I learned a lot. Are there other video series similar to this with high quality that you would recommend from yourself or another author? Specifically, I would love to see how you break down time series datasets. As well this data is random so I assume your approach is completely different than a dataset that was not randomly generated. I imagine the exploratory phase of your approach would be different given the different importance of features? I got your book but haven't read it yet. Planning on getting it in on the beach today. Perhaps questions will be answered here.

@abirhasanx 3 years ago

Thanks for the amazing videos.

@kentdaniel7838 3 years ago

Thank you, sir. I really like your series, and because of your teaching I now rank 11th in the 30 Days of ML competition. I really appreciate you taking the time to share your knowledge with us. Will you upload any further technique recommendations that you haven't covered in the series? Also, can you recommend what next step to take after the 30DaysOfML competition is over? I hope I can become a Kaggle Grandmaster like you in the future!

@abhishekkrthakur 3 years ago

Thanks for your kind words. This series is over now. I would recommend you to take a look at ongoing prize money competitions on kaggle to learn further and maybe even finish all kaggle learn courses?

@kentdaniel7838 3 years ago

Thank you for your reply! I will definitely finish and understand all the courses on Kaggle, then jump to the prize competitions. Once again, thank you for your content; it means so much to me and the community!

@fabianhamza8546 3 years ago

Hi Abhishek! Thanks for the videos! They are really helpful and super good in terms of concept explanation. Have you done a video tutorial on feature selection? I heard that some features just add noise to the model. Boruta-SHAP was recommended, but I didn't manage to understand it well enough to use it freely on the dataset.

@ishasharma1252 2 years ago

This series is really helpful and informative, thank you for sharing! A quick question: when blending, the hyperparameters of the Level-0 models can be fine-tuned on the original train dataset. But when stacking, should the Level-1 models be fine-tuned on the predictions of the Level-0 models, i.e. the 'pred' columns? Your response will be very helpful.

@EngRiadAlmadani 2 years ago

Great work, sir. But as far as I can see, you train level 1 on the same training data that you use to train level 0. I read in the Hands-On Machine Learning book that level 1 should be trained on a different set of data that level 0 has never seen before. Correct me if I'm wrong.
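
On the concern above: one common way to keep level 1 from learning from leaked targets is to train it only on out-of-fold (OOF) predictions, so that every level-0 prediction used as a feature comes from a model that never saw that row. The sketch below is illustrative and is not the notebook's code. It assumes NumPy, synthetic data, and plain least-squares models; the helper names (`fit_linear`, `predict_linear`, `oof_predictions`) are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def oof_predictions(X, y, X_test, n_folds=5, seed=0):
    """Return (oof, test_pred); oof[i] comes from a model that never saw row i."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(X)) % n_folds  # balanced random fold id per row
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for f in range(n_folds):
        train_mask = folds != f
        coef = fit_linear(X[train_mask], y[train_mask])
        oof[~train_mask] = predict_linear(coef, X[~train_mask])
        test_pred += predict_linear(coef, X_test) / n_folds  # average over folds
    return oof, test_pred

# Synthetic regression data (assumption: stands in for the competition data).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=500)
X_test = rng.normal(size=(100, 4))

# Level 0: two deliberately different base models, each seeing a feature subset.
feature_sets = [[0, 1], [2, 3]]
level0 = [oof_predictions(X[:, cols], y, X_test[:, cols]) for cols in feature_sets]
train_meta = np.column_stack([oof for oof, _ in level0])
test_meta = np.column_stack([tp for _, tp in level0])

# Level 1: the meta-model is fitted only on out-of-fold predictions.
meta_coef = fit_linear(train_meta, y)
final_test_pred = predict_linear(meta_coef, test_meta)
```

A separate hold-out set for level 1, as the book describes, serves the same purpose; OOF predictions simply let every training row contribute to the level-1 training set.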

@georgemichel9278 3 years ago

Don't you think it is a lil bit bad that you share notebooks that are literally top 5? I mean, people are just downloading your results and getting ahead of the people who did not do that. Already 4 people did it!

@abhishekkrthakur 3 years ago

I thought about it and also discussed it with others, then I decided to open the notebook. I have not provided the files used for training the model, the base models are quite basic, and the final model can also be improved easily with a little tuning or by improving the base models. The competition is just 50% done. There are still 7 days to make improvements! :)

@vezga 3 years ago

There are no medals in this competition, so what's the harm?

@KennethQuisado 3 years ago

I think the benefit outweighs the drawbacks, beginners like me can learn straight from the best on how they approach a machine learning problem. The content here is gold and the rank in the public leaderboard reflects it. Blindly copying will not help of course but not all are doing that, others are improving on this which is more important

@jojushaji3010 3 years ago

sr ure amazing

@bruceonfire8210 3 years ago

Hi Abhishek! Thank you so much for your video. I have a question regarding stacking. In your video, the normal way to do stacking is to use different models to predict valid V1, V2, V3 and test T1, T2, T3, and then concatenate them with the original train and test. Then you use the valid predictions V1, V2, V3 as features to predict again. However, I am wondering if I can just take T1, T2, T3 and do some linear regression, or just take a percentage of each of T1, T2, T3, to get the final submission? I'd like to ask for your help! Does the latter method cause data leakage or other issues, or is it fine to use?

@yogitad4136 3 years ago

I too click like before even watching. Tried stacking and jumped to 13; I need to add more models, as I only tried 3 base models. Now the question is, how does one choose base models? Choose weak learners or strong learners while stacking? Or a combination of both? There is a scikit-learn class called StackingRegressor; as per my understanding it does the same thing, stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Right?

@abhishekkrthakur 3 years ago

No clue. I like my own stacking code better. Also, it's better if, as beginners, we write our own code as much as possible rather than using wrappers. AutoML libraries can do everything we are doing in 3 lines of code, and some without any code. But what are we gonna learn from that? Regarding the number of models: it's up to you and the data. Models that are different from each other produce better results. In simpler words: avoid models whose predictions are highly correlated with those of other models in the stack or blend.

@yogitad4136 3 years ago

@@abhishekkrthakur Thank you for your response!
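
For readers curious about the wrapper mentioned in this thread: scikit-learn does ship `StackingRegressor` in `sklearn.ensemble`. It fits the base estimators, builds cross-validated predictions internally (controlled by `cv`), and trains `final_estimator` on them. The sketch below is a minimal illustration on synthetic data; the choice of base models and all parameter values are assumptions, not the setup from the video.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data (assumption: stands in for a real tabular dataset).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),  # level-1 model
    cv=5,  # level-1 features are cross-validated predictions of the base models
)
stack.fit(X_train, y_train)

valid_pred = stack.predict(X_valid)
rmse = float(np.sqrt(np.mean((y_valid - valid_pred) ** 2)))
```

Writing the folds by hand, as recommended in the reply above, makes it easier to inspect the intermediate predictions; the wrapper trades that visibility for brevity.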

@jojushaji3010 2 years ago

How would you do this for a classification problem?
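
This question has no reply in the thread. One common adaptation, sketched here as an assumption rather than as the author's method, is to use out-of-fold predicted probabilities as level-1 features and a classifier as the meta-model. The example uses scikit-learn's `cross_val_predict` with `method="predict_proba"` on synthetic binary data; the model choices are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

base_models = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

# Level 0: out-of-fold probability of the positive class, one column per model.
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base_models
])

# Level 1: a simple classifier trained on the out-of-fold probabilities.
meta_model = LogisticRegression()
meta_model.fit(meta_features, y)
stacked_proba = meta_model.predict_proba(meta_features)[:, 1]
```

For test data, each base model's test probabilities would be produced the same way as in the regression case (for example, averaged over the fold models) before being passed to the meta-model.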

@PratapO7O1 3 years ago

Love ur content

@vikasjha1204 3 years ago

I've just started with machine learning and I'm a total newbie in the world of Kaggle. Just found your channel from Twitter. So where should I start?

@harshavardhanasrinivasan3125 3 years ago

You can start from the day 1 video and practise in parallel.

@madhu1987ful 3 years ago

Have you not uploaded the stacking notebook for reference?

@abhishekkrthakur 3 years ago

see pinned comment pls

@subhayanroy2218 3 years ago

Blending was amazing... I have a slight hiccup though. I was trying to optimize the weights at the end using Optuna (as you mentioned at the end of the last video) instead of the linear regressor. Can you help, please?

@abhishekkrthakur 3 years ago

sure. if you share code.
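
The commenter's code is not in the thread, so the following is only a sketch of the general idea of searching for blend weights on validation (or out-of-fold) predictions. To stay dependency-free it uses a plain random search over weights instead of Optuna; the function `search_blend_weights` and the synthetic predictions are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def search_blend_weights(preds, y_true, n_trials=2000, seed=0):
    """Random search over non-negative weights that sum to 1."""
    rng = np.random.default_rng(seed)
    n_models = preds.shape[1]
    # Include the one-hot weights so the blend is never worse than
    # the best single model on this validation data.
    candidates = np.vstack([
        np.eye(n_models),
        rng.dirichlet(np.ones(n_models), size=n_trials),
    ])
    scores = [rmse(y_true, preds @ w) for w in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

# Synthetic validation predictions from three models (assumption).
rng = np.random.default_rng(1)
y_valid = rng.normal(size=1000)
preds = np.column_stack(
    [y_valid + rng.normal(scale=s, size=1000) for s in (0.3, 0.4, 0.5)]
)

weights, best_rmse = search_blend_weights(preds, y_valid)
```

With Optuna, the same `rmse(y_true, preds @ w)` expression would become the objective of a study. Either way, the weights should be chosen on validation or out-of-fold predictions, never on the test predictions.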

@faysalmh6468 3 years ago

Firstly, thank you for your effort; it helps me a lot. In a regression problem we use k-fold for cross-validation, but when the problem is classification, what rule do we use? You mentioned a rule, "soguess" or something, but I can't find it. It would help if you could tell me the rule's name or share some links. Thank you again.

@abhishekkrthakur 3 years ago

Sturges' rule. Check out this video: kzfaq.info/get/bejne/aN2Bn6dlm8utc3k.html&ab_channel=AbhishekThakur or get my book for free, which has a chapter dedicated to cross-validation: bit.ly/approachingml

@faysalmh6468 3 years ago

@@abhishekkrthakur Thank you very much.
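
For context, Sturges' rule sets the number of histogram bins to floor(1 + log2(n)) for n samples. It is typically used to bin a continuous target so that stratified k-fold can be applied to a regression problem. The sketch below is a NumPy-only illustration under that assumption; the helper names are hypothetical, and a common alternative is to pass the bin labels to scikit-learn's `StratifiedKFold`.

```python
import numpy as np

def sturges_bins(n_samples):
    """Number of bins by Sturges' rule: floor(1 + log2(n))."""
    return int(np.floor(1 + np.log2(n_samples)))

def stratified_regression_folds(y, n_folds=5, seed=0):
    """Assign fold ids so each fold gets a similar mix of target bins."""
    rng = np.random.default_rng(seed)
    n_bins = sturges_bins(len(y))
    # Interior edges of equal-width bins over the target range.
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]
    bins = np.digitize(y, edges)  # bin labels in 0 .. n_bins - 1
    folds = np.empty(len(y), dtype=int)
    for b in np.unique(bins):
        idx = rng.permutation(np.flatnonzero(bins == b))
        folds[idx] = np.arange(len(idx)) % n_folds  # round-robin within the bin
    return folds, bins

# Synthetic continuous target (assumption).
y = np.random.default_rng(0).normal(size=1000)
folds, bins = stratified_regression_folds(y)
```

Each fold then contains roughly the same share of low, medium, and high target values, which tends to make the validation scores across folds more comparable.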

@thanhtoanvuong8525 3 years ago

Can you explain how to choose model (base model, meta model) and parameter of them for stacking model?

@abhishekkrthakur 3 years ago

I wish I could! It's up to you and the data. Models that are different from each other produce better results. In simpler words: avoid models whose predictions are highly correlated with those of other models in the stack or blend.

@thanhtoanvuong8525 3 years ago

@@abhishekkrthakur thank you very much for your advice
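
The advice about correlated predictions can be checked directly before adding a model to a stack or blend. The sketch below is an illustration on synthetic predictions, not the notebook's code: it computes the pairwise Pearson correlation between model predictions with `np.corrcoef` and flags pairs above a threshold. The 0.99 threshold is an arbitrary assumption.

```python
import numpy as np

# Synthetic predictions from three models (assumption).
rng = np.random.default_rng(0)
y = rng.normal(size=500)
pred_a = y + rng.normal(scale=0.3, size=500)
pred_b = pred_a + rng.normal(scale=0.01, size=500)  # near-duplicate of model A
pred_c = y + rng.normal(scale=0.3, size=500)        # independent errors

preds = np.column_stack([pred_a, pred_b, pred_c])

# Pairwise correlation between columns (one column per model).
corr = np.corrcoef(preds, rowvar=False)

# Flag model pairs whose predictions are nearly identical.
threshold = 0.99
n_models = preds.shape[1]
redundant_pairs = [
    (i, j)
    for i in range(n_models)
    for j in range(i + 1, n_models)
    if corr[i, j] > threshold
]
```

A flagged pair adds little new information to the ensemble, so keeping only the stronger model of the pair is usually a reasonable first step.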

@decadewgame9802 3 years ago

7:17 From where did you add this data?

@abhishekkrthakur 3 years ago

to avoid people copy-pasting code and running the notebook without understanding, this data has not been shared by me. you can, however, look at the notebooks shared by me in this competition and get the same data from them and build your own stacking dataset. good luck! :)

@shiva_acharii 3 years ago

When will your next book, "Approaching (almost) Any NLP Problem", be available for sale?

@abhishekkrthakur 3 years ago

end of year.

@sumitdwivedi9474 3 years ago

I followed everything in this video and created my notebook but still got a very poor RMSE of 0.728. What am I doing wrong?

@abhishekkrthakur 3 years ago

Did you see my reference notebook in the pinned comment?

@sumitdwivedi9474 3 years ago

@@abhishekkrthakur Yes, I did see the notebook which you used in blending, but I'm still unsure why this is happening. Also, why did you not use the GPU here while running the model?

@abhishekkrthakur 3 years ago

@@sumitdwivedi9474 You need to use the CPU with the tree method set to exact. The gpu_hist method doesn't give good enough results.