XGBoost in Python (Hyperparameter Tuning)

56,296 views

DataMites

4 years ago

Trainer: Mr. Ashok Veda - / ashokveda
XGBoost is one of the algorithms that has recently been dominating applied machine learning and Kaggle competitions for tabular (structured) data, and it is a preferred algorithm for production. This video is about XGBoost hyperparameter tuning; a minimal parameter sketch is included at the end of this description.
You can also watch the playlists below if you are a data science aspirant.
Full Data Science Tutorials: • Data Science Tutorials...
Statistics for Data Science Tutorials: • Statistics for Data Sc...
DataMites provides Data Science, Machine Learning, Artificial Intelligence, Deep Learning and IoT training courses with globally valid certifications. You can choose classroom training or online training for your scheduled course. Learn Data Science with Python programming, Statistics, and Machine Learning Algorithms.
For more details visit: datamites.com/
For ONLINE Training visit: datamites.com/data-science-on...
Classroom Training Centers:
Bangalore: datamites.com/data-science-co...
Chennai: datamites.com/data-science-co...
#DataScienceStatistics
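For quick reference, here is a minimal sketch of the kind of setup the video tunes. It is not the exact code from the video; the dataset loader matches the Wisconsin breast cancer data mentioned in the comments, and the parameter values are illustrative assumptions.

```python
# Minimal sketch (assumed values, not the video's exact code): fit an
# XGBClassifier on the Wisconsin breast cancer data with the most
# commonly tuned hyperparameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = XGBClassifier(
    n_estimators=100,       # number of boosting rounds (trees)
    learning_rate=0.1,      # shrinkage applied to each tree's contribution
    max_depth=3,            # depth of each tree
    subsample=0.8,          # fraction of rows sampled per tree
    colsample_bytree=0.8,   # fraction of columns sampled per tree
)
model.fit(X_train, y_train)

print("Train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The sketches further down reuse these X, y, X_train, X_test, y_train, y_test and model names.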

Comments: 106
@jaripeltola
@jaripeltola 1 year ago
This presentation is the best overall view of the most important XGBoost model parameters I have seen.
@DataMites
@DataMites 1 year ago
Thank You
@wanneesalkhayyali8386
@wanneesalkhayyali8386 1 year ago
Indeed!
@kmnm9463
@kmnm9463 3 years ago
Hi Ashok, I think there is no need to check the training accuracy; it is a redundant step. The reason is that the model is trained on the training data, so whatever hyperparameter tuning we do, the training accuracy is likely to be close to 1.0. The better approach is to focus on the test data. In a real-world scenario we would feed unseen data to the model and then fine-tune the hyperparameters. Thanks for the tutorial. Thanks from KM
@DataMites
@DataMites 3 years ago
Thank you
@darrencr1987
@darrencr1987 7 months ago
Just for discussion: I think the purpose of calculating training performance is to compare it with test performance and see whether there is any overfitting; otherwise how would you know? Also, I don't think accuracy is a good measure here; AUC might be a better one. Just my 2 cents.
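A sketch of that check, reusing the fitted model and splits from the sketch in the description; the AUC comparison is an assumption about what the commenter has in mind, not something shown in the video.

```python
# Compare train vs. test performance to spot overfitting; AUC is often more
# informative than plain accuracy for probabilistic classifiers.
from sklearn.metrics import accuracy_score, roc_auc_score

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"accuracy  train={train_acc:.3f}  test={test_acc:.3f}")
print(f"AUC       train={train_auc:.3f}  test={test_auc:.3f}")
# A large gap between the train and test scores suggests overfitting.
```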
@prakharbaheti4055
@prakharbaheti4055 3 years ago
Great tutorial, exact and to the point.
@DataMites
@DataMites 3 years ago
Thank you!
@pradeepsharma30
@pradeepsharma30 4 years ago
This is amazing stuff!!
@DataMites
@DataMites 3 years ago
Thank you!
@satishb9975
@satishb9975 7 months ago
Thank you. An excellent, detailed elaboration of each parameter for hyperparameter tuning, explained very well. I finally got the concept of hyperparameter tuning.
@DataMites
@DataMites 7 months ago
Thank you, keep supporting.
@gauravrajpal3101
@gauravrajpal3101 3 years ago
Very good explanation and test strategy, thank you so much sir
@DataMites
@DataMites 3 years ago
All the best
@analuciademoraislimalucial6039
@analuciademoraislimalucial6039 3 years ago
Thanks, teacher. Love the explanation.
@DataMites
@DataMites 3 years ago
You're welcome!
@xolanijozi8375
@xolanijozi8375 2 years ago
This is great.
@DataMites
@DataMites 2 years ago
Thank you
@madhur089
@madhur089 3 years ago
Thank you, this helped my understanding.
@DataMites
@DataMites 3 years ago
Glad it helped!
@qazdata-science4420
@qazdata-science4420 3 years ago
Amazing Tutorial!!!!
@DataMites
@DataMites 3 years ago
Thanks!
@AkshayArbune
@AkshayArbune 1 month ago
Very helpful Video
@DataMites
@DataMites 1 month ago
Glad it was helpful!
@wangrichard2140
@wangrichard2140 3 years ago
perfect!
@DataMites
@DataMites 3 years ago
Thank You!
@davintjandra4226
@davintjandra4226 4 years ago
Hey, I've got a question: if I use a correlation matrix and manually deselect the features that are ambiguous (neutral), can I still set the column sample to 1? Great tutorial, man.
@DataMites
@DataMites 3 years ago
Yes, you can, but check how your model performs.
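A hedged sketch of one way to do that; the 0.1 correlation threshold is an arbitrary assumption, and colsample_bytree stays at 1.0 so every remaining feature is available to each tree.

```python
# Sketch: drop features with near-zero correlation to the target by hand,
# then keep colsample_bytree at 1.0 (no column subsampling).
import pandas as pd
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier

data = load_breast_cancer(as_frame=True)
features, target = data.data, data.target

corr = features.corrwith(target)            # correlation of each feature with the target
selected = corr[corr.abs() > 0.1].index     # keep only clearly correlated columns

clf = XGBClassifier(colsample_bytree=1.0)   # all remaining columns used per tree
clf.fit(features[selected], target)
```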
@planetscore
@planetscore 3 years ago
What a chaos!
@wimavlogs6826
@wimavlogs6826 4 years ago
Can you do a full video on time series forecasting (predicting future values from previous data) using XGBoost?
@DataMites
@DataMites 3 years ago
We will definitely do that in the future. Thank you.
@estebanbraganza1067
@estebanbraganza1067 3 years ago
Amazing video. It would be better if you could use a different dataset so we can see the effects of the different parameters more clearly.
@DataMites
@DataMites 3 years ago
Sure, we will do that; this one was meant to explain the basic concept.
@carolinnerabbi965
@carolinnerabbi965 4 years ago
Very good explanation and test strategy, thanks!
@DataMites
@DataMites 3 years ago
Glad it was helpful!
@nasifosmanshuvra8607
@nasifosmanshuvra8607 2 years ago
Great explanation, sir! How can I provide batches of images (using a data generator for an image dataset) to an XGBoost classifier to fit the images and labels?
@DataMites
@DataMites 2 years ago
Kindly refer to this: github.com/bnsreenu/python_for_microscopists/blob/master/195_xgboost_for_image_classification_using_VGG16.py
@SHarshithaBandaru
@SHarshithaBandaru 3 years ago
Help me correct this error while calculating accuracy_score: ValueError: Classification metrics can't handle a mix of binary and continuous targets. I'm getting this error because my output contains continuous values: [0.96478105 0.01573407 0.01140928 ... 0.00143398 0.00143398 0.00143398]
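That error usually means accuracy_score is being given predicted probabilities instead of class labels. A small sketch of the usual fix; the probability values are taken from the comment, and y_true is an assumed example.

```python
# accuracy_score needs hard 0/1 labels, not probabilities.
import numpy as np
from sklearn.metrics import accuracy_score

y_prob = np.array([0.96478105, 0.01573407, 0.01140928, 0.00143398])  # probabilities
y_true = np.array([1, 0, 0, 0])                                      # assumed labels

y_pred = (y_prob >= 0.5).astype(int)   # threshold probabilities into class labels
print(accuracy_score(y_true, y_pred))

# With the sklearn wrapper, model.predict(X) already returns labels,
# while model.predict_proba(X) returns probabilities.
```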
@Tropical188
@Tropical188 3 years ago
Thank you. Is what was said regarding random state true for regression problems as well?
@DataMites
@DataMites 3 years ago
Hi Heshini, yes.
@arjungoud3450
@arjungoud3450 2 years ago
There is only a brief explanation of what they are. Hoping you would make a more detailed video.
@DataMites
@DataMites 2 years ago
Sure, will do that.
@ltrahul1016
@ltrahul1016 2 years ago
nice
@DataMites
@DataMites 2 years ago
Thank you.
@Nixterrex
@Nixterrex 2 years ago
Thank you! Are the parameters for XGBClassifier similar to those for XGBRegressor? I can look at the documentation on my own, but it's late at night for me and I can't sleep thinking about it, but I also don't want to get sucked back into my project (I fixate XD) and I need to sleep hahah... Thank you again though! The video really helped me. I'm only 3 months into learning data science with Python, so it feels good every time I finally piece things together.
@DataMites
@DataMites 2 years ago
Hi Niko Blanco, yes, you will find many of the same parameters in XGBClassifier and XGBRegressor. Thank you.
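A small sketch of that overlap; the parameter values are assumptions, and only the objective differs between the two estimators here.

```python
# XGBClassifier and XGBRegressor share most tuning knobs; mainly the
# objective (and evaluation metric) differ.
from xgboost import XGBClassifier, XGBRegressor

shared = dict(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
)

clf = XGBClassifier(objective="binary:logistic", **shared)
reg = XGBRegressor(objective="reg:squarederror", **shared)
```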
@avaolsen1339
@avaolsen1339 3 years ago
Thank you, Mr. Veda! This is really helpful. I have a question: is there an efficient way to tune these parameters automatically?
@DataMites
@DataMites 3 years ago
Hi Ava Olsen, you can automate hyperparameter tuning using Python scripts, or you can have a look at AutoML.
@youmadvids
@youmadvids 2 years ago
@@DataMites Hi, what about GridSearchCV?
@allalzaid1872
@allalzaid1872 2 years ago
GridSearchCV
@avaolsen1339
@avaolsen1339 2 years ago
But grid search is resource and time consuming. Is there a more efficient way to do it?
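A sketch of both options, reusing X_train and y_train from the sketch in the description; the grids and n_iter are assumptions. GridSearchCV tries every combination, while RandomizedSearchCV samples a fixed number of combinations and is usually much cheaper.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from xgboost import XGBClassifier

# Exhaustive search over a small, explicit grid.
param_grid = {
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [100, 200],
}
grid = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)
print(grid.best_params_)

# Randomized search: only n_iter sampled combinations, so the cost is bounded.
param_dist = {
    "max_depth": randint(2, 8),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.6, 0.4),
    "n_estimators": randint(100, 500),
}
rand = RandomizedSearchCV(XGBClassifier(), param_dist, n_iter=25,
                          cv=5, scoring="roc_auc", random_state=42)
rand.fit(X_train, y_train)
print(rand.best_params_)
```

Bayesian-style tuners such as Optuna or Hyperopt go a step further by spending the trial budget on the most promising regions of the search space.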
@vikasrajput1957
@vikasrajput1957 4 years ago
Increasing the learning rate makes the algorithm learn faster, but at the cost of accuracy; it does not decrease the sensitivity contributed by a single point by a great amount, so it does not generalise the model well and leads to overfitting in some cases.
@DataMites
@DataMites 3 years ago
That is what convergence of the algorithm means.
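A common way to manage that trade-off is to lower the learning rate, allow more boosting rounds, and let early stopping decide when to stop. The sketch below assumes xgboost 1.6 or newer (where early_stopping_rounds and eval_metric are constructor arguments) and reuses the split from the sketch in the description.

```python
# Smaller learning_rate + more rounds + early stopping on a validation set.
from xgboost import XGBClassifier

model = XGBClassifier(
    learning_rate=0.05,          # smaller steps usually generalise better
    n_estimators=1000,           # upper bound on the number of boosting rounds
    early_stopping_rounds=20,    # stop once the validation metric stalls
    eval_metric="logloss",
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print("Best iteration:", model.best_iteration)
```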
@prakashaiml8423
@prakashaiml8423 4 years ago
Excellent explanation.
@DataMites
@DataMites 3 years ago
Thank you!
@gauravverma365
@gauravverma365 2 years ago
Such an informative video about tuning the XGBoost hyperparameters. My question is: can we extract a mathematical equation relating the input and output parameters? For instance, I have successfully applied XGBoost regression to predict a target y from the inputs X1, X2, X3, X4; how can I get XGBoost's predicting equation between those inputs and the output? Please provide information on this.
@DataMites
@DataMites 2 years ago
No, we cannot extract a single mathematical equation; the model is an ensemble of many trees.
@PhenomenalInitiations
@PhenomenalInitiations 3 years ago
Sir, I want to use softmax as the objective; my dependent variable has 4 classes. How do I make XGBoost understand that there are 4 such classes? Please reply.
@DataMites
@DataMites 3 years ago
Hi Sai Akhil Katukam, thanks for your comment. If you want to use softmax, set the objective and define the number of classes while building the model: from xgboost.sklearn import XGBClassifier; XGBClassifier(objective='multi:softmax', num_class=4, ...). A fuller sketch follows below.
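Expanding that reply into a runnable sketch; the data here is random and only illustrates the shape of the call. num_class is shown to mirror the reply above, although the sklearn wrapper can also infer the class count from y.

```python
# Multi-class XGBoost with a softmax objective and 4 classes.
import numpy as np
from xgboost import XGBClassifier

X_mc = np.random.rand(200, 6)                # assumed demo features
y_mc = np.random.randint(0, 4, size=200)     # labels 0..3, i.e. 4 classes

model = XGBClassifier(objective="multi:softmax", num_class=4)
model.fit(X_mc, y_mc)
print(model.predict(X_mc[:5]))               # hard class labels in 0..3
```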
@dehumanizer668
@dehumanizer668 2 years ago
Nice one 👍🏼
@DataMites
@DataMites 2 years ago
Thanks
@mdfahd1795
@mdfahd1795 3 years ago
Keep it up bro
@DataMites
@DataMites 3 years ago
Thank you!
@kuox0005
@kuox0005 3 years ago
It appears that the target variable y is limited to an n x 1 array when making predictions with XGBoost. Could the target variable y be an n x m array, where m > 1?
@DataMites
@DataMites 3 years ago
Yes, that is possible; you can use MultiOutputRegressor as a wrapper around XGBoost.
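A minimal sketch of that wrapper; the random data is only there to show the n x m target shape.

```python
# MultiOutputRegressor fits one XGBRegressor per target column, so y can be n x m.
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

X_demo = np.random.rand(100, 5)    # assumed demo features
Y_demo = np.random.rand(100, 3)    # 3 target columns (m > 1)

model = MultiOutputRegressor(XGBRegressor(n_estimators=100))
model.fit(X_demo, Y_demo)
print(model.predict(X_demo[:2]).shape)   # (2, 3)
```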
@prajothshetty6848
@prajothshetty6848 4 years ago
Great video, sir! A straight and to-the-point explanation. Sir, where is the link to the code or the repository?
@DataMites
@DataMites 3 years ago
We request you to pause the video and type the code; we will soon add the code to the description.
@sugandhchauhan1900
@sugandhchauhan1900 2 years ago
Great video. Could you help me fine-tune my model, please? I am getting really low training and testing accuracy.
@DataMites
@DataMites 2 years ago
How can I help you?
@sugandhchauhan1900
@sugandhchauhan1900 2 years ago
@@DataMites I have messaged you on LinkedIn 😊
@carlmemes9763
@carlmemes9763 3 years ago
Sir, does this problem also occur in gradient boosting? Am I correct? If it does, can we handle it as you explained? If not, what should we do, sir? Thank you sir, your videos are amazing ❤️
@DataMites
@DataMites 3 years ago
Hi, Thank you for your comment, can you clarify which problem you are trying to figure out?
@carlmemes9763
@carlmemes9763 3 years ago
@@DataMites overfitting sir....
@majorcemp3612
@majorcemp3612 4 years ago
Hi, what about gamma, don't you use it? I think it's the only important one missing here.
@vikasrajput1957
@vikasrajput1957 4 years ago
I guess since he is already using a max_depth of just 2-3, he doesn't need much of a pruning parameter for the trees. Your thoughts?
@majorcemp3612
@majorcemp3612 4 years ago
@@vikasrajput1957 Surely, and you can tune the parameters differently with gamma too; I just think that, in terms of education, he should mention it 😅😊
@DataMites
@DataMites 3 years ago
Please refer to stats.stackexchange.com/questions/418687/gamma-parameter-in-xgboost
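In short, gamma (alias min_split_loss) is the minimum loss reduction a split must achieve to be kept, so larger values prune more aggressively. A sketch with assumed values, reusing the training split from the description:

```python
# gamma=0 (the default) adds no extra pruning; larger values make splits harder to keep.
from xgboost import XGBClassifier

model = XGBClassifier(max_depth=6, gamma=1.0)
model.fit(X_train, y_train)
```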
@praveenk302
@praveenk302 2 years ago
What is min_child_weight and its significance?
@DataMites
@DataMites 2 years ago
Hi, please refer to this documentation. xgboost.readthedocs.io/en/latest/parameter.html
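Roughly, min_child_weight is the minimum sum of instance weight (hessian) required in a child node; splits that would create lighter children are discarded, so larger values make the model more conservative. A sketch with an assumed value, reusing the training split from the description:

```python
# Raising min_child_weight requires more evidence per leaf and acts as regularisation.
from xgboost import XGBClassifier

model = XGBClassifier(min_child_weight=5)   # default is 1
model.fit(X_train, y_train)
```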
@rafsunahmad4855
@rafsunahmad4855 3 years ago
Is knowing the math behind an algorithm a must, or is knowing how the algorithm works enough? Please give a reply.
@DataMites
@DataMites 3 years ago
Hi Rafsun Ahmad, thanks for your comment. It is necessary to know the math and other background behind any algorithm so that you will have a better idea of why and how that algorithm should be used.
@rafsunahmad4855
@rafsunahmad4855 3 years ago
Thank you very much
@welcomethanks5192
@welcomethanks5192 1 year ago
Why does your digital pad have pressure sensitivity? My Wacom Intuos doesn't.
@DataMites
@DataMites 1 year ago
Can you reframe your question?
@welcomethanks5192
@welcomethanks5192 1 year ago
@@DataMites I mean you are writing something with a digital pad and the strokes you write can have different thicknesses, but my digital pad only works like a marker pen (all the same thickness)...
@DataMites
@DataMites 1 year ago
@@welcomethanks5192 You will have an option to change the thickness
@johnmasalu8703
@johnmasalu8703 3 years ago
Fruitful and informative training. Please share your email for clarification on some of the issues.
@DataMites
@DataMites 3 years ago
Hi John Masalu, thanks for reaching out to us. You can share all your queries and doubts here in the comment section; we will reply in the comments.
@souptikmukhopadhyay6531
@souptikmukhopadhyay6531 2 years ago
If your train accuracy is 1 and test accuracy is 0.97, how can you say that the model is overfitted? The model is clearly performing very well on the test data. What you can do is perform k-fold cross-validation to be more confident that it gives high accuracy on various test sets. But having high train and test accuracies is not overfitting; it means that the data is relatively simple for the model to learn.
@DataMites
@DataMites 2 years ago
Yes, it could be a simple dataset, but we can validate the model using cross-validation to see whether it overfits.
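A sketch of that check, reusing X and y from the sketch in the description; the fold count and parameters are assumptions.

```python
# k-fold cross-validation gives a more stable estimate than a single train/test split.
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

scores = cross_val_score(XGBClassifier(max_depth=3), X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```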
@nizarhaidar5225
@nizarhaidar5225 4 years ago
Starts at 14:50
@dineshpramanik2571
@dineshpramanik2571 4 years ago
Please keep your microphone near your mouth... I can't hear properly.
@aiinabox1260
@aiinabox1260 1 year ago
The training accuracy was 1; don't you think that's an overfit?
@DataMites
@DataMites 1 year ago
Yes. Hyperparameter tuning will help to overcome that. But as said, this is a very small dataset.
@shashankgpt94
@shashankgpt94 2 years ago
You could have chosen a better dataset.
@DataMites
@DataMites 2 years ago
Hi Shashank Gupta, thank you for your suggestion, but this dataset works well for this task.
@nassimbouhaouita1697
@nassimbouhaouita1697 2 years ago
The data was too easy for the model.
@DataMites
@DataMites 2 years ago
Yes. This video focuses on the hyperparameters of XGBoost.
@lextor99
@lextor99 4 years ago
It would be better to do this on a real dataset; that's how this video could be improved.
@DataMites
@DataMites 4 years ago
Aleksei, do you mean a large dataset? The one used in this video is a real dataset, contributed by the University of Wisconsin in 1995. Ref: archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
@lextor99
@lextor99 4 years ago
@@DataMites, Yeah I mean something more realistic and more challenging.
@DataMites
@DataMites 4 years ago
@@lextor99 Sure.
@nathan_falkon36
@nathan_falkon36 4 years ago
It's enough for its teaching purpose, I think.
@wimavlogs6826
@wimavlogs6826 4 years ago
Can you do a full video on time series forecasting (predicting future values from previous data) using XGBoost?
@DataMites
@DataMites 3 years ago
Sure; until then, keep checking our channel for more videos.