
Kaggle's 30 Days Of ML (Day-13 Part-2): Cross-validation

7,463 views

Abhishek Thakur

1 day ago

Comments: 11
@isaacyn8256 3 years ago
Again, much appreciated. I know I would've been lost without your videos, and the same goes for countless others.
@abhishekkrthakur 3 years ago
Glad to help
@studywithme824 2 years ago
Here's the thing, man: when I thought of quitting, your videos came to the rescue. Learning from a grandmaster helped a lot. I really appreciate your efforts; keep posting more informative videos like this. P.S. If you have any guidance for a beginner or intermediate learner, whether a video, article, blog post, or even a comment, please share it. It would really help me become like you.
@Duychienvt 3 years ago
20:02 Thank you for your answer to my comment on the day 10 tutorial. I also ran a quick test which shows that the best params on a small dataset are not necessarily the best params for the whole data: in your first submission you tried rf with 700 estimators; I tried n_estimators from 100 to 1000 in steps of 100 and found the best param was 900 (I hope I remember correctly) based on the validation set. Then I trained with 900 and with 500 on the whole data, and 900 got the lower score on the Kaggle submission 😃 Hope that helps other learners. I have another question:
- Method 1: keep the k models trained in k-fold CV; the submission is the average of the k models' predictions.
- Method 2: keep the best params found by evaluating on the k folds, retrain on all the data while playing around with different seeds, then average the results.
Which method do you prefer to try first in a competition?
@abhishekkrthakur 3 years ago
Great. In subsequent videos related to the competition, I've been following method 1 :)
@Duychienvt 3 years ago
@abhishekkrthakur Got it, thank you.
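The "method 1" discussed above (keep all k fold models and average their predictions for the submission) can be sketched as follows. This is an illustrative example, not the video's exact code; the dataset and model settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Illustrative train and test data standing in for the competition data
X, y = make_regression(n_samples=500, n_features=10, random_state=42)
X_test, _ = make_regression(n_samples=100, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
test_preds = []
for train_idx, _ in kf.split(X):
    # Train one model per fold and keep its test predictions
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X[train_idx], y[train_idx])
    test_preds.append(model.predict(X_test))

# Final submission: the mean of the 5 fold models' predictions
final_pred = np.mean(test_preds, axis=0)
print(final_pred.shape)  # (100,)
```

Averaging the fold models avoids retraining on the full data and usually gives a small ensembling benefit for free.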
@2mitable 2 years ago
Thank you, Abhishek.
@aquibalikhan6930 3 years ago
Abhishek, can we take someone else's model and tune it? Model building is a tough job to do.
@code4u941 3 years ago
Hi, in CV we use StratifiedKFold in order to keep the same ratio of target labels in each fold/split, so that when we measure our model's performance, it correctly reflects model stability rather than being affected by unbalanced splits. Am I correct?
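The ratio-preserving behaviour described in this comment is easy to verify on synthetic data (illustrative example, not from the video):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 90 + [1] * 10)   # 90/10 class imbalance
X = np.zeros((100, 1))              # features don't matter for the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for _, val_idx in skf.split(X, y):
    # Every validation fold keeps the original 90/10 class ratio
    print(np.bincount(y[val_idx]))  # [18  2] for each fold
```

With a plain KFold on shuffled data the minority class count per fold would fluctuate, which is exactly the instability the comment describes.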
@decadewgame9802 3 years ago
15:57 Here, why did you write n_estimators=n_estimators? I mean, wouldn't n_estimators alone suffice?
@dineshsaini6051 3 years ago
It would suffice, but it might lead to confusion. It works in this case because the first parameter expected by RandomForestRegressor is n_estimators, but if we write n_estimators=n_estimators then we don't have to worry about the position of the parameter. The code also becomes easier to understand. There's hardly any benefit in taking that shortcut.
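The point made in this reply can be shown in a few lines; the value 700 is illustrative:

```python
from sklearn.ensemble import RandomForestRegressor

n_estimators = 700

# Positional: only works because n_estimators happens to be the first
# parameter of RandomForestRegressor.
model_positional = RandomForestRegressor(n_estimators)

# Keyword: explicit, self-documenting, and robust to parameter order.
model_keyword = RandomForestRegressor(n_estimators=n_estimators)

print(model_positional.n_estimators == model_keyword.n_estimators)  # True
```

Both calls configure the same model, but the keyword form would keep working even if the parameter order of the constructor changed.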