Kaggle's 30 Days Of ML (Competition Part-1): Cross Validation & First Submission on Kaggle

24,613 views

Abhishek Thakur

Comments: 101
@abhishekkrthakur 3 years ago
Notebook-1: www.kaggle.com/abhishek/30-days-create-folds
Notebook-2: www.kaggle.com/abhishek/competition-day-1-baseline
Please subscribe and like the video to help keep me motivated to make awesome videos like this one. :)
@harishchandrasekaran6096 3 years ago
I watched your video three times and looked up what k-fold is once more, and then I was able to understand most of your code. Since it's my first time, I guess that's to be expected. The part after the for loop was amazing. I need some more practice to think sequentially that way. Thanks once again for the amazing tutorial.
@malawad 3 years ago
I love these walkthroughs, especially in a competition like this. Any more coming?
@abhishekkrthakur 3 years ago
obviously! ;)
@praveerparmar8157 3 years ago
Hi Abhishek, when I started reading your book AAMLP, I felt a bit confused while creating the folds. But after watching this video, everything has become crystal clear. Thanks a lot for such a lucid and crisp explanation.
@williamsokol0 3 years ago
Thanks for giving us a place to start. I was absolutely clueless about what to do before this video.
@TheMISBlog 3 years ago
Great content, Mr. Abhishek. Keep up the good work.
@sumeerabhat 3 years ago
I am new to DS, ML and Kaggle as well, and I was confused about how to start with Kaggle. Your tutorials cleared up a lot of concepts. Thank you so much.
@GIT_Somya 3 years ago
Damn it, 20 days, and I'm seeing this now! I feel sad for missing out, but I'm finally here.
@RaushanKumar-qb3de 3 years ago
Thanks, Abhishek sir! 😀 I learned a lot; this will improve my level of coding. I will follow all your upcoming videos. Thanks again for the channel!!
@vaishnavibelsare7847 3 years ago
Thanks for this video. It was helpful. Please elaborate and explain a little bit more, so that if we got confused about a concept during these 15 days, it can get cleared up.
@naylamp_gaming 3 years ago
A thousand thanks!! Your tutorials are the best :) I will purchase your book.
@siddharthganjoo 3 years ago
Thank you for this tutorial, your videos are really helpful.
@YuI-lf8dg 2 years ago
Thank you, this helped me a lot!
@sunnybhojwani3199 2 years ago
Great explanation.
@theDrewDag 3 years ago
This is friggin gold.
@dhruvnivatia9222 3 years ago
You deserve millions of subscribers....
@somiab5857 3 years ago
Your video was very useful for me. Thank you very much for your nice explanations.
@penninahgathu7956 3 years ago
Thank you for your amazing work. I'm learning a lot from you.
@MonikaRabha 2 years ago
26:40 I love this xD
@renelara5339 3 years ago
Thank you, nice video.
@junsenchen4514 3 years ago
Really appreciate your video, learnt a lot from it.
@purposeoriented6094 3 years ago
It was a little harder, and thank you for the great help...
@mithilesh03 3 years ago
@Abhishek sir, can you please create a video on pipelines, especially on performing feature engineering (feature interactions to create new features) through a pipeline? This would be really helpful. Thank you.
@discsanddata6021 3 years ago
You are a legend.
@AlexKite68 3 years ago
Wow! Great work! Thank you!
@marinagorden11 3 years ago
Wow, great work, genuine thanks!
@talhaanwer8559 3 years ago
At 1:56 you say that you can simply use k-fold; all you need to do is check whether the distribution of the data is the same in each fold, and if it is not, use something else. I want to ask: how will you know, by looking at a histogram, that the distribution of the data is the same in each fold?
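One possible way to approach the question above without relying only on eyeballing a histogram is to compare per-fold summary statistics of the target. The sketch below is illustrative only: the data is synthetic, and the column names `target` and `kfold` are assumptions that mirror the names used elsewhere in this thread.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for the competition data (assumption: a numeric "target" column).
rng = np.random.default_rng(42)
df = pd.DataFrame({"target": rng.normal(loc=8.0, scale=1.0, size=1000)})

# Assign each row to a fold via its validation indices.
df["kfold"] = -1
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(kf.split(X=df)):
    df.loc[valid_idx, "kfold"] = fold

# Summary statistics of the target per fold; similar rows suggest similar distributions.
per_fold = df.groupby("kfold")["target"].describe()
print(per_fold[["count", "mean", "std", "min", "max"]])
```

If the means and spreads differ noticeably between folds, that is a hint that plain k-fold may not be splitting the data evenly for that target.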
@dustin12188 3 years ago
Thank you so much for this! Can you explain why we do df_train.loc[valid_indices...] and not train_indices?
@appunram6881 3 years ago
I have the same question. Did you find the answer?
@rishabhchandel4786 3 years ago
Each time through the loop, the valid indices form 20% of the data (since k-fold is 5) and the train indices form 80%. The valid indices are different in each loop. Over the 5 loops, every data point becomes validation data in exactly one loop, that is, in exactly one fold. This is not the case for the training indices, since a single training row is used across multiple loops (4 folds, or loops, to be exact). Hence the valid indices, and not the training indices, are used to assign a unique fold value to each row.
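The counting argument in the reply above can be checked directly. This is a minimal sketch on toy data (10 rows, an assumption made only to keep the output readable): it counts how often each row lands in the training part and in the validation part over the 5 splits.

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 10
X = np.arange(n_rows).reshape(-1, 1)  # toy data; only the row count matters here

times_in_train = np.zeros(n_rows, dtype=int)
times_in_valid = np.zeros(n_rows, dtype=int)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_indices, valid_indices in kf.split(X):
    # Tally every appearance of a row on either side of the split.
    times_in_train[train_indices] += 1
    times_in_valid[valid_indices] += 1

print("times in train:", times_in_train)
print("times in valid:", times_in_valid)
```

Each row shows up 4 times on the training side but only once on the validation side, which is why the validation indices can serve as a unique fold label.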
@appunram6881 3 years ago
@rishabhchandel4786 Thanks. I got it 👍
@M_SAI_VARMA 3 years ago
Thanks a lot 😊
@HARSHRAJ-2023 2 years ago
Hi, at 4:37 you say we don't need the train indices, we need the valid indices. Can you please explain the reason behind it?
@adnana2351 3 years ago
I was running the following cell:
final_predictions = []
for fold in range(5):
    xtrain = df[df.kfold != fold].reset_index(drop=True)
    xvalid = df[df.kfold == fold].reset_index(drop=True)
    xtest = df_test.copy()

    ytrain = xtrain.target
    yvalid = xvalid.target

    xtrain = xtrain[useful_features]
    xvalid = xvalid[useful_features]

    ordinal_encoder = OrdinalEncoder()
    xtrain[object_cols] = ordinal_encoder.fit_transform(xtrain[object_cols])
    xvalid[object_cols] = ordinal_encoder.transform(xvalid[object_cols])
    xtest[object_cols] = ordinal_encoder.transform(xtest[object_cols])

    model = XGBRegressor(random_state=fold, n_jobs=4)
    model.fit(xtrain, ytrain)
    preds_valid = model.predict(xvalid)
    test_preds = model.predict(xtest)
    final_predictions.append(test_preds)
    print(fold, mean_squared_error(yvalid, preds_valid, squared=False))
But I am getting the following error:
AttributeError                            Traceback (most recent call last)
 in
      5     xtest = df_test.copy()
      6
----> 7     ytrain = xtrain.target
      8     yvalid = xvalid.target
      9
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5463             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5464                 return self[name]
-> 5465         return object.__getattribute__(self, name)
   5466
   5467     def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'target'
I can't figure out what happened.
@user-wo1dn4do6x 3 months ago
Hello sir, can you reopen the data? This contest is no longer available. Thank you.
@kafaayari 3 years ago
Hi Abhishek, thanks for the great content. I just want to know the name of the technique you used below:
- In each fold, you fit the model on the training data of that fold.
- With this model, you predict on the test data and stack this prediction.
- Finally, you average the test-data predictions at the end.
This is not cross-validation, and not out-of-fold prediction either. What is it called?
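Whatever name one prefers for it, the three steps described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the feature names `f0`, `f1`, `f2` are made up, and `LinearRegression` is swapped in for the `XGBRegressor` used in the video so that the example only needs scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption: numeric features and a "target" column).
rng = np.random.default_rng(0)
features = ["f0", "f1", "f2"]
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=features)
df["target"] = 2.0 * df["f0"] - df["f1"] + rng.normal(scale=0.1, size=200)
df_test = pd.DataFrame(rng.normal(size=(50, 3)), columns=features)

# Fold column built from the validation indices.
df["kfold"] = -1
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (_, valid_idx) in enumerate(kf.split(X=df)):
    df.loc[valid_idx, "kfold"] = fold

final_predictions = []
for fold in range(5):
    xtrain = df[df.kfold != fold]
    xvalid = df[df.kfold == fold]

    model = LinearRegression()  # stand-in for the XGBRegressor used in the video
    model.fit(xtrain[features], xtrain["target"])

    # Validation RMSE for this fold.
    rmse = np.sqrt(mean_squared_error(xvalid["target"], model.predict(xvalid[features])))
    print(fold, rmse)

    # Test-set predictions from the model trained on this fold's training part.
    final_predictions.append(model.predict(df_test[features]))

# Element-wise mean over the five per-fold prediction vectors.
averaged = np.mean(np.column_stack(final_predictions), axis=1)
print(averaged.shape)
```

The per-fold validation score and the averaged test prediction come out of the same loop, so the folds serve both for evaluation and for producing the final submission values.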
@seemasharma-mn7fk 3 years ago
Thank you!
@Frank-mu6qy 1 year ago
Sir, I'm watching your video. How can I download the datasets in your video? I can't fork your project from Kaggle.
@romainleclair5119 3 years ago
Amazing! ;)
@shreymittal3907 2 years ago
Hey Abhishek, just wanted to ask whether the challenge is open now or closed.
@dhruvnivatia9222 3 years ago
I have one question: after applying cross-validation, how can I select the best model (with which I can make predictions)?
@youssefbakadir2625 3 years ago
Hi Abhishek! Thanks again and again for your valuable videos! My question is about autocompletion while coding in a Kaggle notebook: is there any way to get autocompletion in a Kaggle notebook without using the Tab key?
@manisha0209 3 years ago
Thank you so much, sir.
@harshraj5739 2 years ago
Hi Abhishek. Can I get this dataset for practice? It would be a great help if I could get it as a CSV locally for learning.
@shailendrayadav3155 2 years ago
Hi Abhishek, I have started the playlist. Is there a way you can post the original data? It's not available because the competition is closed, so I am not able to follow this video hands-on.
@sollysebela2259 3 years ago
Keep it up bro
@pankajshaw674 3 years ago
Is there any interesting story behind random_state = 42?
@hemesh5663 3 years ago
Could someone please explain to me why he is assigning the kfold value in the train dataset only on the valid indices?
@niteshkavathankar1300 3 years ago
When I tried ordinal encoding to transform the test data, I got an unknown-category error, but Abhishek sir didn't get that error. Why did this happen? BTW, I didn't use it inside the for loop; I tried to do it outside the loop and was going to use GridSearchCV.
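One possible cause of an unknown-category error like the one described above is that the data passed to `transform` contains a category the encoder never saw during `fit`. The sketch below reproduces that situation on a made-up column (`cat0` and its values are assumptions) and shows the `handle_unknown="use_encoded_value"` option of scikit-learn's `OrdinalEncoder` (available from version 0.24) as one way around it. It is not a diagnosis of the commenter's actual notebook.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train = pd.DataFrame({"cat0": ["A", "B", "A", "C"]})
test = pd.DataFrame({"cat0": ["A", "D"]})  # "D" never appears in train

# Default behaviour: transform() raises on a category not seen during fit().
strict = OrdinalEncoder()
strict.fit(train[["cat0"]])
try:
    strict.transform(test[["cat0"]])
    raised = False
except ValueError as err:
    raised = True
    print("error:", err)

# Opt-in behaviour: map unseen categories to a sentinel value instead of raising.
lenient = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
lenient.fit(train[["cat0"]])
encoded = lenient.transform(test[["cat0"]])
print(encoded)
```

Whether a sentinel such as -1 is appropriate depends on the model; tree-based models usually tolerate it, but that is a modelling choice rather than a rule.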
@Name-pss 3 years ago
Thanks for the tutorials, they were really helpful. I am not a participant, so I was wondering if there is a way for me to get the train and test data. I'm still going to sponge off the content and others' notebooks, but was wondering if I could do it on my own too.
@abhishekkrthakur 3 years ago
The TPS Feb 2021 competition data is very similar to the data in this one. You can even follow along with the videos I've posted.
@Name-pss 3 years ago
@abhishekkrthakur Thanks for the tip... will do. Just saw an old comment with the same issue. Thanks for replying.
@akshayshimpi1265 3 years ago
I loved the video, but I have one question: how is the target column calculated?
@gaguirre04 3 years ago
Man, the target column is a data column that allows you to perform the learning. It is used when fitting the model.
@penninahgathu7956 3 years ago
Hello guys, I am still having a hard time understanding this block of code:
kf = model_selection.KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_indicies, valid_indicies) in enumerate(kf.split(X=df_train)):
    df_train.loc[valid_indicies, "kfold"] = fold
Particularly this line: df_train.loc[valid_indicies, "kfold"] = fold. Why exactly are we doing this? I can't seem to wrap my head around it.
@abhishekkrthakur 3 years ago
- In each iteration we divide the original data into two parts.
- One part is 80% and the other is 20%.
- The 20% is the validation data.
- Let's say we are in the second iteration.
- The training data in the 2nd iteration will overlap with the training data in the 1st iteration.
- The validation data in the 2nd iteration will have no overlap with the validation data in the 1st iteration.
- In each iteration, the set of validation data (or validation indices) is disjoint from any other iteration's validation data.
- After 5 iterations with 20% validation indices in each iteration, we cover the whole original training set.
- That's why we use the validation indices and not the training indices, which might overlap.
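To see what the overlap described in the list above does in practice, the sketch below labels rows twice on a toy frame: once through the validation indices and once through the training indices. The column names `kfold_valid` and `kfold_train` are hypothetical and exist only for this comparison.

```python
import pandas as pd
from sklearn.model_selection import KFold

df_train = pd.DataFrame({"feature": range(10)})  # toy frame with a default index
df_train["kfold_valid"] = -1
df_train["kfold_train"] = -1

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_indices, valid_indices) in enumerate(kf.split(X=df_train)):
    # Disjoint sets: every row is written exactly once.
    df_train.loc[valid_indices, "kfold_valid"] = fold
    # Overlapping sets: later iterations overwrite earlier ones.
    df_train.loc[train_indices, "kfold_train"] = fold

print(df_train)
```

The `kfold_valid` column ends up with all five fold numbers, two rows each. The `kfold_train` column only keeps whatever the last overlapping writes left behind, so it cannot be used to reconstruct the splits.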
@penninahgathu7956 3 years ago
@abhishekkrthakur Thank you, I understand it now.
@mithilesh03 3 years ago
Thank you for this video, sir @Abhishek Thakur. Quick question: at 5:45, you update the valid_indices rows with the fold (df.loc[valid_indices,'kfold'] = fold), and in the next line, when you call head, all rows, including the train_indices rows, have been updated with a fold value. Can you please explain this? I didn't understand how the train_indices rows got updated with a fold value. Thank you.
@mithilesh03 3 years ago
Sorry, I got it, please ignore this comment. Thank you.
@HARSHRAJ-2023 2 years ago
@mithilesh03 Hi, can you explain it to me? I have the same question.
@sylvainthibault9733 3 years ago
What is your dark theme setup? Looks really good.
@abhishekkrthakur 3 years ago
It's Kaggle Notebooks' dark theme. :)
@amrirasyidi 3 years ago
2:05 Does anyone know in which video the k-folds were mentioned earlier? Or let me just ask it here: what is the reasoning behind creating the folds? Thanks!
@abhishekkrthakur 3 years ago
It's the same reasoning as behind any kind of cross-validation. Since all parts are connected, another reason will become clearer in parts 5 & 6.
@amrirasyidi 3 years ago
@abhishekkrthakur I see. Will look into that. Thanks a lot!
@manojkumar-ir8lh 3 years ago
I am trying to read the train_folds data with pd.read_csv("../input/30days-folds/train_folds.csv"), but it throws "file not found". Can you tell me how I can refer to your publicly uploaded dataset?
@abhishekkrthakur 3 years ago
Click on the "add data" button in the top right corner and then look for "30days-folds". Add it, and once it's added you will be able to read train_folds.csv.
@md.mahmudulislam4802 3 years ago
Sir, can someone who missed registration for the 30daysofml challenge participate in this competition using their Kaggle account? If so, how do they get the competition link?
@rajathslr 3 years ago
On which day of "30 Days of ML" was the "kfold" stuff explained?
@abhishekkrthakur 3 years ago
kzfaq.info/get/bejne/Z-Cfq9hks5-YoYk.html
@rajathslr 3 years ago
@abhishekkrthakur It's such a surprise to get a reply from Abhishek, thanks very much!!
@rfa8668 3 years ago
Dear Prof., thank you very much for your effort. After day 15 I didn't receive any mail, and since that day the rules of the competition have not been clear to me.
@abhishekkrthakur 3 years ago
On day 15, you received a link to the competition. So now you need to join and take part in that competition. I've made 4 parts of tutorials for the competition so far.
@rfa8668 3 years ago
@abhishekkrthakur Thank you very much for your help. God save you.
@sagarkhule6439 3 years ago
Likewise, I am also not able to participate. I didn't get any e-mail 🥺
@appunram6881 3 years ago
What is the use of putting index=False?
@abhishekkrthakur 3 years ago
Put index=True and look at the resulting CSV to see the reason :)
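The experiment suggested in the reply above can be run without touching the disk. This is a small sketch on a made-up two-row frame (the `id` and `kfold` columns are assumptions) that writes the same data with and without the index and then reads the indexed version back.

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [10, 11], "kfold": [0, 1]})

# Same frame written twice, once with the row index and once without.
with_index = io.StringIO()
df.to_csv(with_index, index=True)

without_index = io.StringIO()
df.to_csv(without_index, index=False)

print(with_index.getvalue())
print(without_index.getvalue())

# Reading the index=True file back yields an extra "Unnamed: 0" column.
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(reloaded.columns.tolist())
```

With `index=True`, the row index is written as an unnamed first column, which then comes back as a spurious column when the file is read again.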
@trojan12352 3 years ago
Sir, why did we create a separate file with folds? Can't we use that for cross_val_score? Please let me know; I'm just trying to understand why we create a folds file and iterate ourselves when cross_val_score does that for us. I'm pretty new here, so I'm trying to connect the dots. Thanks in advance.
@abhishekkrthakur 3 years ago
We will find out in the next couple of videos :)
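As a side note to the question above, a saved fold column and `cross_val_score` are not mutually exclusive: scikit-learn's `PredefinedSplit` turns a per-row fold label into a splitter that `cross_val_score` accepts. The sketch below is only an illustration on synthetic data with a stand-in `LinearRegression` model. It does not claim to be the reason the author has in mind for the later videos.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, PredefinedSplit, cross_val_score

# Synthetic stand-in data (assumption: two numeric features and a "target" column).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["f0", "f1"])
df["target"] = df["f0"] + rng.normal(scale=0.1, size=100)

# Fold column, as it would be stored in a folds file.
df["kfold"] = -1
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (_, valid_idx) in enumerate(kf.split(df)):
    df.loc[valid_idx, "kfold"] = fold

# PredefinedSplit uses row i as validation data in the split numbered test_fold[i].
cv = PredefinedSplit(test_fold=df["kfold"].to_numpy())
scores = cross_val_score(
    LinearRegression(),
    df[["f0", "f1"]],
    df["target"],
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
print(scores)
```

Because the fold labels live in the data file, every notebook or model that reads it is evaluated on exactly the same splits.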
@sagarkhule6439 3 years ago
Hello sir, I've been following this playlist since day 1. As I was late for enrollment, I'm not able to make a notebook in the competition. What should I do now? Should I just go through it, or make some private notebooks for reference?
@narendralv6379 3 years ago
Please check your email. You should have received a link to the competition on day 15. Join the competition using the link and you should be able to create notebooks. I did the same today and it worked.
@sagarkhule6439 3 years ago
@narendralv6379 Can you forward the same link here, please? It would be helpful.
@ashrafarzu3285 3 years ago
Sir, when I try to add df_test= "../30-/test.csv file, it shows an error. Do I have to download the test.csv file too and add it here? Or do I use it directly from the competition data? If so, how do I do that? @Abhishek Thakur
@abhishekkrthakur 3 years ago
You need to add that dataset to your kernel. Please see the beginning of this video: kzfaq.info/get/bejne/o5uJhq6BstDPop8.html
@phucn5589 3 years ago
Can anyone share the input dataset, please? I'm late to the competition, and access is restricted to participants only. Thanks.
@abhishekkrthakur 3 years ago
Try this competition's data: www.kaggle.com/c/tabular-playground-series-feb-2021 (it's very similar to the one going on!)
@phucn5589 3 years ago
@abhishekkrthakur Thanks Abhishek!
@sylvainthibault9733 3 years ago
When running the loop to set the kfold column, I get "Your notebook tried to allocate more memory than is available. It has restarted." Do you have more memory allocated to your account? If yes, how does one increase the memory allocated to an account?
@abhishekkrthakur 3 years ago
For this dataset, you shouldn't get a memory error. Did you fix the issue?
@sylvainthibault9733 3 years ago
@abhishekkrthakur Yes, thank you. The issue resolved itself.
@NikhilRaj-vz6cq 3 years ago
Hello sir... At what time do you upload submissions?
@abhishekkrthakur 3 years ago
random times
@markcuello5 1 year ago
HELP
@marios673 3 years ago
Please, do not read the code. Just explain... :)
@abhishekkrthakur 3 years ago
I'll try my best. Anything specific that I missed?
@ashrafarzu3285 3 years ago
@abhishekkrthakur Sir, you are so helpful. The problem is that we are puzzled about which code does what, and also about when to use which part of the code.