Kaggle's 30 Days Of ML (Competition Part-2): Feature Engineering (Categorical & Numerical Variables)

13,526 views

Abhishek Thakur

A day ago

This video is a walkthrough of Kaggle's #30DaysOfML. In this video, I discuss more than 10 different feature engineering techniques that you can apply to categorical and numerical features. #FeatureEngineering
Notebook: www.kaggle.com/abhishek/competition-part-2-feature-engineering
Note: this video is not sponsored by #Kaggle!
Please subscribe and like the video to help me stay motivated to make awesome videos like this one. :)
To buy my book, Approaching (Almost) Any Machine Learning problem, please visit: bit.ly/buyaaml
Follow me on:
Twitter: @abhi1thakur
LinkedIn: @abhi1thakur
Kaggle: kaggle.com/abhishek

Comments: 49
@abhishekkrthakur · 2 years ago
Notebook is here: www.kaggle.com/abhishek/competition-part-2-feature-engineering Like, subscribe, and share to help me stay motivated to make more amazing videos like this one ;)
@yogitad4136 · 2 years ago
Thank you, Abhishek. This helps a lot. Highly appreciate the time and effort that you put into the creation of these videos.
@longqua69 · 2 years ago
Thank you so much, this is really helpful. There aren't many practical machine learning tutorials like yours. I only wish you had started recording videos like this sooner.
@malawad · 2 years ago
I needed this, I so very badly needed this. Thank you so very much, Abhishek ❤️
@kamalchapagain8965 · 2 years ago
Great job Abhishek sir. Really fruitful.
@seemasharma-mn7fk · 2 years ago
Your videos are very helpful and make learning new topics and concepts so much easier! Thank you!
@abirhasanx · 2 years ago
I'm learning so much from these videos, thank you so much
@purposeoriented6094 · 2 years ago
Thank you so much for the lessons...
@snitox · 1 year ago
You know how you get so attuned to DS that you can listen to these like podcasts and not even have to look at the notebook to know what's going on.
@GAURAVSINGH-nu2cu · 2 years ago
Thanks a lot, sir. It was really helpful. 👍
@ninaddate3756 · 2 years ago
First of all, thank you so much for all your videos related to the course topics, and now these ones providing additional understanding for the competition. I have one question: you said that along with feature engineering we need to do hyperparameter tuning. Typically, do we need to tune the model differently for each technique, or can we apply the same tuning to all of them?
@user-or7ji5hv8y · 2 years ago
This is like learning from the master of the craft.
@Orchishman · 2 years ago
Before we concat the categorical cols back to the dataset after OHE, don't we need to drop those categorical cols from the DF first? Or does that not really affect the model predictions?
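A minimal sketch of the drop-then-concat pattern being asked about, on toy data; `object_cols` and the frame below are illustrative, not the notebook's:
```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the competition data; `object_cols` is illustrative.
df = pd.DataFrame({"cat0": ["A", "B", "A"], "cont0": [0.1, 0.7, 0.3]})
object_cols = ["cat0"]

# `sparse_output` is the scikit-learn >= 1.2 name; older versions use `sparse`.
ohe = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
encoded = pd.DataFrame(
    ohe.fit_transform(df[object_cols]),
    columns=ohe.get_feature_names_out(object_cols),
    index=df.index,
)

# Drop the raw categorical columns before concatenating so the model
# does not see the same information twice.
df = pd.concat([df.drop(object_cols, axis=1), encoded], axis=1)
```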
@sujitmohapatra4978 · 2 years ago
I feel the numerical features are already standardized.
@linnhtet001 · 2 years ago
Thank you for doing this. Your videos guide me and help me learn many new things. I was self-studying with Kaggle Learn way before the 30 days challenge, but didn't know where to go or where to start even after finishing the Kaggle micro-courses. I really appreciate all the great work you're doing for this community. Thank you, sir.
@lucaspimentel1375 · 2 years ago
Going through your book while going through these videos at the same time is like next level learning
@heyrobined · 2 years ago
Thanks
@shashihnt · 2 years ago
Thank you, it was a really informative video. Do you think it's okay to generate features by using frequency encoding of categorical features?
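A minimal sketch of the frequency encoding being asked about, on toy data; learning the frequency map on the training split only keeps the encoding leak-free:
```python
import pandas as pd

# Illustrative data; the frequency map is learned on the training split only.
train = pd.DataFrame({"cat0": ["A", "A", "B", "C"]})
test = pd.DataFrame({"cat0": ["A", "B", "D"]})

freq = train["cat0"].value_counts(normalize=True)
train["cat0_freq"] = train["cat0"].map(freq)
test["cat0_freq"] = test["cat0"].map(freq).fillna(0)  # unseen category -> 0
```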
@theonlypicklericktheonlypi2963 · 2 years ago
Learning from the best, as it should be done! I have a small query: are we free to create new features however we want? As long as our logic holds and it makes sense to the model, can we create new features independently, without any restrictions, or should we follow some basic rules when creating one, without experimenting too much?
@abhaykshirsagar1166 · 2 years ago
Hey, can I use feature compression for the cont columns using something like PCA?
@abhishekkrthakur · 2 years ago
Yeah, sure. Feel free to use whatever works!
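A minimal sketch of that idea, on toy data; `numerical_cols` mirrors the notebook's naming, and fitting the scaler and PCA on the training frame only avoids leakage:
```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the competition frames.
rng = np.random.default_rng(0)
numerical_cols = [f"cont{i}" for i in range(4)]
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=numerical_cols)
df_test = pd.DataFrame(rng.normal(size=(50, 4)), columns=numerical_cols)

# PCA is scale-sensitive, so standardize first; fit both steps on train only.
scaler = StandardScaler()
pca = PCA(n_components=2)
train_pca = pca.fit_transform(scaler.fit_transform(df[numerical_cols]))
test_pca = pca.transform(scaler.transform(df_test[numerical_cols]))

for i in range(train_pca.shape[1]):
    df[f"pca{i}"] = train_pca[:, i]
    df_test[f"pca{i}"] = test_pca[:, i]
```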
@dharitsura1520 · 2 years ago
@Abhishek Thakur, for calculating the RMSE, I guess np.sqrt is missing? Am I missing something?
@abhishekkrthakur · 2 years ago
squared=False
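That is, `mean_squared_error` can return the root directly. A quick check of the two spellings (note that newer scikit-learn releases drop the `squared` keyword in favor of `root_mean_squared_error`):
```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])

rmse_manual = np.sqrt(mean_squared_error(y_true, y_pred))
rmse_direct = mean_squared_error(y_true, y_pred, squared=False)
assert np.isclose(rmse_manual, rmse_direct)  # same value either way
```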
@AtulSharma-gf3tt · 2 years ago
Thank you so much, sir, for this helpful content. Sir, can you please make a video on the Data Visualization day of the Kaggle competition? I am confused about the final project part of Data Visualization. Please help me, sir.
@fmussari · 2 years ago
Great video, learning a lot, thanks! I think that in the case of polynomial encoding we need to drop numerical columns before concat. Same with One Hot Encoding, we need to drop categorical columns as you did in previous videos. Am I right? Thanks again.
@abhishekkrthakur · 2 years ago
Thanks. For polynomial features, I use interaction only, so I don't drop the original features. You can choose what to drop and what to keep; it's totally up to you and the model. So, choose what fits and improves the model :)
@fmussari · 2 years ago
@@abhishekkrthakur Great! Understood, thanks a lot.
@code4u941 · 2 years ago
Hi, great video, learning a lot from this. One thing: interaction_only=True removes a**2 and b**2, so we are left with 1, a, b, ab. When we concat this with the original dataframe, doesn't it create duplicates of a and b, since a and b were already there?
@abhishekkrthakur · 2 years ago
Yes, you should remove a & b :)
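A minimal sketch of that exchange, on toy columns `a` and `b`:
```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# interaction_only=True keeps only cross terms (a*b), not a**2 or b**2;
# include_bias=False drops the constant "1" column.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
df_poly = pd.DataFrame(
    poly.fit_transform(df),
    columns=poly.get_feature_names_out(df.columns),
    index=df.index,
)
print(df_poly.columns.tolist())  # ['a', 'b', 'a b']

# The transform re-emits a and b, so drop the originals before concatenating.
df_final = pd.concat([df.drop(["a", "b"], axis=1), df_poly], axis=1)
```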
@docleo63 · 2 years ago
Hi! Excellent work. Combines perfectly with your book. Do you have an email for errata?
@abhishekkrthakur · 2 years ago
you can create issues here: bit.ly/approachingml
@aykutcayir64 · 2 years ago
Hi Abhishek, I think there is a logical mistake when you use the generated features coming from the groupby methods. You should have used the same groupby values obtained from the training set for the training, validation, and test sets, because the counts of A and B differ between the training and test sets.
@abhishekkrthakur · 2 years ago
Yes, you should! :) You should apply the groupby functions in the training loop on the training set and then use the same values for the validation and test sets! Thanks!
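A minimal sketch of that leak-free pattern, on toy data; inside a CV loop the same steps would run once per fold:
```python
import numpy as np
import pandas as pd

# Illustrative data standing in for a training fold and a validation fold.
rng = np.random.default_rng(0)
train = pd.DataFrame({"cat0": rng.choice(["A", "B"], 100),
                      "cont0": rng.normal(size=100)})
valid = pd.DataFrame({"cat0": rng.choice(["A", "B"], 30),
                      "cont0": rng.normal(size=30)})

# Learn the per-category mean on the training fold only...
group_means = train.groupby("cat0")["cont0"].mean()

# ...then map those same values onto train, validation, and test.
train["cat0_cont0_mean"] = train["cat0"].map(group_means)
valid["cat0_cont0_mean"] = valid["cat0"].map(group_means)
valid["cat0_cont0_mean"] = valid["cat0_cont0_mean"].fillna(group_means.mean())
```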
@pujaurfriend · 2 years ago
Thanks Abhishek :) It was wonderful teaching. I have two things to ask here: first, in polynomial feature engineering we used fit_transform for the test data as well; shouldn't it be only transform? Also, we haven't transformed the validation data there; it should be done, right? If not, what is the reason? Please reply, thanks.
@abhishekkrthakur · 2 years ago
Polynomial features are not "learnt"; they are just arithmetic operations on columns, so you don't need to fit on train and transform test/valid. You can fit_transform everything in the case of polynomial features.
@AAGLeon · 2 years ago
@@abhishekkrthakur Shouldn't we concatenate our new poly-cols into the old df-s, like:
`df = df.drop(numerical_cols, axis=1)`
`df_test = df_test.drop(numerical_cols, axis=1)`
`df = pd.concat([df, df_poly], axis=1)`
`df_test = pd.concat([df_test, df_test_poly], axis=1)`
@abhishekkrthakur · 2 years ago
@@AAGLeon Yes. Did I miss it? 😱
@abhishekkrthakur · 2 years ago
@@AAGLeon It's at 22:17 :)
@pujaurfriend · 2 years ago
@@abhishekkrthakur Thank you very much for explaining
@md.al-imranabir2011 · 2 years ago
`test_poly = poly.fit_transform(df_test[numerical_cols])` - won't the use of `fit` method here cause data leakage?
@abhishekkrthakur · 2 years ago
Nope. Polynomial features are simple arithmetic operations and are not "learnt".
@md.al-imranabir2011 · 2 years ago
@@abhishekkrthakur Thanks.
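A small check of that point, under the same `interaction_only` settings discussed above: PolynomialFeatures stores no statistics from the data it is fit on, so fitting on test changes nothing:
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[1.0, 2.0], [3.0, 4.0]])
X_test = np.array([[5.0, 6.0]])

# Unlike, say, StandardScaler, PolynomialFeatures keeps no statistics from
# the data it is fit on, so the output for X_test is identical either way.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
via_train_fit = poly.fit(X_train).transform(X_test)
via_test_fit = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit_transform(X_test)
assert np.array_equal(via_train_fit, via_test_fit)
```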
@RaushanKumar-qb3de · 2 years ago
👏🙌🤝
@alikayhanatay9080 · 2 years ago
Does scaling really matter with tree-based algorithms? Logically, it shouldn't make a difference whether the features are scaled or not. Thank you for the videos :)
@abhishekkrthakur · 2 years ago
Nope, it doesn't.
@rajeshyalla9512 · 2 years ago
Hello sir, I am new to Kaggle, and when I tried your code I got this: "Your Notebook tried to allocate more memory than available. It has been restarted."
@shreyasat27 · 2 years ago
I am getting one error: name 'gpu_predictor' is not defined
@abhishekkrthakur · 2 years ago
Put it in quotes.
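That is, the error means `gpu_predictor` was passed as a bare Python name instead of a string. A sketch, assuming the XGBoost 1.x API in use around the time of the video (XGBoost 2.x replaces these settings with `device="cuda"`):
```python
from xgboost import XGBRegressor

model = XGBRegressor(
    tree_method="gpu_hist",
    predictor="gpu_predictor",  # quoted string, not a bare identifier
)
```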
@GaelGendre · 2 years ago
It was worst with the Normalizer.