Kaggle's 30 Days Of ML (Day-12 Part-1): Handling Missing Values in Datasets (imputing missing value)

13,474 views

Abhishek Thakur

2 years ago

This video is a walkthrough of Kaggle's #30DaysOfML. In this video, we learn how to handle missing values in a given dataset and how to select the best imputation strategy. Bonus strategy mentioned at the end of the video!
Tutorial Link-1: www.kaggle.com/alexisbcook/in...
Tutorial Link-2: www.kaggle.com/alexisbcook/mi...
Note: this video is not sponsored by #Kaggle!
Please subscribe and like the video to help keep me motivated to make awesome videos like this one. :)
To buy my book, Approaching (Almost) Any Machine Learning Problem, please visit: bit.ly/buyaaml
Follow me on:
Twitter: / abhi1thakur
LinkedIn: / abhi1thakur
Kaggle: kaggle.com/abhishek
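The linked Kaggle lesson compares missing-value strategies by scoring each one on a validation split. Below is a minimal sketch of that comparison. It assumes scikit-learn and uses synthetic data with hypothetical columns `f1`, `f2`, `f3` in place of the course dataset. It scores three approaches: dropping columns that contain gaps, mean imputation, and mean imputation plus a "was missing" indicator column. It does not reproduce the video's bonus model-based strategy, and which approach wins depends on the data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the course dataset
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n), "f3": rng.normal(size=n)})
y = 3 * X["f1"] - 2 * X["f2"] + X["f3"] + rng.normal(scale=0.1, size=n)
# Knock out roughly 20% of f2 to simulate missing values
X.loc[rng.random(n) < 0.2, "f2"] = np.nan

X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, random_state=0)

def score(X_tr, X_va):
    """Fit a random forest on the training split and return validation MAE."""
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_tr, y_train)
    return mean_absolute_error(y_valid, model.predict(X_va))

scores = {}

# 1) Drop every column that has missing values in the training split
cols_with_missing = [c for c in X_train.columns if X_train[c].isnull().any()]
scores["drop_columns"] = score(X_train.drop(columns=cols_with_missing),
                               X_valid.drop(columns=cols_with_missing))

# 2) Mean imputation, fitted on the training split only
imputer = SimpleImputer(strategy="mean")
scores["mean_impute"] = score(imputer.fit_transform(X_train), imputer.transform(X_valid))

# 3) Mean imputation plus a binary "was missing" indicator column
imputer_ind = SimpleImputer(strategy="mean", add_indicator=True)
scores["impute_plus_indicator"] = score(imputer_ind.fit_transform(X_train),
                                        imputer_ind.transform(X_valid))

for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.3f}")
```

The lowest validation MAE indicates the strategy to keep for this particular dataset.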

Comments: 36
@abhishekkrthakur · 2 years ago
If you like the videos, please do consider subscribing. It helps keep me motivated to make awesome videos like this one. :)
@grzegorzzawadzki8718 · 2 years ago
Thank you, Abhishek Thakur! I watched all 11 previous episodes yesterday. By combining the knowledge from this video and the previous one, I got into the top 5%!
@isaacyn8256 · 2 years ago
You are helping a lot of newbies in ML.
@talk2yuvraj · 2 years ago
@44:13 This is where the magic happens!
@gmguimaraess · 2 years ago
Thank you, Abhishek! At the beginning, when you were explaining missing-value imputation using the Titanic dataset as an example, I was just wondering: hmmmm, couldn't we use the Pclass and Sex features to try to predict the missing values of Age? Then in the last part, you actually said we could try this method! This got me excited and motivated to try this approach once again, thanks! Your videos are helping a lot!
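The simplest form of this idea fills each missing Age with the median Age of passengers who share the same Pclass and Sex. That is a lookup-table "model", not a trained regressor. The sketch below assumes pandas and uses a few hypothetical Titanic-like rows. Where a whole group has no observed Age, it falls back to the overall median.

```python
import numpy as np
import pandas as pd

# Hypothetical Titanic-like rows; Age has gaps
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 1],
    "Sex": ["male", "female", "male", "female", "male", "male", "female", "male"],
    "Age": [40.0, 35.0, 30.0, np.nan, 22.0, np.nan, 18.0, np.nan],
})

# Median Age within each (Pclass, Sex) group, broadcast back to every row
group_median = df.groupby(["Pclass", "Sex"])["Age"].transform("median")
df["Age_filled"] = df["Age"].fillna(group_median)

# Fall back to the overall median when a whole group had no observed Age
df["Age_filled"] = df["Age_filled"].fillna(df["Age"].median())
print(df)
```

Here the (2, female) group has no observed Age, so that row receives the overall median of 30.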
@adriandiaz5688 · 1 year ago
You are an absolutely great teacher, you've made this a lot easier for me to understand and have given me a ton of tips, and answered a bunch of unrelated questions I had about pandas along the way!! Thanks a ton!!
@deepakdas8884 · 2 years ago
Sir, thank you so much again.
@jsklair · 2 years ago
With the machine learning imputation method you discuss at the end, would you feed the 'F4' values predicted by model 1 (once it has run) into the 'X' of model 2, which is used to predict F6? Thanks for the videos.
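Chaining models like this, where each column with gaps is predicted from the others and the filled-in values are reused when predicting the next column, is often called round-robin or chained imputation. scikit-learn provides it as the experimental `IterativeImputer`. The video does not say it uses this class; the sketch below only shows the library's version on hypothetical data.

```python
import numpy as np
# IterativeImputer is experimental and must be enabled explicitly before importing it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

# Hypothetical data: column 1 is col0 + 1 and column 2 is col0 + col1, with gaps in both
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 5.0],
    [3.0, 4.0, np.nan],
    [4.0, 5.0, 9.0],
    [5.0, 6.0, 11.0],
    [6.0, np.nan, np.nan],
])

# Each column with gaps is regressed on the other columns in turn; later rounds
# use the values imputed in earlier rounds as inputs, until max_iter or convergence
imputer = IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(np.round(X_filled, 2))
```

Observed entries are left untouched; only the NaN positions are filled.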
@anmolsmusings6370 · 2 years ago
Thanks for the video. I was wondering whether the last part, where you use a model to predict the column with missing values, is somehow related to the Expectation-Maximization (EM) algorithm. I reckon the expectation step in the EM algorithm completes the missing data for each sample using the model itself. I'm curious to know your take on it. Thanks again.
@fmussari · 2 years ago
Why is the model fitted before submission only on the training data and not on all of the imputed X data? Am I missing something? Thanks a lot for the videos!
@sunilsurendrasingh7736 · 2 years ago
In KFold cross-validation, should missing-value imputation be done before CV, or during CV for each train/validation fold?
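A common way to get per-fold imputation is to put the imputer inside a scikit-learn `Pipeline`. `cross_val_score` then refits the whole pipeline, imputer included, on each training fold, so the validation fold never influences the fill values. The sketch below assumes scikit-learn and uses hypothetical synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical synthetic data with roughly 15% of entries missing
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.15] = np.nan

# The imputer's medians are learned on each training fold only
pipe = make_pipeline(SimpleImputer(strategy="median"),
                     RandomForestRegressor(n_estimators=50, random_state=0))
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn maximises scores, so MAE is returned negated; flip the sign back
scores = -cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_absolute_error")
print(scores.round(3), scores.mean().round(3))
```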
@samirana8931 · 2 years ago
Hello Abhishek! First of all, thank you very much for making this understandable. Secondly, I tried imputing missing values by building a model to predict them, and my score increased. Here is the Kaggle link to my notebook: www.kaggle.com/mlsami/exercise-missing-values. I know my notebook is neither professionally written nor the best; I'd just like you to take a look and tell us how it could be improved. Apologies in advance if the code is written badly, but it works for now, at least. Once again, thank you very much.
@sauravkumar9454 · 2 years ago
Hello Abhishek, why don't we just impute the whole training dataset before splitting it for validation? That way we wouldn't have to transform x_valid. Please let me know. Thanks.
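If imputation runs before the split, the validation rows help compute the fill values. That is a mild form of leakage, and it makes the validation score a less honest estimate of performance on unseen data. The usual pattern is to fit the imputer on the training split and only `transform` the validation split. A minimal sketch, assuming scikit-learn and a hypothetical single column named `feature`:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical single-feature frame with a few gaps
X = pd.DataFrame({"feature": [60.0, np.nan, 80.0, 70.0, np.nan, 90.0, 65.0, 75.0, np.nan, 85.0]})
X_train, X_valid = train_test_split(X, train_size=0.7, random_state=0)

imputer = SimpleImputer(strategy="mean")
# Learn the mean from training rows only ...
X_train_imp = pd.DataFrame(imputer.fit_transform(X_train), columns=X.columns, index=X_train.index)
# ... and reuse that same mean for the validation rows
X_valid_imp = pd.DataFrame(imputer.transform(X_valid), columns=X.columns, index=X_valid.index)

# The fill value equals the training-split mean, not the full-data mean
print(imputer.statistics_[0], X_train["feature"].mean())
```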
@md.al-imranabir2011 · 2 years ago
Is it possible to use different strategies for different columns? Say, mean for one column and constant for another column?
@abhishekkrthakur · 2 years ago
yeap!
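One way to apply a different strategy to each column is scikit-learn's `ColumnTransformer`, with one `SimpleImputer` per column group. The sketch below uses hypothetical columns `age` (mean) and `rooms` (constant 0).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical columns: "age" gets the mean, "rooms" gets a constant 0
X = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "rooms": [3.0, 2.0, np.nan],
})

ct = ColumnTransformer([
    ("mean_imp", SimpleImputer(strategy="mean"), ["age"]),
    ("const_imp", SimpleImputer(strategy="constant", fill_value=0), ["rooms"]),
])
# Output columns follow the order of the transformers listed above
X_filled = ct.fit_transform(X)
print(X_filled)
```

The missing `age` becomes 30.0 (the mean of 20 and 40) and the missing `rooms` becomes 0.0.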
@alongbarbrahma484 · 2 years ago
This was a lot to process
@swayamsingh4650 · 2 years ago
Sir, regarding the last imputation method you discussed, where we use a model to predict the column with missing values: I would first train the model on the rows that don't have any missing values, and then pass the rows that do have missing values as a validation set for prediction. Am I right?
@abhishekkrthakur · 2 years ago
Yes, as a test set, not validation :)
@swayamsingh4650 · 2 years ago
@abhishekkrthakur Yeah, sorry, wrong term.
@swayamsingh4650 · 2 years ago
@abhishekkrthakur Just tried the last approach of filling missing values with predictions, and guess what: my rank is 464 now :), a huge jump from 1556. Thanks again, sir.
@nischaypatel4 · 6 months ago
Can you please share the solution for this? I tried the same thing, but my model did not improve nearly as much as yours did.
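The approach described in this thread trains a model on the rows where a column is known, then treats the rows where it is missing as a test set and fills them with predictions. Below is a minimal sketch of that idea, not the video's or any commenter's actual solution. It assumes scikit-learn, hypothetical column names, and that the predictor columns are themselves complete.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: "target_col" has gaps and depends on f1 and f2
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({"f1": rng.normal(size=n), "f2": rng.normal(size=n)})
df["target_col"] = 2 * df["f1"] - df["f2"] + rng.normal(scale=0.1, size=n)
df.loc[rng.random(n) < 0.2, "target_col"] = np.nan

features = ["f1", "f2"]  # assumed to have no missing values themselves
missing = df["target_col"].isna()

# Rows with a known value act as the training set ...
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df.loc[~missing, features], df.loc[~missing, "target_col"])

# ... and rows with a gap act as the test set whose values we predict
df.loc[missing, "target_col"] = model.predict(df.loc[missing, features])
print(int(missing.sum()), "values imputed;", int(df["target_col"].isna().sum()), "still missing")
```

How much this helps a downstream score varies by dataset. When the predictor columns also have gaps, chained imputation such as scikit-learn's `IterativeImputer` is one alternative.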
@thelazydeveloper · 2 years ago
@mech iyad Does this course come with a certification?
@abhishekkrthakur · 2 years ago
yes
@tubasiddiqui7345 · 2 years ago
When we had X_test, why did we create X_valid?
@abhishekkrthakur · 2 years ago
Validation data is derived from the original training data and has target labels. X_test doesn't have any target labels!
@tubasiddiqui7345 · 2 years ago
@abhishekkrthakur Oh, I got it. We use validation data to compare our predicted values with the original ones, and we use test data to actually apply the model. Thank you.
@abhishekkrthakur · 2 years ago
@tubasiddiqui7345 Eggjactly 😀
@MrLycantree · 2 years ago
@abhishekkrthakur How does X_valid differ from the y data?
@tubasiddiqui7345 · 2 years ago
@MrLycantree X has the features, while y has the target/label.
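The distinction in this thread can be sketched in a few lines. X holds the features and y holds the labels. A labelled validation split is carved out of the training data, while the competition's test set has no target column. The frames below are hypothetical stand-ins for `train.csv` and `test.csv`, and the sketch assumes scikit-learn's `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labelled training data and unlabelled test data
train = pd.DataFrame({"f1": range(10), "f2": range(10, 20), "target": range(100, 110)})
X_test = pd.DataFrame({"f1": [10, 11], "f2": [20, 21]})  # no target column

y = train["target"]                 # labels
X = train.drop(columns=["target"])  # features

# The validation split keeps its labels, so predictions on it can be scored
X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, random_state=0)
print(X_train.shape, X_valid.shape, X_test.shape)
```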
@GurpreetKaur-nn8bb · 2 years ago
Abhishek Sir, is it necessary to submit the assignments in order to get the certificate, or are these assignments/exercises for practice only? Is Kaggle keeping a record of us going through the tutorials and doing the assignments?
@suriyaprakaashjl5642 · 2 years ago
No, actually there are three courses you have to complete, and you will get 3 certificates.
@abhishekkrthakur · 2 years ago
You need to do the exercises to get the certificate :)