fit vs transform vs fit_transform | fit vs fit_transform | fit and fit_transform in sklearn

12,338 views

Unfold Data Science

1 day ago

#machinelearning #datascience #unfolddatascience
Hello,
My name is Aman and I am a Data Scientist.
All amazing data science courses at the most affordable price here: www.unfolddata...
Topics for the video:
fit transform fit transform
fit vs fit_transform
fit and fit_transform in sklearn
fit vs fit transform sklearn
fit vs transform vs fit_transform
fit vs transform vs fit transform
About Unfold Data Science: This channel helps people understand the basics of data science through simple examples in an easy way. Anybody without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical in nature, so they can be easily grasped by viewers from different backgrounds as well.
Book recommendation for Data Science:
Category 1 - Must Read For Every Data Scientist:
The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
Python Data Science Handbook - amzn.to/31UCScm
Business Statistics By Ken Black - amzn.to/2LObAA5
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - amzn.to/3gV8sO9
Category 2 - Overall Data Science:
The Art of Data Science By Roger D. Peng - amzn.to/2KD75aD
Predictive Analytics By Eric Siegel - amzn.to/3nsQftV
Data Science for Business By Foster Provost - amzn.to/3ajN8QZ
Category 3 - Statistics and Mathematics:
Naked Statistics By Charles Wheelan - amzn.to/3gXLdmp
Practical Statistics for Data Scientists By Peter Bruce - amzn.to/37wL9Y5
Category 4 - Machine Learning:
Introduction to Machine Learning with Python by Andreas C. Müller - amzn.to/3oZ3X7T
The Hundred-Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
Category 5 - Programming:
The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
Clean Code by Robert C. Martin - amzn.to/3oYOdlt
My Studio Setup:
My Camera : amzn.to/3mwXI9I
My Mic : amzn.to/34phfD0
My Tripod : amzn.to/3r4HeJA
My Ring Light : amzn.to/3gZz00F
Join Facebook group :
www.facebook.c...
Follow on medium : / amanrai77
Follow on quora: www.quora.com/...
Follow on twitter : @unfoldds
Get connected on LinkedIn : / aman-kumar-b4881440
Follow on Instagram : unfolddatascience
Watch Introduction to Data Science full playlist here : • Data Science In 15 Min...
Watch python for data science playlist here:
• Python Basics For Data...
Watch statistics and mathematics playlist here :
• Measures of Central Te...
Watch End to End Implementation of a simple machine learning model in Python here:
• How Does Machine Learn...
Learn Ensemble Model, Bagging and Boosting here:
• Introduction to Ensemb...
Build Career in Data Science Playlist:
• Channel updates - Unfo...
Artificial Neural Network and Deep Learning Playlist:
• Intuition behind neura...
Natural Language Processing playlist:
• Natural Language Proce...
Understanding and building recommendation system:
• Recommendation System ...
Access all my codes here:
drive.google.c...
Have a different question for me? Ask me here : docs.google.co...
My Music: www.bensound.c...

Comments: 26
@HimanshuKumar-oi8qh 1 year ago
fit_transform on x_train and transform on x_test. Reason: with fit_transform we learn the parameters and transform x_train. If we call fit_transform again on x_test, it will learn the parameters again, so we only call transform on x_test. The whole issue is about overfitting. Hope this makes sense.
@UnfoldDataScience 1 year ago
Yes, you understand the concepts well. The only thing to keep in mind is where we can use "learned parameters" on new data and where we cannot.
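The pattern discussed in this thread can be sketched as follows. This is a minimal illustration with made-up toy data, assuming StandardScaler as the transformer; the variable names are hypothetical and not taken from the video.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for illustration only.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0], [5.0]])

scaler = StandardScaler()
# fit_transform: learn mean and standard deviation from the training data, then scale it.
X_train_scaled = scaler.fit_transform(X_train)
# transform: reuse the statistics learned from the training data on the test data.
X_test_scaled = scaler.transform(X_test)

train_mean = scaler.mean_[0]   # mean learned from X_train
train_std = scaler.scale_[0]   # standard deviation learned from X_train

# For contrast: refitting on the test data learns different statistics,
# so the same raw value would be mapped to a different scaled value.
X_test_refit = StandardScaler().fit_transform(X_test)

print(X_test_scaled.ravel())
print(X_test_refit.ravel())
```

With transform, the test rows stay on the scale the model saw during training; with a second fit_transform they are rescaled around the test set's own mean.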
@mushinart 11 months ago
After 2 long years... now I know the answer 😭... I'm grateful
@shubhamagrawal7068 1 year ago
We can apply fit on the training data so that we have the parameter values. We can also use fit_transform on the training data: it calculates the parameter values from the training data and applies the transformation as well. But on the testing data, we always use transform with the parameter values from the training data. This will lead to a data leakage problem. To avoid the leakage problem we might use fit_transform on the testing data. Correct me if I am wrong. And please clear up this confusion by making a video, Aman bhaiya!
@dakshbhatnagar 1 year ago
For prediction we should ideally use transform, because the object is fitted on the training data and the test data is transformed using that fitted object. This applies to both the tf-idf and the scaler object. I could be wrong, but this makes sense to me.
@UnfoldDataScience 1 year ago
Hi Daksh, hope you are doing great. Seeing your comment after a long time. For scaling, do you see any data leakage problems?
@kausikkar2587 1 year ago
Well, that's what we usually follow, but there are cases where you have a completely different type of data with a different number of maximum features. In that case you have to fit on your test data again too. I applied it today on my Vikram IMDb film review NLP project using CountVectorizer and MultinomialNB, and it worked as expected. Hope this helps.
@Krishna-pn5je 11 months ago
Hi Aman, thanks for the video. My answer is below. In the prediction stage we don't require the scaler object, because the model still understands the numeric data; we require scaling only if the dataset has multiple numeric features and we want to compute distances between data points. In the prediction stage with a tf-idf vector, we should pass the vectorizer object, because it transforms the text to a vector at the evaluation stage before the text is passed to the model for prediction, which is necessary.
@himalayaashish947 1 year ago
Hi, for the prediction we will have to use only transform, because we have trained the model and we want to use the same parameters. For tf-idf we will use fit_transform: since the corpus is changing, we need to calculate the parameters and then apply them, so we will have to use fit_transform.
@squadgang1678 1 year ago
I go with his answer
@UnfoldDataScience 1 year ago
Thanks for the answer, do you see data leakage problems with your approach?
@UnfoldDataScience 1 year ago
Also, for tf-idf, if your new corpus has a new word that never appeared in training, what happens to the model?
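One way to explore the question above is the small experiment below. It is a sketch with invented two-word documents, assuming scikit-learn's TfidfVectorizer with default settings; the names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only.
train_docs = ["good movie", "bad movie"]
new_docs = ["good film"]  # "film" never appears in the training corpus

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # learns the vocabulary and idf weights
X_new = vectorizer.transform(new_docs)          # reuses the learned vocabulary

vocab = sorted(vectorizer.vocabulary_)

print(vocab)
print(X_new.shape)      # same number of columns as X_train
print(X_new.toarray())  # only the column for "good" is non-zero
```

With transform, a word outside the learned vocabulary is simply dropped, so the new matrix keeps the column layout the model was trained on. Calling fit_transform on the new corpus would instead build a different vocabulary, and the columns would no longer line up with what the model expects.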
@chandrabhanbahetwar9638 1 year ago
Brother, if you are going to explain something, explain it completely and clearly. What is this? You have left us confused about whether or not to use fit_transform on the test dataset. If you want reach for the video, just say so and we will all comment, but don't leave us stuck in confusion like this. If you are going to explain, explain it clearly, otherwise don't bother.
@ibrahimmosty1860 4 months ago
I will use separate scalers, because each scaler saves the data for its specific column.
@subhashdixit5167 1 year ago
Thanks for taking my comments seriously
@shrirajpathak 1 year ago
Why create all this confusion, just make the video with the answers in it...
@iyyappanmuthusamy1678 1 year ago
I don't think we will use both the fit and transform functions, because while testing our ML model we will not use the testing dataset. For scaling, we will use only the xtrain and ytrain datasets to train our model.
@UnfoldDataScience 1 year ago
Which use case? Is it scaling or tf-idf you are referring to?
@weirdyounes7618 1 year ago
Thkuuuuuu 🎉
@niranjan.tanpure 1 year ago
Product manager vs data scientist: which one pays better, sir?
@UnfoldDataScience 1 year ago
Managing a data science product is not at all an easy task; it needs all the qualities of a seasoned data scientist and more. I believe it should be paid more than a normal data scientist role.
@learning_with_irving4266 9 months ago
So is standardizing just finding the z score?
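A quick check of this question, as a sketch on made-up numbers: it assumes "standardizing" here means scikit-learn's StandardScaler, which divides by the population standard deviation (ddof=0), the same default NumPy's std uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature data, invented for illustration only.
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Manual z-score: subtract the column mean, divide by the column standard deviation.
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)  # np.std defaults to ddof=0

# StandardScaler applies the same formula per column.
z_sklearn = StandardScaler().fit_transform(X)

same = np.allclose(z_manual, z_sklearn)
print(same)
```

Under that assumption the two results match, so for StandardScaler the answer is yes: each value is replaced by its z-score, computed per feature. A z-score computed with the sample standard deviation (ddof=1) would differ slightly.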
@rosemarydara1025 1 year ago
This guy's teaching is really really amazing
@arpittrivedi6636 1 year ago
In prediction we use only fit
@UnfoldDataScience 1 year ago
Only "fit" or only "transform"? Also, in which scenario: scaling or tf-idf?
@faheemkhan-dm8zy 1 year ago
What nonsense is this