Use FunctionTransformer to convert functions into transformers

7,075 views

Data School

Want to do feature engineering within a ColumnTransformer or Pipeline?
1. Select an existing function (or write your own)
2. Convert it into a transformer using FunctionTransformer
3. 🥳
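The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the toy DataFrame, its column names (`Fare`, `Code`), and the `first_letter` helper are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data, made up for illustration
X = pd.DataFrame({"Fare": [0, 9, 99], "Code": ["X12", "Y20", "Z7"]})

# Step 1a: select an existing function (NumPy's log1p)
log_transform = FunctionTransformer(np.log1p)

# Step 1b: or write your own (hypothetical helper: keep the first character)
def first_letter(df):
    return df.apply(lambda col: col.str.slice(0, 1))

# Step 2: convert it into a transformer
get_first_letter = FunctionTransformer(first_letter)

# Use both inside a ColumnTransformer
ct = make_column_transformer(
    (log_transform, ["Fare"]),
    (get_first_letter, ["Code"]),
)
out = ct.fit_transform(X)  # 2 columns: log1p(Fare), first letter of Code
```

Because `ct` is an ordinary transformer, it can be used as a step in a Pipeline like any other.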
👉 New tips every TUESDAY and THURSDAY! 👈
🎥 Watch all tips: • scikit-learn tips
🗒️ Code for all tips: github.com/jus...
💌 Get tips via email: scikit-learn.tips
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): courses.datasc...
2) BUILD YOUR ML CONFIDENCE in my intermediate course: courses.datasc...
3) LET'S CONNECT!
- Newsletter: www.dataschool...
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Comments: 31
@dataschool 3 years ago
Did you know that the code for all of these tips is on GitHub? Check it out: github.com/justmarkham/scikit-learn-tips
@marcelocruz1785 1 year ago
I recently discovered your channel, and it's incredible the amount of excellent information you provide!
@dataschool 1 year ago
Thank you!
@santiagogonzalezq1954 2 years ago
I love your content because it's very well explained and I can practice my English with your pronunciation. Cheers!
@dataschool 2 years ago
Thank you! That's awesome to hear!
@mingqian813 2 years ago
I like all your well-explained videos! In the future, will you consider guiding a hands-on Kaggle project from beginning to end?
@dataschool 2 years ago
Thanks for your suggestion!
@roy11883 3 years ago
Cheers to Feature Transformer, thanks for sharing this Kevin
@dataschool 3 years ago
You're welcome!
@kevinozero 2 years ago
Thank you so much, this was a super clear and simple explanation.
@dataschool 2 years ago
Thanks so much for your kind words!
@harshedirisinghe6864 1 year ago
This is an excellent explanation!
@dataschool 1 year ago
Thank you!
@shubhamchoudhary5461 3 years ago
Please upload more videos like this. Thanks for this great content! 🙏
@dataschool 3 years ago
Glad you like it! I will be uploading 2 more tips every week (Tuesdays and Thursdays) until I reach 50 tips. You can find all of them in this playlist: kzfaq.info/sun/PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6
@joanbrunet6 2 years ago
If I pass this subject, it's all due to your videos.
@dataschool 2 years ago
😄
@Dara-lj8rk 3 years ago
Learned something new. Thanks heaps
@dataschool 3 years ago
Great to hear!
@hemangdhanani9434 2 years ago
Thanks for uploading such great videos!
@dataschool 2 years ago
Thank you!
@AceOnBase1 6 months ago
Hey man, if I have a function that does a bunch of regex operations (.str.extract etc.), can I put that into a FunctionTransformer?
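This question has no reply in the thread. As a hedged sketch only: a function built on pandas string methods can be wrapped the same way as any other function, provided it accepts and returns a DataFrame. The `extract_title` helper, the regex, and the sample names below are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical helper: pull a title such as "Mr" out of a one-column DataFrame
def extract_title(df):
    # Take the single column as a Series, then capture the word
    # between the comma and the following period
    return df.iloc[:, 0].str.extract(r",\s*([A-Za-z]+)\.", expand=True)

# Hypothetical sample data
X = pd.DataFrame({"Name": ["Braund, Mr. Owen", "Cumings, Mrs. John"]})

title_extractor = FunctionTransformer(extract_title)
titles = title_extractor.fit_transform(X)  # one-column DataFrame of titles
```

This relies on FunctionTransformer's default `validate=False`, which passes the DataFrame to the function unchanged.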
@atulsingh-uy2he 3 years ago
Helpful..!!
@dataschool 3 years ago
Thanks Atul!
@wadewattts5126 3 years ago
Hi sir, can you provide an example of when using pandas instead of sklearn leads to data leakage?
@dataschool 3 years ago
Sure! If you do missing value imputation on the whole dataset (before splitting the dataset as part of your model evaluation procedure), data leakage will result.
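A small numeric sketch of that leakage, using made-up values: when the imputer is fit on the whole dataset, the fill value is influenced by a row that later ends up in the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data: one feature, one missing value, one extreme value
X = np.array([[1.0], [2.0], [np.nan], [100.0]])
X_train, X_test = X[:3], X[3:]  # the extreme value lands in the test set

# Leaky: the imputer sees the test row when computing the mean
leaky = SimpleImputer(strategy="mean").fit(X)
leaky_fill = leaky.statistics_[0]   # mean of 1, 2, 100

# Clean: the imputer learns the mean from the training rows only
clean = SimpleImputer(strategy="mean").fit(X_train)
clean_fill = clean.statistics_[0]   # mean of 1, 2
```

Here `leaky_fill` is about 34.3 while `clean_fill` is 1.5, so the leaky version fills the training gap with information from the test set.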
@wadewattts5126 3 years ago
Thank you, sir. Another question, if I may. The data leakage you described is not caused by using pandas instead of sklearn, but by imputing before splitting the data. Can I say that I can use pandas or sklearn for preprocessing as long as I first split the data into train, validation, and test sets? Thank you in advance.
@dataschool 3 years ago
That's technically true, but it misses the bigger picture. pandas lacks separate fit and transform steps, and so your code will quickly become overly complex if you want to do multiple different transformations within pandas without data leakage. And if there are any transformations you need to do that pandas doesn't offer, it's a pain to combine transformations from pandas with transformations from scikit-learn. Finally, it's completely impractical to do cross-validation (without data leakage) if your transformations are done in pandas (depending on the exact nature of the transformation). And if you can't use cross-validation, you also can't do hyperparameter tuning with GridSearchCV. Thus what you are saying is not technically incorrect, but it also means you are not going to be able to use some of the most important parts of scikit-learn. Hope that helps!
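As a rough sketch of the point about separate fit and transform steps: when the imputer lives inside a Pipeline, cross-validation and GridSearchCV refit it on each training fold, so no fold's held-out rows influence the imputation. The synthetic data and parameter grid below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (illustrative only): 60 rows, 3 features, binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # blank out roughly 10% of the values

# Imputation is a pipeline step, so it is refit on each training fold
pipe = make_pipeline(SimpleImputer(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)

# Preprocessing and model parameters can be tuned together
param_grid = {
    "simpleimputer__strategy": ["mean", "median"],
    "logisticregression__C": [0.1, 1.0, 10.0],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
```

Doing the same with pandas alone would mean recomputing the fill values by hand for every fold and every parameter combination.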
@wadewattts5126 3 years ago
Thank you very much for that very comprehensive explanation, Mr. Kevin. I guess I expected to get away with things by using pandas but that turns out to be inefficient. Time to use the power of sklearn. You do very good content. Appreciate it.
@lk2055 2 years ago
How is this different from TransformerMixin? Thanks.
@dataschool 2 years ago
FunctionTransformer is simpler to use, but TransformerMixin is more flexible. Hope that helps!
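One way to read that trade-off, sketched under my own assumptions: FunctionTransformer suits stateless functions, while a class built on TransformerMixin can learn something during `fit` and reuse it in `transform`. The `QuantileClipper` class below is a hypothetical example, not something shown in the video.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import FunctionTransformer

# Simpler: wrap a stateless function, nothing is learned from the data
log_tf = FunctionTransformer(np.log1p)

# More flexible: a hypothetical transformer that learns a cap during fit
class QuantileClipper(BaseEstimator, TransformerMixin):
    def __init__(self, upper=0.75):
        self.upper = upper

    def fit(self, X, y=None):
        # Learn the per-column cap from the training data only
        self.upper_ = np.quantile(np.asarray(X, dtype=float), self.upper, axis=0)
        return self

    def transform(self, X):
        # Apply the cap learned in fit to any new data
        return np.minimum(np.asarray(X, dtype=float), self.upper_)

X_train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
X_new = np.array([[50.0]])

clipper = QuantileClipper(upper=0.75).fit(X_train)
clipped = clipper.transform(X_new)  # capped at the training 75th percentile
```

TransformerMixin supplies `fit_transform` for free, so only `fit` and `transform` need to be written.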