My top 50 scikit-learn tips

12,142 views

Data School

Comments: 49
@dataschool 1 year ago
👩‍💻 Code: github.com/justmarkham/scikit-learn-tips 🤖 Learn ML from me: courses.dataschool.io/ml-courses 💌 Weekly Data Science tips: tuesday.tips/ Thanks for watching! 🙌
@KartikeyRiyal 1 year ago
Amazing as always. I have been following you since 2019 and every time it's something new.
@dataschool 1 year ago
Thank you so much for your kind words! 🙏
@KartikeyRiyal 1 year ago
@dataschool welcome
@KenJee_ds 1 year ago
These are amazing! I learned a lot!
@dataschool 1 year ago
Thanks Ken! 🙌
@akbarboghani1 1 year ago
Great video, very informative. Thank you so much for sharing.
@dataschool 1 year ago
You're very welcome!
@philwebb59 1 year ago
24:08 handle_unknown='ignore'. A most useful tip! If only I'd read the docs. But I don't understand when you say to go back and include the previously unknown categories. How can you train on unknown data? Even if you include the unknown "labels" in your encoder, they will all be zero during training because, obviously, they weren't in your training data. I think it's best to just leave it alone. If it wasn't in your training data, then it's probably a rare occurrence and you can just ignore it. Zeros in all known categories simplifies what happens downstream? If you want to train on unknown data, you would need to use "dummy data" and set min_frequency or max_categories, then handle_unknown='infrequent_if_exist' to give downstream modules something to work with.
@dataschool 1 year ago
Glad tip 7 was useful to you! When I said "go back and include previously unknown categories", that means that the next time you train your model, you can incorporate that sample into your training data, and thus that previously unknown category will now be a known category.
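The retraining idea in this exchange can be sketched as follows. This is a minimal illustration, not code from the video: the color feature and all variable names are hypothetical. It shows an unseen category encoded as all zeros, and the same category becoming known once the sample is added to the training data and the encoder is refit.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one categorical feature with three known values.
X_train = np.array([["red"], ["green"], ["blue"], ["red"]], dtype=object)

# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# "purple" was not in the training data, so its encoded row is all zeros.
X_new = np.array([["purple"]], dtype=object)
unknown_row = encoder.transform(X_new).toarray()

# Next training run: include the new sample, refit, and "purple" is now known.
X_retrain = np.vstack([X_train, X_new])
encoder_retrained = OneHotEncoder(handle_unknown="ignore").fit(X_retrain)
known_row = encoder_retrained.transform(X_new).toarray()
```

The alternative raised in the question, handle_unknown='infrequent_if_exist', only has an effect when min_frequency or max_categories is also set, and it requires scikit-learn 1.1 or later.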
@uncledez8 1 year ago
This is master's-level info on data science.
@dataschool 1 year ago
Thank you! 🙏 Just wait for my next Machine Learning course, it will blow your mind 🤯
@user-oj6rl5kc6i 1 year ago
Excellent, well done and thank you!
@dataschool 1 year ago
You're very welcome!
@Ahmed_Eid 1 year ago
I'm a new subscriber. I'm so glad I found you. Amazing explanation!
@dataschool 1 year ago
Thank you!
@tassoskat8623 1 year ago
Hello Kevin! Thank you for your great work and tips. Could you please include in the repository notebooks for the tips that are missing? I suppose those are the ones that do not contain code. However, it would be great to have those included in some way so nothing is missing when someone would like to do a quick review. Again, thank you so much for sharing!
@dataschool 1 year ago
Thanks for your kind words! You are right that those 6 tips don't have notebooks, since they don't have code. I'll consider adding notebooks for those tips in the future... thanks for the suggestion!
@maziarjamshidi4505 1 year ago
Awesome resource for Machine Learning. Thanks!
@dataschool 1 year ago
You're very welcome! Glad it's helpful to you!
@philwebb59 1 year ago
2:10:00 Yeah, if you have the time and the determination, you could run DecisionTreeClassifier, then plot_tree, and look through it for conditions like name != value. Then, you could use the order the decision tree "discovers" categories as the ordinal value for that feature, 0 being first. You just need to write a custom transformer to preprocess your validation data and assign -1 to all unknowns. Another trick I've had success with is ordering by frequency, with 0 being the most frequent. In that case, your custom transformer should assign 0 to all unknowns. Easy-peasy.
@dataschool 1 year ago
Thanks for sharing, Phil!
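The frequency-ordering trick described above might look roughly like this. It is a sketch under assumptions: FrequencyOrdinalEncoder is a hypothetical class name, not part of scikit-learn, and the sample data is made up. Categories are ranked by how often they appear in the training data (0 for the most frequent), and anything unseen is mapped to a configurable value that defaults to 0.

```python
from collections import Counter

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class FrequencyOrdinalEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: rank categories by training frequency (0 = most frequent)."""

    def __init__(self, unknown_value=0):
        self.unknown_value = unknown_value

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # Build one mapping per column: category -> rank by descending frequency.
        self.mappings_ = []
        for col in X.T:
            ranked = [cat for cat, _ in Counter(col).most_common()]
            self.mappings_.append({cat: rank for rank, cat in enumerate(ranked)})
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        out = np.empty(X.shape, dtype=float)
        for j, mapping in enumerate(self.mappings_):
            # Categories not seen during fit fall back to unknown_value.
            out[:, j] = [mapping.get(value, self.unknown_value) for value in X[:, j]]
        return out


# Hypothetical data: "b" appears 3 times, "a" twice, "c" once.
X_train = [["a"], ["b"], ["b"], ["c"], ["b"], ["a"]]
encoder = FrequencyOrdinalEncoder().fit(X_train)
encoded = encoder.transform([["b"], ["c"], ["z"]])
```

For the variant that assigns -1 to unknowns without a custom class, scikit-learn's own OrdinalEncoder accepts handle_unknown='use_encoded_value' together with unknown_value=-1, although it orders categories by sorted value rather than by frequency.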
@rohitchan007 1 year ago
We need more videos like these.
@dataschool 10 months ago
Glad you like it!
@shahriyarabedinnezhad3162 1 year ago
Super useful...Thanks Kevin
@dataschool 1 year ago
You're welcome!
@philwebb59 1 year ago
2:09:40 Hopefully, you'll never have 200 columns to pass through, but I think specifying which columns to pass through makes what you intend clearer. The default is remainder='drop', so the author thought that as well.
@dataschool 1 year ago
Sure! But there's nothing necessarily wrong with passing through 200 (or 200,000) columns if they don't need transformations.
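The two approaches discussed here can be compared side by side. The DataFrame and its column names below are hypothetical; the sketch only shows that remainder='passthrough' and an explicit 'passthrough' step keep the same columns in this case.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one column to encode, two numeric columns to keep as-is.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin"],
    "age": [31, 45, 27],
    "income": [52000, 81000, 47000],
})

# Option 1: pass through every column that is not explicitly transformed.
ct_remainder = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["city"])],
    remainder="passthrough",
)

# Option 2: keep the default remainder="drop" and name the passthrough columns.
ct_explicit = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(), ["city"]),
        ("keep", "passthrough", ["age", "income"]),
    ]
)

out_remainder = ct_remainder.fit_transform(df)
out_explicit = ct_explicit.fit_transform(df)
```

Option 2 documents intent and silently drops any column added later, while option 1 scales to hundreds of untouched columns without listing them.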
@gary8421 1 year ago
Thank you Kevin.
@dataschool 1 year ago
You're welcome Gary!
@TexasStar007 1 year ago
Thanks Kevin!
@dataschool 1 year ago
You're welcome Shashi!
@venkataramana6975 1 year ago
Good work❤
@dataschool 1 year ago
Thank you!
@hedeyhod 1 year ago
thank you 🙏
@dataschool 1 year ago
You're welcome!
@philwebb59 1 year ago
30:20 Missingness. So, what happens when a feature is fully populated in your training data, but has missing values in your validation data? Just bringing that up in case you don't get to it.
@dataschool 1 year ago
If a feature has no missing values in training, but has missing values in testing, then the prediction step will fail. If that happens, you can go back and set up an imputer for that feature, and thus the prediction step will no longer fail.
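A small sketch of the situation described in this reply, using made-up numbers and LogisticRegression as an arbitrary stand-in for the model: prediction fails on a missing value that never occurred in training, and adding an imputer to the pipeline resolves it.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical data: the training feature is fully populated.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0, 0, 1, 1])

# New data contains a missing value that never appeared during training.
X_new = np.array([[2.5], [np.nan]])

# Without an imputer, the prediction step fails on the NaN.
model = LogisticRegression().fit(X_train, y_train)
try:
    model.predict(X_new)
    failed_without_imputer = False
except ValueError:
    failed_without_imputer = True

# With an imputer in the pipeline, the NaN is replaced by the training mean.
pipe = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_new)
```

The imputer learns its fill value during fit even when the training column has nothing missing, which is why it can be set up in advance for features that might have gaps later.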
@ayyappahemanth7134 1 year ago
You are awesome, sir.
@dataschool 1 year ago
Thank you!
@philwebb59 1 year ago
2:03:00 drop='if_binary' makes sense; otherwise you have two columns that are perfectly redundant, not just implied. At least, it's a happy compromise. My only hesitation, without playing with it, is that the order is probably alphabetical. If it assigned 0 to the most frequent category, then handle_unknown='ignore' would make sense. Otherwise, you're lumping unknowns in with the "least" alphabetic category. That's kinda silly.
@dataschool 1 year ago
You're correct that the left-to-right order of categories in the matrix is alphabetical.
@saharrichi2718 11 months ago
Please, I would like a video on PSO with RF in Jupyter.
@FabioRBelotto 7 months ago
Hey Kevin, why aren't you bringing out new videos anymore? :(
@dataschool 7 months ago
I'm working on other projects right now, but I hope to return to publishing videos soon!
@manalabughazaleh7619 8 months ago
Hi, can you help me find an Android spyware dataset?
@dataschool 8 months ago
I'm not familiar with that, I'm sorry! You can try Google Dataset Search.