Use OrdinalEncoder instead of OneHotEncoder with tree-based models

  4,237 views

Data School

With a tree-based model, try OrdinalEncoder instead of OneHotEncoder, even for nominal (unordered) features.
Accuracy is often similar, but OrdinalEncoder is usually much faster, because it produces one column per feature instead of one column per category!
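The comparison can be sketched roughly as follows. This is a minimal example, not the notebook from the video: the data is a hypothetical synthetic DataFrame with three nominal columns, and the model is assumed to be a RandomForestClassifier. It times 5-fold cross-validation with each encoder and also prints how many columns each encoder produces, which is one likely reason OneHotEncoder tends to be slower.

```python
import time

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical synthetic data: three nominal (unordered) columns and a binary
# target that depends only on the number in "city".
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "city": rng.choice([f"city_{i}" for i in range(50)], size=n),
    "color": rng.choice(["red", "green", "blue", "black"], size=n),
    "brand": rng.choice([f"brand_{i}" for i in range(30)], size=n),
})
y = (X["city"].str.split("_").str[1].astype(int) % 2 == 0).astype(int)

cat_cols = ["city", "color", "brand"]


def evaluate(encoder):
    """Return (mean 5-fold CV accuracy, elapsed seconds) for a random forest."""
    pipe = make_pipeline(
        make_column_transformer((encoder, cat_cols)),
        RandomForestClassifier(n_estimators=50, random_state=0),
    )
    start = time.perf_counter()
    scores = cross_val_score(pipe, X, y, cv=5)
    return scores.mean(), time.perf_counter() - start


results = {
    # handle_unknown settings avoid errors if a category appears only in a test fold
    "OneHotEncoder": evaluate(OneHotEncoder(handle_unknown="ignore")),
    "OrdinalEncoder": evaluate(
        OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    ),
}

# Number of columns each encoder produces from the three input columns.
n_onehot = OneHotEncoder().fit_transform(X).shape[1]
n_ordinal = OrdinalEncoder().fit_transform(X).shape[1]

print(f"columns: one-hot={n_onehot}, ordinal={n_ordinal}")
for name, (acc, secs) in results.items():
    print(f"{name:15s} accuracy={acc:.3f} time={secs:.2f}s")
```

Actual accuracy and timing will depend on your data, model settings, and hardware, so treat this as a template for running the comparison on your own dataset.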
👉 New tips every TUESDAY and THURSDAY! 👈
🎥 Watch all tips: • scikit-learn tips
🗒️ Code for all tips: github.com/jus...
💌 Get tips via email: scikit-learn.tips
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): courses.datasc...
2) BUILD YOUR ML CONFIDENCE in my intermediate course: courses.datasc...
3) LET'S CONNECT!
- Newsletter: www.dataschool...
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Comments: 14
@dataschool 2 years ago
Have you tried OrdinalEncoder with your tree-based model? Let me know how it compares to OneHotEncoder!
@sophiazhou9119 1 year ago
I tried it with RandomForest and a decision tree classifier, but the problem with OrdinalEncoder is that the tree might treat the encoded value as a real number and split it at a decimal threshold. How do you deal with that?
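On the question of decimal thresholds: scikit-learn's decision trees place each split threshold halfway between two adjacent observed values, so with integer codes a threshold such as 1.5 simply separates codes 0 and 1 from codes 2 and up; no category is ever divided. The real limitation is that each split can only cut the sorted codes into two contiguous ranges, so an arbitrary grouping of categories may need several splits. A small sketch, using a hypothetical four-category feature:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one nominal feature with four categories, where the
# target is 1 for "blue" and "red" and 0 for "black" and "green".
colors = np.array([["red"], ["green"], ["blue"], ["black"]] * 25)
y = np.isin(colors.ravel(), ["red", "blue"]).astype(int)

enc = OrdinalEncoder()
X = enc.fit_transform(colors)  # alphabetical codes: black=0, blue=1, green=2, red=3

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Internal (split) nodes have a non-negative feature index; leaves use -2.
is_split = tree.tree_.feature >= 0
thresholds = np.sort(tree.tree_.threshold[is_split])

print(enc.categories_[0])  # ['black' 'blue' 'green' 'red']
print(thresholds)          # [0.5 1.5 2.5]
print(tree.score(X, y))    # 1.0
```

Every threshold falls between two integer codes, and because the positive categories (codes 1 and 3) are not adjacent, the tree needs three splits to isolate them.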
@dhirajkumarsahu999 2 years ago
Yes, this makes sense to me. Models like linear regression assign importance to features through their weights, so one-hot encoding unordered categories is important for linear regression. Please correct me if I am wrong.
@dataschool 2 years ago
Right!
@dhirajkumarsahu999 2 years ago
@dataschool Thanks for the reply ❤️
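To illustrate the point about linear models, here is a minimal sketch with a hypothetical four-category feature and scikit-learn's LogisticRegression (a classifier standing in for linear regression). With a single ordinal column the model has one weight, so its predictions can only change monotonically with the code; one-hot columns give each category its own weight.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical data: the target is 1 for "blue" and "red" only, which is not
# monotonic in the alphabetical ordinal codes (black=0, blue=1, green=2, red=3).
colors = np.array([["red"], ["green"], ["blue"], ["black"]] * 25)
y = np.isin(colors.ravel(), ["red", "blue"]).astype(int)

# Same model, two encodings of the same single feature.
ordinal_model = make_pipeline(OrdinalEncoder(), LogisticRegression()).fit(colors, y)
onehot_model = make_pipeline(OneHotEncoder(), LogisticRegression()).fit(colors, y)

print("ordinal accuracy:", ordinal_model.score(colors, y))
print("one-hot accuracy:", onehot_model.score(colors, y))
```

Because the positive codes (1 and 3) alternate with the negative ones, no single monotonic cut on the ordinal column can classify every category correctly, whereas the one-hot version can.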
@grzegorzzawadzki8718 2 years ago
Thanks! That was very helpful.
@dataschool 2 years ago
Great to hear!
@elmoreglidingclub3030 1 year ago
Very interesting. I’d like to work with this a bit; what is the data set you used? I have an interesting data set (~2,300 rows, 13 features) that can give some bizarre accuracy results using a single classification tree but performs much, much better with a random forest. I’ll try ordinal encoding on it and let you know how it performs. Good stuff! Again, please, what is this data set?
@dataschool 1 year ago
See here: nbviewer.org/github/justmarkham/scikit-learn-tips/blob/master/notebooks/43_ordinal_encoding_for_trees.ipynb
@Dara-lj8rk 2 years ago
Good one, thanks
@dataschool 2 years ago
You're very welcome!
@alfathterry7215 2 years ago
interesting...
@dataschool 2 years ago
Thanks!
@anandvyavahare2031 2 years ago
Who on earth even tried it to find it? 😂😂