12,442 views
Here's a simple pattern that can be adapted to solve many ML problems. It has plenty of shortcomings, but can work surprisingly well as-is!
Shortcomings include:
- Assumes all columns have proper data types
- May include irrelevant or improper features
- Does not handle text or date columns well
- Does not include feature engineering
- Ordinal encoding may be better than one-hot encoding for some features
- Other imputation strategies may be better
- Numeric features may not need scaling
- A different model may be better
- And so on...
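
For readers without the video at hand, here is a minimal sketch of what a pattern like this might look like in scikit-learn. It is an illustration under assumptions, not necessarily the exact code from the video: the toy DataFrame, its column names, the median and constant imputation strategies, and the choice of LogisticRegression are all hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric and two categorical columns
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "fare": rng.exponential(30, n),
    "embarked": rng.choice(["C", "Q", "S"], n),
    "sex": rng.choice(["female", "male"], n),
})
X.iloc[::7, 0] = np.nan    # inject missing values into "age"
X.iloc[::11, 2] = np.nan   # inject missing values into "embarked"
y = np.arange(n) % 2       # placeholder binary target

# Numeric columns: impute (with a missing indicator), then scale
numeric = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    StandardScaler(),
)

# Categorical columns: impute a constant, then one-hot encode,
# ignoring categories that were not seen during fitting
categorical = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

# Route columns to each sub-pipeline by data type
preprocess = make_column_transformer(
    (numeric, make_column_selector(dtype_include=np.number)),
    (categorical, make_column_selector(dtype_exclude=np.number)),
)

# Chain preprocessing and model so cross-validation refits both per fold
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Several of the shortcomings above map onto single-line changes in a sketch like this: swapping OneHotEncoder for OrdinalEncoder, choosing a different imputer, dropping StandardScaler, or replacing the final estimator.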
Want to watch all 50 scikit-learn tips? Enroll in my FREE online course:
👉 courses.datasc... 👈
Tips mentioned in this video:
Tip 1: • Use ColumnTransformer ...
Tip 2: • Seven ways to select c...
Tip 6: • Encode categorical fea...
Tip 7: • Handle unknown categor...
Tip 9: • Add a missing indicato...
Tip 11: • Impute missing values ...
Tip 16: • Use cross_val_score an...
Tip 27: • Two ways to impute mis...
Tip 43: • Use OrdinalEncoder ins...
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): courses.datasc...
2) BUILD YOUR ML CONFIDENCE in my intermediate course: courses.datasc...
3) LET'S CONNECT!
- Newsletter: www.dataschool...
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham