Correcting Skewed Data with Scipy and Numpy

6,209 views

Christopher Pulliam, PhD

1 year ago

Skewed data can adversely affect your analysis and machine learning models. In this video, I demonstrate five methods for cleaning skewed data using the NumPy and SciPy modules. The methods include taking the square root, cube root, fourth root, log, and Yeo-Johnson transform. I also showcase the effectiveness of each method by summarizing the skewness of the data after each transformation with a bar plot.
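As a rough illustration of the five methods listed in the description, here is a minimal sketch. The synthetic log-normal sample and all variable names are assumptions made for this example; they are not the data or code used in the video.

```python
import numpy as np
from scipy import stats

# Assumed example data: a strictly positive, right-skewed sample.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5_000)

# Yeo-Johnson returns the transformed data and the fitted lambda.
x_yj, fitted_lambda = stats.yeojohnson(x)

transforms = {
    "original": x,
    "square root": np.sqrt(x),
    "cube root": np.cbrt(x),
    "fourth root": x ** 0.25,
    "log": np.log(x),  # requires x > 0
    "Yeo-Johnson": x_yj,
}

# Summarize the sample skewness after each transformation.
skewness = {name: float(stats.skew(values)) for name, values in transforms.items()}

for name, value in skewness.items():
    print(f"{name:>12}: {value:+.3f}")
```

The `skewness` dictionary can be passed to a plotting library to produce a bar plot of the results, for example with matplotlib's `plt.bar(list(skewness), list(skewness.values()))`.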

Comments: 27
@metinunlu_ 6 months ago
Thank you for the video, subscribed! KZfaq needs more quality content like this.
@officialscience101 1 year ago
The on-screen text is a great addition, Dr. P!
@CJP3 1 year ago
🙏🏽, I’ll incorporate more in upcoming videos! Thanks for the feedback!
@user-gx6tn4wk7r 4 months ago
Bro this is data science ASMR 🤤
@CJP3 4 months ago
Hahaha I didn’t mean for it to be but glad you enjoyed it (I hope) 😂
@mushinart 10 months ago
Outstanding explanation, professor
@CJP3 10 months ago
Thank you so much!
@user-mm2uc6ye9x 2 months ago
Amazing video! I like its structure: motivation, overview with examples, practical advice. Thanks!
@CJP3 2 months ago
Thanks for the feedback! I’ll do more of this style!
@nicolaslpf 1 year ago
Amazing video! I was creating a function for measuring the same thing. You forgot to mention log1p, which is the log of (x + 1); it's really useful for right-skewed data with values less than 1.
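The log1p suggestion above can be sketched as follows. This is only an illustration: the exponential sample, the injected zeros, and the variable names are assumptions for the example. For values in [0, 1), a plain log is negative and diverges at 0, whereas `np.log1p` stays finite and non-negative.

```python
import numpy as np
from scipy import stats

# Assumed example data: right-skewed, mostly below 1, with some exact zeros.
rng = np.random.default_rng(1)
x = rng.exponential(scale=0.3, size=5_000)
x[:50] = 0.0

# log1p(x) computes log(1 + x), so zeros map to 0 instead of -inf.
x_log1p = np.log1p(x)

skew_before = float(stats.skew(x))
skew_after = float(stats.skew(x_log1p))
print(f"skewness before: {skew_before:+.3f}, after log1p: {skew_after:+.3f}")

# expm1 is the inverse of log1p, which helps when mapping results back
# to the original units.
restored = np.expm1(x_log1p)
```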
@dannybee9068 1 year ago
Thank you! That was helpful! So we can basically take a root of any order? Is there a drawback to exploiting it, like continuing to increase n when raising a feature to the power of 1/n?
@CJP3 1 year ago
Hi Danny! Context definitely matters. For analytical chemistry, 1/n scaling is usually OK. A few downsides are that it makes models less sensitive to potential outliers, and it's not suitable for certain distributions. Lastly, because 1/n scaling is non-linear, it can make data interpretation more difficult.
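To make the trade-off in this exchange concrete, here is a small sketch of what happens as n grows in x^(1/n). The gamma-distributed sample and the chosen root orders are assumptions for illustration, not data from the video. On this kind of sample the skewness does not simply shrink toward zero: past a certain n it tends to turn negative, while the spread of the values keeps collapsing.

```python
import numpy as np
from scipy import stats

# Assumed example data: a positive, moderately right-skewed sample.
rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1.0, size=5_000)

root_orders = [1, 2, 3, 4, 8, 16]  # n = 1 leaves the data unchanged
summary = {}
for n in root_orders:
    y = x ** (1.0 / n)
    # Store (skewness, standard deviation) for each root order.
    summary[n] = (float(stats.skew(y)), float(np.std(y)))

for n, (skewness, spread) in summary.items():
    print(f"n = {n:>2}: skewness = {skewness:+.3f}, std = {spread:.3f}")
```

The shrinking standard deviation is one way to see the reduced sensitivity to outliers, and the sign change in skewness suggests why a larger n is not automatically better.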
@thoniasenna2330 4 months ago
SUBSCRIBED! What should one do first? Or, what's the correct order: treating outliers, imputing missing values, or correcting symmetry? Thanks, Dr. P!
@CJP3 4 months ago
You’re not going to like the answer 😂… it depends a lot on the application. It’s best to first be aware that they exist and then evaluate their impact on your outcome. For example, if you’re trying to identify outlier samples, then outlier measurements wouldn’t be so bad... maybe. Or missing values could be useful depending on the application, so instead of imputing, maybe you engineer a new feature.
@CJP3 4 months ago
Don’t unsubscribe after my answer! 😂 🤣
@pabloagogo1 1 month ago
This is interesting. If one corrects the original skewed data with these kinds of transformations in the context of simple or multiple linear regression, won't that change the interpretation of the original data? Curious to know.
@CJP3 28 days ago
Perhaps, but that change may be for the better. I’d say it’s worth considering these transformations if you know you have skewed data. Many models, especially linear models, assume normally distributed variables. I usually build models with and without significant preprocessing and feature scaling/engineering.
@AyahuascaDataScientist 1 month ago
Skewing doesn’t necessarily matter if you’re using XGBoost, correct? For classification or regression, that is
@CJP3 1 month ago
Exactly! Skewed data doesn’t impact all model frameworks.
@pewkaboo 1 year ago
What if my data contains a lot of useful '0' values?
@CJP3 1 year ago
Howdy! Can you explain more about the 0’s?
@pewkaboo 1 year ago
@@CJP3 It is expenditure data where the budget column contains a lot of '0' (not null) values.
@prathambhatnagar8653 3 months ago
Please don't add background music.
@CJP3 3 months ago
Thanks for the feedback. Most of the newer coding tutorials don’t have background music. Have a great day!
@AyahuascaDataScientist 1 month ago
I like it. Don’t listen to this hater!
@mouhsineelqesry9446 1 month ago
Bro, you explain a concept, but do you need the music?! It’s distracting.
@CJP3 1 month ago
I 💯 understand. The newer videos don’t have the music, and the audio has a better EQ :)