Data Cleaning Tutorial | Cleaning Data With Python and Pandas

153,385 views

Soumil Shah

5 years ago

Comments: 135
@anmol_seth_xx 2 years ago
First of all, thanks to you, I learned the ffill, bfill and interpolate functions from here. But many professionals recommend imputing missing values with the mean, median or mode.
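The mean, median and mode imputation mentioned here can be sketched as follows. This is a minimal illustration, not code from the video: the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the dataset used in the video.
df = pd.DataFrame({
    "temperature": [20.0, np.nan, 24.0, 22.0],
    "city": ["Pune", "Delhi", None, "Pune"],
})

# Numeric column: fill gaps with the mean (swap in .median() for skewed data).
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Categorical column: fill gaps with the most frequent value (the mode).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

Which statistic is appropriate depends on the column: the median is less sensitive to outliers than the mean, and the mode is the usual choice for categories.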
@kaustubhjain7066 3 years ago
The chill moment when he codes and sips coffee. Cool man, great work!!!
@kdausu90 3 years ago
This guy is chill af
@YOGIT_Singh 1 year ago
Bro, you are doing such great work. The way you explain things is excellent and easy to understand. Kudos to you.
@kurosaki2510 2 years ago
Could you make a tutorial on Big Data as well, for situations with e.g. 500k rows and 200 columns where you don't see all of your data and don't know what kinds of NaN values to expect and therefore can't name them textually? Big thanks in advance :)
@oktafajarandrian7352 2 years ago
Up
@Alexander-ms2ct 1 year ago
It's a keyword argument called "chunksize="; the integer passed to it is the number of rows per chunk. So if you select 1000 and have a file of 500k rows, it would load 500 times, in pieces.
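A small sketch of the chunksize idea described above, assuming the data comes from a CSV read with pandas.read_csv. The in-memory CSV and the chunk size of 4 are stand-ins chosen so the example runs on its own; a real file path and a larger chunk size would be used in practice.

```python
import io
import pandas as pd

# Stand-in for a large file on disk (hypothetical content).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

rows_seen = 0
n_chunks = 0
missing_per_column = None

# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    n_chunks += 1
    rows_seen += len(chunk)
    counts = chunk.isna().sum()
    missing_per_column = counts if missing_per_column is None else missing_per_column + counts

print(n_chunks, rows_seen)
print(missing_per_column)
```

Accumulating per-chunk counts like this gives a missing-value summary without holding the whole file in memory.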
@alishalbayev264 3 years ago
I just started watching but I can see that you are really good, man. Thanks
@ravitejapavuluri945 3 years ago
You covered everything but missed the one I was waiting for: filling NaN with mean or median values.
@sheetalurankar4660 4 years ago
Please explain more concepts like standardizing, matching and consolidation so we get the whole idea of data cleaning.
@waynefmj 3 years ago
This
@Young-Prof 2 months ago
This is amazing. I learned a lot. I want to come to India to study Data Analytics
@sifar1857 2 years ago
Great tutorial! No need to hesitate on referring to the code snippets btw… I don’t think any sane person watching this has the expectation for you to memorise a to z what you want to articulate…
@rommel23nb 1 month ago
Thanks Mr. Shah. I used these commands to prepare a cheat sheet for data cleaning. Regards
@sayantikabiswas8739 1 year ago
Thanks for teaching ffill, bfill, interpolate and fillna. It was a great session
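For readers who want to see the difference between the three gap-filling methods named here, a minimal side-by-side sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two consecutive gaps.
s = pd.Series([10.0, np.nan, np.nan, 40.0])

forward = s.ffill()        # carry the last known value forward
backward = s.bfill()       # pull the next known value backward
linear = s.interpolate()   # straight line between the known neighbours

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "interpolate": linear}))
```

ffill and bfill copy an existing observation, which suits slowly changing or ordered data; interpolate invents intermediate values, which only makes sense when the rows have a meaningful order.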
@thewhiskybottle4641 1 year ago
Take the ads off please, thanks, great tutorial by the way.
@aliaitazaz7040 2 months ago
Learnt something new, THANK YOU!
@sammy0722 4 years ago
Nice way of explaining. One request, do make one video on Outlier removal. Thanks.
@SoumilShah 4 years ago
You got it!! Will add it to my to-do list
@harshvardhansahay3864 1 year ago
This video has really helped me understand the data cleaning process. Thanks, man.
@RavikaUniverse. 11 months ago
Awesome, good and precise data cleaning. Please upload some more stuff related to pandas
@kousumichaudhuri3793 1 year ago
Thanks a lot bro. The way you explained the steps is really helpful.
@shyamkumar-fh2fh 4 years ago
Can you tell us about yourself and the company you are working at, and can you give some tips on getting a job in the data science field as a fresher?
@santhoshbharath2910 2 years ago
Hats off to you man, you really made my day, you gave me good confidence today. Once again, thank you so much sir
@ganeshkumarpatel 4 years ago
Please explain how to fillna or replace zero with the mean value by groupby. That is, you have 3 groups in a data frame and you want to fillna with the respective group mean
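One way to do what this comment asks is groupby followed by transform. The sketch below is an illustration under assumptions: the column names "group" and "value" are hypothetical, and zeros are treated as missing only because the comment suggests it.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two groups for brevity.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 10.0, 0.0, 30.0],
})

# Optional step: treat zeros as missing so they are filled as well.
df["value"] = df["value"].replace(0, np.nan)

# transform("mean") returns one value per row, aligned with df,
# holding the mean of that row's group (NaN values are skipped).
group_mean = df.groupby("group")["value"].transform("mean")
df["value"] = df["value"].fillna(group_mean)

print(df)
```

Because the transformed Series has the same index as the frame, fillna can use it directly without a merge.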
@RaviPrakash-ml8qb 1 year ago
Very useful video, keep it up champ
@user-zu7wc4xr2m 1 year ago
essentially you explained it very well
@laibakhan1835 1 year ago
Great job, well done and thanks a lot... You explained very well
@michaelchapisa3709 19 days ago
I got lost at the very beginning, from the "print(os.listdir())". What I got as my output is very different from what you got
@slayergaming1852 1 year ago
This video really helped me a lot, but I still have more to understand. I have zero basic knowledge of this. I'm working on a thesis which needs some coding to complete. I have a few questions about what you explained in this video: 1. What if there are a lot of datasets, and how do you define the missing values for each? 2. What was that "np.nan" in the missing values you defined? As I said earlier, I'm working on a project which is about human-in-the-loop code. Initially I'll be given the dataset and have to figure out code to include a human for feedback from the system (Reinforcement Learning). I would like to get your response and, if possible, any helpful ideas or suggestions on the project mentioned above. Thank you
@debanjangg 1 year ago
1. You can use separate lists (with different variable names) or a single list as a master list for all the datasets. It depends on the datasets and the data they contain. 2. np.nan is the "NaN" value in the dataset, which means Not a Number. Basically, np.nan is a float object whose value is NaN. Hope this helps.
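The point about np.nan being a float can be checked directly. A short sketch:

```python
import numpy as np
import pandas as pd

kind = type(np.nan)                  # np.nan is an ordinary Python float
equal_to_itself = np.nan == np.nan   # NaN never compares equal, even to itself
detected = pd.isna(np.nan)           # so use isna() to test for it

print(kind, equal_to_itself, detected)
```

This is why comparisons such as `df["col"] == np.nan` do not find missing values, while `df["col"].isna()` does.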
@slayergaming1852 1 year ago
@debanjangg That was helpful. Thanks mate!🤩
@luckycreative7418 10 months ago
1) You can also use the unique function to get only the unique values. Example: df['Customers'].unique(). In the example above you will get all the unique values in the column 'Customers'.
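A runnable version of the unique suggestion, with value_counts added as a companion. The 'Customers' column comes from the comment; the values in it are made up for illustration.

```python
import pandas as pd

# Hypothetical column containing two placeholder tokens for missing data.
df = pd.DataFrame({"Customers": ["Asha", "n.a.", "Ben", "Asha", "--"]})

# unique() is a method, so it needs parentheses.
distinct = df["Customers"].unique()

# value_counts() also shows how often each value occurs.
counts = df["Customers"].value_counts(dropna=False)

print(distinct)
print(counts)
```

Scanning the distinct values is a quick way to spot tokens such as "n.a." or "--" that stand in for missing data.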
@poojakumarirollno9880 10 months ago
Great job sir. Thank you so much for the great explanation. Can you make more videos on data analytics?
@alaberedaisy8171 5 months ago
Thank you so much. This was really helpful
@jaiprakashlic484 1 year ago
BRO YOU SHOULD HAVE UPLOADED CSV FILE AS WELL.
@littlecreator4838 2 years ago
Hi, very nice explanation. I am totally new to Python. Can you please make a tutorial on how to install Jupyter and all the other libraries required to perform forecasting?
@ShresthBhakta 1 year ago
Thanks Soumil, really helpful !!
@hamdansiddiqui3294 2 years ago
Great video very helpful
@spyder5204 2 years ago
Good and nice explanation
@ogunoyeadebamigbe1715 1 year ago
Good job
@crazystuff5854 3 years ago
Superb broo Rock on
@eshaal2525 1 year ago
Hi... I have NA values but the boxplot is not even showing them as null...
@balatechtvm1438 1 year ago
Thank you sir. I'm very satisfied
@AyaanKhan-rh5vx 1 month ago
I have a CSV file and when I am using the concat function it automatically names unnamed groups 1, 2, 3... Also, the alignment gets messy with a single line of code. How do I fix it?
@harrymary100 3 years ago
Nice tutorial keep it up
@angshumanbardhan3729 4 years ago
Thank you for making this video.
@enricomendiola9952 1 year ago
This is a great video😊
@samimerk5313 2 years ago
Thank you for the explanation!
@Ayanshedipelly2312 2 months ago
How to do interpolation for a categorical variable?
@akashme-ek3vc 2 years ago
Please do not stop uploading such videos
@Nitswits007 4 years ago
Would you be able to provide mentorship? I have started learning DS. I want to keep moving in a direction and just don't want to stop due to coding lag.
@kelikisbiyantoro2518 3 years ago
Thank you very much Soumil Shah, it helps me
@sudeepjayaprakash9224 1 year ago
Thank you sir helped me a lot
@kamalkantmahour9641 4 years ago
Sir, please tell me the book from which I can learn all these concepts and the programming skills required for this.
@gideonopoku-gyamfi1114 1 year ago
So please, which do you think is more efficient to use without changing the accuracy of the data?
@GracefulTalesPluto 10 months ago
What is the use of ffill? Isn't it data corruption, filling nulls with values from the rows above?
@mohammedkaifmirza7585 2 years ago
Overall good tutorial, the only thing that is missing is the source code (Jupyter notebook)
@rinkubaria3900 1 year ago
So helpful 👌
@machinelearning1357 1 year ago
really great
@laithdarras6389 1 year ago
Very helpful
@rakshansadhu2073 3 years ago
Loved it man
@world52love 1 month ago
How to handle zero values in a CSV file and how to fill those values?
@SONALIKUMARI-is9jc 4 years ago
What if the file we want to work on is not there?
@rashigupta5611 2 years ago
How do I find out how many different kinds of values there are to put in na_values? I mean the values you listed in missing_value: how are you getting them? We can't check the file by eye if it has a huge amount of data.
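One possible approach to this question, sketched under assumptions: if a column is supposed to be numeric, read it as text, try to convert it, and count whatever fails to convert. Those leftovers are candidates for na_values. The CSV content and column name below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file content; in practice this would be a path on disk.
csv_text = "temperature\n21.5\nn.a.\n19.0\n--\nn.a.\n"

# Read everything as text and switch off pandas' built-in NA detection,
# so that no token is silently converted before we look at it.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

# Anything that cannot be parsed as a number becomes NaN here...
as_number = pd.to_numeric(raw["temperature"], errors="coerce")

# ...so the original strings at those positions are the suspect tokens.
suspects = raw.loc[as_number.isna(), "temperature"].value_counts()
print(suspects)

# Re-read, this time telling pandas to treat the suspects as missing.
clean = pd.read_csv(io.StringIO(csv_text), na_values=list(suspects.index))
print(clean["temperature"].isna().sum())
```

This scales to large files because value_counts summarises the tokens instead of requiring anyone to scroll through the rows. The suspect list still deserves a manual look, since a legitimate text value would also fail the numeric conversion.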
@vandanasharma.sharma33 4 years ago
Can't we add any value to these NaN?
@minhaaj 4 years ago
good job
@nazeer9933 1 year ago
The dataset you have has few instances. What if we have thousands of rows of data? How do we find the NaN and NA values in the dataset? If you see this, please respond ASAP
@jayakhanal1720 3 years ago
How do I clean the TZAN dataset from Kaggle for music genre classification using a CNN?
@sonalikoli384 1 year ago
I have a question. I wrote print(os.listdir()) but I got many files that are inside my Jupyter. How can I import the CSV file that I have to clean?
@srujangowda8490 3 years ago
5:09 OP
@kunalkishore5260 2 years ago
What should we put in na_values if we don't know the missing values, or there are hundreds of different missing values?
@bloom6874 1 year ago
Are the values in the missing_values list here case-sensitive or case-insensitive?
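As far as I know, strings passed to na_values are matched exactly, so the list is case-sensitive. A small experiment to check this on your own pandas version (the tokens are hypothetical):

```python
import io
import pandas as pd

# "missing" is listed in na_values, "Missing" is not.
csv_text = "status\nmissing\nMissing\nok\n"
df = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

flags = df["status"].isna().tolist()
print(flags)
```

If both spellings occur in the data, list both in na_values.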
@husnarazool9866 3 years ago
How do I save the dataset after the cleaning process in Python?
@rishibaul5000 2 years ago
Use file operations
@NaveenKumarsoma 4 years ago
Why have you used np.nan in missing_values?
@asifurchoudhury9905 3 years ago
SAME QUESTION.
@thekras177 1 year ago
I prefer using mean but thank you it was helpful
@deutschvalley3574 3 years ago
How can I handle float values like 2.o or 0.04576? Kindly let me know, I am doing research using big datasets
@user-iu2ph6ws8i 1 year ago
Hi Soumil, right now I'm working on a language translation project. I have collected the data for it, but I'm facing problems preprocessing it. Could you please help me with that?
@bhawitbalodi4324 2 years ago
Please can you tell me where you bought that LAPTOP stand? Please attach the link in a comment
@greeshmatejamyna 1 year ago
I can use Excel for this, right? My work would be easier. Kindly persuade me why I have to use Python instead of Excel here
@Goal_Huntter_16 1 year ago
I have to say. "Very Helpful". or print("very helpful") 😂😂
@rohitsanam8829 4 years ago
Please make a video on outlier treatment and detection
@abhinavsingh4208 3 years ago
Thanks for this video !
@aditi4677 3 years ago
This was really useful. Thank you!
@ehteshamali2893 3 years ago
Interpolation is basically like an average, right?
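Not quite, and a small comparison may help. With the default linear method, a single gap is filled with the average of its two neighbours, but that is a local value, not the column mean, and longer gaps get evenly spaced steps. A sketch on made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical series: one single gap, then a gap of two.
s = pd.Series([10.0, np.nan, 30.0, np.nan, np.nan, 90.0])

interpolated = s.interpolate()     # linear by default
mean_filled = s.fillna(s.mean())   # every gap gets the same column mean

print(pd.DataFrame({"raw": s, "interpolate": interpolated, "mean": mean_filled}))
```

Here the first gap becomes 20 (halfway between 10 and 30) and the next two become 50 and 70, while mean filling puts the same value, about 43.3, in all three places.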
@enricomendiola9952 1 year ago
Hello, can you include cleaning special characters using pandas regular expressions in this video?
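For this request, a minimal sketch of regex-based cleaning with the pandas string accessor. The sample strings and the choice of pattern (keep digits only) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical messy values.
s = pd.Series(["  $1,200 ", "#45!", "n/a"])

# Remove every character that is not a digit; regex=True makes the
# pattern a regular expression rather than a literal string.
digits_only = s.str.replace(r"[^0-9]", "", regex=True)

print(digits_only.tolist())
```

Entries with no digits at all end up as empty strings, which can then be converted to NaN before further processing.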
@Lejhand10 4 years ago
well explained .
@comptegmail273 2 years ago
Hello sir, thank you so much for the tutorial. I'm actually stuck since my source is a CSV file. Sadly the file I'm working on is extremely complex, with an indefinite number of columns, since my main columns are repeated every day based on the date. I've been stuck on this problem for over a week. Is there a way I could reach out to you and have your mail to maybe help solve this problem? Thanks a lot in advance.
@alinajaved2165 2 years ago
I need your help please please... how does automatic data cleaning work?
@mahidhar9787 3 years ago
How can we replace using mean, median & mode?
@IndianHacker-hisBest 1 year ago
Bro, could you please share the dataset in the description?
@sunilkhandale9232 3 years ago
I want to replace 4wd with fwd in a particular column, please help
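A sketch of replacing one value with another in a single column. The column name "drive-wheels" is a guess made for the example; substitute the real column name.

```python
import pandas as pd

# Hypothetical column name and values.
df = pd.DataFrame({"drive-wheels": ["4wd", "fwd", "rwd", "4wd"]})

# Series.replace swaps whole cell values that equal "4wd".
df["drive-wheels"] = df["drive-wheels"].replace("4wd", "fwd")

print(df["drive-wheels"].tolist())
```

Restricting the call to one column leaves any "4wd" values in other columns untouched.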
@leticiavillatima1844 3 months ago
AttributeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 pd_cleaned = pd.dropna()
AttributeError: module 'pandas' has no attribute 'dropna'
I can't find dropna
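The error above arises because dropna is a method of a DataFrame (or Series), not a function of the pandas module itself. A minimal sketch with a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; in the notebook this would be the loaded dataset.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Call dropna on the DataFrame object, not on the pd module.
cleaned = df.dropna()

# The module has no such attribute, which is what the traceback reports.
has_module_level_dropna = hasattr(pd, "dropna")

print(cleaned)
print(has_module_level_dropna)
```

So `pd_cleaned = df.dropna()` is the likely intended call, where df is whatever variable holds the data.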
@kiranpawar8798 5 months ago
What is noisy data?
@ranamahrous7814 3 years ago
I want a messy dataset for practicing. Do you know where I can find one, or do you have one?
@shilpachowdhury8860 2 years ago
Sir, when I run the data cleaning code in Jupyter notebook following the same instructions you gave, the output does not show Unnamed: 0, temperature and humidity; in my Jupyter it shows v1 & v2 in the output. Why is that? Can you please explain?
@deepthi5970 1 year ago
I also faced the same problem. When we create a new CSV file there is no Unnamed: 0 column. But if we save the same data as a CSV into a folder, it will create a new column like Unnamed: 0. If we read this data, the output will be like in this video. If repeated, one extra column will be added each time. To avoid this, use index=False when saving. It will work
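The index=False behaviour described here, which also answers the earlier question about saving a cleaned dataset, can be reproduced with a short round trip. In-memory buffers stand in for files so the sketch runs anywhere; the column names follow the comment above and the values are made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"temperature": [21.5, 19.0], "humidity": [40, 55]})

# Default to_csv writes the row index as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out, so the round trip is clean.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

With a real file the call would look like `df.to_csv("cleaned.csv", index=False)`, where the file name is only an example.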
@sabbirahmmed7161 2 years ago
Thanks a lot ❤️
@siddheshbhalerao3152 1 year ago
Sir, I'm facing an issue in code to convert a variable from object to integer in Jupyter notebook. It shows the error: invalid literal for int() with base 10: '-'
@bloom6874 1 year ago
You can typecast object into int
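A plain cast to int fails here because the column contains the placeholder '-'. One way around it, sketched on made-up values, is to coerce unparseable entries to NaN first and then use the nullable integer dtype:

```python
import pandas as pd

# Hypothetical object column with a dash standing in for a missing value.
s = pd.Series(["12", "-", "7"])

# errors="coerce" turns anything that is not a number into NaN.
numbers = pd.to_numeric(s, errors="coerce")

# "Int64" (capital I) is pandas' nullable integer dtype, so the gap survives.
as_int = numbers.astype("Int64")

print(as_int)
```

Alternatively, the dash can be declared as missing at load time by adding it to na_values in read_csv.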
@kyleevalencia1827 3 years ago
Sir, I'm still new to Python and this data cleaning thing. I want to ask what the 11 in df11 is. Is it some kind of function? I also don't understand the snippet concept
@kashyapsantoki4889 3 years ago
df11 is just a variable name; you can use anything (df11a, df12, df15, anything). A snippet is basically a piece of code which he already has
@harshalkshirsagar7618 1 year ago
yo thanks a lot
@aiswaryacd7419 2 years ago
How to install Jupyter notebook on an Android phone?
@darshan7673 3 years ago
Where can I download this dataset? Can anyone provide me the link?
@marouaneyoussfi3560 4 years ago
thank you
@ehteshamali2893 3 years ago
Brother Great video. I need the tea as well. ;)
@mayukbasu7228 4 years ago
Where can I get the data set?
@maniteja2167 4 years ago
There is -1 in place of NaN, how to clean that?
@shafiqahmed1976 3 years ago
data.fillna({"P11,":NEGATIVE}) NameError: name 'NEGATIVE' is not defined ?? Please help me, what can I do??
@leavemealone3198 3 years ago
Maybe you didn’t install the libraries?
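The NameError in the question above most likely comes from NEGATIVE being written without quotes, so Python looks for a variable of that name. A sketch of the probable fix, assuming the column is called P11 and should be filled with the text "NEGATIVE" (the sample values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry.
data = pd.DataFrame({"P11": ["POSITIVE", np.nan, "NEGATIVE"]})

# Quote the fill value so it is a string, and make sure the dictionary
# key matches the column name exactly (no stray comma inside the quotes).
data = data.fillna({"P11": "NEGATIVE"})

print(data["P11"].tolist())
```

Note that fillna returns a new DataFrame, so the result has to be assigned back.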