CODE: github.com/ash... DATA : github.com/ash... / ashokveda
Comments: 232
@pandharpurkar_ 3 years ago
Best teacher I have ever seen! Explains everything properly and gets to the exact point in a short time!
@DataMites 3 years ago
Thank you!
@JainmiahSk 4 years ago
DataMites is a hidden gem now, but soon it will be a brand in data science. Mark my words for the future.
@DataMites 4 years ago
Thank you 😊
@akshiwakoti7851 4 years ago
A real pro! Subbed to this channel after watching the first 3 minutes. Glad to have found it.
@DataMites 3 years ago
Thank you so much.
@donaloleary5514 3 years ago
Thank you, Ashok! This is an outstanding explanation of a complex subject. You make it all feel very intuitive. Awesome stuff - I will look for more DataMites videos in the future!
@DataMites 3 years ago
Hi Donal O'Leary, thanks for your comment, and keep visiting our channel for more updated content.
@bhagwatchate7511 4 years ago
Amazing in-depth explanation! I was searching for exactly this type of explanation. Thanks for sharing.
@DataMites 3 years ago
Glad it was helpful!
@SurajSingh-wn4wu 4 years ago
Great, Ashok! Genuinely liked your in-depth explanation and the solution. Glad I landed on your page. Thank you!
@DataMites 3 years ago
Thanks and welcome
@MLA263 1 year ago
Thanks Ashok, very clear and simple explanation.
@DataMites 1 year ago
Thank You
@user-dn8uc5sc8l 8 months ago
Wow sir, I liked your session. Please continue posting such videos.
@user-km4hl8lx8x 1 year ago
This is really helpful and thank you again!
@DataMites 1 year ago
Glad it was helpful! Keep Watching!
@alisalariyan6676 3 years ago
The best SMOTE tutorial I've seen. Thanks
@DataMites 3 years ago
Glad it was helpful!
@lalithapriya9484 3 years ago
Extremely clear, really superb teaching skills along with good communication.
@DataMites 3 years ago
Hi Lalitha Priya, thank you for your comment.
@binoypaul9772 3 years ago
Nice and informative. Please keep up the good work.
@DataMites 3 years ago
Thank you.
@milliekim5072 3 years ago
Thank you so much, sir! I hope to see more videos.
@DataMites 3 years ago
Keep watching.
@osamaamir9311 1 year ago
Such an amazing topic
@DataMites 1 year ago
Thank You
@b1k1m1 4 years ago
Hello Sir, thanks for explaining this very clearly. Keep it up!
@DataMites 3 years ago
You're most welcome
@adeyinkasotunde6870 4 years ago
Wow, I am very impressed. Well explained. Thanks.
@DataMites 3 years ago
You are most welcome
@ChrisHalden007 1 year ago
Great video. Thanks
@DataMites 1 year ago
Glad you like it! Keep supporting.
@jagannadhareddykalagotla624 3 years ago
DataMites is like a hidden pattern in unsupervised learning. Thank you so much, Ashok ❤️❤️
@DataMites 3 years ago
Thank you!
@dewipurnamasari5814 1 year ago
Thank you very much
@DataMites 1 year ago
Most welcome! Keep Watching
@inspiritlashi9994 3 years ago
Thank you so much for the great tutorial. As someone who does not have even basic knowledge of Python, I could learn many things from you, sir.
Hi sir! Thanks a lot for the very simple and clear explanation. Keep going, we expect more videos from you.
@DataMites 2 years ago
Keep watching
@alishahsaber3795 3 years ago
Thank you so much!!! Really helpful. thanks
@DataMites 3 years ago
Glad it helped!
@Cobra-bo1fy 2 years ago
Excellent explanation!
@DataMites 2 years ago
Thank you.
@siddhantagarwal274 4 years ago
Nicely explained. Thanks!
@DataMites 3 years ago
You're welcome!
@niswandi6122 1 year ago
Thank you Ashok, clear explanation, but how do we handle imbalanced datasets if we have 4 classes?
@DataMites 1 year ago
For multiclass data, the same technique is applied as for 2 classes.
@svitirur1665 3 years ago
Very good explanation
@DataMites 3 years ago
Keep watching
@ombb3576 3 years ago
Thank you for your sincere lecture sir
@DataMites 2 years ago
You are most welcome
@defres15 2 years ago
Great video. Great explanation. Thank you
@DataMites 2 years ago
You are welcome!
@riorizkiaryanto 3 years ago
Great video and explanation! Thanks.
@DataMites 3 years ago
You're welcome!
@ringgaershaikhwani3478 1 year ago
Hello sir, the material you explain is very easy to understand. I want to ask about my project. I have imbalanced data, so I apply SMOTE and model it with KNN, but why does the accuracy go down after SMOTE, from 79% to 78%? Is there something wrong with my data? Can you help explain this? I would be very grateful if you responded to my comment.
@DataMites 1 year ago
With SMOTE, your model will start detecting more cases of the minority class, which typically increases recall but decreases precision. Accuracy is not a good measure of performance on imbalanced classes. SMOTE adds synthetic minority samples to the training data, which shifts the model toward predicting the minority class. The model will now identify more of the minority class, but the overall accuracy may decrease.
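The trade-off described in the reply can be checked with simple arithmetic. The sketch below uses invented confusion-matrix counts for a hypothetical test set of 900 majority and 100 minority samples; the numbers are assumptions chosen only to show the pattern, not results from any real model.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall for the positive (minority) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical model trained on the imbalanced data: it rarely predicts the minority class.
before = metrics(tp=20, fp=10, fn=80, tn=890)
# Hypothetical model trained on resampled data: more minority hits, but more false alarms.
after = metrics(tp=70, fp=90, fn=30, tn=810)

for name, (acc, prec, rec) in (("before", before), ("after", after)):
    print(f"{name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

In this made-up case accuracy falls from 0.91 to 0.88 while recall on the minority class rises from 0.20 to 0.70, which is why recall, precision and F1 are usually more informative than accuracy here.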
Nice explanation. Looking for more NLP-related videos.
@DataMites 3 years ago
Sure
@athilakshmir8589 3 years ago
Nice explanation
@DataMites 3 years ago
Thank You!
@AsiaMSaeed 2 years ago
Amazing. Thanks a lot.
@DataMites 2 years ago
You are welcome!
@muhammedalisahan9661 1 year ago
Firstly, thank you for sharing. I want to ask something about time series. I have a lot of data, but the series have different frequencies, and I wonder how to deal with all of them. Assume the data have been converted to the same frequency. The data also do not follow a normal distribution and are imbalanced, which is why I am asking. Once the data are at the same frequency, can SMOTE be applied to time series? If not, how should I resample my time series?
@ffckode 4 years ago
Thanks for sharing. Very helpful
@DataMites 3 years ago
Glad it was helpful!
@tanvipataskar4597 4 years ago
Amazing explanation!!! Thank you.
@DataMites 3 years ago
You are welcome!
@babukoshy 3 years ago
This was a great lesson. Thanks a lot
@DataMites 3 years ago
You're very welcome!
@perusona_desu5534 1 year ago
In oversampling, do you have to make the number of minority class instances equal to the number of majority class instances? For example, can it be 900 nc and 800 c?
@DataMites 1 year ago
Oversampling increases the number of minority class samples to match the majority class. Undersampling reduces the number of majority class samples to match the minority class.
@michaelpanashemudimbu7405 3 years ago
Awesome video
@DataMites 3 years ago
Glad you enjoyed it
@swastiknayak5173 4 years ago
At 8:15 you said it takes the average of centroids, which is completely wrong. SMOTE is calculated over the feature space. It goes like this: 1. We take the feature vector of a minority class point. 2. We calculate the distance between that point and its nearest neighbours (neighbours=5). 3. We multiply the distance to a neighbour by a random number between 0 and 1. 4. Then we add the result to the original point to create the synthesized point. Hope you got it 😀
@DataMites 3 years ago
SMOTE works by selecting examples that are close in the feature space, drawing a line between the examples in the feature space and drawing a new sample at a point along that line. Specifically, a random example from the minority class is first chosen. Then k of the nearest neighbours for that example are found (typically k=5). A randomly selected neighbour is chosen and a synthetic example is created at a randomly selected point between the two examples in feature space.
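The steps described above can be sketched in plain Python. This is a simplified illustration of the interpolation idea only, not the imbalanced-learn implementation; the function name `smote_sample` and the sample points are hypothetical.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic point from a list of minority-class feature vectors."""
    rng = rng or random.Random()
    # 1. Pick a random minority example.
    base = rng.choice(minority)
    # 2. Find its k nearest minority neighbours (Euclidean distance, excluding itself).
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    # 3. Pick one of those neighbours at random.
    neighbour = rng.choice(neighbours)
    # 4. Interpolate: base + gap * (neighbour - base), with gap drawn from [0, 1).
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


minority_points = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1], [1.4, 2.2], [1.1, 1.7]]
synthetic = smote_sample(minority_points, k=5, rng=random.Random(0))
print(synthetic)
```

Because the new point lies on the line segment between two existing minority points, it always stays inside the region already covered by the minority class.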
@petersq5532 2 years ago
How does a stratified split solve the problem?
@MrMehshankhan 3 years ago
Thank you so much, man. Great, thumbs up.
@DataMites 3 years ago
You're welcome!
@seeutube8860 2 years ago
Nice video. After applying SMOTE, balanced data was obtained. But the balanced data (X_smote, y_smote) was not split (80:20) into train and test sets before reapplying the classification model. Is it necessary to split the data again? Or was the original dataset itself used as the test dataset?
@DataMites 2 years ago
We already split the data and then balanced the training set, so there is no need to split again.
@younesgasmi8518 6 months ago
Thanks so much, bro. I have seen some data scientists use undersampling and oversampling before splitting the dataset into training and testing sets. In my research paper we have used the NearMiss technique to balance the dataset. I got good results using cross-validation splitting and an Extra Trees classifier as the model, with the same model used to select the most important features. My results are ACC 0.97, F1 0.97 and AUC 0.99. Could these results be accepted for publishing?
@DataMites 6 months ago
You achieved good results. However, whether your results are acceptable for publishing depends on several other factors too.
@mozaffarhussain5496 4 years ago
Best explanation, sir!
@DataMites 3 years ago
Keep watching
@HarishKumar-qj9pp 3 years ago
Getting an AttributeError: 'SMOTE' object has no attribute 'fit_sample'. All the package requirements are satisfied, but it still shows the error.
@DataMites 3 years ago
Hi, please check imbalanced-learn.org/stable/over_sampling.html for any updates to the imbalanced-learn package.
@tahanics901 2 years ago
Very good explanation, thanks. But is this code applicable to text data (tweets) or not?
@DataMites 2 years ago
Yes, after converting the text to numerical vectors. Use fit_resample().
@kurniawandk5078 2 years ago
Very informative. I have a question, sir: is it possible to set how many synthetic samples SMOTE creates? For example, I want to increase n_sample to 200%, so how do I set this parameter in the Python code?
@DataMites 2 years ago
Your question is not clear. Can you elaborate, please?
@amruthakommu4695 2 years ago
Great Ashok. That was a well explained video. I tried the same thing on my data set but my accuracy came down from 94 to 86. What could be the cause?
@DataMites 2 years ago
Hi, we cannot comment until we look at your data and all the approaches you have taken. One possibility is that your earlier model was overfitted.
@ishan7491 3 years ago
Can you please explain this part of the code in the label encoder section:
@DataMites 3 years ago
Hi Ishan, please rephrase your query.
@samhugh9891 3 years ago
Great video, thank you!
@DataMites 3 years ago
You are welcome!
@canancetin7897 3 years ago
Great video! Thanks a lot!!!
@DataMites 3 years ago
Glad you liked it!
@sasidharansathiyamoorthy6918 3 years ago
Thank you for the informative video! In this video, you have used SMOTE to rectify imbalance in the target label. What methods can we use to deal with class imbalance in categorical features (inputs) in order to make the model more robust?
@DataMites 3 years ago
Hi Sasidharan Sathiyamoorthy, that imbalance is a property of the input, so if you balance the input it might affect the target variable. Build two models, with and without balancing, and compare their performance.
@RoyalRealReview 2 years ago
@DataMites Sir, if 54% of the people are cancer patients and 46% are non-cancer patients, do we need balancing? If yes, which balancing technique should be selected?
@cliffordtarimo1511 3 years ago
Great video on SMOTE. Do you have a video on undersampling? Can someone perform both undersampling and oversampling in one line of code? Thanks.
@DataMites 3 years ago
Another flavor of SMOTE is SMOTETomek, which oversamples the minority class with SMOTE and then removes Tomek links, an undersampling-style cleaning step.
@sushmithajanapati7785 2 years ago
Does the SMOTE algorithm support multi-output classification?
@DataMites 2 years ago
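You can use SMOTE for multiclass targets. As far as we know, imbalanced-learn does not support multilabel or multioutput targets directly.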
@sandeshbapu1567 4 years ago
Nicely explained
@DataMites 3 years ago
Thank you so much 🙂
@JainmiahSk 4 years ago
You haven't encoded the target variable?
@DataMites 4 years ago
The target variable doesn't require encoding.
@lavanyanayak8707 3 years ago
Thank you very much for this video. I have a precipitation dataset containing 4 columns and 8000 rows, each of which has a lot of zeros and only a few continuous values. I would like to know if I can use SMOTE in this case.
@DataMites 3 years ago
Hi Lavanya Nayak, the GitHub link is provided in the description. Please check it out.
@chinedumjoseph9875 3 years ago
Oh! I got it. Don't worry. Thanks
@DataMites 3 years ago
You're welcome
@inspiritlashi9994 3 years ago
Hi, may I know how you corrected it? I got the same error message.
@rukaiyaa191 2 years ago
Which module can be used as an alternative to imblearn in Python, sir (for handling imbalanced datasets)?
@DataMites 2 years ago
For balancing the dataset we have only the imblearn module, but there are other ways to deal with an imbalanced dataset.
@abhimynampati2929 2 years ago
Hey Ashok, can you make a video on the DSSTE algorithm for removing class imbalance?
@DataMites 2 years ago
Will do in a future session.
@abhimynampati2929 2 years ago
@DataMites awesome! Will be waiting.
@zakariaabderrahmanesadelao3048 4 years ago
What a crystal clear explanation. Thank you.
@DataMites 3 years ago
You're very welcome!
@jongcheulkim7284 2 years ago
Thank you so much. ^^
@DataMites 2 years ago
You're welcome 😊
@insidiousmaximus 3 years ago
Great video, thank you. I am trying to figure out how to use this with a generator flowing from a directory.
@DataMites 3 years ago
Hi insidiousmaximus, thanks for reaching out with your query. Can you please state it more precisely so that we can help you?
@OriginalBernieBro 4 years ago
Running into a problem: the sklearn 'support' column still looks unbalanced after SMOTE when I print(classification_report(y_test, y_pred)). What gives?
@DataMites 3 years ago
The support is the number of samples of the true response that lie in that class. Since it is computed from y_test, which is not resampled, it still reflects the original class imbalance.
@ShubhamKumar-id6pf 4 years ago
Sir, I followed the recommended procedure, but my Jupyter environment gives an AttributeError saying the SMOTE object has no attribute '_validate_data'. Can you please help me with this?
@DataMites 3 years ago
You need to upgrade scikit-learn to version 0.23.1.
@oumaimasouid5229 3 years ago
I get this error >> please help!
@DataMites 3 years ago
Hi, please use fit_resample.
@snehasamadder3790 2 years ago
After I resample an imbalanced dataset, how can I download the resampled dataset from Colab?
@DataMites 2 years ago
Combine the resampled x and y into a new DataFrame, then write that DataFrame to a CSV file using to_csv().
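The steps in the reply can be sketched with pandas. The small `X_res` and `y_res` objects below are stand-ins for the output of `fit_resample()`, and the column names and the file name `resampled.csv` are hypothetical.

```python
import pandas as pd

# Stand-ins for the resampled output; in practice these come from fit_resample().
X_res = pd.DataFrame({"age": [25, 40, 33], "bmi": [22.1, 30.5, 27.3]})
y_res = pd.Series([0, 1, 1], name="outcome")

# Put features and target side by side and write them to disk.
resampled = pd.concat([X_res, y_res], axis=1)
resampled.to_csv("resampled.csv", index=False)

# In Colab, the file can then be downloaded through the browser:
# from google.colab import files
# files.download("resampled.csv")
```

If `fit_resample()` was given NumPy arrays instead of DataFrames, wrap the results in `pd.DataFrame` and `pd.Series` with your own column names before concatenating.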
@wajeehanaz9115 2 years ago
Hello Sir! Can you please tell me how to generate images using the SMOTE technique? Thanks in advance.
@DataMites 2 years ago
For image generation there is a different method called data augmentation, which creates new synthetic data from existing data.
@kunalgoyal8529 4 years ago
While dividing the data into training and test sets, shouldn't you use "stratify=y" to ensure the test and training sets have equal proportions of the outcome variable?
@mr.techwhiz4407 4 years ago
That would be undersampling
@DataMites 3 years ago
The aim of a machine learning model is to generalize from the training set so that performance on unseen data is good. We don't care what the test data consists of; instead, we try to give the algorithm a more generalized pattern.
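For reference, a small sketch of what `stratify=y` does in scikit-learn's `train_test_split`. The toy labels are illustrative. Stratification does not resample or balance anything; it only keeps the class proportions the same in both splits.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data (illustrative): 900 samples of class 0 and 100 samples of class 1.
y = [0] * 900 + [1] * 100
X = [[i] for i in range(1000)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Both splits keep the original 9:1 ratio; neither is balanced.
print("train:", Counter(y_train))
print("test:", Counter(y_test))
```

Stratifying and oversampling are therefore complementary: the first controls how the split is made, the second changes the class mix of the training portion afterwards.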
@wenshanpan8726 3 years ago
Excellent!
@DataMites 3 years ago
Thank You!
@inspiritlashi9994 3 years ago
Sir, may I know how to run a logistic regression on the oversampled dataset?
@DataMites 3 years ago
Hi Inspirit Lashi, you can use SMOGN for preprocessing of your dataset. For more information: proceedings.mlr.press/v74/branco17a/branco17a.pdf
@hendripriyambowo1427 4 years ago
Hi sir, I have a question: how do we implement these resampling techniques in a neural network? Say we use an embedding layer and work with multiple kinds of data; would the resampling technique make our data lose information?
@DataMites 3 years ago
You can use a mini-batch SGD optimizer to handle an imbalanced dataset.
@dkandasamypandian719 3 years ago
Good
@DataMites 3 years ago
Thank You!
@sunnyarora4916 3 years ago
Any video where we use SMOTE for regression?
@DataMites 3 years ago
Hi Sunny Arora, you can use SMOGN for it. For more information: proceedings.mlr.press/v74/branco17a/branco17a.pdf
@sunnyarora4916 3 years ago
@DataMites Thank you, is it less likely to use SMOGN?
@aiswaryalakshmi1349 1 year ago
Cannot install imblearn. Kindly help me with this.
@DataMites 1 year ago
Once you install imblearn, restart the kernel. If it doesn't work, try either of these commands: "!pip install delayed" or "pip install --user imblearn"
@anaghadamame196 3 years ago
Thank you sir...👍
@anaghadamame196 3 years ago
Can you explain which algorithm should be selected for a regression problem? It would help me a lot.
@DataMites 3 years ago
All the best
@vivekuk4329 3 years ago
Hi sir, I need to join your classes. How do I approach you?
@DataMites 3 years ago
Hi Vivek uk, please share your email ID and contact number. Our educational counselor will share the details. You can contact our counselor directly at 18003133434. For more info: datamites.com/
@chinedumjoseph9875 3 years ago
Thank you for this nice explanation. I was making progress with the code, but when I tried to fit using the command X_train_smote, y_train_smote = smote.fit_sample(X_train.astype('float'),y_train), I got an error saying AttributeError: 'SMOTE' object has no attribute 'fit_sample'. I need urgent help please. Thank you
@DataMites 3 years ago
Hi Chinedum Joseph, can you please list the versions of Python and scikit-learn on your system?
@ObaidoGeorge 2 years ago
Use smote.fit_resample instead of smote.fit_sample.
@AbdulLatif-fu9jz 1 year ago
@ObaidoGeorge Thank you very much for your help
@Adinasa2 1 year ago
AttributeError: 'SMOTE' object has no attribute 'fit_sample'
@DataMites 1 year ago
Use smote.fit_resample
@patrickbormann8103 4 years ago
Amazing!
@DataMites 3 years ago
Thanks!
@terryterry3733 3 years ago
Hi sir, what is the data type of the outcome? I think it is object. Did you convert it to float or int?
@DataMites 3 years ago
Hi Terry, thanks for reaching out with your query. The outcome datatype is string, and we label-encoded it to an integer.
@shivki23 4 years ago
Subscribed for your content
@DataMites 3 years ago
Thank you
@patelajay1010 3 years ago
I have one doubt. What if the data contains NaN values and you want to do undersampling? If you impute NaN values with mean(), there will be information leakage, because we impute the data before splitting it into train and test datasets. Could you please tell me what a possible solution would be in this case?
@DataMites 3 years ago
Hi Ajay Patel, if you have a large dataset, you can certainly drop the NaN values.
@patelajay1010 3 years ago
@DataMites Sir, I have continuous data coming from sensors. Dropping a few rows will break the pattern.
@DataMites 3 years ago
@patelajay1010 In that case, without knowing the source and significance of your NaN values, we cannot comment on anything.
@patelajay1010 3 years ago
@DataMites OK sir. Thank you for your response.
@mohan250s 2 years ago
You're awesome
@DataMites 2 years ago
Thank you.
@sanyajain2127 4 years ago
Getting an error: ValueError: Unknown label type: 'continuous-multioutput'
@DataMites 3 years ago
It can be due to multiple reasons, such as using logistic regression for classification with more than 2 classes, or using a classifier when the target variable is continuous.
@parthasarathyk5476 2 years ago
Hi, has anyone applied this concept to an image dataset? Please let me know.
@DataMites 2 years ago
For image generation you can use a method called data augmentation, which creates new synthetic data from existing data.
@Adinasa2 1 year ago
Please share the notebook and input file
@DataMites 1 year ago
@Aditya Gupta Please check the description. It's available there.