Machine Learning with Imbalanced Data - Part 2 (Cost-sensitive Learning)

8,486 views

Dr. Data Science

1 day ago

In this video, we discuss the class imbalance problem and several strategies for addressing it. Existing methods fall into three groups: data-level preprocessing (resampling), cost-sensitive learning, and ensemble learning. The main focus of this video is cost-sensitive learning, with an emphasis on the logistic regression classifier. We use the scikit-learn implementation of logistic regression and apply cost-sensitive learning to improve its performance on the thyroid data set. Precision and recall serve as the evaluation metrics in Python.
#ImbalancedData #CostSensitiveLearning #Classification
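
As a rough illustration of the idea in the description, the sketch below compares a plain logistic regression with a class-weighted one using scikit-learn's `class_weight` parameter. The data here is a synthetic stand-in generated with `make_classification`, not the thyroid data set from the video, and the sample size, imbalance ratio, and `fit_and_score` helper are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced data set (roughly 5% positives).
# Assumption for illustration only: this is NOT the thyroid data from the video.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=4,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_and_score(class_weight):
    """Fit logistic regression and return (precision, recall) on the test split."""
    model = LogisticRegression(class_weight=class_weight, max_iter=1000)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    return precision, recall


# class_weight=None treats every training error the same;
# class_weight="balanced" weights each class inversely to its frequency,
# so errors on the rare positive class cost more during training.
precision_plain, recall_plain = fit_and_score(None)
precision_weighted, recall_weighted = fit_and_score("balanced")

print(f"plain:    precision={precision_plain:.2f} recall={recall_plain:.2f}")
print(f"weighted: precision={precision_weighted:.2f} recall={recall_weighted:.2f}")
```

On data like this, the weighted model usually recovers more of the minority class (higher recall) at the price of lower precision. An explicit dictionary such as `class_weight={0: 1, 1: 10}` can be passed instead of `"balanced"` when the misclassification costs are known.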

Comments: 12
@robyattoillah6272 1 year ago
Thanks for the explanation. A great introduction to cost-sensitive learning.
@DrDataScience 1 year ago
Thanks!
@SP-db6sh 3 years ago
Thank you sir for such a simple explanation.
@parisahajibabaee2893 3 years ago
very useful video, highly recommended!
@teamtom 2 years ago
Thank you for this video! Could you clarify how changing the threshold value of logistic regression relates to this topic? Can it be another way to tackle the imbalanced data set problem?
@DrDataScience 2 years ago
Changing the threshold allows you to control how confident you are when labeling a new data point. For example, you can say that a data point is positive when the corresponding probability exceeds 0.90 rather than 0.50.
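
A minimal sketch of this reply, assuming the probabilities come from something like the positive-class column of `model.predict_proba(X)` in scikit-learn. The probability values and the `label_with_threshold` helper are hypothetical and exist only for the example.

```python
# Hypothetical predicted probabilities of the positive class, e.g. the
# second column of `model.predict_proba(X)` from a fitted classifier.
probabilities = [0.97, 0.91, 0.72, 0.55, 0.30, 0.08]


def label_with_threshold(probs, threshold):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]


labels_default = label_with_threshold(probabilities, 0.50)
labels_strict = label_with_threshold(probabilities, 0.90)

print(labels_default)  # [1, 1, 1, 1, 0, 0]
print(labels_strict)   # [1, 1, 0, 0, 0, 0]
```

With the stricter 0.90 cutoff, only the two most confident points are labeled positive.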
@uditg 2 years ago
Changing the threshold is a way to further tune your recall/precision based on business requirements. Remember there is a trade-off between recall and precision, so if you improve one, the other will usually degrade. The F-score is a metric based on both precision and recall, and in some cases you pick the threshold that maximizes the F-score.
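
The trade-off and the F-score-based choice of threshold can be sketched in plain Python. The labels, scores, and candidate thresholds below are made up for illustration, and `precision_recall_f1` is a hypothetical helper, not a library function.

```python
def precision_recall_f1(y_true, probs, threshold):
    """Compute precision, recall and F1 when labeling p > threshold as positive."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical true labels (3 positives out of 10) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
probs = [0.92, 0.64, 0.57, 0.36, 0.31, 0.24, 0.18, 0.12, 0.07, 0.03]

# Candidate thresholds 0.1, 0.2, ..., 0.9.
candidates = [i / 10 for i in range(1, 10)]

results = {t: precision_recall_f1(y_true, probs, t) for t in candidates}
for t, (precision, recall, f1) in results.items():
    print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Pick the threshold with the highest F1 score.
best_threshold = max(candidates, key=lambda t: results[t][2])
best_f1 = results[best_threshold][2]
print(best_threshold, round(best_f1, 2))  # 0.6 0.8
```

In this toy data, raising the threshold pushes precision up and recall down, and the F1 score peaks at a cutoff of 0.6. In practice the scores would come from a held-out validation set rather than a hand-written list.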
@teamtom 2 years ago
@uditg I wonder whether it is reasonable to change the threshold during the training process. I mean, if we have an imbalanced dataset and a simple logistic regression, can we calculate the confusion matrix using e.g. threshold=0.1 during training?
@uditg 2 years ago
@teamtom Yes, you should certainly determine the best cutoff/threshold during training. There is no reason to stick with the default 0.5 cutoff. It is only there by default because, objectively (from a Bayesian standpoint), it makes the most sense. I'll add one more thing: finding the threshold by optimizing the F-score is one approach, which basically assigns the same 'weight' to false positives and false negatives. But in a specific business use case you might have a low cost for (say) false positives and a high cost for false negatives. In that case, instead of calculating and optimizing the F-score, you can calculate a cost function and find the cutoff that optimizes that cost function.
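
One way to read the cost-function suggestion is sketched below. The costs (`COST_FP = 1`, `COST_FN = 10`), the labels, the scores, and the `total_cost` helper are all assumptions made up for the example; a real use case would plug in its own business costs and validation-set scores.

```python
# Hypothetical business costs: a missed positive is 10x worse than a false alarm.
COST_FP = 1
COST_FN = 10


def total_cost(y_true, probs, threshold):
    """Total misclassification cost when labeling p > threshold as positive."""
    preds = [1 if p > threshold else 0 for p in probs]
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    return fp * COST_FP + fn * COST_FN


# Hypothetical true labels (3 positives out of 10) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
probs = [0.92, 0.64, 0.57, 0.36, 0.31, 0.24, 0.18, 0.12, 0.07, 0.03]

# Candidate thresholds 0.1, 0.2, ..., 0.9.
candidates = [i / 10 for i in range(1, 10)]

costs = {t: total_cost(y_true, probs, t) for t in candidates}
for t, cost in costs.items():
    print(f"threshold={t:.1f} cost={cost}")

# Pick the threshold with the lowest total cost.
best_threshold = min(candidates, key=lambda t: costs[t])
best_cost = costs[best_threshold]
print(best_threshold, best_cost)  # 0.3 2
```

Because false negatives are expensive here, the cheapest cutoff (0.3) sits well below the default 0.5: it accepts two false positives in order to miss no positives.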
@maryamzeinolabedini1515 2 years ago
Hi, thank you for this video. I have a question. If we want to use cost-sensitive learning for prediction with imbalanced data and a Bayesian network, how can we implement it? You fit logistic regression on the data, while our main prediction model is Bayesian and must be fitted on the data.
@piaolmedoruiz8518 2 years ago
Thanks for your videos. Could you recommend a text to learn more about these topics?
@DrDataScience 2 years ago
Sure, this is a good one: link.springer.com/article/10.1007/s13748-016-0094-0