Difference-in-differences | Synthetic Control | Causal Inference in Data Science Part 2

  17,723 views

Emma Ding

1 day ago

This video is the second part of our mini-course on applications of causal inference in data science. We discuss which methods you can use for causal inference when only a few units are treated. Two methods are introduced: difference-in-differences and synthetic control.
🔗 Regression and Matching • Regression and Matchin...
📃Yuan's blog post on causal inference www.yuan-meng.com/posts/causa...
📚 Resources recommended by Yuan
- Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
- Jones, N., & Barrows, S. (2019, July 24). Uber’s synthetic control. PyData Amsterdam 2019. • Nick Jones, Sam Barrow...
- Python/R/Stata code from The Effect: An Introduction to Research Design and Causality: github.com/NickCH-K/causaldata
- The “synth” package for synthetic control: rpubs.com/danilofreire/synth
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 How to measure COVID's Impact on the Economy
08:13 Difference-in-Differences
14:47 Synthetic Control
24:17 Summary

Comments: 31
@junqichen6241 2 years ago
This video clears a lot of questions in my mind. Thank you!
@sophial.4488 2 years ago
This channel and this video are sooo underrated!
@jacobdsk1381 1 month ago
amazing thank you!
@pavlobilinskyi775 7 months ago
Fantastic lecture! The introduction to the DiD method is very intuitive. It is one of the best explanations, in my experience.
@yungetong634 2 years ago
great great video! Thank you guys!
@dadmehrdidgar4971 1 year ago
Loved this video! Thank you both! :)
@xinyaohui1919 2 months ago
fantastic lecture! thanks Yuan and Emma!
@itsBlu4e 2 years ago
Oh my god was this useful. Thank you so much for planning it out and recording it! Amazing job.
@emma_ding 2 years ago
You're so welcome!
@houmlackmbp3075 1 year ago
Thanks for your good information
@escargot8854 2 years ago
Wouldn't COVID be a bad use case for DD because it was worldwide? Few economies were unaffected, so there are few that could be used as a counterfactual.
@TheBjjninja 1 year ago
You would use an event study design to measure effects of COVID
@jaredgreathouse3672 2 years ago
Synthetic controls are pretty much the big brother of difference-in-differences. You can do so much more with SCM that you can't really do with DD. For example.... I'm writing a synthetic control command for Stata, and it uses LASSO or Ridge to automate donor/variable selection, and this method already outperforms classic SCM. I've even gotten it to do staggered implementation as well as placebo inference, and the best thing is that you only need outcome data, you don't need a long list of covariates to measure the counterfactual.
@brotherbig4651 2 years ago
It seems you are using endogenous outcome variables on the right hand side of your regression.
@brotherbig4651 2 years ago
The variables you choose to construct the synthetic group are subjective. Endogeneity, omitted variable bias, the pre-treatment trend you have are all hidden in the process.
@jaredgreathouse3672 2 years ago
@brotherbig4651 Yeah, you're right, the variables we choose are subjective. And you're also right that the pre-treatment regression uses the donor outcomes to predict the outcomes of the actually treated unit. In fact, the algorithm can also use other covariates, it just doesn't need to. The cross-validation procedure, in addition to combating overfitting, also attempts to ensure we have the best out-of-sample predictors "k" time periods ahead of a point in the training data. Initially, I was super skeptical about the approach when I read about it for Python and R; I pretty much couldn't believe it. Well, I wrote the routine myself for Stata, roughly based off their code, and it works pretty well, even under suboptimal conditions (short pre-intervention periods, hundreds or thousands of donors) and that kinda thing.
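To make the idea in this thread concrete, here is a minimal sketch of a ridge-penalized regression of the treated unit's pre-treatment outcomes on donor outcomes. It is not the commenter's Stata command or any published package: the data, the `ridge_weights` and `solve` helpers, and the fixed penalty `lam` are all hypothetical, and the cross-validation step that would normally choose the penalty is left out.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_weights(donors, treated, lam):
    """Donor weights minimizing pre-period squared error plus lam * ||w||^2."""
    j = len(donors)
    gram = [
        [
            sum(x * y for x, y in zip(donors[p], donors[q]))
            + (lam if p == q else 0.0)
            for q in range(j)
        ]
        for p in range(j)
    ]
    rhs = [sum(x * y for x, y in zip(donors[p], treated)) for p in range(j)]
    return solve(gram, rhs)


# Hypothetical outcome-only data: three donors, four pre-treatment periods.
donors_pre = [
    [8.0, 10.0, 9.0, 11.0],
    [12.0, 14.0, 13.0, 15.0],
    [5.0, 4.0, 6.0, 5.0],
]
treated_pre = [10.0, 12.0, 11.0, 13.0]

# Two post-treatment periods for the same units.
donors_post = [[12.0, 13.0], [16.0, 17.0], [6.0, 5.0]]
treated_post = [18.0, 19.0]

w = ridge_weights(donors_pre, treated_pre, lam=1.0)

# Synthetic (counterfactual) post-period outcomes and the estimated gaps.
synthetic_post = [
    sum(w[j] * donors_post[j][t] for j in range(len(w)))
    for t in range(len(treated_post))
]
gaps = [y - s for y, s in zip(treated_post, synthetic_post)]
print(w, gaps)
```

Unlike classic synthetic control, these weights are not constrained to be non-negative or to sum to one, which is one reason the penalty matters when there are many donors.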
@the_teemo1 2 years ago
for the uber case, what is the argument of NOT using A/B test? (or is it just for the example's case) thanks!
@dataseance4041 2 years ago
because riders in the same market share drivers. only treatment users had to walk (if they requested express pool), but that would reduce the average trip duration for all pool riders, even the control users who didn't walk. as a result, an a/b test can't detect the treatment effect.
@PeakWuNeverSurrender 1 year ago
By using synthetic control, we aim to satisfy the common trends assumption required by difference-in-differences.
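For reference, the common trends assumption is what lets the classic 2x2 difference-in-differences estimate treat the control group's change as the change the treated group would have seen without treatment. A minimal sketch with made-up group means (the numbers and function names are illustrative assumptions, not taken from the video):

```python
# Hypothetical average outcomes by group and period; not from the video.
means = {
    ("treated", "pre"): 100.0,
    ("treated", "post"): 130.0,
    ("control", "pre"): 80.0,
    ("control", "post"): 95.0,
}


def did_estimate(means):
    """Classic 2x2 difference-in-differences on group means."""
    treated_change = means[("treated", "post")] - means[("treated", "pre")]
    control_change = means[("control", "post")] - means[("control", "pre")]
    return treated_change - control_change


def counterfactual_treated_post(means):
    """Treated outcome expected without treatment, assuming common trends."""
    control_change = means[("control", "post")] - means[("control", "pre")]
    return means[("treated", "pre")] + control_change


print(did_estimate(means))                 # 15.0
print(counterfactual_treated_post(means))  # 115.0
```

If the two groups were already on different trends before treatment, the control change is no longer a valid stand-in, and the estimate is biased.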
@percytaabazuing4554 1 year ago
Good job, guys! Is it possible for you to do a video on the commands used in SCM?
@emma_ding 1 year ago
Hi, Percy! Thanks for your comment. I've added your suggestion to my list of potential content ideas. 😊
@andreaxue376 2 years ago
one question i had is why we need to do the counterfactual prediction on the donor pool (similar cities) instead of using the treatment city's own historical data before the treatment to predict the counterfactuals for the period of interest?
@yangsong7864 2 years ago
Hey Andrea, I think it's mainly because the donor pool can better capture seasonality, trend, and environment changes, which makes the counterfactual prediction for the treated unit more accurate (especially for irregular time series). Imagine the pandemic starts: the treatment city cannot estimate its own counterfactual from its own pre-pandemic historical data. The donor pool, on the other hand, is also affected by the pandemic, so its weighted post-treatment values would be a better counterfactual for the treated unit.
@nanlinr 1 year ago
You need both is my understanding. Donor pool data should represent a world where treatment isn't implemented and you find it by modeling prior data of donor pool to best represent the treatment city's. Then you track how that synthetic control performs after treatment started and use that as a baseline to see how the treatment city's behavior differs from it
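The two steps described in this reply (fit donor weights on pre-treatment data, then use the weighted donors as the post-treatment baseline) can be sketched as follows. This is a toy illustration under stated assumptions: the series are invented, there are only two donors, and a simple grid search over convex weights stands in for the constrained optimization that real synthetic control packages use.

```python
# Hypothetical outcome series; the treatment starts at index 4.
treated = [10.0, 12.0, 11.0, 13.0, 18.0, 19.0]
donor_a = [8.0, 10.0, 9.0, 11.0, 12.0, 13.0]
donor_b = [12.0, 14.0, 13.0, 15.0, 16.0, 17.0]
T0 = 4  # number of pre-treatment periods


def pre_period_sse(w):
    """Squared pre-treatment gap when donor_a gets weight w and donor_b gets 1 - w."""
    return sum(
        (treated[t] - (w * donor_a[t] + (1 - w) * donor_b[t])) ** 2
        for t in range(T0)
    )


# Step 1: choose convex weights (non-negative, summing to 1) on pre-treatment data.
grid = [i / 100 for i in range(101)]
w_best = min(grid, key=pre_period_sse)

# Step 2: build the synthetic control for every period with the fitted weights.
synthetic = [w_best * a + (1 - w_best) * b for a, b in zip(donor_a, donor_b)]

# Step 3: the post-treatment gap between treated and synthetic is the estimated effect.
gaps = [treated[t] - synthetic[t] for t in range(T0, len(treated))]
print(w_best, gaps)  # 0.5 [4.0, 4.0]
```

The weights are fitted only on periods before index `T0`, so the post-treatment gaps are not used to tune the fit.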
@rikki146 1 year ago
I guess you can, but comparison of this kind is hardly convincing. Sometimes temporal data make better predictions and sometimes cross-sectional data make better predictions. For example, say I am interested in the effect of tax on investment gains in the US market, I would rather base my estimation on counterfactual derived from JP/EU market etc than historical data.
@jaden2582 2 years ago
I have a question that may confuse many people as well: other than cases where the event being estimated already happened in the past, in what other cases is it better to use DiD than A/B testing to estimate an effect?
@jaredgreathouse3672 2 years ago
We usually can't do controlled experiments/AB testing for an intervention; using DD is what we do practically when experiments aren't possible, and they have quite a lot of pitfalls that many economists don't address when writing about their methods. SCM however is the supreme variant of DD, a generalized version of it which offers a principled way to select donors. My variant of SCM explicitly combats overfitting and noise, for example, with machine learning estimators. DD isn't quite as capable of this, yet
@jaden2582 2 years ago
@jaredgreathouse3672 Thanks for the reply!
@TheBjjninja 1 year ago
I think we forgot to answer the original question of "how did COVID impact our economy"? I'd probably not use Diff-in-diff to answer that but use an event study design. The whole world was impacted by COVID so it's difficult to find an appropriate control. For example what country is matchable to USA that was not impacted by COVID? An event study allows us to predict the counterfactual in this case and then compare with actual. The residual is our effect size.
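The approach described in this comment (predict the counterfactual from the unit's own pre-event history, then compare with what actually happened) might look like the sketch below. It assumes a simple linear pre-event trend, and the series is invented for illustration; a real analysis would need a forecasting model that handles seasonality and uncertainty.

```python
# Hypothetical quarterly index; the event happens at index 6. Not real data.
series = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 97.0, 101.0]
EVENT = 6


def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


# Fit the trend on pre-event data only, then extrapolate past the event.
intercept, slope = fit_line(series[:EVENT])
predicted = [intercept + slope * t for t in range(EVENT, len(series))]

# Actual minus predicted counterfactual is the estimated effect per period.
effects = [series[t] - p for t, p in zip(range(EVENT, len(series)), predicted)]
print(predicted, effects)
```

Because no control group is involved, the estimate is only as credible as the assumption that the pre-event trend would have continued unchanged.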
@rikki146 1 year ago
yeah i think it is unanswered in the video. found your comment when i was looking for answer in comment section
@McDreamyn_mdphd 7 months ago
I would kindly argue that DiD and synthetic controls suffer from the same pitfalls as standard statistical controls. When these two methods are employed within observational designs, confounding can be introduced if the two groups of interest are not balanced on key covariates. We employ counterfactual methods (propensity score adjustments) to balance the two groups, which can then be analyzed with an eye toward providing supportive or disconfirming evidence. Synthetic controls can also suffer from confounding that is likely unobserved. Because the confounding is unobserved, you cannot use propensity methods and must instead use something more like instrumental variable methods.