Difference-in-Differences 101


What is Difference-in-Differences (DiD, DD, or diff-in-diff)? Why do we care about it? Today I will answer these questions about one of the most popular methods in econometrics for studying policy effects.

Image created by the author using DALL.E. The image attempts to show the introduction of technology such as tablets in the classroom and its effect on grades.

DiD is a widely used econometric technique that estimates causal relationships by comparing the changes in outcomes over time between a treatment group and a control group. So what are treatment and control? The treatment group is the group affected by a policy intervention or change; the control group is a group that receives no intervention. A causal relationship is a cause-and-effect relation: we want to know how much of the change in the outcome was caused by the intervention.

We care about this method because it is useful for evaluating the effects of policy changes or interventions when randomized experiments are not feasible. Interventions are often targeted at a particular group, so the people who receive the treatment are not chosen at random. DiD helps isolate the impact of the intervention even without randomization.

This article will delve into the concepts, assumptions, implementation, and examples.

What is DiD

Our research question is: what is the effect of treatment D on outcome y? DiD allows us to estimate what would have happened to the treatment group if the intervention had not occurred. This counterfactual scenario is essential for understanding the true effect of the treatment. Many fields revolve around answering questions like this about interventions, policy changes, or treatments. In economics, DiD can assess the impact of tax cuts on economic growth; in public policy, it can evaluate the effect of new traffic laws on accident rates; in marketing, it can analyze the influence of advertising campaigns on sales.

Diagram created by the author

For example, in the diagram above, we start with a sample from the population. We divide it into a treatment group, which receives the intervention, and a control group, which does not. For both groups, we observe the outcome in the pre- and post-intervention periods.

How to do DiD

Simple Treatment/Control Difference Estimator

The simple estimator calculates the treatment effect by comparing the changes in the outcome over time between the treatment and control groups:

DiD = (ȳ_treatment, post − ȳ_treatment, pre) − (ȳ_control, post − ȳ_control, pre)

I have created a fake example to help understand the math.

The DiD coefficient would be 9 using the formula mentioned above.
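Since the arithmetic is easier to follow with concrete numbers, here is a minimal sketch of the same calculation in Python. The group means below are made up purely for illustration and are chosen so that the DiD comes out to 9, as in the example above.

```python
# Made-up group means, chosen so the DiD equals 9 (illustration only).
treat_pre, treat_post = 50.0, 65.0        # treatment group changes by +15
control_pre, control_post = 48.0, 54.0    # control group changes by +6

# Difference-in-differences: change in treatment minus change in control.
did = (treat_post - treat_pre) - (control_post - control_pre)
print(did)  # 9.0
```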

DiD Estimator: Calculation using a regression

DiD helps control for time-invariant characteristics that might otherwise bias the estimate of the treatment effect. This means that it removes the influence of variables that are constant over time (e.g., geographical location, gender, ethnicity, innate ability). It can do so because these characteristics affect both the pre-treatment and post-treatment periods equally for each group.

The core equation for a basic DiD model is:

y_ijt = β₀ + β₁ After_t + β₂ Treatment_j + β (After_t × Treatment_j) + ε_ijt

where:

  • y_ijt is the outcome variable for individual i in group j at time t.
  • After_t is a dummy variable equal to 1 if the observation is in the post-treatment period.
  • Treatment_j is a dummy variable equal to 1 if the observation belongs to the treatment group.
  • After_t × Treatment_j is the interaction term, with the coefficient β capturing the DiD estimate.

The coefficient on the interaction term is the DiD estimate of the treatment effect on y. The regression approach is more popular among researchers because it provides standard errors and makes it easy to control for additional covariates.
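As a minimal sketch of the regression approach, assume a long-format dataset with hypothetical columns outcome, treated (1 for the treatment group), and post (1 for the post-intervention period); the file name is also an assumption:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel: one row per unit and period, with columns
# 'outcome', 'treated' (1 = treatment group) and 'post' (1 = post-intervention).
df = pd.read_csv("did_data.csv")

# The coefficient on treated:post is the DiD estimate (beta in the equation above).
did_model = smf.ols("outcome ~ treated + post + treated:post", data=df).fit()
print(did_model.summary())
```

In practice, researchers often cluster the standard errors at the group level and add covariates to the right-hand side of the formula.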

Parallel Trend Assumption

This is one of the key assumptions in DiD. It says that, in the absence of treatment, the difference between the treatment and control groups would have remained constant over time. In other words, without the treatment, β (the DiD estimate) would be 0.

Formally, letting Y(0) denote the outcome that would occur without treatment, this means:

E[Y_post(0) − Y_pre(0) | Treatment = 1] = E[Y_post(0) − Y_pre(0) | Treatment = 0]

Another way to think about this is that the difference between the two groups would have remained the same over time without the policy change. If the trends are not parallel before the treatment, the DiD estimates may be biased.

How to check this assumption

Now the next question is: how to check for it? The validity of the parallel trend assumption can be assessed through graphical analysis and placebo tests.

Created by the author

The assumption is that, in the absence of treatment, the treatment group (orange line) and the control group (blue dashed line) would follow parallel paths over time. The intervention (vertical line) marks the point at which the treatment is applied, allowing the comparison of the differences in trends between the two groups before and after the intervention to estimate the treatment effect.
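A quick way to eyeball this is to plot the average outcome per group over time and look for parallel movement before the intervention. The sketch below assumes the same hypothetical columns as before, plus a year column, and an illustrative intervention year:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical columns: 'year', 'treated' (0/1), 'outcome'.
df = pd.read_csv("did_data.csv")
intervention_year = 2019  # assumed intervention date, for illustration

# Average outcome by year for each group; the two lines should move in
# parallel before the intervention.
trends = df.groupby(["year", "treated"])["outcome"].mean().unstack("treated")
trends.plot(marker="o")
plt.axvline(intervention_year, color="grey", linestyle="--", label="intervention")
plt.ylabel("Average outcome")
plt.legend(["control", "treatment", "intervention"])
plt.show()
```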

Examples that violate the Parallel Trends Assumption

In simple words, we look for two warning signs in the treatment group's trend:

  1. Change in the slope
Graph: Part (a)
Graph: Part (b)

In both of the above cases, the parallel trends assumption is not satisfied: the treatment group's outcome is growing either faster (part a) or slower (part b) than the control group's outcome. The mathematical way of saying this is:

DiD estimate = true effect + differential trend (the differential trend should be 0)

The differential trend can be positive (part a) or negative (part b).

With a non-zero differential trend, DiD cannot isolate the impact of the intervention (the true effect), because the estimate absorbs the differential trend as well.

2. Jump in the treatment line (either up or down) after the intervention

In the above image, the treatment group's trend shifts in a way the control group's trend cannot account for, even though the two should have moved in parallel in the absence of the intervention. A jump like this, one not driven by the intervention itself, biases the DiD estimate.

Placebo Tests

Placebo tests are used to verify whether observed treatment effects are truly due to the treatment and not due to other confounding factors. They involve applying the same analysis to a period or group where no treatment effect is expected. If a significant effect is found in these placebo tests, it suggests that the original results may be spurious.

For example, suppose an intervention giving tablets to high schools took place in 2019. We can run a placebo test by pretending the intervention happened in, say, 2017, a year in which we know no policy change occurred. If applying the same treatment-effect analysis to the placebo date (2017) shows no significant change, it suggests that the effect observed for 2019 (if any) is likely due to the actual policy intervention.
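Continuing the tablet example, here is a minimal placebo-test sketch using the same hypothetical columns as before: drop the real post-treatment years and re-run the regression around the fake 2017 intervention.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("did_data.csv")

# Keep only pre-2019 data and pretend the intervention happened in 2017.
placebo_df = df[df["year"] < 2019].copy()
placebo_df["fake_post"] = (placebo_df["year"] >= 2017).astype(int)

placebo = smf.ols("outcome ~ treated + fake_post + treated:fake_post",
                  data=placebo_df).fit()
# A significant treated:fake_post coefficient would cast doubt on the design.
print(placebo.summary().tables[1])
```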

Extensions and Variations of DiD

  1. Event Study DiD: Estimates year-specific treatment effects, which is useful for assessing the timing of treatment effects and checking for pre-trends. The model allows the treatment effect to vary by year, so we can study the effect at time t+1, t+2, …, t+n (a minimal regression sketch follows this list).
  2. Synthetic Control Method (SCM): SCM constructs a synthetic control group by weighting multiple untreated units to create a composite that approximates the characteristics of the treated unit before the intervention. This method is particularly useful when a single treated unit is compared to a pool of untreated units. It provides a more credible counterfactual by combining information from several units.
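For the event-study version in point 1, one common way to set it up is to interact the treatment dummy with year dummies, omitting the year just before the intervention as the reference period. The sketch below keeps the hypothetical columns used earlier and assumes a 2019 intervention, so 2018 serves as the reference year:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("did_data.csv")

# Build treatment-by-year dummies, omitting 2018 (the year before the
# assumed 2019 intervention) as the reference period.
years = sorted(df["year"].unique())
for y in years:
    if y == 2018:
        continue
    df[f"treat_x_{y}"] = ((df["year"] == y) & (df["treated"] == 1)).astype(int)

# Year-specific treatment effects relative to 2018; pre-2018 coefficients
# close to zero support the parallel trends assumption.
rhs = " + ".join(f"treat_x_{y}" for y in years if y != 2018)
event_model = smf.ols(f"outcome ~ treated + C(year) + {rhs}", data=df).fit()
print(event_model.summary())
```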

There are many more variations, but I will limit this post to these two. I might write a later post explaining the rest in detail.

Conclusion

In this post, I have analyzed the Difference-in-Differences (DiD) estimator, a popular method for estimating average treatment effects. DiD is widely used to study policy effects by comparing changes over time between treatment and control groups. The key advantage of DiD is its ability to control for unobserved confounders that remain constant over time, thereby isolating the true impact of an intervention.

We also explored key concepts like the parallel trends assumption, the importance of pre-treatment data, and how to check for assumption violations using graphical analysis and placebo tests. Additionally, I discussed extensions and variations of DiD, such as the Event Study DiD and the Synthetic Control Method, which offer further insights and robustness in different scenarios.

References and Further Reads

[1] Wing, C., Simon, K., & Bello-Gomez, R. A. (2018). Designing difference in difference studies: Best practices for public health policy research. Annual Review of Public Health, 39, 453–469.

[2] Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.

[3] Donald, S. G., & Lang, K. (2007). Inference with difference-in-differences and other panel data. The Review of Economics and Statistics, 89(2), 221–233.


Thank you for reading! 🤗 If you enjoyed this post and want to see more, consider following me. You can also follow me on LinkedIn. I plan to write blogs about causal inference and data analysis, always aiming to keep things simple.

A small disclaimer: I write to learn, so mistakes might happen despite my best efforts. If you spot any errors, please let me know. I also welcome suggestions for new topics!