Machine Learning for Causal Inference Workshop

Taught by Dr. William Duncan

August 2-4, 2024

This 3-day workshop introduces machine learning, causal inference, and their combined use. Across disciplines in social science and business, there is increasing interest in performing causal analysis of policies and treatments. Several advances in machine learning enable its use to evaluate cause-and-effect relationships. Both machine learning and causal inference techniques can require strong intuitive and statistical insights, making their use sometimes prone to error.

When used appropriately, however, the combination of these tools has tremendous potential to yield data-driven causal effect estimates, sometimes reducing the modeling assumptions needed for interpretation. This can enhance the use of non-experimental data for cause-and-effect studies, and this workshop will provide guidelines for how to implement some of these methods.

Through practical data and coding examples, you will learn to use cutting-edge “double-robust” machine learning methods (targeted minimum loss-based estimation, augmented inverse probability weighting) to estimate different treatment effects in real and simulated data. The course will focus on building intuition, with numerous coding examples that give you practical experience.

Schedule & Topics

  August 2   9:00 - 12:00   Machine learning for effect estimation: the curse of dimensionality. Regularization bias with single robust estimators.
             12:00 - 1:00   Lunch break
             1:00 - 4:00    Double robust methods. Augmented inverse probability weighting. Targeted minimum loss-based estimation.

  August 3   9:00 - 12:00   Introduction to stacking: the Super Learner. Illustration and manual coding examples. Super learning for a dose-response function.
             12:00 - 1:00   Lunch break
             1:00 - 4:00    Super Learner and sl3 packages. Tuning parameter grids. Screening algorithms.

  August 4   9:00 - 12:00   Practical guidance on estimating effects in example datasets. Combining double robust methods with sl3 packages.


Resources & Background Knowledge

This workshop will use Stata and R for all examples and exercises. Code for both programs will be provided and participants can choose to follow either version. Some familiarity with one of these programs is desirable, but even novice coders should be able to follow the presentation and do the exercises as the example code is provided in advance.

If you would like to take this course but are concerned that you don't know enough Stata or R, there are excellent online resources available for learning the basics. Here are two recommended options:

A Practical Introduction to Stata. Mark McGovern. 2012

R and RStudio Basics. Tufts Data Lab. 2018

Participants should have a sound working knowledge of applied statistical analysis and interpretation, and the use and interpretation of linear and generalized linear regression modeling. Prior experience with machine learning and the counterfactual approach to causal inference will be helpful, but is not required.


Questions about workshop content can be directed to

Questions about registration issues can be directed to