Machine learning has been widely adopted to predict outcomes based on existing data. This blog post links to resources on the most common machine learning algorithms.

Table of Contents

Random Forest

Splitting criterion minimizes the Gini impurity (in the case of classification) or the sum of squared errors, SSE (in the case of regression). (Reference)

Splitting criterion optimizes for finding splits associated with outcome heterogeneity across groups (homogeneous within each child node, heterogeneous between them).
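As a minimal sketch of the two criteria mentioned above (pure NumPy, not tied to any particular library's implementation): Gini impurity is zero for a pure node and maximal for an even class mix, and SSE is the squared deviation of outcomes around the node mean.

```python
import numpy as np

def gini_impurity(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def sse(values):
    # Regression criterion: sum of squared errors around the node mean.
    values = np.asarray(values, dtype=float)
    return float(np.sum((values - values.mean()) ** 2))

print(gini_impurity([0, 0, 0, 0]))  # 0.0 (pure node)
print(gini_impurity([0, 0, 1, 1]))  # 0.5 (50/50 mix, maximal for two classes)
print(sse([1.0, 1.0, 1.0]))         # 0.0 (no spread around the mean)
```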

Random Forest in Python (by Will Koehrsen)

Grid Search Cross Validation (by Will Koehrsen)
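A compact sketch of grid search cross-validation for a random forest, in the spirit of the article above (scikit-learn; the grid values and toy dataset here are illustrative choices, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy classification data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Exhaustively evaluate each hyperparameter combination with 3-fold CV
# and keep the one with the best mean validation score.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

`grid.best_estimator_` is then a forest refit on all the data with the winning settings.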

Causal Random Forest

Splitting criterion optimizes for finding splits associated with treatment effect heterogeneity.
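To make the contrast with the standard forest concrete, here is a toy sketch of that idea in NumPy (an assumed data-generating process where the treatment effect jumps at x = 0, and a simple difference-in-means effect estimate; real causal forests use a more careful honest estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: treatment effect is 1 when x > 0 and 0 otherwise (assumption).
n = 400
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)
y = 1.0 * t * (x > 0) + 0.1 * rng.normal(size=n)

def effect(mask):
    # Difference-in-means treatment effect estimate within a node.
    return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

# Score each candidate split by the treatment-effect gap it induces
# between the two children, and pick the largest.
best = max(np.linspace(-0.8, 0.8, 33),
           key=lambda c: abs(effect(x <= c) - effect(x > c)))
print(round(best, 2))  # lands near 0, where the effect changes
```

A standard regression forest scoring outcome heterogeneity could find the same split here, but only because the effect shows up in the outcome means; the causal criterion targets the effect difference directly.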

Oblique Regression Tree

Cattaneo, Chandak, and Klusowski (2022): Convergence Rates of Oblique Regression Trees for Flexible Function Libraries

  • Splits are based on linear combinations of the covariates
  • An oracle inequality allows them to be compared with projection pursuit regression and neural networks
  • Under suitable conditions, oblique decision trees achieve predictive accuracy similar to neural networks for the same library of regression models, so we do not always need to trade off interpretability for accuracy
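A toy illustration of the first bullet (NumPy; the data-generating boundary x1 + x2 = 0 is an assumption chosen so that no single axis-aligned split can match it): splitting on a linear combination of the covariates drives the SSE criterion to zero in one step.

```python
import numpy as np

rng = np.random.default_rng(1)

# True boundary is the oblique line x1 + x2 = 0 (illustrative assumption).
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def split_sse(scores, threshold):
    # Total SSE of the two children produced by thresholding the scores.
    left, right = y[scores <= threshold], y[scores > threshold]
    return sum(((s - s.mean()) ** 2).sum() for s in (left, right) if s.size)

# Best axis-aligned split on x1 alone vs. best split on the oblique
# score x1 + x2, each searched over the same grid of thresholds.
axis_sse = min(split_sse(X[:, 0], c) for c in np.linspace(-1, 1, 41))
oblique_sse = min(split_sse(X[:, 0] + X[:, 1], c) for c in np.linspace(-1, 1, 41))
print(axis_sse > oblique_sse)  # the oblique split separates the classes exactly
```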