Machine learning has been widely adopted to predict outcomes based on existing data. This blog post links to resources on the most common machine learning algorithms.

Table of Contents

Random Forest

Splitting criterion minimizes the Gini impurity (in the case of classification) or the sum of squared errors, SSE (in the case of regression). (Reference)

Splitting criterion optimizes for finding splits associated with outcome heterogeneity across groups (homogeneous within each child node, heterogeneous between them).
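As a minimal sketch of the two criteria mentioned above (pure NumPy, not tied to any particular library's implementation): Gini impurity is zero for a pure node and maximal for an even class mix, and SSE is the squared deviation of outcomes around the node mean.

```python
import numpy as np

def gini_impurity(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def sse(values):
    # Regression criterion: sum of squared errors around the node mean.
    values = np.asarray(values, dtype=float)
    return float(np.sum((values - values.mean()) ** 2))

print(gini_impurity([0, 0, 0, 0]))  # 0.0 (pure node)
print(gini_impurity([0, 0, 1, 1]))  # 0.5 (50/50 mix, maximal for two classes)
print(sse([1.0, 1.0, 1.0]))         # 0.0 (no spread around the mean)
```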

Random Forest in Python (by Will Koehrsen)

Grid Search Cross Validation (by Will Koehrsen)
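A compact sketch of grid search cross-validation for a random forest, in the spirit of the article above (scikit-learn; the grid values and toy dataset here are illustrative choices, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy classification data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Exhaustively evaluate each hyperparameter combination with 3-fold CV
# and keep the one with the best mean validation score.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

`grid.best_estimator_` is then a forest refit on all the data with the winning settings.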

Causal Random Forest

Splitting criterion optimizes for finding splits associated with treatment effect heterogeneity.
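To make the contrast with the standard forest concrete, here is a toy sketch of that idea in NumPy (an assumed data-generating process where the treatment effect jumps at x = 0, and a simple difference-in-means effect estimate; real causal forests use a more careful honest estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: treatment effect is 1 when x > 0 and 0 otherwise (assumption).
n = 400
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)
y = 1.0 * t * (x > 0) + 0.1 * rng.normal(size=n)

def effect(mask):
    # Difference-in-means treatment effect estimate within a node.
    return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

# Score each candidate split by the treatment-effect gap it induces
# between the two children, and pick the largest.
best = max(np.linspace(-0.8, 0.8, 33),
           key=lambda c: abs(effect(x <= c) - effect(x > c)))
print(round(best, 2))  # lands near 0, where the effect changes
```

A standard regression forest scoring outcome heterogeneity could find the same split here, but only because the effect shows up in the outcome means; the causal criterion targets the effect difference directly.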

Oblique Regression Tree

Cattaneo, Chandak, and Klusowski (2022): Convergence Rates of Oblique Regression Trees for Flexible Function Libraries

  • Splits are based on linear combinations of the covariates
  • An oracle inequality allows them to be compared with projection pursuit regression and neural networks
  • Under suitable conditions, oblique decision trees achieve predictive accuracy similar to neural networks for the same library of regression models, so we do not always need to trade off interpretability for accuracy
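A toy illustration of the first bullet (NumPy; the data-generating boundary x1 + x2 = 0 is an assumption chosen so that no single axis-aligned split can match it): splitting on a linear combination of the covariates drives the SSE criterion to zero in one step.

```python
import numpy as np

rng = np.random.default_rng(1)

# True boundary is the oblique line x1 + x2 = 0 (illustrative assumption).
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def split_sse(scores, threshold):
    # Total SSE of the two children produced by thresholding the scores.
    left, right = y[scores <= threshold], y[scores > threshold]
    return sum(((s - s.mean()) ** 2).sum() for s in (left, right) if s.size)

# Best axis-aligned split on x1 alone vs. best split on the oblique
# score x1 + x2, each searched over the same grid of thresholds.
axis_sse = min(split_sse(X[:, 0], c) for c in np.linspace(-1, 1, 41))
oblique_sse = min(split_sse(X[:, 0] + X[:, 1], c) for c in np.linspace(-1, 1, 41))
print(axis_sse > oblique_sse)  # the oblique split separates the classes exactly
```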