Simula@BI: SMARTboost: Efficient Boosting of Smooth Regression Trees

Speaker: Paolo Giordani

  • Starts: 13:30, 2 December 2021
  • Ends: 14:30, 2 December 2021
  • Location: B2-040
  • Contact: Siri Johnsen (siri.johnsen@bi.no)

We introduce SMARTboost (boosting of symmetric smooth additive regression trees), a machine learning model capable of fitting arbitrarily complex functions, yet designed for good performance in low signal-to-noise environments.

SMARTboost inherits many of the qualities that have made boosted trees the most widely used machine learning tool for tabular data: it automatically adjusts model complexity, handles continuous and discrete explanatory variables, can capture nonlinear functions in high dimensions without systematically overfitting, performs variable selection, can handle high collinearity and highly non-Gaussian features, and is well suited for parallelization.

The combination of smooth symmetric trees and carefully designed priors gives SMARTboost superior performance (in comparison with a state-of-the-art tool like XGBoost) in most settings with continuous explanatory variables, particularly when the signal-to-noise ratio is low and/or the sample size is small. SMARTboost also outperforms other tree-based methods in a large and interesting set of data-generating processes, including several that are well known in econometrics and machine learning, such as linear models, additive models, smooth threshold models, linear and nonlinear factor models, neural networks, and the Friedman function.
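
To illustrate what a "smooth" tree split can mean, here is a minimal sketch that contrasts a standard hard split with a split blended by a logistic gate. It is an illustration only, not the SMARTboost implementation: the logistic form, the function names, and the `sharpness` parameter are all assumptions made for this example.

```python
import math


def hard_split(x, threshold, left_value, right_value):
    """Standard tree split: the prediction jumps at the threshold."""
    return left_value if x < threshold else right_value


def smooth_split(x, threshold, left_value, right_value, sharpness):
    """Smooth split: a logistic gate blends the two leaf values.

    `sharpness` is a hypothetical parameter for this sketch; larger
    values make the transition steeper and closer to a hard split.
    """
    # Weight placed on the right leaf; equals 0.5 exactly at the threshold.
    gate = 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))
    return (1.0 - gate) * left_value + gate * right_value


if __name__ == "__main__":
    for x in (0.0, 0.4, 0.5, 0.6, 1.0):
        print(x, hard_split(x, 0.5, 0.0, 1.0), smooth_split(x, 0.5, 0.0, 1.0, 10.0))
```

With a small `sharpness` the fitted function changes gradually across the threshold, which is one intuition for why smooth splits may suit smooth data-generating processes; with a large `sharpness` the gate approaches the step function of an ordinary tree.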

SMARTboost is well suited for cross-sectional, time series, and panel data, and, unlike its tree-based competitors, can compute marginal effects.
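
One way to see why smoothness makes marginal effects available: a prediction built from smooth gates is differentiable in the explanatory variable, whereas a hard split has a zero derivative everywhere except at the jump. The sketch below, which assumes a single logistic-gated split (a hypothetical form chosen for illustration, not necessarily how SMARTboost computes marginal effects), compares the analytic derivative with a finite-difference check.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def smooth_split(x, threshold, left_value, right_value, sharpness):
    """Prediction of one logistic-gated split (hypothetical form)."""
    gate = logistic(sharpness * (x - threshold))
    return (1.0 - gate) * left_value + gate * right_value


def marginal_effect(x, threshold, left_value, right_value, sharpness):
    """Analytic derivative of smooth_split with respect to x."""
    gate = logistic(sharpness * (x - threshold))
    # d(gate)/dx = sharpness * gate * (1 - gate) for a logistic gate.
    return sharpness * gate * (1.0 - gate) * (right_value - left_value)


def finite_difference(x, threshold, left_value, right_value, sharpness, h=1e-6):
    """Central-difference approximation, used only as a sanity check."""
    up = smooth_split(x + h, threshold, left_value, right_value, sharpness)
    down = smooth_split(x - h, threshold, left_value, right_value, sharpness)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    args = (0.3, 0.5, 0.0, 2.0, 4.0)
    print(marginal_effect(*args), finite_difference(*args))
```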