Stacking#

Given a list of models trained on the same task, we train a ‘meta-model’ to combine the predictions of all models. The meta-model is usually an ANN.

Training data should be split:

  • either one subset for training the weak models and one for training the meta-model

  • or an individual subset for each model.

Stacking is typically used with heterogeneous weak models. For instance, we could combine results from an ANN, from a decision tree, and from \(k\)-NN.

Stacking is not widely used, but may become more important in the future, because stacking allows combining small specialized models trained on very different tasks (e.g., speech recognition and object detection in images) into one large model (e.g., detection of humans in combined video/audio signals). See, for instance, Google Pathways.

Scikit-Learn supports stacking with StackingRegressor (and StackingClassifier for classification tasks) from the ensemble module.
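A minimal sketch of how this could look (the dataset, base models, and hyperparameters below are illustrative choices, not prescribed by the text): a decision tree and a \(k\)-NN regressor serve as heterogeneous weak models, and a Ridge regression acts as the meta-model. Note that StackingRegressor handles the data splitting for us via internal cross-validation, training the meta-model on out-of-fold predictions of the base models.

```python
# Stacking with heterogeneous weak models (illustrative sketch).
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, split into train and test sets.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
    ],
    final_estimator=Ridge(),  # the meta-model combining base predictions
    cv=5,  # folds used to generate the meta-model's training data
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))  # R^2 score on held-out data
```

Here a simple Ridge regression replaces the ANN meta-model mentioned above; an MLPRegressor could be passed as final_estimator instead.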