Multiple Outputs

Added in version 1.6.

Starting from version 1.6, XGBoost has experimental support for multi-output regression and multi-label classification with the Python package. Multi-label classification usually refers to targets that have multiple non-exclusive class labels. For instance, a movie can be simultaneously classified as both sci-fi and comedy. For a detailed explanation of the terminology around different multi-output models, please refer to the scikit-learn user guide.
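
As a minimal illustration of such a target (the genre columns here are made up), each sample gets one 0/1 indicator per label, and a row may contain more than one positive entry:

import numpy as np

# Columns: sci-fi, comedy, horror. The first movie is both sci-fi and
# comedy; labels are non-exclusive.
y = np.array([
    [1, 1, 0],
    [0, 0, 1],
])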

Note

As of XGBoost 3.0, the feature is experimental and limited in scope. Only the Python package is tested. In addition, the gblinear booster is not supported.

Training with One-Model-Per-Target

By default, XGBoost builds one model for each target, similar to the sklearn meta estimators, with the added benefit of reusing data and other integrated features like SHAP. For a worked example of regression, see A demo for multi-output regression; a minimal sketch is also shown after the classification example below. For multi-label classification, the binary relevance strategy is used. The input y should be of shape (n_samples, n_classes), with each column having a value of 0 or 1 to specify whether the sample is labeled as positive for the respective class. For example, given a sample with 3 output classes and 2 labels, the corresponding y should be encoded as [1, 0, 1], with the second class labeled as negative and the rest labeled as positive. At the moment, XGBoost supports only dense matrices for labels.

from sklearn.datasets import make_multilabel_classification
import numpy as np
import xgboost as xgb

# y has shape (n_samples, n_classes) with 0/1 indicators per class.
X, y = make_multilabel_classification(
    n_samples=32, n_classes=5, n_labels=3, random_state=0
)
clf = xgb.XGBClassifier(tree_method="hist")
clf.fit(X, y)
# On this tiny dataset the model fits the training labels exactly.
np.testing.assert_allclose(clf.predict(X), y)
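
The regression case works the same way. Here is a minimal sketch, with the dataset parameters chosen arbitrarily for illustration:

from sklearn.datasets import make_regression
import xgboost as xgb

# y has shape (n_samples, n_targets) with continuous values.
X, y = make_regression(n_samples=128, n_features=16, n_targets=3, random_state=0)
reg = xgb.XGBRegressor(tree_method="hist")
reg.fit(X, y)
# One prediction column per target.
assert reg.predict(X).shape == (128, 3)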

The feature is still under development, with limited support for objectives and metrics.

Training with Vector Leaf

Added in version 2.0.

Note

This is still a work in progress, and most features are missing.

XGBoost can optionally build multi-output trees, with the size of each leaf equal to the number of targets, when the hist tree method is used. The behavior can be controlled by the multi_strategy training parameter, which can take the value one_output_per_tree (the default) for building one model per target, or multi_output_tree for building multi-output trees.

clf = xgb.XGBClassifier(tree_method="hist", multi_strategy="multi_output_tree")

See A demo for multi-output regression for a worked example with regression.
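
Putting it together, here is a minimal end-to-end sketch (dataset parameters are again illustrative); the only change from the one-model-per-target setup above is the multi_strategy argument:

from sklearn.datasets import make_regression
import xgboost as xgb

X, y = make_regression(n_samples=128, n_features=16, n_targets=3, random_state=0)
# Each tree now has vector leaves and predicts all targets at once.
reg = xgb.XGBRegressor(tree_method="hist", multi_strategy="multi_output_tree")
reg.fit(X, y)
assert reg.predict(X).shape == (128, 3)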