Parse a LightGBM model JSON dump: lgb.model.dt.tree (lightgbm)

Parse a LightGBM model JSON dump into a data.table with one row per tree node or leaf.

lgb.model.dt.tree(model, num_iteration = NULL, start_iteration = 1L)

Arguments

model

Object of class lgb.Booster.

num_iteration

Number of iterations to include. NULL or a value <= 0 means the best iteration is used.

start_iteration

Index (1-based) of the first boosting round to include in the output. For example, passing start_iteration=5, num_iteration=3 for a regression model means "return information about the fifth, sixth, and seventh trees".

New in version 4.4.0

Value

A data.table with detailed information about model trees' nodes and leaves.

The columns of the data.table are:

tree_index: ID of a tree in a model (integer)
split_index: ID of a node in a tree (integer)
split_feature: for a node, the name of the feature it splits on (character); for a leaf, "NA"
node_parent: ID of the parent node for current node (integer)
leaf_index: ID of a leaf in a tree (integer)
leaf_parent: ID of the parent node for current leaf (integer)
split_gain: Split gain of a node
threshold: Splitting threshold value of a node
decision_type: Decision type of a node
default_left: How missing (NA) values are routed at a node: TRUE sends them left, FALSE sends them right
internal_value: Node value
internal_count: Number of observations collected by a node
leaf_value: Leaf value
leaf_count: Number of observations collected by a leaf

Examples

# \donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)

params <- list(
  objective = "binary"
  , learning_rate = 0.01
  , num_leaves = 63L
  , max_depth = -1L
  , min_data_in_leaf = 1L
  , min_sum_hessian_in_leaf = 1.0
  , num_threads = 2L
)
model <- lgb.train(params, dtrain, 10L)
#> [LightGBM] [Info] Number of positive: 3140, number of negative: 3373
#> [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000436 seconds.
#> You can set `force_row_wise=true` to remove the overhead.
#> And if memory is not enough, you can set `force_col_wise=true`.
#> [LightGBM] [Info] Total Bins 232
#> [LightGBM] [Info] Number of data points in the train set: 6513, number of used features: 116
#> [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.482113 -> initscore=-0.071580
#> [LightGBM] [Info] Start training from score -0.071580
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf
#> [LightGBM] [Warning] No further splits with positive gain, best gain: -inf

tree_dt <- lgb.model.dt.tree(model)
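
# A minimal sketch, not part of the original example: restrict the output to
# a range of boosting rounds. With the 10 rounds trained above, this asks for
# rounds 5, 6, and 7. The start_iteration argument needs lightgbm >= 4.4.0.
# The name tree_dt_subset is only illustrative.
tree_dt_subset <- lgb.model.dt.tree(
  model
  , num_iteration = 3L
  , start_iteration = 5L
)

# A binary objective fits one tree per round, so this should report three
# distinct trees.
length(unique(tree_dt_subset$tree_index))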
# }
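
# A possible follow-up, not part of the original example: separate the rows of
# tree_dt into internal nodes and leaves. This assumes that split_index is NA
# on leaf rows and leaf_index is NA on node rows, which is how the column
# descriptions above read. The names splits, leaves, leaves_per_tree, and
# gain_by_feature are only illustrative.
library(data.table)

splits <- tree_dt[!is.na(split_index)]
leaves <- tree_dt[!is.na(leaf_index)]

# Leaves per tree; with num_leaves = 63L no tree should exceed 63
leaves_per_tree <- leaves[, .N, by = tree_index]
print(leaves_per_tree)

# Total split gain per feature, largest first, as a rough view of which
# features the trees rely on
gain_by_feature <- splits[
  , .(total_gain = sum(split_gain))
  , by = split_feature
][order(-total_gain)]
print(head(gain_by_feature))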