cellcanvas / train-model-xgboost / 0.0.10

Train XGBoost on Zarr Data with Cross-Validation

A solution that trains an XGBoost model on features and labels read from a Zarr zip store, filters out runs that contain only a single label, and performs 10-fold cross-validation (see the sketch below).
Tags
imaging, cryoet, Python, napari
Solution written by
Kyle Harrington
License of solution
MIT
Source Code
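
For orientation, here is a minimal, hypothetical sketch of the workflow described above. It assumes the zip store holds one Zarr group per run, each with "features" and "labels" arrays; the file names, store layout, and hyperparameter choices are illustrative assumptions, not the solution's actual code.

```python
import numpy as np
import zarr
import joblib
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import LabelEncoder

# Open the Zarr zip store read-only (per-run group layout is an assumption).
store = zarr.ZipStore("features_and_labels.zarr.zip", mode="r")
root = zarr.open(store, mode="r")

features, labels = [], []
for run_name, run in root.groups():
    y = run["labels"][:]
    if np.unique(y).size < 2:
        # Skip runs that contain only a single label; they provide no class contrast.
        continue
    features.append(run["features"][:])
    labels.append(y)

X = np.concatenate(features)
y = LabelEncoder().fit_transform(np.concatenate(labels))  # XGBoost expects labels 0..n-1

# Hyperparameters shown here are just the defaults from the argument list below.
model = xgb.XGBClassifier(learning_rate=0.3, max_depth=6)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("10-fold CV accuracy: %.3f" % scores.mean())

# Fit on all data and save the model as a joblib file (output name assumed).
model.fit(X, y)
joblib.dump(model, "trained_xgboost_model.joblib")
```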

Arguments

--input_zarr_path
Path to the input Zarr zip store containing the features and labels. (no default value)
--output_model_path
Path for the output joblib file containing the trained XGBoost model. (no default value)
--eta
Step size shrinkage used in the update step to prevent overfitting. (default value: 0.3)
--gamma
Minimum loss reduction required to make a further partition on a leaf node of the tree. (default value: 0.0)
--max_depth
The maximum depth of the trees. (default value: 6)
--min_child_weight
Minimum sum of instance weight needed in a child. (default value: 1.0)
--max_delta_step
Maximum delta step allowed for each leaf output. (default value: 0.0)
--subsample
Subsample ratio of the training instances. (default value: 1.0)
--colsample_bytree
Subsample ratio of columns when constructing each tree. (default value: 1.0)
--reg_lambda
L2 regularization term on weights. (default value: 1.0)
--reg_alpha
L1 regularization term on weights. (default value: 0.0)
--max_bin
Maximum number of discrete bins to bucket continuous features. (default value: 256)
--class_weights
Class weights for the XGBoost model, given as a comma-separated list (for example, 1.0,2.0,0.5). (default value: empty)
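
For illustration, the arguments above correspond roughly to the following construction of the scikit-learn XGBoost wrapper; note that --eta maps to the wrapper's learning_rate parameter. The parse_class_weights helper and the example weight string are hypothetical, not part of the solution.

```python
import xgboost as xgb

def parse_class_weights(arg):
    """Turn a comma-separated string such as '1.0,2.5,0.5' into a list of per-class weights."""
    return [float(w) for w in arg.split(",")] if arg else None

class_weights = parse_class_weights("1.0,2.5,0.5")  # hypothetical --class_weights value

model = xgb.XGBClassifier(
    learning_rate=0.3,     # --eta
    gamma=0.0,             # --gamma
    max_depth=6,           # --max_depth
    min_child_weight=1.0,  # --min_child_weight
    max_delta_step=0.0,    # --max_delta_step
    subsample=1.0,         # --subsample
    colsample_bytree=1.0,  # --colsample_bytree
    reg_lambda=1.0,        # --reg_lambda
    reg_alpha=0.0,         # --reg_alpha
    max_bin=256,           # --max_bin (applies to histogram-based tree construction)
    tree_method="hist",
)

# Per-class weights are commonly applied as per-sample weights at fit time:
#   sample_weight = [class_weights[label] for label in y]
#   model.fit(X, y, sample_weight=sample_weight)
```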

Usage instructions

Please follow this link for details on how to install and run this solution.
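
Once training finishes, the joblib file written to --output_model_path can be reloaded for inference. The file name and the randomly generated feature array below are placeholders.

```python
import joblib
import numpy as np

# Load the trained model written to --output_model_path (file name assumed).
model = joblib.load("trained_xgboost_model.joblib")

# Stand-in feature matrix with the same number of features the model was trained on.
features = np.random.rand(10, model.n_features_in_)
print(model.predict(features))
```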