cellcanvas / train-model-xgboost / 0.0.9

Train XGBoost on Zarr Data with Cross-Validation

A solution that trains an XGBoost model on features and labels read from a Zarr zip store, skips runs that contain only a single label, and evaluates the model with 10-fold cross-validation.
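A minimal sketch of that flow, assuming one Zarr group per run holding "features" and "labels" arrays (the store layout, array names, file paths, and integer-coded labels are assumptions, not taken from the solution's source):

```python
# Hedged sketch of the training flow described above (zarr v2 API assumed).
import numpy as np
import zarr
import joblib
import xgboost as xgb
from sklearn.model_selection import cross_val_score

def load_runs(input_zarr_path):
    """Collect features and labels per run, skipping runs with a single label."""
    store = zarr.ZipStore(input_zarr_path, mode="r")
    root = zarr.open(store, mode="r")
    X_parts, y_parts = [], []
    for _, run in root.groups():
        features = np.asarray(run["features"])        # assumed array name
        labels = np.asarray(run["labels"]).ravel()    # assumed array name, integer-coded
        if np.unique(labels).size < 2:
            continue  # filter out runs that contain only one label
        X_parts.append(features.reshape(labels.size, -1))
        y_parts.append(labels)
    return np.concatenate(X_parts), np.concatenate(y_parts)

X, y = load_runs("features_and_labels.zarr.zip")           # hypothetical input path
model = xgb.XGBClassifier(learning_rate=0.3, max_depth=6)  # defaults from the argument list
scores = cross_val_score(model, X, y, cv=10)               # 10-fold cross-validation
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
model.fit(X, y)                                            # final fit on all retained runs
joblib.dump(model, "trained_model.joblib")                 # hypothetical output path
```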
Tags
imaging, cryoet, Python, napari
Solution written by
Kyle Harrington
License of solution
MIT
Source Code

Arguments

--input_zarr_path
Path to the input Zarr zip store containing the features and labels. (no default)
--output_model_path
Path for the output joblib file containing the trained XGBoost model. (no default)
--eta
Step size shrinkage used in updates to prevent overfitting. (default value: 0.3)
--gamma
Minimum loss reduction required to make a further partition on a leaf node of the tree. (default value: 0.0)
--max_depth
The maximum depth of the trees. (default value: 6)
--min_child_weight
Minimum sum of instance weight needed in a child. (default value: 1.0)
--max_delta_step
Maximum delta step allowed for each leaf output. (default value: 0.0)
--subsample
Subsample ratio of the training instances. (default value: 1.0)
--colsample_bytree
Subsample ratio of columns when constructing each tree. (default value: 1.0)
--reg_lambda
L2 regularization term on weights. (default value: 1.0)
--reg_alpha
L1 regularization term on weights. (default value: 0.0)
--max_bin
Maximum number of discrete bins to bucket continuous features. (default value: 256)
--class_weights
Class weights for the XGBoost model, supplied as a comma-separated list. (default value: empty; a parsing sketch follows this list)
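The hyperparameters above map onto standard XGBoost keyword arguments, and the class weights arrive as a comma-separated string. A minimal sketch of how that string might be parsed into per-sample weights and combined with the other arguments (the parsing and weighting scheme are assumptions, not taken from the solution's source):

```python
# Hedged sketch: assemble the CLI arguments above into an XGBoost classifier.
import numpy as np
import xgboost as xgb

def parse_class_weights(class_weights, y):
    """Map a comma-separated list (assumed one weight per class) to sample weights."""
    if not class_weights:
        return None  # empty default: no weighting
    weights = np.array([float(w) for w in class_weights.split(",")])
    return weights[y]  # assumes labels are integer-coded 0..n_classes-1

model = xgb.XGBClassifier(
    learning_rate=0.3,      # --eta (alias of learning_rate)
    gamma=0.0,              # --gamma
    max_depth=6,            # --max_depth
    min_child_weight=1.0,   # --min_child_weight
    max_delta_step=0.0,     # --max_delta_step
    subsample=1.0,          # --subsample
    colsample_bytree=1.0,   # --colsample_bytree
    reg_lambda=1.0,         # --reg_lambda
    reg_alpha=0.0,          # --reg_alpha
    max_bin=256,            # --max_bin
)

# Toy usage: three classes, with the second class up-weighted.
y = np.array([0, 1, 1, 2, 0, 2])
X = np.random.rand(6, 8)
model.fit(X, y, sample_weight=parse_class_weights("1.0,2.0,1.0", y))
```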

Usage instructions

Please follow this link for details on how to install and run this solution.