copick / train-model-xgboost-copick / 0.0.4

Train XGBoost on Copick Data with Cross-Validation

A solution that processes Copick runs, filters out runs that contain only one label, and trains an XGBoost model with 10-fold cross-validation.
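The two data-handling steps named in the description can be sketched with the standard library alone. This is a minimal illustration, not the solution's actual code: it assumes that "filters" means single-label runs are dropped, and that the 10 folds are stratified by class. The function names `filter_single_label_runs` and `stratified_folds` and the toy run names are hypothetical.

```python
import random
from collections import defaultdict


def filter_single_label_runs(labels_by_run):
    """Keep only runs whose labels contain at least two distinct classes.

    Assumption: single-label runs are dropped, since they carry no
    class contrast for a classifier to learn from.
    """
    return {
        run: labels
        for run, labels in labels_by_run.items()
        if len(set(labels)) > 1
    }


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign sample indices to folds, spreading each class across folds.

    Assumption: stratified splitting; the solution may split differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    offset = 0  # continue round-robin where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % n_splits].append(idx)
        offset += len(indices)
    return folds


# Toy data: run names and labels are made up for illustration.
runs = {"run_a": [0, 1, 1, 2], "run_b": [1, 1, 1], "run_c": [0, 2]}
kept = filter_single_label_runs(runs)  # run_b has one label and is dropped

# Each fold in turn serves as the validation set; the rest is training data.
labels = [0] * 25 + [1] * 15
folds = stratified_folds(labels, n_splits=10)
```

In a real pipeline the fold indices would select rows of the feature matrix for each train/validation round; a library splitter such as scikit-learn's `StratifiedKFold` fills the same role.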
Tags
imaging, cryoet, Python, napari, cellcanvas
Solution written by
Kyle Harrington
License of solution
MIT
Source Code

Arguments

--copick_config_path
Path to the Copick configuration JSON file. (default value: PARAMETER_VALUE)
--painting_segmentation_names
Comma-separated list of names for the painting segmentations. Rightmost segmentation has highest precedence. (default value: PARAMETER_VALUE)
--session_id
Session ID for the segmentation. (default value: PARAMETER_VALUE)
--user_id
User ID for segmentation creation. (default value: PARAMETER_VALUE)
--voxel_spacing
Voxel spacing used to scale pick locations. (default value: PARAMETER_VALUE)
--tomo_type
Tomogram type to use for each tomogram, e.g. denoised. (default value: PARAMETER_VALUE)
--feature_types
Comma-separated list of feature types to use for each tomogram, e.g. cellcanvas01,cellcanvas02. (default value: PARAMETER_VALUE)
--run_names
Comma-separated list of run names to process. If not provided, all runs will be processed. (default value: PARAMETER_VALUE)
--eta
Step size shrinkage used in each update to prevent overfitting. (default value: 0.3)
--gamma
Minimum loss reduction required to make a further partition on a leaf node of the tree. (default value: 0.0)
--max_depth
The maximum depth of the trees. (default value: 6)
--min_child_weight
Minimum sum of instance weight needed in a child. (default value: 1.0)
--max_delta_step
Maximum delta step we allow each leaf output to be. (default value: 0.0)
--subsample
Subsample ratio of the training instances. (default value: 1.0)
--colsample_bytree
Subsample ratio of columns when constructing each tree. (default value: 1.0)
--reg_lambda
L2 regularization term on weights. (default value: 1.0)
--reg_alpha
L1 regularization term on weights. (default value: 0.0)
--max_bin
Maximum number of discrete bins to bucket continuous features. (default value: 256)
--class_weights
Class weights for the XGBoost model as a comma-separated list. (default value: )
--output_model_path
Path for the output joblib file containing the trained XGBoost model. (default value: PARAMETER_VALUE)
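As a rough illustration of how the hyperparameter flags above could be collected into an XGBoost parameter dictionary, the sketch below mirrors them with `argparse`, using the defaults documented in this list. How the solution itself wires these values is not shown on this page, so the helpers `build_parser`, `parse_class_weights`, `xgboost_params`, and `sample_weights` are hypothetical. The positional mapping of `--class_weights` entries to class indices 0, 1, 2, ... is likewise an assumption.

```python
import argparse


def build_parser():
    """Hypothetical mirror of the hyperparameter flags and their defaults."""
    parser = argparse.ArgumentParser(description="XGBoost hyperparameter sketch")
    parser.add_argument("--eta", type=float, default=0.3)
    parser.add_argument("--gamma", type=float, default=0.0)
    parser.add_argument("--max_depth", type=int, default=6)
    parser.add_argument("--min_child_weight", type=float, default=1.0)
    parser.add_argument("--max_delta_step", type=float, default=0.0)
    parser.add_argument("--subsample", type=float, default=1.0)
    parser.add_argument("--colsample_bytree", type=float, default=1.0)
    parser.add_argument("--reg_lambda", type=float, default=1.0)
    parser.add_argument("--reg_alpha", type=float, default=0.0)
    parser.add_argument("--max_bin", type=int, default=256)
    parser.add_argument("--class_weights", type=str, default="")
    return parser


def parse_class_weights(text):
    """Turn '1.0,2.5' into {0: 1.0, 1: 2.5}; an empty string means no weighting.

    Assumption: the i-th entry is the weight of class index i.
    """
    if not text.strip():
        return None
    return {i: float(w) for i, w in enumerate(text.split(","))}


def xgboost_params(args):
    """Collect the flags under XGBoost's native parameter names."""
    return {
        "eta": args.eta,
        "gamma": args.gamma,
        "max_depth": args.max_depth,
        "min_child_weight": args.min_child_weight,
        "max_delta_step": args.max_delta_step,
        "subsample": args.subsample,
        "colsample_bytree": args.colsample_bytree,
        "lambda": args.reg_lambda,  # native name for the L2 term
        "alpha": args.reg_alpha,    # native name for the L1 term
        "max_bin": args.max_bin,
    }


def sample_weights(labels, class_weights):
    """Expand per-class weights into one weight per training sample."""
    if class_weights is None:
        return [1.0] * len(labels)
    return [class_weights[label] for label in labels]


args = build_parser().parse_args(["--max_depth", "4", "--class_weights", "1,2.5"])
params = xgboost_params(args)
weights = sample_weights([0, 1, 1], parse_class_weights(args.class_weights))
```

A dictionary like `params` is the shape expected by XGBoost's native training interface, and per-sample weights are one common way to apply class weighting in a multi-class setting.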

Usage instructions

Please follow this link for details on how to install and run this solution.