geoxgboost package

geoxgboost module

This module implements Geographical-XGBoost for spatially local regression.

The module contains the following functions:

  • create_param_grid - Returns the grid of up to three hyperparameters.

  • nestedCV - Returns the optimized hyperparameters’ values and generalization error of XGBoost.

  • global_xgb - Returns the global XGBoost model.

  • optimize_bw - Returns the optimized bandwidth value.

  • gxgb - Returns geographical XGBoost, local prediction and related statistics.

  • predict_gxgb - Returns predictions for unseen data.

geoxgboost.geoxgboost.create_param_grid(Param1, Param1_Values, Param2=None, Param2_Values=None, Param3=None, Param3_Values=None)

Creates a grid of up to three hyperparameters for tuning.

Examples

>>> Param1 = 'n_estimators'
>>> Param1_Values = [100, 200, 300, 500]
>>> Param2 = 'learning_rate'
>>> Param2_Values = [0.1, 0.05, 0.01]
>>> Param3 = 'max_depth'
>>> Param3_Values = [2, 3, 4, 6]
>>> create_param_grid(Param1, Param1_Values, Param2, Param2_Values, Param3, Param3_Values)
Parameters:
  • Param1 – 1st hyperparameter name e.g., ‘n_estimators’

  • Param1_Values – values for search e.g., [100, 200, 500]

  • Param2 – 2nd hyperparameter name e.g., ‘learning_rate’. Default=None.

  • Param2_Values – values for search e.g., [0.1, 0.05,0.01]. Default=None.

  • Param3 – 3rd hyperparameter name e.g., ‘max_depth’. Default=None.

  • Param3_Values – values for search e.g., [2,3,4,6]. Default=None.

Returns:

param_grid, which can be passed to the nestedCV function to fine-tune hyperparameters.

Param1, Param2, and Param3 can be set to any other tunable XGBoost hyperparameter, such as subsample, colsample_bytree, lambda, or alpha.

For example:

Param1 = ‘subsample’

Param1_Values = [0.5, 0.7, 0.9]

A complete list of hyperparameters can be found here: https://xgboost.readthedocs.io/en/stable/parameter.html

Tip: This function can be called repeatedly with different sets of hyperparameters. See the example in GXGB_call_demo.py in DemoGXGBoost on GitHub.
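The structure of the returned param_grid is not documented on this page. As a rough illustration only, the sketch below assumes a grid is a mapping from hyperparameter name to candidate values. The helpers sketch_param_grid and expand_grid are hypothetical and not part of geoxgboost; they show how up to three name/value pairs combine and how many candidate settings a search would have to visit.

```python
from itertools import product


def sketch_param_grid(*pairs):
    """Hypothetical stand-in, NOT the geoxgboost implementation.

    Accepts up to three (name, values) pairs and skips pairs whose
    name is None, mirroring the optional Param2/Param3 arguments.
    """
    if len(pairs) > 3:
        raise ValueError("at most three hyperparameters are supported")
    return {name: list(values) for name, values in pairs if name is not None}


def expand_grid(grid):
    """Enumerate every combination in the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


grid = sketch_param_grid(
    ("n_estimators", [100, 200, 300, 500]),
    ("learning_rate", [0.1, 0.05, 0.01]),
    ("max_depth", [2, 3, 4, 6]),
)
combinations = expand_grid(grid)
print(len(combinations))  # 4 * 3 * 4 = 48 candidate settings
```

Because the number of combinations grows multiplicatively, tuning in several small rounds, as the tip above suggests, keeps each search cheap.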

geoxgboost.geoxgboost.global_xgb(X, y, params, feat_importance='gain', test_size=0.33, seed=7, path_save=False)

Calculates global XGBoost

Parameters:
  • X – dataframe with the independent variables values

  • y – dataframe with the dependent variable values

  • params – hyperparameter values. Type: dictionary (nestedCV can be used to produce params).

  • feat_importance – type of feature importance: ‘gain’, ‘weight’, ‘cover’, ‘total gain’, ‘total cover’. Default=‘gain’.

  • test_size – proportion of the data held out for testing, given as a fraction. Default=0.33.

  • seed – seed value. Default=7.

  • path_save – output folder. Default=False.

Returns:

global xgboost performance

geoxgboost.geoxgboost.gxgb(X, y, Coords, params, bw, Kernel='Adaptive', spatial_weights=False, feat_importance='gain', alpha_wt_type='varying', alpha_wt=1, test_size=0.3, seed=7, n_splits=5, path_save=False)

Implements GeoXGBoost

Parameters:
  • X – dataframe with the independent variables values

  • y – dataframe with the dependent variable values

  • Coords – dataframe with the coordinates of spatial units

  • params – hyperparameter values

  • bw – bandwidth value

  • Kernel – ‘Adaptive’ or ‘Fixed’ kernel type to be used. Default= ‘Adaptive’.

  • spatial_weights – spatial weights matrix. Default=False.

  • feat_importance – type of feature importance. Available methods: ‘gain’, ‘weight’, ‘cover’, ‘total gain’, ‘total cover’. Default=‘gain’.

  • alpha_wt_type – type of alpha_wt. Available methods: ‘varying’, ‘fixed’. Default=‘varying’.

  • alpha_wt – alpha weight value. It takes values between 0 and 1. Default=1.

  • test_size – proportion of the data held out for testing, given as a fraction. Default=0.3.

  • seed – seed value. Default=7.

  • n_splits – number of splits (folds) for k-fold cross-validation. Default=5.

  • path_save – output folder. Default=False.

Returns:

local prediction and related statistics
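This page does not spell out how Kernel and bw interact. The sketch below assumes the convention common in geographically weighted models: with an adaptive kernel, bw counts nearest neighbours; with a fixed kernel, bw is a distance radius. The helper local_neighbours is hypothetical and not part of geoxgboost; it only illustrates which spatial units would feed the local model fitted at one location under that assumption.

```python
from math import dist


def local_neighbours(coords, focal, bw, kernel="Adaptive"):
    """Return indices of the units that would feed the local model at `focal`.

    Hypothetical helper for illustration; not part of geoxgboost.
    """
    # Pair each unit with its Euclidean distance to the focal unit, nearest first.
    distances = sorted((dist(coords[focal], p), i) for i, p in enumerate(coords))
    if kernel == "Adaptive":
        # Assumed meaning: bw counts neighbours, so keep the bw closest units
        # (the focal unit itself is included at distance 0).
        return [i for _, i in distances[:bw]]
    if kernel == "Fixed":
        # Assumed meaning: bw is a distance, so keep every unit within that radius.
        return [i for d, i in distances if d <= bw]
    raise ValueError("kernel must be 'Adaptive' or 'Fixed'")


coords = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 5)]
print(local_neighbours(coords, focal=0, bw=3, kernel="Adaptive"))  # [0, 1, 2]
print(local_neighbours(coords, focal=0, bw=1.5, kernel="Fixed"))   # [0, 1]
```

Under this reading, an adaptive kernel gives every local model the same number of training units, while a fixed kernel gives dense areas more units than sparse ones.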

geoxgboost.geoxgboost.nestedCV(X, y, param_grid, Param1, Param2=None, Param3=None, params=None, path_save=False, n_OuterSplits=5, n_InnerSplits=3)

Applies nested cross validation for tuning up to three hyperparameters and calculating model generalization error.

Parameters:
  • X – dataframe with the independent variables values

  • y – dataframe with the dependent variable values

  • params – initial hyperparameter values. Type: dictionary.

  • param_grid – grid values (output of the create_param_grid function)

  • Param1 – name of 1st hyperparameter (same as in the create_param_grid function)

  • Param2 – name of 2nd hyperparameter used in the create_param_grid function. Default=None.

  • Param3 – name of 3rd hyperparameter used in the create_param_grid function. Default=None.

  • path_save – output folder. Default=False.

  • n_OuterSplits – number of outer splits. Default=5.

  • n_InnerSplits – number of inner splits. Default=3.

Returns:

optimized hyperparameters’ values and generalization error of model through nestedCV
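For readers new to the technique, the sketch below illustrates generic nested cross-validation with the default 5 outer and 3 inner splits. It is not the geoxgboost implementation: the toy model (a shrunken training mean), the unshuffled contiguous folds, and all function names are assumptions chosen to keep the example dependency-free. The inner loop selects a hyperparameter using only the outer-training rows; the outer loop then scores that choice on rows the selection never saw, which is what makes the averaged outer error an estimate of generalization error.

```python
from statistics import mean


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train, test) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def mse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def fit_predict(y_train, n_test, shrink):
    """Toy model: predict a shrunken training mean for every test row."""
    return [shrink * mean(y_train)] * n_test


def nested_cv(y, candidates, n_outer=5, n_inner=3):
    outer_errors, chosen = [], []
    for train, test in kfold_indices(len(y), n_outer):
        y_train = [y[i] for i in train]

        def inner_score(candidate):
            # Mean validation error over the inner folds (outer-training rows only).
            errors = []
            for tr, va in kfold_indices(len(y_train), n_inner):
                fit = [y_train[i] for i in tr]
                val = [y_train[i] for i in va]
                errors.append(mse(val, fit_predict(fit, len(val), candidate)))
            return mean(errors)

        best = min(candidates, key=inner_score)
        chosen.append(best)
        # Score the selected candidate on the held-out outer fold.
        y_test = [y[i] for i in test]
        outer_errors.append(mse(y_test, fit_predict(y_train, len(y_test), best)))
    return chosen, mean(outer_errors)


y = [10.0, 12.0, 11.0, 13.0, 9.0, 10.5, 11.5, 12.5, 10.0, 11.0,
     12.0, 9.5, 10.5, 11.5, 12.0]
chosen, generalization_error = nested_cv(y, candidates=[0.5, 0.9, 1.0])
print(chosen, generalization_error)
```

Each outer fold may select a different candidate, so the result is one choice per outer split plus a single averaged error.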

geoxgboost.geoxgboost.optimize_bw(X, y, Coords, params, bw_min, bw_max, step=1, Kernel='Adaptive', spatial_weights=True, n_splits=3, path_save=False)

Finds optimal bandwidth value for defining spatial kernels

Examples

>>> optimize_bw(X, y, Coords, params, bw_min=30, bw_max=100, step=10)
Parameters:
  • X – dataframe with the independent variables values

  • y – dataframe with the dependent variable values

  • Coords – dataframe with the coordinates of spatial units

  • params – hyperparameter values

  • bw_min – min bandwidth value

  • bw_max – max bandwidth value

  • step – incremental step. Default=1.

  • Kernel – ‘Adaptive’ or ‘Fixed’ kernel type to be used. Default= ‘Adaptive’.

  • spatial_weights – spatial weights matrix. Default= True.

  • n_splits – number of splits (folds) for k-fold cross-validation. Default=3.

  • path_save – output folder. Default=False.

Returns:

optimal bandwidth value
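The bw_min, bw_max, and step arguments suggest a search over evenly spaced candidate bandwidths. The sketch below shows that idea in isolation and is not the library's code: treating bw_max as inclusive is an assumption, and toy_cv_error is a made-up placeholder for the cross-validated error that a real search would compute by refitting the local models for each candidate.

```python
def candidate_bandwidths(bw_min, bw_max, step=1):
    """Evenly spaced candidates; treating bw_max as inclusive is an assumption."""
    return list(range(bw_min, bw_max + 1, step))


def pick_bandwidth(candidates, cv_error):
    """Return the candidate with the lowest cross-validated error."""
    return min(candidates, key=cv_error)


def toy_cv_error(bw):
    # Placeholder error curve with its minimum near bw=62; a real search would
    # refit the local models and run k-fold CV for every candidate.
    return (bw - 62) ** 2


candidates = candidate_bandwidths(30, 100, step=10)
print(candidates)                                # [30, 40, 50, 60, 70, 80, 90, 100]
print(pick_bandwidth(candidates, toy_cv_error))  # 60
```

A coarse step keeps the search fast; a second pass with a smaller step around the best coarse value can refine the result.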

geoxgboost.geoxgboost.predict_gxgb(DataPredict, CoordsPredict, Coords, Output_GXGB_LocalModel, alpha_wt=0.5, alpha_wt_type='varying', path_save=False)

Predicts values for unseen data.

Parameters:
  • DataPredict – dataframe containing the values of the independent variables referring to the spatial units in which the prediction will take place.

  • CoordsPredict – dataframe containing the coordinates of the spatial units in which the prediction will take place.

  • Coords – dataframe of coordinates of all spatial units on which the original GXGB model was trained

  • Output_GXGB_LocalModel – the trained model that has been created through gxgb function

  • alpha_wt – the value of alpha weight. It ranges from 0 to 1. Default=0.5

  • alpha_wt_type – type of alpha_wt. Available methods: ‘varying’, ‘fixed’. Default=‘varying’.

  • path_save – output folder. Default=False.

Returns:

prediction in unseen data.
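This page states only that alpha_wt lies between 0 and 1. One plausible reading, not confirmed here, is that it weights a local prediction against a global one in a convex combination. The sketch below illustrates that assumed interpretation with a hypothetical helper, blend, which is not part of geoxgboost; how the ‘varying’ option sets the weight per location is not shown because it is not described on this page.

```python
def blend(local_pred, global_pred, alpha_wt):
    """Convex combination of two predictions.

    Assumed interpretation of alpha_wt for illustration only;
    hypothetical helper, not part of geoxgboost.
    """
    if not 0.0 <= alpha_wt <= 1.0:
        raise ValueError("alpha_wt must lie between 0 and 1")
    return alpha_wt * local_pred + (1.0 - alpha_wt) * global_pred


print(blend(10.0, 20.0, 1.0))  # 10.0, local prediction only
print(blend(10.0, 20.0, 0.5))  # 15.0, equal weighting
print(blend(10.0, 20.0, 0.0))  # 20.0, global prediction only
```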

Module contents