biopsykit.classification.model_selection.nested_cv module

Module with functions for model selection using “nested” cross-validation.

Perform a cross-validated parameter search with hyperparameter optimization within an outer cross-validation.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like of shape (n_samples, n_output) or (n_samples,)) – Target (i.e., class labels) relative to X for classification or regression.

  • param_dict (dict or list of dicts) – Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.

  • pipeline (Pipeline) – Pipeline of sklearn transforms and estimators to perform hyperparameter search with.

  • outer_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the outer cross-validation.

  • inner_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the hyperparameter search.

  • groups (array-like of shape (n_samples,)) – Group labels for the samples, used while splitting the dataset into train/test sets. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold). Default: None.

  • hyper_search_params (dict, optional) –

    Dictionary specifying which hyperparameter search method to use (or None to use grid-search).

    • “grid” (GridSearchCV): To perform a grid search, pass a dict in the form of {"search_method": "grid"}.

    • “random” (RandomizedSearchCV): To perform a randomized search, pass a dict in the form of {"search_method": "random", "n_iter": xx}, where "n_iter" corresponds to the number of parameter settings that are sampled.

  • kwargs – Additional arguments to be passed to the hyperparameter search class instance (e.g., GridSearchCV or RandomizedSearchCV).
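The way these parameters interact can be illustrated with plain scikit-learn. The following is a minimal sketch, not the biopsykit implementation: the helper name nested_cv_sketch, the toy data, and the "clf__C" grid are assumptions made for illustration. It runs one hyperparameter search (using inner_cv) on the training part of every outer fold and scores the refitted best estimator on the matching test part.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_sketch(X, y, param_dict, pipeline, outer_cv, inner_cv,
                     groups=None, hyper_search_params=None, **kwargs):
    """Hypothetical helper: one hyperparameter search per outer fold."""
    search_params = dict(hyper_search_params or {"search_method": "grid"})
    method = search_params.pop("search_method", "grid")
    labels = np.unique(y)
    keys = ("param_search", "test_score", "cv_results", "best_estimator", "conf_matrix")
    results = {key: [] for key in keys}

    for train, test in outer_cv.split(X, y, groups):
        if method == "random":
            # remaining entries (e.g. "n_iter") go to the randomized search
            search = RandomizedSearchCV(pipeline, param_dict, cv=inner_cv,
                                        **search_params, **kwargs)
        else:
            search = GridSearchCV(pipeline, param_dict, cv=inner_cv, **kwargs)

        # group labels are only forwarded when they were provided
        fit_kwargs = {} if groups is None else {"groups": np.asarray(groups)[train]}
        search.fit(X[train], y[train], **fit_kwargs)

        y_pred = search.predict(X[test])
        results["param_search"].append(search)
        results["test_score"].append(search.score(X[test], y[test]))
        results["cv_results"].append(search.cv_results_)
        results["best_estimator"].append(search.best_estimator_)
        results["conf_matrix"].append(confusion_matrix(y[test], y_pred, labels=labels))
    return results


# Toy data and pipeline (assumptions for illustration only)
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
param_dict = {"clf__C": [0.1, 1.0, 10.0]}

results = nested_cv_sketch(
    X, y, param_dict, pipeline,
    outer_cv=StratifiedKFold(n_splits=3), inner_cv=StratifiedKFold(n_splits=2),
)
random_results = nested_cv_sketch(
    X, y, param_dict, pipeline,
    outer_cv=StratifiedKFold(n_splits=3), inner_cv=StratifiedKFold(n_splits=2),
    hyper_search_params={"search_method": "random", "n_iter": 2},
    random_state=0,  # forwarded to RandomizedSearchCV through kwargs
)
```

Parameter names in param_dict use the scikit-learn "<step>__<parameter>" convention, because the search operates on a Pipeline rather than on a bare estimator.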

Returns

Dictionary with hyperparameter search results. The result dictionary has the following entries:

  • “param_search”: list with hyperparameter search class instances (e.g., GridSearchCV) used for hyperparameter search for each outer fold (determined by outer_cv).

  • “test_score”: list with test scores of the best estimator on the respective test set for each outer fold.

  • “cv_results”: list of cv_results_ attributes of the hyperparameter search class (e.g., GridSearchCV). Each entry of “cv_results” is the results dictionary of the respective fold, with keys as column headers and values as columns, which can be imported into a pandas DataFrame.

  • “best_estimator”: list of best_estimator_ attributes of the hyperparameter search class (e.g., GridSearchCV). Each entry of “best_estimator” is the estimator that was chosen by the hyperparameter search in the respective fold, i.e., the estimator which gave the highest average score (or smallest loss if specified) on the test data.

  • “conf_matrix”: list of confusion matrices computed on the test set of each outer fold.
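As a brief illustration of importing the “cv_results” entries into pandas, the sketch below builds a stand-in results dictionary with plain scikit-learn (the toy data, the LogisticRegression estimator, and the outer_fold column are assumptions, not part of biopsykit) and concatenates the per-fold tables into a single DataFrame.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# Stand-in for the returned dictionary: one search per outer fold
cv_results_per_fold = []
for train, _ in StratifiedKFold(n_splits=2).split(X, y):
    search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]},
                          cv=StratifiedKFold(n_splits=2))
    search.fit(X[train], y[train])
    cv_results_per_fold.append(search.cv_results_)
results = {"cv_results": cv_results_per_fold}

# One DataFrame per outer fold, tagged with the fold index, then stacked
frames = [
    pd.DataFrame(fold_results).assign(outer_fold=fold)
    for fold, fold_results in enumerate(results["cv_results"])
]
cv_table = pd.concat(frames, ignore_index=True)

# Columns such as "param_C" and "mean_test_score" come from scikit-learn
summary = cv_table[["outer_fold", "param_C", "mean_test_score", "rank_test_score"]]
print(summary)
```

Each row of cv_table is one parameter candidate evaluated by the inner cross-validation of one outer fold, so the scores in it are inner-fold validation scores and not the outer “test_score” values.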

Return type

dict

See also

GridSearchCV

sklearn grid-search

RandomizedSearchCV

sklearn randomized-search