biopsykit.classification.model_selection.nested_cv module

Module with functions for model selection using “nested” cross-validation.

biopsykit.classification.model_selection.nested_cv.nested_cv_param_search(X, y, param_dict, pipeline, outer_cv, inner_cv, groups=None, hyper_search_params=None, **kwargs)

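Perform a cross-validated hyperparameter search within an outer cross-validation ("nested" cross-validation). For each fold of the outer cross-validation, the hyperparameter search (inner cross-validation) is run on the outer training data only, and the resulting best estimator is evaluated on the corresponding outer test set.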
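The nested scheme can be sketched directly with scikit-learn. This is a minimal sketch of the general technique, not biopsykit's implementation: it assumes a grid search, stratified K-fold splitters for both loops, synthetic data, and a hypothetical two-step pipeline and parameter grid. The result dict only mirrors the keys listed under Returns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical pipeline; parameter names follow sklearn's "<step>__<param>" convention
pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
param_dict = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

keys = ["param_search", "test_score", "cv_results", "best_estimator", "conf_matrix"]
results = {key: [] for key in keys}

for train, test in outer_cv.split(X, y):
    # Inner loop: hyperparameter search sees only the outer training data
    search = GridSearchCV(pipeline, param_grid=param_dict, cv=inner_cv)
    search.fit(X[train], y[train])
    # Outer loop: score the refitted best estimator on the held-out outer fold
    y_pred = search.predict(X[test])
    results["param_search"].append(search)
    results["test_score"].append(search.score(X[test], y[test]))
    results["cv_results"].append(search.cv_results_)
    results["best_estimator"].append(search.best_estimator_)
    results["conf_matrix"].append(confusion_matrix(y[test], y_pred, labels=[0, 1]))

print("mean outer test score:", np.mean(results["test_score"]))
```

Because the outer test folds are never seen by the inner search, the mean of the outer test scores is a less optimistically biased performance estimate than the inner `best_score_` of any single search.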

Parameters

X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples, n_output) or (n_samples,)) – Target values (e.g., class labels) relative to X for classification or regression.
param_dict (dict or list of dicts) – Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
pipeline (Pipeline) – Pipeline of sklearn transformers and estimators on which to perform the hyperparameter search.
outer_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the outer cross-validation.
inner_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the hyperparameter search.
groups (array-like of shape (n_samples,)) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold). Default: None
hyper_search_params (dict, optional) –
Dictionary specifying which hyperparameter search method to use (or None to use grid search).
- “grid” (GridSearchCV): To perform a grid search, pass a dict of the form {"search_method": "grid"}.
- “random” (RandomizedSearchCV): To perform a randomized search, pass a dict of the form {"search_method": "random", "n_iter": xx}, where "n_iter" is the number of parameter settings that are sampled.
kwargs – Additional arguments to be passed to the hyperparameter search class instance (e.g., GridSearchCV or RandomizedSearchCV).
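The mapping from hyper_search_params to a search class, with kwargs forwarded to its constructor, can be sketched as follows. make_param_search is a hypothetical helper, not part of biopsykit; it only assumes that "grid" selects GridSearchCV and "random" selects RandomizedSearchCV, as listed above. The usage also passes groups to the search's fit together with a GroupKFold inner splitter; the data, pipeline, and subject IDs are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, GroupKFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_param_search(pipeline, param_dict, inner_cv, hyper_search_params=None, **kwargs):
    """Hypothetical helper: build the search object selected by hyper_search_params."""
    if hyper_search_params is None:
        # None falls back to a plain grid search
        hyper_search_params = {"search_method": "grid"}
    method = hyper_search_params.get("search_method", "grid")
    if method == "grid":
        return GridSearchCV(pipeline, param_grid=param_dict, cv=inner_cv, **kwargs)
    if method == "random":
        # n_iter = number of parameter settings sampled from param_dict
        return RandomizedSearchCV(
            pipeline,
            param_distributions=param_dict,
            n_iter=hyper_search_params["n_iter"],
            cv=inner_cv,
            **kwargs,
        )
    raise ValueError(f"Unknown search_method: {method!r}")


X, y = make_classification(n_samples=120, n_features=8, random_state=0)
# Hypothetical subject IDs: 12 subjects with 10 samples each
groups = np.repeat(np.arange(12), 10)
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
param_dict = {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]}

search = make_param_search(
    pipeline,
    param_dict,
    GroupKFold(n_splits=3),
    hyper_search_params={"search_method": "random", "n_iter": 4},
    scoring="accuracy",  # forwarded via **kwargs
    random_state=0,  # forwarded via **kwargs (RandomizedSearchCV only)
)
# groups is routed to the GroupKFold splitter, so no subject is in both
# the inner training and validation folds
search.fit(X, y, groups=groups)
print(type(search).__name__, search.best_params_)
```

Keyword arguments such as random_state are only valid for the class that is actually selected, which is why forwarding them blindly through kwargs requires matching them to the chosen search method.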

Returns

Dictionary with hyperparameter search results. The result dictionary has the following entries:

“param_search”: list with hyperparameter search class instances (e.g., GridSearchCV) used for hyperparameter search for each outer fold (determined by outer_cv).
“test_score”: list with test scores of the best estimator on the respective test set for each outer fold.
“cv_results”: list of cv_results_ attributes of hyperparameter search class (e.g., GridSearchCV). Each entry of “cv_results” is a results dictionary of the respective fold with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
“best_estimator”: list of best_estimator_ attributes of hyperparameter search class (e.g., GridSearchCV). Each entry of “best_estimator” is the estimator that was chosen by the hyperparameter search in the respective fold, i.e., the estimator which gave the highest average score (or smallest loss if specified) on the left-out data of the inner cross-validation.
“conf_matrix”: list of confusion matrices computed from the predictions of the best estimator on the test set of each outer fold.
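Per-fold results of this shape are typically inspected with pandas. The sketch below does not call biopsykit; as an assumption for illustration, it uses scikit-learn's cross_validate with return_estimator=True, which also fits one search per outer fold, and then converts each fold's cv_results_ into a DataFrame, as described for “cv_results”. The data and parameter grid are synthetic.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

X, y = make_classification(n_samples=150, n_features=6, random_state=1)

# Inner search (hypothetical grid) wrapped as an estimator
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=KFold(n_splits=3),
)

# Outer loop: cross_validate refits the whole search on each outer training set
outer = cross_validate(
    search, X, y,
    cv=KFold(n_splits=4, shuffle=True, random_state=1),
    return_estimator=True,
)

# One DataFrame per outer fold, analogous to one entry of "cv_results"
cv_results = [pd.DataFrame(est.cv_results_) for est in outer["estimator"]]

# One row per outer fold: selected hyperparameter and outer test score
summary = pd.DataFrame({
    "best_C": [est.best_params_["C"] for est in outer["estimator"]],
    "test_score": outer["test_score"],
})
print(summary)
print("mean outer test score:", summary["test_score"].mean())
```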

Return type

dict