biopsykit.classification.model_selection.nested_cv module
Module with functions for model selection using “nested” cross-validation.
- biopsykit.classification.model_selection.nested_cv.nested_cv_param_search(X, y, param_dict, pipeline, outer_cv, inner_cv, groups=None, hyper_search_params=None, **kwargs)
Perform a cross-validated parameter search with hyperparameter optimization within an outer cross-validation.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples, n_output) or (n_samples,)) – Target (i.e., class labels) relative to X for classification or regression.
param_dict (dict or list of dicts) – Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
pipeline (Pipeline) – Pipeline of sklearn transforms and estimators to perform the hyperparameter search with.
outer_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the outer cross-validation.
inner_cv (CV splitter) – Cross-validation object determining the cross-validation splitting strategy of the hyperparameter search.
groups (array-like of shape (n_samples,)) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold). Default: None
hyper_search_params (dict, optional) – Dictionary specifying which hyperparameter search method to use (or None to use grid-search).
“grid” (GridSearchCV): To perform a grid-search, pass a dict in the form of {"search_method": "grid"}.
“random” (RandomizedSearchCV): To perform a randomized-search, pass a dict in the form of {"search_method": "random", "n_iter": xx}, where "n_iter" corresponds to the number of parameter settings that are sampled.
kwargs – Additional arguments to be passed to the hyperparameter search class instance (e.g., GridSearchCV or RandomizedSearchCV).
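The shape of param_dict and hyper_search_params described above can be sketched with plain scikit-learn. This is an illustration only, not taken from biopsykit: the pipeline step names "scaler" and "clf", the estimators, and the parameter values are all assumptions chosen for the example. It uses sklearn's ParameterGrid to show how a list of dictionaries spans one grid per dictionary, and checks that every key addresses a parameter of the pipeline through the step-name prefix.

```python
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical two-step pipeline; the step names are illustrative assumptions.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

# A list of dicts: each dict spans its own grid, and the grids are explored one after another.
# Keys address pipeline parameters as "<step name>__<parameter name>".
param_dict = [
    {"clf__kernel": ["linear"], "clf__C": [0.1, 1, 10]},
    {"clf__kernel": ["rbf"], "clf__C": [1, 10], "clf__gamma": [0.01, 0.1]},
]

# Enumerate all parameter settings a grid-search would try.
candidates = list(ParameterGrid(param_dict))
n_candidates = len(candidates)  # 3 settings from the first grid + 2 * 2 from the second

# Every key must be a valid parameter of the pipeline.
valid_keys = set(pipeline.get_params().keys())
all_keys_valid = all(key in valid_keys for params in candidates for key in params)

# Dict in the form documented for a randomized search; n_iter=5 is an arbitrary choice.
hyper_search_params = {"search_method": "random", "n_iter": 5}
```

With a randomized search, only "n_iter" of the enumerated settings would be sampled instead of all of them.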
- Returns
Dictionary with hyperparameter search results. The result dictionary has the following entries:
“param_search”: list with hyperparameter search class instances (e.g., GridSearchCV) used for hyperparameter search for each outer fold (determined by outer_cv).
“test_score”: list with test scores of the best estimator on the respective test set for each outer fold.
“cv_results”: list of cv_results_ attributes of the hyperparameter search class (e.g., GridSearchCV). Each entry of “cv_results” is a results dictionary of the respective fold with keys as column headers and values as columns, which can be imported into a pandas DataFrame.
“best_estimator”: list of best_estimator_ attributes of the hyperparameter search class (e.g., GridSearchCV). Each entry of “best_estimator” is the estimator that was chosen by the hyperparameter search in the respective fold, i.e., the estimator which gave the highest average score (or smallest loss if specified) on the test data.
“conf_matrix”: list of confusion matrices from test scores for each outer fold.
- Return type
dict
See also
GridSearchCV
sklearn grid-search
RandomizedSearchCV
sklearn randomized-search
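As a rough illustration of what “nested” cross-validation means here, the following sketch runs a grid-search on the training portion of each outer fold and evaluates the refitted best estimator on the held-out portion. It is written against plain scikit-learn and is not the biopsykit implementation: the helper name nested_cv_sketch, the synthetic data, the pipeline, and the parameter grid are assumptions made for the example, and the groups, hyper_search_params, and kwargs arguments are left out for brevity. The result dictionary mirrors the keys listed under Returns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_sketch(X, y, param_dict, pipeline, outer_cv, inner_cv):
    """Illustrative nested CV loop (hypothetical helper, not biopsykit code)."""
    results = {
        "param_search": [],
        "test_score": [],
        "cv_results": [],
        "best_estimator": [],
        "conf_matrix": [],
    }
    labels = np.unique(y)
    for train_idx, test_idx in outer_cv.split(X, y):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]

        # Inner loop: hyperparameter search using only the outer training data.
        search = GridSearchCV(pipeline, param_dict, cv=inner_cv)
        search.fit(X_train, y_train)

        # Outer loop: evaluate the refitted best estimator on the held-out fold.
        results["param_search"].append(search)
        results["test_score"].append(search.score(X_test, y_test))
        results["cv_results"].append(search.cv_results_)
        results["best_estimator"].append(search.best_estimator_)
        results["conf_matrix"].append(
            confusion_matrix(y_test, search.predict(X_test), labels=labels)
        )
    return results


# Synthetic binary classification data, used only for demonstration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
param_dict = {"clf__C": [0.1, 1, 10]}

results = nested_cv_sketch(
    X, y, param_dict, pipeline, outer_cv=KFold(n_splits=5), inner_cv=KFold(n_splits=3)
)
mean_test_score = float(np.mean(results["test_score"]))
```

Because the hyperparameters are selected without ever seeing the outer test fold, the averaged outer test score is a less optimistic performance estimate than the best inner cross-validation score.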