Sklearn Pipeline Permuter Example¶

This example shows how to systematically evaluate different machine learning pipelines.

This is useful, for instance, when combinations of different feature selection methods and different estimators should be evaluated in a single step.

Imports and Helper Functions¶

[1]:

from pathlib import Path
from shutil import rmtree

import pandas as pd
import numpy as np

# Utils
from sklearn.datasets import load_breast_cancer, load_diabetes

# Preprocessing & Feature Selection
from sklearn.feature_selection import SelectKBest, RFE
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor


# Cross-Validation
from sklearn.model_selection import KFold

from biopsykit.classification.model_selection import SklearnPipelinePermuter

%load_ext autoreload
%autoreload 2

Classification¶

Create temporary directory

[2]:

tmpdir = Path("tmpdir")
tmpdir.mkdir(exist_ok=True)

Load Example Dataset¶

[3]:

breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

Specify Estimator Combinations and Parameters for Hyperparameter Search¶

[4]:

model_dict = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
        # "SVC": SVC(),
        # "AdaBoostClassifier": AdaBoostClassifier(),
    },
}
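To make the size of the search space concrete, the following minimal sketch enumerates the step combinations that a model_dict like the one above spans. It uses only itertools and plain step names. How SklearnPipelinePermuter builds its combinations internally is not shown in this example, so treat this as an illustration and not as the library's implementation.

```python
from itertools import product

# Step names copied from model_dict above; estimator objects are omitted on purpose.
step_options = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

# One tuple of (step, estimator name) pairs per pipeline combination.
pipeline_combinations = [
    tuple(zip(step_options.keys(), names))
    for names in product(*step_options.values())
]

for combination in pipeline_combinations:
    print(combination)

print(len(pipeline_combinations), "pipeline combinations")
```

Two scalers, two feature selection methods, and two classifiers give 2 * 2 * 2 = 8 combinations, each of which gets its own hyperparameter search.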

[5]:

params_dict = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
    # "SVC": [
    #    {
    #        "kernel": ["linear"],
    #        "C": np.logspace(start=-2, stop=2, num=5)
    #    },
    #    {
    #        "kernel": ["rbf"],
    #        "C": np.logspace(start=-2, stop=2, num=5),
    #        "gamma": np.logspace(start=-2, stop=2, num=5)
    #    }
    # ],
    # "AdaBoostClassifier": {
    #    "base_estimator": [DecisionTreeClassifier(max_depth=1)],
    #    "n_estimators": np.arange(20, 110, 10),
    #    "learning_rate": np.arange(0.6, 1.1, 0.1)
    # },
}


# use randomized-search for decision tree classifier, use grid-search (the default) for all other estimators
hyper_search_dict = {"DecisionTreeClassifier": {"search_method": "random", "n_iter": 2}}
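The fit log further down reports parameter names such as reduce_dim__k and 12 candidates per grid search. The sketch below reproduces both for one pipeline, assuming that step parameters are prefixed with the step name and a double underscore, which is scikit-learn's convention for Pipeline parameters. The helper prefix_params is hypothetical and not part of BioPsyKit.

```python
from itertools import product

# Parameters of one pipeline (StandardScaler, SelectKBest, KNeighborsClassifier),
# copied from params_dict above.
step_params = {
    "scaler": None,
    "reduce_dim": {"k": [2, 4, "all"]},
    "clf": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}


def prefix_params(step_params):
    """Hypothetical helper: build a '<step>__<param>' grid, skipping steps without parameters."""
    grid = {}
    for step, params in step_params.items():
        for name, values in (params or {}).items():
            grid[f"{step}__{name}"] = values
    return grid


param_grid = prefix_params(step_params)

# A grid search evaluates every combination of the listed values.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

print(param_grid)
print(len(candidates), "grid-search candidates")  # 3 * 2 * 2 = 12
```

A randomized search with n_iter=2, as configured for the decision tree, evaluates only 2 sampled candidates instead of the full grid.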

Setup PipelinePermuter and Cross-Validations for Model Evaluation¶

Note: For further information please visit the documentation of SklearnPipelinePermuter.

[6]:

pipeline_permuter = SklearnPipelinePermuter(
    model_dict, params_dict, hyper_search_dict=hyper_search_dict, random_state=42
)

outer_cv = KFold(5)
inner_cv = KFold(5)
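The outer folds estimate the performance, while the inner folds are used for the hyperparameter search on each outer training set. The sketch below mimics the fold sizes of an unshuffled KFold(5) in plain Python for the 569 samples of the breast cancer dataset. kfold_test_sizes is a hypothetical helper written for this illustration, not a scikit-learn or BioPsyKit function.

```python
def kfold_test_sizes(n_samples, n_splits=5):
    """Test-fold sizes of a k-fold split: the first (n_samples % n_splits) folds get one extra sample."""
    base, remainder = divmod(n_samples, n_splits)
    return [base + 1 if fold < remainder else base for fold in range(n_splits)]


n_samples = 569  # number of samples in the breast cancer dataset

for outer_fold, n_test in enumerate(kfold_test_sizes(n_samples)):
    n_train = n_samples - n_test
    # The hyperparameter search only sees the outer training set and splits it again.
    inner_sizes = kfold_test_sizes(n_train)
    print(f"outer fold {outer_fold}: train={n_train}, test={n_test}, inner validation folds={inner_sizes}")
```

With 12 grid candidates and 5 inner folds, each outer fold runs 60 fits, which is what the log of the fit call reports.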

Fit all Parameter Combinations¶

[7]:

pipeline_permuter.fit(X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv)

### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits

Display Results¶

Metric Summary for Classification Pipelines¶

This summary lists all relevant metrics (performance scores, confusion matrix, true and predicted labels) of the best-performing pipeline of each outer fold (i.e., the best_pipeline() parameter of each inner CV object), for every evaluated pipeline combination.

[8]:

pipeline_permuter.metric_summary()

[8]:

			conf_matrix	conf_matrix_folds	true_labels	true_labels_folds	predicted_labels	predicted_labels_folds	train_indices	train_indices_folds	test_indices	test_indices_folds	mean_test_accuracy	std_test_accuracy	test_accuracy_fold_0	test_accuracy_fold_1	test_accuracy_fold_2	test_accuracy_fold_3	test_accuracy_fold_4
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
StandardScaler	SelectKBest	KNeighborsClassifier	[195, 17, 6, 351]	[[62, 6, 1, 45], [46, 3, 1, 64], [35, 5, 0, 74...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.959556	0.018127	0.938596	0.964912	0.956140	0.991228	0.946903
	SelectKBest	DecisionTreeClassifier	[184, 28, 22, 335]	[[54, 14, 2, 44], [43, 6, 4, 61], [37, 3, 1, 7...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.912079	0.037361	0.859649	0.912281	0.964912	0.938596	0.884956
	RFE	KNeighborsClassifier	[201, 11, 11, 346]	[[64, 4, 0, 46], [47, 2, 5, 60], [37, 3, 1, 73...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.961326	0.014282	0.964912	0.938596	0.964912	0.982456	0.955752
	RFE	DecisionTreeClassifier	[177, 35, 15, 342]	[[43, 25, 0, 46], [46, 3, 6, 59], [38, 2, 2, 7...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, ...	[[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.912141	0.069276	0.780702	0.921053	0.964912	0.973684	0.920354
MinMaxScaler	SelectKBest	KNeighborsClassifier	[198, 14, 9, 348]	[[63, 5, 2, 44], [46, 3, 1, 64], [36, 4, 0, 74...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.959571	0.011923	0.938596	0.964912	0.964912	0.973684	0.955752
	SelectKBest	DecisionTreeClassifier	[182, 30, 24, 333]	[[54, 14, 2, 44], [40, 9, 6, 59], [36, 4, 1, 7...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.905123	0.036065	0.859649	0.868421	0.956140	0.921053	0.920354
	RFE	KNeighborsClassifier	[199, 13, 10, 347]	[[61, 7, 2, 44], [47, 2, 3, 62], [37, 3, 1, 73...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.959603	0.021168	0.921053	0.956140	0.964912	0.982456	0.973451
	RFE	DecisionTreeClassifier	[186, 26, 14, 343]	[[54, 14, 2, 44], [44, 5, 3, 62], [36, 4, 1, 7...	[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...	[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ...	[[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,...	[114, 115, 116, 117, 118, 119, 120, 121, 122, ...	[[114, 115, 116, 117, 118, 119, 120, 121, 122,...	[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...	[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...	0.929731	0.036335	0.859649	0.929825	0.956140	0.956140	0.946903
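As a sanity check on how to read this table, the sketch below recomputes the aggregate columns of the first row (StandardScaler, SelectKBest, KNeighborsClassifier) from the per-fold values shown above. It assumes that conf_matrix is a row-major flattened 2x2 matrix, so that its first and last entries count the correctly classified samples. That layout is an assumption, not something stated by the output.

```python
from statistics import fmean, pstdev

# Values copied from the first row of the table above.
fold_accuracies = [0.938596, 0.964912, 0.956140, 0.991228, 0.946903]
conf_matrix = [195, 17, 6, 351]  # assumed layout: row-major flattened 2x2 matrix

mean_accuracy = fmean(fold_accuracies)
std_accuracy = pstdev(fold_accuracies)  # population standard deviation (ddof=0)

# Accuracy pooled over all test samples: diagonal entries divided by the total.
pooled_accuracy = (conf_matrix[0] + conf_matrix[3]) / sum(conf_matrix)

print(f"mean of fold accuracies: {mean_accuracy:.6f}")
print(f"std of fold accuracies:  {std_accuracy:.6f}")
print(f"pooled accuracy:         {pooled_accuracy:.6f}")
```

The mean and the population standard deviation agree with mean_test_accuracy and std_test_accuracy up to rounding. The pooled accuracy differs slightly from the mean because the folds are not all the same size.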

The list of Pipeline objects of the best pipeline for each evaluated pipeline combination:

[9]:

pipeline_permuter.best_estimator_summary()

[9]:

			best_estimator
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
StandardScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...

Mean Performance Scores for Individual Hyperparameter Combinations¶

The performance scores for each pipeline and parameter combination, averaged over all outer CV folds, are returned by SklearnPipelinePermuter.mean_pipeline_score_results().

NOTE:

* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because the best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds. Hence, the best_estimator instances of the individual folds can each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.
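The distinction drawn in this note can be illustrated with made-up numbers. In the sketch below, inner_scores holds hypothetical mean inner-CV scores of three hyperparameter combinations in three outer folds. None of these values come from the example above, and the code does not call BioPsyKit.

```python
from statistics import fmean

# Hypothetical mean inner-CV scores: inner_scores[outer_fold][parameter_combination_id]
inner_scores = [
    {0: 0.90, 1: 0.94, 2: 0.93},
    {0: 0.95, 1: 0.92, 2: 0.94},
    {0: 0.91, 1: 0.93, 2: 0.94},
]

# Per-fold winners: what a hyperparameter search selects independently in each outer fold.
winners = [max(scores, key=scores.get) for scores in inner_scores]

# Fixed combinations: the same parameter_combination_id averaged over all outer folds.
mean_per_combination = {
    combo: fmean(scores[combo] for scores in inner_scores)
    for combo in inner_scores[0]
}
best_fixed = max(mean_per_combination, key=mean_per_combination.get)

print("winner per outer fold:", winners)
print("best fixed combination:", best_fixed)
```

In this toy setup every outer fold selects a different combination, while the combination with the highest average wins only one of the folds. This is why the two kinds of summaries need not agree.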

[10]:

pipeline_permuter.mean_pipeline_score_results()

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/sklearn_pipeline_permuter.py:679: FutureWarning: ['param_clf__criterion', 'param_clf__weights', 'param_reduce_dim__k', 'params'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
  score_results.groupby(score_results.index.names[:-1])

[10]:

				mean_test_accuracy		param_clf__max_depth		param_clf__n_neighbors		param_reduce_dim__n_features_to_select		rank_test_accuracy		...	split1_test_accuracy		split2_test_accuracy		split3_test_accuracy		split4_test_accuracy		std_test_accuracy
				mean	std	mean	std	mean	std	mean	std	mean	std	...	mean	std	mean	std	mean	std	mean	std	mean	std
pipeline_scaler	pipeline_reduce_dim	pipeline_clf	parameter_combination_id
StandardScaler	SelectKBest	KNeighborsClassifier	11	0.959150	0.004603	NaN	NaN	4.0	0.0	NaN	NaN	1.2	0.447214	...	0.960440	0.022787	0.960440	0.014743	0.980220	0.009194	0.949451	0.028656	0.019473	0.005345
MinMaxScaler	SelectKBest	KNeighborsClassifier	11	0.959150	0.001991	NaN	NaN	4.0	0.0	NaN	NaN	1.2	0.447214	...	0.951648	0.012529	0.967033	0.017375	0.986813	0.004914	0.951648	0.016666	0.019545	0.004257
	RFE	KNeighborsClassifier	11	0.958705	0.005482	NaN	NaN	4.0	0.0	NaN	NaN	1.2	0.447214	...	0.953846	0.009194	0.962637	0.016666	0.967033	0.010989	0.962637	0.016666	0.012098	0.002635
	RFE	KNeighborsClassifier	8	0.956507	0.006313	NaN	NaN	4.0	0.0	NaN	NaN	2.0	1.000000	...	0.953846	0.009194	0.967033	0.015541	0.958242	0.014328	0.956044	0.021978	0.013115	0.003993
StandardScaler	RFE	KNeighborsClassifier	11	0.956073	0.007155	NaN	NaN	4.0	0.0	NaN	NaN	1.6	0.547723	...	0.958242	0.023824	0.958242	0.009194	0.971429	0.012529	0.951648	0.036114	0.020734	0.003697
StandardScaler	SelectKBest	KNeighborsClassifier	8	0.955198	0.005524	NaN	NaN	4.0	0.0	NaN	NaN	2.0	0.707107	...	0.960440	0.014743	0.956044	0.017375	0.978022	0.010989	0.942857	0.025059	0.019446	0.004951
MinMaxScaler	SelectKBest	KNeighborsClassifier	8	0.953430	0.007843	NaN	NaN	4.0	0.0	NaN	NaN	2.6	1.949359	...	0.947253	0.009194	0.964835	0.014328	0.978022	0.013459	0.936264	0.025059	0.019030	0.005669
StandardScaler	RFE	KNeighborsClassifier	5	0.952117	0.005728	NaN	NaN	2.0	0.0	NaN	NaN	2.8	0.836660	...	0.956044	0.017375	0.951648	0.014743	0.958242	0.014328	0.949451	0.032599	0.017652	0.003116
StandardScaler	RFE	KNeighborsClassifier	8	0.951672	0.007497	NaN	NaN	4.0	0.0	NaN	NaN	2.8	1.303840	...	0.953846	0.016299	0.958242	0.009194	0.956044	0.019034	0.947253	0.039925	0.019830	0.002716
MinMaxScaler	RFE	KNeighborsClassifier	7	0.950354	0.007385	NaN	NaN	4.0	0.0	4.000000	0.000000	3.2	1.483240	...	0.951648	0.024076	0.953846	0.021138	0.960440	0.006019	0.960440	0.012529	0.018607	0.009022
MinMaxScaler	SelectKBest	KNeighborsClassifier	5	0.949474	0.003494	NaN	NaN	2.0	0.0	NaN	NaN	3.6	0.547723	...	0.949451	0.018388	0.953846	0.026236	0.969231	0.012038	0.936264	0.030493	0.020818	0.007043
StandardScaler	RFE	KNeighborsClassifier	2	0.949470	0.007481	NaN	NaN	2.0	0.0	NaN	NaN	3.6	1.816590	...	0.956044	0.015541	0.958242	0.018057	0.938462	0.022787	0.936264	0.028444	0.017660	0.008590
MinMaxScaler	RFE	KNeighborsClassifier	5	0.947286	0.008802	NaN	NaN	2.0	0.0	NaN	NaN	4.2	1.303840	...	0.945055	0.023311	0.960440	0.009829	0.964835	0.014328	0.936264	0.028444	0.019275	0.005979
StandardScaler	SelectKBest	KNeighborsClassifier	5	0.946398	0.006357	NaN	NaN	2.0	0.0	NaN	NaN	3.8	0.836660	...	0.953846	0.012038	0.949451	0.026465	0.958242	0.009194	0.934066	0.026917	0.017432	0.008160
MinMaxScaler	RFE	KNeighborsClassifier	10	0.945079	0.011841	NaN	NaN	4.0	0.0	4.000000	0.000000	5.6	2.073644	...	0.940659	0.022787	0.947253	0.021138	0.969231	0.012038	0.960440	0.012529	0.025049	0.009478
	RFE	KNeighborsClassifier	2	0.944195	0.009172	NaN	NaN	2.0	0.0	NaN	NaN	5.2	1.643168	...	0.949451	0.019963	0.971429	0.006019	0.938462	0.027582	0.923077	0.026917	0.022622	0.003527
	SelectKBest	KNeighborsClassifier	2	0.944190	0.007720	NaN	NaN	2.0	0.0	NaN	NaN	4.8	1.303840	...	0.945055	0.021978	0.951648	0.027582	0.962637	0.018388	0.914286	0.026236	0.023213	0.009375
StandardScaler	SelectKBest	KNeighborsClassifier	2	0.943311	0.009131	NaN	NaN	2.0	0.0	NaN	NaN	5.4	1.949359	...	0.949451	0.018388	0.951648	0.027582	0.962637	0.012529	0.894505	0.032599	0.029359	0.009596
MinMaxScaler	SelectKBest	KNeighborsClassifier	7	0.942451	0.013020	NaN	NaN	4.0	0.0	NaN	NaN	3.8	1.303840	...	0.938462	0.036114	0.962637	0.012529	0.953846	0.014328	0.951648	0.018388	0.024840	0.003868
StandardScaler	RFE	KNeighborsClassifier	7	0.941128	0.015268	NaN	NaN	4.0	0.0	4.000000	0.000000	4.0	2.000000	...	0.945055	0.021978	0.953846	0.018057	0.942857	0.035946	0.940659	0.031659	0.021137	0.012182
	SelectKBest	KNeighborsClassifier	7	0.938939	0.011950	NaN	NaN	4.0	0.0	NaN	NaN	3.8	1.303840	...	0.931868	0.029487	0.956044	0.019034	0.951648	0.016666	0.949451	0.016666	0.022968	0.004169
	RFE	KNeighborsClassifier	10	0.938930	0.016775	NaN	NaN	4.0	0.0	4.000000	0.000000	4.8	1.643168	...	0.945055	0.010989	0.931868	0.023824	0.956044	0.032038	0.945055	0.036446	0.022850	0.004537
MinMaxScaler	RFE	KNeighborsClassifier	1	0.936732	0.007675	NaN	NaN	2.0	0.0	4.000000	0.000000	7.2	2.049390	...	0.945055	0.038067	0.936264	0.028444	0.938462	0.012529	0.938462	0.016666	0.021214	0.004668
StandardScaler	SelectKBest	KNeighborsClassifier	10	0.936302	0.011498	NaN	NaN	4.0	0.0	NaN	NaN	4.8	1.095445	...	0.920879	0.033331	0.949451	0.009829	0.967033	0.007770	0.951648	0.012529	0.028909	0.006367
MinMaxScaler	SelectKBest	KNeighborsClassifier	10	0.934988	0.013670	NaN	NaN	4.0	0.0	NaN	NaN	5.6	1.140175	...	0.923077	0.025772	0.951648	0.012529	0.964835	0.012038	0.951648	0.016666	0.030625	0.004991
	RFE	KNeighborsClassifier	4	0.934553	0.014500	NaN	NaN	2.0	0.0	4.000000	0.000000	7.2	2.280351	...	0.931868	0.025059	0.927473	0.029691	0.964835	0.019658	0.951648	0.016666	0.027440	0.002901
		DecisionTreeClassifier	0	0.933669	0.009354	3.6	0.894427	NaN	NaN	2.666667	1.154701	1.6	0.547723	...	0.925275	0.027362	0.949451	0.016666	0.958242	0.023824	0.923077	0.025772	0.023792	0.008042
		KNeighborsClassifier	6	0.931472	0.009943	NaN	NaN	4.0	0.0	2.000000	0.000000	8.4	2.073644	...	0.938462	0.016666	0.945055	0.010989	0.925275	0.014328	0.936264	0.026236	0.016603	0.005262
	SelectKBest	KNeighborsClassifier	1	0.929704	0.014237	NaN	NaN	2.0	0.0	NaN	NaN	6.6	2.302173	...	0.920879	0.043542	0.949451	0.029691	0.923077	0.034750	0.936264	0.032413	0.027202	0.008209
	RFE	DecisionTreeClassifier	1	0.929279	0.009507	2.8	1.095445	NaN	NaN	3.333333	1.154701	1.4	0.547723	...	0.920879	0.021138	0.949451	0.022787	0.947253	0.023824	0.934066	0.013459	0.024440	0.013629
	RFE	KNeighborsClassifier	9	0.928844	0.011586	NaN	NaN	4.0	0.0	2.000000	0.000000	8.6	1.673320	...	0.931868	0.018057	0.931868	0.018057	0.936264	0.004914	0.942857	0.012038	0.016126	0.004329
StandardScaler	SelectKBest	KNeighborsClassifier	1	0.927946	0.014775	NaN	NaN	2.0	0.0	NaN	NaN	7.4	1.816590	...	0.923077	0.047266	0.942857	0.034225	0.923077	0.034750	0.931868	0.025059	0.026160	0.008606
	RFE	KNeighborsClassifier	4	0.926613	0.012894	NaN	NaN	2.0	0.0	4.000000	0.000000	7.2	0.447214	...	0.934066	0.017375	0.912088	0.036446	0.942857	0.023824	0.936264	0.022521	0.025616	0.007075
	RFE	KNeighborsClassifier	1	0.923540	0.013857	NaN	NaN	2.0	0.0	4.000000	0.000000	8.0	0.707107	...	0.912088	0.021978	0.918681	0.029691	0.927473	0.033512	0.916484	0.034401	0.023453	0.009031
	SelectKBest	KNeighborsClassifier	4	0.923125	0.017392	NaN	NaN	2.0	0.0	NaN	NaN	7.8	0.836660	...	0.916484	0.027582	0.923077	0.033870	0.940659	0.024076	0.940659	0.034401	0.024832	0.005817
	SelectKBest	DecisionTreeClassifier	1	0.921366	0.006643	3.6	0.894427	NaN	NaN	NaN	NaN	1.4	0.547723	...	0.909890	0.028444	0.929670	0.019963	0.947253	0.023824	0.923077	0.013459	0.023829	0.006901
MinMaxScaler	SelectKBest	KNeighborsClassifier	4	0.920487	0.018497	NaN	NaN	2.0	0.0	NaN	NaN	8.2	1.095445	...	0.914286	0.029487	0.925275	0.031468	0.938462	0.022787	0.936264	0.034225	0.025134	0.003906
	SelectKBest	DecisionTreeClassifier	1	0.920038	0.026502	3.6	0.894427	NaN	NaN	NaN	NaN	1.2	0.447214	...	0.907692	0.019963	0.931868	0.021138	0.947253	0.026236	0.927473	0.036940	0.028544	0.010968
	RFE	KNeighborsClassifier	3	0.918724	0.013124	NaN	NaN	2.0	0.0	2.000000	0.000000	10.8	0.447214	...	0.916484	0.025299	0.920879	0.030493	0.912088	0.015541	0.936264	0.012038	0.016863	0.005096
StandardScaler	RFE	DecisionTreeClassifier	1	0.914792	0.011007	3.2	1.095445	NaN	NaN	2.000000	0.000000	1.4	0.547723	...	0.907692	0.035268	0.923077	0.032038	0.940659	0.026465	0.923077	0.023311	0.032693	0.010808
StandardScaler	SelectKBest	KNeighborsClassifier	6	0.913908	0.011346	NaN	NaN	4.0	0.0	NaN	NaN	9.0	0.707107	...	0.905495	0.053044	0.940659	0.019963	0.927473	0.016666	0.925275	0.012038	0.033909	0.013583
MinMaxScaler	SelectKBest	KNeighborsClassifier	6	0.912150	0.011100	NaN	NaN	4.0	0.0	NaN	NaN	8.8	0.447214	...	0.907692	0.044366	0.934066	0.013459	0.927473	0.012529	0.918681	0.012529	0.029312	0.014397
StandardScaler	RFE	DecisionTreeClassifier	0	0.908247	0.020262	2.0	0.000000	NaN	NaN	3.000000	1.414214	1.6	0.547723	...	0.883516	0.012529	0.931868	0.022521	0.962637	0.018388	0.923077	0.038852	0.047392	0.025852
MinMaxScaler	RFE	KNeighborsClassifier	0	0.906866	0.011413	NaN	NaN	2.0	0.0	2.000000	0.000000	12.0	0.000000	...	0.925275	0.034225	0.912088	0.010989	0.890110	0.032967	0.898901	0.027362	0.021153	0.012873
StandardScaler	SelectKBest	KNeighborsClassifier	9	0.906455	0.020163	NaN	NaN	4.0	0.0	NaN	NaN	9.2	1.303840	...	0.890110	0.050954	0.929670	0.021422	0.938462	0.012529	0.925275	0.021138	0.039798	0.014725
MinMaxScaler	SelectKBest	KNeighborsClassifier	9	0.903817	0.019961	NaN	NaN	4.0	0.0	NaN	NaN	9.4	1.341641	...	0.892308	0.052930	0.918681	0.022787	0.936264	0.009194	0.920879	0.021138	0.038196	0.015052
StandardScaler	RFE	KNeighborsClassifier	6	0.902494	0.022087	NaN	NaN	4.0	0.0	2.000000	0.000000	8.8	1.095445	...	0.896703	0.030691	0.931868	0.030493	0.923077	0.034750	0.892308	0.035946	0.031191	0.009188
StandardScaler	SelectKBest	DecisionTreeClassifier	0	0.902069	0.031317	3.2	1.095445	NaN	NaN	NaN	NaN	1.6	0.547723	...	0.898901	0.043542	0.945055	0.019034	0.929670	0.025299	0.901099	0.028017	0.040945	0.016336
MinMaxScaler	SelectKBest	DecisionTreeClassifier	0	0.902045	0.014146	3.2	1.095445	NaN	NaN	NaN	NaN	1.8	0.447214	...	0.903297	0.026236	0.929670	0.024076	0.912088	0.028017	0.912088	0.032038	0.033928	0.015318
StandardScaler	RFE	KNeighborsClassifier	9	0.897664	0.021655	NaN	NaN	4.0	0.0	2.000000	0.000000	9.8	0.447214	...	0.890110	0.043264	0.925275	0.036776	0.931868	0.023824	0.907692	0.026465	0.040752	0.007431
StandardScaler	SelectKBest	KNeighborsClassifier	3	0.889756	0.023370	NaN	NaN	2.0	0.0	NaN	NaN	11.2	0.447214	...	0.863736	0.059483	0.925275	0.030493	0.909890	0.014328	0.905495	0.019963	0.039302	0.010323
MinMaxScaler	SelectKBest	KNeighborsClassifier	3	0.889756	0.022528	NaN	NaN	2.0	0.0	NaN	NaN	11.4	0.547723	...	0.865934	0.061381	0.918681	0.033512	0.909890	0.014328	0.907692	0.016666	0.038451	0.010350
StandardScaler	RFE	KNeighborsClassifier	3	0.888887	0.021979	NaN	NaN	2.0	0.0	2.000000	0.000000	11.0	0.000000	...	0.885714	0.041555	0.916484	0.041555	0.905495	0.032599	0.905495	0.027582	0.040177	0.007849
StandardScaler	SelectKBest	KNeighborsClassifier	0	0.888404	0.018839	NaN	NaN	2.0	0.0	NaN	NaN	11.6	0.547723	...	0.859341	0.061381	0.912088	0.021978	0.887912	0.018057	0.892308	0.031468	0.031946	0.010743
MinMaxScaler	SelectKBest	KNeighborsClassifier	0	0.888404	0.020371	NaN	NaN	2.0	0.0	NaN	NaN	11.6	0.547723	...	0.857143	0.065475	0.916484	0.026465	0.885714	0.018388	0.890110	0.032038	0.033604	0.014043
StandardScaler	RFE	KNeighborsClassifier	0	0.877000	0.027891	NaN	NaN	2.0	0.0	2.000000	0.000000	12.0	0.000000	...	0.872527	0.048277	0.901099	0.040376	0.879121	0.030095	0.872527	0.035268	0.023341	0.008768

56 rows × 22 columns

Best Hyperparameter Pipeline¶

This returns the pipeline with the hyperparameter combination that achieved the highest average test score over all outer CV folds (i.e., the parameter combination in the first row of mean_pipeline_score_results()).

[11]:

pipeline_permuter.best_hyperparameter_pipeline()

[11]:

	mean_test_accuracy	param_clf__n_neighbors	param_clf__weights	param_reduce_dim__k	params	rank_test_accuracy	split0_test_accuracy	split1_test_accuracy	split2_test_accuracy	split3_test_accuracy	split4_test_accuracy	std_test_accuracy
outer_fold
0	0.953846	4.0	distance	all	{'clf__n_neighbors': 4, 'clf__weights': 'dista...	2	0.956044	0.923077	0.967033	0.967033	0.956044	0.016150
1	0.956044	4.0	distance	all	{'clf__n_neighbors': 4, 'clf__weights': 'dista...	1	0.945055	0.956044	0.978022	0.989011	0.912088	0.026917
2	0.962637	4.0	distance	all	{'clf__n_neighbors': 4, 'clf__weights': 'dista...	1	0.945055	0.978022	0.967033	0.989011	0.934066	0.020382
3	0.958242	4.0	distance	all	{'clf__n_neighbors': 4, 'clf__weights': 'dista...	1	0.945055	0.967033	0.945055	0.978022	0.956044	0.012815
4	0.964978	4.0	distance	all	{'clf__n_neighbors': 4, 'clf__weights': 'dista...	1	0.934783	0.978022	0.945055	0.978022	0.989011	0.021102
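To connect this output to mean_pipeline_score_results(), the sketch below averages the mean_test_accuracy column over the five outer folds. Using the sample standard deviation (ddof=1, the pandas default) is an assumption about how the std column of that table is computed. It is chosen here because it reproduces the values in the table's first row.

```python
from statistics import fmean, stdev

# mean_test_accuracy per outer fold, copied from the output above.
mean_test_accuracy = [0.953846, 0.956044, 0.962637, 0.958242, 0.964978]

mean_over_folds = fmean(mean_test_accuracy)
std_over_folds = stdev(mean_test_accuracy)  # sample standard deviation (ddof=1)

print(f"mean over outer folds: {mean_over_folds:.6f}")
print(f"std over outer folds:  {std_over_folds:.6f}")
```

Both values match the mean_test_accuracy entries (mean and std) of the first row of mean_pipeline_score_results() up to rounding.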

Regression¶

Load Example Dataset¶

[12]:

diabetes_data = load_diabetes()
X_reg = diabetes_data.data
y_reg = diabetes_data.target

Specify Estimator Combinations and Parameters for Hyperparameter Search¶

[13]:

model_dict_reg = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVR(kernel="linear", C=1))},
    "clf": {
        "KNeighborsRegressor": KNeighborsRegressor(),
        "DecisionTreeRegressor": DecisionTreeRegressor(),
        # "SVR": SVR(),
        # "AdaBoostRegressor": AdaBoostRegressor(),
    },
}

[14]:

params_dict_reg = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4]},
    "KNeighborsRegressor": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeRegressor": {"max_depth": [2, 4]},
    # "SVR": [
    #    {
    #        "kernel": ["linear"],
    #        "C": np.logspace(start=-2, stop=2, num=5)
    #    },
    #    {
    #        "kernel": ["rbf"],
    #        "C": np.logspace(start=-2, stop=2, num=5),
    #        "gamma": np.logspace(start=-2, stop=2, num=5)
    #    }
    # ],
    # "AdaBoostRegressor": {
    #    "base_estimator": [DecisionTreeRegressor(max_depth=1)],
    #    "n_estimators": np.arange(20, 110, 10),
    #    "learning_rate": np.arange(0.6, 1.1, 0.1)
    # },
}


# use randomized-search for decision tree regressor, use grid-search (the default) for all other estimators
hyper_search_dict_reg = {"DecisionTreeRegressor": {"search_method": "random", "n_iter": 2}}

Setup PipelinePermuter and Cross-Validations for Model Evaluation¶

Note: For further information please visit the documentation of SklearnPipelinePermuter.

[15]:

pipeline_permuter_regression = SklearnPipelinePermuter(
    model_dict_reg, params_dict_reg, hyper_search_dict=hyper_search_dict_reg
)
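The regression fit in the next cell passes scoring="r2", the coefficient of determination, defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it in plain Python on made-up numbers. r2_score_sketch is a hypothetical helper for illustration only; for real use, scikit-learn provides sklearn.metrics.r2_score.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, not taken from the diabetes dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

good_fit = r2_score_sketch(y_true, y_pred)        # predictions track the targets closely
mean_baseline = r2_score_sketch(y_true, [6.0] * 4)  # always predicting the mean of y_true

print(good_fit)
print(mean_baseline)
```

Unlike accuracy, this score is not bounded below by 0: a model that predicts worse than the mean of the targets gets a negative value.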

[16]:

outer_cv = KFold(5)
inner_cv = KFold(5)

pipeline_permuter_regression.fit(X_reg, y_reg, outer_cv=outer_cv, inner_cv=inner_cv, scoring="r2")

### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

Fitting 5 folds for each of 2 candidates, totalling 10 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 8 candidates, totalling 40 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
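The repeated "Fitting 5 folds for each of 12 candidates" lines in the output above come from the nested cross-validation: each of the 5 outer folds runs its own inner 5-fold search. The following sketch reproduces that bookkeeping in plain Python. It is not the library's code; `kfold_indices` is a hypothetical helper that mimics unshuffled K-fold splitting, and `n_samples = 100` is an arbitrary placeholder rather than the size of the dataset used here.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    base, remainder = divmod(n_samples, n_splits)
    # the first `remainder` folds receive one extra sample
    fold_sizes = [base + (1 if i < remainder else 0) for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


n_samples = 100    # placeholder dataset size (assumption)
n_candidates = 12  # grid size reported for the SelectKBest + KNeighborsRegressor pipeline

total_inner_fits = 0
for outer_train, outer_test in kfold_indices(n_samples, 5):
    # the hyperparameter search only sees the outer training portion
    inner_fits = 0
    for inner_train, inner_val in kfold_indices(len(outer_train), 5):
        inner_fits += n_candidates  # one fit per candidate and inner fold
    total_inner_fits += inner_fits  # 60 fits per outer fold

print(total_inner_fits)  # 300 inner fits for one pipeline combination
```

The outer test fold is never used during the inner search, which is what keeps the reported scores independent of the hyperparameter selection.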

Display Results¶

This works analogously to the classification example.

Further Functions¶

Export Results as LaTeX Table¶

[17]:

print(pipeline_permuter.metric_summary_to_latex())

\begin{table}[ht!]
\centering
\sisetup{table-format = 2.1(2)}

\begin{tabular}{lllS}
\toprule
{} & {} & {} & {\makecell{Accuracy [\%]}} \\
{Scaler} & {\makecell[lc]{Feature\\ Selection}} & {Classifier} & {} \\
\midrule
\multirow[c]{4}{*}{Standard} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.8) \\
 &  & DT & 91.2(3.7) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.1(1.4) \\
 &  & DT & 91.2(6.9) \\
\cline{1-4} \cline{2-4}
\multirow[c]{4}{*}{Min-Max} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.2) \\
 &  & DT & 90.5(3.6) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.0(2.1) \\
 &  & DT & 93.0(3.6) \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{table}
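The exported table relies on several LaTeX packages: `booktabs` for the rules, `multirow`, `makecell`, and `siunitx` for `\sisetup` and the `S` column. As a rough sketch, the string can be wrapped in a minimal document for a quick compile check. The helper `wrap_latex_table` is hypothetical and not part of BioPsyKit, the table below is a shortened stand-in for the real export, and whether the uncertainty notation compiles may depend on the installed siunitx version.

```python
def wrap_latex_table(table_tex: str) -> str:
    """Wrap a LaTeX table snippet in a minimal document (hypothetical helper)."""
    preamble = "\n".join(
        [
            r"\documentclass{article}",
            r"\usepackage{booktabs}  % \toprule, \midrule, \bottomrule",
            r"\usepackage{multirow}  % \multirow",
            r"\usepackage{makecell}  % \makecell",
            r"\usepackage{siunitx}   % \sisetup and the S column type",
            r"\begin{document}",
        ]
    )
    return preamble + "\n" + table_tex.strip() + "\n" + r"\end{document}" + "\n"


# Shortened stand-in for the string returned by metric_summary_to_latex()
table_tex = r"""
\begin{table}[ht!]
\centering
\begin{tabular}{lS}
\toprule
{Classifier} & {Accuracy [\%]} \\
\midrule
kNN & 96.0(1.8) \\
\bottomrule
\end{tabular}
\end{table}
"""

document = wrap_latex_table(table_tex)
print(document)
```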

Save and Load `PipelinePermuter` results¶

Save to Pickle File¶

[18]:

pipeline_permuter.to_pickle(tmpdir.joinpath("test.pkl"))

Load from Pickle File¶

[19]:

pipeline_permuter_load = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("test.pkl"))

Fit pipeline combinations and save intermediate results¶

This saves the current state to `file_path` after each successfully evaluated pipeline combination.

[20]:

pipeline_permuter.fit_and_save_intermediate(
    X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv, file_path=tmpdir.joinpath("test.pkl")
)

Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
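The skip messages above suggest a checkpointing pattern: results are keyed by pipeline combination, written to disk after every evaluation, and combinations that already have a result are not evaluated again. The sketch below illustrates this idea with the standard library only. It is an assumption about the general approach, not the actual implementation; `fit_with_checkpoints` and `dummy_evaluate` are hypothetical names.

```python
import itertools
import pickle
import tempfile
from pathlib import Path


def fit_with_checkpoints(combinations, evaluate, file_path):
    """Evaluate each combination once and persist the results after every step."""
    file_path = Path(file_path)
    results = pickle.loads(file_path.read_bytes()) if file_path.exists() else {}
    for combo in combinations:
        if combo in results:
            print(f"Skipping {combo} since this combination was already fitted!")
            continue
        results[combo] = evaluate(combo)
        file_path.write_bytes(pickle.dumps(results))  # checkpoint after each combination
    return results


combos = list(itertools.product(["StandardScaler", "MinMaxScaler"], ["SelectKBest", "RFE"]))
calls = []


def dummy_evaluate(combo):
    """Stand-in for an expensive nested cross-validation run."""
    calls.append(combo)
    return len(calls)


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.pkl"
    first = fit_with_checkpoints(combos[:2], dummy_evaluate, checkpoint)  # simulates an interrupted run
    second = fit_with_checkpoints(combos, dummy_evaluate, checkpoint)     # resumes, skipping two combinations

print(len(calls))  # 4: every combination was evaluated exactly once
```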

Merge multiple `PipelinePermuter` instances¶

If the evaluation of different classification pipelines has to be split (e.g., because of long runtimes), the PipelinePermuter instances can be saved separately and later merged back into one joint PipelinePermuter instance.

The following minimal working example consists of these steps:

* Initializing, fitting, and saving different PipelinePermuter instances
* Loading the saved PipelinePermuter instances from disk
* Merging multiple PipelinePermuter instances into one instance for joint evaluation

Load Example Dataset¶

[21]:

breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

Fit and Save Different `PipelinePermuter` instances¶

[22]:

model_dict_01 = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_01 = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_01 = SklearnPipelinePermuter(model_dict_01, params_dict_01, random_state=42)

pipeline_permuter_01.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_01.to_pickle(tmpdir.joinpath("permuter_01.pkl"))

[23]:

model_dict_02 = {
    "scaler": {"MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_02 = {
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_02 = SklearnPipelinePermuter(model_dict_02, params_dict_02, random_state=42)

pipeline_permuter_02.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_02.to_pickle(tmpdir.joinpath("permuter_02.pkl"))

[24]:

model_dict_03 = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}
params_dict_03 = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

pipeline_permuter_03 = SklearnPipelinePermuter(model_dict_03, params_dict_03, random_state=42)

pipeline_permuter_03.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_03.to_pickle(tmpdir.joinpath("permuter_03.pkl"))

Load and Merge `PipelinePermuter` instances¶

[25]:

permuter_file_list = sorted(tmpdir.glob("permuter_*.pkl"))
print(permuter_file_list)

[PosixPath('tmpdir/permuter_01.pkl'), PosixPath('tmpdir/permuter_02.pkl'), PosixPath('tmpdir/permuter_03.pkl')]

[26]:

permuter_list = [SklearnPipelinePermuter.from_pickle(p) for p in permuter_file_list]
permuter_list

[26]:

[<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f704903d0>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70490b80>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70509cd0>]

[27]:

merged_permuter = SklearnPipelinePermuter.merge_permuter_instances(permuter_list)

Double-check that the permuters were merged correctly:

[28]:

for p in permuter_list:
    display(p.best_estimator_summary())

			best_estimator
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
StandardScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...

			best_estimator
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
MinMaxScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...

			best_estimator
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
StandardScaler	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...

[29]:

merged_permuter.best_estimator_summary()

[29]:

			best_estimator
pipeline_scaler	pipeline_reduce_dim	pipeline_clf
StandardScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	SelectKBest	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	RFE	KNeighborsClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	SelectKBest	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler	RFE	DecisionTreeClassifier	[Pipeline(memory=Memory(location=cachedir/jobl...
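Conceptually, merging amounts to taking the union of the fitted pipeline combinations. The sketch below models each permuter as a plain dictionary keyed by (scaler, feature selection, classifier). It only illustrates the idea: `merge_results` is a hypothetical helper, and raising an error on overlapping combinations is a design choice of this sketch, not documented behaviour of `merge_permuter_instances`.

```python
def merge_results(result_dicts):
    """Merge per-permuter result dictionaries into one (hypothetical helper)."""
    merged = {}
    for results in result_dicts:
        overlap = merged.keys() & results.keys()
        if overlap:
            raise ValueError(f"Combinations fitted more than once: {sorted(overlap)}")
        merged.update(results)
    return merged


# Placeholder values stand in for the fitted results of each combination
results_01 = {
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"): "fitted",
    ("StandardScaler", "RFE", "KNeighborsClassifier"): "fitted",
}
results_02 = {
    ("MinMaxScaler", "SelectKBest", "KNeighborsClassifier"): "fitted",
    ("MinMaxScaler", "RFE", "KNeighborsClassifier"): "fitted",
}
results_03 = {
    ("StandardScaler", "SelectKBest", "DecisionTreeClassifier"): "fitted",
    ("StandardScaler", "RFE", "DecisionTreeClassifier"): "fitted",
    ("MinMaxScaler", "SelectKBest", "DecisionTreeClassifier"): "fitted",
    ("MinMaxScaler", "RFE", "DecisionTreeClassifier"): "fitted",
}

merged = merge_results([results_01, results_02, results_03])
print(len(merged))  # 8 combinations: 2 + 2 + 4
```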

Update partially fitted `SklearnPipelinePermuter` with additional Parameters¶

For this example, we first run an experiment with a partial hyperparameter set and save the resulting object as a pickle file. In the next step, we load it, update the parameter sets, and continue the experiment. This is useful for incremental experiments because it avoids running several separate experiments and merging the resulting SklearnPipelinePermuter instances afterwards.

[30]:

model_dict_partial = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_partial = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_partial = SklearnPipelinePermuter(model_dict_partial, params_dict_partial, random_state=42)

pipeline_permuter_partial.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
pipeline_permuter_partial.to_pickle(tmpdir.joinpath("permuter_partial.pkl"))

### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

[31]:

model_dict_total = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}

params_dict_total = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

[32]:

pipeline_permuter_total = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("permuter_partial.pkl"))
pipeline_permuter_total = pipeline_permuter_total.update_permuter(model_dict_total, params_dict_total)

[33]:

pipeline_permuter_total.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))

### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits



### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits

Fitting 5 folds for each of 12 candidates, totalling 60 fits
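The output above shows two skipped combinations and six newly fitted ones. That count can be checked with a small sketch that enumerates the pipeline combinations of both model dictionaries and takes the difference. Estimator instances are replaced by their names here, and `pipeline_combinations` is a hypothetical helper, not a BioPsyKit function.

```python
import itertools


def pipeline_combinations(model_names):
    """List all (step, estimator) combinations, one estimator per pipeline step."""
    steps = list(model_names)
    return [tuple(zip(steps, choice)) for choice in itertools.product(*model_names.values())]


partial = {
    "scaler": ["StandardScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier"],
}
total = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

already_fitted = set(pipeline_combinations(partial))
all_combinations = pipeline_combinations(total)
remaining = [combo for combo in all_combinations if combo not in already_fitted]

print(len(all_combinations))  # 8 combinations in the updated permuter
print(len(remaining))         # 6 combinations still need to be fitted
```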

Cleanup¶

[34]:

rmtree(tmpdir)
