Sklearn Pipeline Permuter Example

This example shows how to systematically evaluate different machine learning pipelines.

This is useful, for instance, when you want to evaluate combinations of different feature selection methods and different estimators in a single step.

Imports and Helper Functions

[1]:
from pathlib import Path
from shutil import rmtree

# Utils
from sklearn.datasets import load_breast_cancer, load_diabetes

# Preprocessing & Feature Selection
from sklearn.feature_selection import RFE, SelectKBest

# Cross-Validation
from sklearn.model_selection import KFold

# Classification
# Regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

from biopsykit.classification.model_selection import SklearnPipelinePermuter

%load_ext autoreload
%autoreload 2

Classification

Create temporary directory

[2]:
tmpdir = Path("tmpdir")
tmpdir.mkdir(exist_ok=True)

Load Example Dataset

[3]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
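
The cells that define `model_dict`, `params_dict`, and `hyper_search_dict` for the classification example are not shown here. The sketch below reconstructs them from the parameter grids printed in the fit log of this example and from the structure of the regression section. The base estimator passed to `RFE` (`SVC(kernel="linear", C=1)`) is an assumption made by analogy with the regression example, which uses `SVR(kernel="linear", C=1)`; the original cell may differ.

```python
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One entry per pipeline step; every combination of the listed estimators is evaluated.
model_dict = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {
        "SelectKBest": SelectKBest(),
        # Assumption: linear SVC as RFE base estimator (mirrors the regression example).
        "RFE": RFE(SVC(kernel="linear", C=1)),
    },
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}

# Hyperparameter values as they appear in the printed parameter grids.
params_dict = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

# Randomized search for the decision tree, grid search (the default) for everything else.
hyper_search_dict = {"DecisionTreeClassifier": {"search_method": "random", "n_iter": 2}}
```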

Setup PipelinePermuter and Cross-Validations for Model Evaluation

Note: For further information please visit the documentation of SklearnPipelinePermuter.

[6]:
pipeline_permuter = SklearnPipelinePermuter(
    model_dict, params_dict, hyper_search_dict=hyper_search_dict, random_state=42
)

outer_cv = KFold(5)
inner_cv = KFold(5)
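
To make the roles of `outer_cv` and `inner_cv` concrete, here is a minimal nested cross-validation sketch written with plain scikit-learn for a single pipeline. It is not the BioPsyKit implementation, only an illustration of the general scheme under that assumption: the inner CV selects hyperparameters on each outer training split, and the outer CV scores the selected model on data it has not seen.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# A single fixed pipeline; the permuter would build one of these per combination.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", KNeighborsClassifier())])
param_grid = {"clf__n_neighbors": [2, 4], "clf__weights": ["uniform", "distance"]}

inner_cv = KFold(5)  # used for hyperparameter selection
outer_cv = KFold(5)  # used for performance estimation

# The grid search is refit on every outer training split (inner loop) ...
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")
# ... and evaluated on the corresponding outer test split (outer loop).
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")

print(outer_scores.mean(), outer_scores.std())
```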

Fit all Parameter Combinations

[7]:
pipeline_permuter.fit(X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv)
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


Display Results

Metric Summary for Classification Pipelines

A summary of all relevant metrics (performance scores, confusion matrix, true and predicted labels) for each evaluated pipeline combination, computed from the best-performing pipeline of each outer fold (i.e., the best_pipeline() result of each inner CV object).

[8]:
pipeline_permuter.metric_summary()
[8]:
conf_matrix conf_matrix_folds true_labels true_labels_folds predicted_labels predicted_labels_folds train_indices train_indices_folds test_indices test_indices_folds mean_test_accuracy std_test_accuracy test_accuracy_fold_0 test_accuracy_fold_1 test_accuracy_fold_2 test_accuracy_fold_3 test_accuracy_fold_4
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler SelectKBest KNeighborsClassifier [195, 17, 6, 351] [[62, 6, 1, 45], [46, 3, 1, 64], [35, 5, 0, 74... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959556 0.018127 0.938596 0.964912 0.956140 0.991228 0.946903
DecisionTreeClassifier [184, 28, 22, 335] [[54, 14, 2, 44], [43, 6, 4, 61], [37, 3, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.912079 0.037361 0.859649 0.912281 0.964912 0.938596 0.884956
RFE KNeighborsClassifier [201, 11, 11, 346] [[64, 4, 0, 46], [47, 2, 5, 60], [37, 3, 1, 73... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.961326 0.014282 0.964912 0.938596 0.964912 0.982456 0.955752
DecisionTreeClassifier [177, 35, 15, 342] [[43, 25, 0, 46], [46, 3, 6, 59], [38, 2, 2, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, ... [[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.912141 0.069276 0.780702 0.921053 0.964912 0.973684 0.920354
MinMaxScaler SelectKBest KNeighborsClassifier [198, 14, 9, 348] [[63, 5, 2, 44], [46, 3, 1, 64], [36, 4, 0, 74... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959571 0.011923 0.938596 0.964912 0.964912 0.973684 0.955752
DecisionTreeClassifier [182, 30, 24, 333] [[54, 14, 2, 44], [40, 9, 6, 59], [36, 4, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.905123 0.036065 0.859649 0.868421 0.956140 0.921053 0.920354
RFE KNeighborsClassifier [199, 13, 10, 347] [[61, 7, 2, 44], [47, 2, 3, 62], [37, 3, 1, 73... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959603 0.021168 0.921053 0.956140 0.964912 0.982456 0.973451
DecisionTreeClassifier [186, 26, 14, 343] [[54, 14, 2, 44], [44, 5, 3, 62], [36, 4, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.929731 0.036335 0.859649 0.929825 0.956140 0.956140 0.946903
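
As a sanity check on how the aggregate columns relate to the per-fold columns, the values of the first row (StandardScaler, SelectKBest, KNeighborsClassifier) can be recomputed by hand. The numbers below are copied from that row. Reading `conf_matrix` as a row-major flattened 2x2 matrix is an assumption; it is consistent with the class counts of the breast cancer dataset (212 and 357 samples).

```python
import numpy as np

# Per-fold test accuracies of the first row of the table.
fold_accuracy = np.array([0.938596, 0.964912, 0.956140, 0.991228, 0.946903])

mean_accuracy = fold_accuracy.mean()
# The reported std matches the population standard deviation (ddof=0).
std_accuracy = fold_accuracy.std(ddof=0)

# Confusion matrix summed over all folds (assumed row-major: rows = true class).
conf_matrix = np.array([195, 17, 6, 351]).reshape(2, 2)
pooled_accuracy = np.trace(conf_matrix) / conf_matrix.sum()

print(round(mean_accuracy, 6), round(std_accuracy, 6), round(pooled_accuracy, 6))
```

The pooled accuracy from the summed confusion matrix differs slightly from `mean_test_accuracy` because the outer folds do not all contain the same number of samples.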

List of Pipeline objects for the best pipeline for each evaluated pipeline combination.

[9]:
pipeline_permuter.best_estimator_summary()
[9]:
best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler SelectKBest KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
RFE KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler SelectKBest KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
RFE KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...

Mean Performance Scores for Individual Hyperparameter Combinations

The performance scores for each pipeline and each hyperparameter combination, averaged over all outer CV folds, as returned by SklearnPipelinePermuter.mean_pipeline_score_results().

NOTE:
* The pipelines summarized here do not necessarily correspond to the best-performing pipelines returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because those are determined by averaging over the best_estimator instances, as selected by scikit-learn, of all folds. The best_estimator instances can therefore each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.

[10]:
pipeline_permuter.mean_pipeline_score_results()
[10]:
mean_test_accuracy rank_test_accuracy split0_test_accuracy split1_test_accuracy split2_test_accuracy split3_test_accuracy split4_test_accuracy std_test_accuracy
mean std mean std mean std mean std mean std mean std mean std mean std
pipeline_scaler pipeline_reduce_dim pipeline_clf parameter_combination_id
StandardScaler SelectKBest KNeighborsClassifier 11 0.959150 0.004603 1.2 0.447214 0.945198 0.007520 0.960440 0.022787 0.960440 0.014743 0.980220 0.009194 0.949451 0.028656 0.019473 0.005345
MinMaxScaler SelectKBest KNeighborsClassifier 11 0.959150 0.001991 1.2 0.447214 0.938605 0.005895 0.951648 0.012529 0.967033 0.017375 0.986813 0.004914 0.951648 0.016666 0.019545 0.004257
RFE KNeighborsClassifier 11 0.958705 0.005482 1.2 0.447214 0.947372 0.004855 0.953846 0.009194 0.962637 0.016666 0.967033 0.010989 0.962637 0.016666 0.012098 0.002635
8 0.956507 0.006313 2.0 1.000000 0.947372 0.009162 0.953846 0.009194 0.967033 0.015541 0.958242 0.014328 0.956044 0.021978 0.013115 0.003993
StandardScaler RFE KNeighborsClassifier 11 0.956073 0.007155 1.6 0.547723 0.940803 0.016598 0.958242 0.023824 0.958242 0.009194 0.971429 0.012529 0.951648 0.036114 0.020734 0.003697
SelectKBest KNeighborsClassifier 8 0.955198 0.005524 2.0 0.707107 0.938629 0.012276 0.960440 0.014743 0.956044 0.017375 0.978022 0.010989 0.942857 0.025059 0.019446 0.004951
MinMaxScaler SelectKBest KNeighborsClassifier 8 0.953430 0.007843 2.6 1.949359 0.940779 0.009899 0.947253 0.009194 0.964835 0.014328 0.978022 0.013459 0.936264 0.025059 0.019030 0.005669
StandardScaler RFE KNeighborsClassifier 5 0.952117 0.005728 2.8 0.836660 0.945198 0.015417 0.956044 0.017375 0.951648 0.014743 0.958242 0.014328 0.949451 0.032599 0.017652 0.003116
8 0.951672 0.007497 2.8 1.303840 0.942977 0.016322 0.953846 0.016299 0.958242 0.009194 0.956044 0.019034 0.947253 0.039925 0.019830 0.002716
MinMaxScaler RFE KNeighborsClassifier 7 0.950354 0.007385 3.2 1.483240 0.925394 0.023949 0.951648 0.024076 0.953846 0.021138 0.960440 0.006019 0.960440 0.012529 0.018607 0.009022
SelectKBest KNeighborsClassifier 5 0.949474 0.003494 3.6 0.547723 0.938581 0.006186 0.949451 0.018388 0.953846 0.026236 0.969231 0.012038 0.936264 0.030493 0.020818 0.007043
StandardScaler RFE KNeighborsClassifier 2 0.949470 0.007481 3.6 1.816590 0.958337 0.009168 0.956044 0.015541 0.958242 0.018057 0.938462 0.022787 0.936264 0.028444 0.017660 0.008590
MinMaxScaler RFE KNeighborsClassifier 5 0.947286 0.008802 4.2 1.303840 0.929838 0.009695 0.945055 0.023311 0.960440 0.009829 0.964835 0.014328 0.936264 0.028444 0.019275 0.005979
StandardScaler SelectKBest KNeighborsClassifier 5 0.946398 0.006357 3.8 0.836660 0.936383 0.014422 0.953846 0.012038 0.949451 0.026465 0.958242 0.009194 0.934066 0.026917 0.017432 0.008160
MinMaxScaler RFE KNeighborsClassifier 10 0.945079 0.011841 5.6 2.073644 0.907812 0.031836 0.940659 0.022787 0.947253 0.021138 0.969231 0.012038 0.960440 0.012529 0.025049 0.009478
2 0.944195 0.009172 5.2 1.643168 0.938557 0.012698 0.949451 0.019963 0.971429 0.006019 0.938462 0.027582 0.923077 0.026917 0.022622 0.003527
SelectKBest KNeighborsClassifier 2 0.944190 0.007720 4.8 1.303840 0.947324 0.014452 0.945055 0.021978 0.951648 0.027582 0.962637 0.018388 0.914286 0.026236 0.023213 0.009375
StandardScaler SelectKBest KNeighborsClassifier 2 0.943311 0.009131 5.4 1.949359 0.958313 0.009281 0.949451 0.018388 0.951648 0.027582 0.962637 0.012529 0.894505 0.032599 0.029359 0.009596
MinMaxScaler SelectKBest KNeighborsClassifier 7 0.942451 0.013020 3.8 1.303840 0.905662 0.016889 0.938462 0.036114 0.962637 0.012529 0.953846 0.014328 0.951648 0.018388 0.024840 0.003868
StandardScaler RFE KNeighborsClassifier 7 0.941128 0.015268 4.0 2.000000 0.923220 0.031147 0.945055 0.021978 0.953846 0.018057 0.942857 0.035946 0.940659 0.031659 0.021137 0.012182
SelectKBest KNeighborsClassifier 7 0.938939 0.011950 3.8 1.303840 0.905686 0.012662 0.931868 0.029487 0.956044 0.019034 0.951648 0.016666 0.949451 0.016666 0.022968 0.004169
RFE KNeighborsClassifier 10 0.938930 0.016775 4.8 1.643168 0.916627 0.024208 0.945055 0.010989 0.931868 0.023824 0.956044 0.032038 0.945055 0.036446 0.022850 0.004537
MinMaxScaler RFE KNeighborsClassifier 1 0.936732 0.007675 7.2 2.049390 0.925418 0.016399 0.945055 0.038067 0.936264 0.028444 0.938462 0.012529 0.938462 0.016666 0.021214 0.004668
StandardScaler SelectKBest KNeighborsClassifier 10 0.936302 0.011498 4.8 1.095445 0.892499 0.016592 0.920879 0.033331 0.949451 0.009829 0.967033 0.007770 0.951648 0.012529 0.028909 0.006367
MinMaxScaler SelectKBest KNeighborsClassifier 10 0.934988 0.013670 5.6 1.140175 0.883731 0.020204 0.923077 0.025772 0.951648 0.012529 0.964835 0.012038 0.951648 0.016666 0.030625 0.004991
RFE KNeighborsClassifier 4 0.934553 0.014500 7.2 2.280351 0.896942 0.016556 0.931868 0.025059 0.927473 0.029691 0.964835 0.019658 0.951648 0.016666 0.027440 0.002901
DecisionTreeClassifier 0 0.933669 0.009354 1.6 0.547723 0.912303 0.013246 0.925275 0.027362 0.949451 0.016666 0.958242 0.023824 0.923077 0.025772 0.023792 0.008042
KNeighborsClassifier 6 0.931472 0.009943 8.4 2.073644 0.912303 0.015357 0.938462 0.016666 0.945055 0.010989 0.925275 0.014328 0.936264 0.026236 0.016603 0.005262
SelectKBest KNeighborsClassifier 1 0.929704 0.014237 6.6 2.302173 0.918849 0.009929 0.920879 0.043542 0.949451 0.029691 0.923077 0.034750 0.936264 0.032413 0.027202 0.008209
RFE DecisionTreeClassifier 1 0.929279 0.009507 1.4 0.547723 0.894744 0.029651 0.920879 0.021138 0.949451 0.022787 0.947253 0.023824 0.934066 0.013459 0.024440 0.013629
KNeighborsClassifier 9 0.928844 0.011586 8.6 1.673320 0.901362 0.016965 0.931868 0.018057 0.931868 0.018057 0.936264 0.004914 0.942857 0.012038 0.016126 0.004329
StandardScaler SelectKBest KNeighborsClassifier 1 0.927946 0.014775 7.4 1.816590 0.918849 0.006181 0.923077 0.047266 0.942857 0.034225 0.923077 0.034750 0.931868 0.025059 0.026160 0.008606
RFE KNeighborsClassifier 4 0.926613 0.012894 7.2 0.447214 0.907788 0.035432 0.934066 0.017375 0.912088 0.036446 0.942857 0.023824 0.936264 0.022521 0.025616 0.007075
1 0.923540 0.013857 8.0 0.707107 0.942977 0.019676 0.912088 0.021978 0.918681 0.029691 0.927473 0.033512 0.916484 0.034401 0.023453 0.009031
SelectKBest KNeighborsClassifier 4 0.923125 0.017392 7.8 0.836660 0.894744 0.005821 0.916484 0.027582 0.923077 0.033870 0.940659 0.024076 0.940659 0.034401 0.024832 0.005817
DecisionTreeClassifier 1 0.921366 0.006643 1.4 0.547723 0.896942 0.014619 0.909890 0.028444 0.929670 0.019963 0.947253 0.023824 0.923077 0.013459 0.023829 0.006901
MinMaxScaler SelectKBest KNeighborsClassifier 4 0.920487 0.018497 8.2 1.095445 0.888151 0.009281 0.914286 0.029487 0.925275 0.031468 0.938462 0.022787 0.936264 0.034225 0.025134 0.003906
DecisionTreeClassifier 1 0.920038 0.026502 1.2 0.447214 0.885905 0.059082 0.907692 0.019963 0.931868 0.021138 0.947253 0.026236 0.927473 0.036940 0.028544 0.010968
RFE KNeighborsClassifier 3 0.918724 0.013124 10.8 0.447214 0.907907 0.014631 0.916484 0.025299 0.920879 0.030493 0.912088 0.015541 0.936264 0.012038 0.016863 0.005096
StandardScaler RFE DecisionTreeClassifier 1 0.914792 0.011007 1.4 0.547723 0.879455 0.041521 0.907692 0.035268 0.923077 0.032038 0.940659 0.026465 0.923077 0.023311 0.032693 0.010808
SelectKBest KNeighborsClassifier 6 0.913908 0.011346 9.0 0.707107 0.870640 0.037458 0.905495 0.053044 0.940659 0.019963 0.927473 0.016666 0.925275 0.012038 0.033909 0.013583
MinMaxScaler SelectKBest KNeighborsClassifier 6 0.912150 0.011100 8.8 0.447214 0.872838 0.040682 0.907692 0.044366 0.934066 0.013459 0.927473 0.012529 0.918681 0.012529 0.029312 0.014397
StandardScaler RFE DecisionTreeClassifier 0 0.908247 0.020262 1.6 0.547723 0.840134 0.075407 0.883516 0.012529 0.931868 0.022521 0.962637 0.018388 0.923077 0.038852 0.047392 0.025852
MinMaxScaler RFE KNeighborsClassifier 0 0.906866 0.011413 12.0 0.000000 0.907955 0.016103 0.925275 0.034225 0.912088 0.010989 0.890110 0.032967 0.898901 0.027362 0.021153 0.012873
StandardScaler SelectKBest KNeighborsClassifier 9 0.906455 0.020163 9.2 1.303840 0.848758 0.049647 0.890110 0.050954 0.929670 0.021422 0.938462 0.012529 0.925275 0.021138 0.039798 0.014725
MinMaxScaler SelectKBest KNeighborsClassifier 9 0.903817 0.019961 9.4 1.341641 0.850956 0.052693 0.892308 0.052930 0.918681 0.022787 0.936264 0.009194 0.920879 0.021138 0.038196 0.015052
StandardScaler RFE KNeighborsClassifier 6 0.902494 0.022087 8.8 1.095445 0.868514 0.036708 0.896703 0.030691 0.931868 0.030493 0.923077 0.034750 0.892308 0.035946 0.031191 0.009188
SelectKBest DecisionTreeClassifier 0 0.902069 0.031317 1.6 0.547723 0.835619 0.065100 0.898901 0.043542 0.945055 0.019034 0.929670 0.025299 0.901099 0.028017 0.040945 0.016336
MinMaxScaler SelectKBest DecisionTreeClassifier 0 0.902045 0.014146 1.8 0.447214 0.853082 0.045653 0.903297 0.026236 0.929670 0.024076 0.912088 0.028017 0.912088 0.032038 0.033928 0.015318
StandardScaler RFE KNeighborsClassifier 9 0.897664 0.021655 9.8 0.447214 0.833373 0.027008 0.890110 0.043264 0.925275 0.036776 0.931868 0.023824 0.907692 0.026465 0.040752 0.007431
SelectKBest KNeighborsClassifier 3 0.889756 0.023370 11.2 0.447214 0.844386 0.045090 0.863736 0.059483 0.925275 0.030493 0.909890 0.014328 0.905495 0.019963 0.039302 0.010323
MinMaxScaler SelectKBest KNeighborsClassifier 3 0.889756 0.022528 11.4 0.547723 0.846584 0.044116 0.865934 0.061381 0.918681 0.033512 0.909890 0.014328 0.907692 0.016666 0.038451 0.010350
StandardScaler RFE KNeighborsClassifier 3 0.888887 0.021979 11.0 0.000000 0.831247 0.041569 0.885714 0.041555 0.916484 0.041555 0.905495 0.032599 0.905495 0.027582 0.040177 0.007849
MinMaxScaler SelectKBest KNeighborsClassifier 0 0.888404 0.020371 11.6 0.547723 0.892570 0.039055 0.857143 0.065475 0.916484 0.026465 0.885714 0.018388 0.890110 0.032038 0.033604 0.014043
StandardScaler SelectKBest KNeighborsClassifier 0 0.888404 0.018839 11.6 0.547723 0.890373 0.037173 0.859341 0.061381 0.912088 0.021978 0.887912 0.018057 0.892308 0.031468 0.031946 0.010743
RFE KNeighborsClassifier 0 0.877000 0.027891 12.0 0.000000 0.859723 0.020345 0.872527 0.048277 0.901099 0.040376 0.879121 0.030095 0.872527 0.035268 0.023341 0.008768

Best Hyperparameter Pipeline

The pipeline with the hyperparameter combination that achieved the highest average test score over all outer CV folds (i.e., the parameter combination in the first row of mean_pipeline_score_results()).

NOTE:
* The pipeline returned here does not necessarily correspond to the best-performing pipelines returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because those are determined by averaging over the best_estimator instances, as selected by scikit-learn, of all folds. The best_estimator instances can therefore each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.

[11]:
pipeline_permuter.best_hyperparameter_pipeline()
[11]:
mean_test_accuracy param_clf__n_neighbors param_clf__weights param_reduce_dim__k params rank_test_accuracy split0_test_accuracy split1_test_accuracy split2_test_accuracy split3_test_accuracy split4_test_accuracy std_test_accuracy
outer_fold
0 0.953846 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 2 0.956044 0.923077 0.967033 0.967033 0.956044 0.016150
1 0.956044 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.956044 0.978022 0.989011 0.912088 0.026917
2 0.962637 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.978022 0.967033 0.989011 0.934066 0.020382
3 0.958242 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.967033 0.945055 0.978022 0.956044 0.012815
4 0.964978 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.934783 0.978022 0.945055 0.978022 0.989011 0.021102
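
The link between this table and the first row of mean_pipeline_score_results() can be checked by aggregating the `mean_test_accuracy` column over the five outer folds. The values below are copied from the table above. That the reported spread is a sample standard deviation (`ddof=1`) is inferred from the numbers, not taken from the library documentation.

```python
import numpy as np

# mean_test_accuracy of the best hyperparameter combination in outer folds 0..4.
inner_mean_accuracy = np.array([0.953846, 0.956044, 0.962637, 0.958242, 0.964978])

# Average over the outer folds; compare with 0.959150 in mean_pipeline_score_results().
mean_over_outer_folds = inner_mean_accuracy.mean()
# Sample standard deviation (ddof=1); compare with the reported 0.004603.
std_over_outer_folds = inner_mean_accuracy.std(ddof=1)

print(round(mean_over_outer_folds, 6), round(std_over_outer_folds, 6))
```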

Regression

Load Example Dataset

[12]:
diabetes_data = load_diabetes()
X_reg = diabetes_data.data
y_reg = diabetes_data.target

Specify Estimator Combinations and Parameters for Hyperparameter Search

[13]:
model_dict_reg = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVR(kernel="linear", C=1))},
    "clf": {
        "KNeighborsRegressor": KNeighborsRegressor(),
        "DecisionTreeRegressor": DecisionTreeRegressor(),
        # "SVR": SVR(),
        # "AdaBoostRegressor": AdaBoostRegressor(),
    },
}
[14]:
params_dict_reg = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4]},
    "KNeighborsRegressor": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeRegressor": {"max_depth": [2, 4]},
    # "SVR": [
    #    {
    #        "kernel": ["linear"],
    #        "C": np.logspace(start=-2, stop=2, num=5)
    #    },
    #    {
    #        "kernel": ["rbf"],
    #        "C": np.logspace(start=-2, stop=2, num=5),
    #        "gamma": np.logspace(start=-2, stop=2, num=5)
    #    }
    # ],
    # "AdaBoostRegressor": {
    #    "base_estimator": [DecisionTreeClassifier(max_depth=1)],
    #    "n_estimators": np.arange(20, 110, 10),
    #    "learning_rate": np.arange(0.6, 1.1, 0.1)
    # },
}


# Use randomized search for the decision tree regressor and grid search (the default) for all other estimators
hyper_search_dict_reg = {"DecisionTreeRegressor": {"search_method": "random", "n_iter": 2}}
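
The number of candidates reported in the fit log follows directly from these grids. A short sketch using scikit-learn's `ParameterGrid` shows the arithmetic; the prefixed parameter names (`reduce_dim__`, `clf__`) are the ones printed in the log, and the variable names are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# SelectKBest + KNeighborsRegressor: 3 * 2 * 2 combinations.
grid_kbest_knn = ParameterGrid(
    {"reduce_dim__k": [2, 4, "all"], "clf__n_neighbors": [2, 4], "clf__weights": ["uniform", "distance"]}
)
# RFE + KNeighborsRegressor: 2 * 2 * 2 combinations.
grid_rfe_knn = ParameterGrid(
    {
        "reduce_dim__n_features_to_select": [2, 4],
        "clf__n_neighbors": [2, 4],
        "clf__weights": ["uniform", "distance"],
    }
)
# SelectKBest + DecisionTreeRegressor: 3 * 2 combinations in the full grid,
# of which the randomized search samples only n_iter=2 per outer fold.
grid_kbest_tree = ParameterGrid({"reduce_dim__k": [2, 4, "all"], "clf__max_depth": [2, 4]})

n_inner_folds = 5
print(len(grid_kbest_knn), len(grid_kbest_knn) * n_inner_folds)  # 12 candidates, 60 fits
print(len(grid_rfe_knn), len(grid_rfe_knn) * n_inner_folds)  # 8 candidates, 40 fits
print(len(grid_kbest_tree))  # 6 possible combinations
```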

Setup PipelinePermuter and Cross-Validations for Model Evaluation

Note: For further information, please visit the documentation of SklearnPipelinePermuter.

[15]:
pipeline_permuter_regression = SklearnPipelinePermuter(
    model_dict_reg, params_dict_reg, hyper_search_dict=hyper_search_dict_reg
)
[16]:
outer_cv = KFold(5)
inner_cv = KFold(5)

pipeline_permuter_regression.fit(X_reg, y_reg, outer_cv=outer_cv, inner_cv=inner_cv, scoring="r2")
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4]}
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
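
The candidate and fit counts in the log above can be checked by hand: a grid search tries every combination of the listed values, while a random search with `n_iter=2` samples only two of them. The following sketch uses only the standard library; the helper name `count_fits` is hypothetical and not part of BioPsyKit.

```python
from math import prod


def count_fits(param_grid, n_folds, n_iter=None):
    """Return (candidates, total inner-CV fits) for one hyperparameter search."""
    # Grid search: one candidate per combination of all listed values.
    n_candidates = prod(len(values) for values in param_grid.values())
    # Random search: at most n_iter sampled candidates.
    if n_iter is not None:
        n_candidates = min(n_iter, n_candidates)
    return n_candidates, n_candidates * n_folds


knn_rfe_grid = {
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
    "reduce_dim__n_features_to_select": [2, 4],
}
tree_rfe_grid = {"clf__max_depth": [2, 4], "reduce_dim__n_features_to_select": [2, 4]}

print(count_fits(knn_rfe_grid, n_folds=5))  # (8, 40)
print(count_fits(tree_rfe_grid, n_folds=5, n_iter=2))  # (2, 10)
```

These counts apply to a single outer fold, which is why the "Fitting ..." message appears five times per pipeline when `outer_cv=KFold(5)`.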


Display Results

This works analogously to the classification example.

Further Functions

Export Results as LaTeX Table

[17]:
print(pipeline_permuter.metric_summary_to_latex())
\begin{table}[ht!]
\centering
\sisetup{table-format = 2.1(2)}

\begin{tabular}{lllS}
\toprule
{} & {} & {} & {\makecell{Accuracy [\%]}} \\
{Scaler} & {\makecell[lc]{Feature\\ Selection}} & {Classifier} & {} \\
\midrule
\multirow[c]{4}{*}{Standard} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.8) \\
 &  & DT & 91.2(3.7) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.1(1.4) \\
 &  & DT & 91.2(6.9) \\
\cline{1-4} \cline{2-4}
\multirow[c]{4}{*}{Min-Max} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.2) \\
 &  & DT & 90.5(3.6) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.0(2.1) \\
 &  & DT & 93.0(3.6) \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{table}
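
The generated table uses commands from several LaTeX packages: `\toprule`, `\midrule`, and `\bottomrule` come from `booktabs`, the `S` column type and `\sisetup` from `siunitx`, `\multirow` from `multirow`, and `\makecell` from `makecell`. As a minimal sketch, the table string could be wrapped in a standalone document like this; the helper `wrap_latex_table` and the choice of the `article` class are assumptions made for illustration.

```python
PREAMBLE = "\n".join(
    [
        r"\documentclass{article}",
        r"\usepackage{booktabs}  % \toprule, \midrule, \bottomrule",
        r"\usepackage{siunitx}   % S column type, \sisetup",
        r"\usepackage{multirow}  % \multirow",
        r"\usepackage{makecell}  % \makecell",
    ]
)


def wrap_latex_table(table: str) -> str:
    """Embed a LaTeX table in a minimal standalone document."""
    return "\n".join([PREAMBLE, r"\begin{document}", table.strip(), r"\end{document}", ""])


# In the notebook, `table` would be pipeline_permuter.metric_summary_to_latex().
table = r"\begin{table}[ht!]" + "\n" + r"\end{table}"
document = wrap_latex_table(table)
print(document)
```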

Save and Load PipelinePermuter results

Save to Pickle File

[18]:
pipeline_permuter.to_pickle(tmpdir.joinpath("test.pkl"))

Load from Pickle File

[19]:
pipeline_permuter_load = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("test.pkl"))

Fit pipeline combinations and save intermediate results

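This saves the current state to `file_path` each time a pipeline combination has been evaluated successfully. Combinations that were already fitted are skipped, as the output below shows.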

[20]:
pipeline_permuter.fit_and_save_intermediate(
    X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv, file_path=tmpdir.joinpath("test.pkl")
)
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
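
The "Skipping ..." messages reflect a checkpointing pattern: results are written to disk after each combination, and combinations already present are not fitted again. The sketch below illustrates the idea with the standard library only; `fit_with_checkpoints` and `evaluate` are hypothetical stand-ins and do not reproduce BioPsyKit's implementation.

```python
import pickle
import tempfile
from pathlib import Path


def fit_with_checkpoints(combinations, evaluate, file_path):
    """Evaluate each combination once, saving all results after every step."""
    file_path = Path(file_path)
    # Resume from an earlier run if a checkpoint file exists.
    results = pickle.loads(file_path.read_bytes()) if file_path.exists() else {}
    for combo in combinations:
        if combo in results:
            print(f"Skipping {combo} since this combination was already fitted!")
            continue
        results[combo] = evaluate(combo)
        file_path.write_bytes(pickle.dumps(results))  # checkpoint after each fit
    return results


combos = [("StandardScaler", "SelectKBest"), ("StandardScaler", "RFE")]
calls = []


def evaluate(combo):
    calls.append(combo)  # record which combinations were actually fitted
    return len(calls)  # placeholder instead of a real score


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.pkl"
    first = fit_with_checkpoints(combos, evaluate, checkpoint)
    second = fit_with_checkpoints(combos, evaluate, checkpoint)  # skips both
```

Because the checkpoint is rewritten after every combination, an interrupted run loses at most the combination that was being fitted at that moment.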

Merge multiple PipelinePermuter instances

If the evaluation of different classification pipelines had to be split (e.g., because of long runtimes), the PipelinePermuter instances can be saved separately and afterwards merged into one joint PipelinePermuter instance.

The following minimal working example consists of these steps:
* Initializing, fitting, and saving different PipelinePermuter instances
* Loading saved PipelinePermuter instances from disk
* Merging multiple PipelinePermuter instances into one instance for joint evaluation
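
Before the real example, the three steps can be sketched conceptually with plain dictionaries keyed by pipeline combination. This is an illustration under stated assumptions: `merge_results` is a hypothetical helper, the stored values are placeholders, and raising on duplicate combinations is a design choice of this sketch rather than documented behavior of `merge_permuter_instances`.

```python
import pickle
import tempfile
from pathlib import Path


def merge_results(result_dicts):
    """Merge per-run result dictionaries, rejecting duplicate combinations."""
    merged = {}
    for results in result_dicts:
        overlap = merged.keys() & results.keys()
        if overlap:
            raise ValueError(f"Combinations fitted more than once: {sorted(overlap)}")
        merged.update(results)
    return merged


# Step 1: "fit" and save separate runs (placeholder values instead of real fits).
runs = {
    "permuter_01.pkl": {("StandardScaler", "SelectKBest", "KNeighborsClassifier"): "run 1"},
    "permuter_02.pkl": {("MinMaxScaler", "SelectKBest", "KNeighborsClassifier"): "run 2"},
}

with tempfile.TemporaryDirectory() as tmp:
    for name, results in runs.items():
        (Path(tmp) / name).write_bytes(pickle.dumps(results))
    # Step 2: load all saved runs from disk in a stable order.
    loaded = [pickle.loads(p.read_bytes()) for p in sorted(Path(tmp).glob("permuter_*.pkl"))]

# Step 3: merge into one joint result set.
merged = merge_results(loaded)
print(len(merged))  # 2
```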

Load Example Dataset

[21]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

Fit and Save Different PipelinePermuter instances

[22]:
model_dict_01 = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_01 = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_01 = SklearnPipelinePermuter(model_dict_01, params_dict_01, random_state=42)

pipeline_permuter_01.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_01.to_pickle(tmpdir.joinpath("permuter_01.pkl"))
[23]:
model_dict_02 = {
    "scaler": {"MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_02 = {
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_02 = SklearnPipelinePermuter(model_dict_02, params_dict_02, random_state=42)

pipeline_permuter_02.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_02.to_pickle(tmpdir.joinpath("permuter_02.pkl"))
[24]:
model_dict_03 = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}
params_dict_03 = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

pipeline_permuter_03 = SklearnPipelinePermuter(model_dict_03, params_dict_03, random_state=42)

pipeline_permuter_03.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_03.to_pickle(tmpdir.joinpath("permuter_03.pkl"))

Load and Merge PipelinePermuter instances

[25]:
permuter_file_list = sorted(tmpdir.glob("permuter_*.pkl"))
print(permuter_file_list)
[PosixPath('tmpdir/permuter_01.pkl'), PosixPath('tmpdir/permuter_02.pkl'), PosixPath('tmpdir/permuter_03.pkl')]
[26]:
permuter_list = [SklearnPipelinePermuter.from_pickle(p) for p in permuter_file_list]
permuter_list
[26]:
[<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4746c11270>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4746c10790>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4736f4da80>]
[27]:
merged_permuter = SklearnPipelinePermuter.merge_permuter_instances(permuter_list)

Double-check whether the permuters were correctly merged:

[28]:
for p in permuter_list:
    display(p.best_estimator_summary())
                                                            best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...

                                                            best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
MinMaxScaler    SelectKBest         KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...

                                                            best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
[29]:
merged_permuter.best_estimator_summary()
[29]:
                                                            best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier    [Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler  SelectKBest         DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier  [Pipeline(memory=Memory(location=cachedir/jobl...

Update a partially fitted SklearnPipelinePermuter with additional parameters

In this example, we first run an experiment with a partial set of models and hyperparameters and save the fitted object as a pickle file. In the next step, we load it, update the model and parameter dictionaries, and continue the experiment; combinations that were already fitted are skipped. This is useful for incremental experiments because it avoids running several separate experiments and merging the resulting SklearnPipelinePermuter instances.
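
Which combinations still need fitting after such an update can be worked out from the step names alone. The sketch below enumerates all combinations with `itertools.product` and subtracts those covered by the partial run; `pipeline_combinations` is a hypothetical helper, and the estimator objects are replaced by `None` because only their names matter here.

```python
from itertools import product


def pipeline_combinations(model_dict):
    """Return every pipeline as a tuple of (step, name) pairs, one per step."""
    steps = [[(step, name) for name in models] for step, models in model_dict.items()]
    return set(product(*steps))


# Only the names matter here, so estimator objects are replaced by None.
model_dict_partial = {
    "scaler": {"StandardScaler": None},
    "reduce_dim": {"SelectKBest": None, "RFE": None},
    "clf": {"KNeighborsClassifier": None},
}
model_dict_total = {
    "scaler": {"StandardScaler": None, "MinMaxScaler": None},
    "reduce_dim": {"SelectKBest": None, "RFE": None},
    "clf": {"KNeighborsClassifier": None, "DecisionTreeClassifier": None},
}

already_fitted = pipeline_combinations(model_dict_partial)
to_fit = pipeline_combinations(model_dict_total) - already_fitted

print(len(already_fitted), len(to_fit))  # 2 6
```

This matches the run in this section: the two combinations from the partial experiment are skipped and the remaining six are fitted.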

[30]:
model_dict_partial = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_partial = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_partial = SklearnPipelinePermuter(model_dict_partial, params_dict_partial, random_state=42)

pipeline_permuter_partial.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
pipeline_permuter_partial.to_pickle(tmpdir.joinpath("permuter_partial.pkl"))
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


[31]:
model_dict_total = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}

params_dict_total = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
[32]:
pipeline_permuter_total = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("permuter_partial.pkl"))
pipeline_permuter_total = pipeline_permuter_total.update_permuter(model_dict_total, params_dict_total)
[33]:
pipeline_permuter_total.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['distance', 'uniform'], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['distance', 'uniform'], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Cleanup

[34]:
rmtree(tmpdir)