Sklearn Pipeline Permuter Example

This example shows how to systematically evaluate different machine learning pipelines.

This is useful, for instance, for evaluating combinations of different feature selection methods and different estimators in a single step.
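Conceptually, the set of evaluated pipelines is the Cartesian product of the candidate steps for each pipeline slot. The sketch below uses itertools and the step names of this example. It illustrates the idea only and is not SklearnPipelinePermuter's implementation:

```python
from itertools import product

# Candidate steps per pipeline slot (names taken from this example)
steps = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

# One pipeline per choice of exactly one option for every slot
combinations = [tuple(zip(steps, choice)) for choice in product(*steps.values())]

for combo in combinations:
    print(combo)
print(len(combinations))  # 2 * 2 * 2 = 8
```

With two options per slot, this yields the 8 pipelines that appear in the fit log of the classification example, and in the same order.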

Imports and Helper Functions

[1]:
from pathlib import Path
from shutil import rmtree

import pandas as pd
import numpy as np

# Utils
from sklearn.datasets import load_breast_cancer, load_diabetes

# Preprocessing & Feature Selection
from sklearn.feature_selection import SelectKBest, RFE
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor


# Cross-Validation
from sklearn.model_selection import KFold

from biopsykit.classification.model_selection import SklearnPipelinePermuter

%load_ext autoreload
%autoreload 2

Classification

Create temporary directory

[2]:
tmpdir = Path("tmpdir")
tmpdir.mkdir(exist_ok=True)

Load Example Dataset

[3]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
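The next cell uses model_dict, params_dict and hyper_search_dict, but their definitions are not included in this section. The sketch below is a minimal reconstruction. It mirrors the regression example further below and the parameter grids printed in the fit log. The RFE base estimator (a linear SVC) is an assumption because the log does not show it.

```python
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

model_dict = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    # ASSUMPTION: RFE base estimator not shown in the log; a linear SVC mirrors
    # the RFE(SVR(kernel="linear", C=1)) used in the regression example
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}

# Hyperparameter values as printed in the "Parameter grid #0" log lines
params_dict = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

# Randomized search (2 draws) for the decision tree, grid search for all other estimators
hyper_search_dict = {"DecisionTreeClassifier": {"search_method": "random", "n_iter": 2}}
```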

Set Up PipelinePermuter and Cross-Validation Schemes for Model Evaluation

Note: For further information, please refer to the documentation of SklearnPipelinePermuter.

[6]:
pipeline_permuter = SklearnPipelinePermuter(
    model_dict, params_dict, hyper_search_dict=hyper_search_dict, random_state=42
)

outer_cv = KFold(5)
inner_cv = KFold(5)
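The two KFold objects define a nested cross-validation. The inner splitter drives the hyperparameter search, while the outer splitter estimates how well the tuned pipeline generalizes to unseen data. The sketch below shows the general technique for a single pipeline in plain scikit-learn. It is not SklearnPipelinePermuter's internal code. It assumes the StandardScaler / SelectKBest / KNeighborsClassifier grid from the fit log:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("reduce_dim", SelectKBest()),
    ("clf", KNeighborsClassifier()),
])
param_grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}

# Inner loop: grid search on the training part of each outer fold
search = GridSearchCV(pipe, param_grid, cv=KFold(5), scoring="accuracy")
# Outer loop: test accuracy of the tuned pipeline on each held-out outer fold
outer_scores = cross_val_score(search, X_demo, y_demo, cv=KFold(5), scoring="accuracy")
print(outer_scores, outer_scores.mean())
```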

Fit all Parameter Combinations

[7]:
pipeline_permuter.fit(X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv)
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
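The log lines follow directly from the grid sizes. In every outer fold, each search fits every candidate once per inner fold. The RFE grids also have three values for n_features_to_select, so they also yield 12 candidates. A quick count in plain Python is shown below. If refitting is enabled (scikit-learn's default refit=True), each search additionally refits its best candidate once, which these log lines do not include.

```python
from math import prod

n_outer_folds = 5
n_inner_folds = 5

# Grid-searched KNeighborsClassifier pipelines: value lists copied from the log above
knn_grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}
knn_candidates = prod(len(values) for values in knn_grid.values())  # 3 * 2 * 2 = 12
knn_fits_per_search = knn_candidates * n_inner_folds  # "totalling 60 fits"

# Randomized search for the DecisionTreeClassifier pipelines draws only n_iter candidates
dt_candidates = 2
dt_fits_per_search = dt_candidates * n_inner_folds  # "totalling 10 fits"

# 4 pipelines per classifier type, one search per outer fold
total_fits = 4 * n_outer_folds * (knn_fits_per_search + dt_fits_per_search)
print(knn_fits_per_search, dt_fits_per_search, total_fits)  # 60 10 1400
```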


Display Results

Metric Summary for Classification Pipelines

SklearnPipelinePermuter.metric_summary() returns a summary of all relevant metrics (performance scores, confusion matrix, true and predicted labels, train and test indices) of the best-performing pipeline in each outer fold (i.e., the best_pipeline() parameter of each inner CV object), for every evaluated pipeline combination.

[8]:
pipeline_permuter.metric_summary()
[8]:
conf_matrix conf_matrix_folds true_labels true_labels_folds predicted_labels predicted_labels_folds train_indices train_indices_folds test_indices test_indices_folds mean_test_accuracy std_test_accuracy test_accuracy_fold_0 test_accuracy_fold_1 test_accuracy_fold_2 test_accuracy_fold_3 test_accuracy_fold_4
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler SelectKBest KNeighborsClassifier [195, 17, 6, 351] [[62, 6, 1, 45], [46, 3, 1, 64], [35, 5, 0, 74... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959556 0.018127 0.938596 0.964912 0.956140 0.991228 0.946903
DecisionTreeClassifier [184, 28, 22, 335] [[54, 14, 2, 44], [43, 6, 4, 61], [37, 3, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.912079 0.037361 0.859649 0.912281 0.964912 0.938596 0.884956
RFE KNeighborsClassifier [201, 11, 11, 346] [[64, 4, 0, 46], [47, 2, 5, 60], [37, 3, 1, 73... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.961326 0.014282 0.964912 0.938596 0.964912 0.982456 0.955752
DecisionTreeClassifier [177, 35, 15, 342] [[43, 25, 0, 46], [46, 3, 6, 59], [38, 2, 2, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, ... [[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.912141 0.069276 0.780702 0.921053 0.964912 0.973684 0.920354
MinMaxScaler SelectKBest KNeighborsClassifier [198, 14, 9, 348] [[63, 5, 2, 44], [46, 3, 1, 64], [36, 4, 0, 74... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959571 0.011923 0.938596 0.964912 0.964912 0.973684 0.955752
DecisionTreeClassifier [182, 30, 24, 333] [[54, 14, 2, 44], [40, 9, 6, 59], [36, 4, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.905123 0.036065 0.859649 0.868421 0.956140 0.921053 0.920354
RFE KNeighborsClassifier [199, 13, 10, 347] [[61, 7, 2, 44], [47, 2, 3, 62], [37, 3, 1, 73... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.959603 0.021168 0.921053 0.956140 0.964912 0.982456 0.973451
DecisionTreeClassifier [186, 26, 14, 343] [[54, 14, 2, 44], [44, 5, 3, 62], [36, 4, 1, 7... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... [114, 115, 116, 117, 118, 119, 120, 121, 122, ... [[114, 115, 116, 117, 118, 119, 120, 121, 122,... [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... 0.929731 0.036335 0.859649 0.929825 0.956140 0.956140 0.946903
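The fold-level columns can be checked against each other. The example below recombines the values reported above for StandardScaler / SelectKBest / KNeighborsClassifier using NumPy. KFold(5) on the 569 samples produces four test folds of 114 samples and one of 113. The unweighted mean of the five fold accuracies gives mean_test_accuracy, and NumPy's default population standard deviation (ddof=0) reproduces std_test_accuracy. The pooled confusion matrix is read here as [TN, FP, FN, TP], which is an assumption consistent with its row sums of 212 and 357 samples. It gives a slightly different overall accuracy because the last fold is smaller.

```python
import numpy as np

# Outer-fold test accuracies and pooled confusion matrix copied from the table above
fold_acc = np.array([0.938596, 0.964912, 0.956140, 0.991228, 0.946903])
fold_sizes = np.array([114, 114, 114, 114, 113])  # KFold(5) on 569 samples
tn, fp, fn, tp = 195, 17, 6, 351

mean_acc = fold_acc.mean()  # unweighted mean over the outer folds
std_acc = fold_acc.std()    # NumPy default ddof=0
correct = np.rint(fold_acc * fold_sizes).astype(int)  # correctly classified samples per fold
pooled_acc = (tn + tp) / (tn + fp + fn + tp)          # accuracy over all samples at once

print(round(mean_acc, 6), round(std_acc, 6))  # 0.959556 0.018127
print(correct, correct.sum())                 # [107 110 109 113 107] 546
print(round(pooled_acc, 6))                   # 0.959578
```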

SklearnPipelinePermuter.best_estimator_summary() returns the list of best-performing Pipeline objects for each evaluated pipeline combination.

[9]:
pipeline_permuter.best_estimator_summary()
[9]:
best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler SelectKBest KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
RFE KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler SelectKBest KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
RFE KNeighborsClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...

Mean Performance Scores for Individual Hyperparameter Combinations

SklearnPipelinePermuter.mean_pipeline_score_results() returns the performance scores of each pipeline and each of its hyperparameter combinations, averaged over all outer CV folds.

NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(). Those best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds, so each best_estimator instance can have a different set of hyperparameters. This function, in contrast, explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used to gain a deeper understanding of the different hyperparameter combinations and their performance. To obtain the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.

[10]:
pipeline_permuter.mean_pipeline_score_results()
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/sklearn_pipeline_permuter.py:679: FutureWarning: ['param_clf__criterion', 'param_clf__weights', 'param_reduce_dim__k', 'params'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
  score_results.groupby(score_results.index.names[:-1])
[10]:
mean_test_accuracy param_clf__max_depth param_clf__n_neighbors param_reduce_dim__n_features_to_select rank_test_accuracy ... split1_test_accuracy split2_test_accuracy split3_test_accuracy split4_test_accuracy std_test_accuracy
mean std mean std mean std mean std mean std ... mean std mean std mean std mean std mean std
pipeline_scaler pipeline_reduce_dim pipeline_clf parameter_combination_id
StandardScaler SelectKBest KNeighborsClassifier 11 0.959150 0.004603 NaN NaN 4.0 0.0 NaN NaN 1.2 0.447214 ... 0.960440 0.022787 0.960440 0.014743 0.980220 0.009194 0.949451 0.028656 0.019473 0.005345
MinMaxScaler SelectKBest KNeighborsClassifier 11 0.959150 0.001991 NaN NaN 4.0 0.0 NaN NaN 1.2 0.447214 ... 0.951648 0.012529 0.967033 0.017375 0.986813 0.004914 0.951648 0.016666 0.019545 0.004257
RFE KNeighborsClassifier 11 0.958705 0.005482 NaN NaN 4.0 0.0 NaN NaN 1.2 0.447214 ... 0.953846 0.009194 0.962637 0.016666 0.967033 0.010989 0.962637 0.016666 0.012098 0.002635
8 0.956507 0.006313 NaN NaN 4.0 0.0 NaN NaN 2.0 1.000000 ... 0.953846 0.009194 0.967033 0.015541 0.958242 0.014328 0.956044 0.021978 0.013115 0.003993
StandardScaler RFE KNeighborsClassifier 11 0.956073 0.007155 NaN NaN 4.0 0.0 NaN NaN 1.6 0.547723 ... 0.958242 0.023824 0.958242 0.009194 0.971429 0.012529 0.951648 0.036114 0.020734 0.003697
SelectKBest KNeighborsClassifier 8 0.955198 0.005524 NaN NaN 4.0 0.0 NaN NaN 2.0 0.707107 ... 0.960440 0.014743 0.956044 0.017375 0.978022 0.010989 0.942857 0.025059 0.019446 0.004951
MinMaxScaler SelectKBest KNeighborsClassifier 8 0.953430 0.007843 NaN NaN 4.0 0.0 NaN NaN 2.6 1.949359 ... 0.947253 0.009194 0.964835 0.014328 0.978022 0.013459 0.936264 0.025059 0.019030 0.005669
StandardScaler RFE KNeighborsClassifier 5 0.952117 0.005728 NaN NaN 2.0 0.0 NaN NaN 2.8 0.836660 ... 0.956044 0.017375 0.951648 0.014743 0.958242 0.014328 0.949451 0.032599 0.017652 0.003116
8 0.951672 0.007497 NaN NaN 4.0 0.0 NaN NaN 2.8 1.303840 ... 0.953846 0.016299 0.958242 0.009194 0.956044 0.019034 0.947253 0.039925 0.019830 0.002716
MinMaxScaler RFE KNeighborsClassifier 7 0.950354 0.007385 NaN NaN 4.0 0.0 4.000000 0.000000 3.2 1.483240 ... 0.951648 0.024076 0.953846 0.021138 0.960440 0.006019 0.960440 0.012529 0.018607 0.009022
SelectKBest KNeighborsClassifier 5 0.949474 0.003494 NaN NaN 2.0 0.0 NaN NaN 3.6 0.547723 ... 0.949451 0.018388 0.953846 0.026236 0.969231 0.012038 0.936264 0.030493 0.020818 0.007043
StandardScaler RFE KNeighborsClassifier 2 0.949470 0.007481 NaN NaN 2.0 0.0 NaN NaN 3.6 1.816590 ... 0.956044 0.015541 0.958242 0.018057 0.938462 0.022787 0.936264 0.028444 0.017660 0.008590
MinMaxScaler RFE KNeighborsClassifier 5 0.947286 0.008802 NaN NaN 2.0 0.0 NaN NaN 4.2 1.303840 ... 0.945055 0.023311 0.960440 0.009829 0.964835 0.014328 0.936264 0.028444 0.019275 0.005979
StandardScaler SelectKBest KNeighborsClassifier 5 0.946398 0.006357 NaN NaN 2.0 0.0 NaN NaN 3.8 0.836660 ... 0.953846 0.012038 0.949451 0.026465 0.958242 0.009194 0.934066 0.026917 0.017432 0.008160
MinMaxScaler RFE KNeighborsClassifier 10 0.945079 0.011841 NaN NaN 4.0 0.0 4.000000 0.000000 5.6 2.073644 ... 0.940659 0.022787 0.947253 0.021138 0.969231 0.012038 0.960440 0.012529 0.025049 0.009478
2 0.944195 0.009172 NaN NaN 2.0 0.0 NaN NaN 5.2 1.643168 ... 0.949451 0.019963 0.971429 0.006019 0.938462 0.027582 0.923077 0.026917 0.022622 0.003527
SelectKBest KNeighborsClassifier 2 0.944190 0.007720 NaN NaN 2.0 0.0 NaN NaN 4.8 1.303840 ... 0.945055 0.021978 0.951648 0.027582 0.962637 0.018388 0.914286 0.026236 0.023213 0.009375
StandardScaler SelectKBest KNeighborsClassifier 2 0.943311 0.009131 NaN NaN 2.0 0.0 NaN NaN 5.4 1.949359 ... 0.949451 0.018388 0.951648 0.027582 0.962637 0.012529 0.894505 0.032599 0.029359 0.009596
MinMaxScaler SelectKBest KNeighborsClassifier 7 0.942451 0.013020 NaN NaN 4.0 0.0 NaN NaN 3.8 1.303840 ... 0.938462 0.036114 0.962637 0.012529 0.953846 0.014328 0.951648 0.018388 0.024840 0.003868
StandardScaler RFE KNeighborsClassifier 7 0.941128 0.015268 NaN NaN 4.0 0.0 4.000000 0.000000 4.0 2.000000 ... 0.945055 0.021978 0.953846 0.018057 0.942857 0.035946 0.940659 0.031659 0.021137 0.012182
SelectKBest KNeighborsClassifier 7 0.938939 0.011950 NaN NaN 4.0 0.0 NaN NaN 3.8 1.303840 ... 0.931868 0.029487 0.956044 0.019034 0.951648 0.016666 0.949451 0.016666 0.022968 0.004169
RFE KNeighborsClassifier 10 0.938930 0.016775 NaN NaN 4.0 0.0 4.000000 0.000000 4.8 1.643168 ... 0.945055 0.010989 0.931868 0.023824 0.956044 0.032038 0.945055 0.036446 0.022850 0.004537
MinMaxScaler RFE KNeighborsClassifier 1 0.936732 0.007675 NaN NaN 2.0 0.0 4.000000 0.000000 7.2 2.049390 ... 0.945055 0.038067 0.936264 0.028444 0.938462 0.012529 0.938462 0.016666 0.021214 0.004668
StandardScaler SelectKBest KNeighborsClassifier 10 0.936302 0.011498 NaN NaN 4.0 0.0 NaN NaN 4.8 1.095445 ... 0.920879 0.033331 0.949451 0.009829 0.967033 0.007770 0.951648 0.012529 0.028909 0.006367
MinMaxScaler SelectKBest KNeighborsClassifier 10 0.934988 0.013670 NaN NaN 4.0 0.0 NaN NaN 5.6 1.140175 ... 0.923077 0.025772 0.951648 0.012529 0.964835 0.012038 0.951648 0.016666 0.030625 0.004991
RFE KNeighborsClassifier 4 0.934553 0.014500 NaN NaN 2.0 0.0 4.000000 0.000000 7.2 2.280351 ... 0.931868 0.025059 0.927473 0.029691 0.964835 0.019658 0.951648 0.016666 0.027440 0.002901
DecisionTreeClassifier 0 0.933669 0.009354 3.6 0.894427 NaN NaN 2.666667 1.154701 1.6 0.547723 ... 0.925275 0.027362 0.949451 0.016666 0.958242 0.023824 0.923077 0.025772 0.023792 0.008042
KNeighborsClassifier 6 0.931472 0.009943 NaN NaN 4.0 0.0 2.000000 0.000000 8.4 2.073644 ... 0.938462 0.016666 0.945055 0.010989 0.925275 0.014328 0.936264 0.026236 0.016603 0.005262
SelectKBest KNeighborsClassifier 1 0.929704 0.014237 NaN NaN 2.0 0.0 NaN NaN 6.6 2.302173 ... 0.920879 0.043542 0.949451 0.029691 0.923077 0.034750 0.936264 0.032413 0.027202 0.008209
RFE DecisionTreeClassifier 1 0.929279 0.009507 2.8 1.095445 NaN NaN 3.333333 1.154701 1.4 0.547723 ... 0.920879 0.021138 0.949451 0.022787 0.947253 0.023824 0.934066 0.013459 0.024440 0.013629
KNeighborsClassifier 9 0.928844 0.011586 NaN NaN 4.0 0.0 2.000000 0.000000 8.6 1.673320 ... 0.931868 0.018057 0.931868 0.018057 0.936264 0.004914 0.942857 0.012038 0.016126 0.004329
StandardScaler SelectKBest KNeighborsClassifier 1 0.927946 0.014775 NaN NaN 2.0 0.0 NaN NaN 7.4 1.816590 ... 0.923077 0.047266 0.942857 0.034225 0.923077 0.034750 0.931868 0.025059 0.026160 0.008606
RFE KNeighborsClassifier 4 0.926613 0.012894 NaN NaN 2.0 0.0 4.000000 0.000000 7.2 0.447214 ... 0.934066 0.017375 0.912088 0.036446 0.942857 0.023824 0.936264 0.022521 0.025616 0.007075
1 0.923540 0.013857 NaN NaN 2.0 0.0 4.000000 0.000000 8.0 0.707107 ... 0.912088 0.021978 0.918681 0.029691 0.927473 0.033512 0.916484 0.034401 0.023453 0.009031
SelectKBest KNeighborsClassifier 4 0.923125 0.017392 NaN NaN 2.0 0.0 NaN NaN 7.8 0.836660 ... 0.916484 0.027582 0.923077 0.033870 0.940659 0.024076 0.940659 0.034401 0.024832 0.005817
DecisionTreeClassifier 1 0.921366 0.006643 3.6 0.894427 NaN NaN NaN NaN 1.4 0.547723 ... 0.909890 0.028444 0.929670 0.019963 0.947253 0.023824 0.923077 0.013459 0.023829 0.006901
MinMaxScaler SelectKBest KNeighborsClassifier 4 0.920487 0.018497 NaN NaN 2.0 0.0 NaN NaN 8.2 1.095445 ... 0.914286 0.029487 0.925275 0.031468 0.938462 0.022787 0.936264 0.034225 0.025134 0.003906
DecisionTreeClassifier 1 0.920038 0.026502 3.6 0.894427 NaN NaN NaN NaN 1.2 0.447214 ... 0.907692 0.019963 0.931868 0.021138 0.947253 0.026236 0.927473 0.036940 0.028544 0.010968
RFE KNeighborsClassifier 3 0.918724 0.013124 NaN NaN 2.0 0.0 2.000000 0.000000 10.8 0.447214 ... 0.916484 0.025299 0.920879 0.030493 0.912088 0.015541 0.936264 0.012038 0.016863 0.005096
StandardScaler RFE DecisionTreeClassifier 1 0.914792 0.011007 3.2 1.095445 NaN NaN 2.000000 0.000000 1.4 0.547723 ... 0.907692 0.035268 0.923077 0.032038 0.940659 0.026465 0.923077 0.023311 0.032693 0.010808
SelectKBest KNeighborsClassifier 6 0.913908 0.011346 NaN NaN 4.0 0.0 NaN NaN 9.0 0.707107 ... 0.905495 0.053044 0.940659 0.019963 0.927473 0.016666 0.925275 0.012038 0.033909 0.013583
MinMaxScaler SelectKBest KNeighborsClassifier 6 0.912150 0.011100 NaN NaN 4.0 0.0 NaN NaN 8.8 0.447214 ... 0.907692 0.044366 0.934066 0.013459 0.927473 0.012529 0.918681 0.012529 0.029312 0.014397
StandardScaler RFE DecisionTreeClassifier 0 0.908247 0.020262 2.0 0.000000 NaN NaN 3.000000 1.414214 1.6 0.547723 ... 0.883516 0.012529 0.931868 0.022521 0.962637 0.018388 0.923077 0.038852 0.047392 0.025852
MinMaxScaler RFE KNeighborsClassifier 0 0.906866 0.011413 NaN NaN 2.0 0.0 2.000000 0.000000 12.0 0.000000 ... 0.925275 0.034225 0.912088 0.010989 0.890110 0.032967 0.898901 0.027362 0.021153 0.012873
StandardScaler SelectKBest KNeighborsClassifier 9 0.906455 0.020163 NaN NaN 4.0 0.0 NaN NaN 9.2 1.303840 ... 0.890110 0.050954 0.929670 0.021422 0.938462 0.012529 0.925275 0.021138 0.039798 0.014725
MinMaxScaler SelectKBest KNeighborsClassifier 9 0.903817 0.019961 NaN NaN 4.0 0.0 NaN NaN 9.4 1.341641 ... 0.892308 0.052930 0.918681 0.022787 0.936264 0.009194 0.920879 0.021138 0.038196 0.015052
StandardScaler RFE KNeighborsClassifier 6 0.902494 0.022087 NaN NaN 4.0 0.0 2.000000 0.000000 8.8 1.095445 ... 0.896703 0.030691 0.931868 0.030493 0.923077 0.034750 0.892308 0.035946 0.031191 0.009188
SelectKBest DecisionTreeClassifier 0 0.902069 0.031317 3.2 1.095445 NaN NaN NaN NaN 1.6 0.547723 ... 0.898901 0.043542 0.945055 0.019034 0.929670 0.025299 0.901099 0.028017 0.040945 0.016336
MinMaxScaler SelectKBest DecisionTreeClassifier 0 0.902045 0.014146 3.2 1.095445 NaN NaN NaN NaN 1.8 0.447214 ... 0.903297 0.026236 0.929670 0.024076 0.912088 0.028017 0.912088 0.032038 0.033928 0.015318
StandardScaler RFE KNeighborsClassifier 9 0.897664 0.021655 NaN NaN 4.0 0.0 2.000000 0.000000 9.8 0.447214 ... 0.890110 0.043264 0.925275 0.036776 0.931868 0.023824 0.907692 0.026465 0.040752 0.007431
SelectKBest KNeighborsClassifier 3 0.889756 0.023370 NaN NaN 2.0 0.0 NaN NaN 11.2 0.447214 ... 0.863736 0.059483 0.925275 0.030493 0.909890 0.014328 0.905495 0.019963 0.039302 0.010323
MinMaxScaler SelectKBest KNeighborsClassifier 3 0.889756 0.022528 NaN NaN 2.0 0.0 NaN NaN 11.4 0.547723 ... 0.865934 0.061381 0.918681 0.033512 0.909890 0.014328 0.907692 0.016666 0.038451 0.010350
StandardScaler RFE KNeighborsClassifier 3 0.888887 0.021979 NaN NaN 2.0 0.0 2.000000 0.000000 11.0 0.000000 ... 0.885714 0.041555 0.916484 0.041555 0.905495 0.032599 0.905495 0.027582 0.040177 0.007849
SelectKBest KNeighborsClassifier 0 0.888404 0.018839 NaN NaN 2.0 0.0 NaN NaN 11.6 0.547723 ... 0.859341 0.061381 0.912088 0.021978 0.887912 0.018057 0.892308 0.031468 0.031946 0.010743
MinMaxScaler SelectKBest KNeighborsClassifier 0 0.888404 0.020371 NaN NaN 2.0 0.0 NaN NaN 11.6 0.547723 ... 0.857143 0.065475 0.916484 0.026465 0.885714 0.018388 0.890110 0.032038 0.033604 0.014043
StandardScaler RFE KNeighborsClassifier 0 0.877000 0.027891 NaN NaN 2.0 0.0 2.000000 0.000000 12.0 0.000000 ... 0.872527 0.048277 0.901099 0.040376 0.879121 0.030095 0.872527 0.035268 0.023341 0.008768

56 rows × 22 columns
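The parameter_combination_id level appears to index the candidates in scikit-learn's ParameterGrid order. In that order, the parameter names are sorted alphabetically and the last name varies fastest. This is an inference from the table above, not documented behavior of SklearnPipelinePermuter. For instance, ids 0 to 5 report n_neighbors = 2 and ids 6 to 11 report n_neighbors = 4. Under that assumption, an id can be mapped back to its hyperparameters:

```python
from sklearn.model_selection import ParameterGrid

# KNeighborsClassifier grid of the SelectKBest pipelines (from the fit log)
grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}
candidates = list(ParameterGrid(grid))

print(len(candidates))  # 12
print(candidates[11])   # n_neighbors=4, weights="distance", k="all"
```

This matches the hyperparameters reported by best_hyperparameter_pipeline() for the top-ranked row with id 11.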

Best Hyperparameter Pipeline

SklearnPipelinePermuter.best_hyperparameter_pipeline() returns the per-fold results of the pipeline and hyperparameter combination that achieved the highest average test score over all outer CV folds (i.e., the combination in the first row of mean_pipeline_score_results()).

NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(). Those best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds, so each best_estimator instance can have a different set of hyperparameters. This function, in contrast, explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used to gain a deeper understanding of the different hyperparameter combinations and their performance. To obtain the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.

[11]:
pipeline_permuter.best_hyperparameter_pipeline()
[11]:
mean_test_accuracy param_clf__n_neighbors param_clf__weights param_reduce_dim__k params rank_test_accuracy split0_test_accuracy split1_test_accuracy split2_test_accuracy split3_test_accuracy split4_test_accuracy std_test_accuracy
outer_fold
0 0.953846 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 2 0.956044 0.923077 0.967033 0.967033 0.956044 0.016150
1 0.956044 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.956044 0.978022 0.989011 0.912088 0.026917
2 0.962637 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.978022 0.967033 0.989011 0.934066 0.020382
3 0.958242 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.945055 0.967033 0.945055 0.978022 0.956044 0.012815
4 0.964978 4.0 distance all {'clf__n_neighbors': 4, 'clf__weights': 'dista... 1 0.934783 0.978022 0.945055 0.978022 0.989011 0.021102
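Averaging these five outer-fold rows reproduces the first row of mean_pipeline_score_results() (StandardScaler / SelectKBest / KNeighborsClassifier, parameter_combination_id 11) up to rounding of the displayed values. The pandas sketch below assumes the aggregation works roughly like this, by grouping per-fold rows by parameter combination. Note that pandas' std uses ddof=1 by default. This matches the reported std of 0.004603 rather than the population standard deviation.

```python
import pandas as pd

# Per-outer-fold values copied from the best_hyperparameter_pipeline() table above
per_fold = pd.DataFrame(
    {
        "parameter_combination_id": [11] * 5,
        "mean_test_accuracy": [0.953846, 0.956044, 0.962637, 0.958242, 0.964978],
        "rank_test_accuracy": [2, 1, 1, 1, 1],
    },
    index=pd.Index(range(5), name="outer_fold"),
)

# Mean and sample standard deviation (pandas default ddof=1) per parameter combination
summary = per_fold.groupby("parameter_combination_id").agg(["mean", "std"])
print(summary.round(6))
```

The result is roughly 0.95915 ± 0.0046 for the accuracy and 1.2 ± 0.447214 for the rank, as in the first row of the mean_pipeline_score_results() table.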

Regression

Load Example Dataset

[12]:
diabetes_data = load_diabetes()
X_reg = diabetes_data.data
y_reg = diabetes_data.target

Specify Estimator Combinations and Parameters for Hyperparameter Search

[13]:
model_dict_reg = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVR(kernel="linear", C=1))},
    "clf": {
        "KNeighborsRegressor": KNeighborsRegressor(),
        "DecisionTreeRegressor": DecisionTreeRegressor(),
        # "SVR": SVR(),
        # "AdaBoostRegressor": AdaBoostRegressor(),
    },
}
[14]:
params_dict_reg = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4]},
    "KNeighborsRegressor": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeRegressor": {"max_depth": [2, 4]},
    # "SVR": [
    #    {
    #        "kernel": ["linear"],
    #        "C": np.logspace(start=-2, stop=2, num=5)
    #    },
    #    {
    #        "kernel": ["rbf"],
    #        "C": np.logspace(start=-2, stop=2, num=5),
    #        "gamma": np.logspace(start=-2, stop=2, num=5)
    #    }
    # ],
    # "AdaBoostRegressor": {
    #    "base_estimator": [DecisionTreeRegressor(max_depth=1)],  # renamed to "estimator" in scikit-learn >= 1.2
    #    "n_estimators": np.arange(20, 110, 10),
    #    "learning_rate": np.arange(0.6, 1.1, 0.1)
    # },
}


# use randomized search for the decision tree regressor, use grid search (the default) for all other estimators
hyper_search_dict_reg = {"DecisionTreeRegressor": {"search_method": "random", "n_iter": 2}}
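The commented-out SVR entry shows that a parameter specification can also be a list of dictionaries. The example below uses scikit-learn's ParameterGrid for illustration. Each dictionary is expanded on its own and the candidates are concatenated, so the linear kernel is never combined with gamma values it would ignore:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# The two SVR grids from the commented-out params_dict_reg entry
svr_grids = [
    {"kernel": ["linear"], "C": np.logspace(start=-2, stop=2, num=5)},
    {
        "kernel": ["rbf"],
        "C": np.logspace(start=-2, stop=2, num=5),
        "gamma": np.logspace(start=-2, stop=2, num=5),
    },
]

# Candidates of the grids are concatenated, not multiplied
n_candidates = len(ParameterGrid(svr_grids))
print(n_candidates)  # 5 + 5 * 5 = 30
```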

Set Up PipelinePermuter and Cross-Validation Schemes for Model Evaluation

Note: For further information, please refer to the documentation of SklearnPipelinePermuter.

[15]:
pipeline_permuter_regression = SklearnPipelinePermuter(
    model_dict_reg, params_dict_reg, hyper_search_dict=hyper_search_dict_reg
)
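The regression fit repeatedly emits a UserWarning that the confusion matrix cannot be computed for regression tasks. If that output is unwanted, Python's warnings module can suppress exactly this message. In the example below, evaluate_fold() is a hypothetical stand-in that only emits the warning. Wrapping the actual fit call in the same catch_warnings block should behave the same way, but this was not tested against biopsykit.

```python
import warnings


def evaluate_fold():
    """Hypothetical stand-in for one outer fold of a regression pipeline."""
    warnings.warn("Cannot compute confusion matrix for regression tasks.", UserWarning)
    return 0.5


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning not filtered below
    # Added last, so it is checked first: drop only this specific UserWarning
    warnings.filterwarnings(
        "ignore", message="Cannot compute confusion matrix", category=UserWarning
    )
    evaluate_fold()
    warnings.warn("some other warning", RuntimeWarning)

messages = [str(w.message) for w in caught]
print(messages)  # ['some other warning']
```

Using catch_warnings restores the previous filters when the block exits, so the suppression does not leak into later cells.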
[16]:
outer_cv = KFold(5)
inner_cv = KFold(5)

pipeline_permuter_regression.fit(X_reg, y_reg, outer_cv=outer_cv, inner_cv=inner_cv, scoring="r2")
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits


/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
  warnings.warn("Cannot compute confusion matrix for regression tasks.")
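Each "Fitting 5 folds for each of N candidates" line above comes from the inner cross-validation, which runs once per outer fold. The repeated UserWarning can be ignored here because a confusion matrix is only defined for classification. For grid search, the candidate count is the product of the number of values per parameter. For random search, it is `n_iter`, which this sketch assumes is capped at the grid size, as scikit-learn does when all parameters are given as lists. The helpers `count_candidates` and `count_fits` are hypothetical and not part of biopsykit. They reproduce the numbers in the log:

```python
from math import prod


def count_candidates(param_grid, search_method="grid", n_iter=None):
    """Hypothetical helper: number of hyperparameter candidates per inner CV."""
    grid_size = prod(len(values) for values in param_grid.values())
    if search_method == "random":
        # random search draws n_iter candidates, at most as many as the grid holds
        return min(n_iter, grid_size)
    return grid_size


def count_fits(param_grid, n_inner_folds, n_outer_folds, **search_kwargs):
    """Return (candidates, fits per outer fold, total inner fits).

    The refit of the best candidate in each outer fold is not counted.
    """
    candidates = count_candidates(param_grid, **search_kwargs)
    fits_per_outer_fold = candidates * n_inner_folds
    return candidates, fits_per_outer_fold, fits_per_outer_fold * n_outer_folds


# parameter grids copied from the log above
knn_skb_grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}
dt_skb_grid = {"reduce_dim__k": [2, 4, "all"], "clf__max_depth": [2, 4]}
knn_rfe_grid = {
    "reduce_dim__n_features_to_select": [2, 4],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}

print(count_fits(knn_skb_grid, 5, 5))  # (12, 60, 300)
print(count_fits(dt_skb_grid, 5, 5, search_method="random", n_iter=2))  # (2, 10, 50)
print(count_fits(knn_rfe_grid, 5, 5))  # (8, 40, 200)
```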

Display Results

This works analogously to the classification example.

Further Functions

Export Results as LaTeX Table

[17]:
print(pipeline_permuter.metric_summary_to_latex())
\begin{table}[ht!]
\centering
\sisetup{table-format = 2.1(2)}

\begin{tabular}{lllS}
\toprule
{} & {} & {} & {\makecell{Accuracy [\%]}} \\
{Scaler} & {\makecell[lc]{Feature\\ Selection}} & {Classifier} & {} \\
\midrule
\multirow[c]{4}{*}{Standard} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.8) \\
 &  & DT & 91.2(3.7) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.1(1.4) \\
 &  & DT & 91.2(6.9) \\
\cline{1-4} \cline{2-4}
\multirow[c]{4}{*}{Min-Max} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.2) \\
 &  & DT & 90.5(3.6) \\
\cline{2-4}
 & \multirow[c]{2}{*}{RFE} & kNN & 96.0(2.1) \\
 &  & DT & 93.0(3.6) \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{table}
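The exported table uses commands from several LaTeX packages, so compiling it requires siunitx (`\sisetup`, `S` column), booktabs (`\toprule`, `\midrule`, `\bottomrule`), multirow (`\multirow`), and makecell (`\makecell`). Entries such as `96.0(1.8)` are assumed to be the mean and standard deviation of the metric across the outer folds, in percent. The hypothetical helper below shows how such a string can be produced. It uses the population standard deviation, which is an assumption, since biopsykit may use a different estimator:

```python
from statistics import mean, pstdev


def format_mean_std(fold_scores, scale=100.0, decimals=1):
    """Hypothetical helper: format fold scores as 'mean(std)', scaled to percent.

    Uses the population standard deviation (ddof=0); this is an assumption.
    """
    scores = [s * scale for s in fold_scores]
    return f"{mean(scores):.{decimals}f}({pstdev(scores):.{decimals}f})"


# hypothetical outer-fold accuracies, not taken from the run above
print(format_mean_std([0.95, 0.97, 0.96, 0.94, 0.98]))  # 96.0(1.4)
```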

Save and Load PipelinePermuter results

Save to Pickle File

[18]:
pipeline_permuter.to_pickle(tmpdir.joinpath("test.pkl"))

Load from Pickle File

[19]:
pipeline_permuter_load = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("test.pkl"))
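`to_pickle` and `from_pickle` persist a permuter together with its results. The sketch below shows the general save/load pattern with Python's `pickle` module on a hypothetical stand-in class, `ResultStore`. For simplicity it only serializes a plain dictionary of results. It does not confirm how biopsykit serializes its objects. Only load pickle files from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


class ResultStore:
    """Hypothetical stand-in for an object collecting fitted pipeline results."""

    def __init__(self):
        self.results = {}

    def to_pickle(self, file_path):
        # serialize the collected results (built-in types only)
        with Path(file_path).open("wb") as f:
            pickle.dump(self.results, f)

    @classmethod
    def from_pickle(cls, file_path):
        # create a fresh instance and restore its results from disk
        obj = cls()
        with Path(file_path).open("rb") as f:
            obj.results = pickle.load(f)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    store = ResultStore()
    # hypothetical score for one pipeline combination
    store.results[("StandardScaler", "SelectKBest", "KNeighborsClassifier")] = 0.96
    store.to_pickle(Path(tmp) / "test.pkl")
    loaded = ResultStore.from_pickle(Path(tmp) / "test.pkl")

print(loaded.results == store.results)  # True
```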

Fit pipeline combinations and save intermediate results

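This saves the current state to the given file after each pipeline combination has been evaluated successfully, so an interrupted run can be continued later. Combinations that were already fitted are skipped, as the output below shows.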

[20]:
pipeline_permuter.fit_and_save_intermediate(
    X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv, file_path=tmpdir.joinpath("test.pkl")
)
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
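All eight combinations are skipped because `pipeline_permuter` was already fitted on them. The checkpointing idea can be sketched as follows. The function iterates over all step combinations, skips those already in the results, and writes the results to disk after each newly evaluated combination, so an interrupted run loses at most the combination in progress. `fit_combination` is a hypothetical placeholder for the nested cross-validation. This is an illustration of the idea, not biopsykit's implementation.

```python
import itertools
import pickle
import tempfile
from pathlib import Path


def fit_combination(combination):
    """Hypothetical placeholder for running the nested CV of one pipeline."""
    return {"mean_test_score": 0.0}


def fit_and_save_intermediate(model_steps, results, file_path):
    """Evaluate every step combination not yet in `results`, checkpointing after each one."""
    for combination in itertools.product(*model_steps.values()):
        if combination in results:
            print(f"Skipping {combination} since this combination was already fitted!")
            continue
        results[combination] = fit_combination(combination)
        # save after each combination so that a crash loses at most the current one
        with Path(file_path).open("wb") as f:
            pickle.dump(results, f)
    return results


model_steps = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}
with tempfile.TemporaryDirectory() as tmp:
    # one combination already fitted (hypothetical score)
    results = {("StandardScaler", "SelectKBest", "KNeighborsClassifier"): {"mean_test_score": 0.96}}
    results = fit_and_save_intermediate(model_steps, results, Path(tmp) / "test.pkl")

print(len(results))  # 8
```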

Merge multiple PipelinePermuter instances

If the evaluation of different classification pipelines has to be split (e.g., for runtime reasons), the individual PipelinePermuter instances can be saved separately and merged into one joint PipelinePermuter instance afterwards.

The following minimal working example consists of these steps:
* Initializing, fitting, and saving different PipelinePermuter instances
* Loading saved PipelinePermuter instances from disk
* Merging multiple PipelinePermuter instances into one instance for joint evaluation

Load Example Dataset

[21]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

Fit and Save Different PipelinePermuter instances

[22]:
model_dict_01 = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_01 = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_01 = SklearnPipelinePermuter(model_dict_01, params_dict_01, random_state=42)

pipeline_permuter_01.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_01.to_pickle(tmpdir.joinpath("permuter_01.pkl"))
[23]:
model_dict_02 = {
    "scaler": {"MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_02 = {
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_02 = SklearnPipelinePermuter(model_dict_02, params_dict_02, random_state=42)

pipeline_permuter_02.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_02.to_pickle(tmpdir.joinpath("permuter_02.pkl"))
[24]:
model_dict_03 = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}
params_dict_03 = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}

pipeline_permuter_03 = SklearnPipelinePermuter(model_dict_03, params_dict_03, random_state=42)

pipeline_permuter_03.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_03.to_pickle(tmpdir.joinpath("permuter_03.pkl"))
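The three model dictionaries above together cover every scaler/feature-selection/classifier combination exactly once, so each pipeline is fitted in exactly one instance. The following sketch checks this partition using only the step names. The names `steps_01` to `steps_03`, `full_steps`, and the `combinations` helper are hypothetical and chosen so they do not overwrite the model dictionaries.

```python
from itertools import product

# step names of model_dict_01, model_dict_02, and model_dict_03 (estimators omitted)
steps_01 = {"scaler": ["StandardScaler"], "reduce_dim": ["SelectKBest", "RFE"], "clf": ["KNeighborsClassifier"]}
steps_02 = {"scaler": ["MinMaxScaler"], "reduce_dim": ["SelectKBest", "RFE"], "clf": ["KNeighborsClassifier"]}
steps_03 = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["DecisionTreeClassifier"],
}
# all steps combined, i.e., the full search space
full_steps = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}


def combinations(steps):
    """Set of all (scaler, reduce_dim, clf) combinations spanned by one model dict."""
    return set(product(*steps.values()))


parts = [combinations(s) for s in (steps_01, steps_02, steps_03)]
union = set().union(*parts)
# the parts are disjoint if their sizes add up to the size of their union
is_partition = sum(len(p) for p in parts) == len(union)
print([len(p) for p in parts], len(union), is_partition)  # [2, 2, 4] 8 True
print(union == combinations(full_steps))  # True
```

With the actual dictionaries, `set(product(*(step.keys() for step in model_dict_01.values())))` yields the same tuples, because iterating a dict yields its keys.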

Load and Merge PipelinePermuter instances

[25]:
permuter_file_list = sorted(tmpdir.glob("permuter_*.pkl"))
print(permuter_file_list)
[PosixPath('tmpdir/permuter_01.pkl'), PosixPath('tmpdir/permuter_02.pkl'), PosixPath('tmpdir/permuter_03.pkl')]
[26]:
permuter_list = [SklearnPipelinePermuter.from_pickle(p) for p in permuter_file_list]
permuter_list
[26]:
[<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f704903d0>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70490b80>,
 <biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70509cd0>]
[27]:
merged_permuter = SklearnPipelinePermuter.merge_permuter_instances(permuter_list)
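Conceptually, merging combines the per-combination results of all instances into one collection. The following minimal sketch shows this idea on plain dictionaries with hypothetical placeholder results. It refuses to merge when the same combination appears in two instances. This duplicate check is an assumption of the sketch and does not describe how `merge_permuter_instances` handles overlaps.

```python
def merge_results(result_dicts):
    """Merge per-combination result dicts, raising if a combination appears twice."""
    merged = {}
    for results in result_dicts:
        overlap = merged.keys() & results.keys()
        if overlap:
            raise ValueError(f"Combinations fitted in more than one instance: {sorted(overlap)}")
        merged.update(results)
    return merged


# hypothetical placeholder results keyed by (scaler, reduce_dim, clf)
results_01 = {
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"): "result A",
    ("StandardScaler", "RFE", "KNeighborsClassifier"): "result B",
}
results_02 = {("MinMaxScaler", "SelectKBest", "KNeighborsClassifier"): "result C"}

merged = merge_results([results_01, results_02])
print(len(merged))  # 3

try:
    merge_results([results_01, results_01])
except ValueError as err:
    print("merge refused:", err)
```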

Double-check whether the permuters were merged correctly:

[28]:
for p in permuter_list:
    display(p.best_estimator_summary())
                                                           best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                                                           best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
MinMaxScaler    SelectKBest         KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                                                           best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
[29]:
merged_permuter.best_estimator_summary()
[29]:
                                                           best_estimator
pipeline_scaler pipeline_reduce_dim pipeline_clf
StandardScaler  SelectKBest         KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 KNeighborsClassifier   [Pipeline(memory=Memory(location=cachedir/jobl...
StandardScaler  SelectKBest         DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
MinMaxScaler    SelectKBest         DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...
                RFE                 DecisionTreeClassifier [Pipeline(memory=Memory(location=cachedir/jobl...

Update a Partially Fitted SklearnPipelinePermuter with Additional Parameters

In this example, we first run an experiment on only a subset of the models and hyperparameters and save the fitted SklearnPipelinePermuter as a pickle file. We then load it, update the model and parameter dictionaries, and continue the experiment. Pipeline combinations that were already fitted are skipped. This supports incremental experiments without having to run several separate experiments and merge the resulting SklearnPipelinePermuter instances afterwards.

[30]:
model_dict_partial = {
    "scaler": {"StandardScaler": StandardScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
    },
}
params_dict_partial = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

pipeline_permuter_partial = SklearnPipelinePermuter(model_dict_partial, params_dict_partial, random_state=42)

pipeline_permuter_partial.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
pipeline_permuter_partial.to_pickle(tmpdir.joinpath("permuter_partial.pkl"))
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


[31]:
model_dict_total = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    },
}

params_dict_total = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
[32]:
pipeline_permuter_total = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("permuter_partial.pkl"))
pipeline_permuter_total = pipeline_permuter_total.update_permuter(model_dict_total, params_dict_total)
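The log of the next cell shows that, after `update_permuter`, calling `fit` evaluates only the six new combinations and skips the two already stored in `permuter_partial.pkl`. This bookkeeping can be sketched as a set difference between all combinations spanned by the updated model dictionary and the combinations that were already fitted. The sketch uses step names only. `remaining_combinations`, `fitted`, and `total_steps` are hypothetical names, and the sketch does not show biopsykit's implementation.

```python
from itertools import product


def remaining_combinations(model_steps, fitted):
    """Combinations spanned by `model_steps` that have not been fitted yet (sorted)."""
    return sorted(set(product(*model_steps.values())) - set(fitted))


# combinations stored in permuter_partial.pkl (see the log of cell [30])
fitted = {
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"),
    ("StandardScaler", "RFE", "KNeighborsClassifier"),
}
# step names of model_dict_total
total_steps = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

to_fit = remaining_combinations(total_steps, fitted)
print(len(to_fit))  # 6
for combination in to_fit:
    print(combination)
```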
[33]:
pipeline_permuter_total.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits


Cleanup

[34]:
rmtree(tmpdir)