Sklearn Pipeline Permuter Example¶
This example shows how to systematically evaluate different machine learning pipelines.
This is useful, for instance, when combinations of different feature selection methods and different estimators should be evaluated in a single step.
Imports and Helper Functions¶
[1]:
from pathlib import Path
from shutil import rmtree
import pandas as pd
import numpy as np
# Utils
from sklearn.datasets import load_breast_cancer, load_diabetes
# Preprocessing & Feature Selection
from sklearn.feature_selection import SelectKBest, RFE
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
# Regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
# Cross-Validation
from sklearn.model_selection import KFold
from biopsykit.classification.model_selection import SklearnPipelinePermuter
%load_ext autoreload
%autoreload 2
Classification¶
Create temporary directory
[2]:
tmpdir = Path("tmpdir")
tmpdir.mkdir(exist_ok=True)
Load Example Dataset¶
[3]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
Specify Estimator Combinations and Parameters for Hyperparameter Search¶
[4]:
model_dict = {
    "scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
    "reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
        # "SVC": SVC(),
        # "AdaBoostClassifier": AdaBoostClassifier(),
    },
}
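The pipelines that will be evaluated are all combinations of one option per step. The following minimal sketch enumerates them by name with a Cartesian product. It only illustrates the idea; the variable names `step_options` and `combinations` are made up for this example, and the code is not necessarily how `SklearnPipelinePermuter` builds its pipelines internally.

```python
from itertools import product

# Names of the options per pipeline step, mirroring the keys of model_dict.
step_options = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

# One pipeline per element of the Cartesian product of all step options.
combinations = [
    tuple(zip(step_options.keys(), choice))
    for choice in product(*step_options.values())
]

for combination in combinations:
    print(combination)

print(len(combinations))  # 2 * 2 * 2 = 8 pipelines
```

With two options for each of the three steps, this yields eight pipeline combinations.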
[5]:
params_dict = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "RFE": {"n_features_to_select": [2, 4, None]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
    # "SVC": [
    #     {
    #         "kernel": ["linear"],
    #         "C": np.logspace(start=-2, stop=2, num=5)
    #     },
    #     {
    #         "kernel": ["rbf"],
    #         "C": np.logspace(start=-2, stop=2, num=5),
    #         "gamma": np.logspace(start=-2, stop=2, num=5)
    #     }
    # ],
    # "AdaBoostClassifier": {
    #     "base_estimator": [DecisionTreeClassifier(max_depth=1)],
    #     "n_estimators": np.arange(20, 110, 10),
    #     "learning_rate": np.arange(0.6, 1.1, 0.1)
    # },
}
# use randomized-search for decision tree classifier, use grid-search (the default) for all other estimators
hyper_search_dict = {"DecisionTreeClassifier": {"search_method": "random", "n_iter": 2}}
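For a single pipeline, the per-estimator entries of `params_dict` are merged into one search space whose keys carry the step name as a prefix (scikit-learn's `<step>__<parameter>` convention for pipelines). The sketch below rebuilds such a search space by hand for one combination. The names `pipeline`, `params`, `param_grid`, and `candidates` are chosen for this illustration only, and the merging logic is an assumption about the general idea rather than the library's actual implementation.

```python
from itertools import product

# One pipeline combination and the matching entries from params_dict.
pipeline = (
    ("scaler", "StandardScaler"),
    ("reduce_dim", "SelectKBest"),
    ("clf", "KNeighborsClassifier"),
)
params = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}

# Prefix every parameter with its step name ("<step>__<parameter>").
# Estimators mapped to None contribute no parameters.
param_grid = {}
for step, estimator in pipeline:
    for name, values in (params[estimator] or {}).items():
        param_grid[f"{step}__{name}"] = values

# A grid search tries every combination of the listed values.
candidates = list(product(*param_grid.values()))
print(param_grid)
print(len(candidates))  # 3 * 2 * 2 = 12 candidates
```

A randomized search with `n_iter=2`, as configured for the decision tree in `hyper_search_dict`, samples only two candidates from the corresponding search space instead of evaluating all of them.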
Setup PipelinePermuter and Cross-Validations for Model Evaluation¶
Note: For further information please visit the documentation of SklearnPipelinePermuter.
[6]:
pipeline_permuter = SklearnPipelinePermuter(
    model_dict, params_dict, hyper_search_dict=hyper_search_dict, random_state=42
)
outer_cv = KFold(5)
inner_cv = KFold(5)
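The two cross-validation objects serve different purposes: the inner CV is used for hyperparameter search, the outer CV for evaluating the tuned model on data it has not seen. The following sketch shows this nested scheme for one pipeline combination using plain scikit-learn. It is a generic illustration written under the assumption that the permuter follows the usual nested cross-validation pattern; it is not a copy of the biopsykit internals, and the names `pipeline`, `param_grid`, and `outer_scores` are chosen for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("reduce_dim", SelectKBest()),
        ("clf", KNeighborsClassifier()),
    ]
)
param_grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}

outer_cv = KFold(5)
inner_cv = KFold(5)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Inner loop: tune hyperparameters on the outer training data only.
    search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")
    search.fit(X[train_idx], y[train_idx])
    # Outer loop: score the refitted best estimator on the held-out fold.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(outer_scores)
```

Because the hyperparameters are selected separately within each outer fold, the best estimator may use different hyperparameters from fold to fold.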
Fit all Parameter Combinations¶
[7]:
pipeline_permuter.fit(X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv)
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
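The numbers in the log follow directly from the setup: each inner search fits every candidate once per inner fold, and one search is run per outer fold and pipeline. The short calculation below adds this up. It counts only the fits reported in the log lines and assumes that any additional refit of the best candidate is not included in those counts.

```python
n_outer_folds = 5
n_inner_folds = 5

# (number of pipelines, candidates per inner search), read from the log above.
searches = {
    "KNeighborsClassifier (grid search)": (4, 12),
    "DecisionTreeClassifier (random search, n_iter=2)": (4, 2),
}

total_fits = 0
for name, (n_pipelines, n_candidates) in searches.items():
    fits_per_search = n_candidates * n_inner_folds
    fits = n_pipelines * n_outer_folds * fits_per_search
    total_fits += fits
    print(f"{name}: {fits_per_search} fits per search, {fits} fits in total")

print(total_fits)
```

The grid-searched pipelines dominate the cost, which is one reason to switch individual estimators to randomized search via `hyper_search_dict` when the search space grows.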
Display Results¶
Metric Summary for Classification Pipelines¶
Summary of all relevant metrics (performance scores, confusion matrix, true and predicted labels) for each evaluated pipeline combination. The metrics are computed from the best-performing pipeline of each fold (i.e., the best_pipeline() parameter of each inner cv object).
[8]:
pipeline_permuter.metric_summary()
[8]:
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | conf_matrix | conf_matrix_folds | true_labels | true_labels_folds | predicted_labels | predicted_labels_folds | train_indices | train_indices_folds | test_indices | test_indices_folds | mean_test_accuracy | std_test_accuracy | test_accuracy_fold_0 | test_accuracy_fold_1 | test_accuracy_fold_2 | test_accuracy_fold_3 | test_accuracy_fold_4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
StandardScaler | SelectKBest | KNeighborsClassifier | [195, 17, 6, 351] | [[62, 6, 1, 45], [46, 3, 1, 64], [35, 5, 0, 74... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959556 | 0.018127 | 0.938596 | 0.964912 | 0.956140 | 0.991228 | 0.946903 |
StandardScaler | SelectKBest | DecisionTreeClassifier | [184, 28, 22, 335] | [[54, 14, 2, 44], [43, 6, 4, 61], [37, 3, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.912079 | 0.037361 | 0.859649 | 0.912281 | 0.964912 | 0.938596 | 0.884956 |
StandardScaler | RFE | KNeighborsClassifier | [201, 11, 11, 346] | [[64, 4, 0, 46], [47, 2, 5, 60], [37, 3, 1, 73... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.961326 | 0.014282 | 0.964912 | 0.938596 | 0.964912 | 0.982456 | 0.955752 |
StandardScaler | RFE | DecisionTreeClassifier | [177, 35, 15, 342] | [[43, 25, 0, 46], [46, 3, 6, 59], [38, 2, 2, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, ... | [[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.912141 | 0.069276 | 0.780702 | 0.921053 | 0.964912 | 0.973684 | 0.920354 |
MinMaxScaler | SelectKBest | KNeighborsClassifier | [198, 14, 9, 348] | [[63, 5, 2, 44], [46, 3, 1, 64], [36, 4, 0, 74... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959571 | 0.011923 | 0.938596 | 0.964912 | 0.964912 | 0.973684 | 0.955752 |
MinMaxScaler | SelectKBest | DecisionTreeClassifier | [182, 30, 24, 333] | [[54, 14, 2, 44], [40, 9, 6, 59], [36, 4, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.905123 | 0.036065 | 0.859649 | 0.868421 | 0.956140 | 0.921053 | 0.920354 |
MinMaxScaler | RFE | KNeighborsClassifier | [199, 13, 10, 347] | [[61, 7, 2, 44], [47, 2, 3, 62], [37, 3, 1, 73... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959603 | 0.021168 | 0.921053 | 0.956140 | 0.964912 | 0.982456 | 0.973451 |
MinMaxScaler | RFE | DecisionTreeClassifier | [186, 26, 14, 343] | [[54, 14, 2, 44], [44, 5, 3, 62], [36, 4, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.929731 | 0.036335 | 0.859649 | 0.929825 | 0.956140 | 0.956140 | 0.946903 |
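The accuracy columns can be cross-checked against the confusion matrices in the same row. The sketch below does this for the StandardScaler, SelectKBest, KNeighborsClassifier pipeline, using only values visible in the table. It assumes that each flattened confusion matrix is stored in row-major order, i.e., with the correctly classified counts in the first and last position; the helper `accuracy` is written for this illustration.

```python
from statistics import mean


def accuracy(conf_matrix):
    """Accuracy from a flattened 2x2 confusion matrix (assumed [tn, fp, fn, tp])."""
    tn, fp, fn, tp = conf_matrix
    return (tn + tp) / (tn + fp + fn + tp)


# Values copied from the table (StandardScaler, SelectKBest, KNeighborsClassifier).
conf_matrix_all_folds = [195, 17, 6, 351]
conf_matrix_fold_0 = [62, 6, 1, 45]
conf_matrix_fold_1 = [46, 3, 1, 64]
test_accuracy_folds = [0.938596, 0.964912, 0.956140, 0.991228, 0.946903]

print(round(accuracy(conf_matrix_fold_0), 6))     # compare with test_accuracy_fold_0
print(round(accuracy(conf_matrix_fold_1), 6))     # compare with test_accuracy_fold_1
print(round(accuracy(conf_matrix_all_folds), 6))  # accuracy pooled over all folds
print(round(mean(test_accuracy_folds), 6))        # compare with mean_test_accuracy
```

The pooled accuracy differs slightly from mean_test_accuracy because the latter averages the per-fold accuracies, and the folds do not all contain the same number of samples.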
List of Pipeline objects for the best pipeline for each evaluated pipeline combination.
[9]:
pipeline_permuter.best_estimator_summary()
[9]:
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
---|---|---|---|
StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
Mean Performance Scores for Individual Hyperparameter Combinations¶
The performance scores for each pipeline and each parameter combination, averaged over all outer CV folds using SklearnPipelinePermuter.mean_pipeline_score_results().
NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because the best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds. Hence, the best_estimator instances can each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.
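The difference between the two views can be made concrete with a toy example. The scores below are hypothetical and chosen only to illustrate the point; the combination ids "A" and "B" and all variable names are made up. The selection step is simplified to a per-fold maximum.

```python
from statistics import mean

# Hypothetical inner-CV scores: outer fold -> {parameter combination id: score}.
inner_scores = {
    0: {"A": 0.95, "B": 0.93},
    1: {"A": 0.92, "B": 0.96},
    2: {"A": 0.94, "B": 0.91},
}

# View 1: pick the best combination separately in every outer fold,
# so the selected hyperparameters may differ between folds.
best_per_fold = {
    fold: max(scores, key=scores.get) for fold, scores in inner_scores.items()
}
print(best_per_fold)

# View 2: average each fixed combination over all outer folds.
mean_per_combination = {
    combo: mean(scores[combo] for scores in inner_scores.values())
    for combo in ["A", "B"]
}
print(mean_per_combination)
```

In the first view, fold 1 selects a different combination than folds 0 and 2. In the second view, every combination is compared on the same footing across all folds, which is what the table below reports.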
[10]:
pipeline_permuter.mean_pipeline_score_results()
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/sklearn_pipeline_permuter.py:679: FutureWarning: ['param_clf__criterion', 'param_clf__weights', 'param_reduce_dim__k', 'params'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
score_results.groupby(score_results.index.names[:-1])
[10]:
mean_test_accuracy | param_clf__max_depth | param_clf__n_neighbors | param_reduce_dim__n_features_to_select | rank_test_accuracy | ... | split1_test_accuracy | split2_test_accuracy | split3_test_accuracy | split4_test_accuracy | std_test_accuracy | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mean | std | mean | std | mean | std | mean | std | mean | std | ... | mean | std | mean | std | mean | std | mean | std | mean | std | ||||
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | parameter_combination_id | |||||||||||||||||||||
StandardScaler | SelectKBest | KNeighborsClassifier | 11 | 0.959150 | 0.004603 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 1.2 | 0.447214 | ... | 0.960440 | 0.022787 | 0.960440 | 0.014743 | 0.980220 | 0.009194 | 0.949451 | 0.028656 | 0.019473 | 0.005345 |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 11 | 0.959150 | 0.001991 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 1.2 | 0.447214 | ... | 0.951648 | 0.012529 | 0.967033 | 0.017375 | 0.986813 | 0.004914 | 0.951648 | 0.016666 | 0.019545 | 0.004257 |
RFE | KNeighborsClassifier | 11 | 0.958705 | 0.005482 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 1.2 | 0.447214 | ... | 0.953846 | 0.009194 | 0.962637 | 0.016666 | 0.967033 | 0.010989 | 0.962637 | 0.016666 | 0.012098 | 0.002635 | |
8 | 0.956507 | 0.006313 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 2.0 | 1.000000 | ... | 0.953846 | 0.009194 | 0.967033 | 0.015541 | 0.958242 | 0.014328 | 0.956044 | 0.021978 | 0.013115 | 0.003993 | |||
StandardScaler | RFE | KNeighborsClassifier | 11 | 0.956073 | 0.007155 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 1.6 | 0.547723 | ... | 0.958242 | 0.023824 | 0.958242 | 0.009194 | 0.971429 | 0.012529 | 0.951648 | 0.036114 | 0.020734 | 0.003697 |
SelectKBest | KNeighborsClassifier | 8 | 0.955198 | 0.005524 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 2.0 | 0.707107 | ... | 0.960440 | 0.014743 | 0.956044 | 0.017375 | 0.978022 | 0.010989 | 0.942857 | 0.025059 | 0.019446 | 0.004951 | |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 8 | 0.953430 | 0.007843 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 2.6 | 1.949359 | ... | 0.947253 | 0.009194 | 0.964835 | 0.014328 | 0.978022 | 0.013459 | 0.936264 | 0.025059 | 0.019030 | 0.005669 |
StandardScaler | RFE | KNeighborsClassifier | 5 | 0.952117 | 0.005728 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 2.8 | 0.836660 | ... | 0.956044 | 0.017375 | 0.951648 | 0.014743 | 0.958242 | 0.014328 | 0.949451 | 0.032599 | 0.017652 | 0.003116 |
8 | 0.951672 | 0.007497 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 2.8 | 1.303840 | ... | 0.953846 | 0.016299 | 0.958242 | 0.009194 | 0.956044 | 0.019034 | 0.947253 | 0.039925 | 0.019830 | 0.002716 | |||
MinMaxScaler | RFE | KNeighborsClassifier | 7 | 0.950354 | 0.007385 | NaN | NaN | 4.0 | 0.0 | 4.000000 | 0.000000 | 3.2 | 1.483240 | ... | 0.951648 | 0.024076 | 0.953846 | 0.021138 | 0.960440 | 0.006019 | 0.960440 | 0.012529 | 0.018607 | 0.009022 |
SelectKBest | KNeighborsClassifier | 5 | 0.949474 | 0.003494 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 3.6 | 0.547723 | ... | 0.949451 | 0.018388 | 0.953846 | 0.026236 | 0.969231 | 0.012038 | 0.936264 | 0.030493 | 0.020818 | 0.007043 | |
StandardScaler | RFE | KNeighborsClassifier | 2 | 0.949470 | 0.007481 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 3.6 | 1.816590 | ... | 0.956044 | 0.015541 | 0.958242 | 0.018057 | 0.938462 | 0.022787 | 0.936264 | 0.028444 | 0.017660 | 0.008590 |
MinMaxScaler | RFE | KNeighborsClassifier | 5 | 0.947286 | 0.008802 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 4.2 | 1.303840 | ... | 0.945055 | 0.023311 | 0.960440 | 0.009829 | 0.964835 | 0.014328 | 0.936264 | 0.028444 | 0.019275 | 0.005979 |
StandardScaler | SelectKBest | KNeighborsClassifier | 5 | 0.946398 | 0.006357 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 3.8 | 0.836660 | ... | 0.953846 | 0.012038 | 0.949451 | 0.026465 | 0.958242 | 0.009194 | 0.934066 | 0.026917 | 0.017432 | 0.008160 |
MinMaxScaler | RFE | KNeighborsClassifier | 10 | 0.945079 | 0.011841 | NaN | NaN | 4.0 | 0.0 | 4.000000 | 0.000000 | 5.6 | 2.073644 | ... | 0.940659 | 0.022787 | 0.947253 | 0.021138 | 0.969231 | 0.012038 | 0.960440 | 0.012529 | 0.025049 | 0.009478 |
2 | 0.944195 | 0.009172 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 5.2 | 1.643168 | ... | 0.949451 | 0.019963 | 0.971429 | 0.006019 | 0.938462 | 0.027582 | 0.923077 | 0.026917 | 0.022622 | 0.003527 | |||
SelectKBest | KNeighborsClassifier | 2 | 0.944190 | 0.007720 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 4.8 | 1.303840 | ... | 0.945055 | 0.021978 | 0.951648 | 0.027582 | 0.962637 | 0.018388 | 0.914286 | 0.026236 | 0.023213 | 0.009375 | |
StandardScaler | SelectKBest | KNeighborsClassifier | 2 | 0.943311 | 0.009131 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 5.4 | 1.949359 | ... | 0.949451 | 0.018388 | 0.951648 | 0.027582 | 0.962637 | 0.012529 | 0.894505 | 0.032599 | 0.029359 | 0.009596 |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 7 | 0.942451 | 0.013020 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 3.8 | 1.303840 | ... | 0.938462 | 0.036114 | 0.962637 | 0.012529 | 0.953846 | 0.014328 | 0.951648 | 0.018388 | 0.024840 | 0.003868 |
StandardScaler | RFE | KNeighborsClassifier | 7 | 0.941128 | 0.015268 | NaN | NaN | 4.0 | 0.0 | 4.000000 | 0.000000 | 4.0 | 2.000000 | ... | 0.945055 | 0.021978 | 0.953846 | 0.018057 | 0.942857 | 0.035946 | 0.940659 | 0.031659 | 0.021137 | 0.012182 |
SelectKBest | KNeighborsClassifier | 7 | 0.938939 | 0.011950 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 3.8 | 1.303840 | ... | 0.931868 | 0.029487 | 0.956044 | 0.019034 | 0.951648 | 0.016666 | 0.949451 | 0.016666 | 0.022968 | 0.004169 | |
RFE | KNeighborsClassifier | 10 | 0.938930 | 0.016775 | NaN | NaN | 4.0 | 0.0 | 4.000000 | 0.000000 | 4.8 | 1.643168 | ... | 0.945055 | 0.010989 | 0.931868 | 0.023824 | 0.956044 | 0.032038 | 0.945055 | 0.036446 | 0.022850 | 0.004537 | |
MinMaxScaler | RFE | KNeighborsClassifier | 1 | 0.936732 | 0.007675 | NaN | NaN | 2.0 | 0.0 | 4.000000 | 0.000000 | 7.2 | 2.049390 | ... | 0.945055 | 0.038067 | 0.936264 | 0.028444 | 0.938462 | 0.012529 | 0.938462 | 0.016666 | 0.021214 | 0.004668 |
StandardScaler | SelectKBest | KNeighborsClassifier | 10 | 0.936302 | 0.011498 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 4.8 | 1.095445 | ... | 0.920879 | 0.033331 | 0.949451 | 0.009829 | 0.967033 | 0.007770 | 0.951648 | 0.012529 | 0.028909 | 0.006367 |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 10 | 0.934988 | 0.013670 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 5.6 | 1.140175 | ... | 0.923077 | 0.025772 | 0.951648 | 0.012529 | 0.964835 | 0.012038 | 0.951648 | 0.016666 | 0.030625 | 0.004991 |
RFE | KNeighborsClassifier | 4 | 0.934553 | 0.014500 | NaN | NaN | 2.0 | 0.0 | 4.000000 | 0.000000 | 7.2 | 2.280351 | ... | 0.931868 | 0.025059 | 0.927473 | 0.029691 | 0.964835 | 0.019658 | 0.951648 | 0.016666 | 0.027440 | 0.002901 | |
DecisionTreeClassifier | 0 | 0.933669 | 0.009354 | 3.6 | 0.894427 | NaN | NaN | 2.666667 | 1.154701 | 1.6 | 0.547723 | ... | 0.925275 | 0.027362 | 0.949451 | 0.016666 | 0.958242 | 0.023824 | 0.923077 | 0.025772 | 0.023792 | 0.008042 | ||
KNeighborsClassifier | 6 | 0.931472 | 0.009943 | NaN | NaN | 4.0 | 0.0 | 2.000000 | 0.000000 | 8.4 | 2.073644 | ... | 0.938462 | 0.016666 | 0.945055 | 0.010989 | 0.925275 | 0.014328 | 0.936264 | 0.026236 | 0.016603 | 0.005262 | ||
SelectKBest | KNeighborsClassifier | 1 | 0.929704 | 0.014237 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 6.6 | 2.302173 | ... | 0.920879 | 0.043542 | 0.949451 | 0.029691 | 0.923077 | 0.034750 | 0.936264 | 0.032413 | 0.027202 | 0.008209 | |
RFE | DecisionTreeClassifier | 1 | 0.929279 | 0.009507 | 2.8 | 1.095445 | NaN | NaN | 3.333333 | 1.154701 | 1.4 | 0.547723 | ... | 0.920879 | 0.021138 | 0.949451 | 0.022787 | 0.947253 | 0.023824 | 0.934066 | 0.013459 | 0.024440 | 0.013629 | |
KNeighborsClassifier | 9 | 0.928844 | 0.011586 | NaN | NaN | 4.0 | 0.0 | 2.000000 | 0.000000 | 8.6 | 1.673320 | ... | 0.931868 | 0.018057 | 0.931868 | 0.018057 | 0.936264 | 0.004914 | 0.942857 | 0.012038 | 0.016126 | 0.004329 | ||
StandardScaler | SelectKBest | KNeighborsClassifier | 1 | 0.927946 | 0.014775 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 7.4 | 1.816590 | ... | 0.923077 | 0.047266 | 0.942857 | 0.034225 | 0.923077 | 0.034750 | 0.931868 | 0.025059 | 0.026160 | 0.008606 |
RFE | KNeighborsClassifier | 4 | 0.926613 | 0.012894 | NaN | NaN | 2.0 | 0.0 | 4.000000 | 0.000000 | 7.2 | 0.447214 | ... | 0.934066 | 0.017375 | 0.912088 | 0.036446 | 0.942857 | 0.023824 | 0.936264 | 0.022521 | 0.025616 | 0.007075 | |
1 | 0.923540 | 0.013857 | NaN | NaN | 2.0 | 0.0 | 4.000000 | 0.000000 | 8.0 | 0.707107 | ... | 0.912088 | 0.021978 | 0.918681 | 0.029691 | 0.927473 | 0.033512 | 0.916484 | 0.034401 | 0.023453 | 0.009031 | |||
SelectKBest | KNeighborsClassifier | 4 | 0.923125 | 0.017392 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 7.8 | 0.836660 | ... | 0.916484 | 0.027582 | 0.923077 | 0.033870 | 0.940659 | 0.024076 | 0.940659 | 0.034401 | 0.024832 | 0.005817 | |
DecisionTreeClassifier | 1 | 0.921366 | 0.006643 | 3.6 | 0.894427 | NaN | NaN | NaN | NaN | 1.4 | 0.547723 | ... | 0.909890 | 0.028444 | 0.929670 | 0.019963 | 0.947253 | 0.023824 | 0.923077 | 0.013459 | 0.023829 | 0.006901 | ||
MinMaxScaler | SelectKBest | KNeighborsClassifier | 4 | 0.920487 | 0.018497 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 8.2 | 1.095445 | ... | 0.914286 | 0.029487 | 0.925275 | 0.031468 | 0.938462 | 0.022787 | 0.936264 | 0.034225 | 0.025134 | 0.003906 |
DecisionTreeClassifier | 1 | 0.920038 | 0.026502 | 3.6 | 0.894427 | NaN | NaN | NaN | NaN | 1.2 | 0.447214 | ... | 0.907692 | 0.019963 | 0.931868 | 0.021138 | 0.947253 | 0.026236 | 0.927473 | 0.036940 | 0.028544 | 0.010968 | ||
RFE | KNeighborsClassifier | 3 | 0.918724 | 0.013124 | NaN | NaN | 2.0 | 0.0 | 2.000000 | 0.000000 | 10.8 | 0.447214 | ... | 0.916484 | 0.025299 | 0.920879 | 0.030493 | 0.912088 | 0.015541 | 0.936264 | 0.012038 | 0.016863 | 0.005096 | |
StandardScaler | RFE | DecisionTreeClassifier | 1 | 0.914792 | 0.011007 | 3.2 | 1.095445 | NaN | NaN | 2.000000 | 0.000000 | 1.4 | 0.547723 | ... | 0.907692 | 0.035268 | 0.923077 | 0.032038 | 0.940659 | 0.026465 | 0.923077 | 0.023311 | 0.032693 | 0.010808 |
SelectKBest | KNeighborsClassifier | 6 | 0.913908 | 0.011346 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 9.0 | 0.707107 | ... | 0.905495 | 0.053044 | 0.940659 | 0.019963 | 0.927473 | 0.016666 | 0.925275 | 0.012038 | 0.033909 | 0.013583 | |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 6 | 0.912150 | 0.011100 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 8.8 | 0.447214 | ... | 0.907692 | 0.044366 | 0.934066 | 0.013459 | 0.927473 | 0.012529 | 0.918681 | 0.012529 | 0.029312 | 0.014397 |
StandardScaler | RFE | DecisionTreeClassifier | 0 | 0.908247 | 0.020262 | 2.0 | 0.000000 | NaN | NaN | 3.000000 | 1.414214 | 1.6 | 0.547723 | ... | 0.883516 | 0.012529 | 0.931868 | 0.022521 | 0.962637 | 0.018388 | 0.923077 | 0.038852 | 0.047392 | 0.025852 |
MinMaxScaler | RFE | KNeighborsClassifier | 0 | 0.906866 | 0.011413 | NaN | NaN | 2.0 | 0.0 | 2.000000 | 0.000000 | 12.0 | 0.000000 | ... | 0.925275 | 0.034225 | 0.912088 | 0.010989 | 0.890110 | 0.032967 | 0.898901 | 0.027362 | 0.021153 | 0.012873 |
StandardScaler | SelectKBest | KNeighborsClassifier | 9 | 0.906455 | 0.020163 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 9.2 | 1.303840 | ... | 0.890110 | 0.050954 | 0.929670 | 0.021422 | 0.938462 | 0.012529 | 0.925275 | 0.021138 | 0.039798 | 0.014725 |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 9 | 0.903817 | 0.019961 | NaN | NaN | 4.0 | 0.0 | NaN | NaN | 9.4 | 1.341641 | ... | 0.892308 | 0.052930 | 0.918681 | 0.022787 | 0.936264 | 0.009194 | 0.920879 | 0.021138 | 0.038196 | 0.015052 |
StandardScaler | RFE | KNeighborsClassifier | 6 | 0.902494 | 0.022087 | NaN | NaN | 4.0 | 0.0 | 2.000000 | 0.000000 | 8.8 | 1.095445 | ... | 0.896703 | 0.030691 | 0.931868 | 0.030493 | 0.923077 | 0.034750 | 0.892308 | 0.035946 | 0.031191 | 0.009188 |
SelectKBest | DecisionTreeClassifier | 0 | 0.902069 | 0.031317 | 3.2 | 1.095445 | NaN | NaN | NaN | NaN | 1.6 | 0.547723 | ... | 0.898901 | 0.043542 | 0.945055 | 0.019034 | 0.929670 | 0.025299 | 0.901099 | 0.028017 | 0.040945 | 0.016336 | |
MinMaxScaler | SelectKBest | DecisionTreeClassifier | 0 | 0.902045 | 0.014146 | 3.2 | 1.095445 | NaN | NaN | NaN | NaN | 1.8 | 0.447214 | ... | 0.903297 | 0.026236 | 0.929670 | 0.024076 | 0.912088 | 0.028017 | 0.912088 | 0.032038 | 0.033928 | 0.015318 |
StandardScaler | RFE | KNeighborsClassifier | 9 | 0.897664 | 0.021655 | NaN | NaN | 4.0 | 0.0 | 2.000000 | 0.000000 | 9.8 | 0.447214 | ... | 0.890110 | 0.043264 | 0.925275 | 0.036776 | 0.931868 | 0.023824 | 0.907692 | 0.026465 | 0.040752 | 0.007431 |
SelectKBest | KNeighborsClassifier | 3 | 0.889756 | 0.023370 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 11.2 | 0.447214 | ... | 0.863736 | 0.059483 | 0.925275 | 0.030493 | 0.909890 | 0.014328 | 0.905495 | 0.019963 | 0.039302 | 0.010323 | |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 3 | 0.889756 | 0.022528 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 11.4 | 0.547723 | ... | 0.865934 | 0.061381 | 0.918681 | 0.033512 | 0.909890 | 0.014328 | 0.907692 | 0.016666 | 0.038451 | 0.010350 |
StandardScaler | RFE | KNeighborsClassifier | 3 | 0.888887 | 0.021979 | NaN | NaN | 2.0 | 0.0 | 2.000000 | 0.000000 | 11.0 | 0.000000 | ... | 0.885714 | 0.041555 | 0.916484 | 0.041555 | 0.905495 | 0.032599 | 0.905495 | 0.027582 | 0.040177 | 0.007849 |
SelectKBest | KNeighborsClassifier | 0 | 0.888404 | 0.018839 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 11.6 | 0.547723 | ... | 0.859341 | 0.061381 | 0.912088 | 0.021978 | 0.887912 | 0.018057 | 0.892308 | 0.031468 | 0.031946 | 0.010743 | |
MinMaxScaler | SelectKBest | KNeighborsClassifier | 0 | 0.888404 | 0.020371 | NaN | NaN | 2.0 | 0.0 | NaN | NaN | 11.6 | 0.547723 | ... | 0.857143 | 0.065475 | 0.916484 | 0.026465 | 0.885714 | 0.018388 | 0.890110 | 0.032038 | 0.033604 | 0.014043 |
StandardScaler | RFE | KNeighborsClassifier | 0 | 0.877000 | 0.027891 | NaN | NaN | 2.0 | 0.0 | 2.000000 | 0.000000 | 12.0 | 0.000000 | ... | 0.872527 | 0.048277 | 0.901099 | 0.040376 | 0.879121 | 0.030095 | 0.872527 | 0.035268 | 0.023341 | 0.008768 |
56 rows × 22 columns
Best Hyperparameter Pipeline¶
The pipeline with the hyperparameter combination which achieved the highest average test score over all outer CV folds (i.e., the parameter combination which represents the first row of mean_pipeline_score_results()).
NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because the best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds. Hence, the best_estimator instances can each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.
[11]:
pipeline_permuter.best_hyperparameter_pipeline()
[11]:
outer_fold | mean_test_accuracy | param_clf__n_neighbors | param_clf__weights | param_reduce_dim__k | params | rank_test_accuracy | split0_test_accuracy | split1_test_accuracy | split2_test_accuracy | split3_test_accuracy | split4_test_accuracy | std_test_accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.953846 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 2 | 0.956044 | 0.923077 | 0.967033 | 0.967033 | 0.956044 | 0.016150 |
1 | 0.956044 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.956044 | 0.978022 | 0.989011 | 0.912088 | 0.026917 |
2 | 0.962637 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.978022 | 0.967033 | 0.989011 | 0.934066 | 0.020382 |
3 | 0.958242 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.967033 | 0.945055 | 0.978022 | 0.956044 | 0.012815 |
4 | 0.964978 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.934783 | 0.978022 | 0.945055 | 0.978022 | 0.989011 | 0.021102 |
Regression¶
Load Example Dataset¶
[12]:
diabetes_data = load_diabetes()
X_reg = diabetes_data.data
y_reg = diabetes_data.target
Specify Estimator Combinations and Parameters for Hyperparameter Search¶
[13]:
model_dict_reg = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVR(kernel="linear", C=1))},
"clf": {
"KNeighborsRegressor": KNeighborsRegressor(),
"DecisionTreeRegressor": DecisionTreeRegressor(),
# "SVR": SVR(),
# "AdaBoostRegressor": AdaBoostRegressor(),
},
}
[14]:
params_dict_reg = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4]},
"KNeighborsRegressor": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
"DecisionTreeRegressor": {"max_depth": [2, 4]},
# "SVR": [
# {
# "kernel": ["linear"],
# "C": np.logspace(start=-2, stop=2, num=5)
# },
# {
# "kernel": ["rbf"],
# "C": np.logspace(start=-2, stop=2, num=5),
# "gamma": np.logspace(start=-2, stop=2, num=5)
# }
# ],
# "AdaBoostRegressor": {
# "base_estimator": [DecisionTreeRegressor(max_depth=1)],
# "n_estimators": np.arange(20, 110, 10),
# "learning_rate": np.arange(0.6, 1.1, 0.1)
# },
}
# use randomized search for the decision tree regressor; use grid search (the default) for all other estimators
hyper_search_dict_reg = {"DecisionTreeRegressor": {"search_method": "random", "n_iter": 2}}
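The effect of switching one estimator to randomized search can be sketched with scikit-learn's ParameterGrid and ParameterSampler helpers. This is only an illustration of the candidate counts, not of how SklearnPipelinePermuter builds its searches internally; the search_space dictionary below is written by hand to mirror the SelectKBest and DecisionTreeRegressor entries above.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Combined search space of SelectKBest + DecisionTreeRegressor, written with
# the "<step>__<parameter>" naming that scikit-learn pipelines use.
search_space = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__max_depth": [2, 4],
}

# Grid search evaluates every combination: 3 * 2 = 6 candidates.
grid_candidates = list(ParameterGrid(search_space))

# Randomized search draws only n_iter candidates from the same space.
random_candidates = list(ParameterSampler(search_space, n_iter=2, random_state=0))

print(len(grid_candidates), len(random_candidates))
```

With 5 inner folds, 2 sampled candidates correspond to 10 fits per outer fold, compared with 30 fits for the full grid of 6.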
Setup PipelinePermuter and Cross-Validations for Model Evaluation¶
Note: For further information, please visit the documentation of SklearnPipelinePermuter.
[15]:
pipeline_permuter_regression = SklearnPipelinePermuter(
model_dict_reg, params_dict_reg, hyper_search_dict=hyper_search_dict_reg
)
[16]:
outer_cv = KFold(5)
inner_cv = KFold(5)
pipeline_permuter_regression.fit(X_reg, y_reg, outer_cv=outer_cv, inner_cv=inner_cv, scoring="r2")
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__k': [2, 4, 'all'], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'reduce_dim__n_features_to_select': [2, 4], 'clf__max_depth': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:149: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
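For a single pipeline, the nested cross-validation scheme (an inner search wrapped in an outer evaluation loop) can be approximated with plain scikit-learn. The following is a minimal sketch, not biopsykit's implementation. As an assumption, it passes f_regression to SelectKBest, whereas the notebook relies on the SelectKBest default.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_reg, y_reg = load_diabetes(return_X_y=True)

# One of the eight combinations: StandardScaler -> SelectKBest -> KNeighborsRegressor
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("reduce_dim", SelectKBest(score_func=f_regression)),
        ("clf", KNeighborsRegressor()),
    ]
)
param_grid = {
    "reduce_dim__k": [2, 4, "all"],
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
}

# Inner loop: hyperparameter search on the training part of each outer fold
inner_search = GridSearchCV(pipeline, param_grid, cv=KFold(5), scoring="r2")

# Outer loop: score the refitted best estimator on the held-out part
outer_scores = cross_val_score(inner_search, X_reg, y_reg, cv=KFold(5), scoring="r2")

print(outer_scores.mean(), outer_scores.std())
```

Because the outer test data never enters the inner search, the outer scores give a less optimistic performance estimate than the inner-CV scores of the selected hyperparameters.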
Display Results¶
This works analogously to the classification example.
Further Functions¶
Export Results as LaTeX Table¶
[17]:
print(pipeline_permuter.metric_summary_to_latex())
\begin{table}[ht!]
\centering
\sisetup{table-format = 2.1(2)}
\begin{tabular}{lllS}
\toprule
{} & {} & {} & {\makecell{Accuracy [\%]}} \\
{Scaler} & {\makecell[lc]{Feature\\ Selection}} & {Classifier} & {} \\
\midrule
\multirow[c]{4}{*}{Standard} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.8) \\
& & DT & 91.2(3.7) \\
\cline{2-4}
& \multirow[c]{2}{*}{RFE} & kNN & 96.1(1.4) \\
& & DT & 91.2(6.9) \\
\cline{1-4} \cline{2-4}
\multirow[c]{4}{*}{Min-Max} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.2) \\
& & DT & 90.5(3.6) \\
\cline{2-4}
& \multirow[c]{2}{*}{RFE} & kNN & 96.0(2.1) \\
& & DT & 93.0(3.6) \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{table}
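The table cells such as 96.0(1.8) appear to follow the mean(standard deviation) notation in percent that siunitx S columns can parse. The helper below is a hypothetical sketch of that formatting, not the function biopsykit uses, and the input values are made up.

```python
def format_mean_std(mean: float, std: float, decimals: int = 1) -> str:
    """Format a score in [0, 1] as 'mean(std)' in percent."""
    return f"{mean * 100:.{decimals}f}({std * 100:.{decimals}f})"


# Hypothetical mean and standard deviation of the fold-wise accuracy
cell = format_mean_std(0.960, 0.018)
print(cell)
```

To compile the exported table, the LaTeX preamble needs the packages that provide the commands used in it: booktabs (\toprule, \midrule, \bottomrule), multirow (\multirow), makecell (\makecell), and siunitx (\sisetup and the S column type).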
Save and Load PipelinePermuter results¶
Save to Pickle File¶
[18]:
pipeline_permuter.to_pickle(tmpdir.joinpath("test.pkl"))
Load from Pickle File¶
[19]:
pipeline_permuter_load = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("test.pkl"))
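The save/load round trip can be sketched with Python's standard pickle module. This is an illustration of the general mechanism only; how to_pickle() and from_pickle() work internally is not shown here, and the results dictionary is a hypothetical stand-in for a fitted object.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for fitted results, keyed by pipeline combination
results = {
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"): {"mean_test_accuracy": 0.96},
}

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp).joinpath("test.pkl")
    # Serialize the object, including its fitted state
    with file_path.open("wb") as f:
        pickle.dump(results, f)
    # Restore it, e.g. in a later session
    with file_path.open("rb") as f:
        restored = pickle.load(f)

print(restored == results)
```

Pickle files should only be loaded from trusted sources, and restoring fitted estimators generally requires library versions compatible with those used for saving.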
Fit pipeline combinations and save intermediate results¶
This saves the current state after each successfully evaluated pipeline combination, so an interrupted run can be resumed without refitting finished combinations.
[20]:
pipeline_permuter.fit_and_save_intermediate(
X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv, file_path=tmpdir.joinpath("test.pkl")
)
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
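The skip-and-checkpoint behavior visible in the output can be sketched as a simple loop. Everything in this example is hypothetical (the evaluate placeholder, the fit_with_checkpoints helper, and the use of a plain dictionary as state); it only illustrates the idea of saving after every combination and skipping combinations that are already stored.

```python
import pickle
import tempfile
from itertools import product
from pathlib import Path

# Step names -> candidate estimator names (stand-ins for real estimators)
model_names = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}


def evaluate(combination):
    """Hypothetical placeholder for an expensive nested cross-validation."""
    return {"n_steps": len(combination)}


def fit_with_checkpoints(file_path):
    """Evaluate all combinations, saving after each; return the newly fitted ones."""
    results = pickle.loads(file_path.read_bytes()) if file_path.exists() else {}
    newly_fitted = []
    for combination in product(*model_names.values()):
        if combination in results:
            continue  # already fitted in an earlier run
        results[combination] = evaluate(combination)
        file_path.write_bytes(pickle.dumps(results))  # checkpoint
        newly_fitted.append(combination)
    return newly_fitted


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "test.pkl"
    first_run = fit_with_checkpoints(checkpoint)
    second_run = fit_with_checkpoints(checkpoint)

print(len(first_run), len(second_run))
```

The second call finds all eight combinations in the checkpoint file and fits nothing, which mirrors the eight "Skipping" messages above.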
Merge multiple PipelinePermuter instances¶
If the evaluation of different classification pipelines has to be split (e.g., for runtime reasons), the PipelinePermuter instances can be saved separately and afterwards merged back into one joint PipelinePermuter instance. This involves:
* Fitting and saving different PipelinePermuter instances
* Loading saved PipelinePermuter instances from disk
* Merging multiple PipelinePermuter instances into one instance for joint evaluation
Load Example Dataset¶
[21]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
Fit and Save Different PipelinePermuter instances¶
[22]:
model_dict_01 = {
"scaler": {"StandardScaler": StandardScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_01 = {
"StandardScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_01 = SklearnPipelinePermuter(model_dict_01, params_dict_01, random_state=42)
pipeline_permuter_01.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_01.to_pickle(tmpdir.joinpath("permuter_01.pkl"))
[23]:
model_dict_02 = {
"scaler": {"MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_02 = {
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_02 = SklearnPipelinePermuter(model_dict_02, params_dict_02, random_state=42)
pipeline_permuter_02.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_02.to_pickle(tmpdir.joinpath("permuter_02.pkl"))
[24]:
model_dict_03 = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"DecisionTreeClassifier": DecisionTreeClassifier(),
},
}
params_dict_03 = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
pipeline_permuter_03 = SklearnPipelinePermuter(model_dict_03, params_dict_03, random_state=42)
pipeline_permuter_03.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_03.to_pickle(tmpdir.joinpath("permuter_03.pkl"))
Load and Merge PipelinePermuter instances¶
[25]:
permuter_file_list = sorted(tmpdir.glob("permuter_*.pkl"))
print(permuter_file_list)
[PosixPath('tmpdir/permuter_01.pkl'), PosixPath('tmpdir/permuter_02.pkl'), PosixPath('tmpdir/permuter_03.pkl')]
[26]:
permuter_list = [SklearnPipelinePermuter.from_pickle(p) for p in permuter_file_list]
permuter_list
[26]:
[<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f704903d0>,
<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70490b80>,
<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7f1f70509cd0>]
[27]:
merged_permuter = SklearnPipelinePermuter.merge_permuter_instances(permuter_list)
Double-check that the permuters were correctly merged:
[28]:
for p in permuter_list:
display(p.best_estimator_summary())
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
---|---|---|---|
StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
---|---|---|---|
MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
---|---|---|---|
StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
[29]:
merged_permuter.best_estimator_summary()
[29]:
pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
---|---|---|---|
StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
StandardScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
MinMaxScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
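Instead of comparing the displayed tables by eye, the merge can also be checked programmatically by comparing index entries. The sketch below uses hand-built toy tables (the toy_summary helper and its placeholder values are hypothetical); in the notebook, the same comparison could presumably be applied to the indices of the best_estimator_summary() results.

```python
import pandas as pd

index_names = ["pipeline_scaler", "pipeline_reduce_dim", "pipeline_clf"]


def toy_summary(combinations):
    """Build a hypothetical summary table indexed by pipeline combination."""
    index = pd.MultiIndex.from_tuples(combinations, names=index_names)
    return pd.DataFrame({"best_estimator": ["<pipeline>"] * len(combinations)}, index=index)


parts = [
    toy_summary(
        [
            ("StandardScaler", "SelectKBest", "KNeighborsClassifier"),
            ("StandardScaler", "RFE", "KNeighborsClassifier"),
        ]
    ),
    toy_summary(
        [
            ("MinMaxScaler", "SelectKBest", "KNeighborsClassifier"),
            ("MinMaxScaler", "RFE", "KNeighborsClassifier"),
        ]
    ),
]
merged = pd.concat(parts)

# The merged index should hold every combination exactly once
expected = set().union(*(part.index for part in parts))
is_complete = set(merged.index) == expected
has_duplicates = merged.index.duplicated().any()

print(is_complete, has_duplicates)
```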
Update Partially Fitted SklearnPipelinePermuter with Additional Parameters¶
For this example, we run an experiment with a partial hyperparameter set, save the object as a pickle file, load it in the next step, update the parameter sets, and continue the experiment. This is useful for incremental experiments because it avoids running multiple experiments and merging different SklearnPipelinePermuter instances.
[30]:
model_dict_partial = {
"scaler": {"StandardScaler": StandardScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_partial = {
"StandardScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_partial = SklearnPipelinePermuter(model_dict_partial, params_dict_partial, random_state=42)
pipeline_permuter_partial.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
pipeline_permuter_partial.to_pickle(tmpdir.joinpath("permuter_partial.pkl"))
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': [2, 4, 'all'], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[31]:
model_dict_total = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
"DecisionTreeClassifier": DecisionTreeClassifier(),
},
}
params_dict_total = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
"DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
[32]:
pipeline_permuter_total = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("permuter_partial.pkl"))
pipeline_permuter_total = pipeline_permuter_total.update_permuter(model_dict_total, params_dict_total)
[33]:
pipeline_permuter_total.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__k': ['all', 2, 4], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'reduce_dim__n_features_to_select': [2, 4, None], 'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
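Which combinations are fitted after the update follows from the difference between the two model dictionaries. The sketch below reproduces that bookkeeping with plain Python sets; the partial_names and total_names dictionaries are simplified stand-ins that hold only the estimator names.

```python
from itertools import product

partial_names = {
    "scaler": ["StandardScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier"],
}
total_names = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

# Every pipeline is one choice per step, i.e. a Cartesian product
already_fitted = set(product(*partial_names.values()))
all_combinations = set(product(*total_names.values()))
still_to_fit = all_combinations - already_fitted

print(len(already_fitted), len(all_combinations), len(still_to_fit))
```

This matches the output above: two combinations are skipped because they were fitted in the partial run, and the remaining six are evaluated.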
Cleanup¶
[34]:
rmtree(tmpdir)