Sklearn Pipeline Permuter Example¶
This example shows how to systematically evaluate different machine learning pipelines.
This is useful, for instance, when combinations of different feature selection methods and different estimators should be evaluated in one step.
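A minimal sketch of what "evaluating combinations" means, using only the standard library. The step and estimator names are the ones used in this example; plain strings stand in for the estimator objects, and the code only illustrates the idea of a Cartesian product over pipeline steps. It is not a description of how SklearnPipelinePermuter is implemented.

```python
from itertools import product

# Candidate estimators per pipeline step, written as plain strings so the
# sketch runs without scikit-learn installed.
steps = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

# One pipeline per element of the Cartesian product of the step options.
combinations = [tuple(zip(steps, choice)) for choice in product(*steps.values())]

for combo in combinations:
    print(combo)
print(len(combinations), "pipeline combinations")
```

With two options for each of the three steps, this yields 2 x 2 x 2 = 8 pipeline combinations.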
Imports and Helper Functions¶
[1]:
from pathlib import Path
from shutil import rmtree
# Example datasets
from sklearn.datasets import load_breast_cancer, load_diabetes
# Preprocessing & Feature Selection
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Cross-Validation
from sklearn.model_selection import KFold
# Classification & Regression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
# Pipeline permutation and nested cross-validation
from biopsykit.classification.model_selection import SklearnPipelinePermuter
%load_ext autoreload
%autoreload 2
Classification¶
Create temporary directory
[2]:
tmpdir = Path("tmpdir")
tmpdir.mkdir(exist_ok=True)
Load Example Dataset¶
[3]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
Specify Estimator Combinations and Parameters for Hyperparameter Search¶
[4]:
model_dict = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
"DecisionTreeClassifier": DecisionTreeClassifier(),
# "SVC": SVC(),
# "AdaBoostClassifier": AdaBoostClassifier(),
},
}
[5]:
params_dict = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
"DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
# "SVC": [
# {
# "kernel": ["linear"],
# "C": np.logspace(start=-2, stop=2, num=5)
# },
# {
# "kernel": ["rbf"],
# "C": np.logspace(start=-2, stop=2, num=5),
# "gamma": np.logspace(start=-2, stop=2, num=5)
# }
# ],
# "AdaBoostClassifier": {
# "base_estimator": [DecisionTreeClassifier(max_depth=1)],
# "n_estimators": np.arange(20, 110, 10),
# "learning_rate": np.arange(0.6, 1.1, 0.1)
# },
}
# use randomized-search for decision tree classifier, use grid-search (the default) for all other estimators
hyper_search_dict = {"DecisionTreeClassifier": {"search_method": "random", "n_iter": 2}}
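As a rough illustration of how these dictionaries translate into search candidates, the sketch below builds the parameter grid for one pipeline combination by prefixing each parameter with its step name (the scikit-learn `<step>__<parameter>` convention) and then enumerates the candidates. The variable names `pipeline`, `grid`, and `candidates` are hypothetical, and `random.sample` merely stands in for the sampler used by a randomized search.

```python
import random
from itertools import product

# Per-estimator grids copied from params_dict (None means "nothing to tune").
params = {
    "StandardScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline = (
    ("scaler", "StandardScaler"),
    ("reduce_dim", "SelectKBest"),
    ("clf", "KNeighborsClassifier"),
)

# Prefix every parameter with "<step>__" so it addresses the right pipeline step.
grid = {
    f"{step}__{name}": values
    for step, estimator in pipeline
    for name, values in (params[estimator] or {}).items()
}

# A grid search evaluates every combination of the listed values.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(candidates), "grid-search candidates")

# A randomized search with n_iter=2 evaluates only two of them.
rng = random.Random(42)
for candidate in rng.sample(candidates, k=2):
    print(candidate)
```

For this combination the grid has 3 x 2 x 2 = 12 candidates, whereas a randomized search with `n_iter=2` would try only two.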
Setup PipelinePermuter and Cross-Validations for Model Evaluation¶
Note: For further information please visit the documentation of SklearnPipelinePermuter.
[6]:
pipeline_permuter = SklearnPipelinePermuter(
model_dict, params_dict, hyper_search_dict=hyper_search_dict, random_state=42
)
outer_cv = KFold(5)
inner_cv = KFold(5)
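To make the nested setup more concrete, the following sketch reproduces the fold boundaries of an unshuffled 5-fold split in plain Python. It assumes the documented `KFold` rule that the first `n_samples % n_splits` folds receive one extra sample, and it uses the 569 samples of the breast cancer dataset. The helper `kfold_test_ranges` is hypothetical and not part of scikit-learn or BioPsyKit.

```python
def kfold_test_ranges(n_samples: int, n_splits: int) -> list[range]:
    """Return the contiguous test ranges of an unshuffled k-fold split."""
    base, extra = divmod(n_samples, n_splits)
    ranges, start = [], 0
    for fold in range(n_splits):
        # The first `extra` folds are one sample larger than the rest.
        size = base + (1 if fold < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


n_samples = 569  # size of the breast cancer dataset
outer = kfold_test_ranges(n_samples, 5)
for fold, test in enumerate(outer):
    n_train = n_samples - len(test)
    # The inner CV splits only the training part of the outer fold.
    inner_sizes = [len(r) for r in kfold_test_ranges(n_train, 5)]
    print(f"outer fold {fold}: test={test.start}..{test.stop - 1}, "
          f"train size={n_train}, inner test sizes={inner_sizes}")
```

Each outer fold holds out roughly a fifth of the data for testing, and the hyperparameter search only ever sees the remaining training part, which it splits again with the inner CV.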
Fit all Parameter Combinations¶
[7]:
pipeline_permuter.fit(X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv)
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
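The numbers in the log above can be cross-checked with a little arithmetic. The sketch below counts only the inner-search fits reported in the log; any additional refits on the full training folds are not included, and the variable names are hypothetical.

```python
outer_folds = 5
inner_folds = 5

# Candidates per inner search, as reported in the log: 12 for the grid
# searches (k-nearest neighbors), 2 for the randomized searches (decision tree).
candidates = {"KNeighborsClassifier": 12, "DecisionTreeClassifier": 2}

# Each classifier is combined with two scalers and two feature-selection methods.
pipelines_per_classifier = 2 * 2

fits_per_search = {clf: n * inner_folds for clf, n in candidates.items()}
total_fits = sum(
    pipelines_per_classifier * outer_folds * fits for fits in fits_per_search.values()
)

print(fits_per_search)
print(total_fits, "inner-search fits in total")
```

The per-search values correspond to the "totalling 60 fits" and "totalling 10 fits" messages, each of which appears once per outer fold.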
Display Results¶
Metric Summary for Classification Pipelines¶
A summary of all relevant metrics (performance scores, confusion matrix, true and predicted labels) for each evaluated pipeline combination, computed from the best-performing pipeline of each fold (i.e., the best_pipeline() parameter of each inner cv object).
[8]:
pipeline_permuter.metric_summary()
[8]:
| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | conf_matrix | conf_matrix_folds | true_labels | true_labels_folds | predicted_labels | predicted_labels_folds | train_indices | train_indices_folds | test_indices | test_indices_folds | mean_test_accuracy | std_test_accuracy | test_accuracy_fold_0 | test_accuracy_fold_1 | test_accuracy_fold_2 | test_accuracy_fold_3 | test_accuracy_fold_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| StandardScaler | SelectKBest | KNeighborsClassifier | [195, 17, 6, 351] | [[62, 6, 1, 45], [46, 3, 1, 64], [35, 5, 0, 74... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959556 | 0.018127 | 0.938596 | 0.964912 | 0.956140 | 0.991228 | 0.946903 |
| StandardScaler | SelectKBest | DecisionTreeClassifier | [184, 28, 22, 335] | [[54, 14, 2, 44], [43, 6, 4, 61], [37, 3, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.912079 | 0.037361 | 0.859649 | 0.912281 | 0.964912 | 0.938596 | 0.884956 |
| StandardScaler | RFE | KNeighborsClassifier | [201, 11, 11, 346] | [[64, 4, 0, 46], [47, 2, 5, 60], [37, 3, 1, 73... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.961326 | 0.014282 | 0.964912 | 0.938596 | 0.964912 | 0.982456 | 0.955752 |
| StandardScaler | RFE | DecisionTreeClassifier | [177, 35, 15, 342] | [[43, 25, 0, 46], [46, 3, 6, 59], [38, 2, 2, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, ... | [[0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.912141 | 0.069276 | 0.780702 | 0.921053 | 0.964912 | 0.973684 | 0.920354 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | [198, 14, 9, 348] | [[63, 5, 2, 44], [46, 3, 1, 64], [36, 4, 0, 74... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959571 | 0.011923 | 0.938596 | 0.964912 | 0.964912 | 0.973684 | 0.955752 |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | [182, 30, 24, 333] | [[54, 14, 2, 44], [40, 9, 6, 59], [36, 4, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.905123 | 0.036065 | 0.859649 | 0.868421 | 0.956140 | 0.921053 | 0.920354 |
| MinMaxScaler | RFE | KNeighborsClassifier | [199, 13, 10, 347] | [[61, 7, 2, 44], [47, 2, 3, 62], [37, 3, 1, 73... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.959603 | 0.021168 | 0.921053 | 0.956140 | 0.964912 | 0.982456 | 0.973451 |
| MinMaxScaler | RFE | DecisionTreeClassifier | [186, 26, 14, 343] | [[54, 14, 2, 44], [44, 5, 3, 62], [36, 4, 1, 7... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... | [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, ... | [[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,... | [114, 115, 116, 117, 118, 119, 120, 121, 122, ... | [[114, 115, 116, 117, 118, 119, 120, 121, 122,... | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... | [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13... | 0.929731 | 0.036335 | 0.859649 | 0.929825 | 0.956140 | 0.956140 | 0.946903 |
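As a reading aid for this table, the sketch below recomputes some values of the StandardScaler, SelectKBest, KNeighborsClassifier pipeline from the numbers shown above. It assumes the flattened confusion matrix is stored row by row, with true labels as rows and predicted labels as columns (the scikit-learn convention); this assumption is consistent with the class counts of the dataset but is not stated in the output itself.

```python
from statistics import fmean, pstdev

# Values copied from the StandardScaler / SelectKBest / KNeighborsClassifier row.
conf_matrix_flat = [195, 17, 6, 351]
fold_accuracies = [0.938596, 0.964912, 0.956140, 0.991228, 0.946903]

# Assumed layout: row-major 2x2 matrix, rows = true labels, columns = predictions.
conf_matrix = [conf_matrix_flat[0:2], conf_matrix_flat[2:4]]
correct = conf_matrix[0][0] + conf_matrix[1][1]
total = sum(conf_matrix_flat)
pooled_accuracy = correct / total

mean_accuracy = fmean(fold_accuracies)
std_accuracy = pstdev(fold_accuracies)  # population standard deviation

print(conf_matrix)
print(f"pooled accuracy over all test samples: {pooled_accuracy:.6f}")
print(f"mean / std of per-fold accuracies: {mean_accuracy:.6f} / {std_accuracy:.6f}")
```

The mean of the per-fold accuracies matches the `mean_test_accuracy` column, and the population standard deviation appears to match `std_test_accuracy`. The accuracy pooled over the summed confusion matrix differs slightly from the mean because the last fold contains one sample fewer than the others.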
List of Pipeline objects for the best pipeline for each evaluated pipeline combination.
[9]:
pipeline_permuter.best_estimator_summary()
[9]:
| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
|---|---|---|---|
| StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| StandardScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| StandardScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
Mean Performance Scores for Individual Hyperparameter Combinations¶
The performance scores for each pipeline and parameter combination, averaged over all outer CV folds, as returned by SklearnPipelinePermuter.mean_pipeline_score_results().
NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because the best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds. Hence, the best_estimator instances can each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.
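The difference described in the note can be illustrated with a toy example. The scores below are hypothetical and the sketch is deliberately simplified: it only shows why the hyperparameters selected per fold can differ from the single combination that is best on average, and it does not reproduce how the summaries are computed on the outer test folds.

```python
from statistics import fmean

# Hypothetical scores: scores[outer_fold][parameter_combination_id].
scores = [
    {0: 0.90, 1: 0.94, 2: 0.93},
    {0: 0.95, 1: 0.92, 2: 0.94},
    {0: 0.91, 1: 0.93, 2: 0.95},
]

# View 1: every outer fold keeps its own winner, so the winners may differ.
winners = [max(fold, key=fold.get) for fold in scores]
mean_of_winners = fmean(fold[w] for fold, w in zip(scores, winners))

# View 2: average each combination over all outer folds, then pick the best.
mean_per_combination = {
    pid: fmean(fold[pid] for fold in scores) for pid in scores[0]
}
best_fixed = max(mean_per_combination, key=mean_per_combination.get)

print("winner per fold:", winners, "-> mean", round(mean_of_winners, 4))
print("best fixed combination:", best_fixed,
      "-> mean", round(mean_per_combination[best_fixed], 4))
```

In this toy case each fold selects a different combination, so the average over the per-fold winners is higher than the average of any single fixed combination.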
[10]:
pipeline_permuter.mean_pipeline_score_results()
[10]:
| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | parameter_combination_id | mean_test_accuracy (mean) | mean_test_accuracy (std) | rank_test_accuracy (mean) | rank_test_accuracy (std) | split0_test_accuracy (mean) | split0_test_accuracy (std) | split1_test_accuracy (mean) | split1_test_accuracy (std) | split2_test_accuracy (mean) | split2_test_accuracy (std) | split3_test_accuracy (mean) | split3_test_accuracy (std) | split4_test_accuracy (mean) | split4_test_accuracy (std) | std_test_accuracy (mean) | std_test_accuracy (std) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| StandardScaler | SelectKBest | KNeighborsClassifier | 11 | 0.959150 | 0.004603 | 1.2 | 0.447214 | 0.945198 | 0.007520 | 0.960440 | 0.022787 | 0.960440 | 0.014743 | 0.980220 | 0.009194 | 0.949451 | 0.028656 | 0.019473 | 0.005345 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 11 | 0.959150 | 0.001991 | 1.2 | 0.447214 | 0.938605 | 0.005895 | 0.951648 | 0.012529 | 0.967033 | 0.017375 | 0.986813 | 0.004914 | 0.951648 | 0.016666 | 0.019545 | 0.004257 |
| MinMaxScaler | RFE | KNeighborsClassifier | 11 | 0.958705 | 0.005482 | 1.2 | 0.447214 | 0.947372 | 0.004855 | 0.953846 | 0.009194 | 0.962637 | 0.016666 | 0.967033 | 0.010989 | 0.962637 | 0.016666 | 0.012098 | 0.002635 |
| MinMaxScaler | RFE | KNeighborsClassifier | 8 | 0.956507 | 0.006313 | 2.0 | 1.000000 | 0.947372 | 0.009162 | 0.953846 | 0.009194 | 0.967033 | 0.015541 | 0.958242 | 0.014328 | 0.956044 | 0.021978 | 0.013115 | 0.003993 |
| StandardScaler | RFE | KNeighborsClassifier | 11 | 0.956073 | 0.007155 | 1.6 | 0.547723 | 0.940803 | 0.016598 | 0.958242 | 0.023824 | 0.958242 | 0.009194 | 0.971429 | 0.012529 | 0.951648 | 0.036114 | 0.020734 | 0.003697 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 8 | 0.955198 | 0.005524 | 2.0 | 0.707107 | 0.938629 | 0.012276 | 0.960440 | 0.014743 | 0.956044 | 0.017375 | 0.978022 | 0.010989 | 0.942857 | 0.025059 | 0.019446 | 0.004951 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 8 | 0.953430 | 0.007843 | 2.6 | 1.949359 | 0.940779 | 0.009899 | 0.947253 | 0.009194 | 0.964835 | 0.014328 | 0.978022 | 0.013459 | 0.936264 | 0.025059 | 0.019030 | 0.005669 |
| StandardScaler | RFE | KNeighborsClassifier | 5 | 0.952117 | 0.005728 | 2.8 | 0.836660 | 0.945198 | 0.015417 | 0.956044 | 0.017375 | 0.951648 | 0.014743 | 0.958242 | 0.014328 | 0.949451 | 0.032599 | 0.017652 | 0.003116 |
| StandardScaler | RFE | KNeighborsClassifier | 8 | 0.951672 | 0.007497 | 2.8 | 1.303840 | 0.942977 | 0.016322 | 0.953846 | 0.016299 | 0.958242 | 0.009194 | 0.956044 | 0.019034 | 0.947253 | 0.039925 | 0.019830 | 0.002716 |
| MinMaxScaler | RFE | KNeighborsClassifier | 7 | 0.950354 | 0.007385 | 3.2 | 1.483240 | 0.925394 | 0.023949 | 0.951648 | 0.024076 | 0.953846 | 0.021138 | 0.960440 | 0.006019 | 0.960440 | 0.012529 | 0.018607 | 0.009022 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 5 | 0.949474 | 0.003494 | 3.6 | 0.547723 | 0.938581 | 0.006186 | 0.949451 | 0.018388 | 0.953846 | 0.026236 | 0.969231 | 0.012038 | 0.936264 | 0.030493 | 0.020818 | 0.007043 |
| StandardScaler | RFE | KNeighborsClassifier | 2 | 0.949470 | 0.007481 | 3.6 | 1.816590 | 0.958337 | 0.009168 | 0.956044 | 0.015541 | 0.958242 | 0.018057 | 0.938462 | 0.022787 | 0.936264 | 0.028444 | 0.017660 | 0.008590 |
| MinMaxScaler | RFE | KNeighborsClassifier | 5 | 0.947286 | 0.008802 | 4.2 | 1.303840 | 0.929838 | 0.009695 | 0.945055 | 0.023311 | 0.960440 | 0.009829 | 0.964835 | 0.014328 | 0.936264 | 0.028444 | 0.019275 | 0.005979 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 5 | 0.946398 | 0.006357 | 3.8 | 0.836660 | 0.936383 | 0.014422 | 0.953846 | 0.012038 | 0.949451 | 0.026465 | 0.958242 | 0.009194 | 0.934066 | 0.026917 | 0.017432 | 0.008160 |
| MinMaxScaler | RFE | KNeighborsClassifier | 10 | 0.945079 | 0.011841 | 5.6 | 2.073644 | 0.907812 | 0.031836 | 0.940659 | 0.022787 | 0.947253 | 0.021138 | 0.969231 | 0.012038 | 0.960440 | 0.012529 | 0.025049 | 0.009478 |
| MinMaxScaler | RFE | KNeighborsClassifier | 2 | 0.944195 | 0.009172 | 5.2 | 1.643168 | 0.938557 | 0.012698 | 0.949451 | 0.019963 | 0.971429 | 0.006019 | 0.938462 | 0.027582 | 0.923077 | 0.026917 | 0.022622 | 0.003527 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 2 | 0.944190 | 0.007720 | 4.8 | 1.303840 | 0.947324 | 0.014452 | 0.945055 | 0.021978 | 0.951648 | 0.027582 | 0.962637 | 0.018388 | 0.914286 | 0.026236 | 0.023213 | 0.009375 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 2 | 0.943311 | 0.009131 | 5.4 | 1.949359 | 0.958313 | 0.009281 | 0.949451 | 0.018388 | 0.951648 | 0.027582 | 0.962637 | 0.012529 | 0.894505 | 0.032599 | 0.029359 | 0.009596 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 7 | 0.942451 | 0.013020 | 3.8 | 1.303840 | 0.905662 | 0.016889 | 0.938462 | 0.036114 | 0.962637 | 0.012529 | 0.953846 | 0.014328 | 0.951648 | 0.018388 | 0.024840 | 0.003868 |
| StandardScaler | RFE | KNeighborsClassifier | 7 | 0.941128 | 0.015268 | 4.0 | 2.000000 | 0.923220 | 0.031147 | 0.945055 | 0.021978 | 0.953846 | 0.018057 | 0.942857 | 0.035946 | 0.940659 | 0.031659 | 0.021137 | 0.012182 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 7 | 0.938939 | 0.011950 | 3.8 | 1.303840 | 0.905686 | 0.012662 | 0.931868 | 0.029487 | 0.956044 | 0.019034 | 0.951648 | 0.016666 | 0.949451 | 0.016666 | 0.022968 | 0.004169 |
| StandardScaler | RFE | KNeighborsClassifier | 10 | 0.938930 | 0.016775 | 4.8 | 1.643168 | 0.916627 | 0.024208 | 0.945055 | 0.010989 | 0.931868 | 0.023824 | 0.956044 | 0.032038 | 0.945055 | 0.036446 | 0.022850 | 0.004537 |
| MinMaxScaler | RFE | KNeighborsClassifier | 1 | 0.936732 | 0.007675 | 7.2 | 2.049390 | 0.925418 | 0.016399 | 0.945055 | 0.038067 | 0.936264 | 0.028444 | 0.938462 | 0.012529 | 0.938462 | 0.016666 | 0.021214 | 0.004668 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 10 | 0.936302 | 0.011498 | 4.8 | 1.095445 | 0.892499 | 0.016592 | 0.920879 | 0.033331 | 0.949451 | 0.009829 | 0.967033 | 0.007770 | 0.951648 | 0.012529 | 0.028909 | 0.006367 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 10 | 0.934988 | 0.013670 | 5.6 | 1.140175 | 0.883731 | 0.020204 | 0.923077 | 0.025772 | 0.951648 | 0.012529 | 0.964835 | 0.012038 | 0.951648 | 0.016666 | 0.030625 | 0.004991 |
| MinMaxScaler | RFE | KNeighborsClassifier | 4 | 0.934553 | 0.014500 | 7.2 | 2.280351 | 0.896942 | 0.016556 | 0.931868 | 0.025059 | 0.927473 | 0.029691 | 0.964835 | 0.019658 | 0.951648 | 0.016666 | 0.027440 | 0.002901 |
| MinMaxScaler | RFE | DecisionTreeClassifier | 0 | 0.933669 | 0.009354 | 1.6 | 0.547723 | 0.912303 | 0.013246 | 0.925275 | 0.027362 | 0.949451 | 0.016666 | 0.958242 | 0.023824 | 0.923077 | 0.025772 | 0.023792 | 0.008042 |
| MinMaxScaler | RFE | KNeighborsClassifier | 6 | 0.931472 | 0.009943 | 8.4 | 2.073644 | 0.912303 | 0.015357 | 0.938462 | 0.016666 | 0.945055 | 0.010989 | 0.925275 | 0.014328 | 0.936264 | 0.026236 | 0.016603 | 0.005262 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 1 | 0.929704 | 0.014237 | 6.6 | 2.302173 | 0.918849 | 0.009929 | 0.920879 | 0.043542 | 0.949451 | 0.029691 | 0.923077 | 0.034750 | 0.936264 | 0.032413 | 0.027202 | 0.008209 |
| MinMaxScaler | RFE | DecisionTreeClassifier | 1 | 0.929279 | 0.009507 | 1.4 | 0.547723 | 0.894744 | 0.029651 | 0.920879 | 0.021138 | 0.949451 | 0.022787 | 0.947253 | 0.023824 | 0.934066 | 0.013459 | 0.024440 | 0.013629 |
| MinMaxScaler | RFE | KNeighborsClassifier | 9 | 0.928844 | 0.011586 | 8.6 | 1.673320 | 0.901362 | 0.016965 | 0.931868 | 0.018057 | 0.931868 | 0.018057 | 0.936264 | 0.004914 | 0.942857 | 0.012038 | 0.016126 | 0.004329 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 1 | 0.927946 | 0.014775 | 7.4 | 1.816590 | 0.918849 | 0.006181 | 0.923077 | 0.047266 | 0.942857 | 0.034225 | 0.923077 | 0.034750 | 0.931868 | 0.025059 | 0.026160 | 0.008606 |
| StandardScaler | RFE | KNeighborsClassifier | 4 | 0.926613 | 0.012894 | 7.2 | 0.447214 | 0.907788 | 0.035432 | 0.934066 | 0.017375 | 0.912088 | 0.036446 | 0.942857 | 0.023824 | 0.936264 | 0.022521 | 0.025616 | 0.007075 |
| StandardScaler | RFE | KNeighborsClassifier | 1 | 0.923540 | 0.013857 | 8.0 | 0.707107 | 0.942977 | 0.019676 | 0.912088 | 0.021978 | 0.918681 | 0.029691 | 0.927473 | 0.033512 | 0.916484 | 0.034401 | 0.023453 | 0.009031 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 4 | 0.923125 | 0.017392 | 7.8 | 0.836660 | 0.894744 | 0.005821 | 0.916484 | 0.027582 | 0.923077 | 0.033870 | 0.940659 | 0.024076 | 0.940659 | 0.034401 | 0.024832 | 0.005817 |
| StandardScaler | SelectKBest | DecisionTreeClassifier | 1 | 0.921366 | 0.006643 | 1.4 | 0.547723 | 0.896942 | 0.014619 | 0.909890 | 0.028444 | 0.929670 | 0.019963 | 0.947253 | 0.023824 | 0.923077 | 0.013459 | 0.023829 | 0.006901 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 4 | 0.920487 | 0.018497 | 8.2 | 1.095445 | 0.888151 | 0.009281 | 0.914286 | 0.029487 | 0.925275 | 0.031468 | 0.938462 | 0.022787 | 0.936264 | 0.034225 | 0.025134 | 0.003906 |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | 1 | 0.920038 | 0.026502 | 1.2 | 0.447214 | 0.885905 | 0.059082 | 0.907692 | 0.019963 | 0.931868 | 0.021138 | 0.947253 | 0.026236 | 0.927473 | 0.036940 | 0.028544 | 0.010968 |
| MinMaxScaler | RFE | KNeighborsClassifier | 3 | 0.918724 | 0.013124 | 10.8 | 0.447214 | 0.907907 | 0.014631 | 0.916484 | 0.025299 | 0.920879 | 0.030493 | 0.912088 | 0.015541 | 0.936264 | 0.012038 | 0.016863 | 0.005096 |
| StandardScaler | RFE | DecisionTreeClassifier | 1 | 0.914792 | 0.011007 | 1.4 | 0.547723 | 0.879455 | 0.041521 | 0.907692 | 0.035268 | 0.923077 | 0.032038 | 0.940659 | 0.026465 | 0.923077 | 0.023311 | 0.032693 | 0.010808 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 6 | 0.913908 | 0.011346 | 9.0 | 0.707107 | 0.870640 | 0.037458 | 0.905495 | 0.053044 | 0.940659 | 0.019963 | 0.927473 | 0.016666 | 0.925275 | 0.012038 | 0.033909 | 0.013583 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 6 | 0.912150 | 0.011100 | 8.8 | 0.447214 | 0.872838 | 0.040682 | 0.907692 | 0.044366 | 0.934066 | 0.013459 | 0.927473 | 0.012529 | 0.918681 | 0.012529 | 0.029312 | 0.014397 |
| StandardScaler | RFE | DecisionTreeClassifier | 0 | 0.908247 | 0.020262 | 1.6 | 0.547723 | 0.840134 | 0.075407 | 0.883516 | 0.012529 | 0.931868 | 0.022521 | 0.962637 | 0.018388 | 0.923077 | 0.038852 | 0.047392 | 0.025852 |
| MinMaxScaler | RFE | KNeighborsClassifier | 0 | 0.906866 | 0.011413 | 12.0 | 0.000000 | 0.907955 | 0.016103 | 0.925275 | 0.034225 | 0.912088 | 0.010989 | 0.890110 | 0.032967 | 0.898901 | 0.027362 | 0.021153 | 0.012873 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 9 | 0.906455 | 0.020163 | 9.2 | 1.303840 | 0.848758 | 0.049647 | 0.890110 | 0.050954 | 0.929670 | 0.021422 | 0.938462 | 0.012529 | 0.925275 | 0.021138 | 0.039798 | 0.014725 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 9 | 0.903817 | 0.019961 | 9.4 | 1.341641 | 0.850956 | 0.052693 | 0.892308 | 0.052930 | 0.918681 | 0.022787 | 0.936264 | 0.009194 | 0.920879 | 0.021138 | 0.038196 | 0.015052 |
| StandardScaler | RFE | KNeighborsClassifier | 6 | 0.902494 | 0.022087 | 8.8 | 1.095445 | 0.868514 | 0.036708 | 0.896703 | 0.030691 | 0.931868 | 0.030493 | 0.923077 | 0.034750 | 0.892308 | 0.035946 | 0.031191 | 0.009188 |
| StandardScaler | SelectKBest | DecisionTreeClassifier | 0 | 0.902069 | 0.031317 | 1.6 | 0.547723 | 0.835619 | 0.065100 | 0.898901 | 0.043542 | 0.945055 | 0.019034 | 0.929670 | 0.025299 | 0.901099 | 0.028017 | 0.040945 | 0.016336 |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | 0 | 0.902045 | 0.014146 | 1.8 | 0.447214 | 0.853082 | 0.045653 | 0.903297 | 0.026236 | 0.929670 | 0.024076 | 0.912088 | 0.028017 | 0.912088 | 0.032038 | 0.033928 | 0.015318 |
| StandardScaler | RFE | KNeighborsClassifier | 9 | 0.897664 | 0.021655 | 9.8 | 0.447214 | 0.833373 | 0.027008 | 0.890110 | 0.043264 | 0.925275 | 0.036776 | 0.931868 | 0.023824 | 0.907692 | 0.026465 | 0.040752 | 0.007431 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 3 | 0.889756 | 0.023370 | 11.2 | 0.447214 | 0.844386 | 0.045090 | 0.863736 | 0.059483 | 0.925275 | 0.030493 | 0.909890 | 0.014328 | 0.905495 | 0.019963 | 0.039302 | 0.010323 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 3 | 0.889756 | 0.022528 | 11.4 | 0.547723 | 0.846584 | 0.044116 | 0.865934 | 0.061381 | 0.918681 | 0.033512 | 0.909890 | 0.014328 | 0.907692 | 0.016666 | 0.038451 | 0.010350 |
| StandardScaler | RFE | KNeighborsClassifier | 3 | 0.888887 | 0.021979 | 11.0 | 0.000000 | 0.831247 | 0.041569 | 0.885714 | 0.041555 | 0.916484 | 0.041555 | 0.905495 | 0.032599 | 0.905495 | 0.027582 | 0.040177 | 0.007849 |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | 0 | 0.888404 | 0.020371 | 11.6 | 0.547723 | 0.892570 | 0.039055 | 0.857143 | 0.065475 | 0.916484 | 0.026465 | 0.885714 | 0.018388 | 0.890110 | 0.032038 | 0.033604 | 0.014043 |
| StandardScaler | SelectKBest | KNeighborsClassifier | 0 | 0.888404 | 0.018839 | 11.6 | 0.547723 | 0.890373 | 0.037173 | 0.859341 | 0.061381 | 0.912088 | 0.021978 | 0.887912 | 0.018057 | 0.892308 | 0.031468 | 0.031946 | 0.010743 |
| StandardScaler | RFE | KNeighborsClassifier | 0 | 0.877000 | 0.027891 | 12.0 | 0.000000 | 0.859723 | 0.020345 | 0.872527 | 0.048277 | 0.901099 | 0.040376 | 0.879121 | 0.030095 | 0.872527 | 0.035268 | 0.023341 | 0.008768 |
Best Hyperparameter Pipeline¶
The pipeline with the hyperparameter combination which achieved the highest average test score over all outer CV folds (i.e., the parameter combination which represents the first row of mean_pipeline_score_results()).
NOTE:
* The summary of these pipelines does not necessarily correspond to the best-performing pipeline as returned by SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary(), because the best-performing pipelines are determined by averaging the best_estimator instances, as determined by scikit-learn, over all folds. Hence, the best_estimator instances can each have a different set of hyperparameters, whereas this function explicitly averages over the same set of hyperparameters.
* Thus, this function should only be used if you want to gain a deeper understanding of the different hyperparameter combinations and their performance. If you want to get the best-performing pipeline(s) to report in a paper, use SklearnPipelinePermuter.metric_summary() or SklearnPipelinePermuter.best_estimator_summary() instead.
[11]:
pipeline_permuter.best_hyperparameter_pipeline()
[11]:
| outer_fold | mean_test_accuracy | param_clf__n_neighbors | param_clf__weights | param_reduce_dim__k | params | rank_test_accuracy | split0_test_accuracy | split1_test_accuracy | split2_test_accuracy | split3_test_accuracy | split4_test_accuracy | std_test_accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.953846 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 2 | 0.956044 | 0.923077 | 0.967033 | 0.967033 | 0.956044 | 0.016150 |
| 1 | 0.956044 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.956044 | 0.978022 | 0.989011 | 0.912088 | 0.026917 |
| 2 | 0.962637 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.978022 | 0.967033 | 0.989011 | 0.934066 | 0.020382 |
| 3 | 0.958242 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.945055 | 0.967033 | 0.945055 | 0.978022 | 0.956044 | 0.012815 |
| 4 | 0.964978 | 4.0 | distance | all | {'clf__n_neighbors': 4, 'clf__weights': 'dista... | 1 | 0.934783 | 0.978022 | 0.945055 | 0.978022 | 0.989011 | 0.021102 |
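The per-fold values in this table can be related to the first row of the mean_pipeline_score_results() output with a short check. The sketch below averages the `mean_test_accuracy` column over the five outer folds. That the reported spread is a sample standard deviation (as pandas computes by default) is an assumption that happens to agree with the numbers shown.

```python
from statistics import fmean, stdev

# mean_test_accuracy per outer fold, copied from the table above.
per_fold_mean = [0.953846, 0.956044, 0.962637, 0.958242, 0.964978]

overall_mean = fmean(per_fold_mean)
overall_std = stdev(per_fold_mean)  # sample standard deviation (ddof=1)

print(f"mean over outer folds: {overall_mean:.6f}")
print(f"std over outer folds:  {overall_std:.6f}")
```

Both values agree, up to rounding, with the `mean_test_accuracy` mean and std reported for the StandardScaler, SelectKBest, KNeighborsClassifier pipeline with parameter combination 11.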
Regression¶
Load Example Dataset¶
[12]:
diabetes_data = load_diabetes()
X_reg = diabetes_data.data
y_reg = diabetes_data.target
Specify Estimator Combinations and Parameters for Hyperparameter Search¶
[13]:
model_dict_reg = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVR(kernel="linear", C=1))},
"clf": {
"KNeighborsRegressor": KNeighborsRegressor(),
"DecisionTreeRegressor": DecisionTreeRegressor(),
# "SVR": SVR(),
# "AdaBoostRegressor": AdaBoostRegressor(),
},
}
[14]:
params_dict_reg = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4]},
"KNeighborsRegressor": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
"DecisionTreeRegressor": {"max_depth": [2, 4]},
# "SVR": [
# {
# "kernel": ["linear"],
# "C": np.logspace(start=-2, stop=2, num=5)
# },
# {
# "kernel": ["rbf"],
# "C": np.logspace(start=-2, stop=2, num=5),
# "gamma": np.logspace(start=-2, stop=2, num=5)
# }
# ],
# "AdaBoostRegressor": {
# "base_estimator": [DecisionTreeRegressor(max_depth=1)],
# "n_estimators": np.arange(20, 110, 10),
# "learning_rate": np.arange(0.6, 1.1, 0.1)
# },
}
# use randomized search for the decision tree regressor, use grid search (the default) for all other estimators
hyper_search_dict_reg = {"DecisionTreeRegressor": {"search_method": "random", "n_iter": 2}}
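The candidate counts reported in the fitting log can be checked by hand. The following is a minimal sketch using only the standard library; `n_candidates` is a hypothetical helper (not part of biopsykit), and it assumes that a randomized search over list-valued parameters evaluates at most `n_iter` distinct candidates.

```python
from itertools import product

# Parameter grids as they appear after prefixing with the pipeline step names.
grid_knn_skb = {
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
    "reduce_dim__k": [2, 4, "all"],
}
grid_knn_rfe = {
    "clf__n_neighbors": [2, 4],
    "clf__weights": ["uniform", "distance"],
    "reduce_dim__n_features_to_select": [2, 4],
}
grid_dt_skb = {"clf__max_depth": [2, 4], "reduce_dim__k": [2, 4, "all"]}


def n_candidates(grid, search_method="grid", n_iter=None):
    """Number of hyperparameter candidates evaluated in one inner search (hypothetical helper)."""
    n_grid = len(list(product(*grid.values())))
    if search_method == "random":
        # Assumption: randomized search draws at most n_iter candidates from the grid.
        return min(n_iter, n_grid)
    return n_grid


n_inner_folds = 5  # matches inner_cv = KFold(5)
searches = [
    ("KNeighborsRegressor + SelectKBest", grid_knn_skb, {}),
    ("KNeighborsRegressor + RFE", grid_knn_rfe, {}),
    ("DecisionTreeRegressor + SelectKBest", grid_dt_skb, {"search_method": "random", "n_iter": 2}),
]
for name, grid, search_kwargs in searches:
    n = n_candidates(grid, **search_kwargs)
    print(f"{name}: {n} candidates, {n * n_inner_folds} inner fits per outer fold")
```

Because the inner search is repeated once per outer fold, each count appears five times per pipeline in the log when `outer_cv = KFold(5)`.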
Setup PipelinePermuter and Cross-Validations for Model Evaluation¶
Note: For further information, please see the documentation of SklearnPipelinePermuter.
[15]:
pipeline_permuter_regression = SklearnPipelinePermuter(
model_dict_reg, params_dict_reg, hyper_search_dict=hyper_search_dict_reg
)
[16]:
outer_cv = KFold(5)
inner_cv = KFold(5)
pipeline_permuter_regression.fit(X_reg, y_reg, outer_cv=outer_cv, inner_cv=inner_cv, scoring="r2")
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4]}
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 4, 'all']}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 8 candidates, totalling 40 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeRegressor')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'random', 'n_iter': 2}): {'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [2, 4]}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Fitting 5 folds for each of 2 candidates, totalling 10 fits
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/classification/model_selection/nested_cv.py:151: UserWarning: Cannot compute confusion matrix for regression tasks.
warnings.warn("Cannot compute confusion matrix for regression tasks.")
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Display Results¶
This works analogously to the classification example.
Further Functions¶
Export Results as LaTeX Table¶
[17]:
print(pipeline_permuter.metric_summary_to_latex())
\begin{table}[ht!]
\centering
\sisetup{table-format = 2.1(2)}
\begin{tabular}{lllS}
\toprule
{} & {} & {} & {\makecell{Accuracy [\%]}} \\
{Scaler} & {\makecell[lc]{Feature\\ Selection}} & {Classifier} & {} \\
\midrule
\multirow[c]{4}{*}{Standard} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.8) \\
& & DT & 91.2(3.7) \\
\cline{2-4}
& \multirow[c]{2}{*}{RFE} & kNN & 96.1(1.4) \\
& & DT & 91.2(6.9) \\
\cline{1-4} \cline{2-4}
\multirow[c]{4}{*}{Min-Max} & \multirow[c]{2}{*}{SkB} & kNN & 96.0(1.2) \\
& & DT & 90.5(3.6) \\
\cline{2-4}
& \multirow[c]{2}{*}{RFE} & kNN & 96.0(2.1) \\
& & DT & 93.0(3.6) \\
\cline{1-4} \cline{2-4}
\bottomrule
\end{tabular}
\end{table}
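The `S` column with `table-format = 2.1(2)` expects each entry in the form `value(uncertainty)`. As a rough sketch of how such an entry might be built from per-fold scores, the snippet below assumes that the number in parentheses is the standard deviation in percentage points; this is an assumption about the output, and `format_mean_std` is a hypothetical helper, not a biopsykit function.

```python
from statistics import mean, pstdev


def format_mean_std(scores, decimals=1):
    """Format fractional scores as 'mean(std)' in percent (hypothetical helper)."""
    m = 100 * mean(scores)
    # Assumption: population standard deviation (ddof=0) across folds.
    s = 100 * pstdev(scores)
    return f"{m:.{decimals}f}({s:.{decimals}f})"


# Illustrative fold accuracies, not taken from the table above.
fold_scores = [0.95, 0.93, 0.97, 0.97, 0.96]
print(format_mean_std(fold_scores))
```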
Save and Load PipelinePermuter results¶
Save to Pickle File¶
[18]:
pipeline_permuter.to_pickle(tmpdir.joinpath("test.pkl"))
Load from Pickle File¶
[19]:
pipeline_permuter_load = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("test.pkl"))
Fit pipeline combinations and save intermediate results¶
This saves the current state after each successfully evaluated pipeline combination. Combinations that were already fitted are skipped.
[20]:
pipeline_permuter.fit_and_save_intermediate(
X=X, y=y, outer_cv=outer_cv, inner_cv=inner_cv, file_path=tmpdir.joinpath("test.pkl")
)
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
Skipping (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) since this combination was already fitted!
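The skip-if-already-fitted behaviour seen above follows a common checkpointing pattern. The sketch below illustrates that pattern with the standard library only; it is not the biopsykit implementation, and `fit_with_checkpoints`, `evaluate`, and the checkpoint file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def fit_with_checkpoints(combinations, evaluate, file_path):
    """Evaluate each combination once, pickling the results after every success."""
    file_path = Path(file_path)
    # Resume from an earlier run if a checkpoint file already exists.
    results = pickle.loads(file_path.read_bytes()) if file_path.exists() else {}
    n_evaluated = 0
    for combo in combinations:
        if combo in results:
            print(f"Skipping {combo} since this combination was already fitted!")
            continue
        results[combo] = evaluate(combo)
        n_evaluated += 1
        # Persist immediately so an interruption loses at most one combination.
        file_path.write_bytes(pickle.dumps(results))
    return results, n_evaluated


combos = [
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"),
    ("StandardScaler", "RFE", "KNeighborsClassifier"),
]

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.pkl"
    # `len` stands in for an expensive nested cross-validation run.
    _, first_run = fit_with_checkpoints(combos, len, checkpoint)
    _, second_run = fit_with_checkpoints(combos, len, checkpoint)

print(first_run, second_run)
```

The second call finds every combination in the checkpoint and evaluates nothing, which mirrors the "Skipping ..." messages in the output above.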
Merge multiple PipelinePermuter instances¶
If the evaluation of different classification pipelines has to be split (e.g., because of long runtimes), the PipelinePermuter instances can be saved separately and afterwards merged back into one joint PipelinePermuter instance.
This example shows:
* Fitting and saving different PipelinePermuter instances
* Loading saved PipelinePermuter instances from disk
* Merging multiple PipelinePermuter instances into one instance for joint evaluation
Load Example Dataset¶
[21]:
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
Fit and Save Different PipelinePermuter instances¶
[22]:
model_dict_01 = {
"scaler": {"StandardScaler": StandardScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_01 = {
"StandardScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_01 = SklearnPipelinePermuter(model_dict_01, params_dict_01, random_state=42)
pipeline_permuter_01.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_01.to_pickle(tmpdir.joinpath("permuter_01.pkl"))
[23]:
model_dict_02 = {
"scaler": {"MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_02 = {
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_02 = SklearnPipelinePermuter(model_dict_02, params_dict_02, random_state=42)
pipeline_permuter_02.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_02.to_pickle(tmpdir.joinpath("permuter_02.pkl"))
[24]:
model_dict_03 = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"DecisionTreeClassifier": DecisionTreeClassifier(),
},
}
params_dict_03 = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
pipeline_permuter_03 = SklearnPipelinePermuter(model_dict_03, params_dict_03, random_state=42)
pipeline_permuter_03.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5), verbose=0)
pipeline_permuter_03.to_pickle(tmpdir.joinpath("permuter_03.pkl"))
Load and Merge PipelinePermuter instances¶
[25]:
permuter_file_list = sorted(tmpdir.glob("permuter_*.pkl"))
print(permuter_file_list)
[PosixPath('tmpdir/permuter_01.pkl'), PosixPath('tmpdir/permuter_02.pkl'), PosixPath('tmpdir/permuter_03.pkl')]
[26]:
permuter_list = [SklearnPipelinePermuter.from_pickle(p) for p in permuter_file_list]
permuter_list
[26]:
[<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4746c11270>,
<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4746c10790>,
<biopsykit.classification.model_selection.sklearn_pipeline_permuter.SklearnPipelinePermuter at 0x7b4736f4da80>]
[27]:
merged_permuter = SklearnPipelinePermuter.merge_permuter_instances(permuter_list)
Double-check that the permuters were correctly merged:
[28]:
for p in permuter_list:
display(p.best_estimator_summary())
| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
|---|---|---|---|
| StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |

| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
|---|---|---|---|
| MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |

| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
|---|---|---|---|
| StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
[29]:
merged_permuter.best_estimator_summary()
[29]:
| pipeline_scaler | pipeline_reduce_dim | pipeline_clf | best_estimator |
|---|---|---|---|
| StandardScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | SelectKBest | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | KNeighborsClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| StandardScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| MinMaxScaler | SelectKBest | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
| | RFE | DecisionTreeClassifier | [Pipeline(memory=Memory(location=cachedir/jobl... |
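Conceptually, the merged summary is the union of the per-instance results, keyed by pipeline combination. The following sketch shows that idea on plain dictionaries; it is not how `merge_permuter_instances` is implemented, and raising on duplicate combinations is a design choice of this sketch rather than documented behaviour. `merge_results` and the placeholder values are hypothetical.

```python
def merge_results(result_dicts):
    """Union per-combination results; refuse to silently overwrite duplicates."""
    merged = {}
    for results in result_dicts:
        for combo, value in results.items():
            if combo in merged:
                raise ValueError(f"Combination {combo} appears in more than one instance")
            merged[combo] = value
    return merged


# Placeholder results mirroring the three permuters above (values are dummies).
results_01 = {
    ("StandardScaler", "SelectKBest", "KNeighborsClassifier"): "fitted",
    ("StandardScaler", "RFE", "KNeighborsClassifier"): "fitted",
}
results_02 = {
    ("MinMaxScaler", "SelectKBest", "KNeighborsClassifier"): "fitted",
    ("MinMaxScaler", "RFE", "KNeighborsClassifier"): "fitted",
}
results_03 = {
    ("StandardScaler", "SelectKBest", "DecisionTreeClassifier"): "fitted",
    ("StandardScaler", "RFE", "DecisionTreeClassifier"): "fitted",
    ("MinMaxScaler", "SelectKBest", "DecisionTreeClassifier"): "fitted",
    ("MinMaxScaler", "RFE", "DecisionTreeClassifier"): "fitted",
}

merged = merge_results([results_01, results_02, results_03])
print(len(merged))
```

The eight keys correspond to the eight rows of the merged summary table.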
Update Partially Fitted SklearnPipelinePermuter with Additional Parameters¶
For this example, we run an experiment with a partial hyperparameter set, save the resulting object as a pickle file, load it again in the next step, update the parameter sets, and continue the experiment. This is useful for incremental experiments because it avoids running multiple separate experiments and merging different SklearnPipelinePermuter instances afterwards.
[30]:
model_dict_partial = {
"scaler": {"StandardScaler": StandardScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
},
}
params_dict_partial = {
"StandardScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
}
pipeline_permuter_partial = SklearnPipelinePermuter(model_dict_partial, params_dict_partial, random_state=42)
pipeline_permuter_partial.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
pipeline_permuter_partial.to_pickle(tmpdir.joinpath("permuter_partial.pkl"))
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__k': [2, 4, 'all']}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['uniform', 'distance'], 'reduce_dim__n_features_to_select': [2, 4, None]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[31]:
model_dict_total = {
"scaler": {"StandardScaler": StandardScaler(), "MinMaxScaler": MinMaxScaler()},
"reduce_dim": {"SelectKBest": SelectKBest(), "RFE": RFE(SVC(kernel="linear", C=1))},
"clf": {
"KNeighborsClassifier": KNeighborsClassifier(),
"DecisionTreeClassifier": DecisionTreeClassifier(),
},
}
params_dict_total = {
"StandardScaler": None,
"MinMaxScaler": None,
"SelectKBest": {"k": [2, 4, "all"]},
"RFE": {"n_features_to_select": [2, 4, None]},
"KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
"DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
[32]:
pipeline_permuter_total = SklearnPipelinePermuter.from_pickle(tmpdir.joinpath("permuter_partial.pkl"))
pipeline_permuter_total = pipeline_permuter_total.update_permuter(model_dict_total, params_dict_total)
[33]:
pipeline_permuter_total.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Skipping (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) since this combination was already fitted!
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['distance', 'uniform'], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'RFE'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'MinMaxScaler'), ('reduce_dim', 'RFE'), ('clf', 'KNeighborsClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__n_neighbors': [2, 4], 'clf__weights': ['distance', 'uniform'], 'reduce_dim__n_features_to_select': [None, 2, 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
### Running hyperparameter search for pipeline: (('scaler', 'StandardScaler'), ('reduce_dim', 'SelectKBest'), ('clf', 'DecisionTreeClassifier')) with 1 parameter grid(s):
Parameter grid #0 ({'search_method': 'grid'}): {'clf__criterion': ['gini', 'entropy'], 'clf__max_depth': [2, 4], 'reduce_dim__k': [2, 'all', 4]}
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
Fitting 5 folds for each of 12 candidates, totalling 60 fits
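The log above runs six new searches and skips two. That split can be reproduced by enumerating all pipeline combinations and removing those covered by the partial run. This is a minimal sketch with the standard library; `pipeline_combinations` is a hypothetical helper, and the dictionaries list only estimator names instead of estimator objects.

```python
from itertools import product


def pipeline_combinations(model_names):
    """All step/estimator combinations, in the tuple format printed in the log."""
    steps = list(model_names)
    return [
        tuple(zip(steps, names))
        for names in product(*(model_names[step] for step in steps))
    ]


partial = {
    "scaler": ["StandardScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier"],
}
total = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest", "RFE"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}

already_fitted = set(pipeline_combinations(partial))
remaining = [c for c in pipeline_combinations(total) if c not in already_fitted]

print(f"{len(already_fitted)} skipped, {len(remaining)} still to fit")
```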
Cleanup¶
[34]:
rmtree(tmpdir)