biopsykit.classification.analysis package

Functions to analyze classification results.

biopsykit.classification.analysis.predictions_as_df(pipeline_permuter, data, pipeline, label_mapping=None, index_col=None)

Get predictions from a specified pipeline and merge them with the index of the input dataframe.

Parameters

pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance
data (DataFrame) – input data
pipeline (tuple) – pipeline to get predictions from
label_mapping (dict, optional) – dictionary mapping the original labels to the names to use in the output dataframe, or None to keep the original labels. Default: None
index_col (str, optional) – name of the index column to merge the predictions with. If data has a multi-index, the first level is used unless index_col is specified. Default: None

Returns

predictions as dataframe

Return type

DataFrame
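
The merge step and the effect of label_mapping can be pictured with a small, dependency-free sketch. It does not call BioPsyKit: attach_predictions, the subject IDs, and the label values are hypothetical, and plain lists stand in for the dataframe index and the predicted labels. Keeping labels that are absent from the mapping is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def attach_predictions(
    index: Sequence[Hashable],
    predicted: Sequence[Hashable],
    label_mapping: Optional[Dict[Hashable, Hashable]] = None,
) -> List[Tuple[Hashable, Hashable]]:
    """Pair every index entry with its predicted label (hypothetical helper)."""
    if len(index) != len(predicted):
        raise ValueError("index and predictions must have the same length")
    if label_mapping is not None:
        # Rename labels; labels missing from the mapping are kept unchanged
        # (an assumption of this sketch).
        predicted = [label_mapping.get(label, label) for label in predicted]
    return list(zip(index, predicted))


subjects = ["S01", "S02", "S03"]  # stands in for the dataframe index
raw_predictions = [0, 1, 0]       # stands in for the pipeline output

# label_mapping=None keeps the original labels.
original = attach_predictions(subjects, raw_predictions)
# A mapping renames the labels in the result.
renamed = attach_predictions(subjects, raw_predictions, {0: "control", 1: "stress"})
```

In both calls the result has one entry per index value, which is the property that makes the predictions easy to join back onto the input data.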

biopsykit.classification.analysis.predict_proba_from_estimator(pipeline_permuter, data, pipeline, label_col='label', column_names=None)

Get predictions as probabilities from a specified pipeline and merge them with the index of the input dataframe.

Parameters

pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance
data (DataFrame) – input data
pipeline (tuple) – pipeline to get predictions from
label_col (str, optional) – name of the label column in the input dataframe. Default: "label"
column_names (dict, optional) – mapping of column names to rename columns in the output dataframe or None to keep original column names. Default: None

Returns

dataframe with predictions as probabilities

Return type

DataFrame

biopsykit.classification.analysis.plot_conf_matrix(predictions, labels, label_name='label', conf_matrix_kwargs=None, **kwargs)

Plot confusion matrix from predictions.

Parameters

predictions (DataFrame) – dataframe with predictions
labels (list or dict, optional) – list of labels to use in the confusion matrix, or a dictionary that maps the label names in the dataframe (keys) to the label names shown in the confusion matrix (values). Default: None to use the labels in the dataframe in the order they appear
label_name (str, optional) – name of the "label" in the axis titles. Default: "label" to yield "True label" and "Predicted label"
conf_matrix_kwargs (dict, optional) – additional keyword arguments to pass to sklearn.metrics.ConfusionMatrixDisplay.from_predictions(). Default: None
**kwargs – additional keyword arguments to pass to plt.subplots()

Return type

Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
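
As a rough illustration of the two accepted forms of labels (list or dictionary), the sketch below counts a confusion matrix in plain Python. It is not BioPsyKit's implementation and draws nothing; confusion_counts and the sample data are hypothetical, and the convention of rows for true labels and columns for predicted labels is an assumption of the sketch.

```python
from typing import Dict, Hashable, List, Sequence, Tuple, Union

Labels = Union[Sequence[Hashable], Dict[Hashable, str]]


def confusion_counts(
    true: Sequence[Hashable], predicted: Sequence[Hashable], labels: Labels
) -> Tuple[List[str], List[List[int]]]:
    """Return display names and a count matrix (rows: true, columns: predicted)."""
    if isinstance(labels, dict):
        # Dict form: keys are the labels in the data, values are display names.
        keys = list(labels.keys())
        names = [str(labels[key]) for key in keys]
    else:
        # List form: labels are used as given, in the given order.
        keys = list(labels)
        names = [str(key) for key in keys]
    position = {key: i for i, key in enumerate(keys)}
    matrix = [[0] * len(keys) for _ in keys]
    for t, p in zip(true, predicted):
        matrix[position[t]][position[p]] += 1
    return names, matrix


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Dict form renames the classes for display.
names, matrix = confusion_counts(y_true, y_pred, {0: "control", 1: "stress"})
# List form only fixes the order of rows and columns.
names_rev, matrix_rev = confusion_counts(y_true, y_pred, [1, 0])
```

The dictionary form changes only the displayed names, while the order of the entries (in either form) decides the order of rows and columns.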

biopsykit.classification.analysis.plot_conf_matrix_proba(predictions, labels, label_col='label', label_name='label', **kwargs)

Plot confusion matrix from prediction probabilities.

Parameters

predictions (DataFrame) – dataframe with predictions as probabilities
labels (list) – list of labels
label_col (str, optional) – name of the label column in the input dataframe. Default: "label"
label_name (str, optional) – name of the "label" in the axis titles. Default: "label" to yield "True label" and "Predicted label"
**kwargs – additional keyword arguments to pass to plt.subplots()

Return type

Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
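
The description above does not state how probabilities are turned into matrix entries. One plausible reading, assumed here and not confirmed by this page, is that each row holds the mean predicted probability per class for all samples sharing a true label. The sketch below works on plain dictionaries instead of a dataframe; mean_probability_matrix and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence


def mean_probability_matrix(
    rows: Sequence[Mapping[str, Any]],
    labels: Sequence[str],
    label_col: str = "label",
) -> List[List[float]]:
    """Average the predicted class probabilities per true label (assumed logic)."""
    grouped: Dict[str, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        # label_col holds the true label; the other keys hold class probabilities.
        grouped[str(row[label_col])].append(row)
    matrix = []
    for true_label in labels:
        members = grouped.get(true_label, [])
        if not members:
            raise ValueError(f"no samples with true label {true_label!r}")
        matrix.append(
            [sum(float(m[pred]) for m in members) / len(members) for pred in labels]
        )
    return matrix


rows = [
    {"label": "control", "control": 1.0, "stress": 0.0},
    {"label": "control", "control": 0.5, "stress": 0.5},
    {"label": "stress", "control": 0.25, "stress": 0.75},
]
proba_matrix = mean_probability_matrix(rows, ["control", "stress"])
```

Under this reading every row sums to 1, so the diagonal shows how confidently each class is recognized on average.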

biopsykit.classification.analysis.metric_summary_to_latex(permuter_or_df, metrics=None, pipeline_steps=None, si_table_format=None, highlight_best=None, **kwargs)

Return a LaTeX table with the performance metrics of the pipeline combinations.

Notes

This method is a legacy method that is kept for backwards compatibility with older pickled instances of the SklearnPipelinePermuter class. It is recommended to use the SklearnPipelinePermuter.metric_summary() method instead.