biopsykit.classification.analysis package
Functions to analyze classification results.
- biopsykit.classification.analysis.predictions_as_df(pipeline_permuter, data, pipeline, label_mapping=None, index_col=None)
Get predictions from a specified pipeline and merge them with the index of the input dataframe.
- Parameters
pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance
data (DataFrame) – input data
pipeline (tuple) – pipeline to get predictions from
label_mapping (dict, optional) – mapping of labels to rename labels in the output dataframe or None to keep original labels. Default: None
index_col (str, optional) – name of the index column to merge the predictions with. If data has a multi-index, the first level is used unless index_col is specified. Default: None
- Returns
predictions as dataframe
- Return type
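The effect of label_mapping and of merging predictions with the input index can be illustrated without the library. The following is a minimal sketch in plain Python, not biopsykit's implementation: the sample data, the rename helper, and the column names true_label and predicted_label are assumptions made for illustration.

```python
# Hypothetical stand-in data; none of this is produced by biopsykit.
index = ["subject_01", "subject_02", "subject_03"]  # index of the input data
true_labels = [0, 1, 1]
predicted_labels = [0, 1, 0]
label_mapping = {0: "control", 1: "stress"}


def rename(labels, mapping=None):
    """Rename labels via mapping; keep the original labels if mapping is None."""
    if mapping is None:
        return list(labels)
    return [mapping.get(label, label) for label in labels]


# Attach each prediction to the index entry of the sample it belongs to.
rows = [
    {"index": idx, "true_label": true, "predicted_label": predicted}
    for idx, true, predicted in zip(
        index,
        rename(true_labels, label_mapping),
        rename(predicted_labels, label_mapping),
    )
]

for row in rows:
    print(row)
```

Passing None as the mapping leaves the labels unchanged, which mirrors the documented default.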
- biopsykit.classification.analysis.predict_proba_from_estimator(pipeline_permuter, data, pipeline, label_col='label', column_names=None)
Get predictions as probabilities from a specified pipeline and merge them with the index of the input dataframe.
- Parameters
pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance
data (DataFrame) – input data
pipeline (tuple) – pipeline to get predictions from
label_col (str, optional) – name of the label column in the input dataframe. Default: "label"
column_names (dict, optional) – mapping of column names to rename columns in the output dataframe or None to keep original column names. Default: None
- Returns
dataframe with predictions as probabilities
- Return type
- biopsykit.classification.analysis.plot_conf_matrix(predictions, labels, label_name='label', conf_matrix_kwargs=None, **kwargs)
Plot confusion matrix from predictions.
- Parameters
predictions (DataFrame) – dataframe with predictions
labels (list, dict, optional) – list of labels to use in the confusion matrix or dictionary with label names in the data frame as key and the corresponding label names to use in the confusion matrix as value. Default: None to use the labels in the data frame in the order they appear
label_name (str, optional) – name of the ‘label’ in the axis titles. Default: “label” to yield “True label” and “Predicted label”
conf_matrix_kwargs (dict, optional) – additional keyword arguments to pass to from_predictions()
**kwargs – additional keyword arguments to pass to plt.subplots()
- Return type
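The counts that a confusion matrix plot visualizes can be sketched in plain Python. This is an illustration of the underlying tabulation only, not the plotting code of plot_conf_matrix. The confusion_matrix helper and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(true, predicted)] for predicted in labels] for true in labels]


# Hypothetical predictions for five samples.
true_labels = ["control", "stress", "stress", "control", "stress"]
predicted_labels = ["control", "stress", "control", "control", "stress"]
labels = ["control", "stress"]  # fixes the row and column order

matrix = confusion_matrix(true_labels, predicted_labels, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The order of the labels list determines the order of rows and columns, which is the role the labels parameter plays in the description above.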
- biopsykit.classification.analysis.plot_conf_matrix_proba(predictions, labels, label_col='label', label_name='label', **kwargs)
Plot confusion matrix from prediction probabilities.
- Parameters
predictions (DataFrame) – dataframe with predictions as probabilities
labels (list) – list of labels
label_col (str, optional) – name of the label column in the input dataframe. Default: "label"
label_name (str, optional) – name of the ‘label’ in the axis titles. Default: “label” to yield “True label” and “Predicted label”
**kwargs – additional keyword arguments to pass to plt.subplots()
- Return type
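How the probabilities are aggregated into matrix cells is not stated here. One plausible reading, shown below as a plain-Python sketch, is that each cell holds the mean predicted probability of a class among the samples sharing a true label. This aggregation rule, the mean_probability_matrix helper, and the sample rows are all assumptions and may differ from what plot_conf_matrix_proba actually does.

```python
def mean_probability_matrix(rows, labels, label_col="label"):
    """Average the predicted class probabilities per true label.

    Assumed aggregation rule for illustration only; rows are true labels,
    columns are predicted classes.
    """
    matrix = []
    for true_label in labels:
        group = [row for row in rows if row[label_col] == true_label]
        matrix.append(
            [
                sum(row[predicted] for row in group) / len(group) if group else 0.0
                for predicted in labels
            ]
        )
    return matrix


# Hypothetical probability predictions; each row holds the true label
# plus one probability per class.
rows = [
    {"label": "control", "control": 0.75, "stress": 0.25},
    {"label": "control", "control": 1.0, "stress": 0.0},
    {"label": "stress", "control": 0.25, "stress": 0.75},
]

matrix = mean_probability_matrix(rows, ["control", "stress"])
print(matrix)
```

Under this rule each row of the matrix sums to 1 whenever the per-sample probabilities do.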
- biopsykit.classification.analysis.metric_summary_to_latex(permuter_or_df, metrics=None, pipeline_steps=None, si_table_format=None, highlight_best=None, **kwargs)
Return a LaTeX table with the performance metrics of the pipeline combinations.
Notes
This method is a legacy method that is kept for backwards compatibility with older pickled instances of the SklearnPipelinePermuter class. It is recommended to use the SklearnPipelinePermuter.metric_summary() method instead.
See also
SklearnPipelinePermuter.metric_summary_to_latex()
- Parameters
permuter_or_df (SklearnPipelinePermuter or DataFrame) – SklearnPipelinePermuter instance or dataframe with performance metrics.
metrics (list of str, optional) – list of metrics to include in the table or None to use all available metrics in the dataframe. Default: None
pipeline_steps (list of str, optional) – list of pipeline steps to include in the table index or None to show all available pipeline steps as table index. Default: None
si_table_format (str, optional) – table format for the siunitx package or None to use the default format. Default: None
highlight_best (bool or str, optional) – Whether to highlight the pipeline with the best value in each column or not.
* If highlight_best is a boolean, the best pipeline is highlighted in each column.
* If highlight_best is a string, the best pipeline is highlighted in the column with that name.
**kwargs – additional keyword arguments passed to to_latex()
- Return type
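The two modes of highlight_best can be illustrated with a small plain-Python sketch that formats table rows by hand. It is not the library's implementation: the to_latex_rows helper, the pipeline names, and the metric values are hypothetical, and the sketch assumes that a higher metric value is better and that highlighting means bold face.

```python
def to_latex_rows(summary, metrics, highlight_best=None):
    """Format one LaTeX table row per pipeline, bolding the best value(s)."""
    if highlight_best is True:
        highlighted = list(metrics)  # boolean: highlight in every column
    elif isinstance(highlight_best, str):
        highlighted = [highlight_best]  # string: highlight only that column
    else:
        highlighted = []  # None or False: no highlighting
    # Assumption: the best value is the maximum of each highlighted column.
    best = {m: max(values[m] for values in summary.values()) for m in highlighted}

    lines = []
    for name, values in summary.items():
        cells = []
        for metric in metrics:
            cell = f"{values[metric]:.2f}"
            if metric in best and values[metric] == best[metric]:
                cell = rf"\textbf{{{cell}}}"
            cells.append(cell)
        lines.append(" & ".join([name, *cells]) + r" \\")
    return lines


# Hypothetical metric summary for two pipelines.
summary = {
    "SVC": {"accuracy": 0.81, "f1": 0.78},
    "KNN": {"accuracy": 0.76, "f1": 0.80},
}

lines = to_latex_rows(summary, ["accuracy", "f1"], highlight_best="accuracy")
print("\n".join(lines))
```

Calling the helper with highlight_best=True instead bolds the best value in both the accuracy and the f1 column.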