biopsykit.classification.analysis package

Functions to analyze classification results.

biopsykit.classification.analysis.predictions_as_df(pipeline_permuter, data, pipeline, label_mapping=None, index_col=None)

Get predictions from a specified pipeline and merge them with the index of the input dataframe.

Parameters
  • pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance

  • data (DataFrame) – input data

  • pipeline (tuple) – pipeline to get predictions from

  • label_mapping (dict, optional) – mapping of labels to rename labels in the output dataframe or None to keep original labels. Default: None

  • index_col (str, optional) – name of the index column to merge the predictions with. If data has a multi-index, the first level is used unless index_col is specified. Default: None

Returns

predictions as dataframe

Return type

DataFrame
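
The role of label_mapping can be illustrated without the library. The following is a minimal plain-Python sketch, not the biopsykit implementation: the helper label_predictions and the keys "subject", "true_label", and "predicted_label" are hypothetical names chosen for illustration, and the real function returns a DataFrame rather than a list of dictionaries.

```python
def label_predictions(index, true_labels, predicted_labels, label_mapping=None):
    """Pair each index entry with its true and predicted label, optionally renamed.

    Hypothetical helper: the key names below are assumptions, not the
    column names that biopsykit produces.
    """

    def rename(label):
        # With no mapping, labels are kept as they are.
        if label_mapping is None:
            return label
        # Labels missing from the mapping are left unchanged.
        return label_mapping.get(label, label)

    return [
        {"subject": idx, "true_label": rename(t), "predicted_label": rename(p)}
        for idx, t, p in zip(index, true_labels, predicted_labels)
    ]


result = label_predictions(
    index=["s01", "s02"],
    true_labels=[0, 1],
    predicted_labels=[1, 1],
    label_mapping={0: "control", 1: "stress"},
)
```

Each record keeps the index entry of the sample it belongs to, which is the point of merging predictions with the index of the input data.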

biopsykit.classification.analysis.predict_proba_from_estimator(pipeline_permuter, data, pipeline, label_col='label', column_names=None)

Get predictions as probabilities from a specified pipeline and merge them with the index of the input dataframe.

Parameters
  • pipeline_permuter (SklearnPipelinePermuter) – SklearnPipelinePermuter instance

  • data (DataFrame) – input data

  • pipeline (tuple) – pipeline to get predictions from

  • label_col (str, optional) – name of the label column in the input dataframe. Default: "label"

  • column_names (dict, optional) – mapping of column names to rename columns in the output dataframe or None to keep original column names. Default: None

Returns

dataframe with predictions as probabilities

Return type

DataFrame

biopsykit.classification.analysis.plot_conf_matrix(predictions, labels, label_name='label', conf_matrix_kwargs=None, **kwargs)

Plot confusion matrix from predictions.

Parameters
  • predictions (DataFrame) – dataframe with predictions

  • labels (list, dict, optional) – list of labels to use in the confusion matrix, or a dictionary that maps the label names in the dataframe (keys) to the label names shown in the confusion matrix (values). Default: None, which uses the labels in the dataframe in the order they appear

  • label_name (str, optional) – name of the "label" in the axis titles. Default: "label", which yields "True label" and "Predicted label"

  • conf_matrix_kwargs (dict, optional) – additional keyword arguments to pass to ConfusionMatrixDisplay.from_predictions(). Default: None

  • **kwargs – additional keyword arguments to pass to plt.subplots()

Return type

Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
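
To make the two accepted forms of labels concrete, here is a minimal plain-Python sketch of the counting behind a confusion matrix. It is an illustration under stated assumptions, not the biopsykit code: the helper confusion_counts is hypothetical, and the real function draws the matrix with matplotlib instead of returning nested lists.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels, labels):
    """Count (true, predicted) pairs and arrange them as a square matrix.

    `labels` is either a list (row and column order) or a dict that maps the
    labels found in the data to the names to display. Hypothetical helper.
    """
    if isinstance(labels, dict):
        order = list(labels.keys())
        display = [labels[key] for key in order]
    else:
        order = list(labels)
        display = [str(label) for label in order]

    pairs = Counter(zip(true_labels, predicted_labels))
    # Rows are true labels, columns are predicted labels.
    matrix = [[pairs[(t, p)] for p in order] for t in order]
    return display, matrix


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
display_labels, matrix = confusion_counts(y_true, y_pred, {0: "control", 1: "stress"})
```

With a dictionary, the keys select and order the classes while the values only change what is displayed; with a list, the entries serve both purposes.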

biopsykit.classification.analysis.plot_conf_matrix_proba(predictions, labels, label_col='label', label_name='label', **kwargs)

Plot confusion matrix from prediction probabilities.

Parameters
  • predictions (DataFrame) – dataframe with predictions as probabilities

  • labels (list) – list of labels

  • label_col (str, optional) – name of the label column in the input dataframe. Default: "label"

  • label_name (str, optional) – name of the "label" in the axis titles. Default: "label", which yields "True label" and "Predicted label"

  • **kwargs – additional keyword arguments to pass to plt.subplots()

Return type

Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
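
This page does not say how the probabilities are aggregated into a matrix. One plausible reading, shown here purely as an assumption, is to average the predicted class probabilities over all samples that share a true label. The helper mean_probability_matrix and the row layout (one dictionary per sample, with a label_col entry plus one probability per class) are hypothetical.

```python
from collections import defaultdict


def mean_probability_matrix(rows, labels, label_col="label"):
    """Average predicted class probabilities per true class.

    Assumption: this is one possible aggregation, not necessarily the one
    biopsykit uses. Every entry of `labels` must occur at least once as a
    true label in `rows`.
    """
    sums = defaultdict(lambda: [0.0] * len(labels))
    counts = defaultdict(int)
    for row in rows:
        true_label = row[label_col]
        counts[true_label] += 1
        for col, label in enumerate(labels):
            sums[true_label][col] += row[label]
    # Rows are true labels, columns are mean predicted probabilities.
    return [[total / counts[t] for total in sums[t]] for t in labels]


rows = [
    {"label": "control", "control": 0.9, "stress": 0.1},
    {"label": "control", "control": 0.7, "stress": 0.3},
    {"label": "stress", "control": 0.2, "stress": 0.8},
    {"label": "stress", "control": 0.4, "stress": 0.6},
]
proba_matrix = mean_probability_matrix(rows, ["control", "stress"])
```

Under this reading each row of the matrix sums to one, so the diagonal shows how much probability mass the model assigns, on average, to the correct class.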

biopsykit.classification.analysis.metric_summary_to_latex(permuter_or_df, metrics=None, pipeline_steps=None, si_table_format=None, highlight_best=None, **kwargs)

Return a LaTeX table with the performance metrics of the pipeline combinations.

Notes

This function is a legacy function that is kept for backwards compatibility with older pickled instances of the SklearnPipelinePermuter class. It is recommended to use the SklearnPipelinePermuter.metric_summary_to_latex() method instead.

See also

SklearnPipelinePermuter.metric_summary_to_latex()

Parameters
  • permuter_or_df (SklearnPipelinePermuter or DataFrame) – SklearnPipelinePermuter instance or dataframe with performance metrics.

  • metrics (list of str, optional) – list of metrics to include in the table or None to use all available metrics in the dataframe. Default: None

  • pipeline_steps (list of str, optional) – list of pipeline steps to include in the table index or None to show all available pipeline steps as table index. Default: None

  • si_table_format (str, optional) – table format for the siunitx package or None to use the default format. Default: None

  • highlight_best (bool or str, optional) – whether to highlight the pipeline with the best value in each column:
      • If highlight_best is a boolean, the best pipeline is highlighted in each column.
      • If highlight_best is a string, the best pipeline is highlighted only in the column with that name.

  • **kwargs – additional keyword arguments passed to to_latex()

Return type

str
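
The two modes of highlight_best can be sketched in plain Python. This is an illustration only, not the biopsykit implementation: the helper latex_rows is hypothetical, "best" is assumed to mean the maximum, and \textbf is assumed as the highlighting command (the real output may differ, for example when siunitx columns are used).

```python
def latex_rows(metrics, highlight_best=None):
    """Format {pipeline: {metric: value}} as LaTeX table rows.

    highlight_best=True bolds the maximum of every column, a string bolds the
    maximum of that column only, and None or False bolds nothing.
    Hypothetical helper; assumes that higher metric values are better.
    """
    columns = list(next(iter(metrics.values())))
    if highlight_best is True:
        highlighted = set(columns)
    elif isinstance(highlight_best, str):
        highlighted = {highlight_best}
    else:
        highlighted = set()

    best = {col: max(row[col] for row in metrics.values()) for col in highlighted}

    lines = []
    for name, row in metrics.items():
        cells = []
        for col in columns:
            text = f"{row[col]:.3f}"
            if col in best and row[col] == best[col]:
                text = rf"\textbf{{{text}}}"
            cells.append(text)
        lines.append(" & ".join([name, *cells]) + r" \\")
    return lines


metrics = {
    "SVC": {"accuracy": 0.81, "f1": 0.79},
    "kNN": {"accuracy": 0.78, "f1": 0.80},
}
rows_one_column = latex_rows(metrics, highlight_best="accuracy")
rows_all_columns = latex_rows(metrics, highlight_best=True)
```

Passing a column name restricts the highlighting to that metric, which is useful when one metric is the primary selection criterion and the others are reported for context.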