StatsPipeline & Plotting Example

This example shows how to set up a statistical analysis pipeline using StatsPipeline, how to export results, and how to create LaTeX output from statistical analysis results.

Additionally, it demonstrates the integrated plotting functions of BioPsyKit, that wrap the boxplot() function of seaborn and offer additional, often used features, such as adding significance brackets:

Import and Helper Functions

[1]:
from pathlib import Path

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
from biopsykit.utils.dataframe_handling import multi_xs
from biopsykit.stats import StatsPipeline

import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties_light)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette
[2]:

Data Import

[3]:
sample_times = [-30, -1, 30, 40, 50, 60, 70]

condition_order = ["Control", "Intervention"]

Raw Cortisol

[4]:
cort_samples = bp.example_data.get_saliva_example()
cort_samples.head()
[4]:
cortisol
condition subject sample
Intervention Vp01 0 6.98890
1 7.03320
2 5.77670
3 5.25790
4 5.00795
[5]:
fig, ax = plt.subplots(figsize=(8, 4))
# specialized function for plotting saliva data
bp.protocols.plotting.saliva_plot(
    data=cort_samples,
    saliva_type="cortisol",
    sample_times=sample_times,
    test_times=[0, 30],
    sample_times_absolute=True,
    test_title="TEST",
    ax=ax,
);
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_9_0.svg

Compute Cortisol Features

[6]:
auc = bp.saliva.auc(
    cort_samples, saliva_type="cortisol", sample_times=sample_times, compute_auc_post=True, remove_s0=True
)
max_inc = bp.saliva.max_increase(cort_samples, saliva_type="cortisol", remove_s0=True)
slope = bp.saliva.slope(cort_samples, sample_idx=[1, 4], sample_times=sample_times, saliva_type="cortisol")

cort_features = pd.concat([auc, max_inc, slope], axis=1)
cort_features = bp.saliva.utils.saliva_feature_wide_to_long(cort_features, saliva_type="cortisol")
cort_features.head()
[6]:
cortisol
condition subject saliva_feature
Control Vp03 auc_g 870.418075
auc_i -72.035925
auc_i_post 104.057250
max_inc 0.956000
slope14 0.013971

See the Saliva_Example.ipynb notebook for further explanations on the different processing steps.

General Information

The purpose of creating such a statistical analysis pipeline is to assemble several steps of a typical statistical analysis procedure while setting different parameters. The parameters passed to this class depend on the used statistical functions.

The interface of this class is inspired by the scikit-learn Pipeline for machine learning tasks.

The different steps of statistical analysis can be divided into categories:

  • Preparatory Analysis (prep): Analyses applied to the data before performing the actual statistical analysis, such as:

    • normality: Testing whether a random sample comes from a normal distribution.

    • equal_var: Testing the equality of variances (homoscedasticity).

  • Statistical Tests (test): Statistical test to determine differences or similarities in the data, such as:

    • pairwise_ttests: Pairwise T-tests (either for independent or dependent samples).

    • anova: One-way or N-way ANOVA.

    • welch_anova: One-way Welch-ANOVA.

    • rm_anova: One-way and two-way repeated measures ANOVA.

    • mixed_anova: Mixed-design (split-plot) ANOVA.

    • kruskal: Kruskal-Wallis H-test for independent samples.

  • Posthoc Tests (posthoc): Posthoc tests to determine differences of individual groups if more than two groups are analyzed, such as:

    • pairwise_ttests: Pairwise T-tests (either for independent or dependent samples).

    • pairwise_tukey: Pairwise Tukey-HSD post-hoc test.

    • pairwise_gameshowell: Pairwise Games-Howell post-hoc test.

A StatsPipeline consists of a list of tuples specifying the individual steps of the pipeline. The first value of each tuple indicates the category this step belongs to (prep, test, or posthoc), the second value indicates the analysis function to use in this step (e.g., normality, or rm_anova).

Furthermore, a params dictionary specifying the parameters and variables for statistical analysis needs to be supplied. Parameters can either be specified globally, i.e., for all steps in the pipeline (the default), or locally, i.e., only for one specific category, by prepending the category name and separating it from the parameter name by a __. The parameters depend on the type of analysis used in the pipeline.

Examples are:

  • dv: column name of the dependent variable

  • between: column name of the between-subject factor

  • within: column name of the within-subject factor

  • effsize: type of effect size to compute (if applicable)

  • multicomp: whether (and how) to apply multi-comparison correction of p-values to the last step in the pipeline (either “test” or “posthoc”) using `StatsPipeline.multicomp() <https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.multicomp>`__. The arguments are supplied via dictionary.

More information can be found here: StatsPipeline

T-Tests

[7]:
cort_features.head()
[7]:
cortisol
condition subject saliva_feature
Control Vp03 auc_g 870.418075
auc_i -72.035925
auc_i_post 104.057250
max_inc 0.956000
slope14 0.013971

cort_features contains multiple features that need to be analyzed individually. The analysis pipeline can be applied to each feature individually by specifying a column to group the dataframe by (groupby parameter). The result dataframes will then contain the groupby column as index level.

[8]:
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features);

Display Results

[9]:
# display all results
pipeline.display_results()
# don't show results from the "prep" cagegory
# pipeline.display_results(prep=False)
# only significant results
# pipeline.display_results(sig_only=True)
# only significant results from the "posthoc" category (results from other categories will all be displayed)
# pipeline.display_results(sig_only="posthoc")

Overview

dv groupby between
parameter cortisol saliva_feature condition
prep prep test
parameter normality equal_var pairwise_tests

Preparatory Analysis

Test for Normal Distribution

W pval normal
saliva_feature condition
auc_g Control 0.898948 0.1294 True
Intervention 0.895831 0.1172 True
auc_i Control 0.836432 0.0191 False
Intervention 0.950165 0.6008 True
auc_i_post Control 0.852757 0.0310 False
Intervention 0.941383 0.4751 True
max_inc Control 0.853697 0.0318 False
Intervention 0.983297 0.9917 True
slope14 Control 0.857235 0.0354 False
Intervention 0.965331 0.8330 True

Test for Homoscedasticity (Equal Variances)

W pval equal_var
saliva_feature
auc_g levene 1.972959 0.1729 True
auc_i levene 0.999657 0.3274 True
auc_i_post levene 0.254504 0.6185 True
max_inc levene 1.946181 0.1758 True
slope14 levene 0.782662 0.3851 True

Statistical Tests

Pairwise Tests

Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
saliva_feature
auc_g 0 condition Control Intervention False True 0.934452 24.0 two-sided 0.3594 0.5 0.354948
auc_i 0 condition Control Intervention False True 1.970669 24.0 two-sided 0.0604 1.439 0.748551
auc_i_post 0 condition Control Intervention False True 1.910518 24.0 two-sided 0.0681 1.33 0.725703
max_inc 0 condition Control Intervention False True 2.169696 24.0 two-sided 0.0402 1.896 0.824150
slope14 0 condition Control Intervention False True 2.201028 24.0 two-sided 0.0376 1.984 0.836051

Further functions of StatsPipeline

Analysis categories and their respective analysis steps

[10]:
pipeline.category_steps
[10]:
{'prep': ['normality', 'equal_var'], 'test': ['pairwise_tests']}

Dictionary with analysis results per step

[11]:
results = pipeline.results

Get results from normality check

[12]:
results["normality"]
[12]:
W pval normal
saliva_feature condition
auc_g Control 0.898948 0.129354 True
Intervention 0.895831 0.117228 True
auc_i Control 0.836432 0.019092 False
Intervention 0.950165 0.600787 True
auc_i_post Control 0.852757 0.030958 False
Intervention 0.941383 0.475055 True
max_inc Control 0.853697 0.031845 False
Intervention 0.983297 0.991731 True
slope14 Control 0.857235 0.035429 False
Intervention 0.965331 0.833049 True

Return only results from one category

[13]:
pipeline.results_cat("test")
[13]:
Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
saliva_feature
auc_g 0 condition Control Intervention False True 0.934452 24.0 two-sided 0.359378 0.5 0.354948
auc_i 0 condition Control Intervention False True 1.970669 24.0 two-sided 0.060402 1.439 0.748551
auc_i_post 0 condition Control Intervention False True 1.910518 24.0 two-sided 0.068085 1.33 0.725703
max_inc 0 condition Control Intervention False True 2.169696 24.0 two-sided 0.040155 1.896 0.824150
slope14 0 condition Control Intervention False True 2.201028 24.0 two-sided 0.037595 1.984 0.836051

Export the whole pipeline as Excel sheet

[14]:
# pipeline.export_statistics("path_to_file.xlsx")

LaTeX Output

Inline LaTeX Code

Calling StatsPipeline.stats_to_latex() generates LaTeX code for each row. The output can be copied and pasted into LaTeX documents.

[15]:
pipeline.stats_to_latex("pairwise_tests")
[15]:
saliva_feature     Contrast   A        B
auc_g           0  condition  Control  Intervention    $t(24) = 0.934, p = 0.359, g = 0.355$
auc_i           0  condition  Control  Intervention    $t(24) = 1.971, p = 0.060, g = 0.749$
auc_i_post      0  condition  Control  Intervention    $t(24) = 1.911, p = 0.068, g = 0.726$
max_inc         0  condition  Control  Intervention    $t(24) = 2.170, p = 0.040, g = 0.824$
slope14         0  condition  Control  Intervention    $t(24) = 2.201, p = 0.038, g = 0.836$
dtype: object

Generate LaTeX output for selected rows:

[16]:
pipeline.stats_to_latex("pairwise_tests", "auc_g")
[16]:
'$t(24) = 0.934, p = 0.359, g = 0.355$'

LaTeX Table

StatsPipeline.results_to_latex_table() uses DataFrame.to_latex() to convert a pandas dataframe into a LaTeX table.

Default Output
[17]:
results_out = pipeline.results_to_latex_table("pairwise_tests")
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &           &              &             \\
\midrule
\textit{auc_g}          &     0.934 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &     1.971 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &     1.911 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &     2.170 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)
With Caption and Label
[18]:
results_out = pipeline.results_to_latex_table("pairwise_tests", caption="This is a caption.", label="tab:label")
print(results_out)
\begin{table}[th!]
\centering
\caption{This is a caption.}
\label{tab:label}
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &           &              &             \\
\midrule
\textit{auc_g}          &     0.934 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &     1.971 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &     1.911 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &     2.170 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)
Further Customization

By default, the column format for columns that contain numbers is “S” which is provided by the siunitx package. The column format can be configured by the si_table_format argument. The default table format is "<1.3".

Additionally, StatsPipeline.results_to_latex_table() provides many options to customize the LaTeX table output:

  • collapse_dof: True to collapse degree(s)-of-freedom (dof) from a separate column into the column header of the t- or F-value, respectively, False to keep it as separate “dof” column(s). This only works if the degrees-of-freedom are the same for all tests in the table. Default: True

  • unstack_levels: name(s) of dataframe index level(s) to be unstacked in the resulting LaTeX table or None to not unstack any index level(s). Default: None

  • All other arguments that are passed down to DataFrame.to_latex.

[19]:
results_out = pipeline.results_to_latex_table("pairwise_tests", collapse_dof=False, si_table_format="table-format=1.1")
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format=1.1}

\begin{tabular}{l||SSSS}
\toprule
{} &    {t} & {df} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &        &      &              &             \\
\midrule
\textit{auc_g}          &  0.934 &   24 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &  1.971 &   24 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &  1.911 &   24 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &  2.170 &   24 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &  2.201 &   24 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)
Index Customization

The index of the LaTeX table can be further configured by passing further arguments as a index_kws dict:

  • index_italic: True to format index columns in italic, False otherwise. Default: True

  • index_level_order: list of index level names indicating the index level order of a MultiIndex in the LaTeX table. If None the index order of the dataframe will be used

  • index_value_order: list of index values if rows in LaTeX table should have a different order than the underlying dataframe or if only specific rows should be exported as LaTeX table. If the table index is a MultiIndex then index_value_order should be a dictionary with the index level names as keys and lists of index values of the specific level as values

  • index_rename_map: mapping with dictionary with index values as keys and new index values to be exported

  • index_level_names_tex: names of index levels in the LaTeX table or None to keep the index level names of the dataframe

[20]:
index_kws = {
    "index_italic": False,
    "index_rename_map": {
        "auc_g": "$AUC_{G}$",
        "auc_i": "$AUC_{I}$",
        "auc_i_post": "$AUC_{I}^{Post}$",
        "max_inc": "$\Delta c_{max}$",
        "slope14": "$a_{S1S4}$",
    },
    "index_level_names_tex": "Saliva Feature",
}

results_out = pipeline.results_to_latex_table("pairwise_tests", index_kws=index_kws)
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
Saliva Feature   &           &              &             \\
\midrule
$AUC_{G}$        &     0.934 &   0.359$^{}$ &       0.355 \\
$AUC_{I}$        &     1.971 &   0.060$^{}$ &       0.749 \\
$AUC_{I}^{Post}$ &     1.911 &   0.068$^{}$ &       0.726 \\
$\Delta c_{max}$ &     2.170 &  0.040$^{*}$ &       0.824 \\
$a_{S1S4}$       &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)

Mixed ANOVA

To demonstrate Mixed-ANOVA analysis we construct some artificial example:

[21]:
data_example = multi_xs(cort_samples, ["2", "3"], level="sample")
[22]:
data_example.head()
[22]:
cortisol
condition subject sample
Control Vp03 2 10.10115
3 14.23000
Vp06 2 8.42920
3 12.12950
Vp08 2 3.09960
[23]:
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "mixed_anova"), ("posthoc", "pairwise_tests")],
    params={
        "dv": "cortisol",
        "between": "condition",
        "within": "sample",
        "subject": "subject",
        "padjust": "bonf",  # specify multicorrection method to be applied on the posthoc tests
    },
)

pipeline.apply(data_example)
pipeline.display_results()

Overview

dv between within subject padjust
parameter cortisol condition sample subject bonf
prep prep test posthoc
parameter normality equal_var mixed_anova pairwise_tests

Preparatory Analysis

Test for Normal Distribution

W pval normal
sample condition
2 Control 0.879827 0.0709 True
Intervention 0.941023 0.4703 True
3 Control 0.892834 0.1067 True
Intervention 0.877228 0.0654 True

Test for Homoscedasticity (Equal Variances)

W pval equal_var
sample
2 levene 0.676865 0.4188 True
3 levene 3.976012 0.0576 True

Statistical Tests

Mixed ANOVA

Source SS DF1 DF2 MS F p-unc np2 eps
0 condition 38.270919 1 24 38.270919 1.611756 0.2164 0.062930 NaN
1 sample 12.540926 1 24 12.540926 8.048454 0.0091 0.251134 1.0
2 Interaction 10.106663 1 24 10.106663 6.486204 0.0177 0.212759 NaN

Post-Hoc Analysis

Pairwise Tests

Contrast sample A B Paired Parametric T dof alternative p-unc p-corr p-adjust BF10 hedges
0 sample - 2 3 True True -2.569062 25.0 two-sided 0.0166 NaN NaN 3.098 -0.267144
1 condition - Control Intervention False True 1.269549 24.0 two-sided 0.2164 NaN NaN 0.653 0.482233
2 sample * condition 2 Control Intervention False True 0.681959 24.0 two-sided 0.5018 1.0000 bonf 0.431 0.259040
3 sample * condition 3 Control Intervention False True 1.677748 24.0 two-sided 0.1064 0.2127 bonf 0.998 0.637286

Repeated-Measurements ANOVA

[24]:
data_slice = data_example.xs("Control", level="condition")

pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "rm_anova"), ("posthoc", "pairwise_tests")],
    params={"dv": "cortisol", "within": "sample", "subject": "subject"},
)

pipeline.apply(data_slice)
pipeline.display_results()

Overview

dv within subject
parameter cortisol sample subject
prep prep test posthoc
parameter normality equal_var rm_anova pairwise_tests

Preparatory Analysis

Test for Normal Distribution

W pval normal
sample
2 0.879827 0.0709 True
3 0.892834 0.1067 True

Test for Homoscedasticity (Equal Variances)

W pval equal_var
levene 1.094124 0.306 True

Statistical Tests

Repeated-measurement ANOVA

Source ddof1 ddof2 F p-unc ng2 eps
0 sample 1 12 9.099312 0.0107 0.048474 1.0

Post-Hoc Analysis

Pairwise Tests

Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
0 sample 2 3 True True -3.016507 12.0 two-sided 0.0107 5.469 -0.420007

Get Significance Brackets from StatsPipeline

StatsPipeline.sig_brackets() returns the significance brackets and the corresponding p-values to add to the plotting functions of BioPsyKit.

The method takes the following parameters (from the documentation):

  • stats_category_or_data: either a string of the pipeline category to use for generating significance brackets or a dataframe with statistical if significance brackets should be generated from the dataframe

  • stats_effect_type: effect type of analysis performed (“between”, “within”, “interaction”). Needed to extract the correct information from the analysis dataframe

  • plot_type: type of plot for which significance brackets are generated: “multi” if boxplots are grouped (by a hue variable), “single” (the default) otherwise

  • features: feature(s) displayed in the boxplot. The resulting significance brackets will be filtered accordingly to only contain features present in the boxplot. It can have the following formats:

    • str: only one feature is plotted in the boxplot => returns significance brackets of only one feature

    • list: multiple features are combined into one Axes object (i.e., no subplots) => returns significance brackets of multiple features

    • dict: if boxplots of features are organized in subplots then features needs to dictionary with the feature (or list of features) per subplot (subplots is True) => returns dictionary with significance brackets per subplot

  • x: name of column used as x axis in the boxplot. Only required if plot_type is “multi”.

Single Plot – One Feature

This example shows how to add significance brackets to a single plot with a single boxplot where only one type of feature is plotted (e.g., only to display max_inc feature, where the different groups are separated by the x variable).

If the feature to be plotted is only a subset of different features, it must be filtered via the features parameter:

[25]:
# rerun the t-test pipeline for plotting
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features);
[26]:
box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features="max_inc")
print(box_pairs)
[('Control', 'Intervention')]

Single Plot – Multiple Features

This example shows how to add significance brackets to a single plot with multiple boxplots where multiple types of features are plotted (e.g., to display the max_inc and slope14 features, where different features are separated by the x variable and the different groups are separated by the hue variable).

If only a subset of features should be plotted, the features of interest must be filtered via the features parameter:

[27]:
box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=["max_inc", "slope14"]
)
print(box_pairs)
[(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]

Multiple Plots – Multiple Features

This example shows how to add significance brackets to multiple subplots with multiple boxplots per subplot (e.g., to display the max_inc and the slope14 feature where max_inc and slope14 each have their own subplot, the features are plotted along the x variable and the different groups are separated by the hue variable). For creating significance brackets to be used in subplots set the subplots parameter to True.

By default, each feature is assumed to be in its own subplot (see the next example if you want to change the behavior).

If only a subset of features should be plotted, the features of interest must be filtered via the features parameter:

[28]:
box_pairs, pvalues = pipeline.sig_brackets(
    "test",
    stats_effect_type="between",
    plot_type="multi",
    x="saliva_feature",
    features=["max_inc", "slope14"],
    subplots=True,
)
print(box_pairs)
{'max_inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention'))], 'slope14': [(('slope14', 'Control'), ('slope14', 'Intervention'))]}

If you want to specify a custom mapping of features to subplots you can provide a dictionary specifying this mapping as features argument (here, we want to have the features auc_i and auc_g in one subplot, and max_inc and slope14 in another subplot):

[29]:
box_pairs, pvalues = pipeline.sig_brackets(
    "test",
    stats_effect_type="between",
    plot_type="multi",
    x="saliva_feature",
    features={"auc": ["auc_i", "auc_g"], "inc": ["max_inc", "slope14"]},
    subplots=True,
)
print(box_pairs)
{'auc': [], 'inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]}

Plotting

Single Plot – One Feature

This example shows how to plot a single feature in a single boxplot using plotting.feature_boxplot(). The two conditions are plotted along the x axis.

[30]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features=features)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="condition",
    y="cortisol",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_75_0.svg

Single Plot – Multiple Features

This example is the same as the one in the Single Plot – One Feature example above, but this time, the (single) feature is plotted along the x axis and the two groups are separated by the hue parameter. This makes it plot_type “multi” and thus requires to specify the x parameter in StatsPipeline.sig_brackets().

[31]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features
)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
);
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_78_0.svg

Now with a “real” example: We use plotting.feature_boxplot() to plot actually multiple features along the x axis with the hue variable separating the conditions.

In this example, however, no feature shows statistically significant differences between the two conditions.

[32]:
features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
);
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_80_0.svg

Multiple Plots – Multiple Features

This example shows how to plot features in multiple subplots with multiple boxplots per subplot. Here, max_inc and the slope14 features are displayed in their own subplot and the features auc_i and auc_g are combined into one joint subplot. The features are separated by the x variable and the different groups are separated by the hue variable.

This custom mapping of features to subplots can be provided via a dictionary passed to features argument.

[33]:
features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)

data_plot = cort_features.copy()

bp.plotting.multi_feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    group="saliva_feature",
    features=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_83_0.svg

Use Specialized saliva_feature_boxplot() functions

In addition to the “general-purpose” plotting functions BioPsyKit also offers specialized plotting functions for saliva features since plotting saliva data is a commonly performed task. These functions offer a better styling of axis and labels for saliva data.

[34]:
# rerun the pipeline
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features)
pipeline.display_results()

Overview

dv groupby between
parameter cortisol saliva_feature condition
prep prep test
parameter normality equal_var pairwise_tests

Preparatory Analysis

Test for Normal Distribution

W pval normal
saliva_feature condition
auc_g Control 0.898948 0.1294 True
Intervention 0.895831 0.1172 True
auc_i Control 0.836432 0.0191 False
Intervention 0.950165 0.6008 True
auc_i_post Control 0.852757 0.0310 False
Intervention 0.941383 0.4751 True
max_inc Control 0.853697 0.0318 False
Intervention 0.983297 0.9917 True
slope14 Control 0.857235 0.0354 False
Intervention 0.965331 0.8330 True

Test for Homoscedasticity (Equal Variances)

W pval equal_var
saliva_feature
auc_g levene 1.972959 0.1729 True
auc_i levene 0.999657 0.3274 True
auc_i_post levene 0.254504 0.6185 True
max_inc levene 1.946181 0.1758 True
slope14 levene 0.782662 0.3851 True

Statistical Tests

Pairwise Tests

Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
saliva_feature
auc_g 0 condition Control Intervention False True 0.934452 24.0 two-sided 0.3594 0.5 0.354948
auc_i 0 condition Control Intervention False True 1.970669 24.0 two-sided 0.0604 1.439 0.748551
auc_i_post 0 condition Control Intervention False True 1.910518 24.0 two-sided 0.0681 1.33 0.725703
max_inc 0 condition Control Intervention False True 2.169696 24.0 two-sided 0.0402 1.896 0.824150
slope14 0 condition Control Intervention False True 2.201028 24.0 two-sided 0.0376 1.984 0.836051

Single Plot – One Feature

[35]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", features=features, plot_type="single")

fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
    data=data_plot,
    x="condition",
    saliva_type="cortisol",
    feature=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_88_0.svg

Single Plot – Multiple Feature

[36]:
features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)

fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    saliva_type="cortisol",
    hue="condition",
    feature=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_90_0.svg

Plot Multiple Features in Subplots

Using saliva_multi_feature_boxplot():

[37]:
features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)

data_plot = cort_features.copy()

bp.protocols.plotting.saliva_multi_feature_boxplot(
    data=data_plot,
    saliva_type="cortisol",
    features=features,
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()
../../_images/examples__notebooks_StatsPipeline_Plotting_Example_93_0.svg
[ ]:

Download Notebook
(Right-Click -> Save Link As...)