StatsPipeline & Plotting Example — BioPsyKit 0.10.2 documentation

Import and Helper Functions¶

[1]:

from pathlib import Path

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
from biopsykit.utils.dataframe_handling import multi_xs
from biopsykit.stats import StatsPipeline

import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2

[2]:

plt.close("all")

palette = sns.color_palette(cmaps.faculties_light)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette

[2]:

Data Import¶

[3]:

sample_times = [-30, -1, 30, 40, 50, 60, 70]

condition_order = ["Control", "Intervention"]

Raw Cortisol¶

[4]:

cort_samples = bp.example_data.get_saliva_example()
cort_samples.head()

[4]:

			cortisol
condition	subject	sample
Intervention	Vp01	0	6.98890
		1	7.03320
		2	5.77670
		3	5.25790
		4	5.00795

[5]:

fig, ax = plt.subplots(figsize=(8, 4))
# specialized function for plotting saliva data
bp.protocols.plotting.saliva_plot(
    data=cort_samples,
    saliva_type="cortisol",
    sample_times=sample_times,
    test_times=[0, 30],
    sample_times_absolute=True,
    test_title="TEST",
    ax=ax,
);

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_9_0.svg

Compute Cortisol Features¶

[6]:

auc = bp.saliva.auc(
    cort_samples, saliva_type="cortisol", sample_times=sample_times, compute_auc_post=True, remove_s0=True
)
max_inc = bp.saliva.max_increase(cort_samples, saliva_type="cortisol", remove_s0=True)
slope = bp.saliva.slope(cort_samples, sample_idx=[1, 4], sample_times=sample_times, saliva_type="cortisol")

cort_features = pd.concat([auc, max_inc, slope], axis=1)
cort_features = bp.saliva.utils.saliva_feature_wide_to_long(cort_features, saliva_type="cortisol")
cort_features.head()

[6]:

			cortisol
condition	subject	saliva_feature
Control	Vp03	auc_g	870.418075
		auc_i	-72.035925
		auc_i_post	104.057250
		max_inc	0.956000
		slope14	0.013971

See the Saliva_Example.ipynb notebook for further explanations on the different processing steps.

General Information¶

The purpose of creating such a statistical analysis pipeline is to assemble several steps of a typical statistical analysis procedure while setting different parameters. The parameters passed to this class depend on the used statistical functions.

The interface of this class is inspired by the scikit-learn Pipeline for machine learning tasks.

The different steps of statistical analysis can be divided into categories:

Preparatory Analysis (prep): Analyses applied to the data before performing the actual statistical analysis, such as:
- normality: Testing whether a random sample comes from a normal distribution.
- equal_var: Testing the equality of variances (homoscedasticity).
Statistical Tests (test): Statistical test to determine differences or similarities in the data, such as:
- pairwise_ttests: Pairwise T-tests (either for independent or dependent samples).
- anova: One-way or N-way ANOVA.
- welch_anova: One-way Welch-ANOVA.
- rm_anova: One-way and two-way repeated measures ANOVA.
- mixed_anova: Mixed-design (split-plot) ANOVA.
- kruskal: Kruskal-Wallis H-test for independent samples.
Posthoc Tests (posthoc): Posthoc tests to determine differences of individual groups if more than two groups are analyzed, such as:
- pairwise_ttests: Pairwise T-tests (either for independent or dependent samples).
- pairwise_tukey: Pairwise Tukey-HSD post-hoc test.
- pairwise_gameshowell: Pairwise Games-Howell post-hoc test.

A StatsPipeline consists of a list of tuples specifying the individual steps of the pipeline. The first value of each tuple indicates the category this step belongs to (prep, test, or posthoc), the second value indicates the analysis function to use in this step (e.g., normality, or rm_anova).

Furthermore, a params dictionary specifying the parameters and variables for statistical analysis needs to be supplied. Parameters can either be specified globally, i.e., for all steps in the pipeline (the default), or locally, i.e., only for one specific category, by prepending the category name and separating it from the parameter name by a __. The parameters depend on the type of analysis used in the pipeline.

Examples are:

dv: column name of the dependent variable
between: column name of the between-subject factor
within: column name of the within-subject factor
effsize: type of effect size to compute (if applicable)
multicomp: whether (and how) to apply multi-comparison correction of p-values to the last step in the pipeline (either “test” or “posthoc”) using `StatsPipeline.multicomp() <https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.multicomp>`__. The arguments are supplied via dictionary.
…

More information can be found here: StatsPipeline

T-Tests¶

[7]:

cort_features.head()

[7]:

			cortisol
condition	subject	saliva_feature
Control	Vp03	auc_g	870.418075
		auc_i	-72.035925
		auc_i_post	104.057250
		max_inc	0.956000
		slope14	0.013971

cort_features contains multiple features that need to be analyzed individually. The analysis pipeline can be applied to each feature individually by specifying a column to group the dataframe by (groupby parameter). The result dataframes will then contain the groupby column as index level.

[8]:

pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features);

Display Results¶

[9]:

# display all results
pipeline.display_results()
# don't show results from the "prep" cagegory
# pipeline.display_results(prep=False)
# only significant results
# pipeline.display_results(sig_only=True)
# only significant results from the "posthoc" category (results from other categories will all be displayed)
# pipeline.display_results(sig_only="posthoc")

Overview

	dv	groupby	between
parameter	cortisol	saliva_feature	condition

	prep	prep	test
parameter	normality	equal_var	pairwise_tests

Preparatory Analysis

Test for Normal Distribution

		W	pval	normal
saliva_feature	condition
auc_g	Control	0.898948	0.1294	True
auc_g	Intervention	0.895831	0.1172	True
auc_i	Control	0.836432	0.0191	False
auc_i	Intervention	0.950165	0.6008	True
auc_i_post	Control	0.852757	0.0310	False
auc_i_post	Intervention	0.941383	0.4751	True
max_inc	Control	0.853697	0.0318	False
max_inc	Intervention	0.983297	0.9917	True
slope14	Control	0.857235	0.0354	False
slope14	Intervention	0.965331	0.8330	True

Test for Homoscedasticity (Equal Variances)

		W	pval	equal_var
saliva_feature
auc_g	levene	1.972959	0.1729	True
auc_i	levene	0.999657	0.3274	True
auc_i_post	levene	0.254504	0.6185	True
max_inc	levene	1.946181	0.1758	True
slope14	levene	0.782662	0.3851	True

Statistical Tests

Pairwise Tests

		Contrast	A	B	Paired	Parametric	T	dof	alternative	p-unc	BF10	hedges
saliva_feature
auc_g	0	condition	Control	Intervention	False	True	0.934452	24.0	two-sided	0.3594	0.5	0.354948
auc_i	0	condition	Control	Intervention	False	True	1.970669	24.0	two-sided	0.0604	1.439	0.748551
auc_i_post	0	condition	Control	Intervention	False	True	1.910518	24.0	two-sided	0.0681	1.33	0.725703
max_inc	0	condition	Control	Intervention	False	True	2.169696	24.0	two-sided	0.0402	1.896	0.824150
slope14	0	condition	Control	Intervention	False	True	2.201028	24.0	two-sided	0.0376	1.984	0.836051

Further functions of `StatsPipeline`¶

Analysis categories and their respective analysis steps

[10]:

pipeline.category_steps

[10]:

{'prep': ['normality', 'equal_var'], 'test': ['pairwise_tests']}

Dictionary with analysis results per step

[11]:

results = pipeline.results

Get results from normality check

[12]:

results["normality"]

[12]:

		W	pval	normal
saliva_feature	condition
auc_g	Control	0.898948	0.129354	True
auc_g	Intervention	0.895831	0.117228	True
auc_i	Control	0.836432	0.019092	False
auc_i	Intervention	0.950165	0.600787	True
auc_i_post	Control	0.852757	0.030958	False
auc_i_post	Intervention	0.941383	0.475055	True
max_inc	Control	0.853697	0.031845	False
max_inc	Intervention	0.983297	0.991731	True
slope14	Control	0.857235	0.035429	False
slope14	Intervention	0.965331	0.833049	True

Return only results from one category

[13]:

pipeline.results_cat("test")

[13]:

		Contrast	A	B	Paired	Parametric	T	dof	alternative	p-unc	BF10	hedges
saliva_feature
auc_g	0	condition	Control	Intervention	False	True	0.934452	24.0	two-sided	0.359378	0.5	0.354948
auc_i	0	condition	Control	Intervention	False	True	1.970669	24.0	two-sided	0.060402	1.439	0.748551
auc_i_post	0	condition	Control	Intervention	False	True	1.910518	24.0	two-sided	0.068085	1.33	0.725703
max_inc	0	condition	Control	Intervention	False	True	2.169696	24.0	two-sided	0.040155	1.896	0.824150
slope14	0	condition	Control	Intervention	False	True	2.201028	24.0	two-sided	0.037595	1.984	0.836051

Export the whole pipeline as Excel sheet

[14]:

# pipeline.export_statistics("path_to_file.xlsx")

LaTeX Output¶

Inline LaTeX Code¶

Calling StatsPipeline.stats_to_latex() generates LaTeX code for each row. The output can be copied and pasted into LaTeX documents.

[15]:

pipeline.stats_to_latex("pairwise_tests")

[15]:

saliva_feature     Contrast   A        B
auc_g           0  condition  Control  Intervention    $t(24) = 0.934, p = 0.359, g = 0.355$
auc_i           0  condition  Control  Intervention    $t(24) = 1.971, p = 0.060, g = 0.749$
auc_i_post      0  condition  Control  Intervention    $t(24) = 1.911, p = 0.068, g = 0.726$
max_inc         0  condition  Control  Intervention    $t(24) = 2.170, p = 0.040, g = 0.824$
slope14         0  condition  Control  Intervention    $t(24) = 2.201, p = 0.038, g = 0.836$
dtype: object

Generate LaTeX output for selected rows:

[16]:

pipeline.stats_to_latex("pairwise_tests", "auc_g")

[16]:

'$t(24) = 0.934, p = 0.359, g = 0.355$'

LaTeX Table¶

StatsPipeline.results_to_latex_table() uses DataFrame.to_latex() to convert a pandas dataframe into a LaTeX table.

Default Output¶

[17]:

results_out = pipeline.results_to_latex_table("pairwise_tests")
print(results_out)

\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &           &              &             \\
\midrule
\textit{auc_g}          &     0.934 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &     1.971 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &     1.911 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &     2.170 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)

With Caption and Label¶

[18]:

results_out = pipeline.results_to_latex_table("pairwise_tests", caption="This is a caption.", label="tab:label")
print(results_out)

\begin{table}[th!]
\centering
\caption{This is a caption.}
\label{tab:label}
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &           &              &             \\
\midrule
\textit{auc_g}          &     0.934 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &     1.971 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &     1.911 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &     2.170 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)

Further Customization¶

By default, the column format for columns that contain numbers is “S” which is provided by the siunitx package. The column format can be configured by the si_table_format argument. The default table format is "<1.3".

Additionally, StatsPipeline.results_to_latex_table() provides many options to customize the LaTeX table output:

collapse_dof: True to collapse degree(s)-of-freedom (dof) from a separate column into the column header of the t- or F-value, respectively, False to keep it as separate “dof” column(s). This only works if the degrees-of-freedom are the same for all tests in the table. Default: True
unstack_levels: name(s) of dataframe index level(s) to be unstacked in the resulting LaTeX table or None to not unstack any index level(s). Default: None
All other arguments that are passed down to DataFrame.to_latex.

[19]:

results_out = pipeline.results_to_latex_table("pairwise_tests", collapse_dof=False, si_table_format="table-format=1.1")
print(results_out)

\begin{table}[th!]
\centering
\sisetup{table-format=1.1}

\begin{tabular}{l||SSSS}
\toprule
{} &    {t} & {df} &          {p} & {Hedges' g} \\
\textit{saliva_feature} &        &      &              &             \\
\midrule
\textit{auc_g}          &  0.934 &   24 &   0.359$^{}$ &       0.355 \\
\textit{auc_i}          &  1.971 &   24 &   0.060$^{}$ &       0.749 \\
\textit{auc_i_post}     &  1.911 &   24 &   0.068$^{}$ &       0.726 \\
\textit{max_inc}        &  2.170 &   24 &  0.040$^{*}$ &       0.824 \\
\textit{slope14}        &  2.201 &   24 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)

Index Customization¶

The index of the LaTeX table can be further configured by passing further arguments as a index_kws dict:

index_italic: True to format index columns in italic, False otherwise. Default: True
index_level_order: list of index level names indicating the index level order of a MultiIndex in the LaTeX table. If None the index order of the dataframe will be used
index_value_order: list of index values if rows in LaTeX table should have a different order than the underlying dataframe or if only specific rows should be exported as LaTeX table. If the table index is a MultiIndex then index_value_order should be a dictionary with the index level names as keys and lists of index values of the specific level as values
index_rename_map: mapping with dictionary with index values as keys and new index values to be exported
index_level_names_tex: names of index levels in the LaTeX table or None to keep the index level names of the dataframe

[20]:

index_kws = {
    "index_italic": False,
    "index_rename_map": {
        "auc_g": "$AUC_{G}$",
        "auc_i": "$AUC_{I}$",
        "auc_i_post": "$AUC_{I}^{Post}$",
        "max_inc": "$\Delta c_{max}$",
        "slope14": "$a_{S1S4}$",
    },
    "index_level_names_tex": "Saliva Feature",
}

results_out = pipeline.results_to_latex_table("pairwise_tests", index_kws=index_kws)
print(results_out)

\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}

\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} &          {p} & {Hedges' g} \\
Saliva Feature   &           &              &             \\
\midrule
$AUC_{G}$        &     0.934 &   0.359$^{}$ &       0.355 \\
$AUC_{I}$        &     1.971 &   0.060$^{}$ &       0.749 \\
$AUC_{I}^{Post}$ &     1.911 &   0.068$^{}$ &       0.726 \\
$\Delta c_{max}$ &     2.170 &  0.040$^{*}$ &       0.824 \\
$a_{S1S4}$       &     2.201 &  0.038$^{*}$ &       0.836 \\
\bottomrule
\end{tabular}
\end{table}

/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
  data_latex = data.to_latex(**kwargs)

Mixed ANOVA¶

To demonstrate Mixed-ANOVA analysis we construct some artificial example:

[21]:

data_example = multi_xs(cort_samples, ["2", "3"], level="sample")

[22]:

data_example.head()

[22]:

			cortisol
condition	subject	sample
Control	Vp03	2	10.10115
	Vp03	3	14.23000
	Vp06	2	8.42920
	Vp06	3	12.12950
	Vp08	2	3.09960

[23]:

pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "mixed_anova"), ("posthoc", "pairwise_tests")],
    params={
        "dv": "cortisol",
        "between": "condition",
        "within": "sample",
        "subject": "subject",
        "padjust": "bonf",  # specify multicorrection method to be applied on the posthoc tests
    },
)

pipeline.apply(data_example)
pipeline.display_results()

Overview

	dv	between	within	subject	padjust
parameter	cortisol	condition	sample	subject	bonf

	prep	prep	test	posthoc
parameter	normality	equal_var	mixed_anova	pairwise_tests

Preparatory Analysis

Test for Normal Distribution

		W	pval	normal
sample	condition
2	Control	0.879827	0.0709	True
2	Intervention	0.941023	0.4703	True
3	Control	0.892834	0.1067	True
3	Intervention	0.877228	0.0654	True

Test for Homoscedasticity (Equal Variances)

		W	pval	equal_var
sample
2	levene	0.676865	0.4188	True
3	levene	3.976012	0.0576	True

Statistical Tests

Mixed ANOVA

	Source	SS	DF1	DF2	MS	F	p-unc	np2	eps
0	condition	38.270919	1	24	38.270919	1.611756	0.2164	0.062930	NaN
1	sample	12.540926	1	24	12.540926	8.048454	0.0091	0.251134	1.0
2	Interaction	10.106663	1	24	10.106663	6.486204	0.0177	0.212759	NaN

Post-Hoc Analysis

Pairwise Tests

	Contrast	sample	A	B	Paired	Parametric	T	dof	alternative	p-unc	p-corr	p-adjust	BF10	hedges
0	sample	-	2	3	True	True	-2.569062	25.0	two-sided	0.0166	NaN	NaN	3.098	-0.267144
1	condition	-	Control	Intervention	False	True	1.269549	24.0	two-sided	0.2164	NaN	NaN	0.653	0.482233
2	sample * condition	2	Control	Intervention	False	True	0.681959	24.0	two-sided	0.5018	1.0000	bonf	0.431	0.259040
3	sample * condition	3	Control	Intervention	False	True	1.677748	24.0	two-sided	0.1064	0.2127	bonf	0.998	0.637286

Repeated-Measurements ANOVA¶

[24]:

data_slice = data_example.xs("Control", level="condition")

pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "rm_anova"), ("posthoc", "pairwise_tests")],
    params={"dv": "cortisol", "within": "sample", "subject": "subject"},
)

pipeline.apply(data_slice)
pipeline.display_results()

Overview

	dv	within	subject
parameter	cortisol	sample	subject

	prep	prep	test	posthoc
parameter	normality	equal_var	rm_anova	pairwise_tests

Preparatory Analysis

Test for Normal Distribution

	W	pval	normal
sample
2	0.879827	0.0709	True
3	0.892834	0.1067	True

Test for Homoscedasticity (Equal Variances)

	W	pval	equal_var
levene	1.094124	0.306	True

Statistical Tests

Repeated-measurement ANOVA

	Source	ddof1	ddof2	F	p-unc	ng2	eps
0	sample	1	12	9.099312	0.0107	0.048474	1.0

Post-Hoc Analysis

Pairwise Tests

	Contrast	A	B	Paired	Parametric	T	dof	alternative	p-unc	BF10	hedges
0	sample	2	3	True	True	-3.016507	12.0	two-sided	0.0107	5.469	-0.420007

Get Significance Brackets from StatsPipeline ¶

StatsPipeline.sig_brackets() returns the significance brackets and the corresponding p-values to add to the plotting functions of BioPsyKit.

The method takes the following parameters (from the documentation):

stats_category_or_data: either a string of the pipeline category to use for generating significance brackets or a dataframe with statistical if significance brackets should be generated from the dataframe
stats_effect_type: effect type of analysis performed (“between”, “within”, “interaction”). Needed to extract the correct information from the analysis dataframe
plot_type: type of plot for which significance brackets are generated: “multi” if boxplots are grouped (by a hue variable), “single” (the default) otherwise
features: feature(s) displayed in the boxplot. The resulting significance brackets will be filtered accordingly to only contain features present in the boxplot. It can have the following formats:
- str: only one feature is plotted in the boxplot => returns significance brackets of only one feature
- list: multiple features are combined into one Axes object (i.e., no subplots) => returns significance brackets of multiple features
- dict: if boxplots of features are organized in subplots then features needs to dictionary with the feature (or list of features) per subplot (subplots is True) => returns dictionary with significance brackets per subplot
x: name of column used as x axis in the boxplot. Only required if plot_type is “multi”.

Single Plot – One Feature¶

This example shows how to add significance brackets to a single plot with a single boxplot where only one type of feature is plotted (e.g., only to display max_inc feature, where the different groups are separated by the x variable).

If the feature to be plotted is only a subset of different features, it must be filtered via the features parameter:

[25]:

# rerun the t-test pipeline for plotting
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features);

[26]:

box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features="max_inc")
print(box_pairs)

[('Control', 'Intervention')]

Single Plot – Multiple Features¶

This example shows how to add significance brackets to a single plot with multiple boxplots where multiple types of features are plotted (e.g., to display the max_inc and slope14 features, where different features are separated by the x variable and the different groups are separated by the hue variable).

If only a subset of features should be plotted, the features of interest must be filtered via the features parameter:

[27]:

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=["max_inc", "slope14"]
)
print(box_pairs)

[(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]

Multiple Plots – Multiple Features¶

This example shows how to add significance brackets to multiple subplots with multiple boxplots per subplot (e.g., to display the max_inc and the slope14 feature where max_inc and slope14 each have their own subplot, the features are plotted along the x variable and the different groups are separated by the hue variable). For creating significance brackets to be used in subplots set the subplots parameter to True.

By default, each feature is assumed to be in its own subplot (see the next example if you want to change the behavior).

If only a subset of features should be plotted, the features of interest must be filtered via the features parameter:

[28]:

box_pairs, pvalues = pipeline.sig_brackets(
    "test",
    stats_effect_type="between",
    plot_type="multi",
    x="saliva_feature",
    features=["max_inc", "slope14"],
    subplots=True,
)
print(box_pairs)

{'max_inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention'))], 'slope14': [(('slope14', 'Control'), ('slope14', 'Intervention'))]}

If you want to specify a custom mapping of features to subplots you can provide a dictionary specifying this mapping as features argument (here, we want to have the features auc_i and auc_g in one subplot, and max_inc and slope14 in another subplot):

[29]:

box_pairs, pvalues = pipeline.sig_brackets(
    "test",
    stats_effect_type="between",
    plot_type="multi",
    x="saliva_feature",
    features={"auc": ["auc_i", "auc_g"], "inc": ["max_inc", "slope14"]},
    subplots=True,
)
print(box_pairs)

{'auc': [], 'inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]}

Plotting¶

Single Plot – One Feature¶

This example shows how to plot a single feature in a single boxplot using plotting.feature_boxplot(). The two conditions are plotted along the x axis.

[30]:

features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features=features)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="condition",
    y="cortisol",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_75_0.svg

Single Plot – Multiple Features¶

This example is the same as the one in the Single Plot – One Feature example above, but this time, the (single) feature is plotted along the x axis and the two groups are separated by the hue parameter. This makes it plot_type “multi” and thus requires to specify the x parameter in StatsPipeline.sig_brackets().

[31]:

features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features
)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
);

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_78_0.svg

Now with a “real” example: We use plotting.feature_boxplot() to plot actually multiple features along the x axis with the hue variable separating the conditions.

In this example, however, no feature shows statistically significant differences between the two conditions.

[32]:

features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)

fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
);

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_80_0.svg

Multiple Plots – Multiple Features¶

This example shows how to plot features in multiple subplots with multiple boxplots per subplot. Here, max_inc and the slope14 features are displayed in their own subplot and the features auc_i and auc_g are combined into one joint subplot. The features are separated by the x variable and the different groups are separated by the hue variable.

This custom mapping of features to subplots can be provided via a dictionary passed to features argument.

[33]:

features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)

data_plot = cort_features.copy()

bp.plotting.multi_feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    y="cortisol",
    hue="condition",
    group="saliva_feature",
    features=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_83_0.svg

Use Specialized saliva_feature_boxplot() functions¶

In addition to the “general-purpose” plotting functions BioPsyKit also offers specialized plotting functions for saliva features since plotting saliva data is a commonly performed task. These functions offer a better styling of axis and labels for saliva data.

[34]:

# rerun the pipeline
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
    params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)

pipeline.apply(cort_features)
pipeline.display_results()

Overview

	dv	groupby	between
parameter	cortisol	saliva_feature	condition

	prep	prep	test
parameter	normality	equal_var	pairwise_tests

Preparatory Analysis

Test for Normal Distribution

		W	pval	normal
saliva_feature	condition
auc_g	Control	0.898948	0.1294	True
auc_g	Intervention	0.895831	0.1172	True
auc_i	Control	0.836432	0.0191	False
auc_i	Intervention	0.950165	0.6008	True
auc_i_post	Control	0.852757	0.0310	False
auc_i_post	Intervention	0.941383	0.4751	True
max_inc	Control	0.853697	0.0318	False
max_inc	Intervention	0.983297	0.9917	True
slope14	Control	0.857235	0.0354	False
slope14	Intervention	0.965331	0.8330	True

Test for Homoscedasticity (Equal Variances)

		W	pval	equal_var
saliva_feature
auc_g	levene	1.972959	0.1729	True
auc_i	levene	0.999657	0.3274	True
auc_i_post	levene	0.254504	0.6185	True
max_inc	levene	1.946181	0.1758	True
slope14	levene	0.782662	0.3851	True

Statistical Tests

Pairwise Tests

		Contrast	A	B	Paired	Parametric	T	dof	alternative	p-unc	BF10	hedges
saliva_feature
auc_g	0	condition	Control	Intervention	False	True	0.934452	24.0	two-sided	0.3594	0.5	0.354948
auc_i	0	condition	Control	Intervention	False	True	1.970669	24.0	two-sided	0.0604	1.439	0.748551
auc_i_post	0	condition	Control	Intervention	False	True	1.910518	24.0	two-sided	0.0681	1.33	0.725703
max_inc	0	condition	Control	Intervention	False	True	2.169696	24.0	two-sided	0.0402	1.896	0.824150
slope14	0	condition	Control	Intervention	False	True	2.201028	24.0	two-sided	0.0376	1.984	0.836051

Single Plot – One Feature¶

[35]:

features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", features=features, plot_type="single")

fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
    data=data_plot,
    x="condition",
    saliva_type="cortisol",
    feature=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_88_0.svg

Single Plot – Multiple Feature¶

[36]:

features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)

fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
    data=data_plot,
    x="saliva_feature",
    saliva_type="cortisol",
    hue="condition",
    feature=features,
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
    ax=ax,
)
fig.tight_layout()

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_90_0.svg

Plot Multiple Features in Subplots¶

Using saliva_multi_feature_boxplot():

[37]:

features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}

box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)

data_plot = cort_features.copy()

bp.protocols.plotting.saliva_multi_feature_boxplot(
    data=data_plot,
    saliva_type="cortisol",
    features=features,
    hue="condition",
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()

../../_images/examples__notebooks_StatsPipeline_Plotting_Example_93_0.svg

[ ]:

StatsPipeline & Plotting Example¶

Import and Helper Functions¶

Data Import¶

Raw Cortisol¶

Compute Cortisol Features¶

General Information¶

T-Tests¶

Display Results¶

Further functions of StatsPipeline¶

LaTeX Output¶

Inline LaTeX Code¶

LaTeX Table¶

Default Output¶

With Caption and Label¶

Further Customization¶

Index Customization¶

Mixed ANOVA¶

Repeated-Measurements ANOVA¶

Get Significance Brackets from StatsPipeline¶

Single Plot – One Feature¶

Single Plot – Multiple Features¶

Multiple Plots – Multiple Features¶

Plotting¶

Single Plot – One Feature¶

Single Plot – Multiple Features¶

Multiple Plots – Multiple Features¶

Use Specialized saliva_feature_boxplot() functions¶

Single Plot – One Feature¶

Single Plot – Multiple Feature¶

Plot Multiple Features in Subplots¶

Further functions of `StatsPipeline`¶

Get Significance Brackets from StatsPipeline ¶