StatsPipeline & Plotting Example¶
This example shows how to set up a statistical analysis pipeline using StatsPipeline, how to export results, and how to create LaTeX output from statistical analysis results.
Additionally, it demonstrates the integrated plotting functions of BioPsyKit
, that wrap the boxplot() function of seaborn and offer additional, often used features, such as adding significance brackets:
as well as the derived functions specialized for saliva features: saliva_feature_boxplot() and saliva_multi_feature_boxplot().
Import and Helper Functions¶
[1]:
from pathlib import Path
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
from biopsykit.utils.dataframe_handling import multi_xs
from biopsykit.stats import StatsPipeline
import pingouin as pg
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties_light)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
Data Import¶
[3]:
sample_times = [-30, -1, 30, 40, 50, 60, 70]
condition_order = ["Control", "Intervention"]
Raw Cortisol¶
[4]:
cort_samples = bp.example_data.get_saliva_example()
cort_samples.head()
[4]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Intervention | Vp01 | 0 | 6.98890 |
1 | 7.03320 | ||
2 | 5.77670 | ||
3 | 5.25790 | ||
4 | 5.00795 |
[5]:
fig, ax = plt.subplots(figsize=(8, 4))
# specialized function for plotting saliva data
bp.protocols.plotting.saliva_plot(
data=cort_samples,
saliva_type="cortisol",
sample_times=sample_times,
test_times=[0, 30],
sample_times_absolute=True,
test_title="TEST",
ax=ax,
);
Compute Cortisol Features¶
[6]:
auc = bp.saliva.auc(
cort_samples, saliva_type="cortisol", sample_times=sample_times, compute_auc_post=True, remove_s0=True
)
max_inc = bp.saliva.max_increase(cort_samples, saliva_type="cortisol", remove_s0=True)
slope = bp.saliva.slope(cort_samples, sample_idx=[1, 4], sample_times=sample_times, saliva_type="cortisol")
cort_features = pd.concat([auc, max_inc, slope], axis=1)
cort_features = bp.saliva.utils.saliva_feature_wide_to_long(cort_features, saliva_type="cortisol")
cort_features.head()
[6]:
cortisol | |||
---|---|---|---|
condition | subject | saliva_feature | |
Control | Vp03 | auc_g | 870.418075 |
auc_i | -72.035925 | ||
auc_i_post | 104.057250 | ||
max_inc | 0.956000 | ||
slope14 | 0.013971 |
See the Saliva_Example.ipynb notebook for further explanations on the different processing steps.
General Information¶
The purpose of creating such a statistical analysis pipeline is to assemble several steps of a typical statistical analysis procedure while setting different parameters. The parameters passed to this class depend on the used statistical functions.
The interface of this class is inspired by the scikit-learn Pipeline for machine learning tasks.
The different steps of statistical analysis can be divided into categories:
Preparatory Analysis (
prep
): Analyses applied to the data before performing the actual statistical analysis, such as:normality
: Testing whether a random sample comes from a normal distribution.equal_var
: Testing the equality of variances (homoscedasticity).
Statistical Tests (
test
): Statistical test to determine differences or similarities in the data, such as:pairwise_ttests
: Pairwise T-tests (either for independent or dependent samples).anova
: One-way or N-way ANOVA.welch_anova
: One-way Welch-ANOVA.rm_anova
: One-way and two-way repeated measures ANOVA.mixed_anova
: Mixed-design (split-plot) ANOVA.kruskal
: Kruskal-Wallis H-test for independent samples.
Posthoc Tests (
posthoc
): Posthoc tests to determine differences of individual groups if more than two groups are analyzed, such as:pairwise_ttests
: Pairwise T-tests (either for independent or dependent samples).pairwise_tukey
: Pairwise Tukey-HSD post-hoc test.pairwise_gameshowell
: Pairwise Games-Howell post-hoc test.
A StatsPipeline consists of a list of tuples specifying the individual steps
of the pipeline. The first value of each tuple indicates the category this step belongs to (prep
, test
, or posthoc
), the second value indicates the analysis function to use in this step (e.g., normality
, or rm_anova
).
Furthermore, a params
dictionary specifying the parameters and variables for statistical analysis needs to be supplied. Parameters can either be specified globally, i.e., for all steps in the pipeline (the default), or locally, i.e., only for one specific category, by prepending the category name and separating it from the parameter name by a __
. The parameters depend on the type of analysis used in the pipeline.
Examples are:
dv
: column name of the dependent variablebetween
: column name of the between-subject factorwithin
: column name of the within-subject factoreffsize
: type of effect size to compute (if applicable)multicomp
: whether (and how) to apply multi-comparison correction of p-values to the last step in the pipeline (either “test” or “posthoc”) using`StatsPipeline.multicomp()
<https://biopsykit.readthedocs.io/en/latest/api/biopsykit.stats.html#biopsykit.stats.StatsPipeline.multicomp>`__. The arguments are supplied via dictionary.…
More information can be found here: StatsPipeline
T-Tests¶
[7]:
cort_features.head()
[7]:
cortisol | |||
---|---|---|---|
condition | subject | saliva_feature | |
Control | Vp03 | auc_g | 870.418075 |
auc_i | -72.035925 | ||
auc_i_post | 104.057250 | ||
max_inc | 0.956000 | ||
slope14 | 0.013971 |
cort_features
contains multiple features that need to be analyzed individually. The analysis pipeline can be applied to each feature individually by specifying a column to group the dataframe by (groupby
parameter). The result dataframes will then contain the groupby
column as index level.
[8]:
pipeline = StatsPipeline(
steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)
pipeline.apply(cort_features);
Display Results¶
[9]:
# display all results
pipeline.display_results()
# don't show results from the "prep" cagegory
# pipeline.display_results(prep=False)
# only significant results
# pipeline.display_results(sig_only=True)
# only significant results from the "posthoc" category (results from other categories will all be displayed)
# pipeline.display_results(sig_only="posthoc")
Overview
dv | groupby | between | |
---|---|---|---|
parameter | cortisol | saliva_feature | condition |
prep | prep | test | |
---|---|---|---|
parameter | normality | equal_var | pairwise_tests |
Preparatory Analysis
Test for Normal Distribution
W | pval | normal | ||
---|---|---|---|---|
saliva_feature | condition | |||
auc_g | Control | 0.898948 | 0.1294 | True |
Intervention | 0.895831 | 0.1172 | True | |
auc_i | Control | 0.836432 | 0.0191 | False |
Intervention | 0.950165 | 0.6008 | True | |
auc_i_post | Control | 0.852757 | 0.0310 | False |
Intervention | 0.941383 | 0.4751 | True | |
max_inc | Control | 0.853697 | 0.0318 | False |
Intervention | 0.983297 | 0.9917 | True | |
slope14 | Control | 0.857235 | 0.0354 | False |
Intervention | 0.965331 | 0.8330 | True |
Test for Homoscedasticity (Equal Variances)
W | pval | equal_var | ||
---|---|---|---|---|
saliva_feature | ||||
auc_g | levene | 1.972959 | 0.1729 | True |
auc_i | levene | 0.999657 | 0.3274 | True |
auc_i_post | levene | 0.254504 | 0.6185 | True |
max_inc | levene | 1.946181 | 0.1758 | True |
slope14 | levene | 0.782662 | 0.3851 | True |
Statistical Tests
Pairwise Tests
Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | BF10 | hedges | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
saliva_feature | ||||||||||||
auc_g | 0 | condition | Control | Intervention | False | True | 0.934452 | 24.0 | two-sided | 0.3594 | 0.5 | 0.354948 |
auc_i | 0 | condition | Control | Intervention | False | True | 1.970669 | 24.0 | two-sided | 0.0604 | 1.439 | 0.748551 |
auc_i_post | 0 | condition | Control | Intervention | False | True | 1.910518 | 24.0 | two-sided | 0.0681 | 1.33 | 0.725703 |
max_inc | 0 | condition | Control | Intervention | False | True | 2.169696 | 24.0 | two-sided | 0.0402 | 1.896 | 0.824150 |
slope14 | 0 | condition | Control | Intervention | False | True | 2.201028 | 24.0 | two-sided | 0.0376 | 1.984 | 0.836051 |
Further functions of StatsPipeline
¶
Analysis categories and their respective analysis steps
[10]:
pipeline.category_steps
[10]:
{'prep': ['normality', 'equal_var'], 'test': ['pairwise_tests']}
Dictionary with analysis results per step
[11]:
results = pipeline.results
Get results from normality check
[12]:
results["normality"]
[12]:
W | pval | normal | ||
---|---|---|---|---|
saliva_feature | condition | |||
auc_g | Control | 0.898948 | 0.129354 | True |
Intervention | 0.895831 | 0.117228 | True | |
auc_i | Control | 0.836432 | 0.019092 | False |
Intervention | 0.950165 | 0.600787 | True | |
auc_i_post | Control | 0.852757 | 0.030958 | False |
Intervention | 0.941383 | 0.475055 | True | |
max_inc | Control | 0.853697 | 0.031845 | False |
Intervention | 0.983297 | 0.991731 | True | |
slope14 | Control | 0.857235 | 0.035429 | False |
Intervention | 0.965331 | 0.833049 | True |
Return only results from one category
[13]:
pipeline.results_cat("test")
[13]:
Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | BF10 | hedges | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
saliva_feature | ||||||||||||
auc_g | 0 | condition | Control | Intervention | False | True | 0.934452 | 24.0 | two-sided | 0.359378 | 0.5 | 0.354948 |
auc_i | 0 | condition | Control | Intervention | False | True | 1.970669 | 24.0 | two-sided | 0.060402 | 1.439 | 0.748551 |
auc_i_post | 0 | condition | Control | Intervention | False | True | 1.910518 | 24.0 | two-sided | 0.068085 | 1.33 | 0.725703 |
max_inc | 0 | condition | Control | Intervention | False | True | 2.169696 | 24.0 | two-sided | 0.040155 | 1.896 | 0.824150 |
slope14 | 0 | condition | Control | Intervention | False | True | 2.201028 | 24.0 | two-sided | 0.037595 | 1.984 | 0.836051 |
Export the whole pipeline as Excel sheet
[14]:
# pipeline.export_statistics("path_to_file.xlsx")
LaTeX Output¶
Inline LaTeX Code¶
Calling StatsPipeline.stats_to_latex() generates LaTeX code for each row. The output can be copied and pasted into LaTeX documents.
[15]:
pipeline.stats_to_latex("pairwise_tests")
[15]:
saliva_feature Contrast A B
auc_g 0 condition Control Intervention $t(24) = 0.934, p = 0.359, g = 0.355$
auc_i 0 condition Control Intervention $t(24) = 1.971, p = 0.060, g = 0.749$
auc_i_post 0 condition Control Intervention $t(24) = 1.911, p = 0.068, g = 0.726$
max_inc 0 condition Control Intervention $t(24) = 2.170, p = 0.040, g = 0.824$
slope14 0 condition Control Intervention $t(24) = 2.201, p = 0.038, g = 0.836$
dtype: object
Generate LaTeX output for selected rows:
[16]:
pipeline.stats_to_latex("pairwise_tests", "auc_g")
[16]:
'$t(24) = 0.934, p = 0.359, g = 0.355$'
LaTeX Table¶
StatsPipeline.results_to_latex_table() uses DataFrame.to_latex() to convert a pandas dataframe into a LaTeX table.
Default Output¶
[17]:
results_out = pipeline.results_to_latex_table("pairwise_tests")
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}
\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} & {p} & {Hedges' g} \\
\textit{saliva_feature} & & & \\
\midrule
\textit{auc_g} & 0.934 & 0.359$^{}$ & 0.355 \\
\textit{auc_i} & 1.971 & 0.060$^{}$ & 0.749 \\
\textit{auc_i_post} & 1.911 & 0.068$^{}$ & 0.726 \\
\textit{max_inc} & 2.170 & 0.040$^{*}$ & 0.824 \\
\textit{slope14} & 2.201 & 0.038$^{*}$ & 0.836 \\
\bottomrule
\end{tabular}
\end{table}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
data_latex = data.to_latex(**kwargs)
With Caption and Label¶
[18]:
results_out = pipeline.results_to_latex_table("pairwise_tests", caption="This is a caption.", label="tab:label")
print(results_out)
\begin{table}[th!]
\centering
\caption{This is a caption.}
\label{tab:label}
\sisetup{table-format = <1.3}
\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} & {p} & {Hedges' g} \\
\textit{saliva_feature} & & & \\
\midrule
\textit{auc_g} & 0.934 & 0.359$^{}$ & 0.355 \\
\textit{auc_i} & 1.971 & 0.060$^{}$ & 0.749 \\
\textit{auc_i_post} & 1.911 & 0.068$^{}$ & 0.726 \\
\textit{max_inc} & 2.170 & 0.040$^{*}$ & 0.824 \\
\textit{slope14} & 2.201 & 0.038$^{*}$ & 0.836 \\
\bottomrule
\end{tabular}
\end{table}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
data_latex = data.to_latex(**kwargs)
Further Customization¶
By default, the column format for columns that contain numbers is “S” which is provided by the siunitx package. The column format can be configured by the si_table_format
argument. The default table format is "<1.3"
.
Additionally, StatsPipeline.results_to_latex_table() provides many options to customize the LaTeX table output:
collapse_dof
:True
to collapse degree(s)-of-freedom (dof) from a separate column into the column header of the t- or F-value, respectively,False
to keep it as separate “dof” column(s). This only works if the degrees-of-freedom are the same for all tests in the table. Default:True
unstack_levels
: name(s) of dataframe index level(s) to be unstacked in the resulting LaTeX table orNone
to not unstack any index level(s). Default:None
All other arguments that are passed down to DataFrame.to_latex.
[19]:
results_out = pipeline.results_to_latex_table("pairwise_tests", collapse_dof=False, si_table_format="table-format=1.1")
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format=1.1}
\begin{tabular}{l||SSSS}
\toprule
{} & {t} & {df} & {p} & {Hedges' g} \\
\textit{saliva_feature} & & & & \\
\midrule
\textit{auc_g} & 0.934 & 24 & 0.359$^{}$ & 0.355 \\
\textit{auc_i} & 1.971 & 24 & 0.060$^{}$ & 0.749 \\
\textit{auc_i_post} & 1.911 & 24 & 0.068$^{}$ & 0.726 \\
\textit{max_inc} & 2.170 & 24 & 0.040$^{*}$ & 0.824 \\
\textit{slope14} & 2.201 & 24 & 0.038$^{*}$ & 0.836 \\
\bottomrule
\end{tabular}
\end{table}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
data_latex = data.to_latex(**kwargs)
Index Customization¶
The index of the LaTeX table can be further configured by passing further arguments as a index_kws
dict:
index_italic
:True
to format index columns in italic,False
otherwise. Default:True
index_level_order
: list of index level names indicating the index level order of a MultiIndex in the LaTeX table. IfNone
the index order of the dataframe will be usedindex_value_order
: list of index values if rows in LaTeX table should have a different order than the underlying dataframe or if only specific rows should be exported as LaTeX table. If the table index is a MultiIndex thenindex_value_order
should be a dictionary with the index level names as keys and lists of index values of the specific level as valuesindex_rename_map
: mapping with dictionary with index values as keys and new index values to be exportedindex_level_names_tex
: names of index levels in the LaTeX table orNone
to keep the index level names of the dataframe
[20]:
index_kws = {
"index_italic": False,
"index_rename_map": {
"auc_g": "$AUC_{G}$",
"auc_i": "$AUC_{I}$",
"auc_i_post": "$AUC_{I}^{Post}$",
"max_inc": "$\Delta c_{max}$",
"slope14": "$a_{S1S4}$",
},
"index_level_names_tex": "Saliva Feature",
}
results_out = pipeline.results_to_latex_table("pairwise_tests", index_kws=index_kws)
print(results_out)
\begin{table}[th!]
\centering
\sisetup{table-format = <1.3}
\begin{tabular}{l||SSS}
\toprule
{} & {$t(24)$} & {p} & {Hedges' g} \\
Saliva Feature & & & \\
\midrule
$AUC_{G}$ & 0.934 & 0.359$^{}$ & 0.355 \\
$AUC_{I}$ & 1.971 & 0.060$^{}$ & 0.749 \\
$AUC_{I}^{Post}$ & 1.911 & 0.068$^{}$ & 0.726 \\
$\Delta c_{max}$ & 2.170 & 0.040$^{*}$ & 0.824 \\
$a_{S1S4}$ & 2.201 & 0.038$^{*}$ & 0.836 \\
\bottomrule
\end{tabular}
\end{table}
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/src/biopsykit/stats/stats.py:828: FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
data_latex = data.to_latex(**kwargs)
Mixed ANOVA¶
To demonstrate Mixed-ANOVA analysis we construct some artificial example:
[21]:
data_example = multi_xs(cort_samples, ["2", "3"], level="sample")
[22]:
data_example.head()
[22]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Control | Vp03 | 2 | 10.10115 |
3 | 14.23000 | ||
Vp06 | 2 | 8.42920 | |
3 | 12.12950 | ||
Vp08 | 2 | 3.09960 |
[23]:
pipeline = StatsPipeline(
steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "mixed_anova"), ("posthoc", "pairwise_tests")],
params={
"dv": "cortisol",
"between": "condition",
"within": "sample",
"subject": "subject",
"padjust": "bonf", # specify multicorrection method to be applied on the posthoc tests
},
)
pipeline.apply(data_example)
pipeline.display_results()
Overview
dv | between | within | subject | padjust | |
---|---|---|---|---|---|
parameter | cortisol | condition | sample | subject | bonf |
prep | prep | test | posthoc | |
---|---|---|---|---|
parameter | normality | equal_var | mixed_anova | pairwise_tests |
Preparatory Analysis
Test for Normal Distribution
W | pval | normal | ||
---|---|---|---|---|
sample | condition | |||
2 | Control | 0.879827 | 0.0709 | True |
Intervention | 0.941023 | 0.4703 | True | |
3 | Control | 0.892834 | 0.1067 | True |
Intervention | 0.877228 | 0.0654 | True |
Test for Homoscedasticity (Equal Variances)
W | pval | equal_var | ||
---|---|---|---|---|
sample | ||||
2 | levene | 0.676865 | 0.4188 | True |
3 | levene | 3.976012 | 0.0576 | True |
Statistical Tests
Mixed ANOVA
Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps | |
---|---|---|---|---|---|---|---|---|---|
0 | condition | 38.270919 | 1 | 24 | 38.270919 | 1.611756 | 0.2164 | 0.062930 | NaN |
1 | sample | 12.540926 | 1 | 24 | 12.540926 | 8.048454 | 0.0091 | 0.251134 | 1.0 |
2 | Interaction | 10.106663 | 1 | 24 | 10.106663 | 6.486204 | 0.0177 | 0.212759 | NaN |
Post-Hoc Analysis
Pairwise Tests
Contrast | sample | A | B | Paired | Parametric | T | dof | alternative | p-unc | p-corr | p-adjust | BF10 | hedges | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | sample | - | 2 | 3 | True | True | -2.569062 | 25.0 | two-sided | 0.0166 | NaN | NaN | 3.098 | -0.267144 |
1 | condition | - | Control | Intervention | False | True | 1.269549 | 24.0 | two-sided | 0.2164 | NaN | NaN | 0.653 | 0.482233 |
2 | sample * condition | 2 | Control | Intervention | False | True | 0.681959 | 24.0 | two-sided | 0.5018 | 1.0000 | bonf | 0.431 | 0.259040 |
3 | sample * condition | 3 | Control | Intervention | False | True | 1.677748 | 24.0 | two-sided | 0.1064 | 0.2127 | bonf | 0.998 | 0.637286 |
Repeated-Measurements ANOVA¶
[24]:
data_slice = data_example.xs("Control", level="condition")
pipeline = StatsPipeline(
steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "rm_anova"), ("posthoc", "pairwise_tests")],
params={"dv": "cortisol", "within": "sample", "subject": "subject"},
)
pipeline.apply(data_slice)
pipeline.display_results()
Overview
dv | within | subject | |
---|---|---|---|
parameter | cortisol | sample | subject |
prep | prep | test | posthoc | |
---|---|---|---|---|
parameter | normality | equal_var | rm_anova | pairwise_tests |
Preparatory Analysis
Test for Normal Distribution
W | pval | normal | |
---|---|---|---|
sample | |||
2 | 0.879827 | 0.0709 | True |
3 | 0.892834 | 0.1067 | True |
Test for Homoscedasticity (Equal Variances)
W | pval | equal_var | |
---|---|---|---|
levene | 1.094124 | 0.306 | True |
Statistical Tests
Repeated-measurement ANOVA
Source | ddof1 | ddof2 | F | p-unc | ng2 | eps | |
---|---|---|---|---|---|---|---|
0 | sample | 1 | 12 | 9.099312 | 0.0107 | 0.048474 | 1.0 |
Post-Hoc Analysis
Pairwise Tests
Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | BF10 | hedges | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | sample | 2 | 3 | True | True | -3.016507 | 12.0 | two-sided | 0.0107 | 5.469 | -0.420007 |
Get Significance Brackets from StatsPipeline¶
StatsPipeline.sig_brackets() returns the significance brackets and the corresponding p-values to add to the plotting functions of BioPsyKit
.
The method takes the following parameters (from the documentation):
stats_category_or_data
: either a string of the pipeline category to use for generating significance brackets or a dataframe with statistical if significance brackets should be generated from the dataframestats_effect_type
: effect type of analysis performed (“between”, “within”, “interaction”). Needed to extract the correct information from the analysis dataframeplot_type
: type of plot for which significance brackets are generated: “multi” if boxplots are grouped (by ahue
variable), “single” (the default) otherwisefeatures
: feature(s) displayed in the boxplot. The resulting significance brackets will be filtered accordingly to only contain features present in the boxplot. It can have the following formats:str
: only one feature is plotted in the boxplot => returns significance brackets of only one featurelist
: multiple features are combined into oneAxes
object (i.e., no subplots) => returns significance brackets of multiple featuresdict
: if boxplots of features are organized in subplots thenfeatures
needs to dictionary with the feature (or list of features) per subplot (subplots
isTrue
) => returns dictionary with significance brackets per subplot
x
: name of column used asx
axis in the boxplot. Only required ifplot_type
is “multi”.
Single Plot – One Feature¶
This example shows how to add significance brackets to a single plot with a single boxplot where only one type of feature is plotted (e.g., only to display max_inc
feature, where the different groups are separated by the x
variable).
If the feature to be plotted is only a subset of different features, it must be filtered via the features
parameter:
[25]:
# rerun the t-test pipeline for plotting
pipeline = StatsPipeline(
steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)
pipeline.apply(cort_features);
[26]:
box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features="max_inc")
print(box_pairs)
[('Control', 'Intervention')]
Single Plot – Multiple Features¶
This example shows how to add significance brackets to a single plot with multiple boxplots where multiple types of features are plotted (e.g., to display the max_inc
and slope14
features, where different features are separated by the x
variable and the different groups are separated by the hue
variable).
If only a subset of features should be plotted, the features of interest must be filtered via the features
parameter:
[27]:
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=["max_inc", "slope14"]
)
print(box_pairs)
[(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]
Multiple Plots – Multiple Features¶
This example shows how to add significance brackets to multiple subplots with multiple boxplots per subplot (e.g., to display the max_inc
and the slope14
feature where max_inc
and slope14
each have their own subplot, the features are plotted along the x
variable and the different groups are separated by the hue
variable). For creating significance brackets to be used in subplots set the subplots
parameter to True
.
By default, each feature is assumed to be in its own subplot (see the next example if you want to change the behavior).
If only a subset of features should be plotted, the features of interest must be filtered via the features
parameter:
[28]:
box_pairs, pvalues = pipeline.sig_brackets(
"test",
stats_effect_type="between",
plot_type="multi",
x="saliva_feature",
features=["max_inc", "slope14"],
subplots=True,
)
print(box_pairs)
{'max_inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention'))], 'slope14': [(('slope14', 'Control'), ('slope14', 'Intervention'))]}
If you want to specify a custom mapping of features to subplots you can provide a dictionary specifying this mapping as features
argument (here, we want to have the features auc_i
and auc_g
in one subplot, and max_inc
and slope14
in another subplot):
[29]:
box_pairs, pvalues = pipeline.sig_brackets(
"test",
stats_effect_type="between",
plot_type="multi",
x="saliva_feature",
features={"auc": ["auc_i", "auc_g"], "inc": ["max_inc", "slope14"]},
subplots=True,
)
print(box_pairs)
{'auc': [], 'inc': [(('max_inc', 'Control'), ('max_inc', 'Intervention')), (('slope14', 'Control'), ('slope14', 'Intervention'))]}
Plotting¶
Single Plot – One Feature¶
This example shows how to plot a single feature in a single boxplot using plotting.feature_boxplot(). The two conditions are plotted along the x
axis.
[30]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")
box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", plot_type="single", features=features)
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
data=data_plot,
x="condition",
y="cortisol",
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
ax=ax,
)
fig.tight_layout()
Single Plot – Multiple Features¶
This example is the same as the one in the Single Plot – One Feature example above, but this time, the (single) feature is plotted along the x
axis and the two groups are separated by the hue
parameter. This makes it plot_type
“multi” and thus requires to specify the x
parameter in StatsPipeline.sig_brackets().
[31]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features
)
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
data=data_plot,
x="saliva_feature",
y="cortisol",
hue="condition",
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
ax=ax,
);
Now with a “real” example: We use plotting.feature_boxplot() to plot actually multiple features along the x
axis with the hue
variable separating the conditions.
In this example, however, no feature shows statistically significant differences between the two conditions.
[32]:
features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)
fig, ax = plt.subplots()
bp.plotting.feature_boxplot(
data=data_plot,
x="saliva_feature",
y="cortisol",
hue="condition",
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
ax=ax,
);
Multiple Plots – Multiple Features¶
This example shows how to plot features in multiple subplots with multiple boxplots per subplot. Here, max_inc
and the slope14
features are displayed in their own subplot and the features auc_i
and auc_g
are combined into one joint subplot. The features are separated by the x
variable and the different groups are separated by the hue
variable.
This custom mapping of features to subplots can be provided via a dictionary passed to features
argument.
[33]:
features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)
data_plot = cort_features.copy()
bp.plotting.multi_feature_boxplot(
data=data_plot,
x="saliva_feature",
y="cortisol",
hue="condition",
group="saliva_feature",
features=features,
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()
Use Specialized saliva_feature_boxplot() functions¶
In addition to the “general-purpose” plotting functions BioPsyKit
also offers specialized plotting functions for saliva features since plotting saliva data is a commonly performed task. These functions offer a better styling of axis and labels for saliva data.
[34]:
# rerun the pipeline
pipeline = StatsPipeline(
steps=[("prep", "normality"), ("prep", "equal_var"), ("test", "pairwise_tests")],
params={"dv": "cortisol", "groupby": "saliva_feature", "between": "condition"},
)
pipeline.apply(cort_features)
pipeline.display_results()
Overview
dv | groupby | between | |
---|---|---|---|
parameter | cortisol | saliva_feature | condition |
prep | prep | test | |
---|---|---|---|
parameter | normality | equal_var | pairwise_tests |
Preparatory Analysis
Test for Normal Distribution
W | pval | normal | ||
---|---|---|---|---|
saliva_feature | condition | |||
auc_g | Control | 0.898948 | 0.1294 | True |
Intervention | 0.895831 | 0.1172 | True | |
auc_i | Control | 0.836432 | 0.0191 | False |
Intervention | 0.950165 | 0.6008 | True | |
auc_i_post | Control | 0.852757 | 0.0310 | False |
Intervention | 0.941383 | 0.4751 | True | |
max_inc | Control | 0.853697 | 0.0318 | False |
Intervention | 0.983297 | 0.9917 | True | |
slope14 | Control | 0.857235 | 0.0354 | False |
Intervention | 0.965331 | 0.8330 | True |
Test for Homoscedasticity (Equal Variances)
W | pval | equal_var | ||
---|---|---|---|---|
saliva_feature | ||||
auc_g | levene | 1.972959 | 0.1729 | True |
auc_i | levene | 0.999657 | 0.3274 | True |
auc_i_post | levene | 0.254504 | 0.6185 | True |
max_inc | levene | 1.946181 | 0.1758 | True |
slope14 | levene | 0.782662 | 0.3851 | True |
Statistical Tests
Pairwise Tests
Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | BF10 | hedges | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
saliva_feature | ||||||||||||
auc_g | 0 | condition | Control | Intervention | False | True | 0.934452 | 24.0 | two-sided | 0.3594 | 0.5 | 0.354948 |
auc_i | 0 | condition | Control | Intervention | False | True | 1.970669 | 24.0 | two-sided | 0.0604 | 1.439 | 0.748551 |
auc_i_post | 0 | condition | Control | Intervention | False | True | 1.910518 | 24.0 | two-sided | 0.0681 | 1.33 | 0.725703 |
max_inc | 0 | condition | Control | Intervention | False | True | 2.169696 | 24.0 | two-sided | 0.0402 | 1.896 | 0.824150 |
slope14 | 0 | condition | Control | Intervention | False | True | 2.201028 | 24.0 | two-sided | 0.0376 | 1.984 | 0.836051 |
Single Plot – One Feature¶
[35]:
features = "max_inc"
data_plot = multi_xs(cort_features, features, level="saliva_feature")
box_pairs, pvalues = pipeline.sig_brackets("test", stats_effect_type="between", features=features, plot_type="single")
fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
data=data_plot,
x="condition",
saliva_type="cortisol",
feature=features,
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
ax=ax,
)
fig.tight_layout()
Single Plot – Multiple Feature¶
[36]:
features = ["auc_g", "auc_i"]
data_plot = multi_xs(cort_features, features, level="saliva_feature")
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", features=features, plot_type="multi", x="saliva_feature"
)
fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
data=data_plot,
x="saliva_feature",
saliva_type="cortisol",
hue="condition",
feature=features,
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
ax=ax,
)
fig.tight_layout()
Plot Multiple Features in Subplots¶
Using saliva_multi_feature_boxplot():
[37]:
features = {"auc": ["auc_g", "auc_i"], "max_inc": "max_inc", "slope14": "slope14"}
box_pairs, pvalues = pipeline.sig_brackets(
"test", stats_effect_type="between", plot_type="multi", x="saliva_feature", features=features, subplots=True
)
data_plot = cort_features.copy()
bp.protocols.plotting.saliva_multi_feature_boxplot(
data=data_plot,
saliva_type="cortisol",
features=features,
hue="condition",
stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues, "verbose": 0},
)
fig.tight_layout()
[ ]: