Saliva Example¶
This example illustrates how to import saliva data (cortisol, amylase, etc.), how to compute often used parameters and how to export it to perform futher analysis.
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
Set Saliva Time Points¶
[3]:
sample_times = [-30, -1, 30, 40, 50, 60, 70]
Load Condition List¶
Load example data
[4]:
condition_list = bp.example_data.get_condition_list_example()
condition_list.head()
[4]:
condition | |
---|---|
subject | |
Vp01 | Intervention |
Vp02 | Intervention |
Vp03 | Control |
Vp04 | Intervention |
Vp05 | Intervention |
Alternatively: Load your own data
[5]:
# condition_list = bp.io.load_subject_condition_list("<path-to-condition-list-file>")
# condition_list.head()
Load Data¶
Option 0: Load BioPsyKit example data¶
The example data will be imported and converted into a long-format dataframe:
[6]:
df_cort = bp.example_data.get_saliva_example()
df_cort.head()
[6]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Intervention | Vp01 | 0 | 6.98890 |
1 | 7.03320 | ||
2 | 5.77670 | ||
3 | 5.25790 | ||
4 | 5.00795 |
Compute Mean and Standard Error¶
Mean and standard error is computed in saliva.mean_se() for the two conditions and for each sample separately.
[7]:
bp.saliva.mean_se(df_cort, saliva_type="cortisol")
[7]:
mean | se | ||
---|---|---|---|
condition | sample | ||
Control | 0 | 4.569485 | 0.708467 |
1 | 5.367404 | 0.956169 | |
2 | 5.805423 | 1.005129 | |
3 | 7.669331 | 1.353230 | |
4 | 7.737588 | 1.479832 | |
5 | 6.964650 | 1.396808 | |
6 | 5.916773 | 1.128819 | |
Intervention | 0 | 6.041300 | 0.788725 |
1 | 5.934754 | 0.867792 | |
2 | 4.971362 | 0.696804 | |
3 | 5.071823 | 0.752146 | |
4 | 5.105092 | 0.917503 | |
5 | 4.932746 | 0.914560 | |
6 | 4.645919 | 0.869791 |
Plot Saliva Data¶
[8]:
bp.plotting.lineplot(data=df_cort, x="sample", y="cortisol", hue="condition", style="condition");
Example: Subject Exclusion¶
For this example, we assume we want to exclude the subjects ‘Vp01’ and ‘Vp02’ from both the condition list and the dataframe with cortisol samples using data_processing.exclude_subjects()
[9]:
dict_result = bp.utils.data_processing.exclude_subjects(
["Vp01", "Vp02"], condition_list=condition_list, cortisol=df_cort
)
dict_result
# uncomment to reassign the data with excluded subjects to the original variable names
# df_cort = dict_result['cortisol']
# condition_list = dict_result['condition_list']
[9]:
{'condition_list': condition
subject
Vp03 Control
Vp04 Intervention
Vp05 Intervention
Vp06 Control
Vp07 Intervention
Vp08 Control
Vp09 Control
Vp10 Intervention
Vp11 Control
Vp15 Intervention
Vp16 Intervention
Vp17 Control
Vp18 Intervention
Vp19 Control
Vp20 Intervention
Vp21 Control
Vp22 Control
Vp24 Control
Vp25 Intervention
Vp26 Intervention
Vp27 Intervention
Vp28 Intervention
Vp30 Control
Vp31 Control
Vp32 Control
Vp33 Control,
'cortisol': cortisol
condition subject sample
Control Vp03 0 5.39850
1 13.27400
2 10.10115
3 14.23000
4 13.98650
... ...
Vp33 2 7.78115
3 8.74815
4 7.66505
5 7.75460
6 6.72890
[168 rows x 1 columns]}
Option 1: Use BioPsyKit to load saliva data in ‘plate’ format¶
The data is converted into long-format (SalivaRawDataFrame) and returned.
Load saliva data into pandas dataframe (example data using biopsykit.example_data.get_saliva_example_plate_format() or own data in ‘plate’-format using biopsykit.io.saliva.load_saliva_plate()):
[10]:
df_cort = bp.example_data.get_saliva_example_plate_format()
# alternatively: load own data that is present in "plate" format
# df_cort = bp.io.saliva.load_saliva_plate(file_path="<path-to-saliva-data-in-plate-format>", saliva_type="cortisol")
[11]:
df_cort.head()
[11]:
cortisol | ||
---|---|---|
subject | sample | |
Vp01 | S0 | 6.98890 |
S1 | 7.03320 | |
S2 | 5.77670 | |
S3 | 5.25790 | |
S4 | 5.00795 |
We can additionally directly pass a ‘condition list’ to the data loader function (for example data loader as well as for your own data). This allows us to directly assign subject conditions to the saliva samples:
[12]:
df_cort = bp.example_data.get_saliva_example_plate_format(condition_list=condition_list)
# alternatively: load own data in "plate" format and directly assign subject conditions
# df_cort = bp.io.saliva.load_saliva_plate(file_path="<path-to-saliva-data-in-plate-format>", saliva_type="cortisol", condition_list=condition_list)
df_cort.head()
[12]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Intervention | Vp01 | S0 | 6.98890 |
S1 | 7.03320 | ||
S2 | 5.77670 | ||
S3 | 5.25790 | ||
S4 | 5.00795 |
Speficy your custom regular expression string to extract Subject ID and Saliva ID (see the documentation of biopsykit.io.saliva.load_saliva_plate() for further information).
For example, the regex string in regex_str
will extract the subject IDs without the Vp
prefix and sample IDs without the S
prefix:
[13]:
regex_str = "Vp(\d+) S(\d)"
df_cort = bp.example_data.get_saliva_example_plate_format(regex_str=regex_str)
# works analogously for bp.io.saliva.load_saliva_plate()
df_cort.head()
[13]:
cortisol | ||
---|---|---|
subject | sample | |
01 | 0 | 6.98890 |
1 | 7.03320 | |
2 | 5.77670 | |
3 | 5.25790 | |
4 | 5.00795 |
Option 2: Use BioPsyKit to load saliva data that’s already in the “correct” wide format¶
During import, the data is converted into long-format (SalivaRawDataFrame) and returned.
[14]:
df_cort = bp.example_data.get_saliva_example()
# Alternatively, use your own data:
# df_cort = bp.io.saliva.load_saliva_wide_format(file_path="<path-to-cortisol-data-in-wide-format.csv>", saliva_type="cortisol")
[15]:
df_cort.head()
[15]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Intervention | Vp01 | 0 | 6.98890 |
1 | 7.03320 | ||
2 | 5.77670 | ||
3 | 5.25790 | ||
4 | 5.00795 |
Save Data¶
Save Dataframe as csv (in standardized format)
[16]:
# bp.io.saliva.save_saliva("<export-path.csv>", df_cort)
Saliva Data Processing¶
Saliva Data in SalivaRawDataFrame
[17]:
df_cort.head()
[17]:
cortisol | |||
---|---|---|---|
condition | subject | sample | |
Intervention | Vp01 | 0 | 6.98890 |
1 | 7.03320 | ||
2 | 5.77670 | ||
3 | 5.25790 | ||
4 | 5.00795 |
Mean and Standard Error over all Subjects¶
Computing mean and standard error over all subjects returns a SalivaMeanSeDataFrame. This dataframe can directly be used for plotting functions from BioPsyKit
, such as plotting.saliva_plot() (more on that later).
[18]:
df_cort_mean_se = bp.saliva.mean_se(df_cort)
df_cort_mean_se.head()
[18]:
mean | se | ||
---|---|---|---|
condition | sample | ||
Control | 0 | 4.569485 | 0.708467 |
1 | 5.367404 | 0.956169 | |
2 | 5.805423 | 1.005129 | |
3 | 7.669331 | 1.353230 | |
4 | 7.737588 | 1.479832 |
Saliva Features¶
Standard Features¶
Compute a set of “Standard Features”, including:
argmax
: location of maximummean
: mean valuestd
: standard deviationskew
: skewnesskurt
: kurtosis
using saliva.standard_features()
[19]:
bp.saliva.standard_features(df_cort).head()
[19]:
saliva_feature | cortisol_argmax | cortisol_mean | cortisol_std | cortisol_skew | cortisol_kurt | |
---|---|---|---|---|---|---|
condition | subject | |||||
Control | Vp03 | 3 | 11.433950 | 3.067825 | -1.433263 | 2.214301 |
Vp06 | 5 | 10.483879 | 4.101966 | -0.358008 | -1.497109 | |
Vp08 | 3 | 3.713750 | 1.103525 | 0.772725 | -0.729211 | |
Vp09 | 0 | 5.399236 | 1.864181 | 1.655006 | 2.695174 | |
Vp11 | 4 | 6.400743 | 1.220714 | -0.784137 | 1.954389 |
AUC¶
Area under the Curve (AUC), in different variations (according to Pruessner et al. 2003):
auc_g
: Total Area under the Curveauc_i
: Area under the Curve with respect to increaseauc_i_post
: (ifcompute_auc_post=True
) Area under the Curve with respect to increase after the stressor: This is only relevant for an acute stress scenario where we collected saliva samples before and after the stressor. PerBioPsyKit
convention, saliva samples collected after the stressor have saliva times \(t \geq 0\).
(see the documentation of saliva.auc() for further information).
Note: For these examples we neglect the first saliva sample (S0) by setting remove_s0
to True
because this sample was only assessed to control for high cortisol baseline and is not relevant for feature computation. The feature computation is performed on the remaining saliva samples (S1-S6).
[20]:
# sample times must directly be supplied if they are not already part of the saliva dataframe
bp.saliva.auc(df_cort, remove_s0=True, sample_times=sample_times).head()
[20]:
saliva_feature | cortisol_auc_g | cortisol_auc_i | |
---|---|---|---|
condition | subject | ||
Control | Vp03 | 870.418075 | -72.035925 |
Vp06 | 745.785975 | 305.071225 | |
Vp08 | 261.263925 | 84.065675 | |
Vp09 | 349.085125 | -107.491025 | |
Vp11 | 477.073050 | 56.433550 |
Maximum Increase¶
Absolute maximum increase (or the relative increase in percent if percent=True
) between the first sample in the data and all others:
Note: see the documentation of saliva.max_increase() for further information)
[21]:
bp.saliva.max_increase(df_cort, remove_s0=True).head()
[21]:
saliva_feature | cortisol_max_inc | |
---|---|---|
condition | subject | |
Control | Vp03 | 0.95600 |
Vp06 | 9.44625 | |
Vp08 | 3.01050 | |
Vp09 | -1.40395 | |
Vp11 | 2.19135 |
Slope¶
Slope between two saliva samples (specified by sample_idx
): Note: see the documentation of saliva.slope() for further information)
[22]:
bp.saliva.slope(df_cort, sample_idx=(1, 4), sample_times=sample_times).head()
[22]:
saliva_feature | cortisol_slope14 | |
---|---|---|
condition | subject | |
Control | Vp03 | 0.013971 |
Vp06 | 0.141828 | |
Vp08 | 0.045958 | |
Vp09 | -0.027528 | |
Vp11 | 0.042968 |
Plotting¶
Using Seaborn (some very simple Examples)¶
[23]:
fig, ax = plt.subplots()
sns.lineplot(
data=df_cort.reset_index(),
x="sample",
y="cortisol",
hue="condition",
hue_order=["Control", "Intervention"],
ci=None, # ci = None => no error bars
ax=ax,
)
ax.set_ylabel("Cortisol [nmol/l]")
ax.set_xlabel("Sample Times")
fig.tight_layout()
/tmp/ipykernel_3349/3425057158.py:2: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=None` for the same effect.
sns.lineplot(
[24]:
fig, ax = plt.subplots()
sns.lineplot(
data=df_cort.reset_index(), x="sample", y="cortisol", hue="condition", hue_order=["Control", "Intervention"], ax=ax
)
ax.set_ylabel("Cortisol [nmol/l]")
ax.set_xlabel("Sample Times")
fig.tight_layout()
[25]:
fig, ax = plt.subplots()
sns.boxplot(
data=df_cort.reset_index(),
x="sample",
y="cortisol",
hue="condition",
hue_order=["Control", "Intervention"],
palette=cmaps.faculties_light,
ax=ax,
)
fig.tight_layout()
[26]:
fig, ax = plt.subplots()
sns.boxplot(
data=bp.saliva.max_increase(df_cort).reset_index(),
x="condition",
y="cortisol_max_inc",
order=["Control", "Intervention"],
palette=cmaps.faculties_light,
)
fig.tight_layout()
[27]:
fig, ax = plt.subplots()
data_features = bp.saliva.utils.saliva_feature_wide_to_long(bp.saliva.standard_features(df_cort), "cortisol")
sns.boxplot(
data=data_features.reset_index(),
x="saliva_feature",
y="cortisol",
hue="condition",
hue_order=["Control", "Intervention"],
palette=cmaps.faculties_light,
ax=ax,
)
fig.tight_layout()
Using functions from BioPsyKit
¶
[28]:
df_cort_mean_se.T
[28]:
condition | Control | Intervention | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
sample | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
mean | 4.569485 | 5.367404 | 5.805423 | 7.669331 | 7.737588 | 6.964650 | 5.916773 | 6.041300 | 5.934754 | 4.971362 | 5.071823 | 5.105092 | 4.932746 | 4.645919 |
se | 0.708467 | 0.956169 | 1.005129 | 1.353230 | 1.479832 | 1.396808 | 1.128819 | 0.788725 | 0.867792 | 0.696804 | 0.752146 | 0.917503 | 0.914560 | 0.869791 |
Note: This example shows how to use the function plotting.saliva_plot(). If you recorded saliva data within a psychological protocol, however, it’s recommended to use the Protocol API and create a Protocol object, where you can add all data and use plotting functions in a more convenient way.
Saliva Plot¶
[29]:
# plot without first saliva sample
bp.protocols.plotting.saliva_plot(
df_cort_mean_se.drop("0", level="sample"),
saliva_type="cortisol",
sample_times=sample_times[1:],
sample_times_absolute=True,
test_times=[0, 30],
test_title="TEST",
);
Saliva Features¶
All features in one plot using saliva_feature_boxplot(), x
variable separates the features, hue
variable separates the conditions.
[30]:
fig, ax = plt.subplots()
bp.protocols.plotting.saliva_feature_boxplot(
data=data_features,
x="saliva_feature",
saliva_type="cortisol",
hue="condition",
hue_order=["Control", "Intervention"],
palette=cmaps.faculties_light,
ax=ax,
)
fig.tight_layout()
One subplot per feature using saliva_multi_feature_boxplot(), x
variable separates the conditions. features
is provided a list of the features to plot.
[31]:
bp.protocols.plotting.saliva_multi_feature_boxplot(
data=data_features,
saliva_type="cortisol",
features=["argmax", "kurt", "std", "mean"],
hue="condition",
hue_order=["Control", "Intervention"],
palette=cmaps.faculties_light,
)
fig.tight_layout()
Grouping of features per subplot, features
is provided a dictionary that specifies the mapping.
[32]:
bp.protocols.plotting.saliva_multi_feature_boxplot(
data=data_features,
saliva_type="cortisol",
features={"mean": ["mean", "argmax"], "std": ["std", "skew", "kurt"]},
hue="condition",
hue_order=["Control", "Intervention"],
palette=cmaps.faculties_light,
)
fig.tight_layout()
[ ]: