Protocol Example¶
This example illustrates how to work with the Protocol API in the Protocols module. These classes represent (psychological) protocols. One instance of such a protocol represents a particular study during which the protocol was conducted, together with the data collected during that study.
The Protocol API further offers functions to process different kinds of data collected during that study without the need to set up “common” processing pipelines each time (see the ECG_Analysis_Example.ipynb for further information on data processing pipelines).
As input, this notebook uses the heart rate processing outputs generated by the ECG processing pipeline, as shown in ECG_Processing_Example.ipynb, as well as saliva data.
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
from biopsykit.protocols import BaseProtocol, MIST, TSST
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
[3]:
def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())
Overview¶
The Protocol API of BioPsyKit
offers classes that represent various psychological protocols.
The classes allow adding time-series data (such as heart rate) as well as tabular data (such as saliva) to an instance of a Protocol object. Further, the API offers functions to process the data and aggregate the results on a study level (such as computing mean and standard error over all subjects within different phases of the protocol).
For this, it uses functions from the biopsykit.utils.data_processing module to split data, compute aggregated results, etc. While these functions can also be used stand-alone, it is recommended to use the Protocol API whenever possible, since you then don’t have to take care of calling each function manually, and in the right order.
See the ECG_Analysis_Example.ipynb notebook for further examples.
General Usage – BaseProtocol¶
The base class for all protocols is BaseProtocol(). It provides the general functionality. Besides that, there exist a couple of predefined subclasses that represent standardized psychological protocols, such as MIST (Montreal Imaging Stress Task) and TSST (Trier Social Stress Test).
Each protocol is expected to have a certain structure which can be specified by passing a dictionary describing the protocol structure when creating the Protocol object. The structure can be nested, up to three nested structure levels are supported:
1st level: study part: These can be different, disjoint parts of the complete study, such as: “Preface”, “Test”, and “Questionnaires”
2nd level: phase: One study part of the psychological protocol can consist of different phases, such as: “Preparation”, “Stress”, “Recovery”
3rd level: subphase: One phase can consist of different subphases, such as: “Baseline”, “Arithmetic Task”, “Feedback”
The following examples illustrate different possibilities of defining the protocol structure:
[4]:
# Example 1: study with three parts, no finer division into phases
structure = {"Preface": None, "Test": None, "Questionnaires": None}
BaseProtocol(name="Base", structure=structure)
[4]:
Base
Structure: {'Preface': None, 'Test': None, 'Questionnaires': None}
[5]:
# Example 2: study with three parts, all parts have different phases with specific durations
structure = {
"Preface": {"Questionnaires": 240, "Baseline": 60},
"Test": {"Preparation": 120, "Test": 240, "Recovery": 120},
"Recovery": {"Part1": 240, "Part2": 240},
}
BaseProtocol(name="Base", structure=structure)
[5]:
Base
Structure: {'Preface': {'Questionnaires': 240, 'Baseline': 60}, 'Test': {'Preparation': 120, 'Test': 240, 'Recovery': 120}, 'Recovery': {'Part1': 240, 'Part2': 240}}
[6]:
# Example 3: only certain study parts have different phases (example: TSST)
structure = {"Before": None, "TSST": {"Preparation": 300, "Talk": 300, "Math": 300}, "After": None}
BaseProtocol(name="Base", structure=structure)
[6]:
Base
Structure: {'Before': None, 'TSST': {'Preparation': 300, 'Talk': 300, 'Math': 300}, 'After': None}
[7]:
# Example 4: study with phases and subphases, only certain study parts have different phases (example: MIST)
structure = {
"Before": None,
"MIST": {
"MIST1": {"BL": 60, "AT": 240, "FB": 120},
"MIST2": {"BL": 60, "AT": 240, "FB": 120},
"MIST3": {"BL": 60, "AT": 240, "FB": 120},
},
"After": None,
}
BaseProtocol(name="Base", structure=structure)
[7]:
Base
Structure: {'Before': None, 'MIST': {'MIST1': {'BL': 60, 'AT': 240, 'FB': 120}, 'MIST2': {'BL': 60, 'AT': 240, 'FB': 120}, 'MIST3': {'BL': 60, 'AT': 240, 'FB': 120}}, 'After': None}
Load Data¶
Heart Rate and R Peak Data¶
Load processed heart rate time-series data of all subjects and concatenate into one dict.
[8]:
subject_data_dict_hr = {}
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
    subject_id = re.findall(r"hr_result_(Vp\w+)\.xlsx", file.name)[0]
    subject_data_dict_hr[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")

subject_data_dict_rpeaks = {}
for file in sorted(ecg_path.glob("rpeaks_result_*.xlsx")):
    subject_id = re.findall(r"rpeaks_result_(Vp\w+)\.xlsx", file.name)[0]
    subject_data_dict_rpeaks[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
Structure of the SubjectDataDict:
[9]:
display_dict_structure(subject_data_dict_hr)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example; thus, the condition list is constructed manually. Usually, you would import a condition list from a file.
[10]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[10]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
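In a real study, the condition list would come from a file instead of being built by hand. The following sketch mimics that with an in-memory CSV; the file content is hypothetical, and in practice you would pass a real file path (the column layout mirrors the condition list built above):

```python
import io

import pandas as pd

# Hypothetical condition-list file content; in practice you would read a
# real CSV file from disk instead of an in-memory buffer.
csv_content = "subject,condition\nVp01,Control\nVp02,Intervention\n"

condition_list = pd.read_csv(io.StringIO(csv_content), index_col="subject")
print(condition_list)
```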
Create Protocol instance¶
Let’s assume our example protocol consists of only one study part, so we can ignore this level. The study consists of four different phases: “Baseline”, “Intervention”, “Stress”, and “Recovery”. Each of these phases has, in turn, three subphases: “Start”, “Middle”, and “End”, with durations of 20, 30, and 10 seconds, respectively. Our structure dict then looks like this:
[11]:
subphases = {"Start": 20, "Middle": 30, "End": 10}
structure = {"Baseline": subphases, "Intervention": subphases, "Stress": subphases, "Recovery": subphases}
[12]:
base_protocol = BaseProtocol(name="Base", structure=structure)
Add Heart Rate Data¶
The data to be added to the Protocol object is expected to be a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects. The dictionaries subject_data_dict_hr and subject_data_dict_rpeaks are already in this structure, so they can be added to the Protocol object.
Since we only have one study part, we can directly add this data without specifying a study_part parameter. The data is added to a study part named “Study” for better internal handling. The added data can be accessed via the hr_data property.
Note: See BaseProtocol.add_hr_data() and BaseProtocol.hr_data for further information!
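To make the expected nesting concrete, here is a hand-built miniature SubjectDataDict in plain pandas. All subject IDs, phase names, and heart rate values below are made up purely to illustrate the subject → phase → dataframe structure:

```python
import pandas as pd

# Miniature SubjectDataDict: subject -> phase -> time-series dataframe.
# Values are made up purely to show the expected nesting.
idx = pd.date_range("2019-10-23 12:32:05", periods=3, freq="1s", name="time")
hr = pd.DataFrame({"Heart_Rate": [80.8, 81.3, 78.8]}, index=idx)

subject_data_dict = {
    "Vp01": {"Baseline": hr, "Stress": hr * 1.1},
    "Vp02": {"Baseline": hr * 0.95, "Stress": hr * 1.05},
}
print(list(subject_data_dict["Vp01"].keys()))
```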
[13]:
base_protocol.add_hr_data(hr_data=subject_data_dict_hr, rpeak_data=subject_data_dict_rpeaks)
[14]:
display_dict_structure(base_protocol.hr_data)
dict_keys(['Study'])
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Compute Aggregated Results¶
Next, we can compute Aggregated Results. For this, we can use the method BaseProtocol.compute_hr_results(), which allows us to enable/disable the specific processing steps and provide different parameters to the individual steps (see the ECG_Analysis_Example.ipynb notebook for further explanations of the different processing steps).
Since different kinds of aggregated results can be computed (e.g., aggregating only over phases vs. aggregating over subphases, normalization of heart rate data vs. no normalization, …), we can specify an identifier for these results via the result_id parameter.
BaseProtocol.compute_hr_results() can perform the following steps (True/False indicates which steps are enabled by default – also have a look at the function documentation!):
Resample (resample_sec): True
Normalize (normalize_to): False
Select Phases (select_phases): False
Split into Subphases (split_into_subphases): False
Aggregate per Subject (mean_per_subject): True
Add Conditions (add_conditions): False
Aggregate over Phases (Without Normalization)¶
[15]:
base_protocol.compute_hr_results("hr_phases")
Aggregate over Phases, Add Subject Conditions¶
[16]:
base_protocol.compute_hr_results("hr_phases_cond", add_conditions=True, params={"add_conditions": condition_list})
Aggregate over Phases (With Normalization)¶
If, for example, we want to normalize the heart rate data to the “Baseline” phase, we can enable normalization (normalize_to). Since all data are normalized to the “Baseline” phase, we can drop this phase for later analysis by enabling the select_phases step. The parameters for these steps are supplied via the params dict:
[17]:
base_protocol.compute_hr_results(
"hr_phases_norm",
normalize_to=True,
select_phases=True,
params={"normalize_to": "Baseline", "select_phases": ["Intervention", "Stress", "Recovery"]},
)
Aggregate over Subphases (With Normalization)¶
[18]:
base_protocol.compute_hr_results(
"hr_subphases_norm",
normalize_to=True,
select_phases=True,
split_into_subphases=True,
params={
"normalize_to": "Baseline",
"select_phases": ["Intervention", "Stress", "Recovery"],
"split_into_subphases": subphases,
},
)
Access Results¶
The results can then be accessed via the hr_results property:
[19]:
base_protocol.hr_results["hr_phases"].head()
[19]:
Heart_Rate | ||
---|---|---|
subject | phase | |
Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | |
Stress | 88.263152 | |
Recovery | 85.855722 | |
Vp02 | Baseline | 86.877952 |
[20]:
base_protocol.hr_results["hr_phases_cond"].head()
[20]:
Heart_Rate | |||
---|---|---|---|
condition | subject | phase | |
Control | Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | ||
Stress | 88.263152 | ||
Recovery | 85.855722 | ||
Intervention | Vp02 | Baseline | 86.877952 |
[21]:
base_protocol.hr_results["hr_phases_norm"].head()
[21]:
Heart_Rate | ||
---|---|---|
subject | phase | |
Vp01 | Intervention | -10.000559 |
Stress | 3.135550 | |
Recovery | 0.322467 | |
Vp02 | Intervention | 5.185647 |
Stress | 23.107396 |
[22]:
base_protocol.hr_results["hr_subphases_norm"].head()
[22]:
Heart_Rate | |||
---|---|---|---|
subject | phase | subphase | |
Vp01 | Intervention | Start | -8.418687 |
Middle | -10.009403 | ||
End | -10.612060 | ||
Stress | Start | 14.146760 | |
Middle | 7.265783 |
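Such per-subject results are typically aggregated further on the study level, e.g., mean and standard error per phase over all subjects. This can be sketched with a plain pandas groupby; the dataframe below only mimics the shape of the hr_results output above, and its values are illustrative, not from the example dataset:

```python
import pandas as pd

# Per-subject phase means, shaped like the hr_results output above
# (values are illustrative).
hr_phases = pd.DataFrame(
    {"Heart_Rate": [85.6, 77.0, 88.3, 86.9, 79.5, 90.1]},
    index=pd.MultiIndex.from_product(
        [["Vp01", "Vp02"], ["Baseline", "Intervention", "Stress"]],
        names=["subject", "phase"],
    ),
)

# Aggregate over subjects: mean and standard error per phase.
agg = hr_phases.groupby("phase")["Heart_Rate"].agg(["mean", "sem"])
print(agg)
```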
Compute Ensemble Time-Series¶
Next, we can compute Ensemble Time-Series. For this, we can use the method BaseProtocol.compute_hr_ensemble() which allows us to enable/disable the specific processing steps and provide different parameters to the individual step (see the ECG_Analysis_Example.ipynb notebook for further explanations on the different processing steps).
Since different kinds of ensemble time-series can be computed (e.g., normalization of heart rate data vs. no normalization, …), we can speficy an identifier for these results via the ensemble_id
parameter.
BaseProtocol.compute_hr_ensemble() can do the following steps (True
/False
indicates which steps are enabled by default – have also a look at the function documentation!):
Resample (
resample_sec
):True
Normalize (
normalize_to
):True
Select Phases (
select_phases
):False
Cut Phases to Shortest Duration (
cut_phases
):True
Merge Dictionary (
merge_dict
):True
Add Conditions (
add_conditions
):False
[23]:
base_protocol.compute_hr_ensemble("hr_ensemble", params={"normalize_to": "Baseline"})
[24]:
display_dict_structure(base_protocol.hr_ensemble["hr_ensemble"])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 2)'
subject | Vp01 | Vp02 |
---|---|---|
time | ||
1.0 | -5.185450 | -1.907328 |
2.0 | -7.308778 | 2.764077 |
3.0 | -6.940040 | 6.703796 |
4.0 | -0.950875 | 0.947209 |
5.0 | -5.986024 | -14.090102 |
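An ensemble dataframe like the one above has one column per subject and one row per time point. A typical downstream use is plotting the mean over subjects per time point with a standard-error band; this row-wise aggregation can be sketched in plain pandas (the values below are made up for illustration):

```python
import pandas as pd

# Miniature ensemble dataframe: one column per subject, one row per second.
ensemble = pd.DataFrame(
    {"Vp01": [-5.2, -7.3, -6.9], "Vp02": [-1.9, 2.8, 6.7]},
    index=pd.Index([1.0, 2.0, 3.0], name="time"),
)

# Mean and standard error over subjects per time point
# (column-wise statistics for each row):
ensemble_mean = ensemble.mean(axis=1)
ensemble_se = ensemble.sem(axis=1)
print(ensemble_mean.tolist())
```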
Compute HRV Parameters¶
Next, we can compute HRV Parameters. For this, we can use the method BaseProtocol.compute_hrv_results(), which allows us to enable/disable the specific processing steps and provide different parameters to the individual steps (see the ECG_Analysis_Example.ipynb notebook for further explanations of the different processing steps).
Since different kinds of HRV results can be computed (e.g., aggregating only over phases vs. aggregating over subphases, …), we can specify an identifier for these results via the result_id parameter.
BaseProtocol.compute_hrv_results() can perform the following steps (True/False indicates which steps are enabled by default – also have a look at the function documentation!):
Select Phases (select_phases): False
Split into Subphases (split_into_subphases): False
Add Conditions (add_conditions): False
Additionally, the function has arguments to further specify HRV processing:
dict_levels: The names of the dictionary levels. Corresponds to the index level names of the resulting, concatenated dataframe. Default: None for the levels [“subject”, “phase”] (or [“subject”, “phase”, “subphase”] when split_into_subphases is True)
hrv_params: Dictionary with parameters to configure the HRV parameter computation. Parameters are passed to EcgProcessor.hrv_process(). Examples:
hrv_types: list of HRV types to be computed. Subset of [“hrv_time”, “hrv_nonlinear”, “hrv_frequency”].
correct_rpeaks: Whether to apply R peak correction before computing HRV parameters or not
Compute HRV Parameter over Phases¶
[25]:
base_protocol.compute_hrv_results("hrv_phases")
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
warn(
[26]:
base_protocol.hrv_results["hrv_phases"].head()
[26]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_LZC | HRV_DFA_alpha2 | HRV_MFDFA_alpha2_Width | HRV_MFDFA_alpha2_Peak | HRV_MFDFA_alpha2_Mean | HRV_MFDFA_alpha2_Max | HRV_MFDFA_alpha2_Delta | HRV_MFDFA_alpha2_Asymmetry | HRV_MFDFA_alpha2_Fluctuation | HRV_MFDFA_alpha2_Increment | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | |||||||||||||||||||||
Vp01 | Baseline | 701.171875 | 44.086389 | NaN | NaN | NaN | NaN | NaN | NaN | 36.399766 | 36.605573 | ... | 1.009844 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Intervention | 779.551630 | 48.772965 | NaN | NaN | NaN | NaN | NaN | NaN | 53.253237 | 53.448631 | ... | 1.133243 | 1.108132 | 1.088136 | 1.327777 | 1.629693 | -1.286655 | -2.286655 | -0.222539 | 0.001160 | 0.107944 | |
Stress | 680.145410 | 51.211148 | 25.720464 | 45.331424 | NaN | NaN | NaN | NaN | 35.764047 | 35.806996 | ... | 0.791578 | 0.741699 | 0.265318 | 0.671193 | 0.629318 | 1.000000 | 0.664149 | -0.657832 | 0.000091 | 0.006495 | |
Recovery | 699.218750 | 50.388794 | NaN | NaN | NaN | NaN | NaN | NaN | 29.430528 | 29.506479 | ... | 0.908615 | 0.857308 | 0.294392 | 1.055181 | 1.004504 | 0.878651 | -0.099155 | -0.672143 | 0.000363 | 0.017242 | |
Vp02 | Baseline | 691.029744 | 40.551896 | NaN | NaN | NaN | NaN | NaN | NaN | 32.113406 | 32.304060 | ... | 0.998500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 81 columns
Compute HRV Parameter over Subphases¶
Assuming we only want to compute a certain type of HRV measures (in this example, only time-domain measures), we can specify this in the hrv_params argument (see EcgProcessor.hrv_process() for an overview of available arguments).
[27]:
base_protocol.compute_hrv_results(
"hrv_subphases",
split_into_subphases=True,
params={"split_into_subphases": subphases},
hrv_params={"hrv_types": "hrv_time"},
)
[28]:
base_protocol.hrv_results["hrv_subphases"].head()
[28]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_IQRNN | HRV_SDRMSSD | HRV_Prc20NN | HRV_Prc80NN | HRV_pNN50 | HRV_pNN20 | HRV_MinNN | HRV_MaxNN | HRV_HTI | HRV_TINN | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | subphase | |||||||||||||||||||||
Vp01 | Baseline | Start | 728.365385 | 38.515399 | NaN | NaN | NaN | NaN | NaN | NaN | 38.121478 | 38.852299 | ... | 60.546875 | 1.010333 | 691.40625 | 761.71875 | 23.076923 | 61.538462 | 660.15625 | 789.06250 | 8.666667 | 109.3750 |
Middle | 673.029119 | 31.651441 | NaN | NaN | NaN | NaN | NaN | NaN | 32.801251 | 33.164488 | ... | 39.062500 | 0.964946 | 650.78125 | 704.68750 | 18.181818 | 43.181818 | 605.46875 | 765.62500 | 3.666667 | 54.6875 | ||
End | 724.759615 | 32.667065 | NaN | NaN | NaN | NaN | NaN | NaN | 38.356246 | 40.060349 | ... | 31.250000 | 0.851675 | 702.34375 | 752.34375 | 23.076923 | 69.230769 | 664.06250 | 769.53125 | 3.250000 | 62.5000 | ||
Intervention | Start | 762.812500 | 26.119350 | NaN | NaN | NaN | NaN | NaN | NaN | 30.912484 | 31.418938 | ... | 39.062500 | 0.844945 | 741.40625 | 781.25000 | 8.000000 | 60.000000 | 707.03125 | 820.31250 | 6.250000 | 39.0625 | |
Middle | 779.399671 | 47.293117 | NaN | NaN | NaN | NaN | NaN | NaN | 49.025151 | 49.689867 | ... | 62.500000 | 0.964670 | 739.84375 | 818.75000 | 42.105263 | 68.421053 | 671.87500 | 871.09375 | 9.500000 | 78.1250 |
5 rows × 25 columns
Add Precomputed Results¶
If processing results were already computed (e.g., in another notebook or because it’s data from an old study), the precomputed results can be imported and directly added to the Protocol instance.
Ensemble Time-series¶
In this example, we have Ensemble Time-series data from a study that has 3 phases (Phase1, Phase2, Phase3). This data is already in the correct format, so we can directly add it to a Protocol object (that also has the correct structure – the durations of the individual phases are not important here, so we just leave them empty).
[29]:
dict_merged_norm = bp.example_data.get_hr_ensemble_sample()
[30]:
base_protocol = BaseProtocol("TimeSeriesDemo", structure={"Phase1": None, "Phase2": None, "Phase3": None})
[31]:
base_protocol.add_hr_ensemble("hr_ensemble", dict_merged_norm)
[32]:
display_dict_structure(base_protocol.hr_ensemble["hr_ensemble"])
dict_keys(['Phase1', 'Phase2', 'Phase3'])
'Dataframe shape: (406, 28)'
Vp01 | Vp02 | Vp03 | Vp04 | Vp05 | Vp06 | Vp07 | Vp08 | Vp09 | Vp10 | ... | Vp22 | Vp24 | Vp25 | Vp26 | Vp27 | Vp28 | Vp30 | Vp31 | Vp32 | Vp33 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
1 | -5.896011 | 21.372677 | -13.380272 | -0.952815 | 5.662782 | -4.119631 | 0.594466 | 7.365068 | 2.589674 | -4.873666 | ... | 8.312998 | 6.064452 | -4.047215 | -5.267747 | -11.698333 | -5.405734 | -1.370884 | 3.346634 | -5.795628 | 3.491703 |
2 | -4.745362 | 21.213832 | -7.508542 | 1.678500 | 4.286097 | -5.328198 | 11.758334 | 6.751515 | 0.005992 | 3.383603 | ... | 5.708344 | 6.728987 | -6.630518 | -5.409718 | -15.351490 | -4.091093 | -0.836948 | -1.557466 | -12.685620 | 8.637926 |
3 | 1.967010 | 14.120346 | -4.631881 | -4.720704 | 0.709777 | 1.238230 | 7.367597 | 2.901178 | -2.304507 | 11.329636 | ... | 1.157688 | -8.462778 | -6.974983 | -7.243909 | -12.632172 | -3.817170 | 0.881259 | -2.667594 | -10.346344 | 2.835992 |
4 | 1.454400 | 8.943284 | -11.498010 | -10.179091 | -2.759789 | 0.394513 | -8.177153 | 1.614913 | 0.885181 | 9.373362 | ... | -0.661564 | -13.988615 | -3.315441 | -7.813189 | -8.996902 | -0.654702 | -0.186492 | -0.641279 | 0.950780 | 0.571289 |
5 | -3.319305 | -1.501998 | -8.575398 | -8.604960 | -1.664693 | -4.642004 | -7.594194 | 0.708533 | -2.946673 | 12.488329 | ... | 3.657499 | -8.304159 | 2.546755 | -3.969524 | -9.432211 | 6.311569 | 1.601839 | -0.189170 | 8.193988 | -4.193230 |
5 rows × 28 columns
MIST¶
The MIST represents the Montreal Imaging Stress Task.
Typically, the MIST has the following structure:
“Before”: The study part before conducting the “actual” MIST. Here, subjects typically fill out questionnaires and provide a neutral baseline before stress induction
“MIST”: The “actual” MIST. The MIST protocol consists of three repetitions of the same subphases: Baseline (BL), Arithmetic Tasks (AT), Feedback (FB). Each subphase has a duration that is specified in seconds.
“After”: The study part after the MIST. Here, subjects typically fill out further questionnaires and remain in the examination room to provide further saliva samples.
In summary:
Study Parts: Before, MIST, After
Phases: MIST1, MIST2, MIST3
Subphases: BL, AT, FB
Subphase Durations: 60, 240, 0 (Feedback Interval has length 0 because it’s of variable length for each MIST phase and will later be inferred from the data)
For further information, please visit the documentation of protocols.MIST.
Hence, the structure dictionary would look like the following:
structure = {
"Before": None,
"MIST": {
"MIST1": {"BL": 60, "AT": 240, "FB": 0},
"MIST2": {"BL": 60, "AT": 240, "FB": 0},
"MIST3": {"BL": 60, "AT": 240, "FB": 0}
},
"After": None
}
[33]:
structure = {
"Before": None,
"MIST": {
"MIST1": ["BL", "AT", "FB"],
"MIST2": ["BL", "AT", "FB"],
"MIST3": ["BL", "AT", "FB"],
},
"After": None,
}
[34]:
mist = MIST(structure=structure)
mist
[34]:
MIST
Structure: {'Before': None, 'MIST': {'MIST1': ['BL', 'AT', 'FB'], 'MIST2': ['BL', 'AT', 'FB'], 'MIST3': ['BL', 'AT', 'FB']}, 'After': None}
TSST¶
The TSST represents the Trier Social Stress Test.
Typically, the TSST has the following structure:
“Before”: The study part before conducting the “actual” TSST. Here, subjects typically fill out questionnaires and provide a neutral baseline before stress induction
“TSST”: The “actual” TSST. The TSST protocol consists of two phases: Talk and Math, both with a duration of 5 minutes (300 seconds).
“After”: The study part after the TSST. Here, subjects typically fill out further questionnaires and remain in the examination room to provide further saliva samples.
In summary:
Study Parts: Before, TSST, After
Phases: Talk, Math
Phase Durations: 300, 300
For further information, please visit the documentation of protocols.TSST.
Hence, the structure dictionary would look like the following:
structure = {
"Before": None,
"TSST": {
"Talk": 300,
"Math": 300,
},
"After": None
}
[35]:
structure = {
"Before": None,
"TSST": {
"Talk": 300,
"Math": 300,
},
"After": None,
}
[36]:
tsst = TSST(structure=structure)
tsst
[36]:
TSST
Structure: {'Before': None, 'TSST': {'Talk': 300, 'Math': 300}, 'After': None}
Plotting¶
The Protocol API offers functions to directly visualize the processed study data of the Protocol instance. So far, the visualization of heart rate data (hr_mean_plot() and hr_ensemble_plot()) and saliva data (saliva_plot()) is supported.
Create Protocol object
Note: For this example we assume that data was collected during the MIST.
[37]:
mist = MIST()
mist
[37]:
MIST
Structure: {'Part1': None, 'MIST': {'MIST1': {'BL': 60, 'AT': 240, 'FB': 0}, 'MIST2': {'BL': 60, 'AT': 240, 'FB': 0}, 'MIST3': {'BL': 60, 'AT': 240, 'FB': 0}}, 'Part2': None}
Aggregated Results¶
Load Example Aggregated Results Data
[38]:
hr_result = bp.example_data.get_hr_result_sample()
# Alternatively: Load your own aggregated results
# hr_result = bp.io.load_long_format_csv("<path-to-aggregated-results-in-long-format>")
[39]:
hr_result.head()
[39]:
HR | ||||
---|---|---|---|---|
condition | subject | phase | subphase | |
Intervention | Vp01 | Phase1 | Start | -1.233771 |
Middle | -0.727680 | |||
End | -0.719957 | |||
Phase2 | Start | -2.472756 | ||
Middle | -2.044894 |
Add Aggregated Results Data
[40]:
mist.add_hr_results("hr_result", hr_result)
Plot Mean HR Plot
[41]:
fig, ax = plt.subplots()
mist.hr_mean_plot("hr_result", ax=ax);
Ensemble Time-series¶
Load Example Ensemble Data¶
[42]:
dict_hr_ensemble = bp.example_data.get_hr_ensemble_sample()
# Alternatively: Load your own ensemble time-series data
# dict_hr_ensemble = bp.io.load_pandas_dict_excel("<path-to-heart-rate-ensemble-data.xlsx>")
# Note: We rename the phase names (Phase1-3) of this example dataset to match
# it with the standard phase names of the MIST (MIST1-3)
dict_hr_ensemble = {key: value for key, value in zip(["MIST1", "MIST2", "MIST3"], dict_hr_ensemble.values())}
[43]:
display_dict_structure(dict_hr_ensemble)
dict_keys(['MIST1', 'MIST2', 'MIST3'])
'Dataframe shape: (406, 28)'
Vp01 | Vp02 | Vp03 | Vp04 | Vp05 | Vp06 | Vp07 | Vp08 | Vp09 | Vp10 | ... | Vp22 | Vp24 | Vp25 | Vp26 | Vp27 | Vp28 | Vp30 | Vp31 | Vp32 | Vp33 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
1 | -5.896011 | 21.372677 | -13.380272 | -0.952815 | 5.662782 | -4.119631 | 0.594466 | 7.365068 | 2.589674 | -4.873666 | ... | 8.312998 | 6.064452 | -4.047215 | -5.267747 | -11.698333 | -5.405734 | -1.370884 | 3.346634 | -5.795628 | 3.491703 |
2 | -4.745362 | 21.213832 | -7.508542 | 1.678500 | 4.286097 | -5.328198 | 11.758334 | 6.751515 | 0.005992 | 3.383603 | ... | 5.708344 | 6.728987 | -6.630518 | -5.409718 | -15.351490 | -4.091093 | -0.836948 | -1.557466 | -12.685620 | 8.637926 |
3 | 1.967010 | 14.120346 | -4.631881 | -4.720704 | 0.709777 | 1.238230 | 7.367597 | 2.901178 | -2.304507 | 11.329636 | ... | 1.157688 | -8.462778 | -6.974983 | -7.243909 | -12.632172 | -3.817170 | 0.881259 | -2.667594 | -10.346344 | 2.835992 |
4 | 1.454400 | 8.943284 | -11.498010 | -10.179091 | -2.759789 | 0.394513 | -8.177153 | 1.614913 | 0.885181 | 9.373362 | ... | -0.661564 | -13.988615 | -3.315441 | -7.813189 | -8.996902 | -0.654702 | -0.186492 | -0.641279 | 0.950780 | 0.571289 |
5 | -3.319305 | -1.501998 | -8.575398 | -8.604960 | -1.664693 | -4.642004 | -7.594194 | 0.708533 | -2.946673 | 12.488329 | ... | 3.657499 | -8.304159 | 2.546755 | -3.969524 | -9.432211 | 6.311569 | 1.601839 | -0.189170 | 8.193988 | -4.193230 |
5 rows × 28 columns
Add Ensemble Data¶
[44]:
mist.add_hr_ensemble("hr_ensemble", dict_hr_ensemble)
Plot HR Ensemble Plot¶
Note: Since the MIST contains subphases, the MIST.hr_ensemble_plot() function makes use of the structure property of the MIST object and automatically highlights the subphases in the HR Ensemble Plot.
[45]:
fig, ax = plt.subplots()
mist.hr_ensemble_plot("hr_ensemble", ax=ax);
Saliva¶
Just like heart rate data, we can also add saliva data to the Protocol object. We just need to specify which type of data it is (e.g., “cortisol”, “amylase”, etc.) – see the Saliva_Example.ipynb notebook for further information.
Saliva data can be added to the Protocol object in two different formats:
Saliva data per subject as SalivaRawDataFrame, a specialized dataframe containing “raw” (unaggregated) saliva data in a standardized format. BioPsyKit will then internally compute mean and standard error.
Aggregated saliva data (mean and standard error) as SalivaMeanSeDataFrame, a specialized dataframe containing mean and standard error of saliva samples in a standardized format.
SalivaRawDataFrame¶
Load Example Cortisol Data in SalivaRawDataFrame format.
Note: The sample times are provided relative to the psychological protocol. The first sample is dropped for plotting.
[46]:
saliva_data = bp.example_data.get_saliva_example(sample_times=[-30, -1, 0, 10, 20, 30, 40])
# alternatively: load your own saliva data
# saliva_data = bp.io.saliva.load_saliva_wide_format("<path-to-saliva-data>")
# drop first saliva sample (not needed because it was only assessed to control for high baseline)
saliva_data = saliva_data.drop("0", level="sample")
saliva_data.head()
[46]:
cortisol | time | |||
---|---|---|---|---|
condition | subject | sample | ||
Intervention | Vp01 | 1 | 7.03320 | -1 |
2 | 5.77670 | 0 | ||
3 | 5.25790 | 10 | ||
4 | 5.00795 | 20 | ||
5 | 4.50045 | 30 |
Add Saliva Data to MIST object
[47]:
mist.add_saliva_data(saliva_data, "cortisol")
[48]:
fig, ax = mist.saliva_plot(saliva_type="cortisol")
fig.tight_layout()
SalivaMeanSeDataFrame¶
Create Protocol object – For this example, we’re assuming the data was collected during the TSST.
[49]:
structure = {
"Before": None,
"TSST": {
"Talk": 300,
"Math": 300,
},
"After": None,
}
tsst = TSST(structure=structure)
tsst
[49]:
TSST
Structure: {'Before': None, 'TSST': {'Talk': 300, 'Math': 300}, 'After': None}
Load Example Saliva Data (cortisol, amylase, and IL-6) in SalivaMeanSeDataFrame format (or, more precisely, a dictionary of such dataframes).
[50]:
saliva_dict = bp.example_data.get_saliva_mean_se_example()
display_dict_structure(saliva_dict)
dict_keys(['cortisol', 'amylase', 'il6'])
'Dataframe shape: (14, 2)'
mean | se | |||
---|---|---|---|---|
condition | sample | time | ||
Control | 0 | -1 | 8.004 | 1.021 |
1 | 0 | 10.751 | 0.832 | |
2 | 10 | 14.330 | 1.230 | |
3 | 20 | 13.641 | 1.338 | |
4 | 45 | 10.549 | 1.036 |
Add Saliva Data to the TSST object. Since it’s a dictionary, we don’t have to specify the saliva_types – the names are automatically inferred from the dictionary keys (we can, however, also manually specify the saliva type if we want).
[51]:
tsst.add_saliva_data(saliva_dict)
display_dict_structure(tsst.saliva_data)
dict_keys(['cortisol', 'amylase', 'il6'])
'Dataframe shape: (14, 2)'
mean | se | |||
---|---|---|---|---|
condition | sample | time | ||
Control | 0 | -1 | 8.004 | 1.021 |
1 | 0 | 10.751 | 0.832 | |
2 | 10 | 14.330 | 1.230 | |
3 | 20 | 13.641 | 1.338 | |
4 | 45 | 10.549 | 1.036 |
Plot Cortisol Data¶
Note: This function will automatically plot data from all conditions. If you want to change that behavior and plot only selected conditions (or just one), you have to manually call saliva_plot() from the module biopsykit.protocols.plotting (see below).
[52]:
fig, ax = tsst.saliva_plot(saliva_type="cortisol", legend_loc="upper right")
fig.tight_layout()
Plot Cortisol Data – Only “Intervention” Group
[53]:
fig, ax = plt.subplots()
bp.protocols.plotting.saliva_plot(
data=tsst.saliva_data["cortisol"].xs("Intervention"),
saliva_type="cortisol",
test_times=tsst.test_times,
test_title="TSST",
legend_loc="upper right",
ax=ax,
)
fig.tight_layout()
Plot Cortisol and IL-6 Data
[54]:
fig, ax = plt.subplots()
tsst.saliva_plot(
saliva_type=["cortisol", "il6"], linestyle=["-", "--"], marker=["o", "P"], legend_loc="upper right", ax=ax
);
[ ]: