ECG Analysis Example¶
This example illustrates how to further process time-series heart rate data extracted, for instance, from ECG recordings. The covered steps include assigning study conditions, splitting data into subphases, resampling, cutting the data of all subjects to equal length, normalizing, rearranging, and computing aggregated results (such as the mean and standard error over all subjects for each (sub)phase).
As input, this notebook uses the heart rate processing outputs generated by the ECG processing pipeline shown in ECG_Processing_Example.ipynb.
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
[3]:
# path to import and export example processing results
result_path = Path("./results")
[4]:
def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())
Overview¶
When we collect ECG data from a study, we typically extract heart rate (among others), resulting in time-series heart rate data for each subject. These data are then usually processed further in order to visualize results, perform statistical analyses, etc. How the data is further processed depends on the application. In this example, we will look at two different applications:
Aggregated Results: This means that we compute aggregations over time-series data. In the end, we will get tabular data with mean values for each subject within one time interval of the study procedure. We compute these Aggregated Results when we want to analyze differences, for example, between different study conditions, or between different phases of one study (or both).
Ensemble Time-series: This means that we have time-series data where the data of each subject has equal length. In the end, this allows us to concatenate the data of all subjects into a tabular format. We compute these Ensemble Time-series data when we want to visualize time-series data of all subjects in one plot by overlaying the data and computing the mean ± standard deviation over all subjects for each time point (a so-called ensemble plot).
Additionally, we will show how to compute heart rate variability (HRV) parameters for each subject and for each (sub)phase of the recorded data.
Aggregated Results¶
For computing Aggregated Results the following processing steps will be performed:
Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Resample: Resample the heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later.
Normalize Data (optional): Normalize the heart rate data with respect to the mean heart rate of a baseline phase. This makes heart rate reactions easier to compare within one study since it removes the bias of different resting heart rates.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Split into Subphases (optional): If the different phases of your study consist of subphases that you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict into subphases, which adds a new dictionary level.
Aggregate Results: Aggregate time-series data by computing the mean heart rate per (sub)phase for each subject. The results will be concatenated into a dataframe.
Add Conditions (optional): If multiple study conditions were present, the study condition can be added as a new index level to the dataframe.
Load Data¶
Heart Rate Data¶
Load processed heart rate time-series data of all subjects and concatenate into one dictionary.
[5]:
study_data_dict = {}
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
subject_id = re.findall(r"hr_result_(Vp\w+)\.xlsx", file.name)[0]
study_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
Structure of the SubjectDataDict:
[6]:
display_dict_structure(study_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[7]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[7]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Resample Data¶
Resample all heart rate values to a sampling rate of 1 Hz.
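BioPsyKit's resample_dict_sec handles this step internally. As a rough illustration of what resampling beat-to-beat data to 1 Hz means, here is a generic pandas sketch; the timestamps and values are made up, and BioPsyKit's actual interpolation strategy may differ:

```python
import pandas as pd

# Illustrative heart rate samples at irregular (beat-to-beat) times
idx = pd.to_datetime(
    ["2019-10-23 12:32:05.797", "2019-10-23 12:32:07.223", "2019-10-23 12:32:07.961"]
)
hr = pd.DataFrame({"Heart_Rate": [80.84, 81.27, 78.77]}, index=idx)

# One generic way to obtain a 1 Hz grid: bin into 1-second intervals,
# average samples within each bin, and interpolate empty bins
hr_1hz = hr.resample("1s").mean().interpolate()
```

After this, every subject's data lies on a regular one-sample-per-second grid, which is what makes the later second-based subphase splitting straightforward.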
[8]:
dict_result = bp.utils.data_processing.resample_dict_sec(study_data_dict)
Structure of the resampled SubjectDataDict:
[9]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Normalize Data¶
Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.
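Conceptually, this normalization expresses every sample as percent change relative to the baseline mean. A minimal sketch of the equivalent computation with made-up values (in practice, use normalize_to_phase as shown below):

```python
import pandas as pd

# Illustrative heart rate data (bpm) for one subject
phases = {
    "Baseline": pd.DataFrame({"Heart_Rate": [80.0, 82.0, 78.0]}),
    "Stress": pd.DataFrame({"Heart_Rate": [95.0, 100.0, 90.0]}),
}

# Percent change relative to the mean heart rate of the baseline phase
baseline_mean = phases["Baseline"]["Heart_Rate"].mean()
phases_norm = {
    phase: (df - baseline_mean) / baseline_mean * 100 for phase, df in phases.items()
}
```

By construction, the normalized baseline phase then has a mean of 0 %, and all other phases are read as deviations from baseline.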
[10]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")
Structure of the normalized SubjectDataDict:
[11]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).
[12]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the normalized SubjectDataDict with only the selected phases:
[13]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -8.663991 |
2.0 | -7.416155 |
3.0 | -1.054581 |
4.0 | -7.982144 |
5.0 | -6.678360 |
Split into Subphases¶
By splitting the data in the SubjectDataDict further into subphases, a new dictionary level is added as the innermost level. You can specify the subphases in multiple ways (depending on your study protocol):
Fixed Lengths, Consecutive Subphases¶
If all subphases have fixed lengths and subphases are consecutive, i.e., each subphase begins right after the previous one, you can pass a dict with subphase names as keys and subphase durations (in seconds) as values.
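Consecutive durations are equivalent to cumulative (start, end) intervals relative to the phase start. A small sketch of this conversion (durations_to_intervals is an illustrative helper, not part of BioPsyKit):

```python
from itertools import accumulate

def durations_to_intervals(subphases):
    """Convert consecutive subphase durations (in seconds) into
    relative (start, end) times, e.g. {"Start": 20, "Middle": 30}
    -> {"Start": (0, 20), "Middle": (20, 50)}."""
    ends = list(accumulate(subphases.values()))
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, (s, e) in zip(subphases, zip(starts, ends))}

intervals = durations_to_intervals({"Start": 20, "Middle": 30, "End": 10})
```

This is also why the variant below, with explicit (start, end) tuples, is the more general form: it can express gaps between subphases that durations alone cannot.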
[14]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": 20, "Middle": 30, "End": 10}
)
Structure of the SubjectDataDict with a new subphase level added:
[15]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Fixed Lengths, No Consecutive Subphases¶
If all subphases have fixed lengths, but are not consecutive, i.e., do not begin right after the previous subphase, you can pass a dict with subphase names as keys and tuples of relative start and end times (not durations!) in seconds as values.
[16]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": (0, 10), "Middle": (20, 40), "End": (50, 60)}
)
Structure of the SubjectDataDict with a new subphase level added:
[17]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (10, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Last Subphase without Fixed Length¶
If all subphases have a fixed length except the last one (e.g., because the last subphase was a phase where feedback was provided to the subject and thus had variable length), but you still want to extract all data from this subphase, you can set the duration of the last subphase to 0. The duration of this subphase is then inferred from the data by finding the subject with the longest recording. Shorter recordings of other subjects are automatically cut to their respective durations.
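The inference for a last subphase with duration 0 can be sketched as follows (illustrative values and variable names, not BioPsyKit's implementation):

```python
# Illustrative total phase lengths per subject (in seconds)
subject_lengths = {"Vp01": 64, "Vp02": 60}
subphases = {"Start": 20, "Middle": 30, "End": 0}

# The last subphase starts after all fixed-length subphases...
last_start = sum(d for d in subphases.values() if d > 0)
# ...and its end is inferred from the longest recording
last_end = max(subject_lengths.values())

# Each subject contributes data up to its own recording length
end_slices = {
    subject: (last_start, min(last_end, length))
    for subject, length in subject_lengths.items()
}
```

In this sketch, the "End" subphase thus spans seconds 50 to 64 for the longest recording, while shorter recordings simply end earlier.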
[18]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": 20, "Middle": 30, "End": 0}
)
Structure of the SubjectDataDict with a new subphase level added:
[19]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Aggregate Results¶
Aggregate Over Phases¶
[20]:
df_result = bp.utils.data_processing.mean_per_subject_dict(
dict_result, dict_levels=["subject", "phase"], param_name="HR"
)
df_result.head()
[20]:
HR | ||
---|---|---|
subject | phase | |
Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | |
Stress | 88.263152 | |
Recovery | 85.855722 | |
Vp02 | Baseline | 86.877952 |
Aggregate Over Subphases¶
[21]:
df_result_subph = bp.utils.data_processing.mean_per_subject_dict(
dict_result_subph, dict_levels=["subject", "phase", "subphase"], param_name="HR"
)
df_result_subph.head()
[21]:
HR | |||
---|---|---|---|
subject | phase | subphase | |
Vp01 | Baseline | Start | 82.387680 |
Middle | 88.853619 | ||
End | 83.124445 | ||
Intervention | Start | 78.375064 | |
Middle | 77.013733 |
Add Conditions¶
[22]:
df_result_cond = bp.utils.data_processing.add_subject_conditions(df_result, condition_list)
df_result_cond.head()
[22]:
HR | |||
---|---|---|---|
condition | subject | phase | |
Control | Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | ||
Stress | 88.263152 | ||
Recovery | 85.855722 | ||
Intervention | Vp02 | Baseline | 86.877952 |
[23]:
df_result_subph_cond = bp.utils.data_processing.add_subject_conditions(df_result_subph, condition_list)
df_result_subph_cond.head()
[23]:
HR | ||||
---|---|---|---|---|
condition | subject | phase | subphase | |
Control | Vp01 | Baseline | Start | 82.387680 |
Middle | 88.853619 | |||
End | 83.124445 | |||
Intervention | Start | 78.375064 | ||
Middle | 77.013733 |
Compute Parameter¶
The resulting aggregated data can then be processed further, and “population averages” can be computed by averaging the heart rate data of each (sub)phase over all subjects, resulting in mean ± standard error values per (sub)phase and condition (if applicable). The resulting dataframe is a MeanSeDataFrame, a dataframe containing the mean and standard error of time-series data in a standardized format.
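Conceptually, this aggregation is a groupby over all index levels except subject, computing the mean and standard error per group. A simplified pandas sketch with made-up values (the real MeanSeDataFrame uses a slightly different column layout):

```python
import pandas as pd

# Illustrative per-subject mean HR values, shaped like the output
# of mean_per_subject_dict
df = pd.DataFrame(
    {"HR": [85.0, 88.0, 87.0, 90.0]},
    index=pd.MultiIndex.from_product(
        [["Vp01", "Vp02"], ["Baseline", "Stress"]], names=["subject", "phase"]
    ),
)

# Mean and standard error over subjects, per phase
mse = df.groupby("phase")["HR"].agg(["mean", "sem"])
```

Grouping by "phase" collapses the subject level, so each row of the result summarizes all subjects for one phase.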
Without Conditions¶
[24]:
df_result_mse = bp.utils.data_processing.mean_se_per_phase(df_result)
df_result_mse.head()
[24]:
HR | ||
---|---|---|
mean | se | |
phase | ||
Baseline | 86.228854 | 0.649098 |
Intervention | 84.202219 | 7.180917 |
Recovery | 91.139351 | 5.283629 |
Stress | 97.608169 | 9.345016 |
[25]:
df_result_subph_mse = bp.utils.data_processing.mean_se_per_phase(df_result_subph)
df_result_subph_mse.head()
[25]:
HR | |||
---|---|---|---|
mean | se | ||
phase | subphase | ||
Baseline | Start | 82.998921 | 0.611242 |
Middle | 88.822727 | 0.030892 | |
End | 85.398164 | 2.273719 | |
Intervention | Start | 82.948593 | 4.573529 |
Middle | 80.744660 | 3.730927 |
With Conditions¶
We call the same function, but with a different dataframe (where we added the condition as an index level).
Note: In this example, the standard error is NaN because we only have one subject per condition and thus cannot compute the standard error.
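This is a property of the sample standard error itself: with n = 1, the standard deviation (with ddof = 1) is undefined, so the standard error is NaN. A quick pandas check:

```python
import pandas as pd

# A "group" containing a single subject's value
single = pd.Series([85.579756])

# Sample standard deviation with one value (ddof=1) is undefined,
# so the standard error is NaN as well
se = single.sem()
```

With two or more subjects per condition, the same call returns a finite value.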
[26]:
bp.utils.data_processing.mean_se_per_phase(df_result_cond).head()
[26]:
HR | |||
---|---|---|---|
mean | se | ||
condition | phase | ||
Control | Baseline | 85.579756 | NaN |
Intervention | 77.021302 | NaN | |
Stress | 88.263152 | NaN | |
Recovery | 85.855722 | NaN | |
Intervention | Baseline | 86.877952 | NaN |
[27]:
bp.utils.data_processing.mean_se_per_phase(df_result_subph_cond).head()
[27]:
HR | ||||
---|---|---|---|---|
mean | se | |||
condition | phase | subphase | ||
Control | Baseline | Start | 82.387680 | NaN |
Middle | 88.853619 | NaN | ||
End | 83.124445 | NaN | ||
Intervention | Start | 78.375064 | NaN | |
Middle | 77.013733 | NaN |
Plotting¶
The final data can then either be statistically analyzed or plotted. Here, we plot the mean and standard error of each phase using the plotting function plotting.hr_mean_plot() from BioPsyKit.
[28]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_mse, ax=ax);
If we plot our data with plotting.hr_mean_plot() and the data consists of different phases, where each phase consists of subphases, the phases will be highlighted in the background for better visibility.
[29]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_subph_mse, ax=ax);
Ensemble Time-series¶
For computing Ensemble Time-series the following processing steps will be performed:
Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Resample: Resample the heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later.
Normalize Data (optional): Normalize the heart rate data with respect to the mean heart rate of a baseline phase. This makes heart rate reactions easier to compare within one study since it removes the bias of different resting heart rates.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Rearrange Data: The data in form of a SubjectDataDict is rearranged into a StudyDataDict. A StudyDataDict is constructed from a SubjectDataDict by swapping the outer (subject IDs) and inner (phase names) dictionary levels.
Cut Phases to Equal Length: If the data do not have equal length for each subject (e.g., because the last phase had a flexible duration or was not timed accurately), you can cut the data to the same length in order to concatenate the time-series data afterwards. This is beneficial if you want to overlay time-series data from multiple subjects in a so-called Ensemble Plot.
Merge StudyDataDict: Now that the time-series data of all subjects have equal length within each phase, they can be merged into a MergedStudyDataDict, where the inner level of the nested StudyDataDict is removed by merging the individual data into one dataframe per phase.
Add Conditions (optional): If multiple study conditions were present, the StudyDataDict can further be split by adding a new dictionary level with the study conditions.
Note: See Documentation for biopsykit.utils.data_processing for further information of the used functions.
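The level swap performed in the Rearrange Data step can be sketched in plain Python (swap_dict_levels is an illustrative helper; BioPsyKit provides rearrange_subject_data_dict for this):

```python
def swap_dict_levels(subject_data_dict):
    """Swap the outer (subject) and inner (phase) levels of a nested dict,
    turning a SubjectDataDict-like structure into a StudyDataDict-like one."""
    study_data_dict = {}
    for subject, phase_dict in subject_data_dict.items():
        for phase, data in phase_dict.items():
            study_data_dict.setdefault(phase, {})[subject] = data
    return study_data_dict

# Strings stand in for the per-phase heart rate dataframes
subject_dict = {
    "Vp01": {"Baseline": "df_01_base", "Stress": "df_01_stress"},
    "Vp02": {"Baseline": "df_02_base", "Stress": "df_02_stress"},
}
study_dict = swap_dict_levels(subject_dict)
```

After the swap, the outer keys are phase names and the inner keys are subject IDs, which is exactly the structure needed to merge all subjects of one phase into a single dataframe later.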
Load Data¶
Heart Rate Data¶
[30]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
# Load processed heart rate time-series data of all subjects and concatenate into one dict
subject_data_dict = {}
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
subject_id = re.findall(r"hr_result_(Vp\w+)\.xlsx", file.name)[0]
subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
Structure of the SubjectDataDict:
[31]:
display_dict_structure(subject_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[32]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[32]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Resample Data¶
Resample all heart rate values to a sampling rate of 1 Hz.
[33]:
dict_result = bp.utils.data_processing.resample_dict_sec(subject_data_dict)
Structure of the resampled SubjectDataDict:
[34]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Normalize Data¶
Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.
[35]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")
Structure of the normalized SubjectDataDict:
[36]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).
[37]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the normalized SubjectDataDict with only the selected phases:
[38]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -8.663991 |
2.0 | -7.416155 |
3.0 | -1.054581 |
4.0 | -7.982144 |
5.0 | -6.678360 |
Rearrange Data¶
[39]:
dict_study = bp.utils.data_processing.rearrange_subject_data_dict(dict_result_norm)
Structure of the normalized StudyDataDict:
[40]:
display_dict_structure(dict_study)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Cut Data¶
Data for all subjects will be cut to the shortest duration for each phase.
Note: If only a subset of phases should be cut, these phases can be specified via the phases parameter.
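Cutting to the shortest recording can be sketched as follows for a single phase (illustrative data; in practice, use cut_phases_to_shortest as shown below):

```python
import pandas as pd

# Illustrative per-subject data for one phase (1 Hz samples)
phase_data = {
    "Vp01": pd.DataFrame({"Heart_Rate": range(64)}),
    "Vp02": pd.DataFrame({"Heart_Rate": range(60)}),
}

# Cut all subjects to the shortest recording within the phase
min_len = min(len(df) for df in phase_data.values())
phase_data_cut = {subject: df.iloc[:min_len] for subject, df in phase_data.items()}
```

Once all subjects have the same length per phase, their time series can be concatenated column-wise in the merge step.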
[41]:
dict_result = bp.utils.data_processing.cut_phases_to_shortest(dict_study)
Structure of the cut StudyDataDict:
[42]:
display_dict_structure(dict_result)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (60, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Merge StudyDataDict¶
[43]:
dict_merged = bp.utils.data_processing.merge_study_data_dict(dict_result)
Structure of the MergedStudyDataDict:
[44]:
display_dict_structure(dict_merged)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 2)'
subject | Vp01 | Vp02 |
---|---|---|
time | ||
1.0 | -5.185450 | -1.907328 |
2.0 | -7.308778 | 2.764077 |
3.0 | -6.940040 | 6.703796 |
4.0 | -0.950875 | 0.947209 |
5.0 | -5.986024 | -14.090102 |
Add Conditions¶
[45]:
dict_merged_cond = bp.utils.data_processing.split_subject_conditions(dict_merged, condition_list)
Structure of the MergedStudyDataDict, split into different conditions:
[46]:
display_dict_structure(dict_merged_cond)
dict_keys(['Control', 'Intervention'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 1)'
subject | Vp01 |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Plotting¶
After these processing steps we can finally visualize the time-series data of all subjects in an Ensemble Plot, where the data of all subjects is overlaid and each point in the time series is plotted as mean ± standard error over all subjects. For creating ensemble plots, BioPsyKit provides the function plotting.hr_ensemble_plot.
Note: These plots do not look as pretty as they should because this is just example data from two subjects. If you want to see plots from “actual” data, just scroll down!
[47]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, ax=ax);
If the phases have subphases they can be passed to the plotting function in order to highlight them in the background:
[48]:
subphases = {"Start": 20, "Middle": 100, "End": 20}
subphase_dict = {"Baseline": subphases, "Intervention": subphases, "Stress": subphases, "Recovery": subphases}
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, subphases=subphase_dict, ax=ax);
Load example data from Excel file. The data is already in the format of a MergedStudyDataDict, so it can directly be passed to plotting.hr_ensemble_plot.
[49]:
dict_merged_norm = bp.example_data.get_hr_ensemble_sample()
print(bp.utils.datatype_helper.is_merged_study_data_dict(dict_merged_norm))
display_dict_structure(dict_merged_norm)
True
dict_keys(['Phase1', 'Phase2', 'Phase3'])
'Dataframe shape: (406, 28)'
Vp01 | Vp02 | Vp03 | Vp04 | Vp05 | Vp06 | Vp07 | Vp08 | Vp09 | Vp10 | ... | Vp22 | Vp24 | Vp25 | Vp26 | Vp27 | Vp28 | Vp30 | Vp31 | Vp32 | Vp33 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
1 | -5.896011 | 21.372677 | -13.380272 | -0.952815 | 5.662782 | -4.119631 | 0.594466 | 7.365068 | 2.589674 | -4.873666 | ... | 8.312998 | 6.064452 | -4.047215 | -5.267747 | -11.698333 | -5.405734 | -1.370884 | 3.346634 | -5.795628 | 3.491703 |
2 | -4.745362 | 21.213832 | -7.508542 | 1.678500 | 4.286097 | -5.328198 | 11.758334 | 6.751515 | 0.005992 | 3.383603 | ... | 5.708344 | 6.728987 | -6.630518 | -5.409718 | -15.351490 | -4.091093 | -0.836948 | -1.557466 | -12.685620 | 8.637926 |
3 | 1.967010 | 14.120346 | -4.631881 | -4.720704 | 0.709777 | 1.238230 | 7.367597 | 2.901178 | -2.304507 | 11.329636 | ... | 1.157688 | -8.462778 | -6.974983 | -7.243909 | -12.632172 | -3.817170 | 0.881259 | -2.667594 | -10.346344 | 2.835992 |
4 | 1.454400 | 8.943284 | -11.498010 | -10.179091 | -2.759789 | 0.394513 | -8.177153 | 1.614913 | 0.885181 | 9.373362 | ... | -0.661564 | -13.988615 | -3.315441 | -7.813189 | -8.996902 | -0.654702 | -0.186492 | -0.641279 | 0.950780 | 0.571289 |
5 | -3.319305 | -1.501998 | -8.575398 | -8.604960 | -1.664693 | -4.642004 | -7.594194 | 0.708533 | -2.946673 | 12.488329 | ... | 3.657499 | -8.304159 | 2.546755 | -3.969524 | -9.432211 | 6.311569 | 1.601839 | -0.189170 | 8.193988 | -4.193230 |
5 rows × 28 columns
[50]:
subphases = {"Start": 60, "Middle": 240, "End": 0}
subphase_dict = {"Phase1": subphases, "Phase2": subphases, "Phase3": subphases}
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged_norm, subphases=subphase_dict, ax=ax);
Compute Heart Rate Variability (HRV)¶
For computing Heart Rate Variability (HRV) the following processing steps will be performed:
Load Data: Load R-peak data per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Split into Subphases (optional): If the different phases of your study consist of subphases that you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict into subphases, which adds a new dictionary level.
Compute HRV: Compute HRV parameters from the R-peak data.
Add Conditions (optional): If multiple study conditions were present, the StudyDataDict can further be split by adding a new dictionary level with the study conditions.
Load Data¶
R-Peak Data¶
[51]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
# Load processed r-peaks data of all subjects and concatenate into one dict
rpeaks_subject_data_dict = {}
for file in sorted(ecg_path.glob("rpeaks_result_*.xlsx")):
subject_id = re.findall(r"rpeaks_result_(Vp\w+)\.xlsx", file.name)[0]
rpeaks_subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
[52]:
# rename to make it consistent with the other examples
dict_result_rpeaks = rpeaks_subject_data_dict
Structure of the R Peaks SubjectDataDict:
[53]:
display_dict_structure(dict_result_rpeaks)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:32:05.797 | 0.701462 | 393.0 | 0.742188 | 1 | 80.842105 |
2019-10-23 12:32:07.223 | 1.000000 | 569.0 | 0.738281 | 0 | 81.269841 |
2019-10-23 12:32:07.961 | 0.782036 | 758.0 | 0.761719 | 0 | 78.769231 |
2019-10-23 12:32:08.723 | 0.652345 | 953.0 | 0.757812 | 0 | 79.175258 |
2019-10-23 12:32:09.480 | 0.925202 | 1147.0 | 0.714844 | 0 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[54]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[54]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery”.
[55]:
dict_result_rpeaks_selected = bp.utils.data_processing.select_dict_phases(
dict_result_rpeaks, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the SubjectDataDict with only the selected phases:
[56]:
display_dict_structure(dict_result_rpeaks_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (139, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:33:11.012 | 0.877853 | 450.0 | 0.749132 | 1 | 80.092700 |
2019-10-23 12:33:12.520 | 0.897152 | 645.0 | 0.777344 | 0 | 77.185930 |
2019-10-23 12:33:13.297 | 0.891966 | 844.0 | 0.746094 | 0 | 80.418848 |
2019-10-23 12:33:14.043 | 0.465520 | 1035.0 | 0.707031 | 0 | 84.861878 |
2019-10-23 12:33:14.750 | 0.906602 | 1216.0 | 0.750000 | 0 | 80.000000 |
Split into Subphases¶
By splitting the data in the SubjectDataDict further into subphases, a new dictionary level is added as the innermost level. In this example, we assume that all subphases are consecutive and have fixed lengths (see the Aggregated Results example for further options).
[57]:
dict_result_rpeaks_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result_rpeaks, subphases={"Start": 20, "Middle": 30, "End": 10}
)
Structure of the SubjectDataDict with a new subphase level added:
[58]:
display_dict_structure(dict_result_rpeaks_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (27, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:32:05.797 | 0.701462 | 393.0 | 0.742188 | 1 | 80.842105 |
2019-10-23 12:32:07.223 | 1.000000 | 569.0 | 0.738281 | 0 | 81.269841 |
2019-10-23 12:32:07.961 | 0.782036 | 758.0 | 0.761719 | 0 | 78.769231 |
2019-10-23 12:32:08.723 | 0.652345 | 953.0 | 0.757812 | 0 | 79.175258 |
2019-10-23 12:32:09.480 | 0.925202 | 1147.0 | 0.714844 | 0 | 83.934426 |
Compute HRV¶
Iterate through all subjects and all (sub)phases, call EcgProcessor.hrv_process(), and concatenate the results into one dataframe.
Without Subphases¶
[59]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks.items():
list_hrv_phases = []
for phase, rpeaks in data_dict.items():
list_hrv_phases.append(EcgProcessor.hrv_process(rpeaks=rpeaks, index=phase, index_name="phase"))
dict_hrv_subject[subject_id] = pd.concat(list_hrv_phases)
df_hrv = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv.head()
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
warn(
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
warn(
[59]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_LZC | HRV_DFA_alpha2 | HRV_MFDFA_alpha2_Width | HRV_MFDFA_alpha2_Peak | HRV_MFDFA_alpha2_Mean | HRV_MFDFA_alpha2_Max | HRV_MFDFA_alpha2_Delta | HRV_MFDFA_alpha2_Asymmetry | HRV_MFDFA_alpha2_Fluctuation | HRV_MFDFA_alpha2_Increment | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | |||||||||||||||||||||
Vp01 | Baseline | 701.171875 | 44.086389 | NaN | NaN | NaN | NaN | NaN | NaN | 36.399766 | 36.605573 | ... | 1.009844 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Intervention | 779.551630 | 48.772965 | NaN | NaN | NaN | NaN | NaN | NaN | 53.253237 | 53.448631 | ... | 1.133243 | 1.108132 | 1.088136 | 1.327777 | 1.629693 | -1.286655 | -2.286655 | -0.222539 | 0.001160 | 0.107944 | |
Stress | 680.145410 | 51.211148 | 25.720464 | 45.331424 | NaN | NaN | NaN | NaN | 35.764047 | 35.806996 | ... | 0.791578 | 0.741699 | 0.265318 | 0.671193 | 0.629318 | 1.000000 | 0.664149 | -0.657832 | 0.000091 | 0.006495 | |
Recovery | 699.218750 | 50.388794 | NaN | NaN | NaN | NaN | NaN | NaN | 29.430528 | 29.506479 | ... | 0.908615 | 0.857308 | 0.294392 | 1.055181 | 1.004504 | 0.878651 | -0.099155 | -0.672143 | 0.000363 | 0.017242 | |
Vp02 | Baseline | 691.029744 | 40.551896 | NaN | NaN | NaN | NaN | NaN | NaN | 32.113406 | 32.304060 | ... | 0.998500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 81 columns
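The code above relies on a useful pandas pattern: passing a dict of DataFrames to pd.concat prepends the dict keys as a new index level, so the per-phase index of each subject's results becomes the inner level of a (subject, phase) MultiIndex. A minimal standalone sketch with toy values (the subject IDs and numbers below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the per-subject HRV results: one small DataFrame
# per subject, indexed by phase (values are made up for illustration).
dict_hrv_subject = {
    "Vp01": pd.DataFrame(
        {"HRV_MeanNN": [700.0, 780.0]},
        index=pd.Index(["Baseline", "Stress"], name="phase"),
    ),
    "Vp02": pd.DataFrame(
        {"HRV_MeanNN": [690.0]},
        index=pd.Index(["Baseline"], name="phase"),
    ),
}

# pd.concat over a dict prepends the dict keys as a new outer index level;
# `names` labels that new level, while the inner level keeps its name.
df_hrv = pd.concat(dict_hrv_subject, names=["subject"])
print(list(df_hrv.index.names))  # ['subject', 'phase']
```

Nesting this pattern once more (dict of dicts of DataFrames) yields the three-level (subject, phase, subphase) index used in the subphase example below.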
With Subphases¶
If we only want to compute a certain type of HRV measures (in this example, only time-domain measures), we can specify the hrv_types argument of EcgProcessor.hrv_process().
[60]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks_subph.items():
    dict_hrv_phases = {}
    for phase, phase_dict in data_dict.items():
        list_hrv_subphases = []
        for subphase, rpeaks in phase_dict.items():
            list_hrv_subphases.append(
                EcgProcessor.hrv_process(rpeaks=rpeaks, index=subphase, index_name="phase", hrv_types="hrv_time")
            )
        dict_hrv_phases[phase] = pd.concat(list_hrv_subphases)
    dict_hrv_subject[subject_id] = pd.concat(dict_hrv_phases, names=["phase"])

df_hrv_subph = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv_subph.head()
[60]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_IQRNN | HRV_SDRMSSD | HRV_Prc20NN | HRV_Prc80NN | HRV_pNN50 | HRV_pNN20 | HRV_MinNN | HRV_MaxNN | HRV_HTI | HRV_TINN | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | phase | |||||||||||||||||||||
Vp01 | Baseline | Start | 728.365385 | 38.515399 | NaN | NaN | NaN | NaN | NaN | NaN | 38.121478 | 38.852299 | ... | 60.546875 | 1.010333 | 691.40625 | 761.71875 | 23.076923 | 61.538462 | 660.15625 | 789.06250 | 8.666667 | 109.3750 |
Middle | 673.029119 | 31.651441 | NaN | NaN | NaN | NaN | NaN | NaN | 32.801251 | 33.164488 | ... | 39.062500 | 0.964946 | 650.78125 | 704.68750 | 18.181818 | 43.181818 | 605.46875 | 765.62500 | 3.666667 | 54.6875 | ||
End | 724.759615 | 32.667065 | NaN | NaN | NaN | NaN | NaN | NaN | 38.356246 | 40.060349 | ... | 31.250000 | 0.851675 | 702.34375 | 752.34375 | 23.076923 | 69.230769 | 664.06250 | 769.53125 | 3.250000 | 62.5000 | ||
Intervention | Start | 762.812500 | 26.119350 | NaN | NaN | NaN | NaN | NaN | NaN | 30.912484 | 31.418938 | ... | 39.062500 | 0.844945 | 741.40625 | 781.25000 | 8.000000 | 60.000000 | 707.03125 | 820.31250 | 6.250000 | 39.0625 | |
Middle | 779.399671 | 47.293117 | NaN | NaN | NaN | NaN | NaN | NaN | 49.025151 | 49.689867 | ... | 62.500000 | 0.964670 | 739.84375 | 818.75000 | 42.105263 | 68.421053 | 671.87500 | 871.09375 | 9.500000 | 78.1250 |
5 rows × 25 columns
Add Conditions¶
[61]:
df_hrv_cond = bp.utils.data_processing.add_subject_conditions(df_hrv, condition_list)
df_hrv_cond.head()
[61]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_LZC | HRV_DFA_alpha2 | HRV_MFDFA_alpha2_Width | HRV_MFDFA_alpha2_Peak | HRV_MFDFA_alpha2_Mean | HRV_MFDFA_alpha2_Max | HRV_MFDFA_alpha2_Delta | HRV_MFDFA_alpha2_Asymmetry | HRV_MFDFA_alpha2_Fluctuation | HRV_MFDFA_alpha2_Increment | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
condition | subject | phase | |||||||||||||||||||||
Control | Vp01 | Baseline | 701.171875 | 44.086389 | NaN | NaN | NaN | NaN | NaN | NaN | 36.399766 | 36.605573 | ... | 1.009844 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Intervention | 779.551630 | 48.772965 | NaN | NaN | NaN | NaN | NaN | NaN | 53.253237 | 53.448631 | ... | 1.133243 | 1.108132 | 1.088136 | 1.327777 | 1.629693 | -1.286655 | -2.286655 | -0.222539 | 0.001160 | 0.107944 | ||
Stress | 680.145410 | 51.211148 | 25.720464 | 45.331424 | NaN | NaN | NaN | NaN | 35.764047 | 35.806996 | ... | 0.791578 | 0.741699 | 0.265318 | 0.671193 | 0.629318 | 1.000000 | 0.664149 | -0.657832 | 0.000091 | 0.006495 | ||
Recovery | 699.218750 | 50.388794 | NaN | NaN | NaN | NaN | NaN | NaN | 29.430528 | 29.506479 | ... | 0.908615 | 0.857308 | 0.294392 | 1.055181 | 1.004504 | 0.878651 | -0.099155 | -0.672143 | 0.000363 | 0.017242 | ||
Intervention | Vp02 | Baseline | 691.029744 | 40.551896 | NaN | NaN | NaN | NaN | NaN | NaN | 32.113406 | 32.304060 | ... | 0.998500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 81 columns
[62]:
df_hrv_subph_cond = bp.utils.data_processing.add_subject_conditions(df_hrv_subph, condition_list)
df_hrv_subph_cond.head()
[62]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_IQRNN | HRV_SDRMSSD | HRV_Prc20NN | HRV_Prc80NN | HRV_pNN50 | HRV_pNN20 | HRV_MinNN | HRV_MaxNN | HRV_HTI | HRV_TINN | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
condition | subject | phase | phase | |||||||||||||||||||||
Control | Vp01 | Baseline | Start | 728.365385 | 38.515399 | NaN | NaN | NaN | NaN | NaN | NaN | 38.121478 | 38.852299 | ... | 60.546875 | 1.010333 | 691.40625 | 761.71875 | 23.076923 | 61.538462 | 660.15625 | 789.06250 | 8.666667 | 109.3750 |
Middle | 673.029119 | 31.651441 | NaN | NaN | NaN | NaN | NaN | NaN | 32.801251 | 33.164488 | ... | 39.062500 | 0.964946 | 650.78125 | 704.68750 | 18.181818 | 43.181818 | 605.46875 | 765.62500 | 3.666667 | 54.6875 | |||
End | 724.759615 | 32.667065 | NaN | NaN | NaN | NaN | NaN | NaN | 38.356246 | 40.060349 | ... | 31.250000 | 0.851675 | 702.34375 | 752.34375 | 23.076923 | 69.230769 | 664.06250 | 769.53125 | 3.250000 | 62.5000 | |||
Intervention | Start | 762.812500 | 26.119350 | NaN | NaN | NaN | NaN | NaN | NaN | 30.912484 | 31.418938 | ... | 39.062500 | 0.844945 | 741.40625 | 781.25000 | 8.000000 | 60.000000 | 707.03125 | 820.31250 | 6.250000 | 39.0625 | ||
Middle | 779.399671 | 47.293117 | NaN | NaN | NaN | NaN | NaN | NaN | 49.025151 | 49.689867 | ... | 62.500000 | 0.964670 | 739.84375 | 818.75000 | 42.105263 | 68.421053 | 671.87500 | 871.09375 | 9.500000 | 78.1250 |
5 rows × 25 columns
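Conceptually, add_subject_conditions() prepends a "condition" index level derived from each subject's assignment in the condition list. The following pandas-only sketch illustrates the idea (it is not BioPsyKit's actual implementation, and the condition mapping and values are hypothetical):

```python
import pandas as pd

# Hypothetical subject-to-condition assignment (in the notebook, this
# information comes from `condition_list`).
condition_map = {"Vp01": "Control", "Vp02": "Intervention"}

# Toy HRV results indexed by (subject, phase), mirroring df_hrv above.
df_hrv = pd.DataFrame(
    {"HRV_MeanNN": [701.2, 680.1, 691.0]},
    index=pd.MultiIndex.from_tuples(
        [("Vp01", "Baseline"), ("Vp01", "Stress"), ("Vp02", "Baseline")],
        names=["subject", "phase"],
    ),
)

# Look up each row's condition from its subject, append it as an index
# level, and move it to the front so the result is grouped by condition.
conditions = df_hrv.index.get_level_values("subject").map(condition_map)
df_cond = df_hrv.set_index(pd.Index(conditions, name="condition"), append=True)
df_cond = df_cond.reorder_levels(["condition", "subject", "phase"]).sort_index()
print(list(df_cond.index.names))  # ['condition', 'subject', 'phase']
```

The resulting (condition, subject, phase) index matches the shape of the output shown above and makes condition-wise aggregation (e.g. df_cond.groupby("condition").mean()) straightforward.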
Note at the End¶
This example notebook showed how to process such data step by step, “manually”. However, BioPsyKit also offers an easier, object-oriented way of performing these processing steps in the biopsykit.protocols module. For that, you just need to follow these steps:

1. Create a new Protocol instance (choose from a set of pre-defined, established laboratory protocols, such as MIST or TSST, or use the BaseProtocol).
2. Add heart rate data (and R-peak data, if you additionally want to compute HRV), in the form of a SubjectDataDict, to the Protocol instance via Protocol.add_hr_data().
3. Compute aggregated results by calling Protocol.compute_hr_results(), ensemble time-series by calling Protocol.compute_hr_ensemble(), and HRV data by calling Protocol.compute_hrv_results(). The individual processing steps can be enabled or disabled by passing True or False as function arguments (have a look at the documentation!). Additional parameters can be passed to the processing steps by adding them to a dictionary and passing it to the function via the params argument.