ECG Analysis Example

This example illustrates how to further process time-series heart rate data extracted from, for instance, ECG data. This includes assigning study conditions, splitting data further into subphases, resampling data, cutting data of all subjects to equal length, normalizing data, rearranging data, as well as computing aggregated results (such as mean and standard error over all subjects for each (sub)phase.

As input, this notebook uses the heart rate processing outputs generated from the ECG processing pipeline as shown in ECG_Processing_Example.ipynb).

Setup and Helper Functions

[1]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps

import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette
[2]:
[3]:
# path to import and export example processing results
result_path = Path("./results")
[4]:
def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())

Overview

When we collect ECG data from a study, we typically extract heart rate (among others), resulting in time-series heart rate data for each subject. These data are then usually processed further in order to visualize results, perform statistical analyses, etc. How the data is further processed depends on the application. In this example, we will look at two different applications:

  1. Aggregated Results: This means that we compute aggregations over time-series data. In the end, we will get tabular data with mean values for each subject within one time interval of the study procedure. We compute these Aggregated Results when we want to analyze differences, for example, between different study conditions, or between different phases of one study (or both).

  2. Ensemble Time-series: This means that we have time-series data where the data of each subject has equal length. In the end, this allows us to concatenate the data of all subjects into a tabular format. We compute these Ensemble Time-series data when we want to visualize time-series data of all subjects in one plot by overlaying the data and computing the mean ± standard deviation over all subjects for each time point (a so-called ensemble plot).

Additionally, we will show how to compute heart rate variability (HRV) parameters each subject and for each (sub)phase of the recorded data.

Aggregated Results

For computing Aggregated Results the following processing steps will be performed:

  1. Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.

  2. Resample: Resample heart rate data of all subjects to 1 Hz. This allows to better split the data further into subphases later.

  3. Normalize Data (optional): Normalize heart rate data with respect to the mean heart rate of a baseline phase. This can allow to better compare heart rate reactions within one study since it removed biases from different resting heart rates.

  4. Select Phases (optional): If the following steps should not be performed on the data of all phases, but only on a subset of them (e.g., because you are not interested in the first phase because it is not relevant for your analysis goal, or you only recorded it for normalizing all other phases against it) this subset phases can be specified and further processing will only be performed on this subset.

  5. Split into Subphases (optional): If the different phases of your study consist of subphases, which you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict further into different subphases, which adds a new dictionary level.

  6. Aggregate Results: Aggregate time-series data by computing the mean heart rate per (sub)phase for each subject. The results will be concatenated into a dataframe.

  7. Add Conditions (optional): If multiple study conditions were present during the study the study condition can be added as new index level to the dataframe.

Load Data

Heart Rate Data

Load processed heart rate time-series data of all subjects and concatenate into one dictionary.

[5]:
study_data_dict = {}
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()

for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
    subject_id = re.findall("hr_result_(Vp\w+).xlsx", file.name)[0]
    study_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")

Structure of the SubjectDataDict:

[6]:
display_dict_structure(study_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate
time
2019-10-23 12:32:05.797 80.842105
2019-10-23 12:32:07.223 81.269841
2019-10-23 12:32:07.961 78.769231
2019-10-23 12:32:08.723 79.175258
2019-10-23 12:32:09.480 83.934426

Subject Conditions

(will be needed later)

Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.

[7]:
condition_list = pd.DataFrame(
    ["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[7]:
condition
subject
Vp01 Control
Vp02 Intervention

Resample Data

Resample all heart rate values to a sampling rate of 1 Hz.

[8]:
dict_result = bp.utils.data_processing.resample_dict_sec(study_data_dict)

Structure of the resampled SubjectDataDict:

[9]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate
time
1.0 81.142060
2.0 79.324922
3.0 79.640487
4.0 84.765999
5.0 80.456931

Normalize Data

Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.

[10]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")

Structure of the normalized SubjectDataDict:

[11]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate
time
1.0 -5.185450
2.0 -7.308778
3.0 -6.940040
4.0 -0.950875
5.0 -5.986024

Select Phases

If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).

[12]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
    dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)

Structure of the normalized SubjectDataDict with only the selected phases:

[13]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate
time
1.0 -8.663991
2.0 -7.416155
3.0 -1.054581
4.0 -7.982144
5.0 -6.678360

Split into Subphases

By splitting the data in the SubjectDataDict further into subphases a new index level is added as the most inner dictionary level. You can specify the different subphases in multiple ways (depending on your study protocol):

Fixed Lengths, Consecutive Subphases

If all subphases have fixed lengths and subphases are consecutive, i.e., each subphase begins right after the previous one, you can pass a dict with subphase names as keys and subphase durations (in seconds) as values.

[14]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
    dict_result, subphases={"Start": 20, "Middle": 30, "End": 10}
)

Structure of the SubjectDataDict with a new subphase level added:

[15]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate
time
1.0 81.142060
2.0 79.324922
3.0 79.640487
4.0 84.765999
5.0 80.456931

Fixed Lengths, No Consecutive Subphases

If all subphases have fixed lengths, but subphase are not consecutive, i.e., do not begin right after the previous subphase, you can pass a dict with subphase names as keys and and a tuple with relative start and end times (not durations!) (in seconds) as values.

[16]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
    dict_result, subphases={"Start": (0, 10), "Middle": (20, 40), "End": (50, 60)}
)

Structure of the SubjectDataDict with a new subphase level added:

[17]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (10, 1)'
Heart_Rate
time
1.0 81.142060
2.0 79.324922
3.0 79.640487
4.0 84.765999
5.0 80.456931

Last Subphase without Fixed Length

If all subphases have fixed length, except the last subphase, which does not have fixed length (e.g. because the last subphase was a phase where feedback was provided to the subject and thus had variable length), but you still want to extract all data from this subphase, you can set the duration of the last subphase to 0. The duration of this subphase is then inferred from the data by finding the subject with the longest recording. Shorter recordings of other subjects are then also automatically cut to their respective duration.

[18]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
    dict_result, subphases={"Start": 20, "Middle": 30, "End": 0}
)

Structure of the SubjectDataDict with a new subphase level added:

[19]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate
time
1.0 81.142060
2.0 79.324922
3.0 79.640487
4.0 84.765999
5.0 80.456931

Aggregate Results

Aggregate Over Phases

[20]:
df_result = bp.utils.data_processing.mean_per_subject_dict(
    dict_result, dict_levels=["subject", "phase"], param_name="HR"
)

df_result.head()
[20]:
HR
subject phase
Vp01 Baseline 85.579756
Intervention 77.021302
Stress 88.263152
Recovery 85.855722
Vp02 Baseline 86.877952

Aggregate Over Subphases

[21]:
df_result_subph = bp.utils.data_processing.mean_per_subject_dict(
    dict_result_subph, dict_levels=["subject", "phase", "subphase"], param_name="HR"
)

df_result_subph.head()
[21]:
HR
subject phase subphase
Vp01 Baseline Start 82.387680
Middle 88.853619
End 83.124445
Intervention Start 78.375064
Middle 77.013733

Add Conditions

[22]:
df_result_cond = bp.utils.data_processing.add_subject_conditions(df_result, condition_list)
df_result_cond.head()
[22]:
HR
condition subject phase
Control Vp01 Baseline 85.579756
Intervention 77.021302
Stress 88.263152
Recovery 85.855722
Intervention Vp02 Baseline 86.877952
[23]:
df_result_subph_cond = bp.utils.data_processing.add_subject_conditions(df_result_subph, condition_list)
df_result_subph_cond.head()
[23]:
HR
condition subject phase subphase
Control Vp01 Baseline Start 82.387680
Middle 88.853619
End 83.124445
Intervention Start 78.375064
Middle 77.013733

Compute Parameter

The resulting aggregated data can then be further processed and “population averages” can be computed by averaging the heart rate data of each (sub)phase over all subjects, resulting in mean ± standard error values per (sub)phase, and condition (if applicable). The resulting dataframe is a MeanSeDataFrame, a containing mean and standard error of time-series data in a standardized format.

Without Conditions

[24]:
df_result_mse = bp.utils.data_processing.mean_se_per_phase(df_result)
df_result_mse.head()
[24]:
HR
mean se
phase
Baseline 86.228854 0.649098
Intervention 84.202219 7.180917
Recovery 91.139351 5.283629
Stress 97.608169 9.345016
[25]:
df_result_subph_mse = bp.utils.data_processing.mean_se_per_phase(df_result_subph)
df_result_subph_mse.head()
[25]:
HR
mean se
phase subphase
Baseline Start 82.998921 0.611242
Middle 88.822727 0.030892
End 85.398164 2.273719
Intervention Start 82.948593 4.573529
Middle 80.744660 3.730927

With Conditions

Just calling the same function, but with a different dataframe (where we added the condition as index level).

Note: In this example, the standard error is NaN because we only have one subject per condition and can thus not compute the standard error.

[26]:
bp.utils.data_processing.mean_se_per_phase(df_result_cond).head()
[26]:
HR
mean se
condition phase
Control Baseline 85.579756 NaN
Intervention 77.021302 NaN
Stress 88.263152 NaN
Recovery 85.855722 NaN
Intervention Baseline 86.877952 NaN
[27]:
bp.utils.data_processing.mean_se_per_phase(df_result_subph_cond).head()
[27]:
HR
mean se
condition phase subphase
Control Baseline Start 82.387680 NaN
Middle 88.853619 NaN
End 83.124445 NaN
Intervention Start 78.375064 NaN
Middle 77.013733 NaN

Plotting

The final data can then either be statistically analyzed, or plotted. Here, we plot the mean and standard error of each phase using the plotting function plotting.hr_mean_plot() from BioPsyKit.

[28]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_mse, ax=ax);
../../_images/examples__notebooks_ECG_Analysis_Example_70_0.svg

If we plot our data with plotting.hr_mean_plot(), and our data consists of different phases, where each phase consist of subphases, the phases will be highlighted in the background for better visibility.

[29]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_subph_mse, ax=ax);
../../_images/examples__notebooks_ECG_Analysis_Example_72_0.svg

Ensemble Time-series

For computing Ensemble Time-series the following processing steps will be performed:

  1. Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.

  2. Resample: Resample heart rate data of all subjects to 1 Hz. This allows to better split the data further into subphases later.

  3. Normalize Data (optional): Normalize heart rate data with respect to the mean heart rate of a baseline phase. This can allow to better compare heart rate reactions within one study since it removed biases from different resting heart rates.

  4. Select Phases (optional): If the following steps should not be performed on the data of all phases, but only on a subset of them (e.g., because you are not interested in the first phase because it is not relevant for your analysis goal, or you only recorded it for normalizing all other phases against it) this subset phases can be specified and further processing will only be performed on this subset.

  5. Rearrange Data The data in form of a SubjectDataDict is rearranged into a StudyDataDict. A StudyDataDict is constructed from a SubjectDataDict by swapping outer (subject IDs) and inner (phase names) dictionary levels.

  6. Cut Phases to Equal Length: If the data don’t have equal length per subject (e.g., because the last phase had a flexible duration or was not timed accurately), you can cut the data to the same length in order to concatenate the time-series data afterwards. This is beneficial if you want to overlay time-series data from multiple subjects in an so-called Ensemble Plot.

  7. Merge StudyDataDict: Now that the time-series data of all subjects have equal length within each phase they can be merged into a MergedStudyDataDict where the inner level of the nested StudyDataDict is removed by merging the individual data into one dataframe for each phase.

  8. Add Conditions (optional): If multiple study conditions were present during the study the StudyDataDict can further be split by adding a new dictionary level with study conditions to the StudyDataDict.

Note: See Documentation for biopsykit.utils.data_processing for further information of the used functions.

Load Data

Heart Rate Data

[30]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()

# Load processed heart rate time-series data of all subjects and concatenate into one dict
subject_data_dict = {}
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
    subject_id = re.findall("hr_result_(Vp\w+).xlsx", file.name)[0]
    subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")

Structure of the SubjectDataDict:

[31]:
display_dict_structure(subject_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate
time
2019-10-23 12:32:05.797 80.842105
2019-10-23 12:32:07.223 81.269841
2019-10-23 12:32:07.961 78.769231
2019-10-23 12:32:08.723 79.175258
2019-10-23 12:32:09.480 83.934426

Subject Conditions

(will be needed later)

Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.

[32]:
condition_list = pd.DataFrame(
    ["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[32]:
condition
subject
Vp01 Control
Vp02 Intervention

Resample Data

Resample all heart rate values to a sampling rate of 1 Hz.

[33]:
dict_result = bp.utils.data_processing.resample_dict_sec(subject_data_dict)

Structure of the resampled SubjectDataDict:

[34]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate
time
1.0 81.142060
2.0 79.324922
3.0 79.640487
4.0 84.765999
5.0 80.456931

Normalize Data

Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.

[35]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")

Structure of the normalized SubjectDataDict`:

[36]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate
time
1.0 -5.185450
2.0 -7.308778
3.0 -6.940040
4.0 -0.950875
5.0 -5.986024

Select Phases

If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).

[37]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
    dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)

Structure of the normalized SubjectDataDict with only the selected phases:

[38]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate
time
1.0 -8.663991
2.0 -7.416155
3.0 -1.054581
4.0 -7.982144
5.0 -6.678360

Rearrange Data

[39]:
dict_study = bp.utils.data_processing.rearrange_subject_data_dict(dict_result_norm)

Structure of the normalized StudyDataDict:

[40]:
display_dict_structure(dict_study)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (64, 1)'
Heart_Rate
time
1.0 -5.185450
2.0 -7.308778
3.0 -6.940040
4.0 -0.950875
5.0 -5.986024

Cut Data

Data for all subjects will be cut to the shortest duration for each phase.

Note: If only a subset of phases should be cut, these phases can be specified using the phases parameter.

[41]:
dict_result = bp.utils.data_processing.cut_phases_to_shortest(dict_study)

Structure of the cut StudyDataDict:

[42]:
display_dict_structure(dict_result)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (60, 1)'
Heart_Rate
time
1.0 -5.185450
2.0 -7.308778
3.0 -6.940040
4.0 -0.950875
5.0 -5.986024

Merge StudyDataDict

[43]:
dict_merged = bp.utils.data_processing.merge_study_data_dict(dict_result)

Structure of the MergedStudyDataDict:

[44]:
display_dict_structure(dict_merged)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 2)'
subject Vp01 Vp02
time
1.0 -5.185450 -1.907328
2.0 -7.308778 2.764077
3.0 -6.940040 6.703796
4.0 -0.950875 0.947209
5.0 -5.986024 -14.090102

Add Conditions

[45]:
dict_merged_cond = bp.utils.data_processing.split_subject_conditions(dict_merged, condition_list)

Structure of the MergedStudyDataDict, split into different conditions:

[46]:
display_dict_structure(dict_merged_cond)
dict_keys(['Control', 'Intervention'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 1)'
subject Vp01
time
1.0 -5.185450
2.0 -7.308778
3.0 -6.940040
4.0 -0.950875
5.0 -5.986024

Plotting

After these processing steps we can finally visualize the time-series data ob all subjects in an Ensemble Plot where the data ob all subjects is overlaid and each point in the time-series is plotted as mean ± standard error over all subject. For creating Ensemble plots, BioPsykit provides the function plotting.hr_ensemble_plot.

Note: These plots does not look as pretty as it should because this is just example data from two subjects. If you want to see plots from “actual” data, just scroll down!

[47]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, ax=ax);
../../_images/examples__notebooks_ECG_Analysis_Example_119_0.svg

If the phases have subphases they can be passed to the plotting function in order to highlight them in the background:

[48]:
subphases = {"Start": 20, "Middle": 100, "End": 20}
subphase_dict = {"Baseline": subphases, "Intervention": subphases, "Stress": subphases, "Recovery": subphases}

fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, subphases=subphase_dict, ax=ax);
../../_images/examples__notebooks_ECG_Analysis_Example_121_0.svg

Load example data from Excel file. The data is already in the format of a MergedStudyDataDict, so it can directly be passed to plotting.hr_ensemble_plot.

[49]:
dict_merged_norm = bp.example_data.get_hr_ensemble_sample()

print(bp.utils.datatype_helper.is_merged_study_data_dict(dict_merged_norm))
display_dict_structure(dict_merged_norm)
True
dict_keys(['Phase1', 'Phase2', 'Phase3'])
'Dataframe shape: (406, 28)'
Vp01 Vp02 Vp03 Vp04 Vp05 Vp06 Vp07 Vp08 Vp09 Vp10 ... Vp22 Vp24 Vp25 Vp26 Vp27 Vp28 Vp30 Vp31 Vp32 Vp33
time
1 -5.896011 21.372677 -13.380272 -0.952815 5.662782 -4.119631 0.594466 7.365068 2.589674 -4.873666 ... 8.312998 6.064452 -4.047215 -5.267747 -11.698333 -5.405734 -1.370884 3.346634 -5.795628 3.491703
2 -4.745362 21.213832 -7.508542 1.678500 4.286097 -5.328198 11.758334 6.751515 0.005992 3.383603 ... 5.708344 6.728987 -6.630518 -5.409718 -15.351490 -4.091093 -0.836948 -1.557466 -12.685620 8.637926
3 1.967010 14.120346 -4.631881 -4.720704 0.709777 1.238230 7.367597 2.901178 -2.304507 11.329636 ... 1.157688 -8.462778 -6.974983 -7.243909 -12.632172 -3.817170 0.881259 -2.667594 -10.346344 2.835992
4 1.454400 8.943284 -11.498010 -10.179091 -2.759789 0.394513 -8.177153 1.614913 0.885181 9.373362 ... -0.661564 -13.988615 -3.315441 -7.813189 -8.996902 -0.654702 -0.186492 -0.641279 0.950780 0.571289
5 -3.319305 -1.501998 -8.575398 -8.604960 -1.664693 -4.642004 -7.594194 0.708533 -2.946673 12.488329 ... 3.657499 -8.304159 2.546755 -3.969524 -9.432211 6.311569 1.601839 -0.189170 8.193988 -4.193230

5 rows × 28 columns

[50]:
subphases = {"Start": 60, "Middle": 240, "End": 0}
subphase_dict = {"Phase1": subphases, "Phase2": subphases, "Phase3": subphases}

fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged_norm, subphases=subphase_dict, ax=ax);
../../_images/examples__notebooks_ECG_Analysis_Example_124_0.svg

Compute Heart Rate Variability (HRV)

For computing Heart Rate Variability (HRV) the following processing steps will be performed:

  1. Load Data: Load R-peak data per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.

  2. Select Phases (optional): If the following steps should not be performed on the data of all phases, but only on a subset of them (e.g., because you are not interested in the first phase because it is not relevant for your analysis goal, or you only recorded it for normalizing all other phases against it) this subset phases can be specified and further processing will only be performed on this subset.

  3. Split into Subphases (optional): If the different phases of your study consist of subphases, which you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict further into different subphases, which adds a new dictionary level.

  4. Compute HRV: Compute HRV parameters from R-peak data.

  5. Add Conditions (optional): If multiple study conditions were present during the study the StudyDataDict can further be split by adding a new dictionary level with study conditions to the StudyDataDict.

Load Data

R-Peak Data

[51]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()

# Load processed r-peaks data of all subjects and concatenate into one dict
rpeaks_subject_data_dict = {}
for file in sorted(ecg_path.glob("rpeaks_result_*.xlsx")):
    subject_id = re.findall("rpeaks_result_(Vp\w+).xlsx", file.name)[0]
    rpeaks_subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
[52]:
# rename to make it consistent with the other examples
dict_result_rpeaks = rpeaks_subject_data_dict

Structure of the R Peaks SubjectDataDict:

[53]:
display_dict_structure(dict_result_rpeaks)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 5)'
R_Peak_Quality R_Peak_Idx RR_Interval R_Peak_Outlier Heart_Rate
time
2019-10-23 12:32:05.797 0.701462 393.0 0.742188 1 80.842105
2019-10-23 12:32:07.223 1.000000 569.0 0.738281 0 81.269841
2019-10-23 12:32:07.961 0.782036 758.0 0.761719 0 78.769231
2019-10-23 12:32:08.723 0.652345 953.0 0.757812 0 79.175258
2019-10-23 12:32:09.480 0.925202 1147.0 0.714844 0 83.934426

Subject Conditions

(will be needed later)

Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.

[54]:
condition_list = pd.DataFrame(
    ["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[54]:
condition
subject
Vp01 Control
Vp02 Intervention

Select Phases

If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery”.

[55]:
dict_result_rpeaks_selected = bp.utils.data_processing.select_dict_phases(
    dict_result_rpeaks, phases=["Intervention", "Stress", "Recovery"]
)

Structure of the SubjectDataDict with only the selected phases:

[56]:
display_dict_structure(dict_result_rpeaks_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (139, 5)'
R_Peak_Quality R_Peak_Idx RR_Interval R_Peak_Outlier Heart_Rate
time
2019-10-23 12:33:11.012 0.877853 450.0 0.749132 1 80.092700
2019-10-23 12:33:12.520 0.897152 645.0 0.777344 0 77.185930
2019-10-23 12:33:13.297 0.891966 844.0 0.746094 0 80.418848
2019-10-23 12:33:14.043 0.465520 1035.0 0.707031 0 84.861878
2019-10-23 12:33:14.750 0.906602 1216.0 0.750000 0 80.000000

Split into Subphases

By splitting the data in the SubjectDataDict further into subphases a new index level is added as the most inner dictionary level. In this example, we assume that all subphases are consecutive and have fixed length (see the Aggregated Results example for further options).

[57]:
dict_result_rpeaks_subph = bp.utils.data_processing.split_dict_into_subphases(
    dict_result_rpeaks, subphases={"Start": 20, "Middle": 30, "End": 10}
)

Structure of the SubjectDataDict with a new subphase level added:

[58]:
display_dict_structure(dict_result_rpeaks_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (27, 5)'
R_Peak_Quality R_Peak_Idx RR_Interval R_Peak_Outlier Heart_Rate
time
2019-10-23 12:32:05.797 0.701462 393.0 0.742188 1 80.842105
2019-10-23 12:32:07.223 1.000000 569.0 0.738281 0 81.269841
2019-10-23 12:32:07.961 0.782036 758.0 0.761719 0 78.769231
2019-10-23 12:32:08.723 0.652345 953.0 0.757812 0 79.175258
2019-10-23 12:32:09.480 0.925202 1147.0 0.714844 0 83.934426

Compute HRV

Iterate through all subjects and all (sub)phases, call EcgProcessor.hrv_process() and concetenate the results into one dataframe.

Without Subphases

[59]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks.items():
    list_hrv_phases = []
    for phase, rpeaks in data_dict.items():
        list_hrv_phases.append(EcgProcessor.hrv_process(rpeaks=rpeaks, index=phase, index_name="phase"))
    dict_hrv_subject[subject_id] = pd.concat(list_hrv_phases)

df_hrv = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv.head()
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
  warn(
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
  warn(
[59]:
HRV_MeanNN HRV_SDNN HRV_SDANN1 HRV_SDNNI1 HRV_SDANN2 HRV_SDNNI2 HRV_SDANN5 HRV_SDNNI5 HRV_RMSSD HRV_SDSD ... HRV_LZC HRV_DFA_alpha2 HRV_MFDFA_alpha2_Width HRV_MFDFA_alpha2_Peak HRV_MFDFA_alpha2_Mean HRV_MFDFA_alpha2_Max HRV_MFDFA_alpha2_Delta HRV_MFDFA_alpha2_Asymmetry HRV_MFDFA_alpha2_Fluctuation HRV_MFDFA_alpha2_Increment
subject phase
Vp01 Baseline 701.171875 44.086389 NaN NaN NaN NaN NaN NaN 36.399766 36.605573 ... 1.009844 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Intervention 779.551630 48.772965 NaN NaN NaN NaN NaN NaN 53.253237 53.448631 ... 1.133243 1.108132 1.088136 1.327777 1.629693 -1.286655 -2.286655 -0.222539 0.001160 0.107944
Stress 680.145410 51.211148 25.720464 45.331424 NaN NaN NaN NaN 35.764047 35.806996 ... 0.791578 0.741699 0.265318 0.671193 0.629318 1.000000 0.664149 -0.657832 0.000091 0.006495
Recovery 699.218750 50.388794 NaN NaN NaN NaN NaN NaN 29.430528 29.506479 ... 0.908615 0.857308 0.294392 1.055181 1.004504 0.878651 -0.099155 -0.672143 0.000363 0.017242
Vp02 Baseline 691.029744 40.551896 NaN NaN NaN NaN NaN NaN 32.113406 32.304060 ... 0.998500 NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 81 columns

With Subphases

Assuming we only want to compute a certain type of HRV measures (in this example, only time-domain measures), we can specify the hrv_type in EcgProcessor.hrv_process() .

[60]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks_subph.items():
    dict_hrv_phases = {}
    for phase, phase_dict in data_dict.items():
        list_hrv_subphases = []
        for subphase, rpeaks in phase_dict.items():
            list_hrv_subphases.append(
                EcgProcessor.hrv_process(rpeaks=rpeaks, index=subphase, index_name="phase", hrv_types="hrv_time")
            )
        dict_hrv_phases[phase] = pd.concat(list_hrv_subphases)

    dict_hrv_subject[subject_id] = pd.concat(dict_hrv_phases, names=["phase"])

df_hrv_subph = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv_subph.head()
[60]:
HRV_MeanNN HRV_SDNN HRV_SDANN1 HRV_SDNNI1 HRV_SDANN2 HRV_SDNNI2 HRV_SDANN5 HRV_SDNNI5 HRV_RMSSD HRV_SDSD ... HRV_IQRNN HRV_SDRMSSD HRV_Prc20NN HRV_Prc80NN HRV_pNN50 HRV_pNN20 HRV_MinNN HRV_MaxNN HRV_HTI HRV_TINN
subject phase phase
Vp01 Baseline Start 728.365385 38.515399 NaN NaN NaN NaN NaN NaN 38.121478 38.852299 ... 60.546875 1.010333 691.40625 761.71875 23.076923 61.538462 660.15625 789.06250 8.666667 109.3750
Middle 673.029119 31.651441 NaN NaN NaN NaN NaN NaN 32.801251 33.164488 ... 39.062500 0.964946 650.78125 704.68750 18.181818 43.181818 605.46875 765.62500 3.666667 54.6875
End 724.759615 32.667065 NaN NaN NaN NaN NaN NaN 38.356246 40.060349 ... 31.250000 0.851675 702.34375 752.34375 23.076923 69.230769 664.06250 769.53125 3.250000 62.5000
Intervention Start 762.812500 26.119350 NaN NaN NaN NaN NaN NaN 30.912484 31.418938 ... 39.062500 0.844945 741.40625 781.25000 8.000000 60.000000 707.03125 820.31250 6.250000 39.0625
Middle 779.399671 47.293117 NaN NaN NaN NaN NaN NaN 49.025151 49.689867 ... 62.500000 0.964670 739.84375 818.75000 42.105263 68.421053 671.87500 871.09375 9.500000 78.1250

5 rows × 25 columns

Add Conditions

[61]:
df_hrv_cond = bp.utils.data_processing.add_subject_conditions(df_hrv, condition_list)
df_hrv_cond.head()
[61]:
HRV_MeanNN HRV_SDNN HRV_SDANN1 HRV_SDNNI1 HRV_SDANN2 HRV_SDNNI2 HRV_SDANN5 HRV_SDNNI5 HRV_RMSSD HRV_SDSD ... HRV_LZC HRV_DFA_alpha2 HRV_MFDFA_alpha2_Width HRV_MFDFA_alpha2_Peak HRV_MFDFA_alpha2_Mean HRV_MFDFA_alpha2_Max HRV_MFDFA_alpha2_Delta HRV_MFDFA_alpha2_Asymmetry HRV_MFDFA_alpha2_Fluctuation HRV_MFDFA_alpha2_Increment
condition subject phase
Control Vp01 Baseline 701.171875 44.086389 NaN NaN NaN NaN NaN NaN 36.399766 36.605573 ... 1.009844 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Intervention 779.551630 48.772965 NaN NaN NaN NaN NaN NaN 53.253237 53.448631 ... 1.133243 1.108132 1.088136 1.327777 1.629693 -1.286655 -2.286655 -0.222539 0.001160 0.107944
Stress 680.145410 51.211148 25.720464 45.331424 NaN NaN NaN NaN 35.764047 35.806996 ... 0.791578 0.741699 0.265318 0.671193 0.629318 1.000000 0.664149 -0.657832 0.000091 0.006495
Recovery 699.218750 50.388794 NaN NaN NaN NaN NaN NaN 29.430528 29.506479 ... 0.908615 0.857308 0.294392 1.055181 1.004504 0.878651 -0.099155 -0.672143 0.000363 0.017242
Intervention Vp02 Baseline 691.029744 40.551896 NaN NaN NaN NaN NaN NaN 32.113406 32.304060 ... 0.998500 NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 81 columns

[62]:
df_hrv_subph_cond = bp.utils.data_processing.add_subject_conditions(df_hrv_subph, condition_list)
df_hrv_subph_cond.head()
[62]:
HRV_MeanNN HRV_SDNN HRV_SDANN1 HRV_SDNNI1 HRV_SDANN2 HRV_SDNNI2 HRV_SDANN5 HRV_SDNNI5 HRV_RMSSD HRV_SDSD ... HRV_IQRNN HRV_SDRMSSD HRV_Prc20NN HRV_Prc80NN HRV_pNN50 HRV_pNN20 HRV_MinNN HRV_MaxNN HRV_HTI HRV_TINN
condition subject phase phase
Control Vp01 Baseline Start 728.365385 38.515399 NaN NaN NaN NaN NaN NaN 38.121478 38.852299 ... 60.546875 1.010333 691.40625 761.71875 23.076923 61.538462 660.15625 789.06250 8.666667 109.3750
Middle 673.029119 31.651441 NaN NaN NaN NaN NaN NaN 32.801251 33.164488 ... 39.062500 0.964946 650.78125 704.68750 18.181818 43.181818 605.46875 765.62500 3.666667 54.6875
End 724.759615 32.667065 NaN NaN NaN NaN NaN NaN 38.356246 40.060349 ... 31.250000 0.851675 702.34375 752.34375 23.076923 69.230769 664.06250 769.53125 3.250000 62.5000
Intervention Start 762.812500 26.119350 NaN NaN NaN NaN NaN NaN 30.912484 31.418938 ... 39.062500 0.844945 741.40625 781.25000 8.000000 60.000000 707.03125 820.31250 6.250000 39.0625
Middle 779.399671 47.293117 NaN NaN NaN NaN NaN NaN 49.025151 49.689867 ... 62.500000 0.964670 739.84375 818.75000 42.105263 68.421053 671.87500 871.09375 9.500000 78.1250

5 rows × 25 columns

Note at the End

This example notebook had the purpose to show how to “manually” process such data step by step. However, BioPsyKit also offers an easier, object-oriented, way for performing such processing steps in the biopsykit.protocols module. For that, you just need to follow these steps:

  1. Create a new Prococol instance (choose from a set of pre-defined, established laboratory protocols, such as MIST or TSST, or use the BaseProtocol)

  2. Add heart rate data (and R-peak data, if you additionally want to compute HRV), in the form of a SubjectDataDict to the Protocol instance via Protocol.add_hr_data().

  3. Compute aggregated results by calling Protocol.compute_hr_results(), ensemble time-series by calling Protocol.compute_hr_ensemble(), and HRV data by calling Protocol.compute_hrv_results(). The single processing steps can be enabled or disabled by passing True or False as function arguments (have a look at the Documentation!). Additional parameters can be passed to the processing steps by adding them a dictionary and passing it to the function via the params argument.

[ ]:

Download Notebook
(Right-Click -> Save Link As...)