ECG Analysis Example¶
This example illustrates how to further process time-series heart rate data extracted, for instance, from ECG recordings. The covered steps include assigning study conditions, splitting data into subphases, resampling, cutting the data of all subjects to equal length, normalizing, rearranging, and computing aggregated results (such as the mean and standard error over all subjects for each (sub)phase).
As input, this notebook uses the heart rate processing outputs generated by the ECG processing pipeline shown in ECG_Processing_Example.ipynb.
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
[3]:
# path to import and export example processing results
result_path = Path("./results")
[4]:
def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())
Overview¶
When we collect ECG data from a study, we typically extract heart rate (among others), resulting in time-series heart rate data for each subject. These data are then usually processed further in order to visualize results, perform statistical analyses, etc. How the data is further processed depends on the application. In this example, we will look at two different applications:
Aggregated Results: This means that we compute aggregations over time-series data. In the end, we will get tabular data with mean values for each subject within one time interval of the study procedure. We compute these Aggregated Results when we want to analyze differences, for example, between different study conditions, or between different phases of one study (or both).
Ensemble Time-series: This means that we have time-series data where the data of each subject has equal length. In the end, this allows us to concatenate the data of all subjects into a tabular format. We compute these Ensemble Time-series data when we want to visualize time-series data of all subjects in one plot by overlaying the data and computing the mean ± standard deviation over all subjects for each time point (a so-called ensemble plot).
Additionally, we will show how to compute heart rate variability (HRV) parameters for each subject and for each (sub)phase of the recorded data.
Aggregated Results¶
For computing Aggregated Results the following processing steps will be performed:
Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Resample: Resample the heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later.
Normalize Data (optional): Normalize the heart rate data with respect to the mean heart rate of a baseline phase. This makes heart rate reactions easier to compare within one study since it removes the bias of different resting heart rates.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Split into Subphases (optional): If the different phases of your study consist of subphases that you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict into subphases, which adds a new dictionary level.
Aggregate Results: Aggregate time-series data by computing the mean heart rate per (sub)phase for each subject. The results will be concatenated into a dataframe.
Add Conditions (optional): If multiple study conditions were present, the study condition can be added as a new index level to the dataframe.
Load Data¶
Heart Rate Data¶
Load processed heart rate time-series data of all subjects and concatenate into one dictionary.
[5]:
study_data_dict = {}
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
subject_id = re.findall(r"hr_result_(Vp\w+)\.xlsx", file.name)[0]
study_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
Structure of the SubjectDataDict:
[6]:
display_dict_structure(study_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[7]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[7]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Resample Data¶
Resample all heart rate values to a sampling rate of 1 Hz.
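BioPsyKit's resample_dict_sec handles this step internally. As a rough illustration of what resampling beat-to-beat data to 1 Hz means, here is a generic pandas sketch; the timestamps and values are made up, and BioPsyKit's actual interpolation strategy may differ:

```python
import pandas as pd

# Illustrative heart rate samples at irregular (beat-to-beat) times
idx = pd.to_datetime(
    ["2019-10-23 12:32:05.797", "2019-10-23 12:32:07.223", "2019-10-23 12:32:07.961"]
)
hr = pd.DataFrame({"Heart_Rate": [80.84, 81.27, 78.77]}, index=idx)

# One generic way to obtain a 1 Hz grid: bin into 1-second intervals,
# average samples within each bin, and interpolate empty bins
hr_1hz = hr.resample("1s").mean().interpolate()
```

After this, every subject's data lies on a regular one-sample-per-second grid, which is what makes the later second-based subphase splitting straightforward.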
[8]:
dict_result = bp.utils.data_processing.resample_dict_sec(study_data_dict)
Structure of the resampled SubjectDataDict:
[9]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Normalize Data¶
Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.
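Conceptually, this normalization expresses every sample as percent change relative to the baseline mean. A minimal sketch of the equivalent computation with made-up values (in practice, use normalize_to_phase as shown below):

```python
import pandas as pd

# Illustrative heart rate data (bpm) for one subject
phases = {
    "Baseline": pd.DataFrame({"Heart_Rate": [80.0, 82.0, 78.0]}),
    "Stress": pd.DataFrame({"Heart_Rate": [95.0, 100.0, 90.0]}),
}

# Percent change relative to the mean heart rate of the baseline phase
baseline_mean = phases["Baseline"]["Heart_Rate"].mean()
phases_norm = {
    phase: (df - baseline_mean) / baseline_mean * 100 for phase, df in phases.items()
}
```

By construction, the normalized baseline phase then has a mean of 0 %, and all other phases are read as deviations from baseline.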
[10]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")
Structure of the normalized SubjectDataDict:
[11]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).
[12]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the normalized SubjectDataDict with only the selected phases:
[13]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -8.663991 |
2.0 | -7.416155 |
3.0 | -1.054581 |
4.0 | -7.982144 |
5.0 | -6.678360 |
Split into Subphases¶
By splitting the data in the SubjectDataDict further into subphases, a new dictionary level is added as the innermost level. You can specify the subphases in multiple ways (depending on your study protocol):
Fixed Lengths, Consecutive Subphases¶
If all subphases have fixed lengths and subphases are consecutive, i.e., each subphase begins right after the previous one, you can pass a dict with subphase names as keys and subphase durations (in seconds) as values.
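Consecutive durations are equivalent to cumulative (start, end) intervals relative to the phase start. A small sketch of this conversion (durations_to_intervals is an illustrative helper, not part of BioPsyKit):

```python
from itertools import accumulate

def durations_to_intervals(subphases):
    """Convert consecutive subphase durations (in seconds) into
    relative (start, end) times, e.g. {"Start": 20, "Middle": 30}
    -> {"Start": (0, 20), "Middle": (20, 50)}."""
    ends = list(accumulate(subphases.values()))
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, (s, e) in zip(subphases, zip(starts, ends))}

intervals = durations_to_intervals({"Start": 20, "Middle": 30, "End": 10})
```

This is also why the variant below, with explicit (start, end) tuples, is the more general form: it can express gaps between subphases that durations alone cannot.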
[14]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": 20, "Middle": 30, "End": 10}
)
Structure of the SubjectDataDict with a new subphase level added:
[15]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Fixed Lengths, No Consecutive Subphases¶
If all subphases have fixed lengths, but are not consecutive, i.e., do not begin right after the previous subphase, you can pass a dict with subphase names as keys and tuples of relative start and end times (not durations!) in seconds as values.
[16]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": (0, 10), "Middle": (20, 40), "End": (50, 60)}
)
Structure of the SubjectDataDict with a new subphase level added:
[17]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (10, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Last Subphase without Fixed Length¶
If all subphases have a fixed length except the last one (e.g., because the last subphase was a phase where feedback was provided to the subject and thus had variable length), but you still want to extract all data from this subphase, you can set the duration of the last subphase to 0. The duration of this subphase is then inferred from the data by finding the subject with the longest recording. Shorter recordings of other subjects are automatically cut to their respective durations.
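The inference for a last subphase with duration 0 can be sketched as follows (illustrative values and variable names, not BioPsyKit's implementation):

```python
# Illustrative total phase lengths per subject (in seconds)
subject_lengths = {"Vp01": 64, "Vp02": 60}
subphases = {"Start": 20, "Middle": 30, "End": 0}

# The last subphase starts after all fixed-length subphases...
last_start = sum(d for d in subphases.values() if d > 0)
# ...and its end is inferred from the longest recording
last_end = max(subject_lengths.values())

# Each subject contributes data up to its own recording length
end_slices = {
    subject: (last_start, min(last_end, length))
    for subject, length in subject_lengths.items()
}
```

In this sketch, the "End" subphase thus spans seconds 50 to 64 for the longest recording, while shorter recordings simply end earlier.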
[18]:
dict_result_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result, subphases={"Start": 20, "Middle": 30, "End": 0}
)
Structure of the SubjectDataDict with a new subphase level added:
[19]:
display_dict_structure(dict_result_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (20, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Aggregate Results¶
Aggregate Over Phases¶
[20]:
df_result = bp.utils.data_processing.mean_per_subject_dict(
dict_result, dict_levels=["subject", "phase"], param_name="HR"
)
df_result.head()
[20]:
HR | ||
---|---|---|
subject | phase | |
Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | |
Stress | 88.263152 | |
Recovery | 85.855722 | |
Vp02 | Baseline | 86.877952 |
Aggregate Over Subphases¶
[21]:
df_result_subph = bp.utils.data_processing.mean_per_subject_dict(
dict_result_subph, dict_levels=["subject", "phase", "subphase"], param_name="HR"
)
df_result_subph.head()
[21]:
HR | |||
---|---|---|---|
subject | phase | subphase | |
Vp01 | Baseline | Start | 82.387680 |
Middle | 88.853619 | ||
End | 83.124445 | ||
Intervention | Start | 78.375064 | |
Middle | 77.013733 |
Add Conditions¶
[22]:
df_result_cond = bp.utils.data_processing.add_subject_conditions(df_result, condition_list)
df_result_cond.head()
[22]:
HR | |||
---|---|---|---|
condition | subject | phase | |
Control | Vp01 | Baseline | 85.579756 |
Intervention | 77.021302 | ||
Stress | 88.263152 | ||
Recovery | 85.855722 | ||
Intervention | Vp02 | Baseline | 86.877952 |
[23]:
df_result_subph_cond = bp.utils.data_processing.add_subject_conditions(df_result_subph, condition_list)
df_result_subph_cond.head()
[23]:
HR | ||||
---|---|---|---|---|
condition | subject | phase | subphase | |
Control | Vp01 | Baseline | Start | 82.387680 |
Middle | 88.853619 | |||
End | 83.124445 | |||
Intervention | Start | 78.375064 | ||
Middle | 77.013733 |
Compute Parameter¶
The resulting aggregated data can then be processed further, and “population averages” can be computed by averaging the heart rate data of each (sub)phase over all subjects, resulting in mean ± standard error values per (sub)phase and condition (if applicable). The resulting dataframe is a MeanSeDataFrame, a dataframe containing the mean and standard error of time-series data in a standardized format.
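Conceptually, this aggregation is a groupby over all index levels except subject, computing the mean and standard error per group. A simplified pandas sketch with made-up values (the real MeanSeDataFrame uses a slightly different column layout):

```python
import pandas as pd

# Illustrative per-subject mean HR values, shaped like the output
# of mean_per_subject_dict
df = pd.DataFrame(
    {"HR": [85.0, 88.0, 87.0, 90.0]},
    index=pd.MultiIndex.from_product(
        [["Vp01", "Vp02"], ["Baseline", "Stress"]], names=["subject", "phase"]
    ),
)

# Mean and standard error over subjects, per phase
mse = df.groupby("phase")["HR"].agg(["mean", "sem"])
```

Grouping by "phase" collapses the subject level, so each row of the result summarizes all subjects for one phase.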
Without Conditions¶
[24]:
df_result_mse = bp.utils.data_processing.mean_se_per_phase(df_result)
df_result_mse.head()
[24]:
HR | ||
---|---|---|
mean | se | |
phase | ||
Baseline | 86.228854 | 0.649098 |
Intervention | 84.202219 | 7.180917 |
Recovery | 91.139351 | 5.283629 |
Stress | 97.608169 | 9.345016 |
[25]:
df_result_subph_mse = bp.utils.data_processing.mean_se_per_phase(df_result_subph)
df_result_subph_mse.head()
[25]:
HR | |||
---|---|---|---|
mean | se | ||
phase | subphase | ||
Baseline | Start | 82.998921 | 0.611242 |
Middle | 88.822727 | 0.030892 | |
End | 85.398164 | 2.273719 | |
Intervention | Start | 82.948593 | 4.573529 |
Middle | 80.744660 | 3.730927 |
With Conditions¶
We call the same function, but with a different dataframe (where we added the condition as an index level).
Note: In this example, the standard error is NaN because we only have one subject per condition and thus cannot compute the standard error.
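This is a property of the sample standard error itself: with n = 1, the standard deviation (with ddof = 1) is undefined, so the standard error is NaN. A quick pandas check:

```python
import pandas as pd

# A "group" containing a single subject's value
single = pd.Series([85.579756])

# Sample standard deviation with one value (ddof=1) is undefined,
# so the standard error is NaN as well
se = single.sem()
```

With two or more subjects per condition, the same call returns a finite value.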
[26]:
bp.utils.data_processing.mean_se_per_phase(df_result_cond).head()
[26]:
HR | |||
---|---|---|---|
mean | se | ||
condition | phase | ||
Control | Baseline | 85.579756 | NaN |
Intervention | 77.021302 | NaN | |
Stress | 88.263152 | NaN | |
Recovery | 85.855722 | NaN | |
Intervention | Baseline | 86.877952 | NaN |
[27]:
bp.utils.data_processing.mean_se_per_phase(df_result_subph_cond).head()
[27]:
HR | ||||
---|---|---|---|---|
mean | se | |||
condition | phase | subphase | ||
Control | Baseline | Start | 82.387680 | NaN |
Middle | 88.853619 | NaN | ||
End | 83.124445 | NaN | ||
Intervention | Start | 78.375064 | NaN | |
Middle | 77.013733 | NaN |
Plotting¶
The final data can then either be statistically analyzed or plotted. Here, we plot the mean and standard error of each phase using the plotting function plotting.hr_mean_plot() from BioPsyKit.
[28]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_mse, ax=ax);
If we plot our data with plotting.hr_mean_plot() and the data consists of different phases, where each phase consists of subphases, the phases will be highlighted in the background for better visibility.
[29]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_mean_plot(df_result_subph_mse, ax=ax);
Ensemble Time-series¶
For computing Ensemble Time-series the following processing steps will be performed:
Load Data: Load heart rate processing results per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Resample: Resample the heart rate data of all subjects to 1 Hz. This makes it easier to split the data into subphases later.
Normalize Data (optional): Normalize the heart rate data with respect to the mean heart rate of a baseline phase. This makes heart rate reactions easier to compare within one study since it removes the bias of different resting heart rates.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Rearrange Data: The data in form of a SubjectDataDict is rearranged into a StudyDataDict. A StudyDataDict is constructed from a SubjectDataDict by swapping the outer (subject IDs) and inner (phase names) dictionary levels.
Cut Phases to Equal Length: If the data do not have equal length for each subject (e.g., because the last phase had a flexible duration or was not timed accurately), you can cut the data to the same length in order to concatenate the time-series data afterwards. This is beneficial if you want to overlay time-series data from multiple subjects in a so-called Ensemble Plot.
Merge StudyDataDict: Now that the time-series data of all subjects have equal length within each phase, they can be merged into a MergedStudyDataDict, where the inner level of the nested StudyDataDict is removed by merging the individual data into one dataframe per phase.
Add Conditions (optional): If multiple study conditions were present, the StudyDataDict can further be split by adding a new dictionary level with the study conditions.
Note: See Documentation for biopsykit.utils.data_processing for further information of the used functions.
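The level swap performed in the Rearrange Data step can be sketched in plain Python (swap_dict_levels is an illustrative helper; BioPsyKit provides rearrange_subject_data_dict for this):

```python
def swap_dict_levels(subject_data_dict):
    """Swap the outer (subject) and inner (phase) levels of a nested dict,
    turning a SubjectDataDict-like structure into a StudyDataDict-like one."""
    study_data_dict = {}
    for subject, phase_dict in subject_data_dict.items():
        for phase, data in phase_dict.items():
            study_data_dict.setdefault(phase, {})[subject] = data
    return study_data_dict

# Strings stand in for the per-phase heart rate dataframes
subject_dict = {
    "Vp01": {"Baseline": "df_01_base", "Stress": "df_01_stress"},
    "Vp02": {"Baseline": "df_02_base", "Stress": "df_02_stress"},
}
study_dict = swap_dict_levels(subject_dict)
```

After the swap, the outer keys are phase names and the inner keys are subject IDs, which is exactly the structure needed to merge all subjects of one phase into a single dataframe later.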
Load Data¶
Heart Rate Data¶
[30]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
# Load processed heart rate time-series data of all subjects and concatenate into one dict
subject_data_dict = {}
for file in sorted(ecg_path.glob("hr_result_*.xlsx")):
subject_id = re.findall(r"hr_result_(Vp\w+)\.xlsx", file.name)[0]
subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
Structure of the SubjectDataDict:
[31]:
display_dict_structure(subject_data_dict)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.797 | 80.842105 |
2019-10-23 12:32:07.223 | 81.269841 |
2019-10-23 12:32:07.961 | 78.769231 |
2019-10-23 12:32:08.723 | 79.175258 |
2019-10-23 12:32:09.480 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[32]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[32]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Resample Data¶
Resample all heart rate values to a sampling rate of 1 Hz.
[33]:
dict_result = bp.utils.data_processing.resample_dict_sec(subject_data_dict)
Structure of the resampled SubjectDataDict:
[34]:
display_dict_structure(dict_result)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | 81.142060 |
2.0 | 79.324922 |
3.0 | 79.640487 |
4.0 | 84.765999 |
5.0 | 80.456931 |
Normalize Data¶
Normalize heart rate data with respect to the mean heart rate of a baseline phase. In this case, the baseline phase is called "Baseline". All heart rate samples are then interpreted as “heart rate change relative to baseline in percent”.
[35]:
dict_result_norm = bp.utils.data_processing.normalize_to_phase(dict_result, "Baseline")
Structure of the normalized SubjectDataDict:
[36]:
display_dict_structure(dict_result_norm)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery” (i.e., drop the “Baseline” phase because it was only used for normalizing all other phases).
[37]:
dict_result_norm_selected = bp.utils.data_processing.select_dict_phases(
dict_result_norm, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the normalized SubjectDataDict with only the selected phases:
[38]:
display_dict_structure(dict_result_norm_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (109, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -8.663991 |
2.0 | -7.416155 |
3.0 | -1.054581 |
4.0 | -7.982144 |
5.0 | -6.678360 |
Rearrange Data¶
[39]:
dict_study = bp.utils.data_processing.rearrange_subject_data_dict(dict_result_norm)
Structure of the normalized StudyDataDict:
[40]:
display_dict_structure(dict_study)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (64, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Cut Data¶
Data for all subjects will be cut to the shortest duration for each phase.
Note: If only a subset of phases should be cut, these phases can be specified via the phases parameter.
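Cutting to the shortest recording can be sketched as follows for a single phase (illustrative data; in practice, use cut_phases_to_shortest as shown below):

```python
import pandas as pd

# Illustrative per-subject data for one phase (1 Hz samples)
phase_data = {
    "Vp01": pd.DataFrame({"Heart_Rate": range(64)}),
    "Vp02": pd.DataFrame({"Heart_Rate": range(60)}),
}

# Cut all subjects to the shortest recording within the phase
min_len = min(len(df) for df in phase_data.values())
phase_data_cut = {subject: df.iloc[:min_len] for subject, df in phase_data.items()}
```

Once all subjects have the same length per phase, their time series can be concatenated column-wise in the merge step.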
[41]:
dict_result = bp.utils.data_processing.cut_phases_to_shortest(dict_study)
Structure of the cut StudyDataDict:
[42]:
display_dict_structure(dict_result)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Vp01', 'Vp02'])
'Dataframe shape: (60, 1)'
Heart_Rate | |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Merge StudyDataDict¶
[43]:
dict_merged = bp.utils.data_processing.merge_study_data_dict(dict_result)
Structure of the MergedStudyDataDict:
[44]:
display_dict_structure(dict_merged)
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 2)'
subject | Vp01 | Vp02 |
---|---|---|
time | ||
1.0 | -5.185450 | -1.907328 |
2.0 | -7.308778 | 2.764077 |
3.0 | -6.940040 | 6.703796 |
4.0 | -0.950875 | 0.947209 |
5.0 | -5.986024 | -14.090102 |
Add Conditions¶
[45]:
dict_merged_cond = bp.utils.data_processing.split_subject_conditions(dict_merged, condition_list)
Structure of the MergedStudyDataDict, split into different conditions:
[46]:
display_dict_structure(dict_merged_cond)
dict_keys(['Control', 'Intervention'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (60, 1)'
subject | Vp01 |
---|---|
time | |
1.0 | -5.185450 |
2.0 | -7.308778 |
3.0 | -6.940040 |
4.0 | -0.950875 |
5.0 | -5.986024 |
Plotting¶
After these processing steps we can finally visualize the time-series data of all subjects in an Ensemble Plot, where the data of all subjects is overlaid and each point in the time series is plotted as mean ± standard error over all subjects. For creating ensemble plots, BioPsyKit provides the function plotting.hr_ensemble_plot.
Note: These plots do not look as pretty as they should because this is just example data from two subjects. If you want to see plots from “actual” data, just scroll down!
[47]:
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, ax=ax);
If the phases have subphases they can be passed to the plotting function in order to highlight them in the background:
[48]:
subphases = {"Start": 20, "Middle": 100, "End": 20}
subphase_dict = {"Baseline": subphases, "Intervention": subphases, "Stress": subphases, "Recovery": subphases}
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged, subphases=subphase_dict, ax=ax);
Load example data from Excel file. The data is already in the format of a MergedStudyDataDict, so it can directly be passed to plotting.hr_ensemble_plot.
[49]:
dict_merged_norm = bp.example_data.get_hr_ensemble_sample()
print(bp.utils.datatype_helper.is_merged_study_data_dict(dict_merged_norm))
display_dict_structure(dict_merged_norm)
True
dict_keys(['Phase1', 'Phase2', 'Phase3'])
'Dataframe shape: (406, 28)'
Vp01 | Vp02 | Vp03 | Vp04 | Vp05 | Vp06 | Vp07 | Vp08 | Vp09 | Vp10 | ... | Vp22 | Vp24 | Vp25 | Vp26 | Vp27 | Vp28 | Vp30 | Vp31 | Vp32 | Vp33 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
1 | -5.896011 | 21.372677 | -13.380272 | -0.952815 | 5.662782 | -4.119631 | 0.594466 | 7.365068 | 2.589674 | -4.873666 | ... | 8.312998 | 6.064452 | -4.047215 | -5.267747 | -11.698333 | -5.405734 | -1.370884 | 3.346634 | -5.795628 | 3.491703 |
2 | -4.745362 | 21.213832 | -7.508542 | 1.678500 | 4.286097 | -5.328198 | 11.758334 | 6.751515 | 0.005992 | 3.383603 | ... | 5.708344 | 6.728987 | -6.630518 | -5.409718 | -15.351490 | -4.091093 | -0.836948 | -1.557466 | -12.685620 | 8.637926 |
3 | 1.967010 | 14.120346 | -4.631881 | -4.720704 | 0.709777 | 1.238230 | 7.367597 | 2.901178 | -2.304507 | 11.329636 | ... | 1.157688 | -8.462778 | -6.974983 | -7.243909 | -12.632172 | -3.817170 | 0.881259 | -2.667594 | -10.346344 | 2.835992 |
4 | 1.454400 | 8.943284 | -11.498010 | -10.179091 | -2.759789 | 0.394513 | -8.177153 | 1.614913 | 0.885181 | 9.373362 | ... | -0.661564 | -13.988615 | -3.315441 | -7.813189 | -8.996902 | -0.654702 | -0.186492 | -0.641279 | 0.950780 | 0.571289 |
5 | -3.319305 | -1.501998 | -8.575398 | -8.604960 | -1.664693 | -4.642004 | -7.594194 | 0.708533 | -2.946673 | 12.488329 | ... | 3.657499 | -8.304159 | 2.546755 | -3.969524 | -9.432211 | 6.311569 | 1.601839 | -0.189170 | 8.193988 | -4.193230 |
5 rows × 28 columns
[50]:
subphases = {"Start": 60, "Middle": 240, "End": 0}
subphase_dict = {"Phase1": subphases, "Phase2": subphases, "Phase3": subphases}
fig, ax = plt.subplots(figsize=(8, 4))
bp.protocols.plotting.hr_ensemble_plot(dict_merged_norm, subphases=subphase_dict, ax=ax);
Compute Heart Rate Variability (HRV)¶
For computing Heart Rate Variability (HRV) the following processing steps will be performed:
Load Data: Load R-peak data per subject and concatenate them into a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects.
Select Phases (optional): If the following steps should not be performed on the data of all phases but only on a subset (e.g., because the first phase is not relevant for your analysis goal, or because you only recorded it to normalize the other phases against it), this subset of phases can be specified and further processing will only be performed on it.
Split into Subphases (optional): If the different phases of your study consist of subphases that you want to aggregate and analyze separately, you can further split the data in the SubjectDataDict into subphases, which adds a new dictionary level.
Compute HRV: Compute HRV parameters from the R-peak data.
Add Conditions (optional): If multiple study conditions were present, the StudyDataDict can further be split by adding a new dictionary level with the study conditions.
Load Data¶
R-Peak Data¶
[51]:
# or use your own path
ecg_path = bp.example_data.get_ecg_processing_results_path_example()
# Load processed r-peaks data of all subjects and concatenate into one dict
rpeaks_subject_data_dict = {}
for file in sorted(ecg_path.glob("rpeaks_result_*.xlsx")):
subject_id = re.findall(r"rpeaks_result_(Vp\w+)\.xlsx", file.name)[0]
rpeaks_subject_data_dict[subject_id] = pd.read_excel(file, sheet_name=None, index_col="time")
[52]:
# rename to make it consistent with the other examples
dict_result_rpeaks = rpeaks_subject_data_dict
Structure of the R Peaks SubjectDataDict:
[53]:
display_dict_structure(dict_result_rpeaks)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:32:05.797 | 0.701462 | 393.0 | 0.742188 | 1 | 80.842105 |
2019-10-23 12:32:07.223 | 1.000000 | 569.0 | 0.738281 | 0 | 81.269841 |
2019-10-23 12:32:07.961 | 0.782036 | 758.0 | 0.761719 | 0 | 78.769231 |
2019-10-23 12:32:08.723 | 0.652345 | 953.0 | 0.757812 | 0 | 79.175258 |
2019-10-23 12:32:09.480 | 0.925202 | 1147.0 | 0.714844 | 0 | 83.934426 |
Subject Conditions¶
(will be needed later)
Note: This is only an example, thus, the condition list is manually constructed. Usually, you should import a condition list from a file.
[54]:
condition_list = pd.DataFrame(
["Control", "Intervention"], columns=["condition"], index=pd.Index(["Vp01", "Vp02"], name="subject")
)
condition_list
[54]:
condition | |
---|---|
subject | |
Vp01 | Control |
Vp02 | Intervention |
Select Phases¶
If desired, select only specific phases, such as only “Intervention”, “Stress”, and “Recovery”.
[55]:
dict_result_rpeaks_selected = bp.utils.data_processing.select_dict_phases(
dict_result_rpeaks, phases=["Intervention", "Stress", "Recovery"]
)
Structure of the SubjectDataDict with only the selected phases:
[56]:
display_dict_structure(dict_result_rpeaks_selected)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (139, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:33:11.012 | 0.877853 | 450.0 | 0.749132 | 1 | 80.092700 |
2019-10-23 12:33:12.520 | 0.897152 | 645.0 | 0.777344 | 0 | 77.185930 |
2019-10-23 12:33:13.297 | 0.891966 | 844.0 | 0.746094 | 0 | 80.418848 |
2019-10-23 12:33:14.043 | 0.465520 | 1035.0 | 0.707031 | 0 | 84.861878 |
2019-10-23 12:33:14.750 | 0.906602 | 1216.0 | 0.750000 | 0 | 80.000000 |
Split into Subphases¶
By splitting the data in the SubjectDataDict further into subphases, a new dictionary level is added as the innermost level. In this example, we assume that all subphases are consecutive and have fixed lengths (see the Aggregated Results example for further options).
[57]:
dict_result_rpeaks_subph = bp.utils.data_processing.split_dict_into_subphases(
dict_result_rpeaks, subphases={"Start": 20, "Middle": 30, "End": 10}
)
Structure of the SubjectDataDict with a new subphase level added:
[58]:
display_dict_structure(dict_result_rpeaks_subph)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
dict_keys(['Start', 'Middle', 'End'])
'Dataframe shape: (27, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:32:05.797 | 0.701462 | 393.0 | 0.742188 | 1 | 80.842105 |
2019-10-23 12:32:07.223 | 1.000000 | 569.0 | 0.738281 | 0 | 81.269841 |
2019-10-23 12:32:07.961 | 0.782036 | 758.0 | 0.761719 | 0 | 78.769231 |
2019-10-23 12:32:08.723 | 0.652345 | 953.0 | 0.757812 | 0 | 79.175258 |
2019-10-23 12:32:09.480 | 0.925202 | 1147.0 | 0.714844 | 0 | 83.934426 |
Compute HRV¶
Iterate through all subjects and all (sub)phases, call EcgProcessor.hrv_process(), and concatenate the results into one dataframe.
Without Subphases¶
[59]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks.items():
list_hrv_phases = []
for phase, rpeaks in data_dict.items():
list_hrv_phases.append(EcgProcessor.hrv_process(rpeaks=rpeaks, index=phase, index_name="phase"))
dict_hrv_subject[subject_id] = pd.concat(list_hrv_phases)
df_hrv = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv.head()
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
warn(
/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/envs/latest/lib/python3.9/site-packages/neurokit2/hrv/hrv_nonlinear.py:474: NeuroKitWarning: DFA_alpha2 related indices will not be calculated. The maximum duration of the windows provided for the long-term correlation is smaller than the minimum duration of windows. Refer to the `scale` argument in `nk.fractal_dfa()` for more information.
warn(
[59]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_LZC | HRV_DFA_alpha2 | HRV_MFDFA_alpha2_Width | HRV_MFDFA_alpha2_Peak | HRV_MFDFA_alpha2_Mean | HRV_MFDFA_alpha2_Max | HRV_MFDFA_alpha2_Delta | HRV_MFDFA_alpha2_Asymmetry | HRV_MFDFA_alpha2_Fluctuation | HRV_MFDFA_alpha2_Increment | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | |||||||||||||||||||||
Vp01 | Baseline | 701.171875 | 44.086389 | NaN | NaN | NaN | NaN | NaN | NaN | 36.399766 | 36.605573 | ... | 1.009844 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Intervention | 779.551630 | 48.772965 | NaN | NaN | NaN | NaN | NaN | NaN | 53.253237 | 53.448631 | ... | 1.133243 | 1.108132 | 1.088136 | 1.327777 | 1.629693 | -1.286655 | -2.286655 | -0.222539 | 0.001160 | 0.107944 | |
Stress | 680.145410 | 51.211148 | 25.720464 | 45.331424 | NaN | NaN | NaN | NaN | 35.764047 | 35.806996 | ... | 0.791578 | 0.741699 | 0.265318 | 0.671193 | 0.629318 | 1.000000 | 0.664149 | -0.657832 | 0.000091 | 0.006495 | |
Recovery | 699.218750 | 50.388794 | NaN | NaN | NaN | NaN | NaN | NaN | 29.430528 | 29.506479 | ... | 0.908615 | 0.857308 | 0.294392 | 1.055181 | 1.004504 | 0.878651 | -0.099155 | -0.672143 | 0.000363 | 0.017242 | |
Vp02 | Baseline | 691.029744 | 40.551896 | NaN | NaN | NaN | NaN | NaN | NaN | 32.113406 | 32.304060 | ... | 0.998500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 81 columns
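The code above relies on a useful pandas pattern: passing a dict of DataFrames to pd.concat prepends the dict keys as a new index level, so the per-phase index of each subject's results becomes the inner level of a (subject, phase) MultiIndex. A minimal standalone sketch with toy values (the subject IDs and numbers below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for the per-subject HRV results: one small DataFrame
# per subject, indexed by phase (values are made up for illustration).
dict_hrv_subject = {
    "Vp01": pd.DataFrame(
        {"HRV_MeanNN": [700.0, 780.0]},
        index=pd.Index(["Baseline", "Stress"], name="phase"),
    ),
    "Vp02": pd.DataFrame(
        {"HRV_MeanNN": [690.0]},
        index=pd.Index(["Baseline"], name="phase"),
    ),
}

# pd.concat over a dict prepends the dict keys as a new outer index level;
# `names` labels that new level, while the inner level keeps its name.
df_hrv = pd.concat(dict_hrv_subject, names=["subject"])
print(list(df_hrv.index.names))  # ['subject', 'phase']
```

Nesting this pattern once more (dict of dicts of DataFrames) yields the three-level (subject, phase, subphase) index used in the subphase example below.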
With Subphases¶
If we only want to compute a certain type of HRV measures (in this example, only time-domain measures), we can specify the hrv_types argument of EcgProcessor.hrv_process().
[60]:
dict_hrv_subject = {}
for subject_id, data_dict in dict_result_rpeaks_subph.items():
    dict_hrv_phases = {}
    for phase, phase_dict in data_dict.items():
        list_hrv_subphases = []
        for subphase, rpeaks in phase_dict.items():
            list_hrv_subphases.append(
                EcgProcessor.hrv_process(rpeaks=rpeaks, index=subphase, index_name="phase", hrv_types="hrv_time")
            )
        dict_hrv_phases[phase] = pd.concat(list_hrv_subphases)
    dict_hrv_subject[subject_id] = pd.concat(dict_hrv_phases, names=["phase"])

df_hrv_subph = pd.concat(dict_hrv_subject, names=["subject"])
df_hrv_subph.head()
[60]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_IQRNN | HRV_SDRMSSD | HRV_Prc20NN | HRV_Prc80NN | HRV_pNN50 | HRV_pNN20 | HRV_MinNN | HRV_MaxNN | HRV_HTI | HRV_TINN | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject | phase | phase | |||||||||||||||||||||
Vp01 | Baseline | Start | 728.365385 | 38.515399 | NaN | NaN | NaN | NaN | NaN | NaN | 38.121478 | 38.852299 | ... | 60.546875 | 1.010333 | 691.40625 | 761.71875 | 23.076923 | 61.538462 | 660.15625 | 789.06250 | 8.666667 | 109.3750 |
Middle | 673.029119 | 31.651441 | NaN | NaN | NaN | NaN | NaN | NaN | 32.801251 | 33.164488 | ... | 39.062500 | 0.964946 | 650.78125 | 704.68750 | 18.181818 | 43.181818 | 605.46875 | 765.62500 | 3.666667 | 54.6875 | ||
End | 724.759615 | 32.667065 | NaN | NaN | NaN | NaN | NaN | NaN | 38.356246 | 40.060349 | ... | 31.250000 | 0.851675 | 702.34375 | 752.34375 | 23.076923 | 69.230769 | 664.06250 | 769.53125 | 3.250000 | 62.5000 | ||
Intervention | Start | 762.812500 | 26.119350 | NaN | NaN | NaN | NaN | NaN | NaN | 30.912484 | 31.418938 | ... | 39.062500 | 0.844945 | 741.40625 | 781.25000 | 8.000000 | 60.000000 | 707.03125 | 820.31250 | 6.250000 | 39.0625 | |
Middle | 779.399671 | 47.293117 | NaN | NaN | NaN | NaN | NaN | NaN | 49.025151 | 49.689867 | ... | 62.500000 | 0.964670 | 739.84375 | 818.75000 | 42.105263 | 68.421053 | 671.87500 | 871.09375 | 9.500000 | 78.1250 |
5 rows × 25 columns
Add Conditions¶
[61]:
df_hrv_cond = bp.utils.data_processing.add_subject_conditions(df_hrv, condition_list)
df_hrv_cond.head()
[61]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_LZC | HRV_DFA_alpha2 | HRV_MFDFA_alpha2_Width | HRV_MFDFA_alpha2_Peak | HRV_MFDFA_alpha2_Mean | HRV_MFDFA_alpha2_Max | HRV_MFDFA_alpha2_Delta | HRV_MFDFA_alpha2_Asymmetry | HRV_MFDFA_alpha2_Fluctuation | HRV_MFDFA_alpha2_Increment | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
condition | subject | phase | |||||||||||||||||||||
Control | Vp01 | Baseline | 701.171875 | 44.086389 | NaN | NaN | NaN | NaN | NaN | NaN | 36.399766 | 36.605573 | ... | 1.009844 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Intervention | 779.551630 | 48.772965 | NaN | NaN | NaN | NaN | NaN | NaN | 53.253237 | 53.448631 | ... | 1.133243 | 1.108132 | 1.088136 | 1.327777 | 1.629693 | -1.286655 | -2.286655 | -0.222539 | 0.001160 | 0.107944 | ||
Stress | 680.145410 | 51.211148 | 25.720464 | 45.331424 | NaN | NaN | NaN | NaN | 35.764047 | 35.806996 | ... | 0.791578 | 0.741699 | 0.265318 | 0.671193 | 0.629318 | 1.000000 | 0.664149 | -0.657832 | 0.000091 | 0.006495 | ||
Recovery | 699.218750 | 50.388794 | NaN | NaN | NaN | NaN | NaN | NaN | 29.430528 | 29.506479 | ... | 0.908615 | 0.857308 | 0.294392 | 1.055181 | 1.004504 | 0.878651 | -0.099155 | -0.672143 | 0.000363 | 0.017242 | ||
Intervention | Vp02 | Baseline | 691.029744 | 40.551896 | NaN | NaN | NaN | NaN | NaN | NaN | 32.113406 | 32.304060 | ... | 0.998500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 81 columns
[62]:
df_hrv_subph_cond = bp.utils.data_processing.add_subject_conditions(df_hrv_subph, condition_list)
df_hrv_subph_cond.head()
[62]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_IQRNN | HRV_SDRMSSD | HRV_Prc20NN | HRV_Prc80NN | HRV_pNN50 | HRV_pNN20 | HRV_MinNN | HRV_MaxNN | HRV_HTI | HRV_TINN | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
condition | subject | phase | phase | |||||||||||||||||||||
Control | Vp01 | Baseline | Start | 728.365385 | 38.515399 | NaN | NaN | NaN | NaN | NaN | NaN | 38.121478 | 38.852299 | ... | 60.546875 | 1.010333 | 691.40625 | 761.71875 | 23.076923 | 61.538462 | 660.15625 | 789.06250 | 8.666667 | 109.3750 |
Middle | 673.029119 | 31.651441 | NaN | NaN | NaN | NaN | NaN | NaN | 32.801251 | 33.164488 | ... | 39.062500 | 0.964946 | 650.78125 | 704.68750 | 18.181818 | 43.181818 | 605.46875 | 765.62500 | 3.666667 | 54.6875 | |||
End | 724.759615 | 32.667065 | NaN | NaN | NaN | NaN | NaN | NaN | 38.356246 | 40.060349 | ... | 31.250000 | 0.851675 | 702.34375 | 752.34375 | 23.076923 | 69.230769 | 664.06250 | 769.53125 | 3.250000 | 62.5000 | |||
Intervention | Start | 762.812500 | 26.119350 | NaN | NaN | NaN | NaN | NaN | NaN | 30.912484 | 31.418938 | ... | 39.062500 | 0.844945 | 741.40625 | 781.25000 | 8.000000 | 60.000000 | 707.03125 | 820.31250 | 6.250000 | 39.0625 | ||
Middle | 779.399671 | 47.293117 | NaN | NaN | NaN | NaN | NaN | NaN | 49.025151 | 49.689867 | ... | 62.500000 | 0.964670 | 739.84375 | 818.75000 | 42.105263 | 68.421053 | 671.87500 | 871.09375 | 9.500000 | 78.1250 |
5 rows × 25 columns
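Conceptually, add_subject_conditions() prepends a "condition" index level derived from each subject's assignment in the condition list. The following pandas-only sketch illustrates the idea (it is not BioPsyKit's actual implementation, and the condition mapping and values are hypothetical):

```python
import pandas as pd

# Hypothetical subject-to-condition assignment (in the notebook, this
# information comes from `condition_list`).
condition_map = {"Vp01": "Control", "Vp02": "Intervention"}

# Toy HRV results indexed by (subject, phase), mirroring df_hrv above.
df_hrv = pd.DataFrame(
    {"HRV_MeanNN": [701.2, 680.1, 691.0]},
    index=pd.MultiIndex.from_tuples(
        [("Vp01", "Baseline"), ("Vp01", "Stress"), ("Vp02", "Baseline")],
        names=["subject", "phase"],
    ),
)

# Look up each row's condition from its subject, append it as an index
# level, and move it to the front so the result is grouped by condition.
conditions = df_hrv.index.get_level_values("subject").map(condition_map)
df_cond = df_hrv.set_index(pd.Index(conditions, name="condition"), append=True)
df_cond = df_cond.reorder_levels(["condition", "subject", "phase"]).sort_index()
print(list(df_cond.index.names))  # ['condition', 'subject', 'phase']
```

The resulting (condition, subject, phase) index matches the shape of the output shown above and makes condition-wise aggregation (e.g. df_cond.groupby("condition").mean()) straightforward.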
Note at the End¶
This example notebook showed how to process such data step by step, “manually”. However, BioPsyKit also offers an easier, object-oriented way of performing these processing steps in the biopsykit.protocols module. For that, you just need to follow these steps:

1. Create a new Protocol instance (choose from a set of pre-defined, established laboratory protocols, such as MIST or TSST, or use the BaseProtocol).
2. Add heart rate data (and R-peak data, if you additionally want to compute HRV), in the form of a SubjectDataDict, to the Protocol instance via Protocol.add_hr_data().
3. Compute aggregated results by calling Protocol.compute_hr_results(), ensemble time-series by calling Protocol.compute_hr_ensemble(), and HRV data by calling Protocol.compute_hrv_results(). The individual processing steps can be enabled or disabled by passing True or False as function arguments (have a look at the documentation!). Additional parameters can be passed to the processing steps by adding them to a dictionary and passing it to the function via the params argument.