ECG Processing Example¶
This example illustrates how to import and process electrocardiogram (ECG) data recorded per subject and how to save intermediate processing results so that further analysis can be performed (e.g., in ECG_Analysis_Example.ipynb or in Protocol_Example.ipynb).
Setup and Helper Functions¶
[1]:
from pathlib import Path
import re
import pandas as pd
import numpy as np
from fau_colors import cmaps
import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")
palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)
plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"
palette
[2]:
[3]:
def display_dict_structure(dict_to_display):
_display_dict_recursive(dict_to_display)
def _display_dict_recursive(dict_to_display):
if isinstance(dict_to_display, dict):
display(dict_to_display.keys())
_display_dict_recursive(list(dict_to_display.values())[0])
else:
display("Dataframe shape: {}".format(dict_to_display.shape))
display(dict_to_display.head())
Example 1: 1 Dataset, 1 Phase¶
This example assumes that we have one dataset with only one phase, i.e. the dataset does not need to be split into multiple parts internally.
[4]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate)
Process ECG Signal¶
Calling ep.ecg_process() will perform R peak detection and perform outlier correcrtion with the default outlier correction pipeline.
[5]:
ep.ecg_process()
Optional: Use other outlier correction parameters¶
Calling ep.outlier_params_default() will list available outlier correction parameters and their default parameter. See the doumentation of ep.outlier_corrections() for further information.
[6]:
# List available outlier correction parameters and their default parameter. See the doumentation of 'EcgProcessor.outlier_corrections()' for further information.
# print(ep.outlier_params_default())
# ep.ecg_process(outlier_correction=['statistical_rr', 'statistical_rr_diff', 'physiological'], outlier_params={'statistical_rr': 2.576, 'statistical_rr_diff': 1.96, 'physiological': (50, 180)})
ECG and Heart Rate Result¶
[7]:
display_dict_structure(ep.ecg_result)
dict_keys(['Data'])
'Dataframe shape: (157790, 6)'
ECG_Raw | ECG_Clean | ECG_Quality | ECG_R_Peaks | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|---|
time | ||||||
2019-10-23 12:31:51.003906+02:00 | 0.0 | -441.904597 | 0.0 | 0 | 0.0 | 83.883495 |
2019-10-23 12:31:51.007812+02:00 | -1.0 | -436.324537 | 0.0 | 0 | 0.0 | 83.883495 |
2019-10-23 12:31:51.011718+02:00 | -3.0 | -430.760248 | 0.0 | 0 | 0.0 | 83.883495 |
2019-10-23 12:31:51.015625+02:00 | -4.0 | -425.033019 | 0.0 | 0 | 0.0 | 83.883495 |
2019-10-23 12:31:51.019531+02:00 | -5.0 | -419.369665 | 0.0 | 0 | 0.0 | 83.883495 |
[8]:
# Get heart rate and print resulting heart rate
hr_data = ep.heart_rate["Data"]
hr_data.head()
[8]:
Heart_Rate | |
---|---|
time | |
2019-10-23 12:31:51.582031+02:00 | 83.883495 |
2019-10-23 12:31:53.093750+02:00 | 88.275862 |
2019-10-23 12:31:53.773437+02:00 | 81.702128 |
2019-10-23 12:31:54.507812+02:00 | 79.585492 |
2019-10-23 12:31:55.261718+02:00 | 79.175258 |
[9]:
# Plot an overview of the acquired data - with ECG Signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data")
[10]:
# Plot an overview of the acquired data - without ECG signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data", plot_ecg_signal=False)
Note: See the documentation of ep.ecg_result(), ep.heart_rate() and plotting.ecg_plot() for further information.
Compute Heart Rate Variability (HRV)¶
Heart rate variability (HRV) is the physiological phenomenon of varying time intervals of consecutive heart beats. HRV is an important marker for the activity of the autonomic nervous system (ANS) since it can be used to assess the activities of the two brances of the ANS: The sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS).
The function EcgProcessor.hrv_process() computes HRV over the complete data. If you want to compute HRV over different subintervals, you need to split the data first.
[11]:
ep.hrv_process(ep, "Data", index="Vp01", index_name="subject_id")
[11]:
HRV_MeanNN | HRV_SDNN | HRV_SDANN1 | HRV_SDNNI1 | HRV_SDANN2 | HRV_SDNNI2 | HRV_SDANN5 | HRV_SDNNI5 | HRV_RMSSD | HRV_SDSD | ... | HRV_SampEn | HRV_ShanEn | HRV_FuzzyEn | HRV_MSEn | HRV_CMSEn | HRV_RCMSEn | HRV_CD | HRV_HFD | HRV_KFD | HRV_LZC | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
subject_id | |||||||||||||||||||||
Vp01 | 702.266876 | 59.242368 | 40.538539 | 45.273643 | 16.463248 | 55.552245 | NaN | NaN | 36.06283 | 36.08348 | ... | 1.46064 | 5.859088 | 1.072725 | 1.226666 | 1.344431 | 1.887899 | 1.773796 | 1.786412 | 3.944204 | 0.804974 |
1 rows × 81 columns
Example 2: 1 Dataset, Multiple Phases¶
This example illustrates the pipeline for one single dataset which contains data from multiple phases.
[13]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()
For this example, we use the example ECG Data and assume we want to split it into 4 phases (names: Preparation, Stress, Recovery, End) of 3 minutes.
[14]:
# provide list of edge times (start times of the phases and the total end time)
time_intervals = pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"])
# alternatively: provide dict with start-end times and names per phase
# time_intervals = {"Preparation": ("12:32", "12:35"), "Stress": ("12:35", "12:38"), "Recovery": ("12:38", "12:41")}
Process all Phases¶
[15]:
ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate, time_intervals=time_intervals)
ep.ecg_process()
Compute HRV parameters for each Phase¶
Using List Comprehension (EcgProcessor.hrv_process() for each phase) and concat the results into one dataframe using pd.concat().
[16]:
hrv_result = pd.concat([ep.hrv_process(ep, key=key, index=key) for key in ep.phases])
# alternatively: call EcgProcessor.hrv_batch_process()
# hrv_result = ep.hrv_batch_process()
Plot one Phase¶
[17]:
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Stress")
[18]:
fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, key="Stress")
Example 3: Multiple Subjects, Multiple Phases per Recording (Example Processing)¶
[19]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ecg_data_02, sampling_rate = bp.example_data.get_ecg_example_02()
For this example, we use two ECG example datasets, where the last phase (“Recovery”) differs in length
[20]:
subject_dict = {
"Vp01": {
"Data": ecg_data,
"Time": pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"]),
},
"Vp02": {
"Data": ecg_data_02,
# The last phase of Vp02 has a length of only 1 minute to demonstrate the functionality of cutting to equal length
"Time": pd.Series(["12:55", "12:58", "13:01", "13:02"], index=["Preparation", "Stress", "Recovery", "End"]),
},
}
Note: For the further steps of the Processing Pipeline, please refer to Example 4.
Example 4: Multiple Subjects, Multiple Phases per Recording (Batch Processing)¶
This example illustrates how to set up a proper data analysis pipeline to process multiple subjects in a loop. It consists of the following steps:
Get Time Information: Load Excel sheet with time information (to process multiple subjects in a loop).
Query Data: Iterate through a folder and search for all files with a specific file pattern optional: Iterate through a folder that contains subfolders for each subjects where data is stored optional: Extract Subject ID either from data filename or from folder name
Process Data:
Load ECG Dataset, split into phases based on time information from Excel sheet.
Perform ECG processing with outlier correction.
Compute Heart Rate for each subject and each phase.
Store and Export Intermediate Processing Results: Heart rate processing results are stored in a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects, and export processed heart rate data as well as R-peaks (needed for computing HRV parameters) per subject (as Excel files).
Further heart rate processing steps (such as resampling data, splitting data into subphases, normalizing data, computing aggregated results, computing HRV parameters, …) will be performed in ECG_Analysis_Example.ipynb.
Get Time Information¶
[21]:
df_time_log = bp.example_data.get_time_log_example()
# Alternatively: load your own file
# df_time_log = bp.io.load_time_log("<path-to-time-log-file>")
[22]:
df_time_log
[22]:
phase | Baseline | Intervention | Stress | Recovery | End | |
---|---|---|---|---|---|---|
subject | condition | |||||
Vp01 | Intervention | 12:32:05 | 12:33:10 | 12:35:00 | 12:39:45 | 12:42:00 |
Vp02 | Control | 12:54:50 | 12:55:50 | 12:57:55 | 13:00:00 | 13:02:10 |
Query Data¶
[23]:
# path to folder containing ECG raw data
ecg_base_path = bp.example_data.get_ecg_path_example()
# Alternatively: Use your own data
# ecg_base_path = Path("<path-to-ecg-data-folder>")
file_list = list(sorted(ecg_base_path.glob("*.bin")))
print(file_list)
[PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp01.bin'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp02.bin')]
Process Data¶
[24]:
from tqdm.auto import tqdm # to visualize for-loop progress
# folder to save processing results (comment to skip)
ecg_export_path = Path("./results/ecg")
# create directory and its parent directories (if it not already exists)
bp.utils.file_handling.mkdirs(ecg_export_path)
# dicionaries to store processing results
dict_hr_subjects = {}
dict_hrv_subjects = {}
# for loop to iterate though subjects
for file_path in tqdm(file_list, desc="Subjects"):
# optional: extract Subject ID from file name; multiple ways, e.g. using regex or by splitting the filename string
subject_id = file_path.name.split(".")[0].split("_")[-1]
# optional: if data is stored in subfolders: additional .glob() on file_path, get subject_id from folder name
# ecg_files = list(sorted(file_path.glob("*")))
# subject_id = file_path.name
# check if folder contains data
# if len(ecg_files) == 0:
# print("No data for subject {}.".format(subject_id))
# the next step depends on the file structure:
# if you only have one recording per subject: load this file
# df, fs = bp.io.load_dataset_nilspod(ecg_files[0])
# if you have more than one recording per subject: loop through them, load the files and e.g. put them into a dictionary
# load dataset
data, fs = bp.io.nilspod.load_dataset_nilspod(file_path)
ep = EcgProcessor(data=data, time_intervals=df_time_log.loc[subject_id], sampling_rate=256.0)
ep.ecg_process(title=subject_id)
# save ecg processing result into HR subject dict
dict_hr_subjects[subject_id] = ep.heart_rate
dict_hrv_subjects[subject_id] = ep.rpeaks
# save HR data and R-Peak data to files in export folder (comment to skip)
# bp.io.ecg.write_hr_phase_dict(ep.heart_rate, ecg_export_path.joinpath("hr_result_{}.xlsx".format(subject_id)))
# bp.io.ecg.write_hr_phase_dict(ep.rpeaks, ecg_export_path.joinpath("rpeaks_result_{}.xlsx".format(subject_id)))
Structure of the SubjectDataDict:
[25]:
display_dict_structure(dict_hr_subjects)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate | |
---|---|
time | |
2019-10-23 12:32:05.796875+02:00 | 80.842105 |
2019-10-23 12:32:07.222656+02:00 | 81.269841 |
2019-10-23 12:32:07.960937+02:00 | 78.769231 |
2019-10-23 12:32:08.722656+02:00 | 79.175258 |
2019-10-23 12:32:09.480468+02:00 | 83.934426 |
[26]:
display_dict_structure(dict_hrv_subjects)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 5)'
R_Peak_Quality | R_Peak_Idx | RR_Interval | R_Peak_Outlier | Heart_Rate | |
---|---|---|---|---|---|
time | |||||
2019-10-23 12:32:05.796875+02:00 | 0.649461 | 393.0 | 0.742188 | 1.0 | 80.842105 |
2019-10-23 12:32:07.222656+02:00 | 0.974285 | 569.0 | 0.738281 | 0.0 | 81.269841 |
2019-10-23 12:32:07.960937+02:00 | 0.801104 | 758.0 | 0.761719 | 0.0 | 78.769231 |
2019-10-23 12:32:08.722656+02:00 | 0.633372 | 953.0 | 0.757812 | 0.0 | 79.175258 |
2019-10-23 12:32:09.480468+02:00 | 0.966305 | 1147.0 | 0.714844 | 0.0 | 83.934426 |
[ ]: