ECG Processing Example

This example illustrates how to import and process electrocardiogram (ECG) data recorded per subject and how to save intermediate processing results so that further analysis can be performed (e.g., in ECG_Analysis_Example.ipynb or in Protocol_Example.ipynb).

Setup and Helper Functions

[1]:
from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2
[2]:
plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette
[2]:
[3]:
def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())

Example 1: 1 Dataset, 1 Phase

This example assumes that we have one dataset with only one phase, i.e. the dataset does not need to be split into multiple parts internally.

[4]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate)

Process ECG Signal

Calling ep.ecg_process() will perform R peak detection and perform outlier correcrtion with the default outlier correction pipeline.

[5]:
ep.ecg_process()

Optional: Use other outlier correction parameters

Calling ep.outlier_params_default() will list available outlier correction parameters and their default parameter. See the doumentation of ep.outlier_corrections() for further information.

[6]:
# List available outlier correction parameters and their default parameter. See the doumentation of 'EcgProcessor.outlier_corrections()' for further information.
# print(ep.outlier_params_default())
# ep.ecg_process(outlier_correction=['statistical_rr', 'statistical_rr_diff', 'physiological'], outlier_params={'statistical_rr': 2.576, 'statistical_rr_diff': 1.96, 'physiological': (50, 180)})

ECG and Heart Rate Result

[7]:
display_dict_structure(ep.ecg_result)
dict_keys(['Data'])
'Dataframe shape: (157790, 6)'
ECG_Raw ECG_Clean ECG_Quality ECG_R_Peaks R_Peak_Outlier Heart_Rate
time
2019-10-23 12:31:51.003906+02:00 0.0 -441.904597 0.0 0 0.0 83.883495
2019-10-23 12:31:51.007812+02:00 -1.0 -436.324537 0.0 0 0.0 83.883495
2019-10-23 12:31:51.011718+02:00 -3.0 -430.760248 0.0 0 0.0 83.883495
2019-10-23 12:31:51.015625+02:00 -4.0 -425.033019 0.0 0 0.0 83.883495
2019-10-23 12:31:51.019531+02:00 -5.0 -419.369665 0.0 0 0.0 83.883495
[8]:
# Get heart rate and print resulting heart rate
hr_data = ep.heart_rate["Data"]
hr_data.head()
[8]:
Heart_Rate
time
2019-10-23 12:31:51.582031+02:00 83.883495
2019-10-23 12:31:53.093750+02:00 88.275862
2019-10-23 12:31:53.773437+02:00 81.702128
2019-10-23 12:31:54.507812+02:00 79.585492
2019-10-23 12:31:55.261718+02:00 79.175258
[9]:
# Plot an overview of the acquired data - with ECG Signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data")
../../_images/examples__notebooks_ECG_Processing_Example_18_0.svg
[10]:
# Plot an overview of the acquired data - without ECG signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data", plot_ecg_signal=False)
../../_images/examples__notebooks_ECG_Processing_Example_19_0.svg

Note: See the documentation of ep.ecg_result(), ep.heart_rate() and plotting.ecg_plot() for further information.

Compute Heart Rate Variability (HRV)

Heart rate variability (HRV) is the physiological phenomenon of varying time intervals of consecutive heart beats. HRV is an important marker for the activity of the autonomic nervous system (ANS) since it can be used to assess the activities of the two brances of the ANS: The sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS).

The function EcgProcessor.hrv_process() computes HRV over the complete data. If you want to compute HRV over different subintervals, you need to split the data first.

[11]:
ep.hrv_process(ep, "Data", index="Vp01", index_name="subject_id")
[11]:
HRV_MeanNN HRV_SDNN HRV_SDANN1 HRV_SDNNI1 HRV_SDANN2 HRV_SDNNI2 HRV_SDANN5 HRV_SDNNI5 HRV_RMSSD HRV_SDSD ... HRV_SampEn HRV_ShanEn HRV_FuzzyEn HRV_MSEn HRV_CMSEn HRV_RCMSEn HRV_CD HRV_HFD HRV_KFD HRV_LZC
subject_id
Vp01 702.266876 59.242368 40.538539 45.273643 16.463248 55.552245 NaN NaN 36.06283 36.08348 ... 1.46064 5.859088 1.072725 1.226666 1.344431 1.887899 1.773796 1.786412 3.944204 0.804974

1 rows × 81 columns

Plot HRV Overview

[12]:
fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, "Data")
../../_images/examples__notebooks_ECG_Processing_Example_25_0.svg

Example 2: 1 Dataset, Multiple Phases

This example illustrates the pipeline for one single dataset which contains data from multiple phases.

[13]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()

For this example, we use the example ECG Data and assume we want to split it into 4 phases (names: Preparation, Stress, Recovery, End) of 3 minutes.

[14]:
# provide list of edge times (start times of the phases and the total end time)
time_intervals = pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"])
# alternatively: provide dict with start-end times and names per phase
# time_intervals = {"Preparation": ("12:32", "12:35"), "Stress": ("12:35", "12:38"), "Recovery": ("12:38", "12:41")}

Process all Phases

[15]:
ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate, time_intervals=time_intervals)
ep.ecg_process()

Compute HRV parameters for each Phase

Using List Comprehension (EcgProcessor.hrv_process() for each phase) and concat the results into one dataframe using pd.concat().

[16]:
hrv_result = pd.concat([ep.hrv_process(ep, key=key, index=key) for key in ep.phases])
# alternatively: call EcgProcessor.hrv_batch_process()
# hrv_result = ep.hrv_batch_process()

Plot one Phase

[17]:
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Stress")
../../_images/examples__notebooks_ECG_Processing_Example_37_0.svg
[18]:
fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, key="Stress")
../../_images/examples__notebooks_ECG_Processing_Example_38_0.svg

Example 3: Multiple Subjects, Multiple Phases per Recording (Example Processing)

[19]:
ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ecg_data_02, sampling_rate = bp.example_data.get_ecg_example_02()

For this example, we use two ECG example datasets, where the last phase (“Recovery”) differs in length

[20]:
subject_dict = {
    "Vp01": {
        "Data": ecg_data,
        "Time": pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"]),
    },
    "Vp02": {
        "Data": ecg_data_02,
        # The last phase of Vp02 has a length of only 1 minute to demonstrate the functionality of cutting to equal length
        "Time": pd.Series(["12:55", "12:58", "13:01", "13:02"], index=["Preparation", "Stress", "Recovery", "End"]),
    },
}

Note: For the further steps of the Processing Pipeline, please refer to Example 4.

Example 4: Multiple Subjects, Multiple Phases per Recording (Batch Processing)

This example illustrates how to set up a proper data analysis pipeline to process multiple subjects in a loop. It consists of the following steps:

  1. Get Time Information: Load Excel sheet with time information (to process multiple subjects in a loop).

  2. Query Data: Iterate through a folder and search for all files with a specific file pattern optional: Iterate through a folder that contains subfolders for each subjects where data is stored optional: Extract Subject ID either from data filename or from folder name

  3. Process Data:

    1. Load ECG Dataset, split into phases based on time information from Excel sheet.

    2. Perform ECG processing with outlier correction.

    3. Compute Heart Rate for each subject and each phase.

    4. Store and Export Intermediate Processing Results: Heart rate processing results are stored in a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects, and export processed heart rate data as well as R-peaks (needed for computing HRV parameters) per subject (as Excel files).

Further heart rate processing steps (such as resampling data, splitting data into subphases, normalizing data, computing aggregated results, computing HRV parameters, …) will be performed in ECG_Analysis_Example.ipynb.

Get Time Information

[21]:
df_time_log = bp.example_data.get_time_log_example()
# Alternatively: load your own file
# df_time_log = bp.io.load_time_log("<path-to-time-log-file>")
[22]:
df_time_log
[22]:
phase Baseline Intervention Stress Recovery End
subject condition
Vp01 Intervention 12:32:05 12:33:10 12:35:00 12:39:45 12:42:00
Vp02 Control 12:54:50 12:55:50 12:57:55 13:00:00 13:02:10

Query Data

[23]:
# path to folder containing ECG raw data
ecg_base_path = bp.example_data.get_ecg_path_example()
# Alternatively: Use your own data
# ecg_base_path = Path("<path-to-ecg-data-folder>")
file_list = list(sorted(ecg_base_path.glob("*.bin")))
print(file_list)
[PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp01.bin'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp02.bin')]

Process Data

[24]:
from tqdm.auto import tqdm  # to visualize for-loop progress

# folder to save processing results (comment to skip)
ecg_export_path = Path("./results/ecg")
# create directory and its parent directories (if it not already exists)
bp.utils.file_handling.mkdirs(ecg_export_path)

# dicionaries to store processing results
dict_hr_subjects = {}
dict_hrv_subjects = {}

# for loop to iterate though subjects
for file_path in tqdm(file_list, desc="Subjects"):
    # optional: extract Subject ID from file name; multiple ways, e.g. using regex or by splitting the filename string
    subject_id = file_path.name.split(".")[0].split("_")[-1]

    # optional: if data is stored in subfolders: additional .glob() on file_path, get subject_id from folder name
    # ecg_files = list(sorted(file_path.glob("*")))
    # subject_id = file_path.name

    # check if folder contains data
    # if len(ecg_files) == 0:
    # print("No data for subject {}.".format(subject_id))

    # the next step depends on the file structure:
    # if you only have one recording per subject: load this file
    # df, fs = bp.io.load_dataset_nilspod(ecg_files[0])
    # if you have more than one recording per subject: loop through them, load the files and e.g. put them into a dictionary

    # load dataset
    data, fs = bp.io.nilspod.load_dataset_nilspod(file_path)

    ep = EcgProcessor(data=data, time_intervals=df_time_log.loc[subject_id], sampling_rate=256.0)
    ep.ecg_process(title=subject_id)

    # save ecg processing result into HR subject dict
    dict_hr_subjects[subject_id] = ep.heart_rate
    dict_hrv_subjects[subject_id] = ep.rpeaks

    # save HR data and R-Peak data to files in export folder (comment to skip)
    # bp.io.ecg.write_hr_phase_dict(ep.heart_rate, ecg_export_path.joinpath("hr_result_{}.xlsx".format(subject_id)))
    # bp.io.ecg.write_hr_phase_dict(ep.rpeaks, ecg_export_path.joinpath("rpeaks_result_{}.xlsx".format(subject_id)))

Structure of the SubjectDataDict:

[25]:
display_dict_structure(dict_hr_subjects)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 1)'
Heart_Rate
time
2019-10-23 12:32:05.796875+02:00 80.842105
2019-10-23 12:32:07.222656+02:00 81.269841
2019-10-23 12:32:07.960937+02:00 78.769231
2019-10-23 12:32:08.722656+02:00 79.175258
2019-10-23 12:32:09.480468+02:00 83.934426
[26]:
display_dict_structure(dict_hrv_subjects)
dict_keys(['Vp01', 'Vp02'])
dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])
'Dataframe shape: (91, 5)'
R_Peak_Quality R_Peak_Idx RR_Interval R_Peak_Outlier Heart_Rate
time
2019-10-23 12:32:05.796875+02:00 0.649461 393.0 0.742188 1.0 80.842105
2019-10-23 12:32:07.222656+02:00 0.974285 569.0 0.738281 0.0 81.269841
2019-10-23 12:32:07.960937+02:00 0.801104 758.0 0.761719 0.0 78.769231
2019-10-23 12:32:08.722656+02:00 0.633372 953.0 0.757812 0.0 79.175258
2019-10-23 12:32:09.480468+02:00 0.966305 1147.0 0.714844 0.0 83.934426
[ ]:

Download Notebook
(Right-Click -> Save Link As...)