ECG Processing Example — BioPsyKit 0.10.2 documentation

Setup and Helper Functions¶

[1]:

from pathlib import Path

import re

import pandas as pd
import numpy as np

from fau_colors import cmaps
import biopsykit as bp
from biopsykit.signals.ecg import EcgProcessor

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%load_ext autoreload
%autoreload 2

[2]:

plt.close("all")

palette = sns.color_palette(cmaps.faculties)
sns.set_theme(context="notebook", style="ticks", font="sans-serif", palette=palette)

plt.rcParams["figure.figsize"] = (8, 4)
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["mathtext.default"] = "regular"

palette

[2]:

[3]:

def display_dict_structure(dict_to_display):
    _display_dict_recursive(dict_to_display)


def _display_dict_recursive(dict_to_display):
    if isinstance(dict_to_display, dict):
        display(dict_to_display.keys())
        _display_dict_recursive(list(dict_to_display.values())[0])
    else:
        display("Dataframe shape: {}".format(dict_to_display.shape))
        display(dict_to_display.head())

Example 1: 1 Dataset, 1 Phase¶

This example assumes that we have one dataset with only one phase, i.e. the dataset does not need to be split into multiple parts internally.

[4]:

ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate)

Process ECG Signal¶

Calling ep.ecg_process() will perform R peak detection and perform outlier correcrtion with the default outlier correction pipeline.

[5]:

ep.ecg_process()

Optional: Use other outlier correction parameters¶

Calling ep.outlier_params_default() will list available outlier correction parameters and their default parameter. See the doumentation of ep.outlier_corrections() for further information.

[6]:

# List available outlier correction parameters and their default parameter. See the doumentation of 'EcgProcessor.outlier_corrections()' for further information.
# print(ep.outlier_params_default())
# ep.ecg_process(outlier_correction=['statistical_rr', 'statistical_rr_diff', 'physiological'], outlier_params={'statistical_rr': 2.576, 'statistical_rr_diff': 1.96, 'physiological': (50, 180)})

ECG and Heart Rate Result¶

[7]:

display_dict_structure(ep.ecg_result)

dict_keys(['Data'])

'Dataframe shape: (157790, 6)'

	ECG_Raw	ECG_Clean	ECG_Quality	ECG_R_Peaks	R_Peak_Outlier	Heart_Rate
time
2019-10-23 12:31:51.003906+02:00	0.0	-441.904597	0.0	0	0.0	83.883495
2019-10-23 12:31:51.007812+02:00	-1.0	-436.324537	0.0	0	0.0	83.883495
2019-10-23 12:31:51.011718+02:00	-3.0	-430.760248	0.0	0	0.0	83.883495
2019-10-23 12:31:51.015625+02:00	-4.0	-425.033019	0.0	0	0.0	83.883495
2019-10-23 12:31:51.019531+02:00	-5.0	-419.369665	0.0	0	0.0	83.883495

[8]:

# Get heart rate and print resulting heart rate
hr_data = ep.heart_rate["Data"]
hr_data.head()

[8]:

	Heart_Rate
time
2019-10-23 12:31:51.582031+02:00	83.883495
2019-10-23 12:31:53.093750+02:00	88.275862
2019-10-23 12:31:53.773437+02:00	81.702128
2019-10-23 12:31:54.507812+02:00	79.585492
2019-10-23 12:31:55.261718+02:00	79.175258

[9]:

# Plot an overview of the acquired data - with ECG Signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data")

../../_images/examples__notebooks_ECG_Processing_Example_18_0.svg

[10]:

# Plot an overview of the acquired data - without ECG signal
fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Data", plot_ecg_signal=False)

../../_images/examples__notebooks_ECG_Processing_Example_19_0.svg

Note: See the documentation of ep.ecg_result(), ep.heart_rate() and plotting.ecg_plot() for further information.

Compute Heart Rate Variability (HRV)¶

Heart rate variability (HRV) is the physiological phenomenon of varying time intervals of consecutive heart beats. HRV is an important marker for the activity of the autonomic nervous system (ANS) since it can be used to assess the activities of the two brances of the ANS: The sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS).

The function EcgProcessor.hrv_process() computes HRV over the complete data. If you want to compute HRV over different subintervals, you need to split the data first.

[11]:

ep.hrv_process(ep, "Data", index="Vp01", index_name="subject_id")

[11]:

	HRV_MeanNN	HRV_SDNN	HRV_SDANN1	HRV_SDNNI1	HRV_SDANN2	HRV_SDNNI2	HRV_SDANN5	HRV_SDNNI5	HRV_RMSSD	HRV_SDSD	...	HRV_SampEn	HRV_ShanEn	HRV_FuzzyEn	HRV_MSEn	HRV_CMSEn	HRV_RCMSEn	HRV_CD	HRV_HFD	HRV_KFD	HRV_LZC
subject_id
Vp01	702.266876	59.242368	40.538539	45.273643	16.463248	55.552245	NaN	NaN	36.06283	36.08348	...	1.46064	5.859088	1.072725	1.226666	1.344431	1.887899	1.773796	1.786412	3.944204	0.804974

1 rows × 81 columns

Plot HRV Overview¶

[12]:

fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, "Data")

../../_images/examples__notebooks_ECG_Processing_Example_25_0.svg

Example 2: 1 Dataset, Multiple Phases¶

This example illustrates the pipeline for one single dataset which contains data from multiple phases.

[13]:

ecg_data, sampling_rate = bp.example_data.get_ecg_example()

For this example, we use the example ECG Data and assume we want to split it into 4 phases (names: Preparation, Stress, Recovery, End) of 3 minutes.

[14]:

# provide list of edge times (start times of the phases and the total end time)
time_intervals = pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"])
# alternatively: provide dict with start-end times and names per phase
# time_intervals = {"Preparation": ("12:32", "12:35"), "Stress": ("12:35", "12:38"), "Recovery": ("12:38", "12:41")}

Process all Phases¶

[15]:

ep = EcgProcessor(data=ecg_data, sampling_rate=sampling_rate, time_intervals=time_intervals)
ep.ecg_process()

Compute HRV parameters for each Phase¶

Using List Comprehension (EcgProcessor.hrv_process() for each phase) and concat the results into one dataframe using pd.concat().

[16]:

hrv_result = pd.concat([ep.hrv_process(ep, key=key, index=key) for key in ep.phases])
# alternatively: call EcgProcessor.hrv_batch_process()
# hrv_result = ep.hrv_batch_process()

Plot one Phase¶

[17]:

fig, axs = bp.signals.ecg.plotting.ecg_plot(ep, key="Stress")

../../_images/examples__notebooks_ECG_Processing_Example_37_0.svg

[18]:

fig, axs = bp.signals.ecg.plotting.hrv_plot(ep, key="Stress")

../../_images/examples__notebooks_ECG_Processing_Example_38_0.svg

Example 3: Multiple Subjects, Multiple Phases per Recording (Example Processing)¶

[19]:

ecg_data, sampling_rate = bp.example_data.get_ecg_example()
ecg_data_02, sampling_rate = bp.example_data.get_ecg_example_02()

For this example, we use two ECG example datasets, where the last phase (“Recovery”) differs in length

[20]:

subject_dict = {
    "Vp01": {
        "Data": ecg_data,
        "Time": pd.Series(["12:32", "12:35", "12:38", "12:41"], index=["Preparation", "Stress", "Recovery", "End"]),
    },
    "Vp02": {
        "Data": ecg_data_02,
        # The last phase of Vp02 has a length of only 1 minute to demonstrate the functionality of cutting to equal length
        "Time": pd.Series(["12:55", "12:58", "13:01", "13:02"], index=["Preparation", "Stress", "Recovery", "End"]),
    },
}

Note: For the further steps of the Processing Pipeline, please refer to Example 4.

Example 4: Multiple Subjects, Multiple Phases per Recording (Batch Processing)¶

This example illustrates how to set up a proper data analysis pipeline to process multiple subjects in a loop. It consists of the following steps:

Get Time Information: Load Excel sheet with time information (to process multiple subjects in a loop).
Query Data: Iterate through a folder and search for all files with a specific file pattern optional: Iterate through a folder that contains subfolders for each subjects where data is stored optional: Extract Subject ID either from data filename or from folder name
Process Data:
1. Load ECG Dataset, split into phases based on time information from Excel sheet.
2. Perform ECG processing with outlier correction.
3. Compute Heart Rate for each subject and each phase.
4. Store and Export Intermediate Processing Results: Heart rate processing results are stored in a SubjectDataDict, a special nested dictionary structure that contains processed data of all subjects, and export processed heart rate data as well as R-peaks (needed for computing HRV parameters) per subject (as Excel files).

Further heart rate processing steps (such as resampling data, splitting data into subphases, normalizing data, computing aggregated results, computing HRV parameters, …) will be performed in ECG_Analysis_Example.ipynb.

Get Time Information¶

[21]:

df_time_log = bp.example_data.get_time_log_example()
# Alternatively: load your own file
# df_time_log = bp.io.load_time_log("<path-to-time-log-file>")

[22]:

df_time_log

[22]:

	phase	Baseline	Intervention	Stress	Recovery	End
subject	condition
Vp01	Intervention	12:32:05	12:33:10	12:35:00	12:39:45	12:42:00
Vp02	Control	12:54:50	12:55:50	12:57:55	13:00:00	13:02:10

Query Data¶

[23]:

# path to folder containing ECG raw data
ecg_base_path = bp.example_data.get_ecg_path_example()
# Alternatively: Use your own data
# ecg_base_path = Path("<path-to-ecg-data-folder>")
file_list = list(sorted(ecg_base_path.glob("*.bin")))
print(file_list)

[PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp01.bin'), PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/biopsykit/checkouts/latest/example_data/ecg/ecg_sample_Vp02.bin')]

Process Data¶

[24]:

from tqdm.auto import tqdm  # to visualize for-loop progress

# folder to save processing results (comment to skip)
ecg_export_path = Path("./results/ecg")
# create directory and its parent directories (if it not already exists)
bp.utils.file_handling.mkdirs(ecg_export_path)

# dicionaries to store processing results
dict_hr_subjects = {}
dict_hrv_subjects = {}

# for loop to iterate though subjects
for file_path in tqdm(file_list, desc="Subjects"):
    # optional: extract Subject ID from file name; multiple ways, e.g. using regex or by splitting the filename string
    subject_id = file_path.name.split(".")[0].split("_")[-1]

    # optional: if data is stored in subfolders: additional .glob() on file_path, get subject_id from folder name
    # ecg_files = list(sorted(file_path.glob("*")))
    # subject_id = file_path.name

    # check if folder contains data
    # if len(ecg_files) == 0:
    # print("No data for subject {}.".format(subject_id))

    # the next step depends on the file structure:
    # if you only have one recording per subject: load this file
    # df, fs = bp.io.load_dataset_nilspod(ecg_files[0])
    # if you have more than one recording per subject: loop through them, load the files and e.g. put them into a dictionary

    # load dataset
    data, fs = bp.io.nilspod.load_dataset_nilspod(file_path)

    ep = EcgProcessor(data=data, time_intervals=df_time_log.loc[subject_id], sampling_rate=256.0)
    ep.ecg_process(title=subject_id)

    # save ecg processing result into HR subject dict
    dict_hr_subjects[subject_id] = ep.heart_rate
    dict_hrv_subjects[subject_id] = ep.rpeaks

    # save HR data and R-Peak data to files in export folder (comment to skip)
    # bp.io.ecg.write_hr_phase_dict(ep.heart_rate, ecg_export_path.joinpath("hr_result_{}.xlsx".format(subject_id)))
    # bp.io.ecg.write_hr_phase_dict(ep.rpeaks, ecg_export_path.joinpath("rpeaks_result_{}.xlsx".format(subject_id)))

Structure of the SubjectDataDict:

[25]:

display_dict_structure(dict_hr_subjects)

dict_keys(['Vp01', 'Vp02'])

dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])

'Dataframe shape: (91, 1)'

	Heart_Rate
time
2019-10-23 12:32:05.796875+02:00	80.842105
2019-10-23 12:32:07.222656+02:00	81.269841
2019-10-23 12:32:07.960937+02:00	78.769231
2019-10-23 12:32:08.722656+02:00	79.175258
2019-10-23 12:32:09.480468+02:00	83.934426

[26]:

display_dict_structure(dict_hrv_subjects)

dict_keys(['Vp01', 'Vp02'])

dict_keys(['Baseline', 'Intervention', 'Stress', 'Recovery'])

'Dataframe shape: (91, 5)'

	R_Peak_Quality	R_Peak_Idx	RR_Interval	R_Peak_Outlier	Heart_Rate
time
2019-10-23 12:32:05.796875+02:00	0.649461	393.0	0.742188	1.0	80.842105
2019-10-23 12:32:07.222656+02:00	0.974285	569.0	0.738281	0.0	81.269841
2019-10-23 12:32:07.960937+02:00	0.801104	758.0	0.761719	0.0	78.769231
2019-10-23 12:32:08.722656+02:00	0.633372	953.0	0.757812	0.0	79.175258
2019-10-23 12:32:09.480468+02:00	0.966305	1147.0	0.714844	0.0	83.934426

[ ]:

ECG Processing Example¶