biopsykit.io package

Module providing input/output functions.

biopsykit.io.load_long_format_csv(file_path, index_cols=None)

Load dataframe stored as long-format from file.

Parameters

file_path (Path or str) – path to file. Must be a csv file
index_cols (str or list of str, optional) – column name (or list of such) of index columns to be used as MultiIndex in the resulting long-format dataframe or None to use all columns except the last one as index columns. Default: None

Returns

dataframe in long-format

Return type

DataFrame

biopsykit.io.load_time_log(file_path, subject_col=None, condition_col=None, additional_index_cols=None, phase_cols=None, continuous_time=True, **kwargs)

Load time log information from file.

This function can be used to load a file containing “time logs”, i.e., information about start and stop times of recordings or recording phases per subject.

Parameters

file_path (Path or str) – path to time log file. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
phase_cols (list of str or dict, optional) – list of column names that contain time log information or None to use all columns. If the column names of the time log dataframe should have different names than the columns in the file, a dict specifying the mapping (column_name : new_column_name) can be passed. Default: None
continuous_time (bool, optional) – flag indicating whether phases are continuous, i.e., whether the end of the previous phase is also the beginning of the next phase or not. Default: True. If continuous_time is set to False, the start and end columns of all phases must have the suffixes “_start” and “_end”, respectively
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

dataframe with time log information

Return type

DataFrame

Raises

FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
ValidationError – if continuous_time is False, but “start” and “end” time columns of each phase do not match or none of these columns were found in the dataframe

Examples

>>> import biopsykit as bp
>>> file_path = "./timelog.csv"
>>> # Example 1:
>>> # load time log file into a pandas dataframe
>>> data = bp.io.load_time_log(file_path)
>>> # Example 2:
>>> # load time log file into a pandas dataframe and specify the "ID" column
>>> # (instead of the default "subject" column) in the time log file to be the index of the dataframe
>>> data = bp.io.load_time_log(file_path, subject_col="ID")
>>> # Example 3:
>>> # load time log file into a pandas dataframe and specify the columns "Phase1", "Phase2", and "Phase3"
>>> # to be used for extracting time information
>>> data = bp.io.load_time_log(
...     file_path, phase_cols=["Phase1", "Phase2", "Phase3"]
... )
>>> # Example 4:
>>> # load time log file into a pandas dataframe and specify the column "ID" as subject column, the column "Group"
>>> # as condition column, as well as the column "Time" as additional index column.
>>> data = bp.io.load_time_log(
...     file_path,
...     subject_col="ID",
...     condition_col="Group",
...     additional_index_cols=["Time"],
...     phase_cols=["Phase1", "Phase2", "Phase3"],
... )
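
The column layout expected for `continuous_time=False` can be sketched without the library. The helper below is hypothetical and is not how BioPsyKit performs its check; it only illustrates the documented rule that every phase needs one column with the suffix "_start" and one with the suffix "_end". A plain `ValueError` stands in for BioPsyKit's `ValidationError`.

```python
def group_phase_columns(columns):
    """Group "<phase>_start" / "<phase>_end" column names by phase.

    Raises ValueError (a stand-in for ValidationError) if no such columns
    exist or if a phase lacks its start or end column.
    """
    starts, ends = [], []
    for col in columns:
        if col.endswith("_start"):
            starts.append(col[: -len("_start")])
        elif col.endswith("_end"):
            ends.append(col[: -len("_end")])
    if not starts and not ends:
        raise ValueError("no '_start'/'_end' columns found")
    if sorted(starts) != sorted(ends):
        unmatched = sorted(set(starts) ^ set(ends))
        raise ValueError(f"phases without matching start/end column: {unmatched}")
    # Keep the order in which the phases appear in the file.
    return {phase: (f"{phase}_start", f"{phase}_end") for phase in starts}


phases = group_phase_columns(
    ["Baseline_start", "Baseline_end", "Stress_start", "Stress_end"]
)
print(phases)
```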

biopsykit.io.load_atimelogger_file(file_path, timezone=None)

Load time log file exported from the aTimeLogger app.

The resulting dataframe will have one row and start and end times of the single phases as columns.

Parameters

file_path (Path or str) – path to time log file. Must be a csv file
timezone (str or datetime.tzinfo, optional) – timezone of the time logs, either as string or as tzinfo object. Default: ‘Europe/Berlin’

Returns

time log dataframe

Return type

DataFrame

See also

convert_time_log_datetime(): convert timelog dataframe into dictionary

aTimeLogger app

biopsykit.io.load_subject_condition_list(file_path, subject_col=None, condition_col=None, return_dict=False, **kwargs)

Load subject condition assignment from file.

This function can be used to load a file that contains the assignment of subject IDs to study conditions. It will return a dataframe or a dictionary that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named subject and the condition column will be named condition.

Parameters

file_path (Path or str) – path to the file containing the subject condition assignment. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
return_dict (bool, optional) – whether to return a dict with subject IDs per condition (True) or a dataframe (False). Default: False
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

SubjectConditionDataFrame or SubjectConditionDict – a standardized pandas dataframe with subject IDs and condition assignments (if return_dict is False) or a standardized dict with subject IDs per group (if return_dict is True)

Raises

FileExtensionError – if file is not a csv or Excel file
ValidationError – if result is not a SubjectConditionDataFrame or a SubjectConditionDict

Return type

Union[biopsykit.utils.datatype_helper._SubjectConditionDataFrame, pandas.core.frame.DataFrame, Dict[str, numpy.ndarray]]
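
The two return shapes can be illustrated with a standard-library sketch. The helper `load_condition_assignment` and the sample data are hypothetical; the real function returns pandas-based objects, so this only mirrors the renaming to `subject` and `condition` and the grouping of subject IDs per condition when `return_dict` is True.

```python
import csv
import io

# Hypothetical file whose columns do not follow the naming convention yet.
CSV_TEXT = """ID,Group
S01,control
S02,stress
S03,control
"""


def load_condition_assignment(
    csv_text, subject_col="subject", condition_col="condition", return_dict=False
):
    """Return renamed rows, or a dict of subject IDs per condition."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rename the file's columns to the convention names "subject" and "condition".
    rows = [
        {"subject": row[subject_col], "condition": row[condition_col]}
        for row in reader
    ]
    if not return_dict:
        return rows
    per_condition = {}
    for row in rows:
        per_condition.setdefault(row["condition"], []).append(row["subject"])
    return per_condition


condition_dict = load_condition_assignment(
    CSV_TEXT, subject_col="ID", condition_col="Group", return_dict=True
)
print(condition_dict)
```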

biopsykit.io.load_questionnaire_data(file_path, subject_col=None, condition_col=None, additional_index_cols=None, replace_missing_vals=True, remove_nan_rows=True, sheet_name=0, **kwargs)

Load questionnaire data from file.

The resulting dataframe will comply with BioPsyKit’s naming conventions, i.e., the subject ID index will be named subject and a potential condition index will be named condition.

Parameters

file_path (Path or str) – path to questionnaire data file. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
replace_missing_vals (bool, optional) – True to replace encoded “missing values” from software like SPSS (e.g. -77, -99, or -66) with “actual” missing values (NaN). Default: True
remove_nan_rows (bool, optional) – True to remove rows that only contain NaN values (except the index cols), False to keep NaN rows. Default: True
sheet_name (str or int, optional) – sheet_name identifier (str) or sheet_name index (int) if file is an Excel file. Default: 0 (i.e. first sheet in Excel file)

Returns

dataframe with imported questionnaire data

Return type

DataFrame

Raises

FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
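
How `replace_missing_vals` and `remove_nan_rows` interact can be sketched on plain Python data. The helper `clean_rows` is hypothetical and not BioPsyKit code. The set of codes is an assumption: it holds only the three example values named in the parameter description, and the library may treat further codes as missing.

```python
import math

# Assumption: only the example codes from the parameter description.
MISSING_CODES = {-66, -77, -99}


def clean_rows(rows, replace_missing_vals=True, remove_nan_rows=True):
    """rows: list of (subject_id, values). Returns the cleaned list."""
    cleaned = []
    for subject, values in rows:
        if replace_missing_vals:
            # Encoded missing values become real NaN values.
            values = [math.nan if v in MISSING_CODES else v for v in values]
        # The subject ID plays the role of the index and is not inspected.
        all_nan = all(isinstance(v, float) and math.isnan(v) for v in values)
        if remove_nan_rows and all_nan:
            continue
        cleaned.append((subject, values))
    return cleaned


rows = [
    ("S01", [3, -99, 5]),
    ("S02", [-77, -99, -66]),  # only missing codes: dropped by default
    ("S03", [1, 2, 4]),
]
cleaned = clean_rows(rows)
print([subject for subject, _ in cleaned])
```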

biopsykit.io.load_pandas_dict_excel(file_path, index_col='time', timezone=None)

Load Excel file containing pandas dataframes with time series data of one subject.

Parameters

file_path (Path or str) – path to file
index_col (str, optional) – name of the index column of the dataframe or None if no index column is present. Default: “time”
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data for localization (since Excel does not support localized timestamps), either as string or as tzinfo object. Default: “Europe/Berlin”

Returns

dictionary with multiple pandas dataframes

Return type

dict

Raises

FileExtensionError – if file is no Excel file (“.xls” or “.xlsx”)

See also

write_pandas_dict_excel: Write dictionary with dataframes to file

biopsykit.io.load_codebook(file_path, **kwargs)

Load codebook from file.

A codebook is used to convert numerical values from a dataframe (e.g., from questionnaire data) to categorical values.

Parameters

file_path (Path or str) – file path to codebook
**kwargs – additional arguments to pass to pandas.read_csv() or pandas.read_excel()

Returns

CodebookDataFrame, a dataframe in a standardized format

Return type

DataFrame

See also

apply_codebook(): apply codebook to data

biopsykit.io.convert_time_log_datetime(time_log, dataset=None, df=None, date=None, timezone=None)

Convert the time log information into datetime objects.

This function converts time log information (containing only time, but no date) into datetime objects by adding the start date of the recording. To specify the recording date, either a NilsPod Dataset or a pandas dataframe with a DatetimeIndex must be supplied from which the recording date can be extracted. Alternatively, the date can be specified explicitly via the date parameter.

Parameters

time_log (DataFrame) – pandas dataframe with time log information
dataset (Dataset, optional) – NilsPod Dataset object from which to extract time and date information. Default: None
df (DataFrame, optional) – dataframe with DatetimeIndex from which to extract time and date information. Default: None
date (str or datetime, optional) – datetime object or date string used to convert time log information into datetime. If date is a string, it must be supplied in a common date format, e.g. “dd.mm.yyyy” or “dd/mm/yyyy”. Default: None
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data to convert, either as string or as tzinfo object. Default: “Europe/Berlin”

Returns

pandas dataframe with log time converted into datetime

Return type

DataFrame

Raises

ValueError – if none of dataset, df and date are supplied as argument, or if index of df is not a DatetimeIndex
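
The idea of attaching a recording date to time-only entries can be shown with the standard library alone. The helper `attach_date` is hypothetical and works on a plain dict rather than a dataframe. Two assumptions are made for the sketch: date strings use the "dd.mm.yyyy" format, and a fixed UTC+1 offset stands in for a named timezone such as "Europe/Berlin".

```python
from datetime import datetime, time, timedelta, timezone


def attach_date(time_log, date, tz):
    """Combine time-only entries with a recording date and a timezone."""
    if isinstance(date, str):
        # Assumed format for this sketch: "dd.mm.yyyy".
        date = datetime.strptime(date, "%d.%m.%Y")
    day = date.date()
    return {
        phase: datetime.combine(day, t, tzinfo=tz) for phase, t in time_log.items()
    }


# Fixed UTC+1 offset as a stand-in for a named timezone.
cet = timezone(timedelta(hours=1))
time_log = {"Baseline": time(9, 30), "Stress": time(9, 45)}
converted = attach_date(time_log, "24.01.2021", cet)
print(converted["Baseline"].isoformat())
```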

biopsykit.io.convert_time_log_dict(timelog, time_format='time')

Convert time log into dictionary.

The resulting dictionary will have the phase names as keys and a tuple with start and end times as values.

Parameters

timelog (DataFrame or Series) – dataframe or series containing timelog information
time_format ("str" or "time", optional) – “str” to convert entries in dictionary to string, “time” to keep them as time objects. Default: “time”

Returns

dictionary with start and end times of each phase

Return type

dict

See also

biopsykit.utils.data_processing.split_data(): split data based on time intervals
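
The shape of the resulting dictionary, and the effect of `time_format`, can be sketched as follows. The helper `time_log_to_dict` and its nested-dict input are hypothetical stand-ins for the dataframe or series the real function accepts; rendering times with `isoformat()` is likewise an assumption about what the "str" option produces.

```python
from datetime import time


def time_log_to_dict(time_log, time_format="time"):
    """{phase: {"start": t0, "end": t1}} -> {phase: (start, end)}."""
    if time_format not in ("str", "time"):
        raise ValueError("time_format must be 'str' or 'time'")
    result = {}
    for phase, bounds in time_log.items():
        start, end = bounds["start"], bounds["end"]
        if time_format == "str":
            # Assumption: strings are rendered as "HH:MM:SS".
            start, end = start.isoformat(), end.isoformat()
        result[phase] = (start, end)
    return result


time_log = {
    "Baseline": {"start": time(9, 30), "end": time(9, 45)},
    "Stress": {"start": time(9, 45), "end": time(10, 5)},
}
print(time_log_to_dict(time_log, time_format="str"))
```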

biopsykit.io.write_pandas_dict_excel(data_dict, file_path, index_col=True)

Write a dictionary with pandas dataframes to an Excel file.

Parameters

data_dict (dict) – dictionary with pandas dataframes
file_path (Path or str) – path to exported Excel file
index_col (bool, optional) – True to include dataframe index in Excel export, False otherwise. Default: True

Raises

FileExtensionError – if file_path is not an Excel file

biopsykit.io.write_result_dict(result_dict, file_path, index_name='subject')

Write dictionary with processing results (e.g. HR, HRV, RSA) to csv file.

The keys in the dictionary should be the subject IDs (or any other identifier), the values should be pandas dataframes. The name of the index level created from the dictionary keys can be specified by the index_name parameter.

The dictionary will be concatenated to one large dataframe which will then be saved as csv file.

Parameters

result_dict (dict) – Dictionary containing processing results for all subjects. The keys in the dictionary should be the Subject IDs (or any other identifier), the values should be pandas dataframes
file_path (Path or str) – path to file
index_name (str, optional) – name of the index resulting from concatenating dataframes. Default: subject

Raises

FileExtensionError – if file_path is not a csv or Excel file

Examples

>>> import pandas as pd
>>> from biopsykit.io import write_result_dict
>>>
>>> file_path = "./param_results.csv"
>>>
>>> dict_param_output = {
...     "S01": pd.DataFrame(),  # e.g., dataframe from mist_param_subphases
...     "S02": pd.DataFrame(),
...     # ...
... }
>>>
>>> write_result_dict(dict_param_output, file_path=file_path, index_name="subject")
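
The concatenation step can be pictured with a standard-library sketch. It is not the library's implementation: per-subject results are plain lists of row dicts instead of dataframes, the helper `concat_results` and the sample values are hypothetical, and an in-memory buffer replaces the csv file on disk.

```python
import csv
import io


def concat_results(result_dict, index_name="subject"):
    """Stack per-subject rows into one table with a leading index column."""
    rows = []
    for key, subject_rows in result_dict.items():
        for row in subject_rows:
            # The dictionary key becomes the value of the new index column.
            rows.append({index_name: key, **row})
    return rows


# Made-up values, for illustration only.
result_dict = {
    "S01": [{"phase": "Baseline", "HR": 62.1}, {"phase": "Stress", "HR": 88.4}],
    "S02": [{"phase": "Baseline", "HR": 70.3}],
}
rows = concat_results(result_dict, index_name="subject")

buffer = io.StringIO()  # in-memory stand-in for the csv file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```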

Submodules

biopsykit.classification.utils module
biopsykit.io.biomarker module