biopsykit.io.io module

Module containing different I/O functions to load time log data, subject condition lists, questionnaire data, etc.

biopsykit.io.io.load_long_format_csv(file_path, index_cols=None)

Load dataframe stored as long-format from file.

Parameters

file_path (Path or str) – path to file. Must be a csv file
index_cols (str or list of str, optional) – column name (or list of such) of index columns to be used as MultiIndex in the resulting long-format dataframe or None to use all columns except the last one as index columns. Default: None

Returns

dataframe in long-format

Return type

DataFrame
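
As a rough illustration of the index_cols behaviour described above, the following sketch uses plain pandas rather than BioPsyKit itself. The helper name load_long_format_sketch, the in-memory csv text, and the column names are assumptions made for this example; the library's actual implementation may differ.

```python
import io

import pandas as pd


def load_long_format_sketch(buffer, index_cols=None):
    """Hypothetical helper mimicking the documented index_cols handling."""
    data = pd.read_csv(buffer)
    if index_cols is None:
        # None: use all columns except the last one as index columns
        index_cols = list(data.columns[:-1])
    elif isinstance(index_cols, str):
        # a single column name is treated like a one-element list
        index_cols = [index_cols]
    return data.set_index(index_cols)


# made-up long-format data: one value column, two identifying columns
csv_text = "subject,phase,HR\nS01,Baseline,60.0\nS01,Stress,75.0\nS02,Baseline,58.0\n"
data = load_long_format_sketch(io.StringIO(csv_text))
print(data)
```

With index_cols=None, the sketch ends up with a MultiIndex built from "subject" and "phase" and a single remaining value column.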

biopsykit.io.io.load_time_log(file_path, subject_col=None, condition_col=None, additional_index_cols=None, phase_cols=None, continuous_time=True, **kwargs)

Load time log information from file.

This function can be used to load a file containing “time logs”, i.e., information about start and stop times of recordings or recording phases per subject.

Parameters

file_path (Path or str) – path to time log file. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str or list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
phase_cols (list of str or dict, optional) – list of column names that contain time log information or None to use all columns. If the column names of the time log dataframe should have different names than the columns in the file, a dict specifying the mapping (column_name : new_column_name) can be passed. Default: None
continuous_time (bool, optional) – flag indicating whether phases are continuous, i.e., whether the end of the previous phase is also the beginning of the next phase or not. Default: True. If continuous_time is set to False, the start and end columns of all phases must have the suffixes “_start” and “_end”, respectively
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

dataframe with time log information

Return type

DataFrame

Raises

FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
ValidationError – if continuous_time is False, but “start” and “end” time columns of each phase do not match or none of these columns were found in the dataframe

Examples

>>> import biopsykit as bp
>>> file_path = "./timelog.csv"
>>> # Example 1:
>>> # load time log file into a pandas dataframe
>>> data = bp.io.load_time_log(file_path)
>>> # Example 2:
>>> # load time log file into a pandas dataframe and specify the "ID" column
>>> # (instead of the default "subject" column) in the time log file to be the index of the dataframe
>>> data = bp.io.load_time_log(file_path, subject_col="ID")
>>> # Example 3:
>>> # load time log file into a pandas dataframe and specify the columns "Phase1", "Phase2", and "Phase3"
>>> # to be used for extracting time information
>>> data = bp.io.load_time_log(
...     file_path, phase_cols=["Phase1", "Phase2", "Phase3"]
... )
>>> # Example 4:
>>> # load time log file into a pandas dataframe and specify the column "ID" as subject column, the column "Group"
>>> # as condition column, as well as the column "Time" as additional index column.
>>> data = bp.io.load_time_log(
...     file_path,
...     subject_col="ID",
...     condition_col="Group",
...     additional_index_cols=["Time"],
...     phase_cols=["Phase1", "Phase2", "Phase3"]
... )
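
The "_start"/"_end" suffix requirement for continuous_time=False can be pictured with a small standard-library sketch. The function find_phases is hypothetical and not part of BioPsyKit, and ValueError stands in for the library's ValidationError; how the library performs this check internally is not shown in this documentation.

```python
def find_phases(columns):
    """Hypothetical check: every phase needs a "_start" and an "_end" column."""
    phases = {}
    for col in columns:
        for suffix in ("_start", "_end"):
            if col.endswith(suffix):
                # strip the suffix to get the phase name and remember which suffix was seen
                phases.setdefault(col[: -len(suffix)], set()).add(suffix)
    incomplete = [name for name, found in phases.items() if found != {"_start", "_end"}]
    if not phases or incomplete:
        # stand-in for the ValidationError described in the Raises section
        raise ValueError(f"Missing or unmatched start/end columns: {incomplete}")
    return list(phases)


phases = find_phases(["Phase1_start", "Phase1_end", "Phase2_start", "Phase2_end"])
print(phases)
```

A column list such as ["Phase1_start"] alone would be rejected by this sketch, because the matching "_end" column is missing.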

biopsykit.io.io.load_subject_condition_list(file_path, subject_col=None, condition_col=None, return_dict=False, **kwargs)

Load subject condition assignment from file.

This function can be used to load a file that contains the assignment of subject IDs to study conditions. It will return a dataframe or a dictionary that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named subject and the condition column will be named condition.

Parameters

file_path (Path or str) – path to file containing the subject condition assignment. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
return_dict (bool, optional) – whether to return a dict with subject IDs per condition (True) or a dataframe (False). Default: False
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

SubjectConditionDataFrame or SubjectConditionDict – a standardized pandas dataframe with subject IDs and condition assignments (if return_dict is False) or a standardized dict with subject IDs per group (if return_dict is True)

Raises

FileExtensionError – if file is not a csv or Excel file
ValidationError – if result is not a SubjectConditionDataFrame or a SubjectConditionDict

Return type

Union[biopsykit.utils.datatype_helper._SubjectConditionDataFrame, pandas.core.frame.DataFrame, Dict[str, numpy.ndarray]]
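
The two return shapes can be pictured with plain pandas. This is only a sketch of the naming convention and of the dataframe-versus-dict distinction; the csv content and the column names "ID" and "Group" are made up, and BioPsyKit itself is not called here.

```python
import io

import pandas as pd

# made-up condition list whose columns do not follow the naming convention yet
csv_text = "ID,Group\nS01,Control\nS02,Intervention\nS03,Control\n"
data = pd.read_csv(io.StringIO(csv_text))

# rename to the convention described above: "subject" index, "condition" column
data = data.rename(columns={"ID": "subject", "Group": "condition"}).set_index("subject")

# dict variant: one array of subject IDs per condition
condition_dict = {
    condition: group.index.to_numpy() for condition, group in data.groupby("condition")
}
print(data)
print(condition_dict)
```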

biopsykit.io.io.load_questionnaire_data(file_path, subject_col=None, condition_col=None, additional_index_cols=None, replace_missing_vals=True, remove_nan_rows=True, sheet_name=0, **kwargs)

Load questionnaire data from file.

The resulting dataframe will comply with BioPsyKit’s naming conventions, i.e., the subject ID index will be named subject and a potential condition index will be named condition.

Parameters

file_path (Path or str) – path to questionnaire data file. Must either be an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str or list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
replace_missing_vals (bool, optional) – True to replace encoded “missing values” from software like SPSS (e.g. -77, -99, or -66) with “actual” missing values (NaN). Default: True
remove_nan_rows (bool, optional) – True to remove rows that only contain NaN values (except the index cols), False to keep NaN rows. Default: True
sheet_name (str or int, optional) – sheet_name identifier (str) or sheet_name index (int) if file is an Excel file. Default: 0 (i.e. first sheet in Excel file)

Returns

dataframe with imported questionnaire data

Return type

DataFrame

Raises

FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
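
The effect of replace_missing_vals and remove_nan_rows can be approximated with plain pandas. In this sketch the questionnaire columns are invented, and the list of codes contains only the three values named above (-66, -77, -99); which codes the library actually replaces is an assumption here.

```python
import io

import pandas as pd

# made-up questionnaire export with SPSS-style missing value codes
csv_text = "subject,PSS_01,PSS_02\nS01,2,-99\nS02,-77,-66\nS03,1,3\n"
data = pd.read_csv(io.StringIO(csv_text), index_col="subject")

# step 1: turn encoded missing values into actual NaN values
cleaned = data.replace([-66, -77, -99], float("nan"))

# step 2: drop rows that contain only NaN values (the index is not considered)
cleaned = cleaned.dropna(how="all")
print(cleaned)
```

In this example the row of S02 disappears because both of its answers were coded as missing, while S01 is kept with one NaN entry.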

biopsykit.io.io.load_pandas_dict_excel(file_path, index_col='time', timezone=None)

Load Excel file containing pandas dataframes with time series data of one subject.

Parameters

file_path (Path or str) – path to file
index_col (str, optional) – name of index column of dataframe or None if no index column is present. Default: “time”
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data for localization (since Excel does not support localized timestamps), either as string or as tzinfo object. Default: “Europe/Berlin”

Returns

dictionary with multiple pandas dataframes

Return type

dict

Raises

FileExtensionError – if file is no Excel file (“.xls” or “.xlsx”)

See also

apply_codebook(): apply codebook to data
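
Because timestamps read from Excel carry no timezone, the localization step mentioned for the timezone parameter might look like the following pandas-only sketch. The dataframe content is invented, and no Excel file is read here; it only shows what "localizing" a timezone-naive "time" index means.

```python
import pandas as pd

# timezone-naive time index, as it would come out of an Excel sheet
index = pd.DatetimeIndex(["2021-03-01 10:00:00", "2021-03-01 10:00:01"], name="time")
data = pd.DataFrame({"Heart_Rate": [60.0, 61.0]}, index=index)

# attach the timezone without shifting the wall-clock time
data_localized = data.tz_localize("Europe/Berlin")
print(data_localized.index)
```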

biopsykit.io.io.convert_time_log_datetime(time_log, dataset=None, df=None, date=None, timezone=None)

Convert the time log information into datetime objects.

This function converts time log information (containing only time, but no date) into datetime objects by adding the start date of the recording. To specify the recording date, either a NilsPod Dataset or a pandas dataframe with a DatetimeIndex must be supplied, from which the recording date is extracted. Alternatively, the date can be specified explicitly via the date parameter.

Parameters

time_log (DataFrame) – pandas dataframe with time log information
dataset (Dataset, optional) – NilsPod Dataset object to extract time and date information from. Default: None
df (DataFrame, optional) – dataframe with DatetimeIndex to extract time and date information. Default: None
date (str or datetime, optional) – datetime object or date string used to convert time log information into datetime. If date is a string, it must be supplied in a common date format, e.g. “dd.mm.yyyy” or “dd/mm/yyyy”. Default: None
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data to convert, either as string or as tzinfo object. Default: “Europe/Berlin”

Returns

pandas dataframe with time log information converted into datetime

Return type

DataFrame

Raises

ValueError – if none of dataset, df and date are supplied as argument, or if index of df is not a DatetimeIndex
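
Conceptually, the conversion combines a recording date with each time-only entry and attaches a timezone. The sketch below does this with plain pandas for an explicitly given date; the time log content is invented, an ISO date string is used to avoid day/month ambiguity, and the library's own conversion logic may differ.

```python
import pandas as pd

# made-up time log: one row per subject, one column per phase, time of day only
time_log = pd.DataFrame(
    {"Baseline": ["10:00:00"], "Stress": ["10:15:00"]},
    index=pd.Index(["S01"], name="subject"),
)
date = "2021-03-01"  # assumed recording date
timezone = "Europe/Berlin"

# prepend the date to every time string, parse it, and localize the result
converted = pd.DataFrame(
    {
        phase: pd.to_datetime(date + " " + times).dt.tz_localize(timezone)
        for phase, times in time_log.items()
    }
)
print(converted)
```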

biopsykit.io.io.write_pandas_dict_excel(data_dict, file_path, index_col=True)

Write a dictionary with pandas dataframes to an Excel file.

Parameters

data_dict (dict) – dictionary with pandas dataframes
file_path (Path or str) – path to exported Excel file
index_col (bool, optional) – True to include dataframe index in Excel export, False otherwise. Default: True

Raises

FileExtensionError – if file_path is not an Excel file
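
Several functions in this module reject files with an unexpected extension. A check of that kind could be sketched with the standard library as follows. The function check_extension is hypothetical, ValueError stands in for the library's FileExtensionError, and the case-insensitive comparison is an assumption of this sketch.

```python
from pathlib import Path


def check_extension(file_path, expected_extensions):
    """Hypothetical stand-in for an extension check; not BioPsyKit's API."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in expected_extensions:
        # the library raises FileExtensionError; ValueError is used here instead
        raise ValueError(
            f"Expected one of {expected_extensions}, got '{suffix}' for '{file_path}'"
        )
    return True


is_excel = check_extension("./results.xlsx", [".xls", ".xlsx"])
print(is_excel)
```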

biopsykit.io.io.write_result_dict(result_dict, file_path, index_name='subject')

Write dictionary with processing results (e.g. HR, HRV, RSA) to csv file.

The keys in the dictionary should be the subject IDs (or any other identifier), the values should be pandas dataframes. The name of the index level created from the dictionary keys can be specified by the index_name parameter.

The dictionary will be concatenated to one large dataframe which will then be saved as csv file.

Parameters

result_dict (dict) – Dictionary containing processing results for all subjects. The keys in the dictionary should be the Subject IDs (or any other identifier), the values should be pandas dataframes
file_path (Path or str) – path to file
index_name (str, optional) – name of the index resulting from concatenating dataframes. Default: subject

Raises

FileExtensionError – if file_path is not a csv or Excel file

Examples

>>> import pandas as pd
>>> from biopsykit.io import write_result_dict
>>>
>>> file_path = "./param_results.csv"
>>>
>>> dict_param_output = {
...     "S01": pd.DataFrame(),  # e.g., dataframe from mist_param_subphases
...     "S02": pd.DataFrame(),
...     # ...
... }
>>>
>>> write_result_dict(dict_param_output, file_path=file_path, index_name="subject")
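
The concatenation step described above can be reproduced with plain pandas to see what the exported csv roughly looks like. This is a sketch only: the per-subject dataframes are invented, the output is written to an in-memory buffer instead of a file, and write_result_dict itself is not called.

```python
import io

import pandas as pd

# made-up per-subject results, each indexed by phase
result_dict = {
    "S01": pd.DataFrame(
        {"HR": [60.0, 75.0]}, index=pd.Index(["Baseline", "Stress"], name="phase")
    ),
    "S02": pd.DataFrame(
        {"HR": [58.0, 80.0]}, index=pd.Index(["Baseline", "Stress"], name="phase")
    ),
}

# dictionary keys become a new outer index level named "subject"
data = pd.concat(result_dict, names=["subject"])

# export to csv (in memory here, so the example leaves no file behind)
buffer = io.StringIO()
data.to_csv(buffer)
print(buffer.getvalue())
```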
