biopsykit.io.io module
Module containing different I/O functions to load time log data, subject condition lists, questionnaire data, etc.
- biopsykit.io.io.load_long_format_csv(file_path, index_cols=None)
Load dataframe stored as long-format from file.
- Parameters
- Returns
dataframe in long-format
- Return type
- biopsykit.io.io.load_time_log(file_path, subject_col=None, condition_col=None, additional_index_cols=None, phase_cols=None, continuous_time=True, **kwargs)
Load time log information from file.
This function can be used to load a file containing “time logs”, i.e., information about start and stop times of recordings or recording phases per subject.
- Parameters
file_path (Path or str) – path to time log file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use the default column name “subject”. According to BioPsyKit’s convention, the subject ID column is expected to have the name “subject”. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use the default column name “condition”. According to BioPsyKit’s convention, the condition column is expected to have the name “condition”. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
phase_cols (list of str or dict, optional) – list of column names that contain time log information or None to use all columns. If the columns of the time log dataframe should have different names than the columns in the file, a dict specifying the mapping (column_name : new_column_name) can be passed. Default: None
continuous_time (bool, optional) – flag indicating whether phases are continuous, i.e., whether the end of the previous phase is also the beginning of the next phase or not. Default: True. If continuous_time is set to False, the start and end columns of all phases must have the suffixes “_start” and “_end”, respectively
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
dataframe with time log information
- Return type
- Raises
FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
ValidationError – if continuous_time is False, but “start” and “end” time columns of each phase do not match or none of these columns were found in the dataframe
Examples
>>> import biopsykit as bp
>>> file_path = "./timelog.csv"
>>> # Example 1:
>>> # load time log file into a pandas dataframe
>>> data = bp.io.load_time_log(file_path)
>>> # Example 2:
>>> # load time log file into a pandas dataframe and specify the "ID" column
>>> # (instead of the default "subject" column) in the time log file to be the index of the dataframe
>>> data = bp.io.load_time_log(file_path, subject_col="ID")
>>> # Example 3:
>>> # load time log file into a pandas dataframe and specify the columns "Phase1", "Phase2", and "Phase3"
>>> # to be used for extracting time information
>>> data = bp.io.load_time_log(
...     file_path, phase_cols=["Phase1", "Phase2", "Phase3"]
... )
>>> # Example 4:
>>> # load time log file into a pandas dataframe and specify the column "ID" as subject column, the column "Group"
>>> # as condition column, as well as the column "Time" as additional index column.
>>> data = bp.io.load_time_log(
...     file_path,
...     subject_col="ID",
...     condition_col="Group",
...     additional_index_cols=["Time"],
...     phase_cols=["Phase1", "Phase2", "Phase3"],
... )
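The column-naming requirement for continuous_time=False can be illustrated with a small plain-Python sketch. This is not BioPsyKit's implementation: the helper name split_phase_columns and the example column names are hypothetical, and the library's own validation may differ in detail.

```python
from typing import Dict, List


def split_phase_columns(columns: List[str]) -> Dict[str, List[str]]:
    """Group "<phase>_start" / "<phase>_end" columns by phase name.

    Hypothetical helper for illustration only; BioPsyKit performs its own validation.
    """
    phases: Dict[str, List[str]] = {}
    for col in columns:
        for suffix in ("_start", "_end"):
            if col.endswith(suffix):
                # Strip the suffix to obtain the phase name
                phases.setdefault(col[: -len(suffix)], []).append(suffix)
    if not phases:
        raise ValueError("no '_start'/'_end' columns found")
    for phase, suffixes in phases.items():
        if sorted(suffixes) != ["_end", "_start"]:
            raise ValueError(f"phase '{phase}' needs exactly one start and one end column")
    return phases


# Non-continuous layout: every phase has its own start and end column
columns = ["Phase1_start", "Phase1_end", "Phase2_start", "Phase2_end"]
phase_names = list(split_phase_columns(columns))
```

With continuous phases, by contrast, one column per phase is enough, because the end of one phase is also the start of the next.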
- biopsykit.io.io.load_subject_condition_list(file_path, subject_col=None, condition_col=None, return_dict=False, **kwargs)
Load subject condition assignment from file.
This function can be used to load a file that contains the assignment of subject IDs to study conditions. It will return a dataframe or a dictionary that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named “subject” and the condition column will be named “condition”.
- Parameters
file_path (Path or str) – path to the subject condition list file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use the default column name “subject”. According to BioPsyKit’s convention, the subject ID column is expected to have the name “subject”. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use the default column name “condition”. According to BioPsyKit’s convention, the condition column is expected to have the name “condition”. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
return_dict (bool, optional) – whether to return a dict with subject IDs per condition (True) or a dataframe (False). Default: False
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
SubjectConditionDataFrame or SubjectConditionDict – a standardized pandas dataframe with subject IDs and condition assignments (if return_dict is False) or a standardized dict with subject IDs per group (if return_dict is True)
- Raises
FileExtensionError – if file is not a csv or Excel file
ValidationError – if result is not a SubjectConditionDataFrame or a SubjectConditionDict
- Return type
Union[biopsykit.utils.datatype_helper._SubjectConditionDataFrame, pandas.core.frame.DataFrame, Dict[str, numpy.ndarray]]
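The relationship between the two return shapes (table with one row per subject versus dict with subject IDs per condition) can be sketched in plain Python. This only illustrates the idea, not the library's code: the file content and the column names "ID" and "Group" are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file content; the column names "ID" and "Group" are made up for this example
raw = "ID,Group\nS01,Control\nS02,Intervention\nS03,Control\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Table-like result: columns renamed to the "subject" / "condition" convention
table = [{"subject": r["ID"], "condition": r["Group"]} for r in rows]

# Dict-like result: subject IDs collected per condition
per_condition = defaultdict(list)
for row in table:
    per_condition[row["condition"]].append(row["subject"])
per_condition = dict(per_condition)
```

Here table plays the role of the dataframe returned for return_dict=False, and per_condition the role of the dict returned for return_dict=True.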
- biopsykit.io.io.load_questionnaire_data(file_path, subject_col=None, condition_col=None, additional_index_cols=None, replace_missing_vals=True, remove_nan_rows=True, sheet_name=0, **kwargs)
Load questionnaire data from file.
The resulting dataframe will comply with BioPsyKit’s naming conventions, i.e., the subject ID index will be named “subject” and a potential condition index will be named “condition”.
- Parameters
file_path (Path or str) – path to the questionnaire data file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use the default column name “subject”. According to BioPsyKit’s convention, the subject ID column is expected to have the name “subject”. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use the default column name “condition”. According to BioPsyKit’s convention, the condition column is expected to have the name “condition”. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
replace_missing_vals (bool, optional) – True to replace encoded “missing values” from software like SPSS (e.g. -77, -99, or -66) with “actual” missing values (NaN). Default: True
remove_nan_rows (bool, optional) – True to remove rows that only contain NaN values (except the index cols), False to keep NaN rows. Default: True
sheet_name (str or int, optional) – sheet name (str) or sheet index (int) if file is an Excel file. Default: 0 (i.e. first sheet in Excel file)
- Returns
dataframe with imported questionnaire data
- Return type
- Raises
FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
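The combined effect of replace_missing_vals and remove_nan_rows can be sketched in plain Python. The rows, item names, and the exact set of missing-value codes below are assumptions for illustration (the codes are the examples named above); the library's own handling may differ.

```python
import math

MISSING_CODES = {-66, -77, -99}  # the example codes named above; real exports may use others

# Hypothetical questionnaire rows: subject ID plus two item columns
rows = [
    {"subject": "S01", "item_1": 3, "item_2": -99},
    {"subject": "S02", "item_1": -77, "item_2": -66},
    {"subject": "S03", "item_1": 1, "item_2": 4},
]
index_cols = ["subject"]


def is_nan(value):
    """Return True only for float NaN values."""
    return isinstance(value, float) and math.isnan(value)


# Step 1: replace encoded missing values with NaN (index columns are left untouched)
cleaned = [
    {k: (math.nan if k not in index_cols and v in MISSING_CODES else v) for k, v in row.items()}
    for row in rows
]

# Step 2: drop rows in which every non-index value is NaN
kept = [
    row for row in cleaned
    if not all(is_nan(v) for k, v in row.items() if k not in index_cols)
]
```

In this sketch, the row of S02 is dropped because both items were encoded as missing, while S01 is kept with one NaN entry.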
- biopsykit.io.io.load_pandas_dict_excel(file_path, index_col='time', timezone=None)
Load Excel file containing pandas dataframes with time series data of one subject.
- Parameters
file_path (Path or str) – path to file
index_col (str, optional) – name of the index column of the dataframe or None if no index column is present. Default: “time”
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data for localization (since Excel does not support localized timestamps), either as string or as tzinfo object. Default: “Europe/Berlin”
- Returns
dictionary with multiple pandas dataframes
- Return type
- Raises
FileExtensionError – if file is no Excel file (“.xls” or “.xlsx”)
See also
write_pandas_dict_excel
Write dictionary with dataframes to file
- biopsykit.io.io.load_codebook(file_path, **kwargs)
Load codebook from file.
A codebook is used to convert numerical values from a dataframe (e.g., from questionnaire data) to categorical values.
- Parameters
file_path (Path or str) – file path to codebook
**kwargs – additional arguments to pass to pandas.read_csv() or pandas.read_excel()
- Returns
CodebookDataFrame, a dataframe in a standardized format
- Return type
See also
apply_codebook()
apply codebook to data
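The idea of converting numerical values to categorical values with a codebook can be sketched in plain Python. The structure used here (one code-to-label mapping per variable) and all variable names are assumptions for illustration; the actual CodebookDataFrame layout is not described in this section.

```python
# Hypothetical codebook: for each variable, a mapping from numeric code to label
codebook = {
    "handedness": {1: "left", 2: "right"},
    "smoking": {0: "no", 1: "yes"},
}

# Hypothetical questionnaire rows with numerically encoded answers
rows = [
    {"subject": "S01", "handedness": 1, "smoking": 0, "age": 24},
    {"subject": "S02", "handedness": 2, "smoking": 1, "age": 31},
]


def decode(row, codebook):
    """Replace numeric codes by labels for every variable listed in the codebook.

    Values of variables without a codebook entry, and codes without a label,
    are passed through unchanged.
    """
    return {
        key: codebook[key].get(value, value) if key in codebook else value
        for key, value in row.items()
    }


decoded = [decode(row, codebook) for row in rows]
```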
- biopsykit.io.io.convert_time_log_datetime(time_log, dataset=None, df=None, date=None, timezone=None)
Convert the time log information into datetime objects.
This function converts time log information (containing only time, but no date) into datetime objects by adding the start date of the recording. To specify the recording date, either a NilsPod Dataset or a pandas dataframe with a DatetimeIndex must be supplied from which the recording date can be extracted. As an alternative, the date can be specified explicitly via the date parameter.
- Parameters
time_log (DataFrame) – pandas dataframe with time log information
dataset (Dataset, optional) – NilsPod Dataset object to extract time and date information from. Default: None
df (DataFrame, optional) – dataframe with DatetimeIndex to extract time and date information from. Default: None
date (str or datetime, optional) – datetime object or date string used to convert time log information into datetime. If date is a string, it must be supplied in a common date format, e.g. “dd.mm.yyyy” or “dd/mm/yyyy”. Default: None
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data to convert, either as string or as tzinfo object. Default: “Europe/Berlin”
- Returns
pandas dataframe with log time converted into datetime
- Return type
- Raises
ValueError – if none of dataset, df and date are supplied as argument, or if the index of df is not a DatetimeIndex
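The core of the conversion, attaching a recording date to time-only entries, can be sketched with the standard library. The phase names, times, and date below are hypothetical, and timezone handling is left out to keep the sketch short; this is not the library's implementation.

```python
from datetime import datetime, time

# Hypothetical time log for one subject: phase name -> time of day (no date)
time_log = {"Baseline": "09:30:00", "Stress": "09:45:00", "Recovery": "10:05:00"}

# Recording date given explicitly as a "dd.mm.yyyy" string
recording_date = datetime.strptime("14.10.2021", "%d.%m.%Y").date()

# Combine the recording date with each time-of-day entry
converted = {
    phase: datetime.combine(recording_date, time.fromisoformat(clock))
    for phase, clock in time_log.items()
}
```

The resulting datetime objects are timezone-naive; localizing them to a timezone such as “Europe/Berlin” would be a separate step.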
- biopsykit.io.io.write_pandas_dict_excel(data_dict, file_path, index_col=True)
Write a dictionary with pandas dataframes to an Excel file.
- Parameters
- Raises
FileExtensionError – if file_path is not an Excel file
- biopsykit.io.io.write_result_dict(result_dict, file_path, index_name='subject')
Write dictionary with processing results (e.g. HR, HRV, RSA) to csv file.
The keys in the dictionary should be the subject IDs (or any other identifier), the values should be DataFrame objects. The index level(s) of the exported dataframe can be specified by the index_name parameter.
The dictionary will be concatenated to one large dataframe which will then be saved as csv file.
- Parameters
result_dict (dict) – Dictionary containing processing results for all subjects. The keys in the dictionary should be the Subject IDs (or any other identifier), the values should be pandas dataframes
file_path (Path or str) – path to file
index_name (str, optional) – name of the index resulting from concatenating dataframes. Default: “subject”
- Raises
FileExtensionError – if file_path is not a csv or Excel file
Examples
>>> import pandas as pd
>>> from biopsykit.io import write_result_dict
>>>
>>> file_path = "./param_results.csv"
>>>
>>> dict_param_output = {
...     "S01": pd.DataFrame(),  # e.g., dataframe from mist_param_subphases
...     "S02": pd.DataFrame(),
...     # ...
... }
>>>
>>> write_result_dict(dict_param_output, file_path=file_path, index_name="subject")
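The concatenation step described above (dictionary keys become an additional index level in one large table) can be sketched with the standard library. Lists of rows stand in for the per-subject dataframes, and the column names "phase" and "HR" as well as all values are made up; the sketch illustrates the idea only and does not reproduce the library's output format.

```python
import csv
import io

# Hypothetical per-subject results: each value is a list of rows (stand-in for a dataframe)
result_dict = {
    "S01": [{"phase": "Baseline", "HR": 62.1}, {"phase": "Stress", "HR": 81.4}],
    "S02": [{"phase": "Baseline", "HR": 70.3}, {"phase": "Stress", "HR": 92.0}],
}
index_name = "subject"

# Concatenate: prepend the dictionary key as an extra column named by index_name
combined = [
    {index_name: key, **row}
    for key, rows in result_dict.items()
    for row in rows
]

buffer = io.StringIO()  # stands in for the csv file on disk
writer = csv.DictWriter(buffer, fieldnames=[index_name, "phase", "HR"], lineterminator="\n")
writer.writeheader()
writer.writerows(combined)
csv_text = buffer.getvalue()
```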