biopsykit.io package
Module providing input/output functions.
- biopsykit.io.convert_time_log_datetime(time_log, dataset=None, df=None, date=None, timezone=None)
Convert the time log information into datetime objects.
This function converts time log information (containing only time, but no date) into datetime objects by adding the start date of the recording. To specify the recording date, either a NilsPod Dataset or a pandas dataframe with a DatetimeIndex must be supplied, from which the recording date is extracted. Alternatively, the date can be specified explicitly via the date parameter.
- Parameters
time_log (DataFrame) – pandas dataframe with time log information
dataset (Dataset, optional) – NilsPod Dataset object from which to extract time and date information. Default: None
df (DataFrame, optional) – dataframe with DatetimeIndex from which to extract time and date information. Default: None
date (str or datetime, optional) – datetime object or date string used to convert time log information into datetime. If date is a string, it must be supplied in a common date format, e.g. “dd.mm.yyyy” or “dd/mm/yyyy”. Default: None
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data to convert, either as string or as tzinfo object. Default: “Europe/Berlin”
- Returns
time log dataframe with datetime objects
- Return type
- Raises
ValueError – if none of dataset, df, and date are supplied as arguments, or if the index of df is not a DatetimeIndex
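The conversion described for this function can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not BioPsyKit's implementation: the helper name `attach_date`, the "HH:MM:SS" time format, and the fixed UTC+1 offset (a stand-in for a real timezone such as "Europe/Berlin") are all hypothetical.

```python
from datetime import datetime, time, timedelta, timezone


def attach_date(time_str, recording_date, tz):
    """Combine a time-only string ("HH:MM:SS") with a recording date.

    Hypothetical helper for illustration only; not part of biopsykit.
    """
    if isinstance(recording_date, str):
        # Try the two date formats named in the parameter description.
        for fmt in ("%d.%m.%Y", "%d/%m/%Y"):
            try:
                recording_date = datetime.strptime(recording_date, fmt).date()
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"Unsupported date format: {recording_date!r}")
    elif isinstance(recording_date, datetime):
        recording_date = recording_date.date()
    time_of_day = time.fromisoformat(time_str)
    # Attach the date and make the timestamp timezone-aware.
    return datetime.combine(recording_date, time_of_day, tzinfo=tz)


# Assumption: a fixed offset stands in for a named timezone.
tz = timezone(timedelta(hours=1))
time_log = {"Baseline": "09:30:00", "Stress": "09:45:00"}
converted = {phase: attach_date(t, "24.12.2020", tz) for phase, t in time_log.items()}
print(converted["Baseline"].isoformat())  # 2020-12-24T09:30:00+01:00
```

The sketch raises ValueError when no usable date is given, mirroring the error condition documented above.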
- biopsykit.io.convert_time_log_dict(timelog, time_format='time')
Convert time log into dictionary.
The resulting dictionary will have the phase names as keys and a tuple with start and end times as values.
- Parameters
- Returns
dictionary with start and end times of each phase
- Return type
See also
biopsykit.utils.data_processing.split_data() – split data based on time intervals
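The shape of the resulting dictionary (phase names as keys, start and end times as tuple values) can be illustrated as follows. This is a sketch only: the helper `phases_to_dict` and its input layout (an ordered mapping of phase names to start times, with a final entry marking the end) are assumptions and do not reflect the library's actual input format.

```python
def phases_to_dict(phase_times):
    """Pair consecutive phase start times into (start, end) tuples.

    Hypothetical helper; assumes continuous phases where each phase ends
    when the next one starts. The last entry only marks the end time.
    """
    names = list(phase_times)
    times = list(phase_times.values())
    # zip stops at the shortest input, so the last name is never a key.
    return {
        name: (start, end)
        for name, start, end in zip(names, times, times[1:])
    }


time_log = {"Baseline": "09:30", "Stress": "09:45", "Recovery": "10:05", "End": "10:30"}
phase_dict = phases_to_dict(time_log)
print(phase_dict["Stress"])  # ('09:45', '10:05')
```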
- biopsykit.io.load_atimelogger_file(file_path, timezone=None, handle_multiple='raise')
Load time log file exported from the aTimeLogger app.
The resulting dataframe will have one row and start and end times of the single phases as columns.
- Parameters
file_path (Path or str) – path to time log file. Must be a csv file
timezone (str or datetime.tzinfo, optional) – timezone of the time logs, either as string or as tzinfo object. Default: ‘Europe/Berlin’
handle_multiple (str) –
- Returns
time log dataframe
- Return type
- biopsykit.io.load_codebook(file_path, **kwargs)
Load codebook from file.
A codebook is used to convert numerical values from a dataframe (e.g., from questionnaire data) to categorical values.
- Parameters
file_path (Path or str) – file path to codebook
**kwargs – additional arguments to pass to pandas.read_csv() or pandas.read_excel()
- Returns
CodebookDataFrame, a dataframe in a standardized format
- Return type
See also
apply_codebook() – apply codebook to data
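What converting numerical values to categorical values means in practice can be sketched in plain Python. The helper `apply_codes`, the nested-dict codebook layout, and the example columns are assumptions for illustration; they are not the CodebookDataFrame format or the library's apply_codebook() implementation.

```python
def apply_codes(rows, codebook):
    """Replace numeric codes with category labels, column by column.

    Hypothetical helper; columns or values without a codebook entry
    are kept unchanged.
    """
    return [
        {col: codebook.get(col, {}).get(val, val) for col, val in row.items()}
        for row in rows
    ]


# Assumed layout: column name -> {numeric code: category label}
codebook = {"gender": {0: "female", 1: "male"}, "smoker": {0: "no", 1: "yes"}}
rows = [
    {"gender": 1, "smoker": 0, "age": 27},
    {"gender": 0, "smoker": 1, "age": 31},
]
decoded = apply_codes(rows, codebook)
print(decoded[0])  # {'gender': 'male', 'smoker': 'no', 'age': 27}
```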
- biopsykit.io.load_long_format_csv(file_path, index_cols=None)
Load dataframe stored as long-format from file.
- Parameters
- Returns
dataframe in long-format
- Return type
- biopsykit.io.load_pandas_dict_excel(file_path, index_col='time', timezone=None)
Load Excel file containing pandas dataframes with time series data of one subject.
- Parameters
file_path (Path or str) – path to file
index_col (str, optional) – name of the index column of the dataframe or None if no index column is present. Default: “time”
timezone (str or datetime.tzinfo, optional) – timezone of the acquired data for localization (since Excel does not support localized timestamps), either as string or as tzinfo object. Default: “Europe/Berlin”
- Returns
dictionary with multiple pandas dataframes
- Return type
- Raises
FileExtensionError – if file is not an Excel file (“.xls” or “.xlsx”)
See also
write_pandas_dict_excel() – write dictionary with dataframes to file
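The file extension check behind the documented FileExtensionError might look roughly like this. The helper `assert_excel_file` is hypothetical, ValueError stands in for the library's FileExtensionError, and the case-insensitive comparison is an assumption of this sketch.

```python
from pathlib import Path


def assert_excel_file(file_path):
    """Raise if the path does not end in ".xls" or ".xlsx".

    Hypothetical helper; ValueError is a stand-in for FileExtensionError.
    """
    path = Path(file_path)
    # Assumption: the suffix is compared case-insensitively.
    if path.suffix.lower() not in (".xls", ".xlsx"):
        raise ValueError(f"Expected an Excel file, got '{path.suffix}' instead.")
    return path


checked = assert_excel_file("./results/hr_subject_01.xlsx")
print(checked.name)  # hr_subject_01.xlsx
```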
- biopsykit.io.load_questionnaire_data(file_path, subject_col=None, condition_col=None, additional_index_cols=None, replace_missing_vals=True, remove_nan_rows=True, sheet_name=0, **kwargs)
Load questionnaire data from file.
The resulting dataframe will comply with BioPsyKit’s naming conventions, i.e., the subject ID index will be named subject and a potential condition index will be named condition.
- Parameters
file_path (Path or str) – path to the questionnaire data file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
replace_missing_vals (bool, optional) – True to replace encoded “missing values” from software like SPSS (e.g. -77, -99, or -66) with “actual” missing values (NaN). Default: True
remove_nan_rows (bool, optional) – True to remove rows that only contain NaN values (except the index cols), False to keep NaN rows. Default: True
sheet_name (str or int, optional) – sheet name (str) or sheet index (int) if file is an Excel file. Default: 0 (i.e. first sheet in Excel file)
- Returns
dataframe with imported questionnaire data
- Return type
- Raises
FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
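The combined effect of replace_missing_vals and remove_nan_rows can be sketched in plain Python. The helper `clean_rows`, the row-dict layout, and the fixed set of codes are assumptions for illustration; the library operates on pandas dataframes and may recognize other codes.

```python
import math

# Example codes named in the parameter description above.
MISSING_CODES = {-66, -77, -99}


def clean_rows(rows, index_cols=("subject",)):
    """Replace missing-value codes with NaN, then drop rows that are all NaN.

    Hypothetical helper illustrating replace_missing_vals and remove_nan_rows.
    """
    cleaned = []
    for row in rows:
        new_row = {
            col: math.nan if col not in index_cols and val in MISSING_CODES else val
            for col, val in row.items()
        }
        data_vals = [v for c, v in new_row.items() if c not in index_cols]
        # Keep the row only if at least one non-index value is not NaN.
        if any(not (isinstance(v, float) and math.isnan(v)) for v in data_vals):
            cleaned.append(new_row)
    return cleaned


rows = [
    {"subject": "S01", "q1": 3, "q2": -99},
    {"subject": "S02", "q1": -77, "q2": -66},
]
cleaned = clean_rows(rows)
print([row["subject"] for row in cleaned])  # ['S01']
```

Index columns are excluded from both steps, matching the "except the index cols" wording of remove_nan_rows.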
- biopsykit.io.load_subject_condition_list(file_path, subject_col=None, condition_col=None, return_dict=False, **kwargs)
Load subject condition assignment from file.
This function can be used to load a file that contains the assignment of subject IDs to study conditions. It will return a dataframe or a dictionary that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named subject and the condition column will be named condition.
- Parameters
file_path (Path or str) – path to the subject condition list file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
return_dict (bool, optional) – whether to return a dict with subject IDs per condition (True) or a dataframe (False). Default: False
**kwargs – additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
SubjectConditionDataFrame or SubjectConditionDict – a standardized pandas dataframe with subject IDs and condition assignments (if return_dict is False) or a standardized dict with subject IDs per group (if return_dict is True)
- Raises
FileExtensionError – if file is not a csv or Excel file
ValidationError – if result is not a SubjectConditionDataFrame or a SubjectConditionDict
- Return type
biopsykit.utils.dtypes._SubjectConditionDataFrame | pandas.core.frame.DataFrame | dict[str, numpy.ndarray]
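The dictionary shape returned for return_dict=True (subject IDs grouped per condition) can be illustrated with a short sketch. The helper `conditions_to_dict` and its input mapping are hypothetical, and plain lists are used here where the documented return type uses numpy arrays.

```python
from collections import defaultdict


def conditions_to_dict(assignments):
    """Group subject IDs by condition.

    Hypothetical helper; `assignments` maps subject ID -> condition.
    """
    groups = defaultdict(list)
    for subject, condition in assignments.items():
        groups[condition].append(subject)
    return dict(groups)


assignments = {"S01": "Control", "S02": "Intervention", "S03": "Control"}
condition_dict = conditions_to_dict(assignments)
print(condition_dict)  # {'Control': ['S01', 'S03'], 'Intervention': ['S02']}
```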
- biopsykit.io.load_time_log(file_path, subject_col=None, condition_col=None, additional_index_cols=None, phase_cols=None, continuous_time=True, **kwargs)
Load time log information from file.
This function can be used to load a file containing “time logs”, i.e., information about start and stop times of recordings or recording phases per subject.
- Parameters
file_path (Path or str) – path to time log file. Must be either an Excel or csv file
subject_col (str, optional) – name of column containing subject IDs or None to use default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function.
condition_col (str, optional) – name of column containing condition assignments or None to use default column name condition. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function.
additional_index_cols (str, list of str, optional) – additional index levels to be added to the dataframe. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
phase_cols (list of str or dict, optional) – list of column names that contain time log information or None to use all columns. If the columns of the time log dataframe should have different names than the columns in the file, a dict specifying the mapping (column_name : new_column_name) can be passed. Default: None
continuous_time (bool, optional) – flag indicating whether phases are continuous, i.e., whether the end of the previous phase is also the beginning of the next phase or not. Default: True. If continuous_time is set to False, the start and end columns of all phases must have the suffixes “_start” and “_end”, respectively
**kwargs – additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
dataframe with time log information
- Return type
- Raises
FileExtensionError – if file format is none of [“.xls”, “.xlsx”, “.csv”]
ValidationError – if continuous_time is False, but “start” and “end” time columns of each phase do not match or none of these columns were found in the dataframe
Examples
>>> import biopsykit as bp
>>> file_path = "./timelog.csv"
>>> # Example 1:
>>> # load time log file into a pandas dataframe
>>> data = bp.io.load_time_log(file_path)
>>> # Example 2:
>>> # load time log file into a pandas dataframe and specify the "ID" column
>>> # (instead of the default "subject" column) in the time log file to be the index of the dataframe
>>> data = bp.io.load_time_log(file_path, subject_col="ID")
>>> # Example 3:
>>> # load time log file into a pandas dataframe and specify the columns "Phase1", "Phase2", and "Phase3"
>>> # to be used for extracting time information
>>> data = bp.io.load_time_log(
...     file_path, phase_cols=["Phase1", "Phase2", "Phase3"]
... )
>>> # Example 4:
>>> # load time log file into a pandas dataframe and specify the column "ID" as subject column, the column "Group"
>>> # as condition column, as well as the column "Time" as additional index column.
>>> data = bp.io.load_time_log(
...     file_path,
...     subject_col="ID",
...     condition_col="Group",
...     additional_index_cols=["Time"],
...     phase_cols=["Phase1", "Phase2", "Phase3"],
... )
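For continuous_time=False, the requirement that every phase has matching “_start” and “_end” columns can be sketched as a simple check. The helper `check_phase_columns` is hypothetical and raises ValueError as a stand-in for the library's ValidationError; the actual validation logic in BioPsyKit may differ.

```python
def check_phase_columns(columns):
    """Check that every phase has both a "_start" and an "_end" column.

    Hypothetical helper; ValueError is a stand-in for ValidationError.
    """
    starts = {c[: -len("_start")] for c in columns if c.endswith("_start")}
    ends = {c[: -len("_end")] for c in columns if c.endswith("_end")}
    if not starts and not ends:
        raise ValueError("No '_start'/'_end' columns found.")
    if starts != ends:
        # Phases that appear with only one of the two suffixes.
        unmatched = sorted(starts ^ ends)
        raise ValueError(f"Phases without matching start/end columns: {unmatched}")
    return sorted(starts)


phases = check_phase_columns(
    ["Phase1_start", "Phase1_end", "Phase2_start", "Phase2_end"]
)
print(phases)  # ['Phase1', 'Phase2']
```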
- biopsykit.io.write_pandas_dict_excel(data_dict, file_path, index_col=True)
Write a dictionary with pandas dataframes to an Excel file.
- Parameters
- Raises
FileExtensionError – if file_path is not an Excel file
- biopsykit.io.write_result_dict(result_dict, file_path, index_name='subject')
Write dictionary with processing results (e.g. HR, HRV, RSA) to csv file.
The keys in the dictionary should be the subject IDs (or any other identifier), and the values should be pandas DataFrames. The name of the index level of the exported dataframe can be specified by the index_name parameter.
The dictionary will be concatenated to one large dataframe, which will then be saved as a csv file.
- Parameters
result_dict (dict) – Dictionary containing processing results for all subjects. The keys in the dictionary should be the Subject IDs (or any other identifier), the values should be pandas dataframes
file_path (Path, str) – path to file
index_name (str, optional) – name of the index resulting from concatenating dataframes. Default: subject
- Raises
FileExtensionError – if file_path is not a csv or Excel file
Examples
>>> import pandas as pd
>>> from biopsykit.io import write_result_dict
>>>
>>> file_path = "./param_results.csv"
>>>
>>> dict_param_output = {
...     'S01': pd.DataFrame(),  # e.g., dataframe from mist_param_subphases
...     'S02': pd.DataFrame(),
...     # ...
... }
>>>
>>> write_result_dict(dict_param_output, file_path=file_path, index_name="subject")
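The concatenate-then-write behavior described for this function can be sketched with the standard library. This is an illustration under assumptions, not the library's implementation: the helper `write_results` is hypothetical, each dictionary value is a list of row dicts instead of a pandas dataframe, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io


def write_results(result_dict, handle, index_name="subject"):
    """Concatenate per-subject rows and write them as one csv table.

    Hypothetical stdlib stand-in; each value is a list of row dicts.
    """
    fieldnames = [index_name]
    for rows in result_dict.values():
        for row in rows:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for subject_id, rows in result_dict.items():
        for row in rows:
            # The dictionary key becomes the value of the index column.
            writer.writerow({index_name: subject_id, **row})


results = {
    "S01": [{"phase": "Baseline", "HR": 62.1}],
    "S02": [{"phase": "Baseline", "HR": 70.4}],
}
buffer = io.StringIO()
write_results(results, buffer)
print(buffer.getvalue())
```

With pandas dataframes as values, the same idea is commonly expressed by concatenating the dictionary and naming the resulting outer index level.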
Submodules
- biopsykit.io.biomarker module
- biopsykit.io.biopac module
- biopsykit.io.carwatch_logs module
- biopsykit.io.ecg module
- biopsykit.io.eeg module
- biopsykit.io.fibion module
- biopsykit.io.io module
- biopsykit.io.nilspod module
- biopsykit.io.psg module
- biopsykit.io.saliva module
- biopsykit.io.sleep module
- biopsykit.io.sleep_analyzer module
- biopsykit.io.tfm module