biopsykit.io.biomarker module

Module containing different I/O functions for biomarker data (saliva, dried blood spots, IL-6).

biopsykit.io.biomarker.load_biomarker_results(file_path, biomarker_type=None, sample_id_col=None, data_col=None, id_col_names=None, regex_str=None, sample_times=None, condition_list=None, check_number_samples=True, replace_strings_missing=True, **kwargs)

Load biomarker results from an Excel file.

Parameters

file_path (Path or str) – path to file
biomarker_type (str, optional) – biomarker type to load from file. Example: cortisol
sample_id_col (str, optional) – name of column containing sample IDs or None to use the default column name sample ID.
data_col (str, optional) – name of column containing biomarker data or None to use the default column name.
id_col_names (list of str, optional) – names of the extracted ID columns, or None to use the default column names (['subject', 'day', 'sample'])
regex_str (str, optional) – regular expression to extract subject, day, and sample ID from the sample ID column, or None to use the default regular expression r"(VP\d+)-(T\w)-(B\w)".
sample_times (list of int, optional) – times at which samples were collected or None if no sample times should be specified. Default: None
condition_list (list of str, dict of str to list of str, or Index, optional) – list of condition names or dictionary of condition names to list of condition assignments or Index of condition names or None if no conditions are present. Default: None
skiprows (int, optional) – number of rows to skip at the start of the Excel sheet; passed to pandas.read_excel(). Default: 2
check_number_samples (bool, optional) – True to check that the number of samples is equal for all subjects, False to skip this check. Default: True
replace_strings_missing (bool, optional) – True to replace strings indicating missing values in the biomarker data with NaN, False to keep the strings. Default: True
**kwargs – Additional parameters that are passed to pandas.read_excel()
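The two boolean flags can be approximated in plain pandas. The sketch below is not BioPsyKit's implementation: it assumes that any non-numeric entry counts as a "string indicating missing", and the helper names coerce_missing_strings and has_equal_sample_counts are hypothetical.

```python
import pandas as pd


def coerce_missing_strings(values: pd.Series) -> pd.Series:
    # Hypothetical helper: every entry that cannot be parsed as a number
    # (e.g. a lab note such as "n.a.") becomes NaN; numeric entries are kept.
    return pd.to_numeric(values, errors="coerce")


def has_equal_sample_counts(data: pd.DataFrame, subject_col: str = "subject") -> bool:
    # Hypothetical helper: count the rows (samples) per subject and check
    # that all subjects have the same count.
    counts = data.groupby(subject_col).size()
    return counts.nunique() <= 1


raw = pd.DataFrame({
    "subject": ["VP01", "VP01", "VP02", "VP02"],
    "cortisol": ["4.2", "n.a.", "3.1", "5.0"],
})
raw["cortisol"] = coerce_missing_strings(raw["cortisol"])
print(raw["cortisol"].isna().sum())
print(has_equal_sample_counts(raw))
```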

Returns

data – biomarker data in BiomarkerRawDataFrame format

Return type

BiomarkerRawDataFrame

Raises

FileExtensionError – if the file is not an Excel file
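A minimal sketch of how the documented default regex_str splits a sample ID into the default ID columns, using pandas Series.str.extract (one output column per capture group). The sample IDs are made up to match the default pattern; BioPsyKit's internal parsing may differ in details.

```python
import pandas as pd

# Default pattern and default ID column names as documented above.
regex_str = r"(VP\d+)-(T\w)-(B\w)"
id_col_names = ["subject", "day", "sample"]

# Illustrative sample IDs as they might appear in the "sample ID" column.
sample_ids = pd.Series(["VP01-T1-B1", "VP01-T1-B2", "VP02-T2-B1"], name="sample ID")

# str.extract returns a DataFrame with one column per capture group (0, 1, 2),
# which are then renamed to the ID column names.
ids = sample_ids.str.extract(regex_str)
ids.columns = id_col_names
print(ids)
```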

biopsykit.io.biomarker.load_saliva_plate(file_path, saliva_type, sample_id_col=None, data_col=None, id_col_names=None, regex_str=None, sample_times=None, condition_list=None, **kwargs)

Read saliva data from an Excel sheet in ‘plate’ format. This function wraps load_biomarker_results() for compatibility.

This function automatically extracts identifiers such as subject, day, and sample IDs from the saliva sample names. To extract them, a regular expression string can be passed via regex_str.

Here are some examples of what sample identifiers might look like and what the corresponding regex_str would extract:

“Vp01 S1” => r"(Vp\d+) (S\d)" (this is the default pattern, so you can also set regex_str to None) => data [Vp01, S1] in two columns: subject, sample (unless column names are explicitly specified in id_col_names)

“Vp01 T1 S1” … “Vp01 T1 S5” (only numeric characters in day/sample) => r"(Vp\d+) (T\d) (S\d)" => three columns: subject, day, sample with data [Vp01, T1, S1] (unless column names are explicitly specified in id_col_names)

“Vp01 T1 S1” … “Vp01 T1 SA” (also letter characters in day/sample) => r"(Vp\d+) (T\w) (S\w)" => three columns: subject, day, sample with data [Vp01, T1, S1] (unless column names are explicitly specified in id_col_names)

If you do not want to extract the ‘S’ or ‘T’ prefixes of the sample or day IDs, respectively, move the prefix out of the capture group (round brackets) in regex_str, e.g., (S\d) (yields S1, S2, …) => S(\d) (yields 1, 2, …)
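The patterns above can be tried directly with pandas Series.str.extract, which returns one column per capture group. This is a sketch for checking a regex_str before passing it to load_saliva_plate(), not the function's internal code; the sample names are illustrative.

```python
import pandas as pd

# Default pattern: subject and sample, prefixes kept inside the groups.
default = pd.Series(["Vp01 S1", "Vp01 S2"]).str.extract(r"(Vp\d+) (S\d)")
default.columns = ["subject", "sample"]

names = pd.Series(["Vp01 T1 S1", "Vp01 T1 SA", "Vp02 T2 S3"])

# Letters allowed in day/sample IDs; "T" and "S" prefixes are kept.
with_prefix = names.str.extract(r"(Vp\d+) (T\w) (S\w)")
with_prefix.columns = ["subject", "day", "sample"]

# Same pattern with "T" and "S" moved out of the capture groups,
# so only the characters after the prefixes are extracted.
without_prefix = names.str.extract(r"(Vp\d+) T(\w) S(\w)")
without_prefix.columns = ["subject", "day", "sample"]

print(with_prefix)
print(without_prefix)
```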

Parameters

file_path (Path or str) – path to the Excel sheet in ‘plate’ format containing saliva data
saliva_type (str) – saliva type to load from file
sample_id_col (str, optional) – column name of the Excel sheet containing the sample ID. Default: “sample ID”
data_col (str, optional) – column name of the Excel sheet containing saliva data to be analyzed. Default: a column name selected based on saliva_type, e.g., cortisol => cortisol (nmol/l)
id_col_names (list of str, optional) – names of the extracted ID columns, or None to use the default column names (['subject', 'day', 'sample'])
regex_str (str, optional) – regular expression to extract subject ID, day ID and sample ID from the sample identifier. None to use default regex string (r"(Vp\d+) (S\d)")
sample_times (list of int, optional) – times at which saliva samples were collected
condition_list (1d-array, optional) – list of conditions to which subjects were assigned
**kwargs – Additional parameters that are passed to pandas.read_excel()

Returns

data – saliva data in SalivaRawDataFrame format

Return type

SalivaRawDataFrame

Raises

FileExtensionError – if file is no Excel file (.xls or .xlsx)
ValueError – if any saliva sample cannot be converted into a float (e.g., because one of the columns contains text)
ValidationError – if imported data cannot be parsed to a SalivaRawDataFrame

biopsykit.io.biomarker.load_saliva_wide_format(file_path, saliva_type, subject_col=None, condition_col=None, additional_index_cols=None, sample_times=None, **kwargs)

Load saliva data in wide format from a csv or Excel file.

It returns a SalivaRawDataFrame, a long-format dataframe that complies with BioPsyKit’s naming convention, i.e., the subject ID index is named subject, the sample index is named sample, and the value column is named after the saliva biomarker type.

Parameters

file_path (Path or str) – path to file
saliva_type (str) – saliva type to load from file. Example: cortisol
subject_col (str, optional) – name of column containing subject IDs or None to use the default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None
condition_col (str, optional) – name of the column containing condition assignments or None if no conditions are present. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None
additional_index_cols (str or list of str, optional) – additional index levels to be added to the dataframe, e.g., “day” index. Can either be a string or a list of strings indicating the column name(s) to be used as index level(s), or None for no additional index levels. Default: None
sample_times (list of int, optional) – times at which saliva samples were collected or None if no sample times should be specified. Default: None
**kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

data – saliva data in SalivaRawDataFrame format

Return type

SalivaRawDataFrame

Raises

FileExtensionError – if the file is not a csv or Excel file
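The wide-to-long conversion described above can be sketched with pandas.DataFrame.melt. The file layout here (a subject column, a condition column, and one column per sample named S0, S1, S2) and the index level order are assumptions for illustration; BioPsyKit's own implementation and validation are not reproduced.

```python
import io

import pandas as pd

# Hypothetical wide-format csv: one row per subject, one column per sample.
csv_text = """subject,condition,S0,S1,S2
Vp01,Control,4.1,6.3,5.2
Vp02,Intervention,3.8,7.9,6.0
"""
wide = pd.read_csv(io.StringIO(csv_text))

saliva_type = "cortisol"

# melt turns each sample column into rows; the former column names go into
# a "sample" column and the values into a column named after the saliva type.
long_df = (
    wide.melt(id_vars=["subject", "condition"], var_name="sample", value_name=saliva_type)
    .set_index(["subject", "condition", "sample"])
    .sort_index()
)
print(long_df)
```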

biopsykit.io.biomarker.save_saliva(file_path, data, saliva_type='cortisol', as_wide_format=False)

Save saliva data to a csv or Excel file.

Parameters

file_path (Path or str) – file path to export. Must be a csv or an Excel file
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str) – type of saliva data in the dataframe
as_wide_format (bool, optional) – True to save data in wide format (and flatten all index levels), False to save data in long-format. Default: False

Raises

ValidationError – if data is not a SalivaRawDataFrame
FileExtensionError – if file_path is not a csv or Excel file
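The documentation describes as_wide_format=True only as saving "in wide format (and flatten all index levels)". The pandas sketch below shows one plausible interpretation, assuming index levels subject and sample; it is not BioPsyKit's code, and the exact column layout written by save_saliva() may differ.

```python
import io

import pandas as pd

saliva_type = "cortisol"
# Illustrative long-format data with index levels subject and sample.
index = pd.MultiIndex.from_product([["Vp01", "Vp02"], ["S0", "S1"]], names=["subject", "sample"])
long_df = pd.DataFrame({saliva_type: [4.1, 6.3, 3.8, 7.9]}, index=index)

# Move the sample level into the columns: one row per subject, one column per sample.
wide = long_df[saliva_type].unstack("sample")
# Flatten the remaining index level(s) into ordinary columns.
wide = wide.reset_index()
wide.columns.name = None

# Write to an in-memory buffer instead of a real file path.
buffer = io.StringIO()
wide.to_csv(buffer, index=False)
print(buffer.getvalue())
```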
