biopsykit.io.saliva module

Module wrapping biopsykit.io.biomarker including only I/O functions for saliva data.

biopsykit.io.saliva.load_saliva_plate(file_path, saliva_type, sample_id_col=None, data_col=None, id_col_names=None, regex_str=None, sample_times=None, condition_list=None, **kwargs)[source]

Read saliva data from an Excel sheet in ‘plate’ format.

This function automatically extracts identifiers such as subject, day, and sample IDs from the saliva sample names. To control how they are extracted, pass a regular expression string via regex_str.

Here are some examples of what sample identifiers might look like and what the corresponding regex_str would extract:

  • “Vp01 S1” => r"(Vp\d+) (S\d)" (this is the default pattern; you can also just set regex_str to None) => data [Vp01, S1] in two columns: subject, sample (unless column names are explicitly specified in id_col_names)

  • “Vp01 T1 S1” … “Vp01 T1 S5” (only numeric characters in day/sample) => r"(Vp\d+) (T\d) (S\d)" => data [Vp01, T1, S1] in three columns: subject, day, sample (unless column names are explicitly specified in id_col_names)

  • “Vp01 T1 S1” … “Vp01 T1 SA” (also letter characters in day/sample) => r"(Vp\d+) (T\w) (S\w)" => data [Vp01, T1, S1] in three columns: subject, day, sample (unless column names are explicitly specified in id_col_names)

If you don’t want to extract the ‘S’ or ‘T’ prefixes in sample or day IDs, respectively, move them out of the capture group (round brackets) in regex_str, like this: (S\d) (would give S1, S2, …) => S(\d) (would give 1, 2, …)
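The effect of the capture groups can be checked with Python's standard re module before loading a file. The following is a minimal sketch, not the library's implementation: the helper extract_ids and the constant names are hypothetical, and the use of re.fullmatch (which requires the whole sample name to match) is an assumption made for illustration.

```python
import re

# Default pattern quoted in the text: subject and sample.
DEFAULT_PATTERN = r"(Vp\d+) (S\d)"
# Pattern with an additional day identifier; letters are allowed in day/sample.
DAY_PATTERN = r"(Vp\d+) (T\w) (S\w)"
# Same identifiers, but with the "T"/"S" prefixes moved out of the capture groups.
NO_PREFIX_PATTERN = r"(Vp\d+) T(\w) S(\w)"


def extract_ids(sample_name, pattern):
    """Return the captured identifier parts of one sample name as a tuple.

    Hypothetical helper for illustration; not part of BioPsyKit.
    """
    match = re.fullmatch(pattern, sample_name)
    if match is None:
        raise ValueError(f"{sample_name!r} does not match {pattern!r}")
    return match.groups()


print(extract_ids("Vp01 S1", DEFAULT_PATTERN))       # ('Vp01', 'S1')
print(extract_ids("Vp01 T1 SA", DAY_PATTERN))        # ('Vp01', 'T1', 'SA')
print(extract_ids("Vp01 T1 SA", NO_PREFIX_PATTERN))  # ('Vp01', '1', 'A')
```

Each capture group becomes one extracted ID, so the number of groups in regex_str should match the number of ID columns you expect.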

Parameters
  • file_path (Path or str) – path to the Excel sheet in ‘plate’ format containing saliva data

  • saliva_type (str) – saliva type to load from file

  • sample_id_col (str, optional) – column name of the Excel sheet containing the sample ID. Default: “sample ID”

  • data_col (str, optional) – column name of the Excel sheet containing saliva data to be analyzed. Default: Select default column name based on saliva_type, e.g. cortisol => cortisol (nmol/l)

  • id_col_names (list of str, optional) – names of the extracted ID columns. None to use the default column names ([‘subject’, ‘day’, ‘sample’])

  • regex_str (str, optional) – regular expression to extract subject ID, day ID and sample ID from the sample identifier. None to use default regex string (r"(Vp\d+) (S\d)")

  • sample_times (list of int, optional) – times at which saliva samples were collected

  • condition_list (1d-array, optional) – list of conditions which subjects were assigned to

  • **kwargs – Additional parameters that are passed to pandas.read_excel()

Returns

data – saliva data in SalivaRawDataFrame format

Return type

SalivaRawDataFrame

Raises
  • FileExtensionError – if file is not an Excel file (.xls or .xlsx)

  • ValueError – if any saliva sample cannot be converted into a float (e.g. because there was text in one of the columns)

  • ValidationError – if imported data cannot be parsed to a SalivaRawDataFrame

biopsykit.io.saliva.save_saliva(file_path, data, saliva_type='cortisol', as_wide_format=False)[source]

Save saliva data to a csv or Excel file.

Parameters
  • file_path (Path or str) – file path to export. Must be a csv or an Excel file

  • data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format

  • saliva_type (str) – type of saliva data in the dataframe

  • as_wide_format (bool, optional) – True to save data in wide format (and flatten all index levels), False to save data in long format. Default: False
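To illustrate the difference between the two layouts, here is a minimal pandas sketch of turning a long-format dataframe into a wide one with a flat index. It is an assumption about what "wide format" looks like, not the code save_saliva runs; the data values and variable names are made up.

```python
import pandas as pd

# Hypothetical long-format saliva data: one row per (subject, sample) pair.
long_df = pd.DataFrame(
    {
        "subject": ["Vp01", "Vp01", "Vp02", "Vp02"],
        "sample": ["S1", "S2", "S1", "S2"],
        "cortisol": [4.2, 6.1, 3.9, 7.4],
    }
).set_index(["subject", "sample"])

# Pivot the "sample" index level into columns (one column per sample),
# then flatten the remaining index level into a regular column.
wide_df = long_df["cortisol"].unstack("sample").reset_index()
wide_df.columns.name = None

# Serialize without the (now meaningless) integer index.
csv_text = wide_df.to_csv(index=False)
print(csv_text)
```

In the wide layout each subject occupies one row, which is often easier to inspect in a spreadsheet; the long layout keeps one value per row.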

Raises

biopsykit.io.saliva.load_saliva_wide_format(file_path, saliva_type, subject_col=None, condition_col=None, additional_index_cols=None, sample_times=None, **kwargs)[source]

Load saliva data in wide format from a csv or Excel file.

It will return a SalivaRawDataFrame, a long-format dataframe that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named subject, the sample index will be named sample, and the value column will be named after the saliva biomarker type.
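The reshaping described above can be sketched with plain pandas. This is an illustration under stated assumptions, not the function's implementation: the csv content is made up, and the order of the index levels (condition, subject, sample) is a guess.

```python
import io

import pandas as pd

# Hypothetical wide-format csv content: one row per subject, one column per sample.
csv_text = """subject,condition,S1,S2,S3
Vp01,Control,4.2,6.1,5.0
Vp02,Intervention,3.9,7.4,6.3
"""

wide_df = pd.read_csv(io.StringIO(csv_text))

# Reshape to long format: one row per (condition, subject, sample) combination,
# with the value column named after the saliva type ("cortisol" here).
long_df = (
    wide_df.melt(
        id_vars=["condition", "subject"],
        var_name="sample",
        value_name="cortisol",
    )
    .set_index(["condition", "subject", "sample"])
    .sort_index()
)
print(long_df)
```

The two subjects with three samples each yield six rows, each holding a single cortisol value.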

Parameters
  • file_path (Path or str) – path to file

  • saliva_type (str) – saliva type to load from file. Example: cortisol

  • subject_col (str, optional) – name of column containing subject IDs or None to use the default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None

  • condition_col (str, optional) – name of the column containing condition assignments or None if no conditions are present. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None

  • additional_index_cols (str or list of str, optional) – additional index levels to be added to the dataframe, e.g., “day” index. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None

  • sample_times (list of int, optional) – times at which saliva samples were collected or None if no sample times should be specified. Default: None

  • **kwargs – Additional parameters that are passed to pandas.read_csv() or pandas.read_excel()

Returns

data – saliva data in SalivaRawDataFrame format

Return type

SalivaRawDataFrame

Raises

FileExtensionError – if file is not a csv or Excel file