biopsykit.io.biomarker module
Module containing different I/O functions for biomarker data (saliva, dried blood spots, IL-6).
- biopsykit.io.biomarker.load_saliva_plate(file_path, saliva_type, sample_id_col=None, data_col=None, id_col_names=None, regex_str=None, sample_times=None, condition_list=None, **kwargs)
Read saliva data from an Excel sheet in ‘plate’ format. Wraps load_biomarker_results() for compatibility.
This function automatically extracts identifiers such as subject, day, and sample IDs from the saliva sample names. To extract them, pass a regular expression string via regex_str.
The following examples show what sample identifiers might look like and what the corresponding regex_str would extract:
  - “Vp01 S1” => r"(Vp\d+) (S\d)" (this is the default pattern; you can also just set regex_str to None) => data [Vp01, S1] in two columns: subject, sample (unless column names are explicitly specified in id_col_names)
  - “Vp01 T1 S1” … “Vp01 T1 S5” (only numeric characters in day/sample) => r"(Vp\d+) (T\d) (S\d)" => data [Vp01, T1, S1] in three columns: subject, day, sample (unless column names are explicitly specified in id_col_names)
  - “Vp01 T1 S1” … “Vp01 T1 SA” (also letter characters in day/sample) => r"(Vp\d+) (T\w) (S\w)" => data [Vp01, T1, S1] in three columns: subject, day, sample (unless column names are explicitly specified in id_col_names)
If you don’t want to extract the ‘S’ or ‘T’ prefixes in sample or day IDs, respectively, move them out of the capture group (the round brackets) in regex_str, like this: (S\d) (would give S1, S2, …) => S(\d) (would give 1, 2, …)
- Parameters
file_path (Path or str) – path to the Excel sheet in ‘plate’ format containing saliva data
saliva_type (str) – saliva type to load from file
sample_id_col (str, optional) – column name of the Excel sheet containing the sample ID. Default: “sample ID”
data_col (str, optional) – column name of the Excel sheet containing saliva data to be analyzed. Default: select the default column name based on saliva_type, e.g. cortisol => cortisol (nmol/l)
id_col_names (list of str, optional) – names of the extracted ID columns. None to use the default column names ([‘subject’, ‘day’, ‘sample’])
regex_str (str, optional) – regular expression to extract subject ID, day ID and sample ID from the sample identifier. None to use the default regex string (r"(Vp\d+) (S\d)")
sample_times (list of int, optional) – times at which saliva samples were collected
condition_list (1d-array, optional) – list of conditions which subjects were assigned to
**kwargs – additional parameters that are passed to pandas.read_excel()
- Returns
data – saliva data in SalivaRawDataFrame format
- Return type
SalivaRawDataFrame
- Raises
FileExtensionError – if the file is not an Excel file (.xls or .xlsx)
ValueError – if any saliva sample cannot be converted into a float (e.g. because there was text in one of the columns)
ValidationError – if the imported data cannot be parsed to a SalivaRawDataFrame
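The regex_str examples in the description of load_saliva_plate() can be tried out with Python's standard re module. This is a minimal sketch, not BioPsyKit's own code: the helper extract_ids and the constant names are hypothetical, and using re.search is an assumption about how such a pattern might be applied.

```python
import re

# Patterns quoted in the description of load_saliva_plate().
DEFAULT_PATTERN = r"(Vp\d+) (S\d)"
DAY_PATTERN = r"(Vp\d+) (T\w) (S\w)"
# Prefixes moved out of the capture groups, so only the bare IDs are kept.
NO_PREFIX_PATTERN = r"(Vp\d+) T(\w) S(\w)"


def extract_ids(sample_name, pattern):
    """Return the captured groups of `pattern` in `sample_name` as a tuple.

    Hypothetical helper for illustration; not part of BioPsyKit.
    """
    match = re.search(pattern, sample_name)
    if match is None:
        raise ValueError(f"{sample_name!r} does not match {pattern!r}")
    return match.groups()


print(extract_ids("Vp01 S1", DEFAULT_PATTERN))       # ('Vp01', 'S1')
print(extract_ids("Vp01 T1 SA", DAY_PATTERN))        # ('Vp01', 'T1', 'SA')
print(extract_ids("Vp01 T1 SA", NO_PREFIX_PATTERN))  # ('Vp01', '1', 'A')
```

Checking a pattern against a few real sample names this way, before passing it as regex_str, shows whether every capture group lands in the intended ID column.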
- biopsykit.io.biomarker.save_saliva(file_path, data, saliva_type='cortisol', as_wide_format=False)
Save saliva data to a csv or Excel file.
- Parameters
file_path (Path or str) – file path to export. Must be a csv or an Excel file
data (SalivaRawDataFrame) – saliva data in SalivaRawDataFrame format
saliva_type (str) – type of saliva data in the dataframe
as_wide_format (bool, optional) – True to save data in wide format (and flatten all index levels), False to save data in long format. Default: False
- Raises
ValidationError – if data is not a SalivaRawDataFrame
FileExtensionError – if file_path is not a csv or Excel file
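To make the difference between long and wide format concrete, the following sketch reshapes a small long-format table with plain pandas. It only illustrates the idea behind as_wide_format; the data values are made up, and the exact steps save_saliva() performs internally are not documented here, so the unstack-and-flatten approach is an assumption.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, sample) pair.
long_df = pd.DataFrame(
    {
        "subject": ["Vp01", "Vp01", "Vp02", "Vp02"],
        "sample": ["S1", "S2", "S1", "S2"],
        "cortisol": [4.2, 6.1, 3.9, 7.4],
    }
).set_index(["subject", "sample"])

# Move the sample level into the columns: one row per subject.
wide_df = long_df["cortisol"].unstack("sample")

# Flatten: drop the name of the column level and turn the index into a column.
wide_df.columns.name = None
wide_df = wide_df.reset_index()

print(wide_df)
```

A table flattened like this has ordinary columns only, so it can be written with DataFrame.to_csv(..., index=False) and opened in a spreadsheet without multi-level headers.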
- biopsykit.io.biomarker.load_saliva_wide_format(file_path, saliva_type, subject_col=None, condition_col=None, additional_index_cols=None, sample_times=None, **kwargs)
Load saliva data in wide format from a csv or Excel file.
It returns a SalivaRawDataFrame, a long-format dataframe that complies with BioPsyKit’s naming convention, i.e., the subject ID index will be named subject, the sample index will be named sample, and the value column will be named after the saliva biomarker type.
- Parameters
file_path (Path or str) – path to file
saliva_type (str) – saliva type to load from file. Example: cortisol
subject_col (str, optional) – name of the column containing subject IDs or None to use the default column name subject. According to BioPsyKit’s convention, the subject ID column is expected to have the name subject. If the subject ID column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None
condition_col (str, optional) – name of the column containing condition assignments or None if no conditions are present. According to BioPsyKit’s convention, the condition column is expected to have the name condition. If the condition column in the file has another name, the column will be renamed in the dataframe returned by this function. Default: None
additional_index_cols (str or list of str, optional) – additional index levels to be added to the dataframe, e.g., a “day” index. Can either be a string or a list of strings to indicate column name(s) that should be used as index level(s), or None for no additional index levels. Default: None
sample_times (list of int, optional) – times at which saliva samples were collected or None if no sample times should be specified. Default: None
**kwargs – additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
data – saliva data in SalivaRawDataFrame format
- Return type
SalivaRawDataFrame
- Raises
FileExtensionError – if the file is not a csv or Excel file
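The reshaping described for load_saliva_wide_format() can be sketched with plain pandas. The file content, the column names S1 to S3, and the order of the index levels are assumptions made for illustration; this is not BioPsyKit's own implementation.

```python
import io

import pandas as pd

# Hypothetical wide-format file: one row per subject, one column per sample.
WIDE_CSV = """subject,condition,S1,S2,S3
Vp01,Control,4.2,6.1,5.0
Vp02,Intervention,3.9,7.4,6.3
"""

wide_df = pd.read_csv(io.StringIO(WIDE_CSV))

# Every column that is not an ID column becomes one row per sample.
long_df = wide_df.melt(
    id_vars=["subject", "condition"],
    var_name="sample",
    value_name="cortisol",  # value column named after the saliva type
)
long_df = long_df.set_index(["condition", "subject", "sample"]).sort_index()

print(long_df)
```

The result has one row per subject and sample, with subject, sample, and (here) condition as index levels and a single value column named after the saliva type, matching the naming convention described above.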
- biopsykit.io.biomarker.load_biomarker_results(file_path, biomarker_type=None, sample_id_col=None, data_col=None, id_col_names=None, regex_str=None, sample_times=None, condition_list=None, **kwargs)
Load biomarker results from an Excel file.
- Parameters
file_path (Path or str) – path to file
biomarker_type (str, optional) – biomarker type to load from file. Example: crp
sample_id_col (str, optional) – name of the column containing sample IDs or None to use the default column name sample ID.
data_col (str, optional) – name of the column containing biomarker data or None to use the default column name.
id_col_names (list of str, optional) – names of the extracted ID columns. None to use the default column names ([‘subject’, ‘day’, ‘sample’])
regex_str (str, optional) – regular expression to extract subject, day, and sample ID from the sample ID column. None to use the default regular expression r"(VP\d+)-(T\w)-(B\w)".
sample_times (list of int, optional) – times at which samples were collected or None if no sample times should be specified. Default: None
condition_list (list of str, dict of str to list of str, or Index, optional) – list of condition names, dictionary mapping condition names to lists of condition assignments, Index of condition names, or None if no conditions are present. Default: None
**kwargs – additional parameters that are passed to pandas.read_csv() or pandas.read_excel()
- Returns
data – biomarker data in BiomarkerRawDataFrame format
- Return type
BiomarkerRawDataFrame
- Raises
FileExtensionError – if the file is not an Excel file
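As a rough illustration of how the default pattern of load_biomarker_results() splits a sample ID into the three default ID columns, the sketch below uses the standard re module. It assumes the default pattern is meant as r"(VP\d+)-(T\w)-(B\w)" with single backslashes; the sample ID "VP01-T1-B2" and the helper split_sample_id are hypothetical and not part of BioPsyKit.

```python
import re

# Assumed default pattern and default ID column names from the parameter list.
BIOMARKER_PATTERN = r"(VP\d+)-(T\w)-(B\w)"
ID_COLUMNS = ["subject", "day", "sample"]


def split_sample_id(sample_id):
    """Map one sample ID to its ID columns (hypothetical helper)."""
    match = re.fullmatch(BIOMARKER_PATTERN, sample_id)
    if match is None:
        raise ValueError(f"{sample_id!r} does not match {BIOMARKER_PATTERN!r}")
    # Pair each capture group with the column name at the same position.
    return dict(zip(ID_COLUMNS, match.groups()))


print(split_sample_id("VP01-T1-B2"))
# {'subject': 'VP01', 'day': 'T1', 'sample': 'B2'}
```

The number of capture groups has to match the number of names in id_col_names, since each group fills exactly one ID column.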